CS 1206: Assignment 3

Due Friday, July 23, 1999 by 5pm

100 Points


1. Regular Expressions: What Strings Match an RE?

Consider the following text taken from a classic text book on compiler design "Principles of Compiler Design," by Alfred Aho and Jeffrey Ullman, 1985:
We shall now begin the discussion of the design of program for generating
lexical analyzers automatically. We first introduce a very useful notation,
called regular expressions, suitable for describing tokens. We later
show how regular expressions can be converted automatically into finite
automata, which are just formal specifications for transition diagrams. Still
later, we shall discuss the implementation of finite automata by programs.
If we refer to these lines by number (1 through 6), then which line(s), if any, match each of the following (grep) REs?
  1. /automata/
  2. /later$/
  3. /^[Wl]/
  4. /.$/
  5. / */
  6. /(now)|(later)/
  7. /program*/
  8. /[Ww].*[e]/
  9. /([sW].*[wW])/
  10. /(reg).*(ion*)/
Here, a line matches if there is at least one substring in the line that matches the given RE. Also, in the REs above "(" and ")" characters used alone match themselves, while "\(" and "\)" are used for grouping (these are the rules that grep and ex/vi use, while egrep uses the opposite conventions). All of these REs are also case-sensitive.

If you are unsure about a question, feel free to try out a few experiments with grep or with the "search" command / in vi. Remember that neither of these programs expect you to enter the "/" delimiters when you type the RE for which to search, and that both use case sensitive searches by default.

2. Regular Expressions: What Strings Match an RE? (cont.)

For the next set of (egrep) REs, identify all substrings that match each RE (without crossing lines, of course) in the six lines listed in Section 1. You can do this by listing all the matched strings. Remember that REs match the longest possible string. For example, the egrep RE /(are).*(for)/ matches the following string(s):
are just formal specifications for
Find all possible matches for each of the following (egrep) REs:
  1. /W[^f]*/
  2. /i.*e/
  3. /h.*w/
  4. /[a-z]*[s.]$/
  5. /^[al].*\,/

3. Regular Expressions: Writing REs

Now it is your turn to write some REs to match specific text fragments. For each of the following questions, write an RE that matches the specified text (including but not limited to all of the underlined phrases in each example) and no other text in the given line, assuming that your RE is intended to be used with egrep.
  1. Matching "hello", "hi", or "howdy":
  2. hello, there.  or is "hi" or "howdy" more to your liking?
    -----                 --      -----
  3. Matching "the", regardless of case:
  4. The quick brown fox jumps over the lazy dog.
    ---                            ---
  5. The last word in a sentence:
  6. How many sentences are here? There are two.  No, three!
                           -----           ----      ------
  7. A social security number:
  8. The number 045-35-2344 is a random SSN.
               -----------
  9. A word with five or more letters:
  10. This sentence does not have many long words.
         --------                         -----
  11. Any word beginning with the root "reus":
  12. reusable software is software that was designed to be reused.
    --------                                              ------
  13. An entire sentence that ends in a period:
  14. Does this sentence end in a period?  This one does.
                                         --------------
  15. Any sequence beginning with "artificial" and ending with "intelligence":
  16. Politicians can act artificial, but do they have intelligence?
                        -----------------------------------------
  17. Any of "computer", "computers", or "computing":
  18. computer science is the study of computing, and how computers work.
    --------                         ---------          ---------
  19. Matching any phrase of exactly three words separated by white space:
  20. This is a short sentence.
    ---------
         ----------
            ----------------
For these questions, you may not simply list an underlined phrase itself--you must use at least one special character in your answer.

Part III: Submitting Your Answers

Your submission must follow exactly the format described below, with no extra lines, omitted questions, or extraneous comments. For questions 1 through 10, give the problem number, a period, a space, and then the line numbers in order of the RE matches, separated by commas. For instance
1. 1,3,4,5,6
2. 2,3
3. 
4. 1,6
etc.
Give one problem per line, with no intervening extraneous lines. If there are no matches, terminate the line after the space following the period.

For questions 11 through 15, enter the problem number followed by a period on a line with nothing else. Then list the matched strings one per line, with no leading or trailing blanks, in the order they occur, left to right, top line to bottom line. Follow all the lines containing matched strings with exactly one blank line. For instance, the RE /transition/ would be answered with

11.
transition

12.
etc.
For questions 16 through 25, enter the problem number followed by a period, a space, a slash, the grep RE, and a slash terminating the line. For instance,
16. /h[eio][a-z]*/
17. /t[a-z]*/
etc.
You are to submit your assignment via email to the class account at cs1206@ei.cs.vt.edu.  Your subject line should read "HW 3".  You should immediately get an acknowledgement back.

Also, remember that no late assignments are accepted.