CS 431, Assignment 3, Book Questions on Chapter 2 Plus One Other Question
Questions that you need to answer are listed below, and some solutions or partial solutions are also given. The solutions are not presented as the only possible answers, and what is given may not be complete. It may only be a starting point for thinking about a complete, correct answer. Your goal in working the problems should be to come up with a solution, and if in doubt about your answer, compare it with what’s given here. In theory, even if your answer is not exactly the same, you should see the sense or understand the direction of the suggestions. If your answer is at odds with what I have suggested, you might want to check with me. It is possible that I am off base. It is also possible that you need some things clarified.

Book problems: 2.1, a-c; 2.2, a-e; 2.3; 2.4, a-e; and 2.8. Suggested answers to these questions are given after the following question.

Last problem: There is an additional problem that is not from the book. It is stated here. Implement Java code that will correctly translate infix expressions with + and - (no parentheses and no other operators) to postfix expressions. A problem solution is given in the book in C code, so your task is mainly one of adaptation. A starting point for this problem is posted on the Web page. You may use the file InfixToPostfixCut.java as given and simply add the needed methods. You can also do the problem from scratch if you want to. I think it would be preferable if you left your implementation recursive rather than changing it to a loop, but that is your choice. Hand in a printout of the methods you added along with your answers to the problems listed above.
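As a rough picture of what the last problem is asking for, here is a minimal stand-alone sketch. It is not the posted InfixToPostfixCut.java, whose contents are not reproduced here; the class name InfixToPostfixSketch and its method names are hypothetical. It assumes single-digit operands, the ASCII characters + and -, and the usual predictive-parsing grammar for this task (expr → term rest, with rest handling one operator at a time). The rest method is kept recursive.

```java
/**
 * Hypothetical sketch, not the course's starter file.
 * Assumed grammar with embedded actions:
 *   expr -> term rest
 *   rest -> + term {print +} rest | - term {print -} rest | (empty)
 *   term -> digit {print digit}
 */
class InfixToPostfixSketch {
    private final String input;                       // infix text, e.g. "9-5+2"
    private int pos = 0;                              // index of the lookahead character
    private final StringBuilder out = new StringBuilder();

    private InfixToPostfixSketch(String input) {
        this.input = input;
    }

    /** Translates single-digit operands joined by + and - into postfix. */
    static String translate(String infix) {
        InfixToPostfixSketch parser = new InfixToPostfixSketch(infix);
        parser.expr();
        if (parser.pos != infix.length()) {
            throw new IllegalArgumentException("unexpected input at index " + parser.pos);
        }
        return parser.out.toString();
    }

    private void expr() {
        term();
        rest();
    }

    // Recursive rather than a loop, following the preference stated above.
    private void rest() {
        if (pos < input.length()) {
            char op = input.charAt(pos);
            if (op == '+' || op == '-') {
                pos++;              // match the operator
                term();             // the right operand is emitted first
                out.append(op);     // then the operator, giving postfix order
                rest();
            }
        }
        // Otherwise the empty production applies and nothing is emitted.
    }

    private void term() {
        if (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            out.append(input.charAt(pos));
            pos++;
        } else {
            throw new IllegalArgumentException("digit expected at index " + pos);
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("9-5+2"));   // prints 95-2+
    }
}
```

The operator is appended only after the right-hand term has been emitted, which is what moves it from infix position to postfix position.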
Starting points for thinking about solutions:

2.1. Consider the following context-free grammar:

S → S S +    (1)
S → S S *    (2)
S → a        (3)
a) Show how the string aa+a* can be generated by this grammar.

Production (3) allows you to generate a string S0 which consists of a. Using two copies of S0 as a starting point, production (1) allows you to generate a string S1 which consists of aa+. Production (3) again allows you to generate a string S2 which consists of a. Then production (2) allows you to generate a string S3 = S1 S2 * = aa+a*. Written top-down as a leftmost derivation: S ⇒ S S * ⇒ S S + S * ⇒ a S + S * ⇒ a a + S * ⇒ a a + a *.

b) Construct a parse tree for this string.
(Children are indented beneath their parent and listed left to right.)

S
    S
        S
            a
        S
            a
        +
    S
        a
    *
c) What language is generated by this grammar? Justify your answer.

Assuming a is an identifier for a numeric value, for example, the grammar generates the language of all postfix arithmetic expressions whose only operand is a and whose only operators are the binary + and *. (No particular justification is given. Check to see if you agree.)
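One way to test the claim in (c) is the standard counting argument for postfix expressions: every a contributes one complete subexpression, and every binary operator combines two of them into one. The sketch below applies that argument as a membership check. The class and method names are hypothetical, and this is only an aid for checking your own justification, not a proof.

```java
class PostfixLanguageCheck {
    /** Returns true if s is a postfix expression over the operand a with binary + and *. */
    static boolean isInLanguage(String s) {
        int depth = 0;                          // complete subexpressions seen so far
        for (char c : s.toCharArray()) {
            if (c == 'a') {
                depth++;                        // production (3): S -> a
            } else if (c == '+' || c == '*') {
                if (depth < 2) {
                    return false;               // productions (1) and (2) need two S's
                }
                depth--;                        // two subexpressions combine into one
            } else {
                return false;                   // symbol outside the alphabet
            }
        }
        return depth == 1;                      // exactly one S must remain
    }

    public static void main(String[] args) {
        System.out.println(isInLanguage("aa+a*"));   // prints true
        System.out.println(isInLanguage("a+a"));     // prints false
    }
}
```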
2.2. What language is generated by the following grammars? In each case justify your answer. (No justifications are given.)

a) S → 0 S 1 | 0 1
All strings consisting of some number n ≥ 1 of 0’s followed by the same number of 1’s.

b) S → + S S | - S S | a
This is the prefix analog to question 2.1.

c) S → S ( S ) S | ε
This will generate arbitrary sequences of adjacent and nested, matched pairs of parentheses.

d) S → a S b S | b S a S | ε
All possible strings containing equal numbers of a’s and b’s, with the a’s and b’s arranged in no particular order.

e) S → a | S + S | S S | S * | ( S )
I don’t see a pattern to this that I can easily describe verbally. One way to read it: the strings look like regular expressions over the single symbol a, with + as alternation, juxtaposition as concatenation, * as closure, and parentheses for grouping.
2.3. Which of the grammars in Exercise 2.2 are ambiguous?

To show ambiguity it is sufficient to find any single string that can be parsed in more than one way using the grammar. No such strings spring immediately to mind for the grammars of a and b. For c and d, try the strings ()() and abab: each can be parsed in two ways, so those grammars are ambiguous as well. Grammar e is clearly ambiguous. Let the string S + S * be given. Here are two possible parsings.

First parsing (the * applies to all of S + S):

S
    S
        S
        +
        S
    *

Second parsing (the * applies only to the second S):

S
    S
    +
    S
        S
        *
2.4. Construct unambiguous context-free grammars for each of the following languages. In each case show that your grammar is correct. (Correctness is not shown.)

a) Arithmetic expressions in postfix notation.
list → list list +
list → list list -
list → digit
digit → 0 | 1 | 2 | … | 9

b) Left-associative lists of identifiers separated by commas.
list → list , id
list → id

c) Right-associative lists of identifiers separated by commas.
list → id , list
list → id

d) Arithmetic expressions of integers and identifiers with the four binary operators +, -, *, /.
Add the following rule to the grammar given at the end of section 2.2:
factor → identifier

e) Add unary plus and minus to the arithmetic operators of (d).
Add the following rules to the grammar:
factor → + factor
factor → - factor
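For parts (b) and (c), the two grammars generate exactly the same strings; they differ only in the shape of the parse tree. The sketch below makes that shape visible by parenthesizing a list the way each grammar would group it. The class and method names are hypothetical, and the code assumes a non-empty array of identifiers.

```java
class ListAssociativity {
    /** Grouping implied by the left-recursive rule list -> list , id | id. */
    static String leftAssociative(String[] ids) {
        String tree = ids[0];
        for (int i = 1; i < ids.length; i++) {
            tree = "(" + tree + "," + ids[i] + ")";   // earlier items nest deeper
        }
        return tree;
    }

    /** Grouping implied by the right-recursive rule list -> id , list | id. */
    static String rightAssociative(String[] ids) {
        String tree = ids[ids.length - 1];
        for (int i = ids.length - 2; i >= 0; i--) {
            tree = "(" + ids[i] + "," + tree + ")";   // later items nest deeper
        }
        return tree;
    }

    public static void main(String[] args) {
        String[] ids = {"a", "b", "c"};
        System.out.println(leftAssociative(ids));    // prints ((a,b),c)
        System.out.println(rightAssociative(ids));   // prints (a,(b,c))
    }
}
```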
2.8 Construct a syntax-directed translation scheme that translates arithmetic expressions from postfix notation to infix notation. Give annotated parse trees for the inputs 95-2* and 952*-.

Here is a simple grammar that can serve as a starting point for the solution of the problem:

string → digit string operator | string digit operator | digit
digit → 0 | 1 | 2 | … | 9
operator → * | / | + | -

Here is an annotated parse tree of the first expression:
string: 95-2*
    string: 95-
        string: 9
            digit: 9
                9
        digit: 5
            5
        -
    digit: 2
        2
    *
The first production applied in forming this tree was: string → string digit operator. Notice that it would have been just as possible to apply the production: string → digit string operator. If that had been done, you would then have had to parse “5-2” as a string. It does not parse, so it would be necessary to backtrack and apply the other production. At the next level down it doesn’t matter which production is chosen.
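The try-one-production-then-fall-back behavior described above can be sketched as a small recognizer for the starting grammar. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the code works on substring bounds so that the left-recursive production always recurses on a shorter span and cannot loop.

```java
class BacktrackingRecognizer {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isOperator(char c) {
        return c == '*' || c == '/' || c == '+' || c == '-';
    }

    /** True if the characters s[from, to) can be derived from the non-terminal string. */
    static boolean parsesAsString(String s, int from, int to) {
        int len = to - from;
        if (len == 1) {
            return isDigit(s.charAt(from));               // string -> digit
        }
        if (len < 3 || !isOperator(s.charAt(to - 1))) {
            return false;                                 // the other productions end in an operator
        }
        // First attempt: string -> string digit operator
        if (isDigit(s.charAt(to - 2)) && parsesAsString(s, from, to - 2)) {
            return true;
        }
        // Backtrack and try: string -> digit string operator
        return isDigit(s.charAt(from)) && parsesAsString(s, from + 1, to - 1);
    }

    static boolean parses(String s) {
        return !s.isEmpty() && parsesAsString(s, 0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(parses("95-2*"));   // prints true
        System.out.println(parses("952*-"));   // prints true
        System.out.println(parses("5-2"));     // prints false
    }
}
```

For 952*- the first attempt fails immediately because the symbol before the final operator is * rather than a digit, so the recognizer falls through to the second production.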
Here is an annotated parse tree of the second expression:

string: 952*-
    digit: 9
        9
    string: 52*
        digit: 5
            5
        string: 2
            digit: 2
                2
        *
    -
In this example there is no choice about which production to apply first. At the second level there is a choice but it doesn’t make a difference. The lack of choice at the first level illustrates clearly how you could tell whether or not the production you have chosen is the correct one, assuming you could look that far ahead: If the substring matched by the non-terminal string on the right hand side does not end in an operator (and is not a single digit), then the production choice is not correct.

It is also possible to see that if you could specify the order in which productions are tried, you could avoid backtracking. If you always tried to parse using this production first:

string → string digit operator

and tried the other one if that one failed to apply, you would avoid backtracking. But again, such an approach is not allowed. The question of backtracking is discussed on pages 45 and 46 of the book. The upshot of the matter is that this grammar isn’t suitable for predictive parsing.

There is another matter that will require a change in the grammar so that a syntax-directed translation scheme can be devised. At the top of page 39 in the book the term “simple” is defined as it applies to a syntax-directed definition. The requirement is that the order of non-terminals on the right hand side of a production agree with the order of the corresponding symbols generated as the desired output. Other symbols or terminals may come before, between, or after the output for the non-terminals. Under this condition the output can be generated from a depth-first traversal of the annotated tree.

The problem with the grammar given above is that all of the operators are symbolized using the non-terminal “operator”. However, in the translation from postfix to infix, it is the position of the operator that changes. That means that, even though it is tedious, for practical reasons the productions have to be rewritten with the operator symbols in-line as terminals:

string → digit string *
       | digit string /
       | digit string +
       | digit string -
       | string digit *
       | string digit /
       | string digit +
       | string digit -
       | digit
digit → 0 | 1 | 2 | … | 9

The problem only gets better and better, or worse and worse, depending on your point of view. Postfix notation does not require parentheses. The relative positions of the operands and operators unambiguously determine the order of operations. This means that an arbitrary postfix expression may enforce an order of operations which would not occur naturally in an unparenthesized infix expression. In particular, 95-2* does not translate to 9 - 5 * 2, where the multiplication would be done first. It translates to (9 - 5) * 2. It is not an attractive proposition to try to implement a translation scheme that would only insert parentheses when needed. It is much more convenient to fully parenthesize the infix translation whether needed or not. The unneeded parentheses do not adversely affect the meaning of the arithmetic.

Having said all of the above, here is my suggested syntax-directed translation scheme, that is, a context-free grammar with embedded semantic actions that will generate the infix translation of a postfix input:

string → {print("(")} digit {print("*")} string * {print(")")}
       | {print("(")} digit {print("/")} string / {print(")")}
       | {print("(")} digit {print("+")} string + {print(")")}
       | {print("(")} digit {print("-")} string - {print(")")}
       | {print("(")} string {print("*")} digit * {print(")")}
       | {print("(")} string {print("/")} digit / {print(")")}
       | {print("(")} string {print("+")} digit + {print(")")}
       | {print("(")} string {print("-")} digit - {print(")")}
       | digit
digit → 0 {print("0")} | 1 {print("1")} | etc.

The parse tree for 95-2* showing semantic actions follows.
string
    print (
    string
        print (
        string
            digit
                9
                print 9
        print -
        digit
            5
            print 5
        -
        print )
    print *
    digit
        2
        print 2
    *
    print )
And here is the parse tree for 952*- showing semantic actions.
string
    print (
    digit
        9
        print 9
    print -
    string
        print (
        digit
            5
            print 5
        print *
        string
            digit
                2
                print 2
        *
        print )
    -
    print )
If there are no mistakes in the parse trees, traversing them in depth-first order and executing the print statements as you go should cause the correct, parenthesized, infix translation of the postfix input to be emitted. For 95-2* that output is ((9-5)*2), and for 952*- it is (9-(5*2)).
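To check the expected output of a traversal without drawing the tree, the same fully parenthesized translation can be computed with an operand stack. Note that this is a stack-based stand-in, not the grammar-driven translation scheme itself. The class and method names are hypothetical, and operands are assumed to be single digits, as in the grammar.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class PostfixToInfixSketch {
    /** Translates a postfix string of digits and * / + - into fully parenthesized infix. */
    static String toInfix(String postfix) {
        Deque<String> stack = new ArrayDeque<>();
        for (char c : postfix.toCharArray()) {
            if (Character.isDigit(c)) {
                stack.push(String.valueOf(c));            // an operand translates to itself
            } else if (c == '*' || c == '/' || c == '+' || c == '-') {
                if (stack.size() < 2) {
                    throw new IllegalArgumentException("operator " + c + " lacks two operands");
                }
                String right = stack.pop();               // most recent operand is the right one
                String left = stack.pop();
                stack.push("(" + left + c + right + ")"); // always parenthesize, needed or not
            } else {
                throw new IllegalArgumentException("unexpected symbol: " + c);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("not a complete postfix expression");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        System.out.println(toInfix("95-2*"));   // prints ((9-5)*2)
        System.out.println(toInfix("952*-"));   // prints (9-(5*2))
    }
}
```

A lone digit comes back without parentheses, which matches the production string → digit having no print actions of its own.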