whirlwind
tour
Simplified Compiler Structure
Source code: if (b == 0) a = b;
→ Front end (machine-independent): understand source code
→ Intermediate code
→ Optimizer: optimize intermediate code
S → E + S
S → E
E → num
E → ( S )
• Problem: can't decide which S production to apply until we see the symbol after the first expression
• Left-factoring: factor out the common S prefix, add a new non-terminal S' at the decision point. S' derives (+E)*
S → E S'
S' → ε
S' → + S
E → num
E → ( S )
Parsing with new grammar
S → E S'     S' → ε | + S     E → num | ( S )
Input: (1+2+(3+4))+5
derivation                           look-ahead
S                                    (
→ E S'                               (
→ (S) S'                             1
→ (E S') S'                          1
→ (1 S') S'                          +
→ (1+E S') S'                        2
→ (1+2 S') S'                        +
→ (1+2 + S) S'                       (
→ (1+2 + E S') S'                    (
→ (1+2 + (S) S') S'                  3
→ (1+2 + (E S') S') S'               3
→ (1+2 + (3 S') S') S'               +
→ (1+2 + (3 + E S') S') S'           4
Predictive Parsing
• LL(1) grammar:
– for a given non-terminal, the look-ahead
symbol uniquely determines the
production to apply
– top-down parsing = predictive parsing
– Driven by predictive parsing table of
non-terminals × terminals → productions
Using Table
S → E S'     S' → ε | + S     E → num | ( S )
Input: (1+2+(3+4))+5
derivation                           look-ahead
S                                    (
→ E S'                               (
→ (S) S'                             1
→ (E S') S'                          1
→ (1 S') S'                          +
→ (1 + S) S'                         2
→ (1+E S') S'                        2
→ (1+2 S') S'                        +
        num        +         (          )        $
S       → E S'               → E S'
S'                 → + S                → ε      → ε
E       → num                → ( S )
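The table lookup above can be sketched as a small table-driven recognizer. This is only an illustration, not code from the slides: the class name `TableDrivenLL1`, the one-character token encoding (`n` for any num, `$` for end of input), and the use of `P` to stand for S' are all assumptions made to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Table-driven LL(1) recognizer for: S -> E S',  S' -> eps | + S,  E -> num | ( S )
// Hypothetical encoding: one char per token, 'n' = num, '$' = end of input, 'P' = S'.
class TableDrivenLL1 {
    // The parsing table: right-hand side for (non-terminal, look-ahead), or null for an error entry.
    static String production(char nonTerminal, char lookAhead) {
        switch (nonTerminal) {
            case 'S':
                return (lookAhead == 'n' || lookAhead == '(') ? "EP" : null;
            case 'P':
                if (lookAhead == '+') return "+S";
                if (lookAhead == ')' || lookAhead == '$') return "";  // S' -> eps
                return null;
            case 'E':
                if (lookAhead == 'n') return "n";
                if (lookAhead == '(') return "(S)";
                return null;
            default:
                return null;
        }
    }

    static boolean accepts(String tokens) {
        String input = tokens + "$";
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char look = input.charAt(pos);
            if (top == 'S' || top == 'P' || top == 'E') {
                // Non-terminal on top: expand it with the production the table selects.
                String rhs = production(top, look);
                if (rhs == null) return false;
                for (int i = rhs.length() - 1; i >= 0; i--) stack.push(rhs.charAt(i));
            } else {
                // Terminal on top: it must match the look-ahead, which is then consumed.
                if (top != look) return false;
                pos++;
            }
        }
        return pos == input.length();
    }

    public static void main(String[] args) {
        System.out.println(accepts("(n+n+(n+n))+n"));  // the slides' input with num written as n
        System.out.println(accepts("n+"));
    }
}
```

The stack holds the part of the derivation that has not yet been matched against the input, so each expansion corresponds to one row of the derivation shown above.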
How to Implement?
• Table can be converted easily into a
recursive-descent parser
        number     +         (          )        $
S       → E S'               → E S'
S'                 → + S                → ε      → ε
E       → number             → ( S )
Recursive-Descent Parser
void parse_S'() {
    switch (token) {
        case '+':                      // S' → +S: consume '+', then parse S
            token = input.read();
            parse_S();
            return;
        case ')': return;              // S' → ε
        case EOF: return;              // S' → ε (EOF is the $ column of the table)
        default: throw new ParseError();
    }
}
Recursive-Descent Parser
void parse_E() {
    switch (token) {
        case number:                   // E → number
            token = input.read();
            return;
        case '(':                      // E → ( S ): consume '(', parse S, expect ')'
            token = input.read();
            parse_S();
            if (token != ')') throw new ParseError();
            token = input.read();
            return;
        default: throw new ParseError();
    }
}
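For reference, the two functions above can be assembled into a complete recognizer. The sketch below is an assumption-laden illustration rather than the slides' code: `parseS` is reconstructed from the S row of the table, and the class name `RecursiveDescentSketch`, the tiny lexer in `read()`, and the `accepts` helper are all hypothetical. `parseSPrime` plays the role of parse_S', since S' is not a legal Java identifier.

```java
// Recursive-descent recognizer for: S -> E S',  S' -> eps | + S,  E -> number | ( S )
class RecursiveDescentSketch {
    static class ParseError extends RuntimeException {
        ParseError(String message) { super(message); }
    }

    static final char EOF = '$';   // end-of-input marker, the $ column of the table
    private final String text;
    private int pos = 0;
    private char token;            // current look-ahead token

    RecursiveDescentSketch(String text) {
        this.text = text;
        this.token = read();
    }

    // Hypothetical minimal lexer: skips blanks, turns a run of digits into the token 'n',
    // passes + ( ) through, and maps anything else to '?' so that it is rejected.
    private char read() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return EOF;
        char c = text.charAt(pos++);
        if (Character.isDigit(c)) {
            while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
            return 'n';
        }
        return (c == '+' || c == '(' || c == ')') ? c : '?';
    }

    void parseS() {                 // S -> E S' on look-ahead number or (
        switch (token) {
            case 'n':
            case '(':
                parseE();
                parseSPrime();
                return;
            default:
                throw new ParseError("unexpected token in S: " + token);
        }
    }

    void parseSPrime() {            // S' -> + S on +, S' -> eps on ) or end of input
        switch (token) {
            case '+': token = read(); parseS(); return;
            case ')': return;
            case EOF: return;
            default: throw new ParseError("unexpected token in S': " + token);
        }
    }

    void parseE() {                 // E -> number | ( S )
        switch (token) {
            case 'n': token = read(); return;
            case '(':
                token = read();
                parseS();
                if (token != ')') throw new ParseError("expected )");
                token = read();
                return;
            default:
                throw new ParseError("unexpected token in E: " + token);
        }
    }

    // Accepts only if S matches and the whole input has been consumed.
    static boolean accepts(String text) {
        try {
            RecursiveDescentSketch parser = new RecursiveDescentSketch(text);
            parser.parseS();
            return parser.token == EOF;
        } catch (ParseError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(1 + 2 + (3 + 4)) + 5"));
        System.out.println(accepts("(1 + 2"));
    }
}
```

Each `case` label is one non-empty entry of the table row for that non-terminal, and each empty entry falls into `default`.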
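Call Tree = Parse Tree
Input: (1 + 2 + (3 + 4)) + 5
Every non-terminal node of the parse tree corresponds to one call of its parse function.
parse tree                           call tree
S                                    parse_S
  E → ( S )                            parse_E
    S                                    parse_S
      E → 1                                parse_E
      S' → + S                             parse_S'
        S                                    parse_S
          E → 2                                parse_E
          S' → + S                             parse_S'
            S                                    parse_S
              E → ( S )                            parse_E
                S                                    parse_S
                  E → 3                                parse_E
                  S' → + S                             parse_S'
                    S                                    parse_S
                      E → 4                                parse_E
                      S' → ε                               parse_S'
              S' → ε                               parse_S'
  S' → + S                             parse_S'
    S                                    parse_S
      E → 5                                parse_E
      S' → ε                               parse_S'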
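One way to see that the call tree is the parse tree is to let every parse function return the subtree it recognized. The following is a minimal sketch under assumptions of my own: the class name `ParseTreeSketch`, the bracketed text form of the tree, single-digit numbers as tokens, and `eps` standing for ε are all illustrative choices, not part of the slides.

```java
// Recursive descent for S -> E S',  S' -> eps | + S,  E -> num | ( S ),
// where each call returns the parse-tree node it built, written as Name(children).
class ParseTreeSketch {
    private final String tokens;   // hypothetical encoding: one char per token, digits are num
    private int pos = 0;

    ParseTreeSketch(String tokens) { this.tokens = tokens; }

    private char peek() { return pos < tokens.length() ? tokens.charAt(pos) : '$'; }

    String parseS() {
        String e = parseE();
        String rest = parseSPrime();
        return "S(" + e + " " + rest + ")";
    }

    String parseSPrime() {
        if (peek() == '+') {
            pos++;
            return "S'(+ " + parseS() + ")";
        }
        if (peek() == ')' || peek() == '$') return "S'(eps)";
        throw new IllegalStateException("unexpected token at position " + pos);
    }

    String parseE() {
        char t = peek();
        if (Character.isDigit(t)) {
            pos++;
            return "E(" + t + ")";
        }
        if (t == '(') {
            pos++;
            String s = parseS();
            if (peek() != ')') throw new IllegalStateException("expected ) at position " + pos);
            pos++;
            return "E(( " + s + " ))";
        }
        throw new IllegalStateException("unexpected token at position " + pos);
    }

    // Parses the whole input and returns the tree for the start symbol.
    static String tree(String tokens) {
        ParseTreeSketch parser = new ParseTreeSketch(tokens);
        String result = parser.parseS();
        if (parser.pos != tokens.length()) throw new IllegalStateException("trailing input");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tree("1+2"));  // prints S(E(1) S'(+ S(E(2) S'(eps))))
    }
}
```

The nesting of the returned string follows the nesting of the calls exactly, which is the point of the slide.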
How to Construct Parsing Tables
Left-recursion elimination
Left-factoring
LL(1) grammar
recursive-descent parser
• Shift-reduce parsers
– Parsers for LR grammars
– automatic parser generators (e.g. yacc, CUP)
Bottom-up Parsing
S → S+E | E
E → num | ( S )
• Right-most derivation, backward
– Start with the tokens
– End with the start symbol
(1+2+(3+4))+5 ← (E+2+(3+4))+5
← (S+2+(3+4))+5 ← (S+E+(3+4))+5
← (S+(3+4))+5 ← (S+(E+4))+5 ← (S+(S+4))+5
← (S+(S+E))+5 ← (S+(S))+5 ← (S+E)+5 ← (S)+5
← E+5 ← S+E ← S
Progress of Bottom-up Parsing
right-most derivation      scanned            unscanned
(1+2+(3+4))+5 ←                               (1+2+(3+4))+5
(E+2+(3+4))+5 ←            (1                 +2+(3+4))+5
(S+2+(3+4))+5 ←            (1                 +2+(3+4))+5
(S+E+(3+4))+5 ←            (1+2               +(3+4))+5
(S+(3+4))+5 ←              (1+2+(3            +4))+5
(S+(E+4))+5 ←              (1+2+(3            +4))+5
(S+(S+4))+5 ←              (1+2+(3            +4))+5
(S+(S+E))+5 ←              (1+2+(3+4          ))+5
(S+(S))+5 ←                (1+2+(3+4          ))+5
(S+E)+5 ←                  (1+2+(3+4)         )+5
(S)+5 ←                    (1+2+(3+4)         )+5
E+5 ←                      (1+2+(3+4))        +5
S+E ←                      (1+2+(3+4))+5
Bottom-up Parsing
S → S+E | E
E → num | ( S )
• (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 …
• Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
Parse tree:
S → S + E
  S → E
    E → ( S )
      S → S + E
        S → S + E
          S → E
            E → 1
          E → 2
        E → ( S )
          S → S + E
            S → E
              E → 3
            E → 4
  E → 5
Top-down Parsing
Input: (1+2+(3+4))+5
S → S+E | E
E → num | ( S )
S → S+E → E+E → (S)+E → (S+E)+E
→ (S+E+E)+E → (E+E+E)+E
→ (1+E+E)+E → (1+2+E)+E ...
• In a left-most derivation, the entire tree above a token (e.g. 2) has been expanded by the time the token is encountered
Parse tree:
S → S + E
  S → E
    E → ( S )
      S → S + E
        S → S + E
          S → E
            E → 1
          E → 2
        E → ( S )
          S → S + E
            S → E
              E → 3
            E → 4
  E → 5
Top-down vs. Bottom-up
Bottom-up: Don’t need to figure out as much of
the parse tree for a given amount of input
[Figure: portion of the parse tree over scanned and unscanned input, Top-down vs. Bottom-up]
Shift-reduce Parsing
• Parsing actions: a sequence of shift and reduce operations
• Parser state: a stack of terminals and non-terminals (grows to the right)
• Current derivation step = always stack + input
derivation step            stack              unconsumed input
(1+2+(3+4))+5 ←                               (1+2+(3+4))+5
(E+2+(3+4))+5 ←            (E                 +2+(3+4))+5
(S+2+(3+4))+5 ←            (S                 +2+(3+4))+5
(S+E+(3+4))+5 ←            (S+E               +(3+4))+5
Shift-reduce Parsing
• Parsing is a sequence of shifts and reduces
• Shift: move look-ahead token to stack
stack        input               action
(            1+2+(3+4))+5        shift 1
(1           +2+(3+4))+5
• Reduce: replace symbols γ from top of stack with non-terminal symbol X, corresponding to production X → γ (pop γ, push X)
stack        input               action
(S+E         +(3+4))+5           reduce S → S+E
(S           +(3+4))+5
Shift-reduce Parsing
S→ S+E|E
E → num | ( S )
• Issues:
– Sometimes can reduce but shouldn’t
– Sometimes can reduce in different ways
Action Selection Problem
• Given stack σ and look-ahead symbol b,
should parser:
– shift b onto the stack (making it σb)
– reduce X → γ assuming that stack has
the form α γ (making it αX)
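To make the shift and reduce actions concrete, here is a minimal sketch of a shift-reduce recognizer for S → S+E | E, E → num | ( S ). It sidesteps the action selection problem with a hand-written rule that happens to work for this particular grammar: after every shift, reduce as long as some right-hand side sits on top of the stack, checking S+E before E. That rule, the class name `ShiftReduceSketch`, and the one-character token encoding (`n` for num) are assumptions of this example. It is not the general LR technique, which derives the decision from parser states.

```java
import java.util.ArrayList;
import java.util.List;

// Shift-reduce recognizer for: S -> S+E | E,  E -> num | ( S )
// Hypothetical encoding: one char per token, 'n' = num. The stack is kept as a string.
class ShiftReduceSketch {
    // Pop popCount symbols (the right-hand side) and push the non-terminal.
    static void reduce(StringBuilder stack, int popCount, char nonTerminal,
                       String rule, List<String> trace) {
        stack.setLength(stack.length() - popCount);
        stack.append(nonTerminal);
        trace.add("reduce " + rule);
    }

    // Reduce while a right-hand side is on top of the stack.
    // S+E is tested before E so that "S+E" becomes S instead of "S+S".
    static void reduceAll(StringBuilder stack, List<String> trace) {
        while (true) {
            String s = stack.toString();
            if (s.endsWith("n")) reduce(stack, 1, 'E', "E -> num", trace);
            else if (s.endsWith("(S)")) reduce(stack, 3, 'E', "E -> (S)", trace);
            else if (s.endsWith("S+E")) reduce(stack, 3, 'S', "S -> S+E", trace);
            else if (s.endsWith("E")) reduce(stack, 1, 'S', "S -> E", trace);
            else return;
        }
    }

    // Returns true if the tokens reduce to the start symbol; records every action in trace.
    static boolean accepts(String tokens, List<String> trace) {
        StringBuilder stack = new StringBuilder();
        for (char c : tokens.toCharArray()) {
            if ("n+()".indexOf(c) < 0) return false;  // not a token of this grammar
            stack.append(c);                          // shift the look-ahead token
            trace.add("shift " + c);
            reduceAll(stack, trace);
        }
        return stack.toString().equals("S");
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        System.out.println(accepts("(n+n+(n+n))+n", trace));
        for (String action : trace) System.out.println(action);
    }
}
```

The order of the tests in `reduceAll` is where the "can reduce in different ways" issue shows up: with S+E on top of the stack, both S → S+E and S → E match, and only the first leads to a successful parse.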