Language Processing

Top-down and Bottom-up Parsing: a whirlwind tour
Simplified Compiler Structure
Source code: if (b == 0) a = b;
  ↓ Understand source code: Front end (machine-independent)
Intermediate code
  ↓ Optimize: Optimizer
Intermediate code
  ↓ Generate assembly code: Back end (machine-dependent)
Assembly code:
  cmp $0,ecx
  cmovz edx,ecx
Simplified Front-End Structure
Source code (character stream): if (b == 0) a = b;
  ↓ Lexical Analysis
Token stream: if ( b == 0 ) a = b ;
  ↓ Syntax Analysis (Parsing)
Abstract Syntax Tree (AST): an "if" node whose children are == (over b and 0) and = (over a and b)
  ↓ Semantic Analysis
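
The lexical-analysis step can be sketched as follows. This is a minimal illustration, not the course's lexer: the class name Lexer, the method tokenize, and the set of recognized symbols are assumptions chosen to cover the example statement only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lexer sketch: turns a character stream into a token stream.
final class Lexer {
    /** Splits src into identifiers/keywords, integers, "==", and single-character symbols. */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }   // whitespace separates tokens
            int start = i;
            if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
            } else if (c == '=' && i + 1 < src.length() && src.charAt(i + 1) == '=') {
                i += 2;                                          // two-character token "=="
            } else if ("()=;+".indexOf(c) >= 0) {
                i++;                                             // single-character token
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
            tokens.add(src.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [if, (, b, ==, 0, ), a, =, b, ;]
        System.out.println(tokenize("if (b == 0) a = b;"));
    }
}
```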
Parse Tree vs. AST
• Parse tree also called “concrete syntax”
• AST discards (abstracts) unneeded information
[Figure: for the input (1+2+(3+4))+5, the parse tree (concrete syntax), built from S → E + S, E → ( S ) and E → num nodes, beside the abstract syntax tree +(+(1, +(2, +(3, 4))), 5)]
How to build an AST
• Need to find a derivation for the
program in the grammar
• Want an efficient algorithm
– should only read the token stream once
– exponential brute-force search is out of the question
– even CKY, which takes time cubic in the input length, is too slow
• Two main ways to parse:
– top-down parsing (recursive descent)
– bottom-up parsing (shift-reduce)
Parsing Top-down
Grammar: S → E+S | E
         E → num | ( S )
Goal: construct a leftmost derivation of the string while reading in the token stream
Input (the slide marks its parsed and unparsed parts): (1+2+(3+4))+5

Partly-derived string     Lookahead
  S                       (
→ E+S                     (
→ (S)+S                   1
→ (E+S)+S                 1
→ (1+S)+S                 2
→ (1+E+S)+S               2
→ (1+2+S)+S               2
→ (1+2+E)+S               (
→ (1+2+(S))+S             3
→ (1+2+(E+S))+S           3
Problem
Grammar: S → E+S | E
         E → num | ( S )
• Want to decide which production to apply based on the next symbol
    (1):    S → E → (S) → (E) → (1)
    (1)+2:  S → E+S → (S)+S → (E)+S → (1)+E → (1)+2
• Why is this hard? Both inputs begin with "(", yet the first needs S → E and the second needs S → E+S.
Grammar is the Problem
• This grammar cannot be parsed top-down with only a single look-ahead symbol
• Not LL(1) = Left-to-right scanning, Left-most derivation, 1 look-ahead symbol
• Is it LL(k) for some k?
• Can rewrite grammar to allow top-down parsing: create LL(1) grammar for same language
Making a grammar LL(1)
S → E+S
S → E
E → num
E → (S)
• Problem: can’t decide which S production to apply until we see the symbol after the first expression
• Left-factoring: factor the common S prefix, add a new non-terminal S' at the decision point. S' derives (+E)*
S → ES'
S' → ε
S' → + S
E → num
E → (S)
Parsing with new grammar
Grammar: S → ES'    S' → ε | +S    E → num | ( S )
Input: (1+2+(3+4))+5

Partly-derived string           Lookahead
  S                             (
→ E S'                          (
→ (S) S'                        1
→ (E S') S'                     1
→ (1 S') S'                     +
→ (1+E S') S'                   2
→ (1+2 S') S'                   +
→ (1+2 + S) S'                  (
→ (1+2 + E S') S'               (
→ (1+2 + (S) S') S'             3
→ (1+2 + (E S') S') S'          3
→ (1+2 + (3 S') S') S'          +
→ (1+2 + (3 + E) S') S'         4
Predictive Parsing
• LL(1) grammar:
– for a given non-terminal, the look-ahead
symbol uniquely determines the
production to apply
– top-down parsing = predictive parsing
– Driven by predictive parsing table of
non-terminals × terminals → productions
Using Table
Grammar: S → ES'
         S' → ε | +S
         E → num | ( S )
Input: (1+2+(3+4))+5

Partly-derived string     Lookahead
  S                       (
→ E S'                    (
→ (S) S'                  1
→ (E S') S'               1
→ (1 S') S'               +
→ (1 + S) S'              2
→ (1+E S') S'             2
→ (1+2 S') S'             +

        num       +        (        )       $
S       → ES'              → ES'
S'                → +S              → ε     → ε
E       → num              → (S)
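
The table can also drive a parser directly, with an explicit stack in place of recursion. The sketch below is one way to do it, under stated assumptions: the class name TableDrivenLL1, the single-character encoding of symbols ('P' for S', 'n' for num, '$' for end of input), and the digits-only notion of num are all choices made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical table-driven LL(1) recognizer for S → ES', S' → ε | +S, E → num | ( S ).
final class TableDrivenLL1 {
    // Key = non-terminal followed by look-ahead terminal; value = right-hand side to expand to.
    // 'P' stands for S'; 'n' stands for num; "" is the ε production.
    private static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("Sn", "EP");  TABLE.put("S(", "EP");
        TABLE.put("P+", "+S");  TABLE.put("P)", "");   TABLE.put("P$", "");
        TABLE.put("En", "n");   TABLE.put("E(", "(S)");
    }

    /** Collapses each run of digits into the token 'n' and appends the end marker '$'. */
    static String tokens(String src) {
        return src.replaceAll("\\s+", "").replaceAll("[0-9]+", "n") + "$";
    }

    static boolean accepts(String src) {
        String in = tokens(src);
        Deque<Character> stack = new ArrayDeque<>();
        stack.push('$');
        stack.push('S');                        // start symbol on top
        int pos = 0;
        while (!stack.isEmpty()) {
            char top = stack.pop();
            char look = in.charAt(pos);
            if (top == 'S' || top == 'P' || top == 'E') {
                String rhs = TABLE.get("" + top + look);
                if (rhs == null) return false;  // empty table cell: syntax error
                // Push the right-hand side in reverse so its first symbol ends up on top.
                for (int i = rhs.length() - 1; i >= 0; i--) stack.push(rhs.charAt(i));
            } else {
                if (top != look) return false;  // terminal on the stack must match the input
                pos++;
            }
        }
        return pos == in.length();
    }
}
```

For example, TableDrivenLL1.accepts("(1+2+(3+4))+5") follows the same expansions as the trace above, while an input such as "(1+2" hits an empty table cell and is rejected.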
How to Implement?
• Table can be converted easily into a recursive-descent parser
        num       +        (        )       $
S       → ES'              → ES'
S'                → +S              → ε     → ε
E       → num              → (S)
• Three procedures: parse_S, parse_S', parse_E
Recursive-Descent Parser
        number    +        (        )       $
S       → ES'              → ES'
S'                → +S              → ε     → ε
E       → number           → (S)

// token holds the lookahead token; parse_Sprime is the procedure for S'.
void parse_S() {
  switch (token) {
    case number: parse_E(); parse_Sprime(); return;
    case '(': parse_E(); parse_Sprime(); return;
    default: throw new ParseError();
  }
}

void parse_Sprime() {
  switch (token) {
    case '+': token = input.read(); parse_S(); return;   // S' → +S
    case ')': return;                                     // S' → ε
    case EOF: return;                                     // S' → ε
    default: throw new ParseError();
  }
}

void parse_E() {
  switch (token) {
    case number: token = input.read(); return;
    case '(': token = input.read(); parse_S();
              if (token != ')') throw new ParseError();
              token = input.read(); return;
    default: throw new ParseError();
  }
}
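
The three procedures leave out the token reader. A self-contained version might look like the sketch below. The class name RecursiveDescent, the advance method (standing in for input.read()), the ParseError class, and the character encoding of tokens ('n' for a number, '$' for end of input) are assumptions made for this illustration.

```java
// Hypothetical runnable version of the three procedures for
// S → ES', S' → ε | +S, E → number | ( S ).
final class RecursiveDescent {
    static final class ParseError extends RuntimeException {
        ParseError(String msg) { super(msg); }
    }

    private final String input;   // source text with whitespace removed
    private int pos = 0;
    private char token;           // lookahead: 'n' = number, '$' = end of input, else the symbol

    private RecursiveDescent(String src) {
        input = src.replaceAll("\\s+", "");
        advance();
    }

    /** Reads the next token into the lookahead. */
    private void advance() {
        if (pos >= input.length()) { token = '$'; return; }
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            token = 'n';
        } else if (c == '+' || c == '(' || c == ')') {
            pos++;
            token = c;
        } else {
            throw new ParseError("unexpected character '" + c + "'");
        }
    }

    private void parseS() {
        switch (token) {
            case 'n': case '(': parseE(); parseSPrime(); return;   // S → ES'
            default: throw new ParseError("S cannot start with '" + token + "'");
        }
    }

    private void parseSPrime() {
        switch (token) {
            case '+': advance(); parseS(); return;                 // S' → +S
            case ')': case '$': return;                            // S' → ε
            default: throw new ParseError("S' cannot start with '" + token + "'");
        }
    }

    private void parseE() {
        switch (token) {
            case 'n': advance(); return;                           // E → number
            case '(':                                              // E → ( S )
                advance();
                parseS();
                if (token != ')') throw new ParseError("expected ')'");
                advance();
                return;
            default: throw new ParseError("E cannot start with '" + token + "'");
        }
    }

    /** Returns true if src is derivable from S and nothing is left over. */
    static boolean accepts(String src) {
        try {
            RecursiveDescent p = new RecursiveDescent(src);
            p.parseS();
            return p.token == '$';
        } catch (ParseError e) {
            return false;
        }
    }
}
```

The final check on '$' matters: parseS returns normally on an input such as "1)", because S' → ε is chosen on ")", so the caller has to confirm that the whole input was consumed.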
Call Tree = Parse Tree
Input: (1 + 2 + (3 + 4)) + 5
[Figure: the tree of calls to parse_S, parse_E and parse_S' made while parsing the input, drawn beside the parse tree built from S → ES', S' → +S | ε and E → num | ( S ); the two trees have the same shape]
How to Construct Parsing Tables
• There exists an algorithm for automatically generating a predictive parse table from a grammar (take 412 for details)
S → ES'
S' → ε | + S
E → number | ( S )

        N        +        (        )       $
S       ES'               ES'
S'               +S                ε       ε
E       N                 (S)
Summary for top-down parsing
• LL(k) grammars
– left-to-right scanning
– leftmost derivation
– can determine what production to apply from
the next k symbols
– Can automatically build predictive parsing
tables
• Predictive parsers
– Can be easily built for LL(k) grammars from
the parsing tables
– Also called recursive-descent, or top-down
parsers
Top-Down Parsing Summary
Language grammar
  ↓ Left-recursion elimination, left-factoring
LL(1) grammar
  ↓
predictive parsing table
  ↓
recursive-descent parser
  ↓
parser with AST generation
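
The last step, a parser with AST generation, can be sketched by letting each parse procedure return a tree node instead of nothing. The names AstParser, Node, Num and Add, and the eval method, are hypothetical; the grammar is the LL(1) grammar S → ES', S' → ε | +S, E → num | ( S ) from the earlier slides.

```java
// Hypothetical recursive-descent parser that builds an AST for sums of numbers.
final class AstParser {
    /** AST node: either a number or an addition. */
    interface Node { int eval(); }

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
        public int eval() { return value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Add implements Node {
        final Node left, right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); }
        @Override public String toString() { return "+(" + left + ", " + right + ")"; }
    }

    private final String in;   // input without whitespace, terminated by the sentinel '$'
    private int pos = 0;

    private AstParser(String src) { in = src.replaceAll("\\s+", "") + "$"; }

    private char peek() { return in.charAt(pos); }

    private IllegalArgumentException error() {
        return new IllegalArgumentException("unexpected '" + peek() + "' at position " + pos);
    }

    // S → E S'
    private Node parseS() { return parseSPrime(parseE()); }

    // S' → ε | + S   (the left operand was already parsed by the caller)
    private Node parseSPrime(Node left) {
        switch (peek()) {
            case '+': pos++; return new Add(left, parseS());
            case ')': case '$': return left;
            default: throw error();
        }
    }

    // E → num | ( S )   (parentheses leave no node behind: the AST abstracts them away)
    private Node parseE() {
        if (Character.isDigit(peek())) {
            int start = pos;
            while (Character.isDigit(peek())) pos++;
            return new Num(Integer.parseInt(in.substring(start, pos)));
        }
        if (peek() != '(') throw error();
        pos++;
        Node inner = parseS();
        if (peek() != ')') throw error();
        pos++;
        return inner;
    }

    static Node parse(String src) {
        AstParser p = new AstParser(src);
        Node root = p.parseS();
        if (p.pos != p.in.length() - 1) throw p.error();   // must stop exactly at the sentinel
        return root;
    }
}
```

With these definitions, AstParser.parse("(1+2+(3+4))+5").toString() gives +(+(1, +(2, +(3, 4))), 5), and eval() on that tree gives 15. The nesting to the right comes from the shape of S' → +S.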

Now: Bottom-up Parsing
• A more powerful parsing technology
• LR grammars: more expressive than LL
– construct right-most derivation of program
– can describe virtually all programming languages
– easier to express programming language syntax
• Shift-reduce parsers
– Parsers for LR grammars
– automatic parser generators (e.g. yacc,CUP)
Bottom-up Parsing
Grammar: S → S+E | E
         E → num | ( S )
• Right-most derivation, built backward
– Start with the tokens
– End with the start symbol
(1+2+(3+4))+5 ← (E+2+(3+4))+5
← (S+2+(3+4))+5 ← (S+E+(3+4))+5
← (S+(3+4))+5 ← (S+(E+4))+5 ← (S+(S+4))+5
← (S+(S+E))+5 ← (S+(S))+5 ← (S+E)+5 ← (S)+5
← E+5 ← S+E ← S
Progress of Bottom-up Parsing
right-most derivation      scanned          unscanned
(1+2+(3+4))+5 ←                             (1+2+(3+4))+5
(E+2+(3+4))+5 ←            (1               +2+(3+4))+5
(S+2+(3+4))+5 ←            (1               +2+(3+4))+5
(S+E+(3+4))+5 ←            (1+2             +(3+4))+5
(S+(3+4))+5 ←              (1+2             +(3+4))+5
(S+(E+4))+5 ←              (1+2+(3          +4))+5
(S+(S+4))+5 ←              (1+2+(3          +4))+5
(S+(S+E))+5 ←              (1+2+(3+4        ))+5
(S+(S))+5 ←                (1+2+(3+4        ))+5
(S+E)+5 ←                  (1+2+(3+4)       )+5
(S)+5 ←                    (1+2+(3+4)       )+5
E+5 ←                      (1+2+(3+4))      +5
S+E ←                      (1+2+(3+4))+5
Bottom-up Parsing
Grammar: S → S+E | E
         E → num | ( S )
• (1+2+(3+4))+5 ← (E+2+(3+4))+5 ← (S+2+(3+4))+5 ← (S+E+(3+4))+5 …
• Advantage of bottom-up parsing: can postpone the selection of productions until more of the input is scanned
[Figure: parse tree of (1+2+(3+4))+5 under this grammar]
Top-down Parsing
Grammar: S → S+E | E
         E → num | ( S )
Input: (1+2+(3+4))+5
S → S+E → E+E → (S)+E → (S+E)+E
→ (S+E+E)+E → (E+E+E)+E
→ (1+E+E)+E → (1+2+E)+E ...
• In left-most derivation, entire tree above a token (2) has been expanded when encountered
[Figure: parse tree of (1+2+(3+4))+5 under this grammar]
Top-down vs. Bottom-up
Bottom-up: Don’t need to figure out as much of the parse tree for a given amount of input
[Figure: scanned and unscanned portions of the parse tree, top-down vs. bottom-up]
Shift-reduce Parsing
• Parsing actions: a sequence of shift and reduce operations
• Parser state: a stack of terminals and non-terminals (grows to the right)
• Current derivation step = always stack + unconsumed input
derivation step        stack      unconsumed input
(1+2+(3+4))+5 ←                   (1+2+(3+4))+5
(E+2+(3+4))+5 ←        (E         +2+(3+4))+5
(S+2+(3+4))+5 ←        (S         +2+(3+4))+5
(S+E+(3+4))+5 ←        (S+E       +(3+4))+5
• Parsing is a sequence of shifts and reduces
• Shift: move look-ahead token to stack
    stack      input              action
    (          1+2+(3+4))+5       shift 1
    (1         +2+(3+4))+5
• Reduce: replace symbols γ from top of stack with non-terminal symbol X, corresponding to production X → γ (pop γ, push X)
    stack      input              action
    (S+E       +(3+4))+5          reduce S → S+E
    (S         +(3+4))+5
Grammar: S → S+E | E
         E → num | ( S )
derivation             stack      input stream       action
(1+2+(3+4))+5 ←                   (1+2+(3+4))+5      shift
(1+2+(3+4))+5 ←        (          1+2+(3+4))+5       shift
(1+2+(3+4))+5 ←        (1         +2+(3+4))+5        reduce E → num
(E+2+(3+4))+5 ←        (E         +2+(3+4))+5        reduce S → E
(S+2+(3+4))+5 ←        (S         +2+(3+4))+5        shift
(S+2+(3+4))+5 ←        (S+        2+(3+4))+5         shift
(S+2+(3+4))+5 ←        (S+2       +(3+4))+5          reduce E → num
(S+E+(3+4))+5 ←        (S+E       +(3+4))+5          reduce S → S+E
(S+(3+4))+5 ←          (S         +(3+4))+5          shift
(S+(3+4))+5 ←          (S+        (3+4))+5           shift
(S+(3+4))+5 ←          (S+(       3+4))+5            shift
(S+(3+4))+5 ←          (S+(3      +4))+5             reduce E → num
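
The trace can be reproduced with a small hand-written shift-reduce loop. The sketch below is not an LR parser: it assumes a fixed, hand-chosen order in which to try reductions after every shift, which happens to work for this particular grammar. The class name ShiftReduce and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shift-reduce recognizer for S → S+E | E, E → num | ( S ).
final class ShiftReduce {
    /** Tries one reduction on the top of the stack; returns the production used, or null. */
    static String reduceOnce(StringBuilder stack) {
        String s = stack.toString();
        if (s.endsWith("n"))   return replaceTop(stack, 1, 'E', "E → num");
        if (s.endsWith("(S)")) return replaceTop(stack, 3, 'E', "E → ( S )");
        if (s.endsWith("S+E")) return replaceTop(stack, 3, 'S', "S → S+E");   // tried before S → E
        if (s.endsWith("E"))   return replaceTop(stack, 1, 'S', "S → E");
        return null;
    }

    private static String replaceTop(StringBuilder stack, int n, char lhs, String production) {
        stack.setLength(stack.length() - n);   // pop the handle γ
        stack.append(lhs);                     // push the non-terminal X
        return production;
    }

    /** Returns the list of actions taken, or null if the input is rejected. */
    static List<String> parse(String src) {
        String cleaned = src.replaceAll("\\s+", "");
        if (!cleaned.matches("[0-9+()]*")) return null;     // only digits, '+', '(' and ')' allowed
        String input = cleaned.replaceAll("[0-9]+", "n");   // each number becomes the token 'n'
        StringBuilder stack = new StringBuilder();
        List<String> actions = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            stack.append(input.charAt(i));                  // shift
            actions.add("shift");
            String production;
            while ((production = reduceOnce(stack)) != null) {
                actions.add("reduce " + production);
            }
        }
        return stack.toString().equals("S") ? actions : null;   // accept iff only S remains
    }
}
```

On (1+2+(3+4))+5 the first actions are shift, shift, reduce E → num, reduce S → E, shift, shift, reduce E → num, reduce S → S+E, matching the table. Trying S → S+E before S → E is exactly the kind of decision that a general parser cannot hard-code.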
Problem
• How do we know which action to
take: whether to shift or reduce, and
which production?
• Issues:
– Sometimes can reduce but shouldn’t
– Sometimes can reduce in different ways
Action Selection Problem
• Given stack σ and look-ahead symbol b,
should parser:
– shift b onto the stack (making it σb)
– reduce X → γ assuming that stack has
the form α γ (making it αX)
• If stack has form α γ, should apply reduction X → γ (or shift) depending on stack prefix α
– α is different for different possible reductions, since the γ’s have different lengths.
LR Parsing Engine
• Basic mechanism:
– Use a set of parser states
– Use a stack with alternating symbols and
states
• E.g: 1 ( 6 S 10 + 5
– Use a parsing table to:
• Determine what action to apply
(shift/reduce)
• Determine the next state
• The parser actions can be precisely determined from the table
The LR Parsing Table
          Terminals                      Non-terminals
State     Next action and next state     Next state
          (Action table)                 (Goto table)
• Algorithm: look at entry for current state S and input terminal C
    If Table[S,C] = s(S') then shift:
        push(C), push(S')
    If Table[S,C] = X→α then reduce:
        pop(2*|α|), S'=top(), push(X), push(Table[S',X])
LR Parsing Table Example
     (        )        id       ,        $        S     L
1    s3                s2                         g4
2    S→id     S→id     S→id     S→id     S→id
3    s3                s2                         g7    g5
4                                        accept
5             s6                s8
6    S→(L)    S→(L)    S→(L)    S→(L)    S→(L)
7    L→S      L→S      L→S      L→S      L→S
8    s3                s2                         g9
9    L→L,S    L→L,S    L→L,S    L→L,S    L→L,S
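
The example table can be run with a small engine along the lines of the algorithm on the previous slide. This sketch makes two simplifying assumptions: the stack holds states only (so a reduction pops |α| entries rather than 2*|α|), and id is encoded as the single character 'i'. The class name LrEngine and the array layout are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical LR(0) engine for the example table (S → (L) | id, L → S | L,S).
final class LrEngine {
    private static final String TERMINALS = "()i,$";   // column order: ( ) id , $

    // ACTION[state][column]: "sN" = shift and go to state N, "acc" = accept, null = error.
    // Row 0 is unused so that rows line up with state numbers 1..9; reduce states have no row.
    private static final String[][] ACTION = {
        {},
        {"s3", null, "s2", null, null},    // state 1
        null,                              // state 2: reduce S → id
        {"s3", null, "s2", null, null},    // state 3
        {null, null, null, null, "acc"},   // state 4
        {null, "s6", null, "s8", null},    // state 5
        null,                              // state 6: reduce S → ( L )
        null,                              // state 7: reduce L → S
        {"s3", null, "s2", null, null},    // state 8
        null,                              // state 9: reduce L → L , S
    };

    // For reduce states: left-hand side and right-hand-side length of the production.
    private static final char[] REDUCE_LHS = {0, 0, 'S', 0, 0, 0, 'S', 'L', 0, 'L'};
    private static final int[]  REDUCE_LEN = {0, 0, 1,   0, 0, 0, 3,   1,   0, 3};

    // Goto table columns S and L; 0 means "no entry".
    private static final int[] GOTO_S = {0, 4, 0, 7, 0, 0, 0, 0, 9, 0};
    private static final int[] GOTO_L = {0, 0, 0, 5, 0, 0, 0, 0, 0, 0};

    static boolean accepts(String src) {
        String cleaned = src.replaceAll("\\s+", "");
        if (!cleaned.matches("(id|[(),])*")) return false;   // only id ( ) , are tokens
        String input = cleaned.replace("id", "i") + "$";
        Deque<Integer> states = new ArrayDeque<>();
        states.push(1);                                      // start state
        int pos = 0;
        while (true) {
            int state = states.peek();
            if (REDUCE_LHS[state] != 0) {                    // LR(0): reduce without lookahead
                for (int k = 0; k < REDUCE_LEN[state]; k++) states.pop();
                int exposed = states.peek();
                int next = REDUCE_LHS[state] == 'S' ? GOTO_S[exposed] : GOTO_L[exposed];
                if (next == 0) return false;
                states.push(next);
                continue;
            }
            String action = ACTION[state][TERMINALS.indexOf(input.charAt(pos))];
            if (action == null) return false;                // empty cell: syntax error
            if (action.equals("acc")) return true;
            states.push(action.charAt(1) - '0');             // shift: push the target state
            pos++;
        }
    }
}
```

For instance, on "(id,id)" the state stack goes 1, 1 3, 1 3 2, 1 3 7, 1 3 5, 1 3 5 8, and so on until state 4 sees $ and accepts.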
LR(k) Grammars
• LR(k) = Left-to-right scanning, Right-most derivation, k look-ahead symbols
• Main cases: LR(0), LR(1), and some variations (SLR and LALR(1))
• Parsers for LR(0) grammars:
– Determine the actions without any lookahead symbol
Building LR(0) Parsing Tables
• To build the parsing table:
– Define states of the parser
– Build a DFA to describe the transitions
between states
– Use the DFA to build the parsing table
Summary for bottom-up parsing
• LR(k) grammars
– left-to-right scanning
– rightmost derivation
– can determine whether to shift or reduce from the next k symbols
– Can automatically build parsing tables
• Shift-reduce parsers
– Can be built for LR(k) grammars using automated parser generator tools, e.g. CUP, yacc.
Top-down vs. Bottom-up again
Top-down: LL(k), recursive descent
Bottom-up: LR(k), shift-reduce
[Figure: scanned and unscanned portions of the parse tree, top-down vs. bottom-up]
