Top Down Parsing

Top-down parsing
• Main idea:
– Start at the root, grow towards leaves
– Pick a production and try to match input
– May need to backtrack
Top down parsing
• Top-down parsing can be viewed as the problem of
constructing a parse tree for the input string, starting
from the root and creating the nodes of the parse tree
in preorder.
• Equivalently, top-down parsing can be viewed as
finding a leftmost derivation for an input string.
• At each step of a top-down parse, the key problem is
that of determining the production to be applied for a
nonterminal, say A.
• Once an A-production is chosen, the rest of the parsing
process consists of "matching" the terminal symbols in
the production body with the input string.
Top Down Parsing
• Recursive-Descent Parsing (with backtracking)
• Predictive Parsing (special form of RDP)
– Recursive form, using transition diagrams (not in syllabus)
– Non-recursive predictive parsing (LL(1) parsing), using the tabular method
Top Down Parsing
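• A top-down parser cannot handle a left-recursive grammar, because the
procedure for a left-recursive nonterminal calls itself before consuming
any input and therefore recurses forever.
• For example, for the production
S -> SA
the procedure for S is
void S()
{
    S();    /* calls itself again without consuming any input */
    A();
}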
Top Down Parsing
• A recursive-descent parser may backtrack, but a predictive parser never does.
• Hence, for a predictive parser, the grammar must first be left-factored.
• For example
S -> cAd
A -> ab | a
• To construct a parse tree for the string “cad”, we create a tree whose root is labeled S.
• The first input symbol is “c”, so we expand the root with production S->cAd (fig. (a)).
• The leaf labeled “c” matches the first input symbol, so we advance the input
pointer to “a”, the second symbol. Next we expand A using its first
alternative, A->ab (fig. (b)). The leaf “a” matches the second input symbol, so
we advance the input pointer to “d”.
• The third input symbol “d” is compared with the next leaf, labeled “b”. Since b does
not match d, we report failure and go back to A to try its other alternative,
A->a. When going back, we reset the input pointer to position 2.
• With A->a, the leaf “a” matches the second input symbol and the leaf “d” matches
the third, so the parse succeeds.
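The trace above can be sketched as a small backtracking recognizer in C. This is a minimal sketch, not code from the slides: the names `input`, `pos`, `match`, `A`, `S` and `parse` are chosen here for illustration, and positions are 0-based, so "position 2" in the text corresponds to `pos == 1`. The procedure for A saves the input pointer on entry and restores it before trying the next alternative.

```c
#include <assert.h>
#include <stddef.h>

/* Backtracking recognizer for  S -> cAd,  A -> ab | a. */
static const char *input;   /* string being parsed */
static size_t pos;          /* input pointer (0-based index into input) */

/* Consume terminal t if it is the next input symbol; report whether it matched. */
static int match(char t)
{
    if (input[pos] == t) {
        pos++;
        return 1;
    }
    return 0;
}

/* A -> ab | a : try the alternatives in order, restoring the input pointer between tries. */
static int A(void)
{
    size_t saved = pos;               /* where the input pointer was on entry */
    if (match('a') && match('b'))     /* first alternative: A -> ab */
        return 1;
    pos = saved;                      /* backtrack */
    if (match('a'))                   /* second alternative: A -> a */
        return 1;
    pos = saved;
    return 0;                         /* no alternative of A matched */
}

/* S -> cAd */
static int S(void)
{
    return match('c') && A() && match('d');
}

/* Returns 1 if s can be derived from S, 0 otherwise. */
int parse(const char *s)
{
    assert(s != NULL);
    input = s;
    pos = 0;
    return S() && input[pos] == '\0';
}
```

Calling `parse("cad")` follows the trace in the bullets: A -> ab fails on “d”, the pointer is reset, and A -> a succeeds. Note that this sketch only backtracks inside A; it is enough for this grammar, but a fully general backtracking parser would also have to retry A when a later symbol of S fails.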
Ambiguous grammar
• A grammar that produces more than one parse tree for some
sentence is said to be ambiguous.
• Put another way, an ambiguous grammar is one that produces
more than one leftmost derivation or more than one
rightmost derivation for the same sentence.
• For example, consider the following grammar
• For the string “id+id*id”, it generates two parse trees
Grammar problems
• In a leftmost derivation that scans the input from left
to right, grammars of the form A -> A x may cause
endless recursion.
• Such grammars are called left-recursive, and they must
be transformed before a top-down parser can be used.
Left recursive Grammar
• A grammar is left recursive if, for some non-terminal A,
there is a derivation
A =>+ A α
(one or more derivation steps, for some string α)
• There are three types of left recursion:
– direct (A -> A x)
– indirect (A -> B C, B -> A)
– hidden (A -> B A, B -> ε)
Left recursive Grammar
• To eliminate direct left recursion, replace
A -> A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
with
A -> β1 B | β2 B | ... | βn B
B -> α1 B | α2 B | ... | αm B | ε
where B is a new non-terminal and no βi begins with A.
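The replacement rule above can be sketched in C as a function that rewrites the alternatives of a single non-terminal. This is an illustrative sketch under several assumptions that are not in the slides: every grammar symbol is one character, `#` stands for ε in the output, the caller supplies the fresh non-terminal name, and the function name `eliminate_left_recursion` and its parameters are hypothetical. It does not handle the useless production A -> A.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to the NUL-terminated buffer out of capacity outsz (truncating if full). */
static void append(char *out, size_t outsz, const char *s)
{
    size_t used = strlen(out);
    snprintf(out + used, outsz - used, "%s", s);
}

/* Rewrite  nt -> bodies[0] | ... | bodies[n-1]  without immediate left recursion.
 * The result is written to out as text, one non-terminal per line.
 * Returns 1 if the productions were left recursive, 0 if they were copied unchanged. */
int eliminate_left_recursion(char nt, char fresh, const char *bodies[], int n,
                             char *out, size_t outsz)
{
    char head[8];
    char tail[2];
    int i, first = 1, recursive = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    tail[0] = fresh;
    tail[1] = '\0';
    for (i = 0; i < n; i++)
        if (bodies[i][0] == nt)
            recursive = 1;

    /* A -> beta1 B | ... | betan B */
    snprintf(head, sizeof head, "%c->", nt);
    append(out, outsz, head);
    for (i = 0; i < n; i++) {
        if (recursive && bodies[i][0] == nt)
            continue;                        /* an "A alpha" alternative */
        if (!first)
            append(out, outsz, "|");
        append(out, outsz, bodies[i]);
        if (recursive)
            append(out, outsz, tail);
        first = 0;
    }
    append(out, outsz, "\n");
    if (!recursive)
        return 0;

    /* B -> alpha1 B | ... | alpham B | epsilon */
    snprintf(head, sizeof head, "%c->", fresh);
    append(out, outsz, head);
    for (i = 0; i < n; i++) {
        if (bodies[i][0] != nt)
            continue;
        append(out, outsz, bodies[i] + 1);   /* alpha: the body without the leading A */
        append(out, outsz, tail);
        append(out, outsz, "|");
    }
    append(out, outsz, "#\n");
    return 1;
}
```

For example, applying it to L -> L,S | (L) | a with `M` as the fresh non-terminal produces the two lines `L->(L)M|aM` and `M->,SM|#`.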
Example: Eliminating immediate left recursion
Eliminating left recursion
Example: Eliminate Left Recursion
Example: Eliminating left recursion
S-> (L) | a
L-> L,S | S
Let A1 = S and A2 = L
For S:
there is no immediate left recursion in S
For L:
Replace L -> S with L -> (L) | a
So, we will have L -> L,S | (L) | a
Eliminate immediate left recursion from L:
L -> (L)L' | aL'
L' -> ,SL' | ε
Final Grammar
S -> (L) | a
L -> (L)L' | aL'
L' -> ,SL' | ε
Let A1 = A, A2 = B, A3 = C and A4 = D
For A:
there is no immediate left recursion in A
For B:
No production of B starts with A.
And there is no immediate left recursion in B
For C:
Replace C -> A with C -> B | a | CBD
Again replace C -> B with C -> C | b
So, we will have C -> b | a | CBD | c
(Note: C -> C is a useless production, hence it is removed)
Eliminate immediate left recursion from C:
C -> bC' | aC' | cC'
C' -> BDC' | ε
Final Grammar
A -> B | a | CBD
B -> C | b
C -> bC' | aC' | cC'
C' -> BDC' | ε
D -> d
Grammar problems
• Consider S -> if E then S else S | if E then S
– Which of the two productions should we use to expand
non-terminal S when the next token is if?
– We can solve this problem by factoring out the common part of
these rules. This way, we postpone the decision about which
rule to choose until we have more information (namely, whether
there is an else or not).
– This is called left factoring
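After left factoring, the two rules become S -> if E then S S' and S' -> else S | ε, and a single token of lookahead is enough to choose. The C sketch below illustrates this under assumptions that are not in the slides: tokens are encoded as single characters (`i` = if, `t` = then, `e` = else, `b` = a condition E, `a` = any other statement), the alternative S -> a is added so that the recursion has a base case, and the names `eat`, `S_rest` and `accepts` are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Predictive recognizer for the left-factored grammar
 *     S  -> i b t S S' | a
 *     S' -> e S | epsilon                                  */
static const char *in;   /* next unread token */
static int ok;           /* cleared on the first mismatch */

static void S(void);

/* Consume token t, or record an error. */
static void eat(char t)
{
    if (ok && *in == t)
        in++;
    else
        ok = 0;
}

/* S' -> e S | epsilon : the decision is postponed until the token after S is visible. */
static void S_rest(void)
{
    if (ok && *in == 'e') {
        eat('e');
        S();
    }
    /* otherwise S' -> epsilon: consume nothing */
}

/* S -> i b t S S' | a */
static void S(void)
{
    if (!ok)
        return;
    if (*in == 'i') {
        eat('i');
        eat('b');
        eat('t');
        S();
        S_rest();
    } else {
        eat('a');
    }
}

/* Returns 1 if the token string is a statement, 0 otherwise. */
int accepts(const char *tokens)
{
    assert(tokens != NULL);
    in = tokens;
    ok = 1;
    S();
    return ok && *in == '\0';
}
```

With this encoding, `accepts("ibta")` and `accepts("ibtaea")` both succeed without any backtracking, because the parser only has to look for `e` after the common prefix has been consumed.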

Eliminating left factoring
Example: Eliminating left factoring
Recursive Descent Parser (RDP)
• A recursive-descent parsing program consists of a
set of procedures, one for each nonterminal.
• Sometimes there is also one for terminals, to simplify
the procedures of the nonterminals.
• Execution begins with the procedure for the
start symbol, which halts and announces success
if its procedure body scans the entire input string.
Recursive Descent Parser (RDP)
• General recursive-descent may require
backtracking; that is, it may require repeated
scans over the input.
• However, backtracking is rarely needed to
parse programming language constructs, so
backtracking parsers are not seen frequently.
Pseudo code for RDP
A typical procedure for a nonterminal A in a top-down parser:
void A() {
1)     Choose an A-production, A -> X1 X2 ... Xk;
2)     for ( i = 1 to k ) {
3)         if ( Xi is a nonterminal )
4)             call procedure Xi();
5)         else if ( Xi equals the current input symbol a )
6)             advance the input to the next symbol;
7)         else /* an error has occurred */;
       }
}
• The above pseudocode is nondeterministic, since it begins by choosing the
A-production to apply in a manner that is not specified.
• To allow backtracking, the above code needs to be modified:
• First, we cannot choose a unique A-production at line (1), so
we must try each of several productions in some order.
• Then, failure at line (7) is not ultimate failure, but suggests
only that we need to return to line (1) and try another
A-production.
• Only if there are no more A-productions to try do we
declare that an input error has been found.
• In order to try another A-production, we need to be able to
reset the input pointer to where it was when we first
reached line (1).
• Thus, a local variable is needed to store this input pointer
for future use.
Example: Recursive Descent Parser (RDP)
E->TE'
E'->+TE' | ε
T->FT'
T'->*FT' | ε
F->(E) | id
• Five procedures, E(), E1(), T(), T1() and F(), one for each
nonterminal
• One procedure match(char) for terminals
/* Recursive-descent parser for the grammar
   E->TE'
   E'->+TE' | ε
   T->FT'
   T'->*FT' | ε
   F->(E) | id                                   */
#include <stdio.h>
#include <stdlib.h>

FILE *fp;         /* input stream */
int lookahead;    /* current input character; an int so that it can hold EOF */

void E(); void E1(); void T(); void T1(); void F();

/* Consume terminal t, or reject the input. */
void match(char t)
{
    if(lookahead==t)
    {
        lookahead=fgetc(fp);
    }
    else
    {
        printf("\nInvalid string");
        exit(1);
    }
}

/* E->TE' */
void E()
{
    T();
    E1();
}

/* E'->+TE' | ε */
void E1()
{
    if(lookahead=='+')
    {
        match('+');
        T();
        E1();
    }
    /* otherwise E'->ε: consume nothing */
}

/* T->FT' */
void T()
{
    F();
    T1();
}

/* T'->*FT' | ε */
void T1()
{
    if(lookahead=='*')
    {
        match('*');
        F();
        T1();
    }
    /* otherwise T'->ε: consume nothing */
}

/* F->(E) | id */
void F()
{
    if(lookahead=='(')
    {
        match('(');
        E();
        match(')');
    }
    else if(lookahead=='i')
    {
        match('i');
        match('d');
    }
    else
    {
        printf("\nInvalid string");
        exit(1);
    }
}

/* main() is not shown on the slides; this version is an assumption that
   reads the string from standard input. */
int main(void)
{
    fp=stdin;
    lookahead=fgetc(fp);
    E();
    if(lookahead=='\n' || lookahead==EOF)
        printf("\nValid string");
    else
        printf("\nInvalid string");
    return 0;
}
Recursive-Descent Parsing
• Main challenges:
1. Backtracking is messy, difficult and inefficient (solution: use
input “lookahead” to help make the right choice)
2. More than one alternative: even with one lookahead input
char, more than one rule may still apply, e.g. A -> ab | a
(solution: rewrite the grammar by left-factoring)
3. Left recursion might cause an infinite loop:
what is the procedure for E -> E + E ?
(solution: rewrite the grammar by eliminating left
recursion)
4. Error handling: errors are detected “far away” from their actual
source.
LL(1) Parser
• If we have two productions, A -> α | β, we want a distinct way of choosing the correct
one.
• If α is any string of grammar symbols, FIRST(α) is the set of terminals that
begin the strings derived from α. If α =>* ε then ε is also in FIRST(α).
• Define:
for a grammar G, x ∈ FIRST(α) iff α =>* x γ, for some string γ
• If FIRST(α) and FIRST(β) contain no common symbols, we will know whether we
should choose A -> α or A -> β by looking at the lookahead symbol.
Compute FIRST(X)
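One common way to compute FIRST sets is to apply the rules to every production repeatedly until no set grows. The C sketch below does this for the expression grammar E->TE', E'->+TE' | ε, T->FT', T'->*FT' | ε, F->(E) | id. The encoding is an assumption made for this sketch only: `X` stands for E', `Y` for T', `i` for id, uppercase letters are nonterminals, and an empty body is an ε-production. The names `heads`, `bodies`, `first`, `nullable` and `compute_first` are illustrative.

```c
#include <assert.h>
#include <string.h>

#define NPROD 8

/* E->TE'  E'->+TE'|eps  T->FT'  T'->*FT'|eps  F->(E)|id
 * Encoding assumed for this sketch: X = E', Y = T', i = id, "" = eps. */
static const char heads[NPROD]   = { 'E',  'X',   'X', 'T',  'Y',   'Y', 'F',   'F' };
static const char *bodies[NPROD] = { "TX", "+TX", "",  "FY", "*FY", "",  "(E)", "i" };

static char first[128][16];   /* first[A]: the terminals of FIRST(A), stored as a string */
static int nullable[128];     /* nullable[A] != 0  iff  eps is in FIRST(A) */

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* Add terminal t to FIRST(nt); return 1 if the set grew. */
static int add(char nt, char t)
{
    char *set = first[(unsigned char)nt];
    size_t len = strlen(set);
    if (strchr(set, t) != NULL)
        return 0;
    assert(len + 1 < sizeof first[0]);
    set[len] = t;
    set[len + 1] = '\0';
    return 1;
}

/* Repeat the FIRST rules over all productions until nothing changes. */
void compute_first(void)
{
    int changed = 1, p;
    const char *s, *f;

    memset(first, 0, sizeof first);
    memset(nullable, 0, sizeof nullable);
    while (changed) {
        changed = 0;
        for (p = 0; p < NPROD; p++) {
            for (s = bodies[p]; *s != '\0'; s++) {
                if (!is_nonterminal(*s)) {          /* a terminal begins the body */
                    changed |= add(heads[p], *s);
                    break;
                }
                for (f = first[(unsigned char)*s]; *f != '\0'; f++)
                    changed |= add(heads[p], *f);   /* FIRST of the nonterminal, minus eps */
                if (!nullable[(unsigned char)*s])
                    break;                          /* this symbol cannot vanish */
            }
            if (*s == '\0' && !nullable[(unsigned char)heads[p]]) {
                nullable[(unsigned char)heads[p]] = 1;   /* the whole body can derive eps */
                changed = 1;
            }
        }
    }
}
```

After `compute_first()`, `first['E']`, `first['T']` and `first['F']` each hold `(` and `i`, `first['X']` holds `+`, `first['Y']` holds `*`, and only `X` and `Y` are nullable.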
Example: First Set
Grammar 1:
S -> aSe
S -> B
B -> bBe
B -> C
C -> cCe
C -> d
Grammar 2:
S -> aBDh
B -> cC
C -> bC | ε
D -> EF
E -> g | ε
F -> f | ε
Example: First Set
S -> ACB | cbB | Ba
A -> da | BC
B -> g | ε
C -> h | ε
Predictive parsing
• What if we have a "candidate" production A -> α where α = ε or α =>* ε?
• We could expand if we knew that there is some sentential form where
the current input symbol appears after A.
• Define:
for A ∈ N, x ∈ FOLLOW(A) iff S =>* α A x β, for some strings α and β
Compute FOLLOW(A)
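FOLLOW sets can be computed with the same kind of iteration: put $ in FOLLOW of the start symbol, then for every production A -> αBβ add FIRST(β) without ε to FOLLOW(B), and add FOLLOW(A) to FOLLOW(B) whenever β is empty or can derive ε. The C sketch below applies this to the expression grammar E->TE', E'->+TE' | ε, T->FT', T'->*FT' | ε, F->(E) | id. It assumes the same single-character encoding as an illustration (`X` = E', `Y` = T', `i` = id) and takes the FIRST sets as already known, hard-coded in the hypothetical helpers `first_of` and `is_nullable`.

```c
#include <assert.h>
#include <string.h>

#define NPROD 8

/* Encoding assumed for this sketch: X = E', Y = T', i = id, "" = eps. */
static const char heads[NPROD]   = { 'E',  'X',   'X', 'T',  'Y',   'Y', 'F',   'F' };
static const char *bodies[NPROD] = { "TX", "+TX", "",  "FY", "*FY", "",  "(E)", "i" };

static char follow[128][16];  /* follow[A]: the terminals of FOLLOW(A), stored as a string */

static int is_nonterminal(char c) { return c >= 'A' && c <= 'Z'; }

/* FIRST sets (without eps) of the nonterminals, taken as given. */
static const char *first_of(char nt)
{
    switch (nt) {
    case 'E': case 'T': case 'F': return "(i";
    case 'X': return "+";
    case 'Y': return "*";
    default:  return "";
    }
}

/* Nonterminals whose FIRST set contains eps. */
static int is_nullable(char nt) { return nt == 'X' || nt == 'Y'; }

/* Add terminal t to FOLLOW(nt); return 1 if the set grew. */
static int add(char nt, char t)
{
    char *set = follow[(unsigned char)nt];
    size_t len = strlen(set);
    if (strchr(set, t) != NULL)
        return 0;
    assert(len + 1 < sizeof follow[0]);
    set[len] = t;
    set[len + 1] = '\0';
    return 1;
}

void compute_follow(char start)
{
    int changed = 1, p;
    const char *b, *r, *f;

    memset(follow, 0, sizeof follow);
    add(start, '$');                              /* $ follows the start symbol */
    while (changed) {
        changed = 0;
        for (p = 0; p < NPROD; p++) {
            for (b = bodies[p]; *b != '\0'; b++) {
                if (!is_nonterminal(*b))
                    continue;
                /* A -> alpha B beta: scan beta, which starts at b + 1 */
                for (r = b + 1; *r != '\0'; r++) {
                    if (!is_nonterminal(*r)) {
                        changed |= add(*b, *r);
                        break;
                    }
                    for (f = first_of(*r); *f != '\0'; f++)
                        changed |= add(*b, *f);
                    if (!is_nullable(*r))
                        break;
                }
                if (*r == '\0')                   /* beta is empty or derives eps */
                    for (f = follow[(unsigned char)heads[p]]; *f != '\0'; f++)
                        changed |= add(*b, *f);
            }
        }
    }
}
```

After `compute_follow('E')`, FOLLOW(E) and FOLLOW(E') contain `$` and `)`, FOLLOW(T) and FOLLOW(T') contain `$`, `+` and `)`, and FOLLOW(F) contains `$`, `+`, `*` and `)`.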
Construction of LL(1) Parsing Tables
Steps to Build LL(1) parser
1. Remove ambiguity from the grammar
2. Remove left recursion from the grammar
3. Left-factor the grammar
4. Calculate the FIRST and FOLLOW sets for each
non-terminal
5. Build the parsing table
6. [optional] If a string is given, trace the parse of the string
Example: Build LL(1) parser for the
grammar E-> E+E | E*E | (E) | id
Step 1: Remove ambiguity from the grammar
The equivalent unambiguous grammar is
as follows:
E-> E+T | T
T-> T*F | F
F-> (E) | id
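Step 2: Eliminate left recursion from the
grammar
E-> E+T | T
T-> T*F | F
F-> (E) | id
After eliminating the immediate left recursion in E and T:
E-> TE'
E'-> +TE' | ε
T-> FT'
T'-> *FT' | ε
F-> (E) | id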
Step 3: Left-factor the grammar
No two alternatives of the same non-terminal share a common
prefix, so no left factoring is needed
Step 4: Calculate the FIRST and FOLLOW sets for each
non-terminal
      FIRST       FOLLOW
E     {(, id}     {$, )}
E'    {+, ε}      {$, )}
T     {(, id}     {$, +, )}
T'    {*, ε}      {$, +, )}
F     {(, id}     {$, +, *, )}
Step 5: Build Parsing Table
Non-       INPUT SYMBOL
terminal   id         +          *          (          )          $
E          E->TE'                           E->TE'
E'                    E'->+TE'                         E'->ε      E'->ε
T          T->FT'                           T->FT'
T'                    T'->ε      T'->*FT'              T'->ε      T'->ε
F          F->id                            F->(E)
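Step 6 (tracing a string) can be sketched as a non-recursive predictive parser that is driven by the table above and an explicit stack. This C sketch is illustrative rather than part of the slides: it assumes the single-character encoding `X` = E', `Y` = T', `i` = id, so the input id+id*id is written `i+i*i`, and the names `table_entry` and `ll1_parse` are hypothetical. An empty string is an ε entry and `NULL` is a blank (error) entry.

```c
#include <assert.h>
#include <string.h>

/* Parsing table M[nt, a]: returns the body to expand, "" for eps, NULL for an error entry. */
static const char *table_entry(char nt, char a)
{
    switch (nt) {
    case 'E': return (a == 'i' || a == '(') ? "TX" : NULL;
    case 'X': return a == '+' ? "+TX" : (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "FY" : NULL;
    case 'Y': return a == '*' ? "*FY"
                   : (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;
    default:  return NULL;
    }
}

/* Returns 1 if tokens is accepted, 0 otherwise. */
int ll1_parse(const char *tokens)
{
    char stack[256];
    size_t top = 0, ip = 0, n, k;

    assert(tokens != NULL);
    n = strlen(tokens);
    stack[top++] = '$';                       /* bottom-of-stack marker */
    stack[top++] = 'E';                       /* start symbol */
    while (top > 0) {
        char sym = stack[--top];              /* pop the top grammar symbol */
        char a = ip < n ? tokens[ip] : '$';   /* current input symbol, $ at the end */
        if (sym == '$')
            return ip == n;                   /* accept only if all input is consumed */
        if (sym >= 'A' && sym <= 'Z') {       /* nonterminal: consult the table */
            const char *body = table_entry(sym, a);
            if (body == NULL)
                return 0;                     /* blank entry: syntax error */
            k = strlen(body);
            if (top + k > sizeof stack)
                return 0;                     /* stack overflow guard */
            while (k > 0)
                stack[top++] = body[--k];     /* push the body in reverse order */
        } else {                              /* terminal: it must match the input */
            if (sym != a)
                return 0;
            ip++;
        }
    }
    return 0;
}
```

For instance, `ll1_parse("i+i*i")` accepts, while `ll1_parse("i+")` stops at the blank entry M[T, $].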
Bottom-Up Parsing
• Also known as shift-reduce parsing
• Attempts to construct a parse tree for an input string beginning at
the leaves (bottom) and working up towards the root (top).
• “Reducing” a string w to the start symbol of a grammar.
• At each step, decide on some substring that matches the RHS of
some production
• Replace this string by the LHS (called reduction).
• If the substring is chosen correctly at each step, it is the trace of a
rightmost derivation in reverse.
Example
• Grammar:
S -> aABe
A -> Abc | b
B -> d
• Reduction:
abbcde      A -> b
aAbcde      A -> Abc
aAde        B -> d
aABe        S -> aABe
S
How do we know which substring is to be replaced at each reduction step?
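The reduction sequence above can be replayed with a small C helper that replaces a substring by a non-terminal. This is only a sketch: the function name `reduce` is hypothetical, and it always picks the leftmost occurrence of the right-hand side, which happens to be the correct choice at every step of this example. It does not answer the question in general; at aAbcde, for instance, reducing the remaining b to A would be a wrong choice, which is why a real shift-reduce parser needs a rule for selecting the substring.

```c
#include <assert.h>
#include <string.h>

/* Replace the leftmost occurrence of rhs in s by the single symbol lhs.
 * Returns 1 if a reduction was made, 0 if rhs does not occur in s. */
int reduce(char *s, const char *rhs, char lhs)
{
    char *at;
    size_t len;

    assert(s != NULL && rhs != NULL && rhs[0] != '\0');
    at = strstr(s, rhs);
    if (at == NULL)
        return 0;
    len = strlen(rhs);
    *at = lhs;                                          /* the LHS takes the place of the RHS */
    memmove(at + 1, at + len, strlen(at + len) + 1);    /* close the gap, including the NUL */
    return 1;
}
```

Starting from a writable buffer holding `abbcde`, the calls `reduce(s, "b", 'A')`, `reduce(s, "Abc", 'A')`, `reduce(s, "d", 'B')` and `reduce(s, "aABe", 'S')` produce aAbcde, aAde, aABe and S in turn.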
Unambiguous grammar
• Arithmetic expressions consisting of the symbols id,
+, -, *, /, (, ), and ^, where ^ represents
exponentiation.
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> F ^ P | P
P -> (E) | id
