• Main idea:
Top down parsing
• Top-down parsing can be viewed as the problem of
constructing a parse tree for the input string, starting
from the root and creating the nodes of the parse tree
in preorder.
• Equivalently, top-down parsing can be viewed as
finding a leftmost derivation for an input string.
• At each step of a top-down parse, the key problem is
that of determining the production to be applied for a
nonterminal, say A.
• Once an A-production is chosen, the rest of the parsing
process consists of "matching" the terminal symbols in
the production body with the input string.
Top Down Parsing
Recursive-Descent Parsing (with backtracking)
Recursive Form
Top Down Parsing
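• A top-down parser cannot handle a left-recursive
grammar, because such a grammar sends the parser
into an infinite loop.
• For example, the production
S -> S A
gives a procedure that calls itself before consuming any input: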
void S()
{
S();
A();
}
Top Down Parsing
• A recursive-descent parser may use backtracking, but a predictive parser does not.
• Hence, for a predictive parser, we have to left-factor the grammar (remove common prefixes).
• For example
S -> cAd
A -> ab | a
• To construct a parse tree for the string "cad", we create a tree rooted at a node labeled S.
• The first symbol of the string is "c", so we expand the tree with the production S -> cAd (fig. (a)).
• The leaf labeled "c" matches the first character of the input string, so we
advance the input pointer to "a". Next we expand A using its first
alternative, A -> ab (fig. (b)). The leaf labeled "a" matches the second character
of the input string, so we advance the input pointer to "d".
• The third input symbol "d" is compared with the next leaf, labeled "b". Since b does not
match d, we report failure and go back to A to try its other alternative,
A -> a. When going back, we reset the input pointer to position 2.
• With A -> a, the leaf "a" matches the second input symbol and the leaf "d"
matches the third, so the parse succeeds.
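The walk-through above can be sketched in C as follows. This is a minimal illustration, not the slides' own code: the names `parse`, `S`, `A_alt` and `match` are hypothetical, and the input is assumed to be a NUL-terminated string rather than a file. The point to notice is the saved input position, which is restored before each alternative of A is tried.

```c
#include <assert.h>
#include <stddef.h>

/* Grammar: S -> c A d,  A -> a b | a  (all helper names are hypothetical) */
static const char *input;  /* string being parsed */
static size_t pos;         /* input pointer */

/* Consume one terminal if it equals the current input symbol. */
static int match(char t) {
    if (input[pos] == t) { pos++; return 1; }
    return 0;
}

/* Try the k-th alternative of A (k = 0: A -> ab, k = 1: A -> a). */
static int A_alt(int k) {
    if (k == 0) return match('a') && match('b');
    return match('a');
}

/* S -> c A d, trying the alternatives of A in order. */
static int S(void) {
    if (!match('c')) return 0;
    size_t saved = pos;              /* where the expansion of A started */
    for (int k = 0; k < 2; k++) {
        pos = saved;                 /* backtrack: reset the input pointer */
        if (A_alt(k) && match('d')) return 1;
    }
    return 0;
}

/* Returns 1 if the whole string can be derived from S, otherwise 0. */
int parse(const char *s) {
    assert(s != NULL);
    input = s;
    pos = 0;
    return S() && input[pos] == '\0';
}
```

For the string "cad", the first alternative A -> ab fails at "d", the pointer is reset, and A -> a then succeeds, mirroring the steps described in the bullets.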
Ambiguous grammar
• A grammar that produces more than one parse tree for some
sentence is said to be ambiguous.
• Equivalently, an ambiguous grammar is one that produces
more than one leftmost derivation or more than one
rightmost derivation for the same sentence.
• For example, consider the following grammar
Grammar problems
• In a leftmost derivation that scans the input from left
to right, a left-recursive grammar may lead to endless recursion.
Left recursive Grammar
• A grammar is left recursive if, for a non-terminal A,
there is a derivation
A =>+ A α
for some string α. The left recursion may be
– direct (A -> A x)
– indirect (A -> B C, B -> A)
– hidden (A -> B A, B -> ε)
Left recursive Grammar
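• To eliminate direct left recursion replace
A -> A α1 | ... | A αm | β1 | ... | βn     (no βi begins with A)
with
A -> β1 A' | ... | βn A'
A' -> α1 A' | ... | αm A' | ε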
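As a rough sketch of this transformation, the C function below splits the alternatives of one nonterminal into those that begin with the nonterminal itself and those that do not, then prints the two replacement productions. The function name `eliminate_left_recursion`, the one-character-per-symbol encoding and the textual output format are all assumptions made for illustration; the sketch also assumes there is no useless alternative A -> A and at least one non-recursive alternative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites  A -> A a1 | ... | b1 | ...  as
 *           A  -> b1 A' | ...
 *           A' -> a1 A' | ... | ε
 * alts holds the n alternatives of A, one character per grammar symbol.
 * The new productions are written to out as text, one per line.
 * Returns 1 if A was left recursive, 0 otherwise (out is then empty). */
int eliminate_left_recursion(char A, const char *alts[], int n,
                             char *out, size_t outsz) {
    char head[256] = "", tail[256] = "";
    int recursive = 0;
    for (int i = 0; i < n; i++) {
        char piece[64];
        if (alts[i][0] == A) {      /* A -> A alpha: alpha A' goes to A' */
            snprintf(piece, sizeof piece, "%s%s%c'",
                     tail[0] ? " | " : "", alts[i] + 1, A);
            strncat(tail, piece, sizeof tail - strlen(tail) - 1);
            recursive = 1;
        } else {                    /* A -> beta: becomes beta A' */
            snprintf(piece, sizeof piece, "%s%s%c'",
                     head[0] ? " | " : "", alts[i], A);
            strncat(head, piece, sizeof head - strlen(head) - 1);
        }
    }
    if (!recursive) {
        if (outsz > 0) out[0] = '\0';
        return 0;
    }
    snprintf(out, outsz, "%c -> %s\n%c' -> %s | ε\n", A, head, A, tail);
    return 1;
}
```

Calling it with the alternatives "E+T" and "T" for E yields the two lines E -> TE' and E' -> +TE' | ε.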
Example: Eliminating immediate left
recursion
Eliminating left recursion
Example: Eliminate Left Recursion
Example: Eliminating left recursion
S-> (L) | a
L-> L,S | S
Example: Eliminating left recursion
S-> (L) | a
L-> L,S | S
Let A1= S and A2=L
For S:
there is no immediate left recursion in S
For L:
Replace L-> S with L-> (L) | a
So, we will have L-> L,S | (L) | a
Eliminate immediate left recursion from L:
L -> (L)L’ | aL’
L’-> ,SL’ | ε
Final Grammar
S -> (L) | a
L -> (L)L’ | aL’
L’-> ,SL’ | ε
Example: Eliminating left recursion
Let A1= A , A2=B , A3 = C and A4= D
For A:
there is no immediate left recursion in A
For B:
No production of B starts with A.
And there is no immediate left recursion in B
For C:
Replace C-> A with C-> B | a | CBD
Again replace C-> B with C-> C | b
So, we will have C-> b | a | CBD | c
(Note: C-> C is a useless production, hence it is removed)
Eliminate immediate left recursion from C:
C-> bC’ | aC’ | cC’
C’-> BDC’ | ε
Final Grammar
A-> B | a | CBD
B-> C | b
C-> bC’ | aC’ | cC’
C’-> BDC’ | ε
D-> d
Grammar problems
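• Consider S -> if E then S else S | if E then S
– Which of the two productions should we use to expand the non-terminal S
when the next token is "if"?
– Both alternatives begin with the same prefix, so we factor the common part out of
these rules. This way, we are postponing the decision about which
production to use until enough of the input has been seen.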
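A small C sketch of left factoring for two alternatives is given below. It is only an illustration under stated assumptions: the names `common_prefix` and `left_factor` are hypothetical, each grammar symbol is encoded as one character (so the conditional is abbreviated as iEtSeS | iEtS, with i = if, t = then, e = else), and only a single pair of alternatives is handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length of the longest common prefix of two alternatives. */
size_t common_prefix(const char *x, const char *y) {
    size_t k = 0;
    while (x[k] != '\0' && x[k] == y[k]) k++;
    return k;
}

/* Left-factors  A -> x | y  into
 *     A  -> p A'
 *     A' -> x' | y'
 * where p is the common prefix and x', y' are the remainders
 * (an empty remainder is written as ε).
 * Returns 0 and leaves out empty when there is no common prefix. */
int left_factor(char A, const char *x, const char *y,
                char *out, size_t outsz) {
    size_t k = common_prefix(x, y);
    if (k == 0) {
        if (outsz > 0) out[0] = '\0';
        return 0;
    }
    const char *rx = x[k] ? x + k : "ε";   /* remainder of the first alternative */
    const char *ry = y[k] ? y + k : "ε";   /* remainder of the second alternative */
    snprintf(out, outsz, "%c -> %.*s%c'\n%c' -> %s | %s\n",
             A, (int)k, x, A, A, rx, ry);
    return 1;
}
```

For the abbreviated conditional this gives S -> iEtSS' and S' -> eS | ε, so the choice between the two original productions is deferred until the parser sees whether an "else" follows.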
Example: Eliminating left factoring
Recursive Descent Parser (RDP)
• A recursive-descent parsing program consists of a
set of procedures, one for each nonterminal.
• Sometimes there is also a procedure for terminals, which
simplifies the nonterminals' procedures.
• Execution begins with the procedure for the
start symbol, which halts and announces success
if its procedure body scans the entire input string.
Recursive Descent Parser (RDP)
• General recursive-descent may require
backtracking; that is, it may require repeated
scans over the input.
• However, backtracking is rarely needed to
parse programming language constructs, so
backtracking parsers are not seen frequently.
Pseudo code for RDP
• First, we cannot choose a unique A-production at line (1), so
we must try each of several productions in some order.
• Then, failure at line (7) is not ultimate failure, but suggests
only that we need to return to line (1) and try another A-
production.
• Only if there are no more A-productions to try do we
declare that an input error has been found.
• In order to try another A-production, we need to be able to
reset the input pointer to where it was when we first
reached line (1).
• Thus, a local variable is needed to store this input pointer
for future use.
Example: Recursive Descent Parser (RDP)
E->TE'
E'->+TE' | ^
T->FT'
T'->*FT' | ^
F->(E) | id
• Five procedures, E(), E1(), T(), T1() and F(), one for each
nonterminal (E1 and T1 stand for E' and T'; ^ denotes the empty string)
• One procedure match(char) for terminals
/* Recursive-descent parser for the grammar
 *   E  -> T E'
 *   E' -> + T E' | ^
 *   T  -> F T'
 *   T' -> * F T' | ^
 *   F  -> ( E ) | id
 */
#include <stdio.h>
#include <stdlib.h>

FILE *fp;          /* input source */
int lookahead;     /* current input character (int, so that EOF fits) */

void E(void); void E1(void); void T(void); void T1(void); void F(void);

/* Consume terminal t, or report an error and stop. */
void match(char t)
{
    if (lookahead == t)
        lookahead = fgetc(fp);
    else {
        printf("\nInvalid string");
        exit(1);
    }
}

/* E -> T E' */
void E(void)
{
    T();
    E1();
}

/* E' -> + T E' | ^   (doing nothing corresponds to the empty alternative) */
void E1(void)
{
    if (lookahead == '+') {
        match('+');
        T();
        E1();
    }
}

/* T -> F T' */
void T(void)
{
    F();
    T1();
}

/* T' -> * F T' | ^ */
void T1(void)
{
    if (lookahead == '*') {
        match('*');
        F();
        T1();
    }
}

/* F -> ( E ) | id   (id is read as the two characters 'i' and 'd') */
void F(void)
{
    if (lookahead == '(') {
        match('(');
        E();
        match(')');
    } else if (lookahead == 'i') {
        match('i');
        match('d');
    } else {
        printf("\nInvalid string");
        exit(1);
    }
}

/* Driver (an assumption, not part of the slides): parses one line from stdin. */
int main(void)
{
    fp = stdin;
    lookahead = fgetc(fp);
    E();
    if (lookahead == '\n' || lookahead == EOF)
        printf("\nValid string");
    else
        printf("\nInvalid string");
    return 0;
}
Recursive-Descent Parsing
• Main challenges:
1. Backtracking is messy, difficult and inefficient (solution: use
input "lookahead" to help make the right choice).
2. More alternatives: even with one lookahead input character,
there may still be more than one rule to choose from, e.g. A -> ab | a
(solution: rewrite the grammar by left-factoring).
3. Left recursion might cause an infinite loop:
what is the procedure for E -> E + E ?
(solution: rewrite the grammar by eliminating left
recursion).
4. Error handling: errors are detected "far away" from the actual
source.
LL(1) Parser
If we have two productions A -> α | β, we want a distinct way of choosing the correct
one.
If α is any string of grammar symbols, FIRST(α) is the set of terminals that
begin the strings derived from α. If α =>* ε, then ε is also in FIRST(α).
Define:
for a string α of symbols of G, x ∈ FIRST(α) iff α =>* x γ for some string γ
If FIRST(α) and FIRST(β) contain no common symbols, we will know whether we
should choose A -> α or A -> β by looking at the lookahead symbol.
Compute FIRST(X)
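One common way to compute FIRST sets is a fixed-point iteration: keep applying the definition to every production until no set grows. The C sketch below does this for the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id. The encoding is an assumption of this sketch: X stands for E', Y for T', i for id, an empty body is an ε-production, and '#' marks ε inside a set. The names `compute_first` and `first_of` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NPROD 8
#define NNT   5

static const char *NT = "EXTYF";            /* X = E', Y = T' */
static const struct { char head; const char *body; } prod[NPROD] = {
    {'E', "TX"}, {'X', "+TX"}, {'X', ""}, {'T', "FY"},
    {'Y', "*FY"}, {'Y', ""},   {'F', "(E)"}, {'F', "i"}
};
static char first[NNT][16];                 /* FIRST sets; '#' marks ε */

/* Index of a nonterminal in NT, or -1 for a terminal. */
static int nt_index(char c) {
    const char *p = c ? strchr(NT, c) : NULL;
    return p ? (int)(p - NT) : -1;
}

/* Add c to set; returns 1 if the set grew. */
static int add(char *set, char c) {
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

/* Repeat over all productions until no FIRST set changes. */
void compute_first(void) {
    int changed = 1;
    for (int i = 0; i < NNT; i++) first[i][0] = '\0';
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            char *set = first[nt_index(prod[p].head)];
            int nullable = 1;               /* symbols so far all derive ε */
            for (const char *s = prod[p].body; *s && nullable; s++) {
                int j = nt_index(*s);
                if (j < 0) {                /* terminal: it begins the string */
                    changed |= add(set, *s);
                    nullable = 0;
                } else {                    /* nonterminal: copy FIRST minus ε */
                    for (const char *t = first[j]; *t; t++)
                        if (*t != '#') changed |= add(set, *t);
                    nullable = strchr(first[j], '#') != NULL;
                }
            }
            if (nullable) changed |= add(set, '#');
        }
    }
}

/* FIRST set of a nonterminal as a string of terminals ('#' = ε). */
const char *first_of(char nonterminal) {
    int i = nt_index(nonterminal);
    assert(i >= 0);
    return first[i];
}
```

After `compute_first()`, `first_of('E')` contains ( and i, and `first_of('X')` contains + and the ε marker, which agrees with the FIRST sets used in the LL(1) example for this grammar.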
Example: First Set
Grammar 1:
S -> aSe | B
B -> bBe | C
C -> cCe | d
Grammar 2:
S -> aBDh
B -> cC
C -> bC | ε
D -> EF
E -> g | ε
F -> f | ε
Example: First Set
S -> ACB | cbB | Ba
A -> da | BC
B -> g | ε
C -> h | ε
Predictive parsing
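What if we have a "candidate" production A -> α where α = ε or α =>* ε?
Define:
FOLLOW(A) is the set of terminals that can appear immediately to the right of A
in some sentential form; that is, a ∈ FOLLOW(A) iff S =>* β A a γ for some strings
β and γ. If A can be the rightmost symbol of a sentential form, the end marker $
is also in FOLLOW(A).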
Compute FOLLOW(A)
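FOLLOW sets can be computed by the same kind of fixed-point iteration. The sketch below works on the expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id and, to stay short, takes the FIRST sets of the nonterminals as given constants. As before, the one-character encoding (X = E', Y = T', i = id, '#' = ε) and the names `compute_follow` and `follow_of` are assumptions of this illustration.

```c
#include <assert.h>
#include <string.h>

#define NPROD 8
#define NNT   5

static const char *NT = "EXTYF";                       /* X = E', Y = T' */
static const char *FIRST[NNT] = { "(i", "+#", "(i", "*#", "(i" };
static const struct { char head; const char *body; } prod[NPROD] = {
    {'E', "TX"}, {'X', "+TX"}, {'X', ""}, {'T', "FY"},
    {'Y', "*FY"}, {'Y', ""},   {'F', "(E)"}, {'F', "i"}
};
static char follow[NNT][16];

/* Index of a nonterminal in NT, or -1 for a terminal. */
static int nt_index(char c) {
    const char *p = c ? strchr(NT, c) : NULL;
    return p ? (int)(p - NT) : -1;
}

/* Add c to set; returns 1 if the set grew. */
static int add(char *set, char c) {
    if (strchr(set, c)) return 0;
    size_t n = strlen(set);
    set[n] = c;
    set[n + 1] = '\0';
    return 1;
}

void compute_follow(void) {
    int changed = 1;
    for (int i = 0; i < NNT; i++) follow[i][0] = '\0';
    add(follow[0], '$');                    /* $ follows the start symbol E */
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int a = nt_index(prod[p].head);
            const char *body = prod[p].body;
            for (size_t k = 0; body[k]; k++) {
                int b = nt_index(body[k]);
                if (b < 0) continue;        /* FOLLOW is only for nonterminals */
                int nullable = 1;           /* rest of the body derives ε so far */
                for (size_t m = k + 1; body[m] && nullable; m++) {
                    int j = nt_index(body[m]);
                    if (j < 0) {            /* a terminal follows directly */
                        changed |= add(follow[b], body[m]);
                        nullable = 0;
                    } else {                /* FIRST of what follows, minus ε */
                        for (const char *t = FIRST[j]; *t; t++)
                            if (*t != '#') changed |= add(follow[b], *t);
                        nullable = strchr(FIRST[j], '#') != NULL;
                    }
                }
                if (nullable)               /* rest vanishes: inherit FOLLOW(head) */
                    for (const char *t = follow[a]; *t; t++)
                        changed |= add(follow[b], *t);
            }
        }
    }
}

/* FOLLOW set of a nonterminal as a string of terminals. */
const char *follow_of(char nonterminal) {
    int i = nt_index(nonterminal);
    assert(i >= 0);
    return follow[i];
}
```

After `compute_follow()`, FOLLOW(E) and FOLLOW(E') are {$, )}, FOLLOW(T) and FOLLOW(T') are {$, +, )}, and FOLLOW(F) is {$, +, *, )}.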
Construction of LL(1) Parsing Tables
Steps to Build LL(1) parser
Example grammar: E-> E+E | E*E | (E) | id
1. Remove ambiguity from the grammar
2. Remove left recursion from the grammar
3. Left-factor the grammar
4. Calculate the FIRST and FOLLOW sets of each non-terminal
5. Build the parsing table
6. [optional] If an input string is given, trace the parse of the string
Example: Build LL(1) parser for the
grammar E-> E+E | E*E | (E) | id
Step 1: Remove ambiguity in the grammar
The equivalent unambiguous grammar is
as follows:
E-> E+T | T
T-> T*F | F
F-> (E) | id
Example: Build LL(1) parser for the
grammar E-> E+E | E*E | (E) | id
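Step 2: Eliminate left recursion from the
grammar
E-> E+T | T
T-> T*F | F
F-> (E) | id
After eliminating the immediate left recursion in E and T:
E-> TE’
E’-> +TE’ | ε
T-> FT’
T’-> *FT’ | ε
F-> (E) | id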
Example: Build LL(1) parser for the
grammar E-> E+E | E*E | (E) | id
Step 3: Left-factor the grammar
No two alternatives of the same non-terminal share a
common prefix, so no left factoring is needed.
Example: Build LL(1) parser for the
grammar E-> E+E | E*E | (E) | id
Step 4: Calculate the FIRST set and FOLLOW set of each
non-terminal
      FIRST        FOLLOW
E     { (, id }    { $, ) }
E’    { +, ε }     { $, ) }
T     { (, id }    { $, +, ) }
T’    { *, ε }     { $, +, ) }
F     { (, id }    { $, +, *, ) }
Step 5: Build Parsing Table
      FIRST        FOLLOW
E     { (, id }    { $, ) }
E’    { +, ε }     { $, ) }
T     { (, id }    { $, +, ) }
T’    { *, ε }     { $, +, ) }
F     { (, id }    { $, +, *, ) }
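The table itself can be filled from these sets with the usual rule: for each production A -> α, enter it under every terminal in FIRST(α), and, if α can derive ε, also under every symbol in FOLLOW(A). The C sketch below applies that rule to the grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id and then uses the table in a stack-based parse. The one-character encoding (X = E', Y = T', i = id, '#' = ε) and the names `build_table`, `table_entry` and `ll1_parse` are assumptions of this sketch; the FIRST and FOLLOW sets are the ones listed in the table above.

```c
#include <assert.h>
#include <string.h>

static const char *NT = "EXTYF", *TERM = "+*()i$";    /* X = E', Y = T' */
static const char *FIRST[5]  = { "(i", "+#", "(i", "*#", "(i" };
static const char *FOLLOW[5] = { "$)", "$)", "+$)", "+$)", "*+$)" };
static const struct { char head; const char *body; } prod[8] = {
    {'E', "TX"}, {'X', "+TX"}, {'X', ""}, {'T', "FY"},
    {'Y', "*FY"}, {'Y', ""},   {'F', "(E)"}, {'F', "i"}
};
static int table[5][6];      /* production number, or -1 for an error entry */

static int idx(const char *set, char c) {
    const char *p = c ? strchr(set, c) : NULL;
    return p ? (int)(p - set) : -1;
}

void build_table(void) {
    for (int r = 0; r < 5; r++)
        for (int c = 0; c < 6; c++) table[r][c] = -1;
    for (int p = 0; p < 8; p++) {
        int A = idx(NT, prod[p].head), nullable = 1;
        for (const char *s = prod[p].body; *s && nullable; s++) {
            int j = idx(NT, *s);
            if (j < 0) {                      /* terminal begins the body */
                table[A][idx(TERM, *s)] = p;
                nullable = 0;
            } else {                          /* FIRST of the nonterminal, minus ε */
                for (const char *t = FIRST[j]; *t; t++)
                    if (*t != '#') table[A][idx(TERM, *t)] = p;
                nullable = strchr(FIRST[j], '#') != NULL;
            }
        }
        if (nullable)                         /* body derives ε: use FOLLOW(A) */
            for (const char *t = FOLLOW[A]; *t; t++)
                table[A][idx(TERM, *t)] = p;
    }
}

/* Table entry M[nonterminal, terminal]; -1 means error. */
int table_entry(char nonterminal, char terminal) {
    int r = idx(NT, nonterminal), c = idx(TERM, terminal);
    return (r < 0 || c < 0) ? -1 : table[r][c];
}

/* Table-driven parse of a token string such as "i+i*i"; 1 on success. */
int ll1_parse(const char *w) {
    char stack[128] = "$E";                   /* bottom marker, start symbol */
    int top = 1;
    size_t n = strlen(w), ip = 0;
    if (strchr(w, '$')) return 0;             /* '$' is reserved as end marker */
    for (;;) {
        char X = stack[top], a = ip < n ? w[ip] : '$';
        if (X == '$') return ip == n;
        if (idx(NT, X) < 0) {                 /* terminal on top must match */
            if (X != a) return 0;
            top--; ip++;
        } else {
            int p = table_entry(X, a);
            if (p < 0) return 0;
            top--;                            /* pop X, push its body reversed */
            for (size_t k = strlen(prod[p].body); k > 0; k--) {
                if (top + 1 >= (int)sizeof stack) return 0;
                stack[++top] = prod[p].body[k - 1];
            }
        }
    }
}
```

After `build_table()`, each cell holds at most one production for this grammar, which is what makes it LL(1); `ll1_parse("i+i*i")` then accepts the token string for id+id*id.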
Bottom-Up Parsing
Also known as Shift-reduce parsing
some production
Reduction:
String     Production used
abbcde     A -> b
aAbcde     A -> Abc
aAde       B -> d
aABe       S -> aABe
S
How do we know which substring to be replaced at each reduction step?
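The reduction sequence above can be replayed mechanically once the substring to replace is known at each step. The C sketch below does only that: the positions of the substrings are supplied by hand, so it illustrates what a reduction is, not how a parser finds the right substring. The names `reduce` and `replay_example` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Replace the occurrence of body that starts at s[pos] with the single
 * symbol head. Returns 1 on success, 0 if body does not occur at pos. */
int reduce(char *s, size_t pos, const char *body, char head) {
    size_t len = strlen(body), n = strlen(s);
    if (len == 0 || pos + len > n || strncmp(s + pos, body, len) != 0)
        return 0;
    s[pos] = head;
    /* Shift the rest of the string left, including the terminating NUL. */
    memmove(s + pos + 1, s + pos + len, n - pos - len + 1);
    return 1;
}

/* Replays the four reductions of the example; s needs room for 7 bytes. */
int replay_example(char *s) {
    strcpy(s, "abbcde");
    return reduce(s, 1, "b", 'A')        /* abbcde -> aAbcde */
        && reduce(s, 1, "Abc", 'A')      /* aAbcde -> aAde   */
        && reduce(s, 2, "d", 'B')        /* aAde   -> aABe   */
        && reduce(s, 0, "aABe", 'S');    /* aABe   -> S      */
}
```

Note that the first step reduces the b at position 1, not the b at position 2; choosing between such candidates is exactly the question posed above.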
Unambiguous grammar
• A grammar for arithmetic expressions over the symbols id,
+, -, *, /, (, ), and ^, where ^ represents
exponentiation.
E -> E + T | E - T | T
T -> T * F | T / F | F
F -> F ^ P | P
P -> (E) | id