Documente Academic
Documente Profesional
Documente Cultură
Announcements
Where We Are
Source
Code
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Machine
Code
Where We Are
Source
Code
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Flex-pert
Machine
Code
Where We Are
Source
Code
Lexical Analysis
Syntax Analysis
Semantic Analysis
IR Generation
IR Optimization
Code Generation
Optimization
Machine
Code
w h i l e
( i p
<
z ) \n \t + + i p ;
T_While
T_Ident
<
T_Ident
ip
w h i l e
( i p
<
++
T_Ident
ip
z ) \n \t + + i p ;
While
<
T_While
++
Ident
Ident
Ident
ip
ip
T_Ident
<
T_Ident
ip
w h i l e
( i p
<
++
T_Ident
ip
z ) \n \t + + i p ;
do[for] = new 0;
d o [ f o r ]
n e w
do[for] = new 0;
0 ;
T_Do
T_For
T_New
T_IntConst
0
d o [ f o r ]
n e w
do[for] = new 0;
0 ;
T_Do
T_For
T_New
T_IntConst
0
d o [ f o r ]
n e w
do[for] = new 0;
0 ;
Outline
Context-Free Grammars
Derivations
Ambiguity
Top-Down Parsing
Bottom-Up Parsing
Formal Languages
Context-Free Grammars
Arithmetic Expressions
E
E Op E
E Op (E)
E Op (E Op E)
E * (E Op E)
int * (E Op E)
int * (int Op E)
int * (int Op int)
int * (int + int)
Arithmetic Expressions
E
E Op E
E Op int
int Op int
int / int
Context-Free Grammars
Formally, a context-free
grammar is a collection of four
objects:
E int
E E Op E
E (E)
Op +
Op Op *
Op /
A Notational Shorthand
E int
E E Op E
E (E)
Op +
Op Op *
Op /
A Notational Shorthand
E int | E Op E | (E)
Op + | - | * | /
S a*b
S Ab
S Ab
A Aa |
S a(b|c*)
S aX
X (b|c*)
S aX
X b | c*
S aX
Xb|C
S aX
Xb|C
C Cc |
Chemicals!
C19H14O5S
Cu3(CO3)2(OH)2
MnO4-
S2-
IonNum 2 | 3 | 4 | ...
Num 1 | IonNum
Form
Cmp Ion
IonNum 2 | 3 | 4 | ...
Num 1 | IonNum
MnO4 Ion
MnO4-
STMT
| { STMTS }
STMTS
| STMT STMTS
STMT
|
|
|
|
|
EXPR;
if (EXPR) BLOCK
while (EXPR) BLOCK
do BLOCK while (EXPR);
BLOCK
EXPR
|
|
|
|
|
identifier
constant
EXPR + EXPR
EXPR EXPR
EXPR * EXPR
...
i.e. A, B, C, D
i.e. t, u, v, w
i.e. , ,
Examples
Derivations
E
E Op E
E Op (E)
E Op (E Op E)
E * (E Op E)
int * (E Op E)
int * (int Op E)
int * (int Op int)
int * (int + int)
If derives , we write * .
Leftmost Derivations
BLOCK
|
STMTS
STMT
{ STMTS }
STMT STMTS
STMTS
STMT STMTS
STMT
|
|
|
|
|
EXPR;
if (EXPR) BLOCK
while (EXPR) BLOCK
do BLOCK while (EXPR);
BLOCK
EXPR; STMTS
EXPR
|
|
|
|
|
|
identifier
constant
EXPR + EXPR
EXPR EXPR
EXPR * EXPR
EXPR = EXPR
...
id = id + EXPR; STMTS
Leftmost Derivations
Related Derivations
E
E Op E
E Op E
int Op E
E Op (E)
int * E
E Op (E Op E)
int * (E)
E Op (E Op int)
int * (E Op E)
E Op (E + int)
int * (int Op E)
E Op (int + int)
int * (int + E)
E * (int + int)
Derivations Revisited
Parse Trees
E
E Op E
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
Parse Trees
E
E
E Op E
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
Parse Trees
E
E
E Op E
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
Parse Trees
E
E
E Op E
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
E Op
Parse Trees
E
E
E Op E
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
E Op
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
( int
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
( int
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
( int +
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
( int +
Parse Trees
E
E
E Op E
E Op
int Op E
int * E
int * (E)
int * (E Op E)
E Op E
int * (int Op E)
int * (int + E)
int * (int + int)
int *
( int + int )
Parse Trees
E
E Op E
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
E Op
Parse Trees
E
E
E Op E
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
E Op
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
+ int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
+ int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E
E Op E
E * (int + int)
int * (int + int)
( int + int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op (int + int)
E
E Op E
E * (int + int)
int * (int + int)
( int + int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
( int + int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
( int + int )
Parse Trees
E
E
E Op E
E Op
E Op (E)
E Op (E Op E)
E Op (E Op int)
E Op (E + int)
E Op E
E Op (int + int)
E * (int + int)
int * (int + int)
int *
( int + int )
For Comparison
E
E Op
int *
E
E
E Op
E Op E
E Op E
( int + int )
int *
( int + int )
Parse Trees
Challenges in Parsing
A Serious Problem
E
E Op
E
E
E Op E
E Op E
Op E
Ambiguity
Is Ambiguity a Problem?
Depends on semantics.
E int | E + E
Is Ambiguity a Problem?
Depends on semantics.
E int | E + E
R
E
R
E
R
E + R
E
E
R + R
E
5 R
E + R
E
3
E
5 + E
5
Is Ambiguity a Problem?
Depends on semantics.
E int | E + E | E - E
Is Ambiguity a Problem?
Depends on semantics.
E int | E + E | E - E
R
E
R
E
R
E +
- R
E
E
R + R
E
5 R
E + R
E
3
E
5
5
E
3
Resolving Ambiguity
()
(()())
((()))(())()
Balanced Parentheses
Balanced Parentheses
Balanced Parentheses
P
P
Balanced Parentheses
P
P
P
(
P
)
Balanced Parentheses
P
P
P
P
( (
Balanced Parentheses
P
P
P
P
( ( )
Balanced Parentheses
P
P
P
( ( ) (
) )
Balanced Parentheses
P
P
P
( ( ) ( ) )
Balanced Parentheses
P
P
P
P
( ( ) ( ) )
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
( ( ) ( ) ) ( ) ( ( ) )
Rethinking Parentheses
Um... what?
Um... what?
Um... what?
Um... what?
Um... what?
Um... what?
Um... what?
Um... what?
Building Parentheses
S P S
P ( S )
Building Parentheses
S P S
P ( S )
S
PS
PPS
PP
(S)P
(S)(S)
(PS)(S)
(P)(S)
((S))(S)
(())(S)
(())()
Context-Free Grammars
Any letter
Context-Free Grammars
An Ambiguous Grammar
R
R
R
R
R
R
a|b|c|
RR
R | R
R*
(R)
R
R
R
R
R
R
R
a | b *
a | b *
a | (b*)
(a | b)*
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
Resolving Ambiguity
R S | R | S
S T | ST
R RR
T U | T*
R R | R
Ua|b|c|
R R*
R (R)
U (R)
U (R)
atomic expressions
Only generates
atomic expressions
Concatenates starred
expressions
Ua|b|
U
U (R)
atomic expressions
Only generates
atomic expressions
R S | R | S
S T | ST
T U | T*
Concatenates starred
expressions
Ua|b|
U
U (R)
atomic expressions
Only generates
atomic expressions
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
S
T
U
U (R)
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
T T
U (R)
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U
a
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U
a
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
a
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
a
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
a
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
T
T
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
R
R
R
R S | R | S
S T | ST
T U | T*
Ua|b|c|
U
U (R)
T T
U U
Precedence Declarations
T
U
T U
U
a
S T | ST
U
U (R)
T U | T*
Ua|b|
R S | R | S
concat
T U
U
a
||
*
a
R S | R | S
S T | ST
T U | T*
Ua|b|
U
U (R)
R
S
T
R S | R | S
S T | ST
T U | T*
Ua|b|
U
U (R)
S
T
U
a
R
S
T
R S | R | S
S T | ST
T U | T*
Ua|b|
U
U (R)
concat
S
T
U
a
or
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
int 26
int
int
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
int 26
int
int
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
T 130
int 26
int
int
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
T 130
int 26
int
int
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
T 130
int 26
int
int
ET
E.val = T.val
T int
T.val = int.val
T int * T
T (E)
T.val = E.val
E 137
T 130
int 26
int
int
R.ast
R1.ast
S.ast
S1.ast
T.ast
T1.ast
U.ast
U.ast
U.ast
=
=
=
=
=
=
=
S.ast;
new Or(R2.ast, S.ast);
T.ast;
new Concat(S2.ast, T.ast);
U.ast;
new Star(T2.ast);
new SingleChar('a');
= new Epsilon();
= R.ast;
Summary
Next Time
Top-Down Parsing
Parsing as a Search
Backtracking Parsers
Predictive Parsers
LL(1)