Context Free Grammar

© Chuen-Liang Chen, NTUCS&IE

CONTEXT-FREE GRAMMARS
Chuen-Liang Chen
Department of Computer Science
and Information Engineering
National Taiwan University
Taipei, TAIWAN
Parsing
function: checking the syntactic validity of the input string and
    producing the structure of the corresponding parse tree
callee: scanner (when a token is needed),
    semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
- top-down parsing
    beginning at the start symbol, expanding nonterminals in a depth-first manner (predictive in nature)
    left-most derivation
    pre-order traversal of the parse tree
    e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing
- bottom-up parsing
    beginning from the terminal string, determining the productions used to generate the leaves
    right-most derivation in reverse order
    post-order traversal of the parse tree
    e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]
Definitions about context-free grammar (1/2)
context-free grammar -- $G = (V_t, V_n, S, P)$
- $V_t$ -- set of terminal symbols
- $V_n$ -- set of nonterminal symbols
    $a, b, c, \ldots \in V_t$
    $A, B, C, \ldots \in V_n$
    $U, V, W, \ldots \in V = V_t \cup V_n$
    $u, v, w, \ldots \in V_t^*$
    $\alpha, \beta, \gamma, \ldots \in V^*$
- $S$ -- start symbol, goal symbol; $S \in V_n$
- $P$ -- set of production rules of the form $A \to \alpha$
derivation by production rule $A \to \gamma$
- one step derivation: $\alpha A \beta \Rightarrow \alpha \gamma \beta$
- left-most derivation: $u A \beta \Rightarrow_{lm} u \gamma \beta$
- right-most derivation: $\alpha A v \Rightarrow_{rm} \alpha \gamma v$
- one or more steps derivation: $\Rightarrow^+$, $\Rightarrow^+_{lm}$, $\Rightarrow^+_{rm}$
- zero or more steps derivation: $\Rightarrow^*$, $\Rightarrow^*_{lm}$, $\Rightarrow^*_{rm}$
Definitions about context-free grammar (2/2)
set of sentential forms -- $SF(G) = \{ \beta \mid S \Rightarrow^* \beta \}$
- left-most sentential form -- a $\beta$ such that $S \Rightarrow^*_{lm} \beta$
- right-most sentential form -- a $\beta$ such that $S \Rightarrow^*_{rm} \beta$
context-free language -- $L(G) = SF(G) \cap V_t^*$
parse tree, derivation tree --
- graphic representation of derivations
- root -- start symbol
- leaf nodes -- grammar symbols or $\lambda$
- interior nodes -- nonterminals
- offspring of a nonterminal -- a production
for a given sentential form --
- phrase -- a sequence of symbols derived from a single nonterminal
- simple phrase, prime phrase -- minimal phrase
- handle -- left-most simple phrase
Example of context-free grammar
grammar $G_0$ --
    E $\to$ Prefix ( E ) | V Tail
    Prefix $\to$ F | $\lambda$
    Tail $\to$ + E | $\lambda$
left-most derivation --
    E $\Rightarrow_{lm}$ Prefix ( E )
      $\Rightarrow_{lm}$ F ( E )
      $\Rightarrow_{lm}$ F ( V Tail )
      $\Rightarrow_{lm}$ F ( V + E )
      $\Rightarrow_{lm}$ F ( V + V Tail )
      $\Rightarrow_{lm}$ F ( V + V )
right-most derivation --
    E $\Rightarrow_{rm}$ Prefix ( E )
      $\Rightarrow_{rm}$ Prefix ( V Tail )
      $\Rightarrow_{rm}$ Prefix ( V + E )
      $\Rightarrow_{rm}$ Prefix ( V + V Tail )
      $\Rightarrow_{rm}$ Prefix ( V + V )
      $\Rightarrow_{rm}$ F ( V + V )
right-most sentential forms -- 1. E  2. Prefix ( E )  3. Prefix ( V Tail )
    4. Prefix ( V + E )  5. Prefix ( V + V Tail )  6. Prefix ( V + V )  7. F ( V + V )
    8. and so on
$L(G_0) \supseteq \{$ F ( V + V ) $\}$
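The left-most derivation above can be replayed mechanically. The following is a minimal sketch, assuming the productions of G0 are numbered 0 to 5 as in the comments; the numbering, the names `G0`, `step_leftmost`, and `derive_leftmost`, and the string representation of symbols are all illustrative choices, not part of the original slides.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 32

/* Productions of G0, numbered for this sketch only. Length 0 means lambda. */
struct production { const char *lhs; int len; const char *rhs[4]; };
static const struct production G0[] = {
    { "E",      4, { "Prefix", "(", "E", ")" } },  /* 0: E -> Prefix ( E ) */
    { "E",      2, { "V", "Tail" } },              /* 1: E -> V Tail       */
    { "Prefix", 1, { "F" } },                      /* 2: Prefix -> F       */
    { "Prefix", 0, { 0 } },                        /* 3: Prefix -> lambda  */
    { "Tail",   2, { "+", "E" } },                 /* 4: Tail -> + E       */
    { "Tail",   0, { 0 } },                        /* 5: Tail -> lambda    */
};
enum { NUM_G0 = sizeof G0 / sizeof G0[0] };

int is_nonterminal(const char *s)
{
    return !strcmp(s, "E") || !strcmp(s, "Prefix") || !strcmp(s, "Tail");
}

/* Rewrite the left-most nonterminal of form[0..*n-1] using production p.
 * Returns 1 on success, 0 if p does not apply to that nonterminal. */
int step_leftmost(const char *form[], int *n, int p)
{
    int i, j;
    if (p < 0 || p >= NUM_G0)
        return 0;
    for (i = 0; i < *n && !is_nonterminal(form[i]); i++)
        ;                                   /* skip leading terminals */
    if (i == *n || strcmp(form[i], G0[p].lhs) != 0)
        return 0;
    if (*n + G0[p].len - 1 > MAX_SYMS)
        return 0;
    /* make room for (or close the gap left by) the right-hand side */
    memmove(&form[i + G0[p].len], &form[i + 1],
            (size_t)(*n - i - 1) * sizeof form[0]);
    for (j = 0; j < G0[p].len; j++)
        form[i + j] = G0[p].rhs[j];
    *n += G0[p].len - 1;
    return 1;
}

/* Apply the given production numbers, starting from E, and write the
 * resulting sentential form (symbols separated by blanks) into out. */
int derive_leftmost(const int *steps, int nsteps, char *out, size_t outsz)
{
    const char *form[MAX_SYMS] = { "E" };
    int n = 1, k;
    assert(outsz > 0);
    for (k = 0; k < nsteps; k++)
        if (!step_leftmost(form, &n, steps[k]))
            return 0;
    out[0] = '\0';
    for (k = 0; k < n; k++) {
        if (k)
            strncat(out, " ", outsz - strlen(out) - 1);
        strncat(out, form[k], outsz - strlen(out) - 1);
    }
    return 1;
}
```

With the step sequence 0, 2, 1, 4, 1, 5 the function reproduces the derivation listed above and ends in the sentence F ( V + V ); a sequence whose next production does not match the left-most nonterminal is rejected.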
Example of left-most derivation
parse trees of left-most derivations
- blue symbols: left-most sentential forms
[Figure: one parse tree per step of the left-most derivation of F ( V + V ) in $G_0$, growing from the single node E to the complete tree]
Example of top-down parsing
trace of top-down parsing (left-most derivation)
- orange: just derived (predicted); blue: just read (matched);
  black: derived or read; green: un-processed (parse stack)
[Figure: parse tree of F ( V + V ) built step by step from the root E downward, colored as described above]
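Top-down parsing of the example grammar (E -> Prefix ( E ) | V Tail, Prefix -> F | lambda, Tail -> + E | lambda) can be sketched as a recursive descent recognizer with one token of lookahead. This is only an illustration under stated assumptions: each token is a single character, and the names `accepts`, `match`, and `parse_*` are hypothetical. It only accepts or rejects; it builds no tree.

```c
#include <assert.h>
#include <stddef.h>

static const char *input;   /* remaining tokens; each token is one character */

static int parse_E(void);

/* Consume the lookahead token if it equals t. */
static int match(char t)
{
    if (*input != t)
        return 0;
    input++;
    return 1;
}

static int parse_Prefix(void)
{
    if (*input == 'F')
        return match('F');          /* predict Prefix -> F */
    return *input == '(';           /* predict Prefix -> lambda, only before '(' */
}

static int parse_Tail(void)
{
    if (*input == '+')
        return match('+') && parse_E();       /* predict Tail -> + E */
    return *input == ')' || *input == '\0';   /* predict Tail -> lambda */
}

static int parse_E(void)
{
    switch (*input) {
    case 'F':
    case '(':                       /* predict E -> Prefix ( E ) */
        return parse_Prefix() && match('(') && parse_E() && match(')');
    case 'V':                       /* predict E -> V Tail */
        return match('V') && parse_Tail();
    default:
        return 0;
    }
}

/* Returns 1 if the whole token string is derivable from E, else 0. */
int accepts(const char *tokens)
{
    assert(tokens != NULL);
    input = tokens;
    return parse_E() && *input == '\0';
}
```

For instance, `accepts("F(V+V)")` returns 1 and `accepts("F(V+)")` returns 0. Each function picks its production from the current lookahead alone, which is the predictive behavior described for top-down parsing.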
Example of right-most derivation (1/2)
parse trees of right-most derivations and the corresponding sentential forms, phrases, simple phrases, and handles
- blue symbols: sentential form
- the figure also marks each phrase, simple phrase, and handle
[Figure: parse trees for the right-most sentential forms E, Prefix ( E ), Prefix ( V Tail ), and Prefix ( V + E )]
Example of right-most derivation (2/2)
[Figure: parse trees for the right-most sentential forms Prefix ( V + V Tail ), Prefix ( V + V $\lambda$ ), and F ( V + V $\lambda$ )]
Example of bottom-up parsing
trace of bottom-up parsing (inverse order of right-most derivation)
- blue: just read (shifted); orange: just derived (reduced to);
  pink: not read; green: derived or read (parse stack)
[Figure: parse tree of F ( V + V $\lambda$ ) built step by step from the leaves up to the root E, colored as described above]
Examples
example 1
- lookahead is unnecessary
example 2
- service | service | ($\lambda$)
- lookahead is required
Ambiguity of grammar
a string with two different parse trees (i.e., two different structures)
example:
    <exp> $\to$ <exp> - <exp>
    <exp> $\to$ id
for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same
[Figure: the two parse trees of id - id - id; one groups the string as ( id - id ) - id, the other as id - ( id - id )]
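The two structures matter as soon as meaning is attached to the tree. The sketch below builds both parse trees of id - id - id by hand and evaluates them; the `struct exp` node type and the function names are illustrative assumptions, and the id values are supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* A parse tree node for <exp>: a leaf id, or <exp> - <exp>. */
struct exp {
    const struct exp *left, *right;   /* both NULL for a leaf */
    int value;                        /* value of the id at a leaf */
};

int eval(const struct exp *e)
{
    assert(e != NULL);
    if (e->left == NULL)
        return e->value;
    return eval(e->left) - eval(e->right);
}

/* Tree that groups the string as ( a - b ) - c. */
int eval_left_tree(int a, int b, int c)
{
    struct exp x = { NULL, NULL, 0 }, y = x, z = x;
    struct exp inner, root;
    x.value = a; y.value = b; z.value = c;
    inner.left = &x; inner.right = &y; inner.value = 0;
    root.left = &inner; root.right = &z; root.value = 0;
    return eval(&root);
}

/* Tree that groups the string as a - ( b - c ). */
int eval_right_tree(int a, int b, int c)
{
    struct exp x = { NULL, NULL, 0 }, y = x, z = x;
    struct exp inner, root;
    x.value = a; y.value = b; z.value = c;
    inner.left = &y; inner.right = &z; inner.value = 0;
    root.left = &x; root.right = &inner; root.value = 0;
    return eval(&root);
}
```

For the ids 5, 3, 1 the first tree evaluates to (5 - 3) - 1 = 1 and the second to 5 - (3 - 1) = 3, so the same string has two different meanings under the ambiguous grammar.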
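First set and Follow set (1/2)
$\mathrm{First}(\alpha) = \{ a \in V_t \mid \alpha \Rightarrow^* a \beta \} \cup ( \text{if } \alpha \Rightarrow^* \lambda \text{ then } \{\lambda\} \text{ else } \emptyset )$
- set of all terminals that can begin a sentential form derived from $\alpha$
- $\mathrm{First}_k(\alpha)$ -- set of k-symbol terminal strings that can begin a sentential form derived from $\alpha$
- QUIZ: for what?
$\mathrm{Follow}(A) = \{ a \in V_t \mid S \Rightarrow^+ \alpha A a \beta \} \cup ( \text{if } S \Rightarrow^+ \alpha A \text{ then } \{\lambda\} \text{ else } \emptyset )$
- set of all terminals that may follow $A$ in some sentential form
- $\mathrm{Follow}_k(A)$ -- set of k-symbol terminal strings that may follow $A$ in some sentential form
- QUIZ: for what?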
First set and Follow set (2/2)
example 1 --
    E $\to$ Prefix ( E )
    E $\to$ V Tail
    Prefix $\to$ F | $\lambda$
    Tail $\to$ + E | $\lambda$

                  E              Prefix              Tail
    First_set     { V, F, ( }    { F, $\lambda$ }    { +, $\lambda$ }
    Follow_set    { $\lambda$, ) }    { ( }          { $\lambda$, ) }

example 2 --
    S $\to$ a S e | B
    B $\to$ b B e | C
    C $\to$ c C e | d

                  S                 B              C
    First_set     { a, b, c, d }    { b, c, d }    { c, d }
    Follow_set    { e, $\lambda$ }  { e, $\lambda$ }  { e, $\lambda$ }

example 3 --
    S $\to$ A B c
    A $\to$ a | $\lambda$
    B $\to$ b | $\lambda$

                  S              A                   B
    First_set     { a, b, c }    { a, $\lambda$ }    { b, $\lambda$ }
    Follow_set    { $\lambda$ }  { b, c }            { c }
Algorithms for First & Follow sets (1/6)
typedef int symbol;    /* a symbol in the grammar */

/*
 * The symbolic constants used below, NUM_TERMINALS, NUM_NONTERMINALS,
 * and NUM_PRODUCTIONS, are determined by the grammar.
 * MAX_RHS_LENGTH should simply be "big enough."
 */

#define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

typedef struct gram {
    symbol terminals[NUM_TERMINALS];
    symbol nonterminals[NUM_NONTERMINALS];
    symbol start_symbol;
    int num_productions;
    struct prod {
        symbol lhs;
        int rhs_length;
        symbol rhs[MAX_RHS_LENGTH];
    } productions[NUM_PRODUCTIONS];
    symbol vocabulary[VOCABULARY];
} grammar;

typedef struct prod production;

typedef symbol terminal;
typedef symbol nonterminal;
typedef short boolean;
#define FALSE 0
#define TRUE  1
typedef boolean marked_vocabulary[VOCABULARY];

/*
 * Mark those vocabulary symbols found to derive λ (directly or indirectly).
 * Returns a pointer to a static array of VOCABULARY flags, because a C
 * function cannot return an array type.
 */
boolean *mark_lambda(const grammar g)
{
    static marked_vocabulary derives_lambda;
    boolean changes;             /* any changes during last iteration? */
    boolean rhs_derives_lambda;  /* does the RHS derive λ? */
    symbol v;                    /* a word in the vocabulary */
    production p;                /* a production in the grammar */
    int i, j;                    /* loop variables */

    for (v = 0; v < VOCABULARY; v++)
        derives_lambda[v] = FALSE;   /* initially, nothing is marked */

    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            if (! derives_lambda[p.lhs]) {
                if (p.rhs_length == 0) {
                    /* derives λ directly */
                    changes = derives_lambda[p.lhs] = TRUE;
                    continue;
                }
                /* does each part of RHS derive λ? */
                rhs_derives_lambda = derives_lambda[p.rhs[0]];
                for (j = 1; j < p.rhs_length; j++)
                    rhs_derives_lambda = rhs_derives_lambda && derives_lambda[p.rhs[j]];
                if (rhs_derives_lambda)
                    changes = derives_lambda[p.lhs] = TRUE;
            }
        }
    } while (changes);
    return derives_lambda;
}
/* Pseudocode: set operations are written in mathematical notation. */
typedef set_of_terminal_or_lambda termset;
termset follow_set[NUM_NONTERMINALS];
termset first_set[VOCABULARY];
boolean *derives_lambda = mark_lambda(g);
/* mark_lambda(g) as defined above */

termset compute_first(string_of_symbols alpha)
{
    int i, k;
    termset result;

    k = length(alpha);
    if (k == 0)
        result = SET_OF( λ );
    else {
        result = first_set[alpha[0]] - SET_OF( λ );
        /* keep going only while every symbol so far can derive λ */
        for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
            result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
        if (i == k && λ ∈ first_set[alpha[k - 1]])
            result = result ∪ SET_OF( λ );
    }
    return result;
}
extern grammar g;

void fill_first_set(void)
{
    nonterminal A;
    terminal a;
    production p;
    boolean changes;
    int i, j;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        if (derives_lambda[A])
            first_set[A] = SET_OF( λ );
        else
            first_set[A] = ∅;
    }

    for (i = 0; i < NUM_TERMINALS; i++) {
        a = g.terminals[i];
        first_set[a] = SET_OF( a );
        for (j = 0; j < NUM_NONTERMINALS; j++) {
            A = g.nonterminals[j];
            if (there exists a production A → a β)
                first_set[A] = first_set[A] ∪ SET_OF( a );
        }
    }

    do {
        changes = FALSE;
        for (i = 0; i < g.num_productions; i++) {
            p = g.productions[i];
            first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
            if ( first_set changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?
void fill_follow_set(void)
{
    nonterminal A, B;
    int i;
    boolean changes;

    for (i = 0; i < NUM_NONTERMINALS; i++) {
        A = g.nonterminals[i];
        follow_set[A] = ∅;
    }
    follow_set[g.start_symbol] = SET_OF( λ );

    do {
        changes = FALSE;
        for (each production A → α B β) {
            /*
             * I.e. for each production and each
             * occurrence of a nonterminal in its
             * right-hand side.
             */
            follow_set[B] = follow_set[B] ∪ (compute_first(β) - SET_OF( λ ));
            if ( λ ∈ compute_first(β) )
                follow_set[B] = follow_set[B] ∪ follow_set[A];
            if ( follow_set[B] changed )
                changes = TRUE;
        }
    } while (changes);
}
QUIZ: termination?
QUIZ: correctness?
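The set notation in the routines above can be made concrete with bit masks. The following is a minimal runnable sketch for the grammar S -> A B c, A -> a | lambda, B -> b | lambda. It is not the slides' code: the symbol numbering, the `LAMBDA` bit, and the `prods` table are assumptions made for this illustration, and the separate `mark_lambda` pass is folded into the fixed-point loop, because `compute_first` of an empty right-hand side already yields { lambda }.

```c
#include <assert.h>

enum { T_A, T_B, T_C, NUM_TERMINALS,                /* terminals a, b, c */
       N_S = NUM_TERMINALS, N_A, N_B, VOCABULARY }; /* nonterminals S, A, B */

#define LAMBDA  (1u << NUM_TERMINALS)   /* extra bit that stands for lambda */
#define MAX_RHS 3

typedef unsigned termset;               /* bit t set <=> terminal t is a member */

struct production { int lhs, rhs_length, rhs[MAX_RHS]; };

static const struct production prods[] = {
    { N_S, 3, { N_A, N_B, T_C } },      /* S -> A B c  */
    { N_A, 1, { T_A } },                /* A -> a      */
    { N_A, 0, { 0 } },                  /* A -> lambda */
    { N_B, 1, { T_B } },                /* B -> b      */
    { N_B, 0, { 0 } },                  /* B -> lambda */
};
enum { NUM_PRODUCTIONS = sizeof prods / sizeof prods[0] };

termset first_set[VOCABULARY];
termset follow_set[VOCABULARY];         /* only nonterminal entries are used */

/* First set of the symbol string alpha[0..k-1]. */
termset compute_first(const int *alpha, int k)
{
    termset result = 0;
    int i;
    for (i = 0; i < k; i++) {
        result |= first_set[alpha[i]] & ~LAMBDA;
        if (!(first_set[alpha[i]] & LAMBDA))
            return result;              /* alpha[i] cannot vanish: stop here */
    }
    return result | LAMBDA;             /* every symbol (or none) derives lambda */
}

void fill_first_set(void)
{
    int i, changes;
    for (i = 0; i < VOCABULARY; i++)
        first_set[i] = i < NUM_TERMINALS ? 1u << i : 0;
    do {
        changes = 0;
        for (i = 0; i < NUM_PRODUCTIONS; i++) {
            termset old = first_set[prods[i].lhs];
            first_set[prods[i].lhs] |=
                compute_first(prods[i].rhs, prods[i].rhs_length);
            if (first_set[prods[i].lhs] != old)
                changes = 1;
        }
    } while (changes);
}

/* Requires fill_first_set() to have been called first. */
void fill_follow_set(void)
{
    int i, j, changes;
    for (i = 0; i < VOCABULARY; i++)
        follow_set[i] = 0;
    follow_set[N_S] = LAMBDA;           /* end of input may follow the start symbol */
    do {
        changes = 0;
        for (i = 0; i < NUM_PRODUCTIONS; i++)
            for (j = 0; j < prods[i].rhs_length; j++) {
                int sym = prods[i].rhs[j];
                termset old, f;
                if (sym < NUM_TERMINALS)
                    continue;           /* Follow is defined for nonterminals only */
                old = follow_set[sym];
                f = compute_first(&prods[i].rhs[j + 1],
                                  prods[i].rhs_length - j - 1);
                follow_set[sym] |= f & ~LAMBDA;
                if (f & LAMBDA)
                    follow_set[sym] |= follow_set[prods[i].lhs];
                if (follow_set[sym] != old)
                    changes = 1;
            }
    } while (changes);
}
```

After calling `fill_first_set()` and then `fill_follow_set()`, the masks correspond to First(S) = { a, b, c }, First(A) = { a, lambda }, First(B) = { b, lambda }, Follow(S) = { lambda }, Follow(A) = { b, c }, and Follow(B) = { c }. Both loops terminate because every pass either adds at least one bit to a finite set or leaves `changes` at 0.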
Tracing examples
[Figure: the grammars of examples 1 to 3 with their First_set and Follow_set tables, annotated with markers for tracing the algorithms]
From extended BNF to CFG
<statement list> $\to$ <statement> { <statement> }
is rewritten as
<statement list> $\to$ <statement> <statement tail>
<statement tail> $\to$ <statement> <statement tail>
<statement tail> $\to$ $\lambda$
QUIZ: how, systematically?
Other types of grammars
regular grammar -- $A \to a B$ or $C \to \lambda$
- QUIZ: how?
context-free grammar -- $A \to \alpha$
context-sensitive grammar -- $\alpha A \beta \to \alpha \delta \beta$
type 0 grammar -- $\alpha \to \beta$

regular grammar: too simple, e.g., it cannot specify $\{ [^i ]^i \mid i \ge 1 \}$
- QUIZ: how to specify $\{ [^i ]^i \mid i \ge 1 \}$ by a context-free grammar?
context-sensitive, type 0: lack a sufficiently efficient parser
context-free grammar: a balance between generality and practicality
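One possible answer to the bracket quiz is the grammar S -> [ S ] | [ ]. The sketch below is a recognizer that follows that grammar directly; the grammar is one choice among several equivalent ones, and the names `parse_S` and `in_language` are hypothetical. The recursion depth plays the role of the counter that a regular grammar cannot express.

```c
#include <assert.h>
#include <stddef.h>

/* Recognize one S at p, where S -> [ S ] | [ ].
 * Returns a pointer just past the matched text, or NULL on failure. */
static const char *parse_S(const char *p)
{
    if (*p != '[')
        return NULL;
    p++;
    if (*p == '[') {                /* S -> [ S ] */
        p = parse_S(p);
        if (p == NULL)
            return NULL;
    }                               /* otherwise S -> [ ] */
    if (*p != ']')
        return NULL;
    return p + 1;
}

/* Returns 1 if s consists of i '[' followed by i ']' with i >= 1, else 0. */
int in_language(const char *s)
{
    const char *end;
    assert(s != NULL);
    end = parse_S(s);
    return end != NULL && *end == '\0';
}
```

For instance, `in_language("[[]]")` returns 1, while `in_language("[[]")` and `in_language("[][]")` return 0.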
