CONTEXT-FREE GRAMMARS

Chuen-Liang Chen
Department of Computer Science and Information Engineering
National Taiwan University
Taipei, TAIWAN
© Chuen-Liang Chen, NTUCS&IE

Parsing

function:
  - checking the syntactic validity of the input string
  - producing the structure of the corresponding parse tree
callee:
  - scanner (when a token is needed)
  - semantic routine (when a production rule is matched)
theoretical basis: context-free grammar
executor: parser, syntax analyzer
  - top-down parsing
      beginning at the start symbol, expanding nonterminals in depth-first manner (predictive in nature)
      left-most derivation
      pre-order traversal of parse tree
      e.g. LL(k) [read from Left; Left-most derivation; k lookaheads], recursive descent parsing
  - bottom-up parsing
      beginning from the terminal string, determining the production used to generate leaves
      right-most derivation in reverse order
      post-order traversal of parse tree
      e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]

Definitions about context-free grammar (1/2)

context-free grammar -- $G = (V_t, V_n, S, P)$
  - $V_t$ -- set of terminal symbols
  - $V_n$ -- set of nonterminal symbols
      $a, b, c, \ldots \in V_t$
      $A, B, C, \ldots \in V_n$
      $U, V, W, \ldots \in V = V_t \cup V_n$
      $u, v, w, \ldots \in V_t^{*}$
      $\alpha, \beta, \gamma, \ldots \in V^{*}$
  - $S$ -- start symbol, goal symbol; $S \in V_n$
  - $P$ -- set of production rules of the form $A \to \alpha$
derivation by production rule $A \to \gamma$
  - one step derivation : $\alpha A \beta \Rightarrow \alpha \gamma \beta$
  - left-most derivation : $u A \beta \Rightarrow_{lm} u \gamma \beta$
  - right-most derivation : $\alpha A v \Rightarrow_{rm} \alpha \gamma v$
  - one or more steps derivation : $\Rightarrow^{+}$, $\Rightarrow^{+}_{lm}$, $\Rightarrow^{+}_{rm}$
  - zero or more steps derivation : $\Rightarrow^{*}$, $\Rightarrow^{*}_{lm}$, $\Rightarrow^{*}_{rm}$
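The one-step left-most derivation defined above (replace the left-most nonterminal A by the right-hand side of a production for A) can be sketched in C. This is only a minimal illustration under assumptions that are not in the slides: sentential forms are stored as C strings, upper-case letters stand for nonterminals and every other character for a terminal, and the names `derive_leftmost` and `MAX_FORM` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_FORM 64   /* hypothetical capacity of a sentential form */

/*
 * Apply the production lhs -> rhs to the left-most nonterminal of form,
 * i.e. rewrite  u A beta  into  u gamma beta.
 * Assumption of this sketch: upper-case letters are nonterminals.
 * Returns 1 on success; returns 0 (form unchanged) if there is no
 * nonterminal, the left-most one is not lhs, or the result would not fit.
 */
int derive_leftmost(char *form, char lhs, const char *rhs)
{
    char tail[MAX_FORM];
    size_t n, i = 0;

    assert(form != NULL && rhs != NULL);
    n = strlen(form);
    assert(n < MAX_FORM);

    while (i < n && !isupper((unsigned char)form[i]))
        i++;                          /* skip the terminal prefix u */
    if (i == n || form[i] != lhs)
        return 0;                     /* left-most nonterminal is not A */
    if (n - 1 + strlen(rhs) >= MAX_FORM)
        return 0;                     /* result would overflow the buffer */

    strcpy(tail, form + i + 1);       /* save beta */
    strcpy(form + i, rhs);            /* write gamma right after u */
    strcat(form, tail);               /* append beta again */
    return 1;
}
```

With the hypothetical toy grammar S → a S b | λ, starting from the form "S" and calling `derive_leftmost(form, 'S', "aSb")` twice and then `derive_leftmost(form, 'S', "")` rewrites the buffer to "aSb", "aaSbb", and "aabb" in turn, which is a left-most derivation of aabb.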

Definitions about context-free grammar (2/2)

set of sentential forms -- $SF(G) = \{ \beta \mid S \Rightarrow^{*} \beta \}$
  - left-most sentential form -- the $\beta$ so that $S \Rightarrow^{*}_{lm} \beta$
  - right-most sentential form -- the $\beta$ so that $S \Rightarrow^{*}_{rm} \beta$
context-free language -- $L(G) = SF(G) \cap V_t^{*}$
parse tree, derivation tree --
  - graphic representation of derivations
  - root -- start symbol
  - leaf nodes -- grammar symbols or λ
  - interior nodes -- nonterminals
  - offspring of a nonterminal -- a production
for a given sentential form --
  - phrase -- a sequence of symbols derived from a single nonterminal
  - simple phrase, prime phrase -- minimal phrase
  - handle -- left-most simple phrase

Example of context-free grammar

grammar G0 --
    E      → Prefix ( E ) | V Tail
    Prefix → F | λ
    Tail   → + E | λ

left-most derivation --            right-most derivation --
    E ⇒lm Prefix ( E )                 E ⇒rm Prefix ( E )
      ⇒lm F ( E )                        ⇒rm Prefix ( V Tail )
      ⇒lm F ( V Tail )                   ⇒rm Prefix ( V + E )
      ⇒lm F ( V + E )                    ⇒rm Prefix ( V + V Tail )
      ⇒lm F ( V + V Tail )               ⇒rm Prefix ( V + V )
      ⇒lm F ( V + V )                    ⇒rm F ( V + V )

right-most sentential forms --
    1. E
    2. Prefix ( E )
    3. Prefix ( V Tail )
    4. Prefix ( V + E )
    5. Prefix ( V + V Tail )
    6. Prefix ( V + V )
    7. F ( V + V )
    8. and so on

L(G0) ⊇ { F ( V + V ) }

Example of left-most derivation

parse trees of left-most derivations
  - blue symbols : left-most sentential forms
[Figure: one parse tree per step of the left-most derivation of F ( V + V )]

Example of top-down parsing

trace of top-down parsing (left-most derivation)
  - orange : just derived (predicted)
  - blue : just read (matched)
  - black : derived or read
  - green : un-processed (parse stack)
[Figure: parse trees tracing the top-down parse of F ( V + V )]

Example of right-most derivation (1/2)

parse trees of right-most derivations and corresponding sentential form, phrases, simple phrases, handle
  - blue symbols : sentential form
  - separate markers : phrase, simple phrase, handle
[Figure: parse trees for the right-most sentential forms Prefix ( E ), Prefix ( V Tail ), and Prefix ( V + E )]

Example of right-most derivation (2/2)

[Figure: parse trees for the right-most sentential forms Prefix ( V + V Tail ), Prefix ( V + V λ ), and F ( V + V λ )]

Example of bottom-up parsing

trace of bottom-up parsing (inverse order of right-most derivation)
  - blue : just read (shifted)
  - orange : just derived (reduced to)
  - pink : not read
  - green : derived or read (parse stack)
[Figure: parse trees tracing the bottom-up parse of F ( V + V )]

Examples

example 1 --
  - lookahead is unnecessary
example 2 --
  - service | service | (λ)
  - lookahead is required

Ambiguity of grammar

a string with two different parse trees (i.e., two different structures)
example :
    <exp> → <exp> - <exp>
    <exp> → id
for an unambiguous grammar, the parse trees of the left-most derivation and the right-most derivation are the same
[Figure: two different parse trees for id - id - id]

First set and Follow set (1/2)

$\mathrm{First}(\alpha) = \{ a \in V_t \mid \alpha \Rightarrow^{*} a \beta \} \cup (\text{if } \alpha \Rightarrow^{*} \lambda \text{ then } \{\lambda\} \text{ else } \emptyset)$
  - set of all terminals that can begin a sentential form derived from $\alpha$
  - $\mathrm{First}_k(\alpha)$ -- set of k-symbol terminal strings that can begin a sentential form derived from $\alpha$
  - QUIZ: for what?
$\mathrm{Follow}(A) = \{ a \in V_t \mid S \Rightarrow^{+} \alpha A a \beta \} \cup (\text{if } S \Rightarrow^{+} \alpha A \text{ then } \{\lambda\} \text{ else } \emptyset)$
  - set of all terminals that may follow $A$ in some sentential form
  - $\mathrm{Follow}_k(A)$ -- set of k-symbol terminal strings that may follow $A$ in some sentential form
  - QUIZ: for what?

First set and Follow set (2/2)

example 1 --
    E      → Prefix ( E )
    E      → V Tail
    Prefix → F | λ
    Tail   → + E | λ

                  E              Prefix       Tail
    First_set     { V, F, ( }    { F, λ }     { +, λ }
    Follow_set    { λ, ) }       { ( }        { λ, ) }

example 2 --
    S → a S e | B
    B → b B e | C
    C → c C e | d

                  S                 B              C
    First_set     { a, b, c, d }    { b, c, d }    { c, d }
    Follow_set    { e, λ }          { e, λ }       { e, λ }

example 3 --
    S → A B c
    A → a | λ
    B → b | λ

                  S              A            B
    First_set     { a, b, c }    { a, λ }     { b, λ }
    Follow_set    { λ }          { b, c }     { c }

Algorithms for First & Follow sets (1/6)

    typedef int symbol;    /* a symbol in the grammar */

    /*
     * The symbolic constants used below, NUM_TERMINALS,
     * NUM_NONTERMINALS, and NUM_PRODUCTIONS, are determined
     * by the grammar. MAX_RHS_LENGTH should simply be
     * "big enough."
     */
    #define VOCABULARY (NUM_NONTERMINALS + NUM_TERMINALS)

    typedef struct gram {
        symbol terminals[NUM_TERMINALS];
        symbol nonterminals[NUM_NONTERMINALS];
        symbol start_symbol;
        int num_productions;
        struct prod {
            symbol lhs;
            int rhs_length;
            symbol rhs[MAX_RHS_LENGTH];
        } productions[NUM_PRODUCTIONS];
        symbol vocabulary[VOCABULARY];
    } grammar;

    typedef struct prod production;
    typedef symbol terminal;
    typedef symbol nonterminal;

Algorithms for First & Follow sets (2/6)

    #define FALSE 0
    #define TRUE  1
    typedef short boolean;
    typedef boolean marked_vocabulary[VOCABULARY];

    /*
     * Mark those vocabulary symbols found to derive λ (directly or indirectly).
     * A C function cannot return an array, so the result is returned as a
     * pointer to the static table.
     */
    boolean *mark_lambda(const grammar g)
    {
        static marked_vocabulary derives_lambda;
        boolean changes;              /* any changes during last iteration? */
        boolean rhs_derives_lambda;   /* does the RHS derive λ? */
        symbol v;                     /* a word in the vocabulary */
        production p;                 /* a production in the grammar */
        int i, j;                     /* loop variables */

        for (v = 0; v < VOCABULARY; v++)
            derives_lambda[v] = FALSE;    /* initially, nothing is marked */

Algorithms for First & Follow sets (3/6)

        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                if (! derives_lambda[p.lhs]) {
                    if (p.rhs_length == 0) {
                        /* derives λ directly */
                        changes = derives_lambda[p.lhs] = TRUE;
                        continue;
                    }
                    /* does each part of RHS derive λ? */
                    rhs_derives_lambda = derives_lambda[p.rhs[0]];
                    for (j = 1; j < p.rhs_length; j++)
                        rhs_derives_lambda = rhs_derives_lambda
                                             && derives_lambda[p.rhs[j]];
                    if (rhs_derives_lambda)
                        changes = derives_lambda[p.lhs] = TRUE;
                }
            }
        } while (changes);
        return derives_lambda;
    }

Algorithms for First & Follow sets (4/6)

    /* From here on, set operations (∪, ∈, ∅, SET_OF) are written as pseudocode. */
    typedef set_of_terminal_or_lambda termset;
    termset follow_set[NUM_NONTERMINALS];
    termset first_set[VOCABULARY];
    boolean *derives_lambda = mark_lambda(g);    /* mark_lambda(g) as defined above */

    termset compute_first(string_of_symbols alpha)
    {
        int i, k;
        termset result;

        k = length(alpha);
        if (k == 0)
            result = SET_OF( λ );
        else {
            result = first_set[alpha[0]] - SET_OF( λ );
            for (i = 1; i < k && λ ∈ first_set[alpha[i-1]]; i++)
                result = result ∪ ( first_set[alpha[i]] - SET_OF( λ ) );
            if (i == k && λ ∈ first_set[alpha[k - 1]])
                result = result ∪ SET_OF( λ );
        }
        return result;
    }

Algorithms for First & Follow sets (5/6)

    extern grammar g;

    void fill_first_set(void)
    {
        nonterminal A;
        terminal a;
        production p;
        boolean changes;
        int i, j;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            if (derives_lambda[A])
                first_set[A] = SET_OF( λ );
            else
                first_set[A] = ∅;
        }

        for (i = 0; i < NUM_TERMINALS; i++) {
            a = g.terminals[i];
            first_set[a] = SET_OF( a );
            for (j = 0; j < NUM_NONTERMINALS; j++) {
                A = g.nonterminals[j];
                if (there exists a production A → a β)
                    first_set[A] = first_set[A] ∪ SET_OF( a );
            }
        }

        do {
            changes = FALSE;
            for (i = 0; i < g.num_productions; i++) {
                p = g.productions[i];
                first_set[p.lhs] = first_set[p.lhs] ∪ compute_first(p.rhs);
                if ( first_set changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?

Algorithms for First & Follow sets (6/6)

    void fill_follow_set(void)
    {
        nonterminal A, B;
        int i;
        boolean changes;

        for (i = 0; i < NUM_NONTERMINALS; i++) {
            A = g.nonterminals[i];
            follow_set[A] = ∅;
        }
        follow_set[g.start_symbol] = SET_OF( λ );

        do {
            changes = FALSE;
            for (each production A → α B β) {
                /*
                 * I.e. for each production and each
                 * occurrence of a nonterminal in its
                 * right-hand side.
                 */
                follow_set[B] = follow_set[B] ∪ (compute_first(β) - SET_OF( λ ));
                if ( λ ∈ compute_first(β) )
                    follow_set[B] = follow_set[B] ∪ follow_set[A];
                if ( follow_set[B] changed )
                    changes = TRUE;
            }
        } while (changes);
    }

QUIZ: termination?
QUIZ: correctness?

Tracing examples

[Figure: the three example grammars and the First_set / Follow_set tables from "First set and Follow set (2/2)", with each symbol annotated by tracing marks]

From extended BNF to CFG

extended BNF --
    <statement list> → <statement> { <statement> }
equivalent CFG --
    <statement list> → <statement> <statement tail>
    <statement tail> → <statement> <statement tail>
    <statement tail> → λ
QUIZ: how, systematically?

Other types of grammars

regular grammar -- $A \to a B$ or $C \to \lambda$
  - QUIZ: how?
context-free grammar -- $A \to \alpha$
context-sensitive grammar -- $\alpha A \beta \to \alpha \delta \beta$
type 0 grammar -- $\alpha \to \beta$

regular grammar : too simple, e.g., it cannot specify $\{\, [^{i}\, ]^{i} \mid i \ge 1 \,\}$
  - QUIZ: how to specify $\{\, [^{i}\, ]^{i} \mid i \ge 1 \,\}$ by context-free grammar?
context-sensitive, type 0 : no sufficiently efficient parsers
context-free grammar : a balance between generality and practicality
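
The First and Follow algorithms earlier in these slides are written with abstract set operations, so they do not compile as they stand. As a rough, compilable sketch of the same fixed-point idea, the block below computes both sets for the example grammar G0 (E → Prefix ( E ) | V Tail, Prefix → F | λ, Tail → + E | λ). Everything named here is an assumption of the sketch rather than part of the original code: sets are bitmasks, λ is an extra bit, and `first_of`, `compute_first_follow`, `BIT`, and the table `G0` are hypothetical names. Unlike the slides, it does not run a separate `mark_lambda` pass; an empty right-hand side contributes λ directly inside the First iteration.

```c
#include <assert.h>

enum { T_V, T_F, T_LPAREN, T_RPAREN, T_PLUS, NUM_T,   /* terminals    */
       N_E = NUM_T, N_PREFIX, N_TAIL, NUM_SYM };      /* nonterminals */

#define NUM_PROD 6
#define MAX_RHS  4
#define BIT(t)   (1u << (t))
#define LAMBDA   (1u << NUM_T)       /* extra bit standing for lambda */

typedef unsigned termset;            /* one bit per terminal, plus LAMBDA */

struct prod { int lhs; int len; int rhs[MAX_RHS]; };

static const struct prod G0[NUM_PROD] = {
    { N_E,      4, { N_PREFIX, T_LPAREN, N_E, T_RPAREN } },
    { N_E,      2, { T_V, N_TAIL } },
    { N_PREFIX, 1, { T_F } },
    { N_PREFIX, 0, { 0 } },          /* Prefix -> lambda */
    { N_TAIL,   2, { T_PLUS, N_E } },
    { N_TAIL,   0, { 0 } },          /* Tail -> lambda */
};

termset first_set[NUM_SYM];
termset follow_set[NUM_SYM];

/* First of the symbol string s[0..n-1], based on the current first_set[]. */
static termset first_of(const int *s, int n)
{
    termset r = 0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        r |= first_set[s[i]] & ~LAMBDA;
        if (!(first_set[s[i]] & LAMBDA))
            return r;                /* s[i] cannot vanish: stop here */
    }
    return r | LAMBDA;               /* all symbols (or none) derive lambda */
}

void compute_first_follow(void)
{
    int changed, i, j;

    for (i = 0; i < NUM_SYM; i++) {
        first_set[i] = (i < NUM_T) ? BIT(i) : 0;   /* First(a) = { a } */
        follow_set[i] = 0;
    }
    do {                             /* iterate First to a fixed point */
        changed = 0;
        for (i = 0; i < NUM_PROD; i++) {
            termset old = first_set[G0[i].lhs];
            first_set[G0[i].lhs] |= first_of(G0[i].rhs, G0[i].len);
            if (first_set[G0[i].lhs] != old)
                changed = 1;
        }
    } while (changed);

    follow_set[N_E] = LAMBDA;        /* start symbol, as in the slides */
    do {                             /* iterate Follow to a fixed point */
        changed = 0;
        for (i = 0; i < NUM_PROD; i++)
            for (j = 0; j < G0[i].len; j++) {
                int B = G0[i].rhs[j];
                termset f, old;
                if (B < NUM_T)
                    continue;        /* only nonterminals have Follow sets */
                f = first_of(G0[i].rhs + j + 1, G0[i].len - j - 1);
                old = follow_set[B];
                follow_set[B] |= f & ~LAMBDA;
                if (f & LAMBDA)      /* the rest of the RHS can vanish */
                    follow_set[B] |= follow_set[G0[i].lhs];
                if (follow_set[B] != old)
                    changed = 1;
            }
    } while (changed);
}
```

After calling `compute_first_follow()`, the bitmasks should correspond to the example 1 table: First(E) = { V, F, ( }, First(Prefix) = { F, λ }, First(Tail) = { +, λ }, Follow(E) = Follow(Tail) = { λ, ) }, and Follow(Prefix) = { ( }. Bitmasks keep the sketch short; a grammar with more terminals than bits in `unsigned` would need a wider set representation.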