
11-711 Algorithms for NLP

LR Parsing

Reading: Hopcroft and Ullman, Intro. to Automata Theory, Lang. and Comp., Sections 10.6-10.7, pp. 248-256

Shift-Reduce Parsing
A class of parsers with the following principles:
- Parsing is done bottom-up, reducing the input to the grammar start symbol
- The parser builds a rightmost derivation of the input in reverse
- The parsing algorithm simulates the operation of a PDA
- A prefix of the sentential form is kept on the stack
- Two types of operation:
    Shift: push the next input symbol onto the stack
    Reduce: pop the RHS of a grammar rule off the stack and push the corresponding LHS non-terminal symbol
- The parser is usually deterministic, with no backtracking
- Extremely efficient, operating in linear time

But: such a parser can be constructed for only a limited class of CFGs

LR Parsing
General principles:
- Use sets of dotted grammar rules to reflect the state of the parser:
    What constituents have we constructed so far
    What constituents are we predicting next
- Pre-compile the grammar into a collection of finite sets of dotted rules
- Use these sets to capture the state of the parser during parsing
- The parser is a deterministic shift-reduce parser
- Developed by Knuth in 1965 as a framework for compiling programming languages

LR Parsing Algorithm
- Performs shift and reduce parsing actions on the stack, and changes state with each operation
- Is driven by a pre-compiled parsing table that has two parts:
    The action table specifies the next shift or reduce parsing operation
    The goto table specifies which state to transfer to after a reduction
- The stack stores a string of the form s_0 X_1 s_1 X_2 s_2 ... X_m s_m, where the s_i are parser states and the X_i are grammar symbols
- At each step the parser does one of the following types of operations:
    Shift(s): Push the current input symbol a on the stack, followed by the new state s
    Reduce(i): Reduce the stack according to rule i of the grammar
    Reject: Reject the input as ungrammatical and signal an error
    Accept: Accept the input as grammatical and halt

LR Parsing - Example
The grammar:
  1. S  → NP VP
  2. NP → art adj n
  3. NP → art n
  4. NP → adj n
  5. VP → aux VP
  6. VP → v NP

The original input: The large can can hold the water
POS assigned input: art adj n aux v art n
Parser input: art adj n aux v art n $


LR Parsing - Example
Constructed parsing table for the grammar:

      |        | Shift                              | Goto
State | Reduce | art   adj   n     aux   v     $    | NP   VP   S
0     |        | sh3   sh4                          | 2         1
1     |        |                               acc  |
2     |        |                   sh6   sh7        |      5
3     |        |       sh8   sh9                    |
4     |        |             sh10                   |
5     | r1     |                                    |
6     |        |                   sh6   sh7        |      11
7     |        | sh3   sh4                          | 12
8     |        |             sh13                   |
9     | r3     |                                    |
10    | r4     |                                    |
11    | r5     |                                    |
12    | r6     |                                    |
13    | r2     |                                    |


LR Parsing - Example
The input: art adj n aux v art n $

Step  Action  Stack after action
0             0
1     sh3     0 art 3
2     sh8     0 art 3 adj 8
3     sh13    0 art 3 adj 8 n 13
4     r2      0 NP 2
5     sh6     0 NP 2 aux 6
6     sh7     0 NP 2 aux 6 v 7
7     sh3     0 NP 2 aux 6 v 7 art 3
8     sh9     0 NP 2 aux 6 v 7 art 3 n 9
9     r3      0 NP 2 aux 6 v 7 NP 12
10    r6      0 NP 2 aux 6 VP 11
11    r5      0 NP 2 VP 5
12    r1      0 S 1
13    accept  0 S 1

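The driver loop behind this trace can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the course's implementation: the names `RULES`, `REDUCE`, `SHIFT`, `GOTO` and `lr_parse` are hypothetical, and the dictionaries transcribe the example grammar and parsing table. Because the example table lists a single reduce action per state, the sketch reduces without consulting the lookahead symbol.

```python
# Rule number -> (LHS, length of RHS), transcribed from the example grammar.
RULES = {1: ("S", 2), 2: ("NP", 3), 3: ("NP", 2),
         4: ("NP", 2), 5: ("VP", 2), 6: ("VP", 2)}
# State -> rule to reduce by (the "Reduce" column of the example table).
REDUCE = {5: 1, 9: 3, 10: 4, 11: 5, 12: 6, 13: 2}
# (state, terminal) -> next state (the "Shift" columns).
SHIFT = {(0, "art"): 3, (0, "adj"): 4, (2, "aux"): 6, (2, "v"): 7,
         (3, "adj"): 8, (3, "n"): 9, (4, "n"): 10, (6, "aux"): 6,
         (6, "v"): 7, (7, "art"): 3, (7, "adj"): 4, (8, "n"): 13}
# (state, non-terminal) -> next state (the "Goto" columns).
GOTO = {(0, "NP"): 2, (0, "S"): 1, (2, "VP"): 5, (6, "VP"): 11, (7, "NP"): 12}
ACCEPT = (1, "$")


def lr_parse(tokens):
    """Return the list of actions taken; raise SyntaxError on rejection."""
    tokens = list(tokens) + ["$"]        # append the end-of-input marker
    stack = [0]                          # s0 X1 s1 X2 s2 ...
    pos = 0
    actions = []
    while True:
        state, symbol = stack[-1], tokens[pos]
        if (state, symbol) == ACCEPT:
            actions.append("accept")
            return actions
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, rhs_len = RULES[rule]
            del stack[-2 * rhs_len:]     # pop RHS symbols and their states
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            actions.append(f"r{rule}")
        elif (state, symbol) in SHIFT:
            next_state = SHIFT[(state, symbol)]
            stack += [symbol, next_state]
            pos += 1
            actions.append(f"sh{next_state}")
        else:
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
```

Calling `lr_parse("art adj n aux v art n".split())` returns the same action sequence as the trace, from `sh3` through `accept`.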

Constructing an SLR Parsing Table


- An LR(0) item is a dotted grammar rule: [A → α • β]
- We construct a deterministic FSA that recognizes prefixes of rightmost sentential forms of the grammar G
- The states of the FSA are sets of LR(0) items
- We augment the grammar with a new start rule S' → S
- We define the closure operation on a set I of LR(0) items:
    1. Every item in I is also in closure(I)
    2. If [A → α • B β] is in closure(I) and B → γ is a rule in G, then add [B → • γ] to closure(I)
- The closure operation adds predicted new items to the set (similar to Earley's Predictor operation)

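The closure operation can be sketched as a small worklist algorithm. This is a hedged illustration: the representation of an item as a pair `(rule index, dot position)` and the names `GRAMMAR` and `closure` are assumptions made for this sketch, with the example NL grammar augmented by the rule S' → S at index 0.

```python
# Augmented example grammar; the list index is the rule number.
GRAMMAR = [
    ("S'", ("S",)),               # 0: augmented start rule
    ("S", ("NP", "VP")),          # 1
    ("NP", ("art", "adj", "n")),  # 2
    ("NP", ("art", "n")),         # 3
    ("NP", ("adj", "n")),         # 4
    ("VP", ("aux", "VP")),        # 5
    ("VP", ("v", "NP")),          # 6
]


def closure(items):
    """Return the closure of a set of LR(0) items.

    An item is a pair (rule index, dot position); (1, 0) stands for
    [S -> . NP VP].
    """
    result = set(items)                  # step 1: every item of I is in closure(I)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):               # a symbol B follows the dot
            for k, (lhs, _) in enumerate(GRAMMAR):
                # step 2: add [B -> . gamma] for every rule B -> gamma
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)
```

For example, `closure({(0, 0)})` starts from [S' → • S] and adds the predicted items for S and NP, giving the initial item set of the FSA.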

Constructing an SLR Parsing Table


We define the Goto operation for an item set I and a grammar symbol X:
Goto(I, X) is the closure of the set of all items [A → α X • β] such that [A → α • X β] is in I

Example (for the example NL grammar):
  Goto({[S → NP • VP], [VP → • aux VP], [VP → • v NP]}, aux)
    = {[VP → aux • VP], [VP → • aux VP], [VP → • v NP]}

Similar to Earley's Scanner and Completer operations


Constructing an SLR Parsing Table


We construct the collection of sets of LR(0) items for an augmented grammar G'
We start with the item set S0 = closure({[S' → • S]})
The algorithm:

procedure items(G');
begin
    C := {closure({[S' → • S]})};
    repeat
        for each set of items I in C and each grammar symbol X
            such that goto(I, X) is not empty and not in C do
                add goto(I, X) to C
    until no more sets of items can be added to C
end

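The Goto operation and the items procedure can be sketched together in Python. This is an illustrative sketch under the same assumptions as before: items are pairs `(rule index, dot position)`, and `GRAMMAR`, `closure`, `goto` and `item_sets` are hypothetical names. The order in which item sets are discovered here need not match the state numbering used in the slides.

```python
# Augmented example grammar; the list index is the rule number.
GRAMMAR = [
    ("S'", ("S",)), ("S", ("NP", "VP")),
    ("NP", ("art", "adj", "n")), ("NP", ("art", "n")), ("NP", ("adj", "n")),
    ("VP", ("aux", "VP")), ("VP", ("v", "NP")),
]


def closure(items):
    """Closure of a set of LR(0) items given as (rule index, dot) pairs."""
    result, work = set(items), list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs):
            for k, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (k, 0) not in result:
                    result.add((k, 0))
                    work.append((k, 0))
    return frozenset(result)


def goto(items, symbol):
    """Advance the dot over `symbol` wherever possible, then take the closure."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(moved)                # empty frozenset if nothing moved


def item_sets():
    """Collect all sets of LR(0) items reachable from closure({[S' -> . S]})."""
    symbols = sorted({s for lhs, rhs in GRAMMAR for s in (lhs, *rhs)})
    collection = [closure({(0, 0)})]
    for current in collection:           # the list grows while we iterate
        for symbol in symbols:
            target = goto(current, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection
```

For the example grammar this sketch finds 14 item sets, the same number of states as in the parsing table.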

Constructing an SLR Parsing Table - Example

Building the collection of sets of LR(0) items for our simple NL Grammar:

S0:   S' → • S
      S  → • NP VP
      NP → • art adj n
      NP → • art n
      NP → • adj n
S1:   S' → S •
S2:   S  → NP • VP
      VP → • aux VP
      VP → • v NP
S3:   NP → art • adj n
      NP → art • n
S4:   NP → adj • n
S5:   S  → NP VP •
S6:   VP → aux • VP
      VP → • aux VP
      VP → • v NP
S7:   VP → v • NP
      NP → • art adj n
      NP → • art n
      NP → • adj n
S8:   NP → art adj • n
S9:   NP → art n •
S10:  NP → adj n •
S11:  VP → aux VP •
S12:  VP → v NP •
S13:  NP → art adj n •


Constructing an SLR Parsing Table


Building the FSA and parsing table:
1. Construct the collection of sets of LR(0) items from the augmented grammar G'
2. State i corresponds to item set S_i. Its table entries are set as follows:
   (a) For any terminal symbol a, if [A → α • a β] is in S_i and Goto(S_i, a) = S_j, then set Action[i, a] = shift j
   (b) If [A → α •] is in S_i (with A ≠ S') and A → α is rule k, then set Action[i, a] = reduce k for all terminal symbols a in Follow(A)
   (c) If [S' → S •] is in S_i, then set Action[i, $] = accept
   (d) For any non-terminal symbol A, if Goto(S_i, A) = S_j, then set Goto[i, A] = j
   (e) All table entries not defined in (a)-(d) are set as error

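Steps (a)-(e) can be sketched in Python for the example grammar. This sketch makes several assumptions that are not part of the slides: the FSA transitions and completed items are written out by hand using the state numbers of the example parsing table, the Follow sets are computed by a simple fixpoint iteration that assumes no rule has an empty right-hand side, and all names (`TRANSITIONS`, `COMPLETE`, `follow_sets`, `build_slr_table`) are hypothetical.

```python
GRAMMAR = {1: ("S", ("NP", "VP")), 2: ("NP", ("art", "adj", "n")),
           3: ("NP", ("art", "n")), 4: ("NP", ("adj", "n")),
           5: ("VP", ("aux", "VP")), 6: ("VP", ("v", "NP"))}
START = "S"
NONTERMINALS = {lhs for lhs, _ in GRAMMAR.values()}
# (state, symbol) -> state: the Goto function of the example FSA.
TRANSITIONS = {(0, "S"): 1, (0, "NP"): 2, (0, "art"): 3, (0, "adj"): 4,
               (2, "VP"): 5, (2, "aux"): 6, (2, "v"): 7, (3, "adj"): 8,
               (3, "n"): 9, (4, "n"): 10, (6, "aux"): 6, (6, "v"): 7,
               (6, "VP"): 11, (7, "art"): 3, (7, "adj"): 4, (7, "NP"): 12,
               (8, "n"): 13}
COMPLETE = {5: 1, 9: 3, 10: 4, 11: 5, 12: 6, 13: 2}  # state -> rule with dot at end
ACCEPT_STATE = 1                                      # contains [S' -> S .]


def follow_sets():
    """Follow sets by fixpoint iteration (assumes no empty right-hand sides)."""
    symbols = {s for _, rhs in GRAMMAR.values() for s in rhs} | NONTERMINALS
    first = {s: (set() if s in NONTERMINALS else {s}) for s in symbols}
    follow = {a: set() for a in NONTERMINALS}
    follow[START].add("$")

    def total():
        return sum(len(v) for v in first.values()) + sum(len(v) for v in follow.values())

    size = -1
    while size != total():               # repeat until nothing grows
        size = total()
        for lhs, rhs in GRAMMAR.values():
            first[lhs] |= first[rhs[0]]
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    follow[sym] |= (first[rhs[i + 1]] if i + 1 < len(rhs)
                                    else follow[lhs])
    return follow


def build_slr_table():
    """Fill the action and goto tables; missing entries mean error (step e)."""
    follow = follow_sets()
    action, goto_table = {}, {}
    for (i, x), j in TRANSITIONS.items():
        if x in NONTERMINALS:
            goto_table[(i, x)] = j                   # step (d)
        else:
            action[(i, x)] = ("shift", j)            # step (a)
    action[(ACCEPT_STATE, "$")] = ("accept",)        # step (c)
    for i, k in COMPLETE.items():
        for a in follow[GRAMMAR[k][0]]:              # step (b)
            if (i, a) in action:
                raise ValueError(f"conflict in state {i} on {a!r}")
            action[(i, a)] = ("reduce", k)
    return action, goto_table
```

The sketch raises `ValueError` when two actions compete for the same entry, which is the situation in which a grammar is not SLR; the example grammar produces no such conflict.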

Constructing an SLR Parsing Table - Example


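The constructed FSA for our example grammar (transitions between states):
S0 --S--> S1
S0 --NP--> S2
S0 --art--> S3
S0 --adj--> S4
S2 --VP--> S5
S2 --aux--> S6
S2 --v--> S7
S3 --adj--> S8
S3 --n--> S9
S4 --n--> S10
S6 --aux--> S6
S6 --v--> S7
S6 --VP--> S11
S7 --art--> S3
S7 --adj--> S4
S7 --NP--> S12
S8 --n--> S13
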

LR(k) Parsing
How to handle conflicts in the SLR table:
- A table conflict: more than one action is specified in a single table entry Action[i, a]
- Conflicts can be either shift-reduce or reduce-reduce
- The parser will not be able to parse deterministically
- A grammar for which this happens is not SLR
- More powerful techniques for building item sets can sometimes resolve the problem by making use of lookaheads into the input
- Known techniques: Canonical LR(k), LALR(k)
- A lookahead of one is sufficient (and optimal) in many cases
- Another option is to extend the LR parsing algorithm: GLR parsing


Parsing with an LR Parser


- The pointers that form the parse tree can be created while performing reduce actions
- A parse node is created for each constituent that is pushed onto the stack
- When we do a reduce, we create a new parse node for the LHS non-terminal and link it to the parse nodes of the popped RHS constituents
- At the end, the S constituent on the stack points to the root of the parse tree

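The node-building scheme described above can be sketched by keeping parse-node numbers on the stack in place of grammar symbols. This is a minimal illustration with hypothetical names (`lr_parse_tree`, `nodes`), using the example grammar and parsing table transcribed into dictionaries; as in the example table, each reduce state reduces without consulting the lookahead.

```python
RULES = {1: ("S", 2), 2: ("NP", 3), 3: ("NP", 2),
         4: ("NP", 2), 5: ("VP", 2), 6: ("VP", 2)}   # rule -> (LHS, RHS length)
REDUCE = {5: 1, 9: 3, 10: 4, 11: 5, 12: 6, 13: 2}    # state -> rule
SHIFT = {(0, "art"): 3, (0, "adj"): 4, (2, "aux"): 6, (2, "v"): 7,
         (3, "adj"): 8, (3, "n"): 9, (4, "n"): 10, (6, "aux"): 6,
         (6, "v"): 7, (7, "art"): 3, (7, "adj"): 4, (8, "n"): 13}
GOTO = {(0, "NP"): 2, (0, "S"): 1, (2, "VP"): 5, (6, "VP"): 11, (7, "NP"): 12}


def lr_parse_tree(tokens):
    """Parse `tokens` and return (nodes, root).

    nodes maps a node number to (label, child node numbers); leaves have
    no children. Raises SyntaxError if the input is rejected.
    """
    tokens = list(tokens) + ["$"]
    nodes = {}
    stack = [0]                          # state, node, state, node, state, ...
    pos = 0
    while True:
        state, symbol = stack[-1], tokens[pos]
        if state == 1 and symbol == "$":
            return nodes, stack[-2]      # node just below the accepting state
        if state in REDUCE:
            label, n = RULES[REDUCE[state]]
            children = tuple(stack[-2 * n::2])   # nodes of the popped RHS
            del stack[-2 * n:]
            next_state = GOTO[(stack[-1], label)]
        elif (state, symbol) in SHIFT:
            label, children = symbol, ()         # leaf node for the token
            next_state = SHIFT[(state, symbol)]
            pos += 1
        else:
            raise SyntaxError(f"unexpected {symbol!r} in state {state}")
        node = len(nodes) + 1            # nodes are numbered in creation order
        nodes[node] = (label, children)
        stack += [node, next_state]
```

For the input `art adj n aux v art n`, the sketch creates twelve nodes; node 4 is `("NP", (1, 2, 3))` and the root, node 12, is `("S", (4, 11))`.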

LR Parsing - Example
The input: art adj n aux v art n $

Each grammar symbol on the stack is shown with the number of its parse node in parentheses.

Step  Action  Stack after action                           Parse node
0             0
1     sh3     0 art(1) 3                                   1: art
2     sh8     0 art(1) 3 adj(2) 8                          2: adj
3     sh13    0 art(1) 3 adj(2) 8 n(3) 13                  3: n
4     r2      0 NP(4) 2                                    4: NP (1 2 3)
5     sh6     0 NP(4) 2 aux(5) 6                           5: aux
6     sh7     0 NP(4) 2 aux(5) 6 v(6) 7                    6: v
7     sh3     0 NP(4) 2 aux(5) 6 v(6) 7 art(7) 3           7: art
8     sh9     0 NP(4) 2 aux(5) 6 v(6) 7 art(7) 3 n(8) 9    8: n
9     r3      0 NP(4) 2 aux(5) 6 v(6) 7 NP(9) 12           9: NP (7 8)
10    r6      0 NP(4) 2 aux(5) 6 VP(10) 11                 10: VP (6 9)
11    r5      0 NP(4) 2 VP(11) 5                           11: VP (5 10)
12    r1      0 S(12) 1                                    12: S (4 11)
13    accept  0 S(12) 1
