
CS 598 JH: Advanced NLP (Spring ʼ09)

Review

Julia Hockenmaier
juliahmr@illinois.edu
3324 Siebel Center
Office Hours: Fri, 2:00-3:00pm

http://www.cs.uiuc.edu/~juliahmr/cs598
What is the structure of a sentence?

Sentence structure is hierarchical:
A sentence consists of words (I, eat, sushi, with, tuna)
…which form phrases or constituents: “sushi with tuna”

Sentence structure defines dependencies between words or phrases:

[I [eat [sushi [with tuna]]]]


Strong vs. weak generative capacity

Formal language theory:
- defines language as string sets
- is only concerned with generating these strings
  (weak generative capacity)

Formal/theoretical syntax (in linguistics):
- defines language as sets of strings with (hidden) structure
- is also concerned with generating the right structures
  (strong generative capacity)

Context-free grammars (CFGs) capture recursion

Language has complex constituents
(“the garden behind the house”).

Syntactically, these constituents behave just like simple ones
(“behind the house” can always be omitted).

CFGs define nonterminal categories
to capture equivalent constituents.

Context-free grammars

A CFG is a 4-tuple 〈N, Σ, R, S〉:

A set of nonterminals N
(e.g. N = {S, NP, VP, PP, Noun, Verb, ...})

A set of terminals Σ
(e.g. Σ = {I, you, he, eat, drink, sushi, ball, ...})

A set of rules R
R ⊆ {A → β with left-hand side (LHS) A ∈ N
and right-hand side (RHS) β ∈ (N ∪ Σ)*}

A start symbol S (sentence)
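As a concrete illustration, the 4-tuple can be written down directly in Python. The encoding below (a namedtuple, rules stored as pairs) and the toy rule set are a sketch of my own, not notation from the lecture:

```python
# A minimal sketch of a CFG as a 4-tuple (N, Sigma, R, S).
from collections import namedtuple

Grammar = namedtuple("Grammar", ["nonterminals", "terminals", "rules", "start"])

cfg = Grammar(
    nonterminals={"S", "NP", "VP", "Verb"},
    terminals={"I", "eat", "sushi"},
    # Each rule A -> beta is a pair (A, beta), with beta a tuple over N and Sigma.
    rules={
        ("S", ("NP", "VP")),
        ("VP", ("Verb", "NP")),
        ("Verb", ("eat",)),
        ("NP", ("I",)),
        ("NP", ("sushi",)),
    },
    start="S",
)
```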

An example
N → {ball, garden, house, sushi }
P → {in, behind, with}
NP → N
NP → NP PP
PP → P NP

N: noun
P: preposition
NP: “noun phrase”
PP: “prepositional phrase”
CFGs define parse trees

N → {sushi, tuna}
P → {with}
V → {eat}
NP → N
NP → NP PP
PP → P NP
VP → V NP

[Figure: parse trees for “eat sushi with tuna” and “eat sushi with chopsticks”.
In the correct analyses, “with tuna” attaches to the NP “sushi” and
“with chopsticks” attaches to the VP “eat sushi”; the incorrect analyses
swap the two attachments.]
CFGs are equivalent to pushdown automata (PDAs)

PDAs are FSAs with an additional stack:
emit a symbol and push/pop a symbol from the stack.

[Figure: a PDA that pushes ‘x’ on the stack while emitting ‘a’,
pops ‘x’ from the stack while emitting ‘b’, and accepts if the
stack is empty.]

This is equivalent to the following CFG:
S → a X b
X → a X b
X → a b
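This PDA can be sketched as a recognizer in Python, reading input instead of emitting it; the function and its name are illustrative, not from the slides:

```python
def accepts(s: str) -> bool:
    """Recognize a^n b^n with an explicit stack, mirroring the PDA above:
    push 'x' for each 'a' read, pop 'x' for each 'b' read."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":  # first state: push per 'a'
        stack.append("x")
        i += 1
    while i < len(s) and s[i] == "b":  # second state: pop per 'b'
        if not stack:
            return False               # more b's than a's
        stack.pop()
        i += 1
    # Accept iff the whole input was consumed and the stack is empty.
    return i == len(s) and not stack and len(s) > 0

assert accepts("aabb") and not accepts("aab")
```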
The Chomsky Hierarchy

Type    Language                Automaton        Parsing complexity   Dependencies
Type 3  Regular                 Finite-state     linear               adjacent words
Type 2  Context-free            Pushdown         cubic                nested
Type 1  Context-sensitive       Linear bounded   exponential
Type 0  Recursively enumerable  Turing machine
Constituents: heads and dependents

There are different kinds of constituents:
Noun phrases: the man, a girl with glasses, Illinois
Prepositional phrases: with glasses, in the garden
Verb phrases: eat sushi, sleep, sleep soundly

Every phrase has a head:
Noun phrases: the man, a girl with glasses, Illinois (heads: man, girl, Illinois)
Prepositional phrases: with glasses, in the garden (heads: with, in)
Verb phrases: eat sushi, sleep, sleep soundly (heads: eat, sleep, sleep)

The other parts are its dependents.
Dependents are either arguments or adjuncts.

Two ways to represent structure

Phrase structure trees vs. dependency trees

[Figure: the phrase structure trees for “eat sushi with tuna” and
“eat sushi with chopsticks”, shown next to the corresponding
dependency trees over the same words.]
Structure (Syntax) corresponds to Meaning (Semantics)

[Figure: the correct and incorrect analyses of “eat sushi with tuna”
and “eat sushi with chopsticks”. Attaching the PP to the NP is correct
for “with tuna” (the tuna is part of the sushi), while attaching the PP
to the VP is correct for “with chopsticks” (the chopsticks are the
instrument of eating).]
Dependency grammar

DGs describe the structure of sentences as a graph:
the nodes of the graph are the words,
the edges of the graph are the dependencies.

The relationship between DGs and CFGs:
if a CFG phrase structure tree is translated into a dependency graph,
the resulting graph has no crossing edges (it is projective).
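This no-crossing-edges property (projectivity) is easy to check directly. A sketch, assuming dependencies are given as (head, dependent) word-index pairs; the encoding is mine, not the lecture's:

```python
from itertools import combinations

def is_projective(edges):
    """True iff no two dependency edges cross.
    edges: iterable of (head, dependent) word-index pairs."""
    spans = [tuple(sorted(e)) for e in edges]
    for (a, b), (c, d) in combinations(spans, 2):
        # Two edges cross iff each one contains exactly one endpoint of the other.
        if a < c < b < d or c < a < d < b:
            return False
    return True

# "eat sushi with tuna", indices 0..3: eat->sushi, sushi->with, with->tuna
assert is_projective([(0, 1), (1, 2), (2, 3)])
assert not is_projective([(0, 2), (1, 3)])   # crossing edges
```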

CKY chart parsing algorithm

Bottom-up parsing:
start with the words.
Dynamic programming:
save the results in a table/chart;
re-use these results in finding larger constituents.

Complexity: O(n³ · |G|)
(n: length of string; |G|: size of grammar)

Presumes a CFG in Chomsky Normal Form:
all rules are either A → B C or A → a
(with A, B, C nonterminals and a a terminal)
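A minimal sketch of the recognizer version in Python; the encoding of the CNF grammar as two dictionaries is an assumption of this sketch, not the lecture's notation:

```python
from collections import defaultdict
from itertools import product

def cky(words, lexical, binary):
    """CKY recognition for a CFG in Chomsky Normal Form.
    lexical: terminal a  -> set of nonterminals A with A -> a
    binary:  pair (B, C) -> set of nonterminals A with A -> B C
    Returns chart[(i, j)] = set of nonterminals deriving words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):                 # width-1 spans: A -> a
        chart[(i, i + 1)] |= lexical.get(w, set())
    for width in range(2, n + 1):                 # wider spans, shortest first
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point: re-use sub-spans
                for B, C in product(chart[(i, k)], chart[(k, j)]):
                    chart[(i, j)] |= binary.get((B, C), set())
    return chart
```

The three nested loops over string positions, times the grammar lookups, give the O(n³ · |G|) bound quoted above.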

The CKY parsing algorithm

S → NP VP
VP → V NP
V → eat
NP → we
NP → sushi

Chart for “we eat sushi” (each cell holds the categories of one substring):

we: NP    we eat: (none)    we eat sushi: S
          eat: V            eat sushi: VP
                            sushi: NP
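Running the sketch from the previous slide on this grammar reproduces the chart (same dictionary encoding as before):

```python
lexical = {"we": {"NP"}, "eat": {"V"}, "sushi": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}

chart = cky(["we", "eat", "sushi"], lexical, binary)
assert chart[(1, 3)] == {"VP"}   # "eat sushi"
assert chart[(0, 3)] == {"S"}    # "we eat sushi"
assert chart[(0, 2)] == set()    # "we eat" is not a constituent
```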
Exercise: CKY parser

Parse “I eat sushi with chopsticks” with the following grammar:

S → NP VP
NP → NP PP
NP → Noun
VP → VP PP
VP → Verb NP


Dealing with Ambiguity

A grammar might generate multiple trees for a sentence:

[Figure: the grammar licenses both the NP-attachment and the VP-attachment
of the PP in “eat sushi with tuna” and “eat sushi with chopsticks”, i.e. a
correct and an incorrect analysis for each sentence.]

What is the most likely parse τ for a sentence S?
We need a model of P(τ | S).
Computing P(τ | S)

Using Bayes’ Rule:

argmax_τ P(τ | S) = argmax_τ P(τ, S) / P(S)
                  = argmax_τ P(τ, S)
                  = argmax_τ P(τ)        if S = yield(τ)

The yield of a tree is the string of terminal symbols
that can be read off the leaf nodes:

yield( [VP [V eat] [NP [NP sushi] [PP [P with] [NP tuna]]]] )
  = eat sushi with tuna
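The same reduction can be mirrored in code: keep only candidate trees whose yield is S, then take the argmax of P(τ). The nested-tuple tree encoding below is an assumption of this sketch, as are the function names:

```python
def tree_yield(tree):
    """Read the terminal string off the leaves of a tree encoded as
    nested (label, child, child, ...) tuples; a leaf is a bare string."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [w for child in children for w in tree_yield(child)]

def best_parse(candidates, sentence, p):
    """argmax_tau P(tau) over candidate trees whose yield equals sentence;
    p maps a tree to its probability."""
    matching = [t for t in candidates if tree_yield(t) == sentence]
    return max(matching, key=p, default=None)

t = ("VP", ("V", "eat"),
     ("NP", ("NP", "sushi"), ("PP", ("P", "with"), ("NP", "tuna"))))
assert tree_yield(t) == ["eat", "sushi", "with", "tuna"]
```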
Computing P(τ)

T is the (infinite) set of all trees in the language:
L = {s ∈ Σ* | ∃τ ∈ T : yield(τ) = s}

We need to define P(τ) such that:
∀τ ∈ T : 0 ≤ P(τ) ≤ 1
∑τ∈T P(τ) = 1

The set T is generated by a context-free grammar:
S → NP VP       VP → Verb NP    NP → Det Noun
S → S conj S    VP → VP PP      NP → NP PP
S → .....       VP → .....      NP → .....

Probabilistic Context-Free Grammars
For every nonterminal X, define a probability distribution
P(X → α | X) over all rules with the same LHS symbol X:
S → NP VP 0.8
S → S conj S 0.2
NP → Noun 0.2
NP → Det Noun 0.4
NP → NP PP 0.2
NP → NP conj NP 0.2
VP → Verb 0.4
VP → Verb NP 0.3
VP → Verb NP NP 0.1
VP → VP PP 0.2
PP → P NP 1.0
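A quick sanity check on such a grammar: encode the rules as a dictionary and verify that the rule probabilities for each LHS sum to one. The encoding is my own sketch:

```python
from collections import defaultdict

pcfg = {
    ("S", ("NP", "VP")): 0.8,          ("S", ("S", "conj", "S")): 0.2,
    ("NP", ("Noun",)): 0.2,            ("NP", ("Det", "Noun")): 0.4,
    ("NP", ("NP", "PP")): 0.2,         ("NP", ("NP", "conj", "NP")): 0.2,
    ("VP", ("Verb",)): 0.4,            ("VP", ("Verb", "NP")): 0.3,
    ("VP", ("Verb", "NP", "NP")): 0.1, ("VP", ("VP", "PP")): 0.2,
    ("PP", ("P", "NP")): 1.0,
}

mass = defaultdict(float)
for (lhs, _rhs), p in pcfg.items():
    mass[lhs] += p
# P(X -> alpha | X) must be a distribution over the rules with LHS X.
assert all(abs(total - 1.0) < 1e-9 for total in mass.values())
```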

Computing P(τ) with a PCFG

The probability of a tree τ is the product of the probabilities of all its rules.
For “John eats pie with cream” under the grammar above:

[S [NP [Noun John]]
   [VP [VP [Verb eats] [NP [Noun pie]]]
       [PP [P with] [NP [Noun cream]]]]]

P(τ) = 0.8 × 0.3 × 0.2 × 1.0 × 0.2³
     = 0.000384
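The same product spelled out in code (lexical rules such as Noun → John are assumed here to have probability 1):

```python
from math import prod

# Probabilities of the rules used in the tree for "John eats pie with cream".
rule_probs = [
    0.8,  # S  -> NP VP
    0.2,  # NP -> Noun     (John)
    0.2,  # VP -> VP PP
    0.3,  # VP -> Verb NP
    0.2,  # NP -> Noun     (pie)
    1.0,  # PP -> P NP
    0.2,  # NP -> Noun     (cream)
]
print(prod(rule_probs))  # ≈ 0.000384
```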
