
Natural Language Understanding

Assignment 1: Semantic Parsing


Siva Reddy, Frank Keller, Mirella Lapata
Due date: Friday 14th February 2014, 4:00pm
Submission Information: To submit your assignment, run the following command in the directory
containing your code and answers:
submit nlu cw1 geoparser.py output*.txt answers.pdf
Please do not submit any other code/data.
Plagiarism: Please remember that plagiarism is a university offense. Do not show your
written/coded solutions to anyone else, or try to see anyone else's, and do not discuss the
specifics of your solutions with other students. You may discuss the general topics surrounding
the problems with one another, ideally after you have considered them yourself. However, to
ensure that you actually understand the issues yourself, you must write up your solutions by
yourself, away from your friends. The solution or approach you describe should be one you have
chosen. If you don't understand it, don't write it; it will generally be obvious that you don't
understand. And if you have questions or problems involving the specifics of your solution,
please contact the course teaching staff rather than your fellow students. Finally, if you
choose to use any outside sources of information or ideas for this or future assignments,
remember to acknowledge those sources appropriately (e.g., provide the URL, the name of the
book, or, if you're at a loss for specifics, even say "In my class on XX last year, we learned
that ...").
Question 1 [10 marks]
Answer all the questions in a PDF file named answers.pdf. These questions test your
understanding of the use of lambda calculus for semantic construction (if you require a
refresher on lambda calculus, please consult Jurafsky and Martin, Ch. 17 and 18).
1. Simplify (λx.x)2.
2. Simplify (λx.x x)3.
3. Simplify ((λx.λy.eat(y, x))John)Pizza.
4. Simplify (λf.f(f(f(x))))g.
5. Simplify (λf.f(f(f(x))))(λt.a(c(t))).
6. If F(Edinburgh) = Scotland loves Edinburgh, what is F? (Write a lambda expression for F.)
7. If A = λx.president(x), B = λy.human(y) and ((C)A)B = z[statement(z, president(z),
human(z))], what is C?
8. If F(λx.peak(Texas, x)) = (λx.highest(peak(Texas, x))), what is F?
9. In the sentence "Edinburgh loves Scotland", what is the lambda expression for the verb?
10. In the sentence "Edinburgh loves Scotland", what is the lambda expression for the verb phrase?
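As a refresher on functional application, the following minimal sketch models curried lambda terms with nested Python functions and reduces them by calling them. The terms see, twice and g are illustrative only and are not taken from the questions above; formulas are represented as plain strings, which is an assumption made for brevity.

```python
def see(x):
    """Stands in for the curried term λx.λy.see(y, x)."""
    return lambda y: f"see({y}, {x})"


def twice(f):
    """Stands in for λf.λx.f(f(x))."""
    return lambda x: f(f(x))


def g(t):
    """Stands in for λt.g(t)."""
    return f"g({t})"


# Applying a term to an argument corresponds to one beta-reduction step.
reduced_see = see("Mary")("John")
reduced_twice = twice(g)("x")

print(reduced_see)    # see(John, Mary)
print(reduced_twice)  # g(g(x))
```

Note that the first argument supplied binds the outermost lambda, which is why "Mary" ends up in the second position of see.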
Question 2 [20 marks]
Your task is to implement a Semantic Parser in Python which converts natural language (NL)
questions to database (DB) queries. For example,
NL Question : which rivers run through states bordering newmexico
DB Query : answer(A,(river(A), traverse(A,B),state(B),
next_to(B,C), const(C,stateid(newmexico))))
We will work with a database of US geography called Geoquery. When we run the above
query on Geoquery, we get the following answers: [arkansas, canadian, cimarron, colorado,
gila, green, neosho, north platte, pecos, red, republican, rio grande, san juan, smoky hill,
south platte, washita]. You are provided with geolib.py, a Python API containing useful
functions. Use geolib.execute_geoquery to retrieve answers to database queries. Please
download the code from the NLU course homepage.
Your first task is to build semantic parses of questions from their corresponding syntactic
phrase structure trees. To simplify the problem, we will assume that the syntactic tree and the
semantic parse tree have the same structure. You are provided with a lexicon which maps leaf
words to lambda semantic categories (use the function geolib.word_to_semantic_categories to
retrieve all semantic categories of a word). Use these semantic categories to build up the
semantic parse of each subtree in the syntactic tree in a bottom-up fashion using lambda
functional application. An example semantic parse construction is shown in Figure 1. Here, the
word "bordering" consumes "newmexico" to form "bordering newmexico", and this in turn is
consumed by "states" to form "states bordering newmexico". To simplify further, assume that the
word on the left-hand side consumes (by functional application) the word on the right-hand side.
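The bottom-up construction described above can be sketched without any library as follows. This is only an illustration under stated assumptions: the three lexicon entries are simplified, hypothetical stand-ins for the provided lexicon, properties are modelled as Python functions from a variable name to a formula string, and trees are nested pairs rather than the objects returned by geolib.

```python
from itertools import count

_fresh = count(1)


def new_var():
    """Return a fresh variable name (y1, y2, ...) for a newly introduced entity."""
    return f"y{next(_fresh)}"


# Hypothetical toy lexicon: a property maps a variable name to a formula string.
def newmexico(x):
    return f"const({x},stateid(newmexico))"


def bordering(prop):
    """Takes a property P and returns the property 'x is next to some y with P(y)'."""
    def result(x):
        y = new_var()
        return f"next_to({x},{y}),{prop(y)}"
    return result


def states(prop):
    """Takes a property P and returns the property 'x is a state and P(x)'."""
    return lambda x: f"state({x}),{prop(x)}"


LEXICON = {"states": states, "bordering": bordering, "newmexico": newmexico}


def compose(tree):
    """A leaf is looked up in the lexicon; at a binary node the left child consumes the right."""
    if isinstance(tree, str):
        return LEXICON[tree]
    left, right = tree
    return compose(left)(compose(right))


meaning = compose(("states", ("bordering", "newmexico")))
formula = meaning("A")
print(formula)  # state(A),next_to(A,y1),const(y1,stateid(newmexico))
```

The fresh-variable counter plays the role that variable renaming plays in a real lambda calculus implementation: it keeps the entity introduced by "bordering" distinct from the one being described.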
The syntactic trees are provided to you along with the questions in JSON format. You can
use your own tree parser or use the geolib.read_tree and geolib.get_children functions to load
trees into objects. Write all your code in geoparser.py.
1. Complete the function convert_tree_to_semantic_expressions. It should retrieve all the
semantic expressions of a sentence. Pseudocode is provided inside the function. Use the
package nltk.logic to work with lambda expressions. [10 marks]
2. How would you reduce the number of parses generated? (You need not implement this.)
[5 marks]
3. How do you think we built the lexicon files data/main_lexicon.txt and data/types.txt?
[5 marks]
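Because a word can have several semantic categories, a sentence can have several candidate parses. The sketch below illustrates how the candidates multiply at each node; the lexicon entries and the nested-pair tree format are hypothetical and are not the contents of the provided lexicon, and a real implementation would work on nltk.logic expressions rather than strings.

```python
def all_parses(tree, lexicon):
    """Return every semantic expression obtainable for a (binary) tree.

    A leaf contributes all of its lexicon entries; an internal node applies
    every candidate for its left child to every candidate for its right child.
    """
    if isinstance(tree, str):
        return list(lexicon[tree])
    left, right = tree
    functions = all_parses(left, lexicon)
    arguments = all_parses(right, lexicon)
    # Only callable candidates can consume an argument; others are skipped.
    return [f(a) for f in functions for a in arguments if callable(f)]


# Hypothetical ambiguous entries: two readings per word.
TOY_LEXICON = {
    "run": [lambda a: f"traverse(A,{a})", lambda a: f"loc(A,{a})"],
    "colorado": ["riverid(colorado)", "stateid(colorado)"],
}

parses = all_parses(("run", "colorado"), TOY_LEXICON)
for parse in parses:
    print(parse)
```

With two readings for each of the two words, the node yields 2 x 2 = 4 candidates, so the number of parses grows multiplicatively with sentence length.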
Question 3 [50 marks]
In this question, use the training data (data/training.json.txt) to learn a Structured Perceptron
model (code provided in structuredperceptron.py) which ranks the semantic parses of a given
sentence. Your learning aim is to score the correct parse higher than all other predictions. As
you know, your solution to Question 2.1 generates all possible parses of a given sentence, but
only one (or a few) of them will be correct. Use the gold query provided in the training data to
pick a gold parse. Using the first-best prediction of your model and the gold parse, you can
train your model to learn to rank gold parses higher than other predicted parses.
Define a set of features which you think should help in scoring the correct parses higher. A
feature has a name and a tuple value containing different linguistic properties of a semantic
parse which help in determining whether the parse is good. For example, we know that whenever
"run" appears in the NL question, we are likely to see the predicate traverse in the DB query.
So we can create a feature named Word_Predicate with (word, predicate) as its value, e.g.,
(run, traverse). You can use tuples of any length, but as the size increases the sparsity
increases too. You can create a feature using the class
geolib.Feature(feature_name, feature_value). Another useful feature could be
Predicate_Predicate, which captures frequently co-occurring predicates, e.g., traverse and
river. Implement this feature. Adding additional features will attract extra credit.
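One possible way to compute predicate co-occurrence features is sketched below. It is only an illustration: it extracts predicates from a query string with a regular expression and stores features as (name, value) keys in a Counter, whereas the assignment's own code uses geolib.Feature objects and builds features during parsing. The feature name string and the helper names are assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def predicates_in(query):
    """Return the sorted set of predicate names, i.e. lowercase tokens followed by '('."""
    return sorted(set(re.findall(r"[a-z_]+(?=\()", query)))


def predicate_pair_features(query):
    """Count each unordered pair of distinct predicates occurring in the query."""
    features = Counter()
    for pair in combinations(predicates_in(query), 2):
        features[("Predicate_Predicate", pair)] += 1
    return features


query = "answer(A,(river(A),traverse(A,B),state(B)))"
features = predicate_pair_features(query)
for key, value in features.items():
    print(key, value)
```

Sorting the predicates before pairing them means (river, traverse) and (traverse, river) are counted as the same feature, which reduces sparsity.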
The code already implements the Word_Predicate feature. It is easier to extract features while
you are building up a parse (in convert_tree_to_semantic_expressions itself) than to extract
them after obtaining the parse. While building the parse from its constituents, sum up the
features of the constituents to form a new feature dictionary, and add any new features. The
SemanticParseLambda class contains a lambda expression and its features stored in a dictionary.
A feature dictionary contains the features and their associated frequency/probability in the
semantic parse. Once you have extracted features, you can train a perceptron model using the
predicted and gold parse features. If properly trained, your model will learn to give good
features such as Feature{name: Word_Predicate, value: (run, traverse)} higher weights.
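The ranking and update steps of a perceptron of this kind can be sketched as follows. This is not the code in structuredperceptron.py; it is a minimal illustration in which feature dictionaries map hypothetical (name, value) keys to counts and weights live in a defaultdict.

```python
from collections import defaultdict


def score(weights, features):
    """Dot product between the weight vector and a sparse feature dictionary."""
    return sum(weights[name] * value for name, value in features.items())


def rank(weights, candidates):
    """Return candidate feature dictionaries ordered from highest to lowest score."""
    return sorted(candidates, key=lambda feats: score(weights, feats), reverse=True)


def update(weights, gold, predicted):
    """Perceptron update: reward the gold parse's features, penalise the prediction's."""
    for name, value in gold.items():
        weights[name] += value
    for name, value in predicted.items():
        weights[name] -= value


weights = defaultdict(float)
gold = {("Word_Predicate", ("run", "traverse")): 1.0}
wrong = {("Word_Predicate", ("run", "loc")): 1.0}

# With all weights at zero the two candidates tie, and the wrong one is listed first.
best = rank(weights, [wrong, gold])[0]
if best is not gold:
    update(weights, gold, best)

reranked = rank(weights, [wrong, gold])
print(score(weights, gold), score(weights, wrong))  # 1.0 -1.0
```

An update is made only when the top-ranked candidate differs from the gold parse, so weights stop changing once the model ranks every training example correctly.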
Split the training data into two parts, one for training and the other for development.
Train your model on the training set and test it on the development set to get an idea of your
model's performance and to understand which features worked. You can write a simple evaluation
function to see how many of your predictions are correct. Once you are satisfied, retrain the
model on the complete data. You might also have to choose the number of training iterations to
build a good model (with too many iterations the model may overfit, and with too few it may not
have learned enough to generalize well).
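A train/development split and a simple accuracy function might look like the sketch below. The 80/20 split, the fixed seed and the function names are assumptions chosen for illustration, and the examples here are placeholder integers rather than records from data/training.json.txt.

```python
import random


def split_data(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the examples reproducibly and return (train, dev)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def accuracy(predicted, gold):
    """Fraction of positions at which the prediction equals the gold answer."""
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


train, dev = split_data(range(10))
print(len(train), len(dev))                # 8 2
print(accuracy(["a", "b", "c", "d"], ["a", "x", "c", "d"]))  # 0.75
```

Fixing the seed keeps the split identical across runs, so changes in development accuracy can be attributed to feature or iteration changes rather than to a different split.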
1. Give some examples of useful features. [5 marks]
2. Complete the function learn_from_training_data (check the pseudocode for details). [15 marks]
3. Complete the function rank_semantic_parse_queries (check the pseudocode). [10 marks]
4. Complete the function predict_test_data_answers (check the pseudocode). [10 marks]
5. How good is your model on the development dataset? [10 marks]
6. Run evaluate_me, which will generate the outputs we require to evaluate your solutions to
Questions 2.1, 3.2, 3.3 and 3.4.
Question 4 [20 marks]
1. Give an example sentence where our simplifying assumption, that the left-hand node consumes
the right-hand node in the syntactic phrase tree, fails to generate the correct semantic parse.
How do you think you could handle such cases? [10 marks]
2. How do you think you could learn a semantic parser when you are not provided with
gold-standard queries, as you were in this assignment? [10 marks]
Figure 1: Semantic parse construction from the phrase structure tree for the sentence "which
rivers run through states bordering newmexico". Figure generated using
http://dylnb.github.io/LambdaCalculator/
