November, 2015

Natural Language Technologies Assignment

Motivation

It's easy to write a context-free grammar that can handle any small set of sentences; the real
question is how to write a grammar that can handle many sentences.
The purpose of this assignment is to write your own grammar and see how good it is.
A good grammar is one that:
A produces/accepts all the sentences of a language
B does not produce sentences which are ungrammatical (important for language models
for speech understanding)
C is not too ambiguous
D produces reasonable parse trees for all sentences of a language (so a semantic interpreter
can subsequently extract the meaning)
E can be parsed efficiently (without taking too much time or disk space)
This assignment, and associated software, comes from Nigel Ward, who is now at the University
of Texas at El Paso. You can read more about it, if you desire, at
http://www.cs.utep.edu/nigel/pstone/
(Note: the browser program mentioned on that web site does not work.)

How to run Pstone

Setting up to use Pstone


1. Log into gem125.valpo.edu or irving.cis.valpo.edu. Use an ssh terminal
emulator program, such as ssh -X from a Linux computer or MobaXTerm from a Windows
computer.
2. Copy the necessary files into your directory (do this only once)
mkdir nlphw
(or whatever directory name you prefer)
cd nlphw
cp mglass/pstonefiles.zip .
(don't forget the dot!)
unzip pstonefiles.zip
3. Prepare to use pstone (do this once every time you log in to use it)
cd nlphw

(or whatever directory name you set up)

The programs have been installed on gem125 and irving only.


4. You can use a graphical text editor such as gedit to examine and edit files.

Assignment 1

Experimenting with Pstone


1. Parse all sentences in the file easy.in
parse -l lexicon.le -g simple-gramm.gr -i easy.in -o easy.pt
The file easy.in contains the sentences to be parsed, the result easy.pt contains the parse
trees. You are encouraged to look at them. The trees are in parenthesized form, e.g.:
(S
(NP
(Article the)
(Noun man))
(VP
(Verb loves)
(NP
(Article some)
(Noun kittens))))
2. Examine the results. In addition to easy.pt (your parse tree) there is easy.ref (the
reference: what the tree is supposed to look like).
gedit &
(it will pop up a gedit editor window.)
(Now you can open up easy.pt in gedit)
(Also you can open up easy.ref, and compare the two.)
Another program can sometimes be helpful, but don't spend much time with it:
compare easy.pt easy.ref easy.cmp
(open up easy.cmp in gedit)
In the above example, the compare program compares the shapes of the parse trees you
produced against the reference trees, and writes a statistical summary of the differences to
the output file easy.cmp.
The .cmp comparison files are not easy to read. For each sentence, the file contains the two
trees (the one from the parser and the reference tree), followed by some statistics on how
well the two compared.
3. Try a harder sentence: repeat steps 1 and 2 for english.in.
4. If you don't redirect your output to a file, you can directly look at partial parses of a sentence
fragment:
parse -s -l lexicon.le -g simple-gramm.gr -i "of an Englishman"
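
If you want to inspect the .pt or .ref trees with a script instead of by eye, the parenthesized form shown in step 1 is easy to read into a nested structure. The following Python sketch is not part of Pstone; the function names read_tree and leaves are made up for illustration, and it assumes each tree is a well-formed "(Label child ...)" expression like the example above.

```python
def read_tree(text):
    """Turn a parenthesized tree into nested (label, children) tuples.

    Words (leaves) stay plain strings. Assumes well-formed input.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def parse_node():
        nonlocal position
        token = tokens[position]
        position += 1
        if token != "(":
            return token                  # a word (leaf)
        label = tokens[position]          # first token after "(" is the node label
        position += 1
        children = []
        while tokens[position] != ")":
            children.append(parse_node())
        position += 1                     # consume the closing ")"
        return (label, children)

    return parse_node()


def leaves(tree):
    """Return the words of a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


example = """
(S
  (NP (Article the) (Noun man))
  (VP (Verb loves)
      (NP (Article some) (Noun kittens))))
"""
tree = read_tree(example)
print(tree[0], leaves(tree))
```

Reading the words back out with leaves is a quick sanity check that a tree really covers the whole input sentence.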

Your Assignment

Your activity is to edit the context-free grammar to try to produce decent coverage of a corpus of
about sixty English sentences and phrases. The sentences are all extracted from Mother Goose,
and are in a file goose.in. They are included with this document.
You edit the .gr grammar file, again parse the goose.in sentences file, examine the goose.pt
output, and compare with the goose.ref suggested trees.
Repeat until you get frustrated, but before you go mad.
For comparison purposes, there is a file goose.ref which has dummy parse trees for many of
the sentences. These parse trees have xx in place of many of the non-terminal symbols. These
trees are not necessarily the correct answer. They are how a linguist viewed the structure of
these sentences. Your own trees may not be quite as complicated, and linguists recognize a lot
of constituents (many of which were replaced with xx) that are advanced topics.
I expect that you will spend about 4 hours playing with the grammar. You will not achieve
anything close to perfect coverage. Initially, however, only about a quarter of the sentences will
parse at all, so there is considerable room for improvement.
Put together into a single document all the following:
1. Your grammar.
2. A copy-paste of the final score for the output of your grammar when run on goose.in.
The final score appears at the end of the output of the compare program run with your
goose files and the goose.ref file. In the lines preceded by %%% you can see the scores.
3. For two of the input sentences for which your grammar produced multiple parses:
a brief discussion of whether these sentences are truly ambiguous or your grammar is
somehow wrong.
4. Of the sentences which your grammar could not handle, pick one and explain why you
think it is hard.
5. Five sentences generated at random from your grammar. For each one, say how strange
you think it is (on a scale of 1 to 10, with 1 = OK and 10 = terrible).
(Create this with generate 5 -g my-grammar.gr -l my-lexicon.le)
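
To see why randomly generated sentences are a useful test of criterion B, it can help to look at how random generation from a context-free grammar works. The Python sketch below is an illustration only, not the generate program itself: the rules are those of simple-gramm.gr, TOY_LEXICON is an invented stand-in for a real lexicon, and the depth limit is an assumption added here so that the left-recursive VP rules cannot expand forever.

```python
import random

# Rules of simple-gramm.gr, written as a Python dictionary.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Article", "Noun"), ("Noun",), ("Pronoun",)],
    "VP": [("Verb", "NP"), ("Verb",), ("VP", "PP"), ("VP", "Preposition")],
    "PP": [("Preposition", "NP")],
}

# Hypothetical toy lexicon (NOT the contents of lexicon.le).
TOY_LEXICON = {
    "Article": ["the", "a"],
    "Noun": ["mouse", "clock", "kittens"],
    "Pronoun": ["she", "it"],
    "Verb": ["ran", "loves"],
    "Preposition": ["up", "down"],
}


def generate_words(symbol, rng, depth=0, max_depth=6):
    """Expand `symbol` into a list of words by choosing rules at random."""
    if symbol in TOY_LEXICON:
        return [rng.choice(TOY_LEXICON[symbol])]
    options = RULES[symbol]
    if depth >= max_depth:
        # Past the depth limit, always take a shortest rule so expansion stops.
        options = [min(options, key=len)]
    words = []
    for child in rng.choice(options):
        words.extend(generate_words(child, rng, depth + 1, max_depth))
    return words


rng = random.Random(2015)   # fixed seed so runs are repeatable
for _ in range(5):
    print(" ".join(generate_words("S", rng)))
```

Because nothing in the rules enforces agreement, a run can produce strings such as "a kittens ran", which is exactly the kind of over-generation the strangeness scale in item 5 is meant to expose.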

Description of the Pstone Toolkit


(DIAGRAM OMITTED SEE THE WEB PAGE)
Figure 1: Pstone Overview

Pstone (Figure 1) consists of


parse This is a context-free parser.
(It actually does left-corner parsing, but that's not important for this assignment.)
As a default, parse only outputs complete parses; that is, parses which cover the whole
input. If you want to see incomplete sub-parses use the -s option. This is useful when
debugging a grammar.

compare This lets you compare the parse trees produced by your grammar to the reference parse
trees.
(These reference trees are based on my opinion of what is consistent with the facts of
English syntax in general, and on my opinion of what sort of trees are suitable for semantic
interpretation.)
The final lines of output from compare look like this:
%%%total: judgable inputs 66 (=32.2% of 205 total inputs)
%%%   of which 62 had parses, (93.9%)
%%%   avg number of parses for these: 2.68
%%%equal 62 (93.9%), average best bracket score 93.9%
The judgable inputs are those of the sentences in the corpus for which reference trees
have been made. (Not all of the sentences in the corpus have reference trees yet, sorry.)
The number of inputs that had parses is the number of input sentences which your
grammar could parse.
The avg number of parses for these measures how ambiguous your grammar is: ideally
this number should be 1 (one parse tree per input sentence).
The number equal is the number of sentences for which one of your parses exactly
matched the reference parse.
The best bracket score is a measure of how similar your best parse tree was to the
reference parse tree.
The rationale for this metric is that an exact match is too much to expect. For one thing,
your grammar may simply use different names for your nodes. For another, your grammar
may assign a tree that is slightly more or less detailed than the reference tree.
The crossing-brackets metric comes from Mitch Marcus at the University of Pennsylvania. In compare it is implemented as follows:
First, we count the number of interior nodes in a sentence, ignoring nodes which have
only one child.
For example there are three such nodes (for S, NP, and VP) in:
(S
(NP
(Art the)
(NP
(Noun mouse)))
(VP
(VP
(Complex-Verb
(Verb ran)))
(Direction-Particle down)))
Second, for each of these interior nodes in the reference tree, we see whether there is a
matching node in the candidate parse tree, that is, a node which subsumes the same set of
leaf nodes (words). The metric is the number of matches divided by the total number of
interior nodes.
There should also be a tagging score, but compare does not implement this; sorry.
(The idea is that, instead of worrying about the internal structure of your parse tree, we
just compare whether the words have the right part of speech; that is, whether your parser
has correctly identified the verbs, the nouns, etc.)
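
The bracket score described above can be sketched in a few lines of Python. This is a reading of the prose description, not the source of compare: the names read_tree, collect_spans and bracket_score are invented here, a node is identified by the span of word positions it covers, and any candidate node (branching or not) is allowed to match a reference node.

```python
def read_tree(text):
    """Parse "(Label child ...)" text into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def parse_node():
        nonlocal position
        token = tokens[position]
        position += 1
        if token != "(":
            return token                      # a word
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            children.append(parse_node())
        position += 1
        return (label, children)

    return parse_node()


def collect_spans(tree, start, spans, branching_only):
    """Record (start, end) word spans of nodes; return the end position."""
    if isinstance(tree, str):
        return start + 1
    end = start
    for child in tree[1]:
        end = collect_spans(child, end, spans, branching_only)
    if len(tree[1]) > 1 or not branching_only:
        spans.add((start, end))
    return end


def bracket_score(reference, candidate):
    """Fraction of branching reference nodes whose span also occurs in the candidate."""
    ref_spans, cand_spans = set(), set()
    collect_spans(reference, 0, ref_spans, branching_only=True)
    collect_spans(candidate, 0, cand_spans, branching_only=False)
    if not ref_spans:
        return 1.0
    return len(ref_spans & cand_spans) / len(ref_spans)


reference = read_tree(
    "(S (NP (Art the) (NP (Noun mouse)))"
    "   (VP (VP (Complex-Verb (Verb ran))) (Direction-Particle down)))"
)
flat = read_tree("(S (NP (Art the) (Noun mouse)) (Verb ran) (Prep down))")
print(bracket_score(reference, flat))
```

In this example the candidate has no node covering exactly "ran down", so only the S and NP nodes of the reference are matched and the score is two out of three. Note that node names play no role, which is the point of the metric.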

generate This generates sentences at random from your grammar and lexicon.
wcheck This checks whether the lexicon includes all the words present in the input file.
Full information on the options to these programs can be obtained by invoking them with the -help
option; for example parse -help will tell you all about how to run parse.
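
The kind of check that wcheck performs can be pictured with the following Python sketch. It is not the wcheck program: the function name missing_words is invented, the lexicon is represented simply as a set of known words (the real .le file format is not described in this handout), and the tokenization (lowercasing, dropping punctuation, ignoring everything after a # comment marker) is an assumption.

```python
import re


def missing_words(input_lines, known_words):
    """Return the sorted words in the input that are not in known_words."""
    missing = set()
    for line in input_lines:
        text = line.split("#", 1)[0]          # drop comments
        for word in re.findall(r"[A-Za-z][A-Za-z'-]*", text):
            if word.lower() not in known_words:
                missing.add(word.lower())
    return sorted(missing)


sample_input = [
    "# a comment line",
    "The mouse ran down.",
    "The little dog laughed. # , to see such sport.",
]
sample_lexicon = {"the", "mouse", "ran", "down", "dog"}
print(missing_words(sample_input, sample_lexicon))
```

A sentence containing even one unknown word cannot be parsed, so running such a check before editing the grammar saves time.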

A simple-gramm.gr
S > NP VP
NP > Article Noun
NP > Noun
NP > Pronoun
VP > Verb NP
VP > Verb
VP > VP PP
VP > VP Preposition # the mouse ran down

PP > Preposition NP
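
To get a feel for how this grammar assigns parses, and for what the "avg number of parses" statistic counts, here is a small Python sketch that counts the complete parses of a sentence. It is not the Pstone parser (which does left-corner parsing); it is a simple memoized span-based counter. The rule syntax "LHS > RHS" with # comments is taken from the listing above, TOY_LEXICON is an invented stand-in for lexicon.le, and the code assumes the grammar has no cyclic unary rules such as X > X.

```python
from functools import lru_cache

GRAMMAR_TEXT = """
S > NP VP
NP > Article Noun
NP > Noun
NP > Pronoun
VP > Verb NP
VP > Verb
VP > VP PP
VP > VP Preposition   # the mouse ran down
PP > Preposition NP
"""

# Hypothetical toy lexicon: word -> set of categories.
TOY_LEXICON = {
    "the": {"Article"}, "mouse": {"Noun"}, "clock": {"Noun"},
    "ran": {"Verb"}, "up": {"Preposition"}, "down": {"Preposition"},
}


def load_rules(text):
    """Read 'LHS > RHS' lines into {lhs: [rhs tuples]}, skipping # comments."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            lhs, rhs = line.split(">", 1)
            rules.setdefault(lhs.strip(), []).append(tuple(rhs.split()))
    return rules


def count_parses(words, rules, lexicon, start="S"):
    """Count the distinct parse trees of `words` rooted in `start`."""
    words = tuple(word.lower() for word in words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Ways for `symbol` to cover words[i:j].
        total = 1 if j - i == 1 and symbol in lexicon.get(words[i], ()) else 0
        for rhs in rules.get(symbol, ()):
            total += count_seq(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Ways for the symbol sequence `rhs` to cover words[i:j],
        # with every symbol covering at least one word.
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        return sum(count(rhs[0], i, m) * count_seq(rhs[1:], m, j)
                   for m in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(words))


rules = load_rules(GRAMMAR_TEXT)
for sentence in ["the mouse ran up the clock", "the mouse ran down", "ran the mouse"]:
    print(sentence, "->", count_parses(sentence.split(), rules, TOY_LEXICON))
```

A count of 0 means the grammar does not cover the input, and a count above 1 means the grammar treats it as ambiguous; as you add rules, watching this number grow is a quick warning that criterion C is slipping.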

B goose.in
# these are taken from Mother Goose, Penguin 1995: no author, no copyright.
# I chose this because
#    probably simple syntax only
#    no copyright
#    charming, memorable phrases
#    meaning is unambiguous and easy to envision, for humans
# I have omitted or changed:
#    truly archaic forms
#    blatant Britishisms
#    nonsense words like higgledy-piggledy
#    phrases that disagree with what I heard at mother's knee
#    line-initial "ands",
#    things that don't feel like constituents to me.
Do you have any wool?
Three full bags.
One for my master.
One for my dame.
None for the little boy who cries in the lane.
pigs and swine
rats and mice
Whose dog are you?
Daddy's gone hunting, to get a little rabbit skin to wrap his baby in.
My dame has lost her shoe.
My master's lost his fiddling stick.
Cocks crow in the morning.
to tell us to get up
he will never be wise
the way to be healthy, wealthy, and wise
My son John went to bed with his stockings on.
The mouse ran up the clock.
The clock struck one.
The mouse ran down.
The Princess lost her shoe.
Her Highness hopped.
The fiddler stopped, not knowing what to do.
I smell the blood of an Englishman.
I'll grind his bones to make my bread.
for lack of a nail the shoe was lost
the lack of a horseshoe nail
Georgey Porgey kissed the girls and made them cry.
When the boys come out to play, Georgey Porgey runs away.
There I met an old man, who would not say his prayers.
I took him by the left leg, and threw him downstairs.
the cat and the fiddle
The cow jumped over the moon.
The little dog laughed. # , to see such sport.
The dish ran away with the spoon.
My black hen lays good eggs for gentlemen.
Gentlemen come every day, to see what my black hen lays.
Humpty-Dumpty sat on a wall.
All the king's horses and all the king's men, couldn't put Humpty together again.
on the treetop
When the wind blows, the cradle will rock.
When the bough bends, the cradle will fall.
I like little pussy.
Her coat is so warm.
If I don't hurt her, she won't hurt me.
Pussy and I will play very gently.
I won't pull her tail, nor drive her away.
I will be the fiddler's wife, and have music when I want.
if all the world were water and all the water were ink
what would we do for drink?
If wishes were horses, beggars would ride.
If turnips were watches, I would wear one by my side.
There would be no work for tinkers.
It's raining.
It's pouring.
The old man is snoring.
He bumped his head and went to bed.
He didn't get up in the morning.
Jack and Jill went up the hill to fetch a pail of water.
Jack fell down.
vinegar and brown paper
Jack be nimble.
Jack be quick.
Jack jump over the candlestick.
Jack Spratt could eat no fat.
His wife could eat no lean.
They licked the platter clean.
come blow your horn
The sheep's in the meadow.
The cow's in the corn.
Where's the little boy who looks after the sheep?
He's under the haystack, fast asleep.
Will you wake him up?
If I do he'll be sure to cry.
Little Jack Horner sat in a corner, eating a Christmas pie.
He put in his thumb and pulled out a plum.
Little Miss Muffet, sat on a tuffet, eating her curds and whey.
Along came a spider and sat down beside her, and frightened Miss Muffet away.
Mary had a little lamb with fleece as white as snow.
Everywhere that Mary went the lamb would go too.
It followed her to school one day.
It was against the rule.
It made the children laugh.
The teacher turned it out, but still it lingered near.
It waited patiently until Mary appeared.
Why does the lamb love Mary so much?
Mary loves the lamb.
Thursday's child has far to go.
Friday's child is loving and giving.
Saturday's child works hard for its living.
the child that is born on Sunday is bonny and good and happy
He had ten thousand men.
He marched them up a great big hill, and marched them down again.
When they were up they were up.
When they were down they were down.
all the birds I ever saw
The owl is the fairest by far.
All day long she sits on a tree.
When the night comes she flies away.
What's the matter?
Johnny's at the fair so long.
He promised to buy me a bunch of blue ribbons to tie up my bonny brown hair.
Buckle my shoe.
Shut the door.
Pick up sticks.
Lay them straight.
My stomach's empty.
Make me a cake as fast as you can.
Roll it and pat it and mark it with T.
Put it in the oven for Baby and me.
Some like it hot.
Some like it cold.
Some like it in the pot, nine days old.
Peter Piper picked a peck of pickled peppers.
Where's the peck of pickled peppers Peter Piper picked?
Peter had a wife.
He put her in a pumpkin shell.
Peter learned to read and spell.
He loved her very much.
Polly put the kettle on.
Let's drink tea.
Sukey take it off again.
They've all gone away.
Pussy-cat, where have you been?
I've been to London to visit the queen.
Pussy-cat, what did you do there?
I frightened a little mouse under her chair.
Rain, go away.
Come again some other day.
Little Johnny wants to play.
Three men in a tub
Who do you think they are?
Which is the way to London?
That is the way to London.
Sing a song.
a pocket full of rye
twenty-four blackbirds baked in a pie
When the pie was opened, the birds began to sing.
The king was in his counting-house, counting out his money.
The queen was in the parlor, eating bread and honey.
The maid was in the garden, hanging out the clothes.
A blackbird came and pecked off her nose.
The Man in the Moon looked out of the moon.
all children
time to think about getting to bed.
The Queen of Hearts made some tarts.
on a summer's day
The Knave of Hearts stole the tarts and took them away.
The King of Hearts beat the Knave sore.
The Knave of Hearts brought back the tarts.
He promised he would not steal again.
There was a little girl who had a little curl, right in the middle of her forehead.
When she was good, she was very, very good.
When she was bad she was horrid.
There was an old women who lived under a hill.
If she's not gone, she still lives there.
She sold baked apples and cranberry pies.
She's the old woman who never told lies.
September has thirty days
All the rest have thirty-one, except February.
every fourth year
one more day
This little pig went to market.
This little pig stayed home.
This little pig had roast beef.
This little pig had none.
all the way home
Tom was a piper's son.
He learned to play when he was young.
He pleased both the girls and the boys.
The only tune that he could play was Tootles.
over the hills and far away
They all stopped to hear him play.
Those who heard him could never stand still.
Whenever they heard him they began to dance.
He met old dame Trot with a basket of eggs.
He used his pipe.
She danced about until the eggs were all broken.
He laughed at the joke.
He saw an angry fellow was beating an ass.
pots, pans, dishes, and glass
He took out his pipe and played them a tune.
Tom stole a pig and ran away.
Tom was beaten.
Tom ran crying down the street.
There was a crooked man.
He bought a crooked cat, which caught a crooked mouse.
They all lived together, in a crooked little house.
There were two blackbirds, sitting on a hill.
One named Jack.
Fly away, Jack!
Fly away, Jill!
Come back, Jack!
Come back, Jill!
Three wise men of Gotham went to sea in a bowl.
little star
I wonder what you are.
up above the world
like a diamond in the sky
When the sun is gone, then you show your little light.
How could he see where to go?
You never shut your eye, until the sun is in the sky.
Your bright and tiny spark.
All that goes up must come down.
Yankee Doodle went to town, riding on a pony.
He stuck a feather in his hat, and called it Macaroni.
