NLP Week 2 Rationalist and Empiricist Paradigms in Natural Language Processing

Rationalist and Empiricist paradigms in Natural Language Processing
Rationalist, rule based, symbolic AI approach:
finding a grammar which represents the structure
of natural language.
Empiricist, data driven approach: finding
regularities in language by examining large
corpora, constructing models from training data.
Rationalist position
"Knowledge of language involves in the first
place knowledge of grammar ... language is a
derivative and perhaps not very interesting concept"
Chomsky, 1980
Empiricist position
"Don't think but look! ... the more we examine
actual language, the sharper becomes the conflict
between it and our requirement, the crystalline
purity of logic"
Wittgenstein, 1945
NB Chomsky has modified his position over the years,
and now says that empirical evidence is relevant.
Empirical, or corpus based, methods

Basic principle: Set the parameters of the system by
exposure to training data, store them and then apply to new,
unseen data
A language model from training data: a set of
characteristics of language, drawn from a corpus.
A corpus is a collection of natural language texts.
Empirical methods include probabilistic systems and neural
processors. The concepts underlying these methods are
related.
Types of grammar: the Chomsky hierarchy

Type 0: Turing equivalent (not needed for natural language processing)
Type 1: Context sensitive
Type 2: Context free
Type 3: Regular
Empirical methods and grammar type (1)

Based on regular, type 3, grammars:
Dependencies between adjacent symbols, e.g. words or
tags, are captured. These can be modelled by finite state
automata.
Cannot model phrase structure, or higher level
grammar.
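As a minimal sketch of the point above, the automaton below accepts or rejects a tag sequence by checking adjacent pairs only. The tag names and the table of allowed transitions are invented for illustration and do not come from any real tagset or grammar.

```python
# Hypothetical finite state automaton over part-of-speech tags.
# The state is simply the previous tag, so only dependencies
# between adjacent symbols can be expressed.
ALLOWED = {
    "START": {"determiner", "noun"},
    "determiner": {"adjective", "noun"},
    "adjective": {"adjective", "noun"},
    "noun": {"verb", "END"},
    "verb": {"determiner", "noun", "END"},
}


def accepts(tags):
    """Return True if every adjacent pair of tags is an allowed transition."""
    state = "START"
    for tag in list(tags) + ["END"]:
        if tag not in ALLOWED.get(state, set()):
            return False
        state = tag
    return True


print(accepts(["determiner", "noun", "verb", "determiner", "noun"]))
print(accepts(["determiner", "verb"]))
```

Because the automaton remembers nothing beyond the previous tag, it has no way to group tags into phrases such as a noun phrase followed by a verb phrase.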
Empirical methods and grammar type (2)

Little semantic information can be captured from a regular
grammar.
E.g. the agent and victim cannot be extracted from:
John kicked Bill.
John was kicked by Bill.
This may not matter, e.g. for the Ferret plagiarism detector.
Even if it does matter, a regular grammar base may be
chosen to make computation tractable.
Rule based methods and grammar type (1)

Rule based methods are typically founded on context free
grammars (CFGs), i.e. type 2 grammars. Phrase structure
grammars (PSGs) and Link Grammar are context free.
CFGs model dependencies between adjacent symbols, and
between groups of symbols,
e.g. (noun phrase - verb phrase).
Can be modelled by push down automata.
Cannot model context sensitive grammar.
Rule based methods and CFGs

Basic phrase structure can be modelled. It may be possible
to determine subject / predicate roles.
However,
In practice the rule base cannot usually cover unrestricted
natural language.
CFGs typically generate a large number of alternative
parses unless sentences are short and simple.
Semantic knowledge may be needed to disambiguate
sentence structure.
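The growth in alternative parses can be illustrated with a small sketch. The toy grammar, lexicon and sentences below are assumptions made for this example only; the chart-based counting is the standard CKY technique for grammars in Chomsky normal form, not a description of any particular rule based system.

```python
from collections import defaultdict

# Hypothetical toy CFG in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"],
    "man": ["N"], "dog": ["N"], "telescope": ["N"], "park": ["N"],
    "saw": ["V"],
    "with": ["P"], "in": ["P"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees for words using a CKY chart."""
    n = len(words)
    # chart[(i, j)][X] = number of ways category X derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1)][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][root]


for text in (
    "the man saw the dog",
    "the man saw the dog with the telescope",
    "the man saw the dog with the telescope in the park",
):
    print(count_parses(text.split()), text)
```

Each added prepositional phrase can attach to the verb phrase or to a preceding noun phrase, so the three sentences have 1, 2 and 5 parses under this grammar. Choosing between them is where semantic knowledge would be needed.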
Context sensitive grammars

CFGs do not capture context sensitive information. Consider
translating the sentence:
The pipe connections of the cooler must be checked.
The head of the subject, "connections", is plural, so the verb
"must" should be plural.
In English, modal verbs have the same singular and plural form,
so we cannot tell from looking at the word which it is. A CFG
could not represent the necessary context sensitive number
agreement.
Example of an empirical, probabilistic method

The CLAWS tag disambiguation system
Many words have more than one part-of-speech tag.
The right tag needs to be found.

Examples
light    adjective, noun, verb
race     noun, verb
cut      verb, noun, past participle
her      pronoun, possessive pronoun
flies    noun, verb
Tagsets
Tagsets of different sizes are used for different purposes,
depending on the degree of resolution needed. For
instance, nouns can be subdivided into singular and plural,
verbs into different tenses and number.
Tagsets in use vary in size from 2 (just content and
function words) to over 100.
Sample tagset
Tagset with 10 classes
Content words
    noun
    verb
    adjective
    adverb
Function words
    determiner
    pronoun
    preposition
    conjunction
    auxiliary verb
Other
CLAWS version 5 has 61 tags.

Tag disambiguation with CLAWS

Tag disambiguation is often a preliminary task. The CLAWS
system is a well known method.
Underlying principle:
- Based on probabilities
- How likely is a word to have a certain tag, given its context?
It is based on a Markov model and uses information from local
dependencies.
CLAWS: the underlying concepts (cont.)

A Markovian sequence is one in which the probability
of a certain output element occurring depends on a
finite number of preceding terms.
Here the number of preceding terms is one.
The training data is tagged by hand.
Then the frequencies of the tag transitions are counted.
These frequencies constitute the language model.
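The counting step can be sketched as follows. The three hand-tagged sentences are invented for illustration; a real training set would be far larger, and this is not the CLAWS code itself.

```python
from collections import Counter


def count_transitions(tagged_sentences):
    """Count how often each tag is immediately followed by another tag."""
    transitions = Counter()
    for sentence in tagged_sentences:
        tags = [tag for _word, tag in sentence]
        for prev_tag, next_tag in zip(tags, tags[1:]):
            transitions[(prev_tag, next_tag)] += 1
    return transitions


# Tiny hand-tagged corpus, invented for illustration only.
corpus = [
    [("Henry", "NP"), ("likes", "VBZ"), ("stews", "NNS"), (".", ".")],
    [("Mary", "NP"), ("cooks", "VBZ"), ("stews", "NNS"), (".", ".")],
    [("Henry", "NP"), ("cooks", "VBZ"), (".", ".")],
]
model = count_transitions(corpus)
for pair, frequency in sorted(model.items()):
    print(pair, frequency)
```

The resulting table of pair frequencies is the whole language model: with one preceding term, nothing else about the training text is stored.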
Current implementation of CLAWS

2 million words of the BNC tagged by hand.
Transition frequencies, how often one tag is followed by
another, were found.
This is the training data: the transition frequencies
constitute the language model.
Tag disambiguation - example 1

NP     proper noun
NNS    plural noun
VBZ    3rd person singular verb
.      full stop

We want to disambiguate the sentence:
Henry likes stews .
where
Henry    NP
likes    NNS or VBZ
stews    NNS or VBZ

Part of transition matrix from training data

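                   Following tag
Preceding tag    NNS    VBZ    .
NP                 7     17    ---
NNS                5      1    135
VBZ               28      0     37

Score of each candidate tag sequence (product of transition frequencies):
NP_NNS_NNS_. = 7 * 5 * 135 = 4725
NP_NNS_VBZ_. = 7 * 1 * 37 = 259
NP_VBZ_NNS_. = 17 * 28 * 135 = 64260
NP_VBZ_VBZ_. = 17 * 0 * 37 = 0
The highest score is for NP VBZ NNS, so "likes" is tagged VBZ and "stews" NNS.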
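The four products above can be reproduced with a short sketch that enumerates every candidate tag sequence and multiplies the transition frequencies along it. The frequencies are the ones used in the worked products; the function and variable names are illustrative only.

```python
from itertools import product

# Transition frequencies used in the worked products above.
FREQ = {
    ("NP", "NNS"): 7, ("NP", "VBZ"): 17,
    ("NNS", "NNS"): 5, ("NNS", "VBZ"): 1, ("NNS", "."): 135,
    ("VBZ", "NNS"): 28, ("VBZ", "VBZ"): 0, ("VBZ", "."): 37,
}

# Candidate tags for each token of "Henry likes stews ."
CANDIDATES = [["NP"], ["NNS", "VBZ"], ["NNS", "VBZ"], ["."]]


def score(tags):
    """Multiply the transition frequencies along one tag sequence."""
    total = 1
    for prev_tag, next_tag in zip(tags, tags[1:]):
        total *= FREQ.get((prev_tag, next_tag), 0)
    return total


scores = {tags: score(tags) for tags in product(*CANDIDATES)}
best = max(scores, key=scores.get)
for tags, value in scores.items():
    print("_".join(tags), value)
print("best:", best)
```

Exhaustive enumeration is fine for four candidates, but the number of sequences grows multiplicatively with sentence length, so a practical tagger would need something more economical.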
Tag transition probabilities

Let t_i and t_j be part-of-speech tags.
(t_i, t_j) means t_j follows t_i.
C(t_i) is a count of the number of times that tag t_i occurs in the
training set.
C(t_i, t_j) is a count of the number of times that tag t_j follows t_i.
P(t_i, t_j) = C(t_i, t_j) / C(t_i)
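A minimal sketch of this estimate divides each pair count by the count of the preceding tag. The counts below are invented for illustration and are not taken from any real corpus.

```python
from collections import Counter


def transition_probabilities(pair_counts, tag_counts):
    """Estimate P(prev, next) = C(prev, next) / C(prev) for each observed pair."""
    return {
        (prev_tag, next_tag): count / tag_counts[prev_tag]
        for (prev_tag, next_tag), count in pair_counts.items()
        if tag_counts[prev_tag] > 0
    }


# Invented counts, for illustration only.
tag_counts = Counter({"NP": 50, "VBZ": 40})
pair_counts = Counter({("NP", "VBZ"): 20, ("NP", "NNS"): 5, ("VBZ", "NNS"): 10})

probs = transition_probabilities(pair_counts, tag_counts)
for pair, probability in sorted(probs.items()):
    print(pair, probability)
```

Dividing by the count of the preceding tag means the probabilities of all tags that can follow a given tag sum to at most 1.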
CLAWS process
Lexicon holds a list of words and possible tags.
The processor will take each word in turn and look it up.
If not there, tags are assigned using suffix or prefix rules.
Examples:
suffix    probable tag     example
-able     adjective        e.g. suitable
-cle      noun             e.g. article
-dle      noun or verb     e.g. handle
If the word does not have a suffix or prefix in the list,
default to noun or verb.
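The lookup described above can be sketched as below. The three suffix rules and the noun-or-verb default come from the examples above; the one-entry lexicon and the function name are hypothetical, and prefix rules are omitted for brevity.

```python
# Suffix rules from the examples above (prefix rules omitted in this sketch).
SUFFIX_RULES = [
    ("able", ["adjective"]),
    ("cle", ["noun"]),
    ("dle", ["noun", "verb"]),
]
DEFAULT_TAGS = ["noun", "verb"]


def candidate_tags(word, lexicon):
    """Look the word up; fall back to suffix rules, then to the default."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    for suffix, tags in SUFFIX_RULES:
        if key.endswith(suffix):
            return tags
    return DEFAULT_TAGS


# Hypothetical one-entry lexicon, for illustration only.
lexicon = {"her": ["pronoun", "possessive pronoun"]}

for word in ("her", "suitable", "article", "handle", "stews"):
    print(word, candidate_tags(word, lexicon))
```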
CLAWS process (continued)

When words in the text have been allocated one or more tags,
the disambiguation process begins.
The transition probabilities are looked up in a table, which
has been created from the training data.
Then the most likely tags are calculated.
Tag disambiguation example 2

Norman forced her to cut down on smoking.
Norman     proper noun, adj
forced     verb, adj
her        possessive pronoun, pronoun
to         prep, infinitive
cut        verb, noun, past participle
down       adverb, prep, noun
on         prep
smoking    verb, noun, adj (present participle)
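To see how many alternatives the disambiguator has to choose between in this example, the candidate tags per word can simply be multiplied out. The sketch below uses the candidate tags listed above; the function name is illustrative.

```python
from math import prod

# Candidate tags for each word of "Norman forced her to cut down on smoking."
CANDIDATES = {
    "Norman": ["proper noun", "adj"],
    "forced": ["verb", "adj"],
    "her": ["possessive pronoun", "pronoun"],
    "to": ["prep", "infinitive"],
    "cut": ["verb", "noun", "past participle"],
    "down": ["adverb", "prep", "noun"],
    "on": ["prep"],
    "smoking": ["verb", "noun", "adj"],
}


def count_sequences(candidates):
    """Number of distinct tag sequences: the product of the per-word tag counts."""
    return prod(len(tags) for tags in candidates.values())


print(count_sequences(CANDIDATES))  # 2 * 2 * 2 * 2 * 3 * 3 * 1 * 3 = 432
```

An eight-word sentence already yields 432 candidate sequences, which is why the scoring relies on local transition probabilities rather than judging each whole sequence separately.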
Corpus based application, example 2

The Alpine neural parser
The Alpine parser takes natural language text and divides
sentences up into constituent parts. For example:
Yesterday        that dog with a long tail    bit the boy
{pre-subject}    {subject}                    {predicate}

Steps in the process:

1. Using CLAWS, get the possible tags for each word. The text
is mapped onto a sequence of part-of-speech tags.
2. Use the trained neural net to find the boundary markers
of the subject.
The Alpine parser (continued)

A single layer neural net is used, since the data turns out to be
linearly separable. The net is trained in supervised mode.
The neural net has 2 outputs: grammatical and
ungrammatical.
Each sentence generates a set of alternative tag sequences,
and alternative locations for the boundary markers. These
are converted into sets of features that can be entered into
an input vector.
The net determines the sequence that is grammatical.
The correct locations of the boundary markers, together with
the correct tags, are displayed.
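Why a single layer suffices for linearly separable data can be shown with a small sketch of the perceptron learning rule. This is not the Alpine implementation: it uses one threshold unit rather than two outputs, and the two binary features and four training samples are invented for illustration (1 stands for grammatical, 0 for ungrammatical).

```python
def predict(weights, bias, features):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


def train_perceptron(samples, n_features, epochs=20, rate=1.0):
    """Supervised training of a single threshold unit with the perceptron rule."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, target in samples:
            delta = target - predict(weights, bias, features)
            if delta:
                errors += 1
                bias += rate * delta
                weights = [w + rate * delta * x for w, x in zip(weights, features)]
        if errors == 0:  # every training sample is classified correctly
            break
    return weights, bias


# Invented, linearly separable toy data: (feature vector, target).
samples = [
    ([1, 0], 1),
    ([1, 1], 0),
    ([0, 0], 0),
    ([0, 1], 0),
]
weights, bias = train_perceptron(samples, n_features=2)
print(weights, bias)
print([predict(weights, bias, features) for features, _ in samples])
```

On separable data like this the rule is guaranteed to stop making errors after a finite number of updates, so no hidden layer is needed.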
Alpine as a hybrid system

Alpine is a corpus based application, using empirical
methods.
It also has a rule based component. We assert that a
sentence can be decomposed into
{pre-subject} {subject} {predicate}
Development of the two paradigms in historical context

Empirical methods were dominant in the 1950s:
- behaviourism in psychology
- information theory
- Rosenblatt introduced perceptrons.
Chomsky demolished the empirical position in linguistics
in his seminal book Syntactic Structures (1957)
- showed the limitations of regular grammars.
- his work on formal grammars, later applied to compiler
  design, contributed to his reputation.
Minsky and Papert showed the limitations of the
perceptron (1969)
Empirical, data driven approaches went out of fashion.
Development of the two paradigms (cont.)

Much work done developing rule based systems in limited
language domains.
Limitations of rule based language processors became
apparent in the 1980s.
Modest achievements in speech recognition, based on
empirical methods, excited enthusiasm.
Empirical methods became possible as the necessary large
corpora were collected.
Increasing computing power made data driven, empirical
methods feasible.
Data driven, empirical methods came back into favour.
Development of the two paradigms (cont.)

Research shifted from academia to industry:
- more interest in working systems than theories.
Focus shifted from deep analysis of small samples of
language (rule based methods) to shallow analysis of real
language (data driven methods).
Information theory provided metrics to evaluate
empirically derived language models.
Neural networks revived:
- methods found to avoid the problems with perceptrons.
Many recent developments are based mainly on data
driven methods.
Integration of the two paradigms

Integrated systems are now necessary for many key tasks.
1. Hybrid systems that are primarily data driven can operate
within a rule based framework. E.g. probabilistic or neural
parsers.
2. Systems may combine modules based on different paradigms
Example: see the paper "LCC Tools for Question Answering", which
describes an information retrieval system that integrates different
processing modules. These include a probabilistic parser and
a rule based logic prover.