
Foundations of Statistical Natural Language Processing: Introduction to the Course

Sangeeta

FSNLP - Introduction

Course Book
Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze

Computational Linguistics
The study of computer systems for understanding and generating natural languages
How sentences are generated
How people communicate with each other

Why Study NLP

Writing aids (spelling checker, grammar checker)
Speech recognition
OCR and OLCR
Intelligent information retrieval
We will study some of these applications

Computational Linguistics
Rules
To distinguish well-formed from ill-formed utterances
"All grammars leak": people bend grammar rules to meet their communication needs
Rationalist approach
Common patterns
Statistical NLP
Known as "counting things"
Empiricist approach

Rationalist Approach
1960-1985
Noam Chomsky, Chomskyan linguistics
A significant part of the knowledge in the human mind is not derived by the senses but is fixed in advance, presumably by genetic inheritance.
Key parts are hardwired in the brain

Rationalist Approach (cont.)

Children learn a complex task such as natural language from limited input
Hence the rules must be hardwired in the brain at birth

Empiricist Approach
1920-1960, 198X-
Some cognitive abilities are present at birth
Some initial structure to prefer certain ways of organizing and generalizing; the mind is not a tabula rasa
General operations upon the senses: pattern recognition, association and generalization
Language structure can be learned by applying a general language model and statistical processing to large amounts of language use

Difference Between the Two Approaches

The difference is not absolute
They differ in the amount of initial knowledge the brain is assumed to have

Rationalist approach: linguistic competence
Knowledge of language structure in the mind of a native speaker
Empiricist approach: linguistic performance
Delivery of language by the speaker
Affected by many factors: distractions and noise in the environment, memory limitations, etc.

Rationalist approach: categorical
A sentence either follows the rules or it does not
Empiricist approach: non-categorical
Commonly occurring patterns
Finding the probability that a sentence is usual or not

Rule-based approach

Categorical view of language
Measures linguistic competence: language structure in the mind of the speaker
A sentence is either correct or incorrect (according to the rules)
Sometimes this judgment is difficult even for an average human being. Any answers why?

Questions that linguistics should answer

What kind of things do people say?
What do these things say/ask/request about the world?

What kind of things do people say?

Traditionally, linguists described a competence grammar
On the basis of the competence grammar, sentences are classified as grammatically correct or incorrect
This checks only the syntax of the sentence

What kind of things do people say?
Looking only at syntax (rule-based approach)

Colorless green ideas sleep furiously

Valid as per syntax
No one uses such a sentence

What kind of things do people say?
This leads to a move toward the non-categorical view, i.e. the empiricist approach
Categorically dividing sentences into correct and wrong gives little or no information
Sometimes it is very difficult to decide whether a sentence is correct or wrong

#Exercise
Identify which sentences are grammatically correct
1. John I believe Sally said Bill believed Sue saw.
2. What did Sally whisper that she had secretly read?
3. John wants very much for himself to win.
4. (Those are) the books you should read before it becomes difficult to talk about.
5. (Those are) the books you should read before talking about becomes difficult.
6. Who did Jo think said John saw him?


What kind of things do people say?

Changes in the language pattern
Words can change their meaning and part of speech
Example:
While (noun, a period of time): take a while
While (complementizer): while you were out
The complementizer use is valid today but was not valid earlier

What kind of things do people say?

Example: I am googling

What kind of things do people say?

Changes in the language pattern
Blending of parts of speech
Example: near
Can be used as an adjective or a preposition (simultaneously)

What kind of things do people say?

Changes in the language pattern
Language change
Example: kind of / sort of
Kind and sort were originally nouns
Over time their meaning changed and "of" became attached to them
We are kind of hungry
We cannot attach "of" to other nouns in this way

What kind of things do people say?
Example: In addition to this, she insisted that women were regarded as a different existence from men unfairly
This sentence is grammatically correct
The idea can be expressed in a better form by following convention

Convention (how frequently people express an idea in a given way)
Convention changes gradually and can be identified by measuring the frequencies of patterns
Empiricist approach
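
Measuring the frequency of a pattern can be sketched as counting word n-grams in a corpus. The snippet below is a minimal illustration only: the function name `ngram_counts` and the toy sentence are assumptions made for this example, not part of the course material.

```python
from collections import Counter

def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy corpus, invented for illustration only.
tokens = "we are kind of hungry and he is kind of tired".split()
counts = ngram_counts(tokens, n=2)

# How often does each candidate pattern occur?
print(counts[("kind", "of")])
print(counts[("sort", "of")])
```

On a real corpus, comparing such counts across time periods is one simple way to see a convention shifting.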

What kind of things do people say?
The empiricist approach finds common patterns
Simple sentences are clearly acceptable or unacceptable

Non-Categorical
The meaning of words changes gradually
kind of / sort of
Does not behave as a normal noun + preposition pair
Example: He is kind of hungry
He sort of understood what's going wrong

Probabilistic
The argument for a probabilistic approach to cognition is that we live in a world filled with uncertainty and incomplete information.
Unseen events
Ambiguity

Disadvantages of the Rule-Based Approach

Hand-coding rules is time consuming
Performs poorly on naturally occurring text; not scalable
Example: verb swallow
Rule: requires an animate being as subject and a physical object as object
Counterexamples:
I swallowed his story
The supernova swallowed the planet
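
The swallow example can be sketched as a hand-coded selectional restriction. This is a toy illustration under stated assumptions: the word lists `ANIMATE` and `PHYSICAL` and the function `swallow_rule_accepts` are invented for this sketch and do not come from any real lexicon.

```python
# Hypothetical hand-built word lists; a real system would need far larger ones.
ANIMATE = {"i", "dog", "child"}
PHYSICAL = {"pill", "food", "planet"}

def swallow_rule_accepts(subject, obj):
    """Categorical rule: 'swallow' needs an animate subject and a physical object."""
    return subject.lower() in ANIMATE and obj.lower() in PHYSICAL

print(swallow_rule_accepts("I", "pill"))            # fits the rule
print(swallow_rule_accepts("I", "story"))           # "I swallowed his story"
print(swallow_rule_accepts("supernova", "planet"))  # "The supernova swallowed the planet"
```

Both counterexamples are natural sentences, yet the rule rejects them; patching it means hand-coding more exceptions, which is the scalability problem named above.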

#Exercise
Disadvantages of the Statistical Approach
Preparing the database is time consuming
Generalization is poor for a small database

Rule-Based vs. Corpus-Based: Advantages

Rule-based
No need to prepare a database
Reasoning processes are explainable and traceable

Rule-Based vs. Corpus-Based: Advantages

Corpus-based
Knowledge acquisition can be achieved automatically by the computer
Offers a good solution to ambiguity problems by identifying words that commonly group together

Why Is NLP Difficult?


Ambiguities
Example: Our company is training workers
What is the meaning of this sentence?

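Why Is NLP Difficult?

Multiple parse trees

Parse 1 ("is training" as a verb group): [S [NP Our company] [VP [Aux is] [VP [V training] [NP workers]]]]
Parse 2 ("training" as a modifier of "workers"): [S [NP Our company] [VP [V is] [NP [AdjP training] [NP workers]]]]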
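
One way to see how ambiguity produces several trees is to count parses with a chart (CKY-style) algorithm. The sketch below uses a tiny hypothetical grammar written only for the sentence "Our company is training workers"; the rule set, the lexicon and the name `count_parses` are assumptions for illustration, not the grammar used in the course book.

```python
from collections import defaultdict

# Hypothetical toy grammar: binary rules as (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "AdjP", "NP"),   # "training workers" as modifier + noun phrase
    ("VP", "Aux", "VP"),    # "is" as auxiliary of "training"
    ("VP", "V", "NP"),
]
# Hypothetical lexicon: ambiguous words carry more than one category.
LEXICON = {
    "our": ["Det"],
    "company": ["N"],
    "is": ["Aux", "V"],
    "training": ["V", "AdjP"],
    "workers": ["NP"],
}

def count_parses(tokens, root="S"):
    """Return how many distinct trees the toy grammar assigns to tokens."""
    n = len(tokens)
    chart = defaultdict(int)  # (start, end, label) -> number of subtrees
    for i, tok in enumerate(tokens):
        for label in LEXICON.get(tok.lower(), []):
            chart[(i, i + 1, label)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, split, left)] * chart[(split, end, right)]
                    )
    return chart[(0, n, root)]

print(count_parses("Our company is training workers".split()))
```

Each extra ambiguous word or attachment site multiplies the counts in the chart, which is why longer sentences can have hundreds of parses.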

How many possible parses does the following sentence have?

List the sales of the products produced in 1973 with the products produced in 1972.

Any guess? 455 possible parses

Lexical resources (corpora)

Canadian Hansards
Bilingual corpus, parallel texts
WordNet
Electronic dictionary
Synsets
Relations between words
Meronymy (part-whole relations)

Word counts
Function words
Word tokens vs. word types
Some facts
The 100 most common words account for 50.9% of tokens
Almost half (49.8%) of word types are hapax legomena (occur only once)
Over 90% of the word types occur <= 10 times
12% of the text is words that occur <= 3 times
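
The token, type and hapax counts above can be computed for any text with a few lines of code. This is a minimal sketch: the tokenizer is a deliberately crude regular expression, and the function name `word_stats` and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter

def word_stats(text):
    """Count tokens, types and hapax legomena in a text (crude tokenizer)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    hapax = sum(1 for c in counts.values() if c == 1)
    return {
        "tokens": len(tokens),           # running words
        "types": len(counts),            # distinct words
        "hapax": hapax,                  # types that occur exactly once
        "hapax_share": hapax / len(counts) if counts else 0.0,
    }

# Toy text, invented for illustration only.
print(word_stats("The cat sat on the mat and the dog sat too"))
```

Running the same function on a full novel is how figures such as the share of hapax legomena are obtained.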

Zipf's laws
Principle of Least Effort: people try to minimize their work
In a conversation, the speaker tries to use the most general words, and the listener tries to hear the rarest words

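Zipf's laws
Principle of Least Effort
Zipf's law (language): $f \propto \frac{1}{r}$, or $f \cdot r = k$
where $f$ is the frequency of a word, $r$ is its rank in the frequency list, and $k$ is a constant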
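
Zipf's law says that frequency times rank stays roughly constant. A quick way to inspect this on a text is to build a rank-frequency table and look at the product column. The sketch below is illustrative only: `rank_frequency` is a hypothetical helper, and the token list is an artificial one constructed so that the product is exactly constant.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return rows of (rank, word, frequency, rank * frequency)."""
    counts = Counter(tokens)
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

# Artificial tokens chosen so that f * r is the same at every rank.
tokens = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
for row in rank_frequency(tokens):
    print(row)
```

On real text the product is only approximately constant, and it drifts most at the very top and very bottom of the ranking.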

Any Problem with the equation?

Zipf's laws
Weak points: the highest and lowest ranks
Refined by Mandelbrot: $f = P(r + \rho)^{-B}$
where $P$, $B$ and $\rho$ are parameters of the text
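
Assuming the usual form of Mandelbrot's refinement, $f = P(r + \rho)^{-B}$, the relation to Zipf's law can be checked numerically: with $B = 1$ and $\rho = 0$ it reduces to $f = P / r$. The function name and the parameter values below are illustrative assumptions, not fitted to any corpus.

```python
def mandelbrot_frequency(rank, P, B, rho):
    """Predicted frequency at a given rank under f = P * (rank + rho) ** (-B)."""
    return P * (rank + rho) ** (-B)

# With B = 1 and rho = 0 the formula is plain Zipf: f = P / rank.
for rank in (1, 2, 5, 10):
    zipf = mandelbrot_frequency(rank, P=10.0, B=1.0, rho=0.0)
    shifted = mandelbrot_frequency(rank, P=10.0, B=1.0, rho=3.0)
    print(rank, zipf, shifted)
```

A positive `rho` flattens the curve for the highest-ranked words, which is where the simple form fits worst.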

Mandelbrot: Approximation

Zipf's laws (cont.)

m = the number of meanings of a word: $m \propto \sqrt{f}$, or $m \propto \frac{1}{\sqrt{r}}$

The significance of power laws

Does Zipf's law also hold for randomly generated text?
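
The question can be explored with a small experiment: generate characters at random, treat the space as a word separator, and look at the rank-frequency table of the resulting "words". This is a sketch under stated assumptions: the three-letter alphabet, the text length and the seed are arbitrary choices, and `random_text_rank_frequency` is a hypothetical helper.

```python
import random
from collections import Counter

def random_text_rank_frequency(n_chars=20000, alphabet="abc", seed=0):
    """Generate random characters (letters plus space) and rank the 'words'."""
    rng = random.Random(seed)
    symbols = alphabet + " "  # every symbol, including space, is equally likely
    text = "".join(rng.choice(symbols) for _ in range(n_chars))
    counts = Counter(text.split())
    # Descending frequency, ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

table = random_text_rank_frequency()
for rank, (word, freq) in enumerate(table[:8], start=1):
    print(rank, word, freq, rank * freq)
```

Short strings are far more likely than long ones, so frequency falls steeply with rank even though no language is involved. This suggests that a Zipf-like curve alone says little about linguistic structure.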

Applications of NLP
Machine Translation
Meaning in English: six writings

Applications of NLP (cont.)

Information Extraction
E-mail: Hi, we have exam on 5th Jan, 2014 at 12.00 PM.
Make automated calendar entry:
Event: Exam
Date: 5-1-2014
Time: 12:00 PM
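
A very simple version of this extraction step can be sketched with a regular expression. The pattern below is an assumption written for this one e-mail format (event word, "on", day, month, year, "at", time); real messages vary far more, and `extract_calendar_entry` is a hypothetical name.

```python
import re

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

# Hand-written pattern for "<event> on <day> <month>, <year> at <hh>.<mm> AM/PM".
PATTERN = re.compile(
    r"(?P<event>\w+) on (?P<day>\d{1,2})(?:st|nd|rd|th)? "
    r"(?P<month>[A-Za-z]{3})[a-z]*,? (?P<year>\d{4}) at "
    r"(?P<hour>\d{1,2})[.:](?P<minute>\d{2}) ?(?P<ampm>AM|PM)",
    re.IGNORECASE,
)

def extract_calendar_entry(message):
    """Return a dict with event, date and time, or None if nothing matches."""
    m = PATTERN.search(message)
    if m is None:
        return None
    month = MONTHS[m.group("month").lower()]
    return {
        "event": m.group("event").capitalize(),
        "date": f"{int(m.group('day'))}-{month}-{m.group('year')}",
        "time": f"{int(m.group('hour'))}:{m.group('minute')} {m.group('ampm').upper()}",
    }

print(extract_calendar_entry("Hi, we have exam on 5th Jan, 2014 at 12.00 PM."))
```

A hand-written pattern like this breaks as soon as the wording changes, which is one reason information extraction is usually approached with learned models.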

Applications of NLP (cont.)

Unbeatable package of image quality
You really need to spend some quality time going through all the settings before using the Olympus OM-D E-M1.
