Natural Language

Natural Language in AI

INFO 629 Dr. R. Weber


Copyright R. Weber
Outline

• Text-based natural language
• Dialogue-based natural language

Methods in Natural Language Processing

Methods in NLP can be oriented to two categories of tasks:
• NL generation
• NL understanding

Natural Language problems

• dialogue-based
– NL interfaces
– spoken and written communication
– uses natural language understanding
• discourse (any string more than one sentence long)
• text-based
– text categorization, text generation, information extraction, machine translation

Text-Based Natural Language

Text-based NL problems

• story/text understanding;
• information extraction: extracting information from text;
• translating documents, manuals, communications;
• drafting documents;
• summarizing texts;
• text generation, categorization or clustering, text DB retrieval, text mining, topic identification.

Text-based Natural Language Topics

• Information extraction
• Machine translation
• Drafting
• Text summarization

Information Extraction

• Extracting specific types of information from large volumes of unrestricted text;
• The IE system must be given domain guidelines that specify what to find and what to extract;
• IE systems look for the portions of the text that might contain the relevant information;
• IE systems are not required to understand the source text completely.
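
As a minimal sketch of the idea above (not a method taken from the lecture), the domain guidelines can be written as named patterns that say what to look for. The extractor copies matching strings into slots and ignores the rest of the text. The slot names, the patterns, and the sample sentence are all illustrative assumptions.

```python
import re

# Hypothetical domain guidelines: slot name -> pattern describing what to extract.
GUIDELINES = {
    "date": re.compile(r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
                       r"August|September|October|November|December) \d{4}\b"),
    "amount": re.compile(r"\$\d[\d,]*(?:\.\d+)?"),
    "company": re.compile(r"\b[A-Z][A-Za-z]+ (?:Inc|Corp|Ltd)\b"),
}

def extract(text):
    """Return slot -> list of strings copied from the text; no full parse is attempted."""
    return {slot: [m.group(0) for m in pattern.finditer(text)]
            for slot, pattern in GUIDELINES.items()}

# Made-up sentence, used only to exercise the patterns.
sample = ("After months of rumours, Acme Corp agreed on 3 March 1998 "
          "to buy Widget Ltd for $12,500,000 in cash.")
record = extract(sample)
```

The rest of the sentence ("After months of rumours", "in cash") is never analysed, which mirrors the point that an IE system does not need to understand the whole text.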

Types of IE

• Knowledge-based Information Extraction
• Machine learning IE
• Template-based, Wrappers
• Template Mining

Types of IE

Knowledge-based Information Extraction
• Uses linguistic patterns to support the interpretation of input texts.
Machine learning IE
• Uses an inductive learning mechanism to automatically construct a knowledge base of patterns.

Types of IE

Template-based, Wrappers
• IE’s output is a populated database, which can be used as a case base
• The values for the slots are strings from the source text
• The resulting database works as a template
Template Mining
• well suited for areas “where the text is terse and sentences are unambiguous and declarative in nature”.

Relation between IE and NLP
Using linguistic patterns:
• knowledge-based (represents patterns)
• inductive learning based (learns patterns)
• template mining (skips parsing)

• NLP is needed whenever there is a need for disambiguation, and whenever negation or word order makes a difference in meaning

Examples of applications of IE

References on IE

Gaizauskas, R. and Wilks, Y. (1998). Information Extraction: Beyond Document Retrieval. Computational Linguistics and Chinese Language Processing, 3(2), 17-60.
Riloff, E. and Lehnert, W. (1994). Information Extraction as a Basis for High-Precision Text Classification. ACM Transactions on Information Systems, 12(3), 296-333.
Lehnert, W., McCarthy, J., Soderland, S., Riloff, E., Cardie, C., Peterson, J., Feng, F., Dolan, C., and Goldman, S. (1993). UMASS/HUGHES: Description of the CIRCUS System Used for MUC-5. Proceedings of the Fifth Message Understanding Conference, pp. 277-291. San Mateo, CA: Morgan Kaufmann.
Soderland, S. and Lehnert, W. (1994). Wrap Up: a Trainable Discourse Module for Information Extraction. Journal of Artificial Intelligence Research, 2, 131-168.

Natural Language Processing Laboratory Online Information Extraction Bibliography, online at: http://www-nlp.cs.umass.edu/ciir-pubs/tepubs.html

Text-based Natural Language Topics

• Information extraction
• Machine translation
• Drafting
• Text summarization

Can you translate this sentence?

Ever since computers were invented, it has been natural to wonder whether they might be able to learn.

By Tom Mitchell

Describe the steps you used to translate the sentence

List the words you used in the translated sentence and associate them with the ones in the source sentence

Source (English)    Translation (Portuguese)
Ever since          Desde que
computers           computadores
were                foram
invented            inventados
it has been         tem sido
natural             natural
to wonder           imaginar
whether             que
they                eles
might be            sejam
able                capazes de
to learn.           aprender.
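
The alignment above can be read as a small phrase table. The following sketch, which is an illustration and not a description of any real translation system, looks phrases up with a greedy longest-match strategy. The function name, the bracket convention for unknown words, and the decision to lowercase and drop punctuation are assumptions made for brevity.

```python
import re

# Phrase table copied from the slide's alignment (English -> Portuguese), lowercased.
PHRASES = {
    "ever since": "desde que", "computers": "computadores", "were": "foram",
    "invented": "inventados", "it has been": "tem sido", "natural": "natural",
    "to wonder": "imaginar", "whether": "que", "they": "eles",
    "might be": "sejam", "able": "capazes de", "to learn": "aprender",
}
MAX_LEN = max(len(p.split()) for p in PHRASES)

def translate(sentence):
    """Greedy longest-match lookup; unknown words are passed through in brackets."""
    words = re.findall(r"[a-z']+", sentence.lower())
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + n])
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            out.append(f"[{words[i]}]")
            i += 1
    return " ".join(out)

source = ("Ever since computers were invented, it has been natural "
          "to wonder whether they might be able to learn.")
result = translate(source)
```

The lookup succeeds here only because the table was built from this very sentence. It has no notion of word order, agreement, or ambiguity, so any other input quickly produces bracketed gaps or wrong choices.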

Online translators


http://babelfish.altavista.com/babelfish/tr
http://world.altavista.com/tr
http://www.systransoft.com/

• What’s wrong with them?

Can you translate this sentence?

…cursing my head for things that I've said till I finally died, which started the whole world living…

What works?
The KANT project:
• Knowledge-based, Accurate Translation for technical documentation
• founded in 1989
• large-scale, practical translation systems
• for technical documentation
• KANT project homepage: http://www.lti.cs.cmu.edu/Research/Kant/

KANT

• uses a controlled vocabulary and grammar for each language
• explicit yet focused semantic models for each technical domain
• achieves very high accuracy in translation
• multilingual document production
• has been applied to the domains of
– electric power utility management
– heavy equipment technical documentation
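
To make the notion of a controlled vocabulary concrete, here is a small sketch of a checker that flags words outside an approved list and sentences above a length limit. It is not KANT's actual mechanism. The word list, the limit, and the function name are hypothetical.

```python
import re

# Hypothetical controlled vocabulary and length limit for one technical domain.
APPROVED = {"the", "a", "turn", "off", "on", "engine", "before", "you",
            "remove", "replace", "filter", "oil", "check", "level"}
MAX_WORDS = 12

def check(sentence):
    """Return a list of problems; an empty list means the sentence conforms."""
    words = re.findall(r"[a-z]+", sentence.lower())
    problems = [f"unapproved word: {w}" for w in words if w not in APPROVED]
    if len(words) > MAX_WORDS:
        problems.append(f"too long: {len(words)} words (limit {MAX_WORDS})")
    return problems

ok = check("Turn off the engine before you replace the oil filter.")
bad = check("Kill the engine prior to swapping the oil filter.")
```

An author would rewrite the second sentence until the list of problems is empty, so that the translation system only ever sees the restricted language.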

Machine Translation
• Unrestricted MT is still inadequate. Will it ever change?
• Why would MT aim to outperform human translation?
• An alternative is to have humans edit the original document into a subset of the original language (canonical form)
Cost of MT
• lexicons of 20,000-100,000 words
• grammars with 100 to 10,000 rules

Text-based Natural Language Topics

• Information extraction
• Machine translation
• Drafting
• Text summarization

Drafting

• applications in the legal domain
– drafting of wills
– petitions for restraining orders
• use of rhetorical structure

Example Rhetorical Structure

1. Heading: surface features such as date, district, reporter and petition type.
2. Abstract: varies in length; starts after the end of the Heading and ends with two easily identifiable paragraphs: the first describes who applies for the petition and the second presents the result. Three constituent parts: abstract:main, abstract:applicant (who applies for the petition), and abstract:result (the result of the petition).
3. Body: its conclusion is usually the court decision and its foundations. This is where the search for illocutionary expressions takes place. The upper paragraphs describe details of the situation, indicate the laws that categorize the subject, and point to the values that serve as foundation. Constituent parts: foundation and conclusion.
4. Closing: starts with one paragraph about votes, followed by date, place and names of attorneys.

Text-based Natural Language Topics

• Information extraction
• Machine translation
• Drafting
• Text summarization

Summarize text

Describe the steps you used to summarize text

Text summarization applications

• Generate a summary of many documents;
• Generate a summary of one document only;
• Headline generation.

Text summarization
The traditional idea of summarization is to extract sentences and concatenate them.

Human beings produce summaries of documents by creating new sentences that capture the most salient pieces of information in the original document, that are grammatical, and that cohere with one another.

Given that large collections of text/abstract pairs are available online, it is now possible to envision algorithms that are trained to mimic this process.
From Knight, K. and Marcu, D. 2000.
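
The traditional extract-and-concatenate idea can be sketched in a few lines. The scoring rule below (average frequency of a sentence's content words) is one common heuristic chosen for illustration. It is not the trained approach that Knight and Marcu describe, and the stop-word list and sample text are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "it", "that", "on"}

def content_words(sentence):
    """Lowercased words of a sentence, without stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, k=2):
    """Pick the k highest-scoring sentences and concatenate them in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))

doc = ("Parsing recovers the structure of a sentence. "
       "The weather was pleasant yesterday. "
       "A grammar defines the structure a parser may assign to a sentence. "
       "Parsing with a grammar is the first step toward meaning.")
summary = summarize(doc, k=2)
```

The off-topic sentence about the weather scores lowest and is dropped. The output is still a concatenation of original sentences, with none of the rewriting a human summarizer would do.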

Text summarization steps

• Identify most relevant segments;
• Apply rules for deleting redundant parts;
• Compress/aggregate long sentences;
• Assess coherence of segments;
• Revise.

Example

Dialogue-based natural language

Dialogue-based natural language

NL Understanding
• Speech recognition
– intonation, pronunciation, speed
• Natural Language Processing
– syntactic, semantic, pragmatic analysis
Natural Language Generation
– intention, generation, speech synthesis

Speech recognition

• analog signal from voice is digitized
• identify phonemes produced
• template matching attempts to match phonemes from a library of sounds with sounds produced
• outcome is a list of phonemes and probabilities
• find the words using hidden Markov modeling
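
The last step can be illustrated with the Viterbi algorithm, the standard way to find the most probable hidden sequence in a hidden Markov model. In this toy sketch the hidden states stand for phonemes and the observations for labels of matched sound templates. A word-level recognizer would work the same way over a much larger state space. Every name and probability below is invented for illustration.

```python
def viterbi(obs, states, start, trans, emit):
    """Return (best_path, probability) for an observation sequence."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start[s] * emit[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        prev = best[-1]
        row = {}
        for s in states:
            p, back = max((prev[r][0] * trans[r][s], r) for r in states)
            row[s] = (p * emit[s][o], back)
        best.append(row)
    # Follow the back-pointers from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return path[::-1], prob

# Made-up model: two phoneme states and two acoustic template labels.
states = ("ay", "s")
start = {"ay": 0.6, "s": 0.4}
trans = {"ay": {"ay": 0.3, "s": 0.7}, "s": {"ay": 0.4, "s": 0.6}}
emit = {"ay": {"t1": 0.8, "t2": 0.2}, "s": {"t1": 0.1, "t2": 0.9}}
path, prob = viterbi(["t1", "t2", "t2"], states, start, trans, emit)
```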

How to recognize speech

How to wreck a nice beach

Ice cream
I scream

Speech Recognition Methods
• speech recognition can also be implemented with an inductive method such as neural networks
• isolated-word (individual) and continuous recognizers
• a controlled vocabulary can increase the chances of success, e.g., Jupiter
• recognizers are often limited to one speaker; when multiple speakers are needed, retraining may often be necessary
• speech understanding includes speech recognition and understanding of the recognized utterance

Natural Language Understanding

- Syntactic Analysis
- Parsing
- Semantics
- Pragmatics

Syntactic analysis

• a parser recovers the phrase structure of an utterance, given a grammar (rules of syntax)
• parser’s outcome is the structure (groups of words and respective parts of speech)
• phrase structure is represented in a parse tree
• Parsing is the first step towards determining the meaning of an utterance

Parsing
• Parsing: method to analyze a sentence to determine its structure according to the grammar
• Grammar: formal specification of the structures allowable in the language

Examples of Symbols in a Grammar
• (S) sentence
• (NP) noun phrase
• (VP) verb phrase
• (PP) prepositional phrase
• (RelClause) relative clause
• (Det) determiner

Grammar rules

S → NP VP
S → VP VP
S → VP PP
S → NP VP VP
VP → V S
VP → V NP
VP → V PP
VP → V Adjective
NP → Noun
NP → Det Noun
NP → Det Adjective N
NP → Adjective N
PP → P Noun

Dictionary entries:
V → ate
NAME → John
Det(art) → the
N → cat
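
A small top-down parser shows how such rules and dictionary entries turn a sentence into a parse tree. The sketch uses only a subset of the rules. The rule NP → NAME is an assumption added so that "John" can act as a noun phrase, and the function names are hypothetical. The grammar must not be left-recursive for this strategy to terminate.

```python
# Subset of the slide's rules; NP -> NAME is an added assumption.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["NAME"]],
    "VP": [["V", "NP"]],
}
# Dictionary entries from the slide, keyed by lowercased word.
LEXICON = {"ate": "V", "john": "NAME", "the": "Det", "cat": "N"}

def parse(symbol, words, i=0):
    """Yield (tree, next_position) for every way `symbol` can cover words starting at i."""
    if symbol not in GRAMMAR:  # word category: match one word via the dictionary
        if i < len(words) and LEXICON.get(words[i].lower()) == symbol:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        partial = [([], i)]  # (children built so far, position reached)
        for child in rhs:
            partial = [(kids + [tree], k)
                       for kids, j in partial
                       for tree, k in parse(child, words, j)]
        for kids, j in partial:
            yield (symbol, *kids), j

def parse_sentence(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words) if end == len(words)]

trees = parse_sentence("John ate the cat")
```

Each tree is a nested tuple whose first element is the grammar symbol. An ungrammatical word order such as "ate John the" yields an empty list.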

Parse Tree

S
  NP
    Article: The
    Noun: terrain
  VP
    Verb: is
    Adjective: insurmountable


• the outcome of the syntactic analysis can still be a series of alternate structures with respective probabilities
• sometimes grammar rules can disambiguate a sentence, e.g., “John set the set of chairs”
• sometimes they can’t
…the next step is semantic analysis

Semantic analysis

• semantics provide a partial representation of meaning
• represents the sentence in meaningful parts
• uses possible syntactic structures and meaning
• builds a parse tree with associated semantics
• semantics typically represented with logic

Compositional semantics

• The semantics of a phrase is a function of the semantics of its sub-phrases
• It does not depend on any other phrase
• So, if we know the meaning of the sub-phrases, then we know the meaning of the phrase
• “A goal of semantic interpretation is to find a way that the meaning of the whole sentence can be put together in a simple way from the meanings of the parts of the sentence.” (Alison, 1997 p. 112)
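
Compositionality can be shown with a toy interpreter in which the meaning of each tree node is computed only from the meanings of its children. The tree shape, the word meanings, and the logical-form notation are assumptions chosen for illustration. Real systems use richer logics.

```python
# Word meanings: entities are strings; a transitive verb is a curried function
# object -> subject -> logical form. All entries are illustrative.
WORD_MEANING = {
    "John": "john",
    "cat": "cat",
    "the": None,  # the determiner contributes nothing in this toy model
    "ate": lambda obj: lambda subj: f"ate({subj}, {obj})",
}

def meaning(tree):
    """Meaning of a node, computed only from the meanings of its children."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):  # leaf: (category, word)
        return WORD_MEANING[children[0]]
    parts = [m for m in map(meaning, children) if m is not None]
    if label == "S":   # S -> NP VP: apply the VP meaning to the subject
        subject, predicate = parts
        return predicate(subject)
    if label == "VP":  # VP -> V NP: apply the verb meaning to the object
        verb, obj = parts
        return verb(obj)
    return parts[0]    # NP: pass up the meaning of its head word

tree = ("S", ("NP", ("NAME", "John")),
             ("VP", ("V", "ate"), ("NP", ("Det", "the"), ("N", "cat"))))
logical_form = meaning(tree)
```

No node looks outside its own subtree, which is exactly the property the bullets describe.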

Semantic analysis

• the transitivity of a verb enhances the meaning in a parse tree (e.g., jump is intransitive, love is transitive)
– John died Mary
Is there a period missing, or is it:
– John dyed Mary

Pragmatic analysis

• uses context
• uses partial representation
• includes purpose and performs disambiguation
• considers where, when, and by whom an utterance was said

Example using Ontology

– Fred saw the plane flying over Zurich.
– Fred saw the mountains flying over Zurich.
Traditional NL systems will have difficulty resolving this syntactic ambiguity, but because CYC knows that planes fly and mountains do not, it will be able to parse these sentences just as easily as a human.
It's difficult to see how this could be done without relying on a large database of common sense.
http://www.cyc.com/products2.html
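
The following sketch shows, under heavy simplification, how a fragment of common-sense knowledge could settle who is flying in the two sentences. It does not use or imitate the CYC interface. The is-a table, the category names, and the functions are hypothetical.

```python
# Hypothetical fragment of common-sense knowledge: a small is-a hierarchy.
IS_A = {"plane": "aircraft", "aircraft": "flying_thing", "bird": "flying_thing",
        "mountains": "landform", "Fred": "person"}
CAN_FLY = {"flying_thing"}

def can_fly(thing):
    """Walk up the is-a chain until a category known to fly is found."""
    while thing is not None:
        if thing in CAN_FLY:
            return True
        thing = IS_A.get(thing)
    return False

def who_is_flying(subject, obj):
    """Attach 'flying over ...' to the object if it can fly, otherwise to the subject."""
    return obj if can_fly(obj) else subject

first = who_is_flying("Fred", "plane")
second = who_is_flying("Fred", "mountains")
```

The syntax of both sentences is identical. Only the knowledge about planes and mountains changes the attachment.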

Example using Ontology

• Because it includes context, it can take into account a sentence that follows the previous one: The man saw the plane flying over Zurich. It was dark; when he looked up to the sky again, the plane was gone.
• Another interpretation would be given if the following sentence were: The man saw the plane flying over Zurich. He also saw the building where the plane crashed.

Pronoun disambiguation using Ontology

The police arrested the demonstrators because they feared violence.
The police arrested the demonstrators because they advocated violence.
Mary saw the coat in the store window and wanted it.
Mary saw the coat in the store window and pressed her nose up against it.

Communication and Planning
• Deciding what to say relates to planning
• Understanding relates to plan recognition

NLP currently

• logic-based NLP is less accurate
• statistical natural language processing increases accuracy to around 98%
• this is still not good enough: given the average length of a sentence in a newspaper, this accuracy can result in 1 error per sentence
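
The arithmetic behind this remark can be checked with a short calculation. It assumes that the 98% figure is a per-word accuracy and that errors are independent. The slide states neither, so both are assumptions, as are the sentence lengths tried below.

```python
def expected_errors(n_words, accuracy=0.98):
    """Expected number of wrongly analysed words in a sentence of n_words."""
    return n_words * (1 - accuracy)

def prob_at_least_one_error(n_words, accuracy=0.98):
    """Chance that a sentence contains at least one error, assuming independent errors."""
    return 1 - accuracy ** n_words

# Print sentence length, expected errors, and chance of at least one error.
for n in (10, 25, 50):
    print(n, round(expected_errors(n), 2), round(prob_at_least_one_error(n), 2))
```

Under these assumptions the expected number of errors is 0.02 times the sentence length, so an average of one error per sentence corresponds to sentences of about 50 words. Shorter sentences still have a substantial chance of containing at least one error.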

Natural Language Generation
Processes in NL communication
Communication involves three steps by the speaker:
• the intention to convey an idea (what to say)
• the mental generation of words (how to say)
• their synthesis (say it)

what to say
• text planning
– utterances that achieve a goal, may include ordering
• result of reasoning (e.g., retrieval)
• a confirmation or thanks (Jupiter sounds a beep)
• question motivated by need of confirmation
• question motivated by need of missing information

how to say
• how to convert a semantic representation into a sentence
• grammatically correct
• proper choice of words
• in limited problem types, templates are helpful
• e.g., JUPITER says “I have no knowledge of that”
• starts sentences with:
– In (city) (day of the week), chances…
• finishes sentences with:
– “Is there something else?” or “Can I help you with something else?”
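
Template-based generation of this kind can be sketched as string templates with slots. The code below is not JUPITER's implementation. The template wording beyond the phrases quoted above, the frame keys, and the function name are assumptions.

```python
# Hypothetical templates; the opening and closing phrases echo the slide's examples.
TEMPLATES = {
    "rain": "In {city} {day}, chances of rain are {chance} percent.",
    "unknown": "I have no knowledge of that.",
}
CLOSING = "Can I help you with something else?"

def say(frame):
    """Fill the template selected by the frame's topic and append the closing question."""
    template = TEMPLATES.get(frame.get("topic"), TEMPLATES["unknown"])
    try:
        body = template.format(**frame)
    except KeyError:  # a slot is missing: fall back to the safe answer
        body = TEMPLATES["unknown"]
    return f"{body} {CLOSING}"

reply = say({"topic": "rain", "city": "Boston", "day": "Tuesday", "chance": 40})
fallback = say({"topic": "snow"})
```

Templates guarantee grammatical output for the cases they cover, which is why they suit limited problem types. Anything outside those cases falls back to a fixed answer.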

say it!

• speech synthesis
• from words into speech signal
• applications of neural networks
• templates with recordings from humans
• record every word in a dictionary
• record every phoneme (worst choice!)
• JUPITER uses a commercial speech synthesizer

Example
• Nitrogen is a prototype natural language generation system
– that combines symbolic rules with linguistic information gathered statistically from large online text corpora.

– http://www.isi.edu/natural-language/mt/nitrogen/
http://www.mri.mq.edu.au/~peba/MLPeba/system.html
http://cslu.cse.ogi.edu/HLTsurvey/ch4node3.html#SECTION4

JUPITER 1-888-573-8255
http://www.sls.lcs.mit.edu/sls/whatwedo/applications/jupiter.html

"What will the weather be like in Boston tomorrow?" Jupiter invokes the following procedure:
- Speech recognition: SUMMIT converts the spoken sentence into text
- Language understanding: TINA parses the text into a semantic frame -- a grammatical structure containing the basic terms needed to query the Jupiter database
- Language generation: GENESIS uses the semantic frame's basic terms to build a Structured Query Language (SQL) query for the database
- Information retrieval: Jupiter executes the SQL query and retrieves the requested information from the database
- Language generation: TINA and GENESIS convert the query result into a natural language sentence
- Information delivery: Jupiter delivers the generated sentence to the user via voice (using a speech synthesizer) and/or display
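
The procedure above can be mimicked end to end in a toy pipeline that starts from already-recognized text. None of the code below reflects how SUMMIT, TINA, or GENESIS actually work. The function names, the regular expression, the table layout, and the stored forecast are all invented for illustration.

```python
import re
import sqlite3

def understand(text):
    """Toy stand-in for language understanding: text -> semantic frame (a dict)."""
    m = re.search(r"weather .* in (\w+) (today|tomorrow)", text.lower())
    return {"topic": "weather", "city": m.group(1), "day": m.group(2)} if m else None

def to_sql(frame):
    """Build a parameterised SQL query from the frame's basic terms."""
    return ("SELECT forecast FROM weather WHERE city = ? AND day = ?",
            (frame["city"], frame["day"]))

def answer(text, db):
    """Run understanding, retrieval, and generation; return the sentence to deliver."""
    frame = understand(text)
    if frame is None:
        return "I have no knowledge of that."
    row = db.execute(*to_sql(frame)).fetchone()
    if row is None:
        return "I have no knowledge of that."
    return f"In {frame['city'].title()} {frame['day']}, expect {row[0]}."

# Hypothetical in-memory database holding one made-up forecast.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE weather (city TEXT, day TEXT, forecast TEXT)")
db.execute("INSERT INTO weather VALUES ('boston', 'tomorrow', 'light rain')")
reply = answer("What will the weather be like in Boston tomorrow?", db)
```

The semantic frame is the hinge of the design: understanding only has to fill it, and both query building and sentence generation only have to read it.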
