
Digital Enterprise Research Institute

www.deri.ie

Natural Language Processing for the Semantic Web

Paul Buitelaar

Copyright 2009 Digital Enterprise Research Institute. All rights reserved

Semantic Web Challenge: Legacy Data

Linked Data

Unstructured, Un-Linked

Legacy Data

Overview of the Tutorial

Semantic Analysis of Unstructured Legacy Data
  Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning

NLP Layer Cake with Pointers
  Part of Speech Tagging, Morphology, Phrase Structure
  Semantic Tagging: WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies; Advanced Topic: Ontology-Lexicon Interface
  Grammatical Functions, Dependency Structure, Discourse Analysis

Further Relevant Pointers
  General Tools, Organizations, Conferences, Journals, Sites, Lists, ...

What the Tutorial Will Not Address

Machine Learning in/for Text Mining: Feature Extraction in Clustering, Classification
Text Mining in/for Information Retrieval

See Tutorial on Information Mining by Conor Hayes
DERI Stream on Semantic Information Mining brings together Natural Language Processing and Information Mining

Some Example Applications

Semantic Annotation & Search
  MuchMore (2000-2003): Semantic annotation of a set of medical scientific abstracts & patient records as queries across languages (English, German)

Ontology Learning
  OntoLT (2004-2005): Extraction of classes, subclasses and relations (object properties) from a linguistically annotated document set

Ontology-based Information Extraction
  SmartWeb (2005-2008): Extraction of entities and events from a set of football match reports

Semantic Video Browsing
  K-Space (2007-2008): Extraction of entities and events from a set of football match reports, aligned with football video, enabling semantic-level video indexing and browsing

OpenCalais, GIST
  Industrial-strength open source/commercial semantic annotation & retrieval

Semantic Annotation & Search: MuchMore

Ontology Learning: OntoLT

Linguistic Structure 2 Ontology Mapping Rules

Extraction & Inspect Contexts

Extract Ontology Fragments

Linguistic Annotation (fragments)

German Clinical Report: An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal bzw. mit konventioneller Interferenzschraubentechnik femoral fixiert.

English Translation: In 40 human cadaver knees, mid patellar ligament thirds were fixed with a trapezoid bone block on one side on the femoral side in a two-level drill hole, or with a conventional interference screw.

Ontology-based Information Extraction: SmartWeb

Oliver Kahn konnte den Schuss von Beto halten.
(Oliver Kahn could stop the shot by Beto.)

Information Extraction

semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00 [ sportevent#matchEvents -> soba#ID11 ].

Ontology Population

soba#ID11:sportevent#Parry [ sportevent#committedBy -> semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00_Oliver_Kahn_PFP ].

Semantic Video Browsing: K-Space

http://keg.vse.cz/wf/kspace/smil/

Extracted Entities and Events

A/V Feature Analysis

Minute-by-Minute Match Reports

Non-Linear Event and Entity Browsing

Industrial Applications: GIST (CALAIS)

Open Calais Extracts Entities, Facts

With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night after a grueling, history-making campaign that will make him the first African American to head a major-party ticket. Before a chanting, cheering audience in St. Paul, Minn., the first-term Illinois senator savored what once seemed an unlikely outcome to the Democratic race against Sen. Hillary Rodham Clinton. He now faces another hard-fought battle, against Sen. John McCain, the presumptive Republican candidate.

NLP: A Complete Example

He booked the large table in the corner. ... It was still available.

S "He booked the large table in the corner."
  NP "he": Subject, Agent, X
    Pronoun "he": 3rd Person, Animate
  VP "booked the large table in the corner"
    Verb "book": Past, 3rd Person; Head, Predicate
    NP "the large table in the corner": Direct Object, Patient, Definite, Y
      NP "the large table"
        Adjective "large": Modifier
        Noun "table": Singular, Head, furniture_01
      PP "in the corner"
        Preposition "in": Head, Predicate
        NP "the corner": Definite, Z

S "It was still available."
  NP "it": Subject, Patient, Y
    Pronoun "it": 3rd Person, Inanimate
  VP "was still available"
    Verb "is": Past, 3rd Person; Head, Predicate
    AdvP "still available"

NLP Layer Cake

He booked the large table in the corner ...

Tokenization: [table]
Part of Speech Tagging: [table:noun]
Morphological Analysis: [work~ing] [Sommer~schule] [table~s]
Phrases: [[[the][large][table]NP][[in][the][corner]PP]NP]
Semantic Tagging: [table:ARTIFACT, furniture_01]
Dependency Structure: [[the:SPEC][large:MOD][table:HEAD]NP] [[He:SUBJ][booked:PRED][[this][table:HEAD]NP:DOBJ]S]
Discourse Analysis: [He:SUBJ][booked:PRED]this[[table:HEAD]NP:DOBJ:X1] [[It:SUBJ:X1][was:PRED]available]

NLP Layers

Tokenization

Where are the words?

Part of Speech Tagging

Is this word a verb or a noun or something else?

Morphology

Can I split this word up?

Phrase Structure

Do these words go together?

Semantic Tagging

What objects are expressed by the words/phrases in the sentence?

Grammatical Functions & Dependency Structure

Which objects do what? And in relation to which others?

Discourse Analysis

Which events are expressed throughout a text/discourse? How do they interact? And which objects are involved?
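
The first layer, tokenization ("Where are the words?"), can be illustrated with a minimal sketch. The regular expression below is an assumption chosen for simple English text; real tokenizers handle abbreviations, numbers and language-specific conventions that this pattern ignores.

```python
import re

# Words (optionally with an apostrophe clitic such as "I've"),
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("He booked the large table in the corner."))
```

Each later layer (tagging, morphology, phrases, and so on) takes these tokens as its input.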

Part-of-Speech Tagging
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

Annotate each word in a sentence with a part-of-speech (PoS) tag, useful for subsequent syntactic parsing.

The most common PoS tag set for English is the Penn Treebank set of 45 tags, e.g.
  John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN.

Other tag sets are in use for other languages, e.g. the Stuttgart-Tübingen Tag Set (STTS) for German.

The challenge in Part-of-Speech Tagging is ambiguity.

"like" can be
  Verb: I like/VBP candy.
  Preposition: Time flies like/IN an arrow.

"around" can be
  Preposition: I bought it at the shop around/IN the corner.
  Particle: I never got around/RP to getting a car.
  Adverb: A new Prius costs around/RB $25K.
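
The ambiguity of "like" can be made concrete with a small sketch. The toy lexicon and the single hand-written context rule are illustrative assumptions, not how the taggers listed later work; statistical taggers learn such preferences from annotated corpora.

```python
# Toy lexicon: each word maps to the Penn Treebank tags it may take.
# The entries are assumptions covering only the two example sentences.
LEXICON = {
    "i": ["PRP"],
    "like": ["VBP", "IN"],
    "candy": ["NN"],
    "time": ["NN"],
    "flies": ["VBZ", "NNS"],
    "an": ["DT"],
    "arrow": ["NN"],
}


def tag(tokens):
    """Tag tokens left to right, resolving 'like' with one context rule."""
    tags = []
    for token in tokens:
        candidates = LEXICON.get(token.lower(), ["NN"])  # unknown word: guess noun
        choice = candidates[0]
        if token.lower() == "like":
            # Directly after a personal pronoun read 'like' as a verb,
            # otherwise as a preposition.
            choice = "VBP" if tags and tags[-1] == "PRP" else "IN"
        tags.append(choice)
    return list(zip(tokens, tags))


print(tag("I like candy".split()))
print(tag("Time flies like an arrow".split()))
```

A single rule of this kind breaks quickly (for example on "They like candy" if "they" is missing from the lexicon), which is why retrainable taggers are preferred in practice.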

Part-of-Speech Tagging: English PoS
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

Noun (person, place or thing)
  Singular (NN): dog, fork
  Plural (NNS): dogs, forks
  Proper (NNP, NNPS): John, Springfields
  Personal pronoun (PRP): I, you, he, she, it
  Wh-pronoun (WP): who, what

Verb (actions and processes)
  Base, infinitive (VB): eat
  Past tense (VBD): ate
  Gerund (VBG): eating
  Past participle (VBN): eaten
  Non 3rd person singular present tense (VBP): eat
  3rd person singular present tense (VBZ): eats
  Modal (MD): should, can
  To (TO): to (to eat)

Adjective (modify nouns)
  Basic (JJ): red, tall
  Comparative (JJR): redder, taller
  Superlative (JJS): reddest, tallest

Adverb (modify verbs)
  Basic (RB): quickly
  Comparative (RBR): quicker
  Superlative (RBS): quickest

Preposition (IN): on, in, by, to, with

Determiner
  Basic (DT): a, an, the
  WH-determiner (WDT): which, that

Coordinating Conjunction (CC): and, but, or

Particle (RP): off (took off), up (put up)

Part-of-Speech Tagging: Closed vs. Open
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt

Closed class categories are composed of a small, fixed set of grammatical function words for a given language:
  Pronouns (it, he, she, ...)
  Prepositions (on, for, from, to, ...)
  Modals (will, can, may, ...)
  Determiners (a, the)
  Particles (to, up, off)
  Conjunctions (and, or)

Open class categories are composed of large sets of content words and are open to new additions:
  Nouns (a googler)
  Verbs (to google)
  Adjectives (geeky)

Part-of-Speech Tagging: State of the Art

Overview of available PoS taggers:
  http://www-nlp.stanford.edu/links/statnlp.html#Taggers

Many are widely used (often retrainable); a very small selection:
  TreeTagger, decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
  TnT, Thorsten Brants, HMM, free research license: http://www.coli.uni-saarland.de/~thorsten/tnt/
  ENGCG, lexicon & rules, commercial (LingSoft): http://www2.lingsoft.fi/cgi-bin/engcg

Morphological Analysis
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt

Some definitions
  Morphological Analysis: split up words into component morphemes and build a (formal) representation of word-internal structure
  Morpheme: minimal meaning-bearing unit in a language
  Stem: morpheme that forms the central meaning unit in a word
  Affix: word element that can only occur attached to a stem
    Prefix: specific → unspecific (English)
    Suffix: wonder → wonderful (English)
    Infix: hingi → humingi (Tagalog)
    Circumfix: sagen → gesagt (German)

Morphological complexity varies between languages
  Isolating languages (no morphology): e.g., Chinese
  Morphologically poor languages: e.g., English
  Morphologically complex languages: e.g., Turkish

Morphological Analysis: Overview, Ambiguity
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt

Inflection: stem + morpheme (same PoS class)
  writing → write + V + Progressive
  books → book + N + Plural
  writes → write + V + 3rd Person + Singular
  flies → fly + N + Plural, or fly + V + 3rd Person + Singular

Derivation: stem + morpheme (different PoS class)
  civil → civilized → civilization

Compounding: multiple stems
  cabdriver → cab + driver
  doghouse → dog + house
  Flachbildschirm (flat screen) → flach + Bildschirm (flat screen), Flachbild + Schirm (flat view screen), or flach + Bild + Schirm (flat picture screen)

Cliticization: stem + clitic
  I've → I + have
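
The inflection examples, including the two readings of "flies", can be reproduced with a minimal sketch. The suffix rules and the three-entry stem lexicon are illustrative assumptions for these examples only; analyzers such as those listed on the next slide use full lexicons and rule systems.

```python
# Toy suffix rules: (suffix, replacement, analysis features).
# The first feature is the PoS the stem must have.
RULES = [
    ("ies", "y", "N + Plural"),
    ("ies", "y", "V + 3rd Person + Singular"),
    ("s", "", "N + Plural"),
    ("s", "", "V + 3rd Person + Singular"),
    ("ing", "e", "V + Progressive"),
    ("ing", "", "V + Progressive"),
]

# Toy stem lexicon: stem -> PoS classes it belongs to (assumed entries).
STEMS = {"fly": {"N", "V"}, "book": {"N", "V"}, "write": {"V"}}


def analyze(word):
    """Return every (stem, features) analysis licensed by rules and lexicon."""
    analyses = []
    for suffix, replacement, features in RULES:
        if not word.endswith(suffix):
            continue
        stem = word[: len(word) - len(suffix)] + replacement
        pos = features.split(" + ")[0]
        if pos in STEMS.get(stem, set()):
            analyses.append((stem, features))
    return analyses


for word in ("writing", "writes", "flies"):
    print(word, analyze(word))
```

Because the lexicon lists "book" as both noun and verb, this sketch also returns two analyses for "books"; choosing between them is left to PoS tagging in context.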

Morphological Analysis: State of the Art

Overview (selective, not up to date) of Morphological Analyzers
  http://www.semarweb.com/morphology.html

Morphology is relatively simple in English
  Porter stemmer may sometimes be sufficient (strips words: e.g., writes, written, writer → writ), e.g., http://drupal.org/project/porterstemmer

Morphological resources for EN & other languages, selection:
  MMORPH, lexicon & rules, free license (UNIX): http://packages.debian.org/etch/mmorph
  Dutch (MBLEM), lexicon & instance-based classifier, free research license?: http://ilk.uvt.nl/mblem/
  Arabic, commercial (Xerox): http://www.xrce.xerox.com/competencies/content-analysis/arabic/
  German (GERTWOL), commercial (LingSoft): http://www2.lingsoft.fi/cgi-bin/gertwol

Phrase Structure Analysis: Definitions
With input from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt

Phrase
  Group of words that functions as a single unit in syntax (Wikipedia)
  NP: Noun Phrase (the car, a clever student)
  VP: Verb Phrase (study hard, play the guitar)
  PP: Prepositional Phrase (in the class, above the earth)
  AP: Adjective Phrase (very tall, incredibly large)

Phrase Structure Analysis
  Breaking up a sentence into recursively defined coherent units (constituents), e.g., an NP consisting of several NPs
  First step in sentence parsing (see also further NLP layers)

Chunks
  Non-recursive phrases, as introduced by the shallow parsing approach

Chunking
  Also known as shallow parsing (without overall sentence structure & grammatical functions; see also further NLP layers)

Phrase Structure Analysis: NP, PP Example
Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt

NP → (Det) N
  [[the:Det][bus:N]NP]

PP → P NP
  [[in:P][[the:Det][yard:N]NP]PP]

NP → (Det) N (PP)
  [[the:Det][bus:N][[in:P][[the:Det][yard:N]NP]PP]NP]

Phrase Structure Analysis: VP Example

VP → V (NP) (PP)
  [[took:V][[the:Det][money:N]NP][[from:P][[the:Det][bank:N]NP]PP]VP]
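
The three rules (NP: optional Det, N, optional PP; PP: P followed by NP; VP: V, optional NP, optional PP) can be turned into a small recursive-descent sketch over PoS-tagged tokens. The tree encoding as `(label, children)` tuples and the `allow_pp` flag are assumptions of this sketch. PP attachment is genuinely ambiguous; inside a VP the sketch attaches the PP to the VP, which is the analysis drawn for "took the money from the bank".

```python
def parse_np(tagged, i, allow_pp=True):
    """NP -> (Det) N (PP). Returns (tree, next_index) or None."""
    children = []
    if i < len(tagged) and tagged[i][1] == "Det":
        children.append(tagged[i])
        i += 1
    if i >= len(tagged) or tagged[i][1] != "N":
        return None
    children.append(tagged[i])
    i += 1
    if allow_pp:
        pp = parse_pp(tagged, i)
        if pp is not None:
            children.append(pp[0])
            i = pp[1]
    return ("NP", children), i


def parse_pp(tagged, i):
    """PP -> P NP. Returns (tree, next_index) or None."""
    if i >= len(tagged) or tagged[i][1] != "P":
        return None
    np = parse_np(tagged, i + 1)
    if np is None:
        return None
    return ("PP", [tagged[i], np[0]]), np[1]


def parse_vp(tagged, i):
    """VP -> V (NP) (PP), with the PP attached to the VP, not the object NP."""
    if i >= len(tagged) or tagged[i][1] != "V":
        return None
    children = [tagged[i]]
    i += 1
    np = parse_np(tagged, i, allow_pp=False)
    if np is not None:
        children.append(np[0])
        i = np[1]
    pp = parse_pp(tagged, i)
    if pp is not None:
        children.append(pp[0])
        i = pp[1]
    return ("VP", children), i


sentence = [("took", "V"), ("the", "Det"), ("money", "N"),
            ("from", "P"), ("the", "Det"), ("bank", "N")]
print(parse_vp(sentence, 0))
```

Calling `parse_np` on "the bus in the yard" with the default `allow_pp=True` yields the nested NP with an embedded PP.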

Phrase Structure Analysis: State of the Art

Overview of Parsers (including shallow parsing) for English
  http://www.aclweb.org/aclwiki/index.php?title=Parsers_for_English

Overview for other languages
  http://www.aclweb.org/aclwiki/index.php?title=List_of_resources_by_language

Some shallow parser demos on the web:
  TreeTagger (PoS, Chunking), decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
  CNTS Memory Based Shallow Parser (Univ. of Antwerpen), classifier, license?: http://www.cnts.ua.ac.be/cgi-bin/jmeyhi/MBSP-instant-webdemo.cgi
  Univ. of Illinois at Urbana-Champaign, classifier, license?: http://l2r.cs.uiuc.edu/~cogcomp/shallow_parse_demo.php

Semantic Tagging

Definition and History
  Classification of words, phrases with a semantically defined category
  Nowadays associated with Semantic Web (semantic annotation, knowledge markup) and Web 2.0 tagging
  In NLP refers to assigning a sense to a word or phrase

Sense sets defined by
  Originally, machine readable dictionaries, e.g., LDOCE
  Recent years, wordnets (nouns), framenets (verbs)
  Increasingly, general & domain ontologies

Semantic Tagging: WordNet

WordNet is a Semantic Lexicon & Lexical Database
  Organized around meaning rather than word forms
  Maps words to meanings/interpretations or senses
  Senses are represented by synsets (sets of synonyms), e.g.,
    {board, plank}: piece of lumber
    {board, committee}: group of people

Machine readable (has a formal structure)
  Freely downloadable: http://wordnet.princeton.edu/

Integrated wordnets for several European languages
  EuroWordNet: http://www.illc.uva.nl/EuroWordNet/

Wordnets for many languages with interoperable format
  http://www.globalwordnet.org/gwa/wordnet_table.htm

Semantic Tagging: WordNet Origin

In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database.

The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically.

WordNet instantiates hypotheses based on results of psycholinguistic research:
  "In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter 'apple', even though they will reject such suggestions as 'shoe' or 'banana', and will recognize that 'apple' is correct when it is provided." (Caramazza/Berndt 1978)

Expose such hypotheses to the full range of the common vocabulary.

Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. Introduction to WordNet: an on-line lexical database. In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.

Semantic Tagging: Synsets, Senses

Synsets represent different Senses

Words that occur in several synsets have a corresponding number of senses, i.e. are ambiguous.

Semantic Tagging: Ambiguity (cont.)

Homonymy
  Unrelated senses, e.g.
    The ball went over the fence. (artifact)
    The ball went on into the late hours. (event)

Systematic Polysemy
  Related senses, e.g.
    The Boston office has been newly decorated. (building)
    The Boston office was founded in 1985. (organization)
    The Boston office called. (group-of-people)
  Also referred to in the literature as regular polysemy (Apresjan 1973) or logical polysemy (Pustejovsky 1991, 1995); the term systematic polysemy was introduced by Nunberg & Zaenen (1992); see also Bierwisch 1983 (school example), Hobbs et al. 1993 (office example)

Semantic Tagging: Synset Hierarchy

Synsets are organized in hierarchies, defining:
  generalization (hypernymy)
  specialization (hyponymy)

Example (hypernymy reads upward, hyponymy downward)
  {entity}
    {whole, unit}
      {building material}
        {lumber, timber}
          {board, plank}
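
The synset and hierarchy ideas can be sketched with a hand-coded fragment. The data below is copied from the examples on these slides and is not loaded from WordNet itself; the function names are hypothetical.

```python
# Hand-coded hypernym links for the example chain (child synset -> parent synset).
HYPERNYM = {
    ("board", "plank"): ("lumber", "timber"),
    ("lumber", "timber"): ("building material",),
    ("building material",): ("whole", "unit"),
    ("whole", "unit"): ("entity",),
}

# All synsets known to this toy lexicon, including the second sense of "board".
SYNSETS = [
    ("board", "plank"),
    ("board", "committee"),
    ("lumber", "timber"),
    ("building material",),
    ("whole", "unit"),
    ("entity",),
]


def senses(word):
    """A word has one sense per synset it occurs in."""
    return [synset for synset in SYNSETS if word in synset]


def hypernym_path(synset):
    """Follow hypernym links upward until no more general synset is recorded."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


print(senses("board"))
print(hypernym_path(("board", "plank")))
```

Here "board" is ambiguous because it occurs in two synsets, and only the {board, plank} sense has a recorded path up to {entity}.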

Semantic Tagging: Hierarchy Example

Semantic Tagging: FrameNet

FrameNet is a Semantic Lexicon & Lexical Database
  Categorizes verbs and their syntactic/semantic arguments
  Based on frame semantics theory (Fillmore 1968)
  Frame describes a particular situation, object, or event and the participants, properties involved, e.g.
    Frame: apply_heat
    Frame Elements (or Roles): cook, food, heating_instrument
    Lexical Units (evoking the Frame): bake, boil, brown, simmer, steam, ...

Freely available from: http://framenet.icsi.berkeley.edu/
  10,000 Lexical Units (more than 6,100 fully annotated)
  Distributed over more than 825 semantic Frames
  Exemplified in more than 135,000 annotated sentences

Semantic Tagging: Frame Ambiguity

Frame arranging: Agent puts a complex Theme into a particular Configuration
  David arranged the stones in a circle.
  arrange.v, arrangement.n, array.v, deploy.v, deployment.n, format.v, setup.v

Frame placing: Agent places a Theme at a location, the Goal
  David arranged his briefcase on the floor.
  archive.v, arrange.v, bag.v, bestow.v, billet.v, bin.v, bottle.v, box.v, brush.v, cage.v, cram.v, crate.v, dab.v, daub.v, deposit.v, drape.v, drizzle.v, dust.v, embed.v, emplace.v, file.v, garage.v, hang.v, heap.v, immerse.v, implant.v, inject.v, insert.v, insertion.n, jam.v, lay.v, lean.v, load.v, lodge.v, mount.v, pack.v, package.v, park.v, perch.v, pile.v, place.v, placement.n, plant.v, plunge.v, pocket.v, position.v, pot.v, put.v, rest.v, rub.v, set.v, sheathe.v, shelve.v, shoulder.v, shower.v, sit.v, situate.v, smear.v, sow.v, stable.v, stand.v, stash.v, station.v, stick.v, stow.v, stuff.v, tuck.v, warehouse.v, wrap.v
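
Frame ambiguity amounts to one lexical unit being listed under several frames. The sketch below stores a few frames as plain records and looks up the frames evoked by a lexical unit. The `Frame` class, the abbreviated lexical-unit lists and the `.v` suffixes on the apply_heat units are assumptions of this sketch and do not reflect the FrameNet data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str
    roles: tuple
    lexical_units: frozenset


# Abbreviated frames taken from the slide examples.
FRAMES = [
    Frame("arranging", ("Agent", "Theme", "Configuration"),
          frozenset({"arrange.v", "arrangement.n", "array.v", "deploy.v"})),
    Frame("placing", ("Agent", "Theme", "Goal"),
          frozenset({"arrange.v", "place.v", "put.v", "lay.v"})),
    Frame("apply_heat", ("cook", "food", "heating_instrument"),
          frozenset({"bake.v", "boil.v", "simmer.v", "steam.v"})),
]


def evoked_frames(lexical_unit):
    """Return the names of all frames that list the given lexical unit."""
    return [frame.name for frame in FRAMES if lexical_unit in frame.lexical_units]


print(evoked_frames("arrange.v"))
print(evoked_frames("bake.v"))
```

For "arrange.v" the lookup returns two frames, so a Semantic Role Labelling system has to pick one from context before it can assign roles such as Configuration or Goal.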

Semantic Tagging: Word Sense

Word Sense Disambiguation
  Assignment of the correct sense to a word
  Based on wordnets & similar resources for many languages
  Sense-annotated corpora enable classifier training
  No longer a very active area of research in the NLP community
  Annotated corpora, tools, evaluation data sets available from SenseVal (1-4) evaluation campaigns: http://www.senseval.org/

Recently attention turned to Semantic Role Labelling and a variety of other tasks in Computational Lexical Semantics
  see SemEval evaluation campaign: http://semeval2.fbk.eu/

Semantic Tagging: Semantic Roles

Semantic Role Labelling
  Assignment of the correct frame category (sense) to a verb and of semantic roles to its syntactic arguments
  Based on the availability of FrameNet and similar resources, e.g.
    PropBank: http://verbs.colorado.edu/~mpalmer/projects/ace.html
    NomBank: http://nlp.cs.nyu.edu/meyers/NomBank.html
    VerbNet: http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
    SemLink: http://verbs.colorado.edu/semlink/
    OntoNotes: http://www.bbn.com/ontonotes/
    German FrameNet: http://www.coli.uni-saarland.de/projects/salsa
  Frame-annotated corpora enable classifier training
  Recently a very active area of research in the NLP community

Semantic Tagging: State of the Art

Word Sense Disambiguation tools, selection
  WSD tools by Ted Pedersen (University of Minnesota, Duluth), free: http://sourceforge.net/projects/wsdgate/ & others
  SenseLearner, Rada Mihalcea (Univ. of North Texas), free: http://www.cse.unt.edu/~rada/downloads.html#senselearner
  SuperSenseTagger, SemTechLab Rome?, license?: http://sourceforge.net/projects/supersensetag/

Semantic Role Labelling tools, selection
  Shalmaneser (Saarland Univ.), pluggable parsing & classifiers, free license: http://www.coli.uni-saarland.de/projects/salsa/shal/
  Univ. of Illinois at Urbana-Champaign, parsing & classifiers, license?: http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
  SWIRL (Universitat Politecnica de Catalunya), parsing & classifiers, GPL license: http://www.surdeanu.name/mihai/swirl

Semantic Tagging: Lexical Inference

Metonymy (part stands for whole)
  The Boston office called.
  "to call" expects an object of type Human in Agent position
  coerce "office" into an object of type (Group-of) Person > Human
  lexical semantic inference: Person Work-at Office

Diagram: classes Office, Building, Organization and Person, connected by the relations Located-at, Has-address, Representation-of, Work-for and Work-at; the word "office" is attached to the class Office.

Semantic Tagging: Lexical Inference

Metonymy in Bridging (of discourse referents)
  Peter bought a car. The engine runs well.
  "the engine" refers to an already introduced object (discourse referent)
  lexical semantic inference: Engine Part-of Car

Diagram: Engine Part-of Car, Car Has-part Engine; the word "car" is attached to the class Car.
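
The bridging inference can be sketched as a lookup in a part-whole table: a definite noun phrase with no direct antecedent is linked to an earlier discourse referent that has it as a part. The `HAS_PART` table contains only the Car/Engine fact from the example, and the function name is hypothetical.

```python
# Lexical semantic knowledge: whole -> known parts (single fact from the example).
HAS_PART = {"car": {"engine"}}


def bridge(definite_noun, discourse_referents):
    """Return earlier referents that the definite noun can be a part of."""
    return [
        referent
        for referent in discourse_referents
        if definite_noun in HAS_PART.get(referent, set())
    ]


# "Peter bought a car. The engine runs well."
print(bridge("engine", ["peter", "car"]))
```

With a larger part-whole resource the same lookup may return several candidate wholes, which then have to be ranked by discourse context.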

Semantic Tagging: Ontologies

Diagram: an ontology with the classes Student, University, Campus, School and Staff and the relations studies_at, located_at, works_at and is_part_of; the classes School and Staff carry the labels "school" and "staff".

Semantic Tagging: Classes, Terms

RDF(S) & OWL current status

Diagram: the same ontology (classes Student, University, Campus, School, Staff; relations studies_at, located_at, works_at, is_part_of), with language-specific term properties: has_US-English_term "School", has_German_term "Fakultät", has_Dutch_term "Faculteit".

Semantic Tagging: Lexicalized Ontologies

Diagram (LingInfo model): nodes Mucosa, OralMucosa, LingInfo, WordForm, Term-1, Term-2, Term-3 and WordForm-1; properties hasLingInfo, instanceOf, hasMorphSynInfo, hasOrthographicForm, hasLang (DE), hasPoS (N) and hasStem; orthographic forms Mundschleimhaut, Mund and Schleimhaut.

http://olp.dfki.de/LingInfo/
http://ontoware.org/projects/lexonto/

Semantic Tagging: Terms, Classes

Semantic tagging beyond word senses & semantic roles
  Terms, Classes, Relations, Properties/Attributes
  Names

Terms, Classes, Relations, Properties/Attributes
  Semantic annotation on the basis of a thesaurus or ontology
  Term recognition & extraction
    terms are domain-specific phrases
  Relation extraction
    relations are domain-specific semantic roles
  Ontology-based information extraction

Semantic Tagging: Terms, Relations
With input from: http://www.lrec-conf.org/proceedings/lrec2008/slides/496.ppt

Terms/Classes & Relations in genetics domain
  GENIA corpus: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=GENIA+corpus
  Examples
    Interleukin 1 beta inhibits insulin secretion
    IL-1beta is known to inhibit insulin secretion
    Insulin secretion is inhibited by IL-1 beta
  Term recognition & extraction
  Grammatical function annotation: subject, direct object, etc. (see further NLP layers)

GENIA Relation: inhibit

GENIA Term (Class)              Semantic Role   Grammatical Function
interleukin 1 beta, IL-1beta    Agent           Subject
insulin secretion               Target          Direct Object
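
The three example sentences express the same relation with different surface forms. The sketch below normalizes them to one (relation, Agent, Target) triple. The two regular expressions and the variant table are assumptions that cover only these sentences; a real system would rely on term recognition and grammatical function annotation instead of string patterns.

```python
import re

# Surface variants mapped to a canonical term (assumed, from the examples only).
TERM_VARIANTS = {
    "interleukin 1 beta": "interleukin 1 beta",
    "il-1beta": "interleukin 1 beta",
    "il-1 beta": "interleukin 1 beta",
    "insulin secretion": "insulin secretion",
}

# Passive: the grammatical subject is the Target. Active: the subject is the Agent.
PASSIVE = re.compile(r"^(?P<target>.+?) is inhibited by (?P<agent>.+)$")
ACTIVE = re.compile(r"^(?P<agent>.+?) (?:inhibits|is known to inhibit) (?P<target>.+)$")


def extract(sentence):
    """Return ('inhibit', agent, target) with canonical terms, or None."""
    text = sentence.strip().rstrip(".")
    for pattern in (PASSIVE, ACTIVE):
        match = pattern.match(text)
        if match is None:
            continue
        agent = TERM_VARIANTS.get(match.group("agent").lower())
        target = TERM_VARIANTS.get(match.group("target").lower())
        if agent and target:
            return ("inhibit", agent, target)
    return None


for sentence in (
    "Interleukin 1 beta inhibits insulin secretion",
    "IL-1beta is known to inhibit insulin secretion",
    "Insulin secretion is inhibited by IL-1 beta",
):
    print(extract(sentence))
```

All three sentences map to the same triple, which is the point of combining term normalization with role assignment.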

Semantic Tagging: Terms, Classes

Eurovoc Thesaurus
Terminology in all EU languages on all EU areas: politics, trade, law, science, energy, agriculture, ...
  MT    3606 natural and applied sciences
  UF    gene pool, genetic resource, genotype, heredity
  BT1   biology
  BT2   life sciences
  NT1   DNA
  RT    genetic engineering (6411)

Medical Subject Headings (MeSH)
Thesaurus with taxonomy of ~250,000 terms, representing medical subjects for retrieval purposes
  MeSH Heading   Databases, Genetic
  Entry Term     Genetic Databases
  Entry Term     Genetic Sequence Databases
  Entry Term     OMIM
  Entry Term     Mendelian Inheritance in Man
  Entry Term     Genetic Data Banks
  Entry Term     Genetic Data Bases
  Entry Term     Genetic Information Databases
  See Also       Genetic Screening

Gene Ontology
  Accession      GO:0009292
  Synonyms       broad: genetic exchange
  Term Lineage   all : all (164142)
                   GO:0008150 : biological process (115947)
                     GO:0007275 : development (11892)
                       GO:0009292 : genetic transfer (69)

Semantic Tagging: Names

Semantic tagging beyond word senses & semantic roles
  Terms, Classes, Relations, Properties/Attributes
  Names

Semantic annotation of names
  Named Entity Recognition
  Originally intended as extension of Tokenization, e.g. in recognizing Names and other specific tokens such as Dates, Times
  Evolved into a more general identification and classification of names of People, Organisations, Companies, Countries, Cities, etc.
  Currently merging with ontology-based information extraction

Semantic Tagging: State of the Art (cont.)

Named Entity Recognition
  Good overview of many available tools: http://en.wikipedia.org/wiki/Named_entity_recognition

Semantic annotation with thesauri, ontologies in various domains, e.g.,
  Annotate biomedical text with UMLS Metathesaurus
    MetaMap (US National Library of Medicine), free license: http://mmtx.nlm.nih.gov/
  Annotate business text with KIM ontology
    KIM (Ontotext), free research license: http://www.ontotext.com/kim/
  Annotate football (soccer) text with SWIntO ontology
    SProUT (DFKI), free research license: http://www.dfki.de/sw-lt/heartofgold/ (web demo)

Parsing: Overview

Parsing
  "parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given grammar" (Wikipedia)

Shallow parsing (discussed above) provides
  Part of Speech tags
  Non-recursive phrases (chunks)

Full (or deep) parsing provides on top of this
  complete syntactic structure in terms of interconnected recursive phrases
    Constituent structure
    and/or Clause structure
    and/or Dependency structure
  predicate (mostly a verb) and one or more syntactic arguments (phrases)
  grammatical functions for predicate arguments: subject, direct object, head-modifier analysis, semantic roles

Parsing: Full Parse Example

He booked the large table in the corner.

S
  NP "he": Subject, Agent
    Pronoun "he": 3rd person, Animate
  VP "booked the large table in the corner"
    Verb "book": Past, 3rd person; Head, Predicate
    NP "the large table in the corner": Direct Object, Patient
      NP "the large table"
        Adjective "large": Modifier
        Noun "table": Singular, Head, furniture_01
      PP "in the corner"
        Preposition "in": Head, Predicate
        NP "the corner"

Parsing: Full Parse Example (layers)

The same parse tree for "He booked the large table in the corner." is shown with successive layers highlighted:
  Part of Speech, Morphology
  Phrases
  Predicates, Grammatical Functions
  Semantic Tags, Semantic Roles (the PP "in the corner" is labelled Modifier from this step on)
  Head-Modifier Analysis

Parsing: Dependency Structure

He booked the large table in the corner.

book (Verb; Past, 3rd person; Head, Predicate)
  Agent: he (Pronoun; 3rd person; Animate)
  Patient: table (Noun; Singular; Head; furniture_01)
    Size: large (Adjective; Modifier)
    Location: corner (Noun; Singular; Modifier)

Parsing: Dependency Structure for IE

He booked the large table in the corner.

book [Verb; Past, 3rd person; Head; Predicate]
  Agent: he [Pronoun; 3rd person; Animate]
  Patient: table [Noun; Singular; Head; furniture_01]
    Size: large [Adjective; Modifier; Size]
    Location: corner [Noun; Singular; Modifier; Location]

Source      Class + Properties   Extracted Objects & Values
Predicate   Booking              x, y
Agent       Booking-sponsor      Male(x)
Patient     Booking-order        Table(y), Size(large), Location(y,z), Corner(z)
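
The step from dependency structure to extracted objects and values can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tutorial's implementation: the class and property names (`Booking`, `Booking-sponsor`, `Booking-order`) come from the slide, while the triple encoding, the two mapping tables, and the `extract_booking` function are hypothetical.

```python
# Dependency structure from the slide as (head, relation, dependent) triples.
DEPENDENCIES = [
    ("book", "Agent", "he"),
    ("book", "Patient", "table"),
    ("table", "Size", "large"),
    ("table", "Location", "corner"),
]

# Assumed mappings from the predicate to an ontology class,
# and from its semantic roles to properties of that class.
PREDICATE_TO_CLASS = {"book": "Booking"}
ROLE_TO_PROPERTY = {"Agent": "Booking-sponsor", "Patient": "Booking-order"}


def extract_booking(dependencies, predicate="book"):
    """Fill the properties of the class evoked by `predicate` from its role fillers."""
    instance = {"class": PREDICATE_TO_CLASS[predicate]}
    for head, relation, dependent in dependencies:
        if head == predicate and relation in ROLE_TO_PROPERTY:
            # Attach the filler's own dependents (here Size and Location) as attributes.
            attributes = {rel: dep for h, rel, dep in dependencies if h == dependent}
            instance[ROLE_TO_PROPERTY[relation]] = {"value": dependent, **attributes}
    return instance


booking = extract_booking(DEPENDENCIES)
print(booking)
```

The design choice here is that semantic roles, not surface positions, select the property to fill, so a passive paraphrase with the same roles would yield the same instance.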


Parsing: State of the Art

Widely-used parsers

MINIPAR, Dekang Lin, free research license
  Download: http://www.cs.ualberta.ca/~lindek/minipar.htm
  Web demo: http://dbis.nankai.edu.cn/miniparweb/

Stanford Parser, Klein/Manning, free research license
  http://nlp.stanford.edu/software/lex-parser.shtml
  Web demo: http://nlp.stanford.edu:8080/parser/

RASP Parser (Sussex Univ.), Briscoe/Carroll, free research license
  http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/

Link Grammar Parser (CMU), Temperley et al., free license
  http://www.link.cs.cmu.edu/link/
  Web demo: http://nlp.stanford.edu:8080/parser/


Discourse Analysis: Anaphora Resolution
With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt

Linking event participants (semantic role fillers) within and across sentences, i.e., an anaphor is linked back to a discourse referent that serves as its antecedent, e.g.,

He bought a bottle of wine, sat down on a stone, and drank it.

"he" and "it" are anaphors
"a bottle of wine" and "a stone" introduce discourse referents
"it" can be linked back to the antecedent "a bottle of wine" or "a stone"


Discourse Analysis: Anaphora Resolution

He booked the large table in the corner.

S
  NP [Subject, Agent; X]: he
    Pronoun [3rd Person; Animate]: he
  VP: booked the large table in the corner
    Verb [Past, 3rd Person; Head; Predicate]: book
    NP [Direct Object, Patient; Definite; Y]: the large table in the corner
      NP: the large table
        Adjective [Modifier]: large
        Noun [Singular; Head; furniture_01]: table
      PP: in the corner
        Preposition [Head; Predicate]: in
        NP [Definite; Z]: the corner

...

It was still available.

S
  NP [Subject, Patient; Y]: it
    Pronoun [3rd Person; Inanimate]: it
  VP: was still available
    V [Past, 3rd Person; Head; Predicate]: is
    AdvP: still available
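
In this example the pronoun "it" carries the same referent label (Y) as "the large table in the corner". One way such a link could be computed is to filter candidate antecedents by agreement features and then rank them by salience. The sketch below is a deliberately naive illustration, not the method of any tool named in this tutorial: the `Mention` class, the `resolve` function, and the salience ranking (Subject over Direct Object over everything else) are assumptions; only the features and the labels X, Y, Z come from the slide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    referent: str   # discourse referent label from the slide: "X", "Y" or "Z"
    animate: bool   # semantic tag: Animate / Inanimate
    function: str   # grammatical function of the mention


# Assumed salience ranking by grammatical function (higher wins).
SALIENCE = {"Subject": 3, "Direct Object": 2, "Other": 1}


def resolve(pronoun_animate: bool, candidates: List[Mention]) -> Optional[Mention]:
    """Pick the most salient candidate whose animacy agrees with the pronoun."""
    compatible = [m for m in candidates if m.animate == pronoun_animate]
    if not compatible:
        return None
    return max(compatible, key=lambda m: SALIENCE.get(m.function, 0))


# Discourse referents introduced by "He booked the large table in the corner."
candidates = [
    Mention("he", "X", True, "Subject"),
    Mention("the large table in the corner", "Y", False, "Direct Object"),
    Mention("the corner", "Z", False, "Other"),
]

# "It was still available.": "it" is a 3rd person inanimate pronoun.
antecedent = resolve(pronoun_animate=False, candidates=candidates)
print(antecedent.referent)  # Y
```

Animacy rules out "he"; the salience ranking then prefers the direct object over the noun phrase embedded in the PP. A plain most-recent-mention rule would instead pick "the corner" (Z), which is why some notion of salience is needed.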


Discourse Analysis: Discourse Structure
With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt

Linking events in terms of temporal sequence, causality, etc., e.g.,

John bought a Mercedes, so Bill leased a BMW. (temporal sequence)
John hid Bill's car keys as he had drunk too much. (causality)


Discourse Analysis: State of the Art

No readily available black-box tools
Anaphora resolution is often built-in functionality in NER, parsing, etc.
To experiment with discourse referents, anaphora resolution, etc., try out e.g. Boxer
  Johan Bos, Univ. of Rome
  http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer


Overview of the Tutorial

Semantic Analysis of Unstructured Legacy Data
  Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning

NLP Layer Cake with Pointers
  Part of Speech Tagging, Morphology, Phrase Structure
  Semantic Tagging: WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies
  Advanced Topic: Ontology-Lexicon Interface
  Grammatical Functions, Dependency Structure, Discourse Analysis

Further Relevant Pointers
  General Tools, Organizations, Conferences, Journals, Sites, Lists


Further Relevant Pointers: General Tools

GATE, Univ. of Sheffield
  "Eclipse of Natural Language Engineering"
  http://gate.ac.uk/

UIMA, IBM / Open Source
  "Open, Industrial-Strength Platform for Unstructured Information Analysis and Search"
  http://incubator.apache.org/uima/

NLTK (Natural Language Toolkit), Melbourne Univ. ?
  Open source Python modules for research and development in natural language processing
  Book (June 2009): Natural Language Processing with Python
  http://www.nltk.org/

MBT: Memory-based tagger-generator and tagger, Univ. of Tilburg/Antwerpen
  Can generate a sequence tagger on the basis of a training set of tagged sequences
  http://ilk.uvt.nl/mbt/

SProUT, DFKI
  Platform for development of multilingual shallow text processing and information extraction systems
  http://sprout.dfki.de/
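
To illustrate what "generating a sequence tagger from a training set of tagged sequences" can mean, here is a much-simplified Python sketch. It is not MBT's algorithm or API: it only memorizes which tag was seen for a word after a given previous tag, and backs off to the word's most frequent tag and then to the overall most frequent tag. The function names, the tag labels, and the toy training data are all hypothetical.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Memorize tag counts per (previous tag, word) context and per word."""
    context_memory = defaultdict(Counter)  # (previous tag, word) -> tag counts
    word_memory = defaultdict(Counter)     # word -> tag counts
    all_tags = Counter()                   # overall tag counts
    for sentence in tagged_sentences:
        previous = "<s>"                   # sentence-start marker
        for word, tag in sentence:
            word = word.lower()
            context_memory[(previous, word)][tag] += 1
            word_memory[word][tag] += 1
            all_tags[tag] += 1
            previous = tag
    return context_memory, word_memory, all_tags


def tag(words, model):
    """Tag left to right, backing off from context to word to overall frequency."""
    context_memory, word_memory, all_tags = model
    previous, output = "<s>", []
    for word in words:
        key = word.lower()
        if (previous, key) in context_memory:
            guess = context_memory[(previous, key)].most_common(1)[0][0]
        elif key in word_memory:
            guess = word_memory[key].most_common(1)[0][0]
        else:
            guess = all_tags.most_common(1)[0][0]
        output.append((word, guess))
        previous = guess
    return output


# Toy training set (illustrative only; tag names are assumptions).
TRAINING = [
    [("He", "PRON"), ("booked", "VERB"), ("the", "DET"), ("table", "NOUN")],
    [("The", "DET"), ("book", "NOUN"), ("was", "VERB"), ("large", "ADJ")],
    [("They", "PRON"), ("book", "VERB"), ("the", "DET"), ("large", "ADJ"), ("table", "NOUN")],
]

model = train(TRAINING)
tagged = tag(["He", "booked", "the", "large", "table"], model)
print(tagged)
```

Even this toy version shows why context matters: with the training data above, "book" is tagged as a noun after a determiner and as a verb after a pronoun.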


Further Relevant Pointers: Publications

Conferences

Association for Computational Linguistics
  ACL (Int.), EACL (Europe), NAACL (North America), IJCNLP (AFNLP - Asia)
  http://www.aclweb.org/
  ACL SIGs: http://aclweb.org/aclwiki/index.php?title=Special_interest_groups

International Conference on Computational Linguistics
  COLING: http://nlp.shef.ac.uk/iccl/

International Conference on Language Resources and Evaluation
  LREC: http://www.lrec-conf.org/

Other NLP conferences: EMNLP, CoNLL, RANLP, CICLing

Journals

Computational Linguistics, MIT Press
Natural Language Engineering, Cambridge University Press
Journal of Logic, Language and Information, Springer
Language Resources and Evaluation, Springer


Further Relevant Pointers: More Reading

Handbooks

Handbook of Natural Language Processing, CRC Press, 2000 (new edition in progress, 2009)
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008
The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
Foundations of Statistical Natural Language Processing, MIT Press, 2003

Relevant Mailing Lists

Corpora list: http://gandalf.aksis.uib.no/corpora/
Linguist list: http://linguistlist.org/

Other NLP sites - broad overviews of tools, resources, people

ACL Wiki: http://aclweb.org/aclwiki
LT World: http://www.lt-world.org/



Thanks! Further Questions: paul.buitelaar@deri.org
