www.deri.ie
Semantic Web Challenge: Legacy Data
Digital Enterprise Research Institute www.deri.ie
Linked Data
Unstructured, Un-Linked
Legacy Data
Overview of the Tutorial
Semantic Analysis of Unstructured Legacy Data
Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning
NLP Layer Cake with Pointers
Part of Speech Tagging, Morphology, Phrase Structure
Semantic Tagging
WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies, Advanced Topic: Ontology-Lexicon Interface
Grammatical Functions, Dependency Structure, Discourse Analysis
Further Relevant Pointers
General Tools, Organizations, Conferences, Journals, Sites, Lists, ...
What the Tutorial Will Not Address
See Tutorial on Information Mining by Conor Hayes
DERI Stream on Semantic Information Mining brings together Natural Language Processing and Information Mining
Some Example Applications
Semantic Annotation & Search
Ontology Learning
Ontology-based Information Extraction
SmartWeb (2005-2008): extraction of entities and events from a set of football match reports
Semantic Video Browsing
OpenCalais, GIST
Industrial-strength open source / commercial semantic annotation & retrieval
Semantic Annotation & Search: MuchMore
Ontology Learning: OntoLT
Linguistic Structure to Ontology Mapping Rules
Extraction & Inspect Contexts
Extract Ontology Fragments
German Clinical Report: An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal bzw. mit konventioneller Interferenzschraubentechnik femoral fixiert.
English Translation: In 40 human cadaver knees, mid patellar ligament thirds were fixed with a trapezoid bone block on one side on the femoral side in a two-level drill hole, or with a conventional interference screw.
Linguistic Annotation (fragments)
Ontology-based Information Extraction: SmartWeb
Oliver Kahn konnte den Schuss von Beto halten.
Oliver Kahn could stop the shot by Beto.
Information Extraction
Ontology Population
Semantic Video Browsing: K-Space
http://keg.vse.cz/wf/kspace/smil/
Industrial Applications: GIST (CALAIS)
With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night after a grueling, history-making campaign that will make him the first African American to head a major-party ticket. Before a chanting, cheering audience in St. Paul, Minn., the first-term Illinois senator savored what once seemed an unlikely outcome to the Democratic race against Sen. Hillary Rodham Clinton. He now faces another hard-fought battle, against Sen. John McCain, the presumptive Republican candidate.
NLP: A Complete Example
[Figure: annotated parse of "He booked the large table in the corner ..." followed by "it is still available"]
he: Pronoun, 3rd Person, Animate; NP, Subject, Agent, X
book: Verb, Past, 3rd Person; Head, Predicate
large: Adjective; Modifier
table: Noun, Singular; Head; furniture_01
in the corner: PP
it: Pronoun, 3rd Person, Inanimate; NP, Subject, Patient, Y
is: Verb, Past, 3rd Person; Head, Predicate
still available: AdvP
NLP Layer Cake
He booked the large table in the corner ...
Layers: Tokenization, Morphological Analysis, Part of Speech Tagging, Phrases, Semantic Tagging, Dependency Structure, Discourse Analysis
Example annotations:
[[[the][large][table]NP][[in][the][corner]PP]NP]
[table:ARTIFACT,furniture_01]
[[the:SPEC][large:MOD][table:HEAD]NP] [[He:SUBJ][booked:PRED][[this][table:HEAD]NP:DOBJ]S]
[He:SUBJ][booked:PRED]this[[table:HEAD]NP:DOBJ:X1] [[It:SUBJ:X1][was:PRED]available]
NLP Layers
Tokenization
Morphology
Phrase Structure
Semantic Tagging
Discourse Analysis
Which events are expressed throughout a text/discourse? How do they interact? And which objects are involved?
Part-of-Speech Tagging
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
Annotate each word in a sentence with a part-of-speech (PoS) tag; useful for subsequent syntactic parsing.
The most common PoS tag set for English is the Penn Treebank set of 45 tags, e.g.
John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN
Other tag sets are in use for other languages, e.g. the Stuttgart-Tübingen Tag Set (STTS) for German.
The challenge in Part-of-Speech Tagging is ambiguity.
"like" can be
  Verb: I like/VBP candy.
  Preposition: Time flies like/IN an arrow.
"around" can be
  Preposition: I bought it at the shop around/IN the corner.
  Particle: I never got around/RP to getting a car.
  Adverb: A new Prius costs around/RB $25K.
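The ambiguity problem can be made concrete with a minimal sketch of a most-frequent-tag baseline. This is not one of the taggers referenced in the tutorial; the training sentences reuse the slide's "like" examples, the tags for "Time" and "flies" and the third sentence are assumptions added for illustration, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy training data: (word, Penn Treebank tag) pairs.
# The "like" tags come from the slide; everything else is an assumption.
TAGGED = [
    [("I", "PRP"), ("like", "VBP"), ("candy", "NN")],
    [("Time", "NN"), ("flies", "VBZ"), ("like", "IN"), ("an", "DT"), ("arrow", "NN")],
    [("I", "PRP"), ("like", "VBP"), ("it", "PRP")],  # hypothetical extra sentence
]


def train(sentences):
    """Count how often each (lower-cased) word form carries each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def tag_words(words, counts, default="NN"):
    """Give every word its most frequent tag, ignoring the context."""
    tagged = []
    for word in words:
        key = word.lower()
        if key in counts:
            best = counts[key].most_common(1)[0][0]
        else:
            best = default  # unknown words fall back to a default tag
        tagged.append((word, best))
    return tagged


counts = train(TAGGED)
# Word forms seen with more than one tag are the ambiguous ones.
ambiguous = {w: dict(c) for w, c in counts.items() if len(c) > 1}
print(ambiguous)
print(tag_words("Time flies like an arrow".split(), counts))
```

Because the baseline ignores context, it tags "like" as VBP even in "Time flies like an arrow", where the slide's reading is like/IN. Resolving such cases is what context-sensitive taggers are for.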
Part-of-Speech Tagging: PoS English
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
Noun (person, place or thing)
  Singular (NN): dog, fork
  Plural (NNS): dogs, forks
  Proper (NNP, NNPS): John, Springfields
  Personal pronoun (PRP): I, you, he, she, it
  Wh-pronoun (WP): who, what
Verb (actions and processes)
  Base, infinitive (VB): eat
  Past tense (VBD): ate
  Gerund (VBG): eating
  Past participle (VBN): eaten
  Non 3rd person singular present tense (VBP): eat
  3rd person singular present tense (VBZ): eats
  Modal (MD): should, can
  To (TO): to (to eat)
Adjective (modify nouns)
  Basic (JJ): red, tall
  Comparative (JJR): redder, taller
  Superlative (JJS): reddest, tallest
Adverb (modify verbs)
  Basic (RB): quickly
  Comparative (RBR): quicker
  Superlative (RBS): quickest
Preposition (IN): on, in, by, to, with
Determiner
  Basic (DT): a, an, the
  WH-determiner (WDT): which, that
Coordinating Conjunction (CC): and, but, or
Particle (RP): off (took off), up (put up)
Part-of-Speech Tagging: Closed vs. Open
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
Closed class categories are composed of a small, fixed set of grammatical function words for a given language:
  Pronouns (it, he, she, ...)
  Prepositions (on, for, from, to, ...)
  Modals (will, can, may, ...)
  Determiners (a, the)
  Particles (to, up, off)
  Conjunctions (and, or)
Open class categories are composed of large sets of content words and are open to new additions.
Part-of-Speech Tagging: State of the Art
http://www-nlp.stanford.edu/links/statnlp.html#Taggers
TreeTagger, decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
TnT, Thorsten Brants, HMM, free research license: http://www.coli.uni-saarland.de/~thorsten/tnt/
ENGCG, lexicon & rules, commercial (LingSoft): http://www2.lingsoft.fi/cgi-bin/engcg
Morphological Analysis
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt
Some definitions
Morphological Analysis: split up words into component morphemes and build a (formal) representation of word-internal structure
Morpheme: minimal meaning-bearing unit in a language
Stem: morpheme that forms the central meaning unit in a word
Affix: word element that can only occur attached to a stem
  Prefix: specific -> unspecific (English)
  Suffix: wonder -> wonderful (English)
  Infix: hingi -> humingi (Tagalog)
  Circumfix: sagen -> gesagt (German)
Isolating languages (no morphology): e.g., Chinese
Morphologically poor languages: e.g., English
Morphologically complex languages: e.g., Turkish
Morphological Analysis: Overview, Ambiguity
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt
writing books
writes -> write + V + 3rd Person + Singular
flies -> fly + N + Plural
flies -> fly + V + 3rd Person + Singular
cabdriver -> cab + driver
doghouse -> dog + house
Flachbildschirm (flat screen)
  flach + Bildschirm (flat screen)
  Flachbild + Schirm (flat view screen)
  flach + Bild + Schirm (flat picture screen)
I've -> I + have
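The compound ambiguity of "Flachbildschirm" can be reproduced with a small sketch that enumerates every segmentation of a word into entries of a lexicon. The five-entry lexicon and the function name are assumptions made for illustration; a real morphological analyser would use a full lexicon and handle linking elements and inflection.

```python
# Toy stem lexicon (an assumption); it contains just the parts named on the slide.
LEXICON = {"flach", "bild", "schirm", "bildschirm", "flachbild"}


def segmentations(word, lexicon=LEXICON):
    """Return all ways to split `word` into a sequence of lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty string: no parts
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Every segmentation of the remainder extends this head.
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


for parts in segmentations("Flachbildschirm"):
    print(" + ".join(parts))
```

The sketch only enumerates the analyses; choosing between them (here, flach + Bildschirm) needs additional knowledge such as frequencies or a preference for fewer parts.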
Morphological Analysis: State of the Art
Phrase Structure Analysis: Definitions
With input from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
Phrase
Breaking up a sentence into recursively defined coherent units (constituents), e.g., an NP consisting of several NPs
First step in sentence parsing (see also further NLP layers)
Chunks
Chunking
Also known as shallow parsing (without overall sentence structure and grammatical functions; see also further NLP layers)
Phrase Structure Analysis: NP, PP Example
Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
NP -> (Det) N
[NP [Det the] [N bus]]
PP -> P NP
[PP [P in] [NP [Det the] [N yard]]]
NP -> (Det) N (PP)
[NP [Det the] [N bus] [PP [P in] [NP [Det the] [N yard]]]]
Phrase Structure Analysis: VP Example
Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
VP -> V (NP) (PP)
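The three rules NP -> (Det) N (PP), PP -> P NP and VP -> V (NP) (PP) are small enough to turn into a recursive-descent sketch. It assumes the input is already PoS-tagged with the coarse tags Det, N, P and V; the function names and the verb "parked" in the usage are hypothetical, and the parser is greedy, so a PP is always attached to the nearest NP.

```python
def parse_np(tokens, i):
    """NP -> (Det) N (PP). Returns (tree, next_index) or None."""
    children = []
    if i < len(tokens) and tokens[i][1] == "Det":
        children.append(("Det", tokens[i][0]))
        i += 1
    if i < len(tokens) and tokens[i][1] == "N":
        children.append(("N", tokens[i][0]))
        i += 1
    else:
        return None  # the noun is obligatory
    pp = parse_pp(tokens, i)
    if pp:
        children.append(pp[0])
        i = pp[1]
    return ("NP", children), i


def parse_pp(tokens, i):
    """PP -> P NP. Returns (tree, next_index) or None."""
    if i < len(tokens) and tokens[i][1] == "P":
        np = parse_np(tokens, i + 1)
        if np:
            return ("PP", [("P", tokens[i][0]), np[0]]), np[1]
    return None


def parse_vp(tokens, i):
    """VP -> V (NP) (PP). Returns (tree, next_index) or None."""
    if i >= len(tokens) or tokens[i][1] != "V":
        return None
    children = [("V", tokens[i][0])]
    i += 1
    for parse in (parse_np, parse_pp):  # both constituents are optional
        found = parse(tokens, i)
        if found:
            children.append(found[0])
            i = found[1]
    return ("VP", children), i


def show(tree):
    """Render a tree in labelled bracket notation."""
    label, body = tree
    if isinstance(body, str):
        return f"[{label} {body}]"
    return f"[{label} " + " ".join(show(child) for child in body) + "]"


tokens = [("the", "Det"), ("bus", "N"), ("in", "P"), ("the", "Det"), ("yard", "N")]
np_tree, end = parse_np(tokens, 0)
print(show(np_tree))
vp_tree, _ = parse_vp([("parked", "V")] + tokens, 0)
print(show(vp_tree))
```

A chunker in the sense of shallow parsing would stop at the non-recursive pieces (the bus, in, the yard); the recursion through parse_pp is what makes this a phrase structure analysis.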
Phrase Structure Analysis: State of the Art
http://www.aclweb.org/aclwiki/index.php?title=Parsers_for_English
http://www.aclweb.org/aclwiki/index.php?title=List_of_resources_by_language
TreeTagger (PoS, Chunking), decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
CNTS Memory Based Shallow Parser (Univ. of Antwerpen), classifier, license?: http://www.cnts.ua.ac.be/cgi-bin/jmeyhi/MBSP-instant-webdemo.cgi
Univ. of Illinois at Urbana-Champaign, classifier, license?: http://l2r.cs.uiuc.edu/~cogcomp/shallow_parse_demo.php
Semantic Tagging
Classification of words and phrases with a semantically defined category
Nowadays associated with the Semantic Web (semantic annotation, knowledge markup) and Web 2.0 tagging
In NLP, refers to assigning a sense to a word or phrase
Originally: machine readable dictionaries, e.g., LDOCE
In recent years: wordnets (nouns), framenets (verbs)
Increasingly: general & domain ontologies
Semantic Tagging: WordNet
Organized around meaning rather than word forms
Maps words to meanings/interpretations or senses
Senses are represented by synsets (sets of synonyms), e.g.,
{board, plank} : piece of lumber
{board, committee} : group of people
Semantic Tagging: WordNet Origin
In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database
The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically
WordNet instantiates hypotheses based on results of psycholinguistic research
"In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter 'apple', even though they will reject such suggestions as 'shoe' or 'banana', and will recognize that 'apple' is correct when it is provided." (Caramazza/Berndt 1978)
Semantic Tagging: Synsets, Senses
Words that occur in several synsets have a corresponding number of senses, i.e. are ambiguous:
Semantic Tagging: Ambiguity cont.
Homonymy
Systematic Polysemy
Also referred to in the literature as regular polysemy (Apresjan 1973) or logical polysemy (Pustejovsky 1991, 1995); the term systematic polysemy was introduced by Nunberg & Zaenen (1992). See also Bierwisch 1983 (school example) and Hobbs et al 1993 (office example).
Semantic Tagging: Synset Hierarchy
Example
{entity} > {whole, unit} > {building material} > {lumber, timber} > {board, plank}
hypernymy (upwards), hyponymy (downwards)
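Synsets, senses and the hypernym chain can be sketched with a plain dictionary. This is a toy excerpt containing only the synsets named on the slides, not the WordNet database or its API; the synset identifiers such as "board.n.01" and the function names are hypothetical.

```python
# Toy excerpt: synset id -> lemmas and (single) hypernym.
# Identifiers are made up for this sketch; only the lemma sets come from the slides.
SYNSETS = {
    "entity.n.01": {"lemmas": ["entity"], "hypernym": None},
    "whole.n.01": {"lemmas": ["whole", "unit"], "hypernym": "entity.n.01"},
    "building_material.n.01": {"lemmas": ["building material"], "hypernym": "whole.n.01"},
    "lumber.n.01": {"lemmas": ["lumber", "timber"], "hypernym": "building_material.n.01"},
    "board.n.01": {"lemmas": ["board", "plank"], "hypernym": "lumber.n.01"},
    "board.n.02": {"lemmas": ["board", "committee"], "hypernym": None},  # hypernym left out here
}


def senses(word):
    """A word has one sense per synset it occurs in."""
    return [sid for sid, synset in SYNSETS.items() if word in synset["lemmas"]]


def hypernym_path(synset_id):
    """Follow hypernym links upwards and return the chain from the root down."""
    path = []
    while synset_id is not None:
        path.append(SYNSETS[synset_id]["lemmas"])
        synset_id = SYNSETS[synset_id]["hypernym"]
    return path[::-1]


print(senses("board"))   # two synsets, hence two senses
print(senses("plank"))   # one synset, hence unambiguous in this excerpt
print(" > ".join("{" + ", ".join(lemmas) + "}" for lemmas in hypernym_path("board.n.01")))
```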
Semantic Tagging: Hierarchy Example
Semantic Tagging: FrameNet
Categorizes verbs and their syntactic/semantic arguments
Based on frame semantics theory (Fillmore 1968)
A frame describes a particular situation, object, or event and the participants and properties involved, e.g.
Frame: apply_heat
Frame Elements (or Roles): cook, food, heating_instrument
Lexical Units (evoking the Frame): bake, boil, brown, simmer, steam, ...
Semantic Tagging: Frame Ambiguity
Frame arranging: Agent puts a complex Theme into a particular Configuration
David arranged the stones in a circle.
arrange.v, arrangement.n, array.v, deploy.v, deployment.n, format.v, setup.v
Frame placing: Agent places a Theme at a location, the Goal
David arranged his briefcase on the floor.
archive.v, arrange.v, bag.v, bestow.v, billet.v, bin.v, bottle.v, box.v, brush.v, cage.v, cram.v, crate.v, dab.v, daub.v, deposit.v, drape.v, drizzle.v, dust.v, embed.v, emplace.v, file.v, garage.v, hang.v, heap.v, immerse.v, implant.v, inject.v, insert.v, insertion.n, jam.v, lay.v, lean.v, load.v, lodge.v, mount.v, pack.v, package.v, park.v, perch.v, pile.v, place.v, placement.n, plant.v, plunge.v, pocket.v, position.v, pot.v, put.v, rest.v, rub.v, set.v, sheathe.v, shelve.v, shoulder.v, shower.v, sit.v, situate.v, smear.v, sow.v, stable.v, stand.v, stash.v, station.v, stick.v, stow.v, stuff.v, tuck.v, warehouse.v, wrap.v
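Frames, frame elements and lexical units map naturally onto a small data structure, and frame ambiguity then becomes a lookup that returns more than one frame. The sketch below is not the FrameNet data format or API: the dictionary layout and function name are assumptions, the lexical unit lists are excerpts from the slides, and the ".v" suffix on the apply_heat units is added here for uniformity.

```python
# Toy frame inventory built from the slides (lexical unit lists abridged).
FRAMES = {
    "apply_heat": {
        "elements": ["cook", "food", "heating_instrument"],
        "lexical_units": ["bake.v", "boil.v", "brown.v", "simmer.v", "steam.v"],
    },
    "arranging": {
        "elements": ["Agent", "Theme", "Configuration"],
        "lexical_units": ["arrange.v", "arrangement.n", "array.v", "deploy.v",
                          "deployment.n", "format.v", "setup.v"],
    },
    "placing": {
        "elements": ["Agent", "Theme", "Goal"],
        "lexical_units": ["arrange.v", "deposit.v", "place.v", "put.v"],  # excerpt
    },
}


def frames_evoked_by(lexical_unit):
    """Return every frame that lists the lexical unit, in alphabetical order."""
    return sorted(name for name, frame in FRAMES.items()
                  if lexical_unit in frame["lexical_units"])


print(frames_evoked_by("bake.v"))     # a single frame
print(frames_evoked_by("arrange.v"))  # two frames: the ambiguity from the slide
```

Deciding between arranging and placing for a given sentence depends on the arguments found in context (a Configuration such as "in a circle" versus a Goal such as "on the floor"), which is the task semantic role labelling addresses.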
Semantic Tagging: Word Sense
Assigning the correct sense to a word
Based on wordnets and similar resources for many languages
Sense-annotated corpora enable classifier training
No longer a very active area of research in the NLP community
Annotated corpora, tools and evaluation data sets are available from the SenseVal (1-4) evaluation campaigns:
http://www.senseval.org/
Recently, attention has turned to Semantic Role Labelling and a variety of other tasks in Computational Lexical Semantics
see the SemEval evaluation campaign: http://semeval2.fbk.eu/
Semantic Tagging: Semantic Roles
Assigning the correct frame category (sense) to a verb and assigning semantic roles to its syntactic arguments
Based on the availability of FrameNet and similar resources, e.g.
PropBank: http://verbs.colorado.edu/~mpalmer/projects/ace.html
NomBank: http://nlp.cs.nyu.edu/meyers/NomBank.html
VerbNet: http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
SemLink: http://verbs.colorado.edu/semlink/
OntoNotes: http://www.bbn.com/ontonotes/
German FrameNet: http://www.coli.uni-saarland.de/projects/salsa
Frame-annotated corpora enable classifier training
Recently a very active area of research in the NLP community
Semantic Tagging: State of the Art
WSD tools by Ted Pedersen (University of Minnesota, Duluth), free: http://sourceforge.net/projects/wsdgate/ & others
SenseLearner, Rada Mihalcea (Univ. of North Texas), free: http://www.cse.unt.edu/~rada/downloads.html#senselearner
SuperSenseTagger, SemTechLab Rome ?, license?: http://sourceforge.net/projects/supersensetag/
Shalmaneser (Saarland Univ.), pluggable parsing & classifiers, free license: http://www.coli.uni-saarland.de/projects/salsa/shal/
Univ. of Illinois at Urbana-Champaign, parsing & classifiers, license?: http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
SWIRL (Universitat Politecnica de Catalunya), parsing & classifiers, GPL license: http://www.surdeanu.name/mihai/swirl
Semantic Tagging: Lexical Inference
The Boston office called.
"to call" expects an object of type Human in Agent position
coerce "office" into an object of type (Group-of) Person > Human
lexical semantic inference: Person Work-at Office
[Figure: lexical-semantic network around "office" with the nodes Office, Person, Organization and Building and the relations Work-at, Work-for, Located-at, Has-address and Representation-of]
Semantic Tagging: Lexical Inference
Peter bought a car. The engine runs well.
"the engine" refers to an already introduced object (discourse referent)
lexical semantic inference: Engine Part-of Car
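Both inference examples (Person Work-at Office, Engine Part-of Car) amount to looking up a relation between two types in a small knowledge base. The following is a minimal sketch under that assumption; the triple store, the function names, and the simplification of treating Person as the expected Human type are all choices made for this illustration, not the method of any system named in the tutorial.

```python
# Toy knowledge base of (source type, relation, target type) triples from the slides.
RELATIONS = {
    ("Person", "Work-at", "Office"),
    ("Engine", "Part-of", "Car"),
}


def coerce(actual_type, expected_type, relations=RELATIONS):
    """Type coercion: find relations that lead from the expected type to the
    type actually found, e.g. the Persons who Work-at the Office."""
    return sorted(rel for (src, rel, dst) in relations
                  if src == expected_type and dst == actual_type)


def bridge(new_type, referent_types, relations=RELATIONS):
    """Bridging: link a newly mentioned definite noun to an already
    introduced discourse referent via a known relation."""
    return sorted((rel, dst) for (src, rel, dst) in relations
                  if src == new_type and dst in referent_types)


# "The Boston office called.": call expects a Person, but finds an Office.
print(coerce("Office", "Person"))
# "Peter bought a car. The engine runs well.": referents so far are a Person and a Car.
print(bridge("Engine", {"Person", "Car"}))
```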
[Figure: ontology fragment with the classes Campus, Student, University, School and Staff and the relations studies_at, located_at, works_at and is_part_of; labels shown: "school", "staff"]
[Figure: the same ontology fragment with multilingual terms; properties shown: has_US-English_term, has_German_term, has_Dutch_term; terms shown: School, Fakultät, Faculteit]
[Figure: LingInfo model with the classes Mucosa, OralMucosa, LingInfo and WordForm, the instances Term-1, Term-2, Term-3 and WordForm-1, and the properties hasLingInfo, instanceOf, hasMorphSynInfo, hasOrthographicForm, hasLang (DE), hasPoS (N) and hasStem; orthographic forms shown: Mundschleimhaut, Mund, Schleimhaut]
http://olp.dfki.de/LingInfo/
http://ontoware.org/projects/lexonto/
Semantic Tagging: Terms, Classes
Semantic annotation on the basis of a thesaurus or ontology
Term recognition & extraction
terms are domain-specific phrases
Relation extraction
relations are domain-specific semantic roles
Semantic Tagging: Terms, Relations
With input from: http://www.lrec-conf.org/proceedings/lrec2008/slides/496.ppt
GENIA corpus
http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=GENIA+corpus
Examples
Interleukin 1 beta inhibits insulin secretion
IL-1beta is known to inhibit insulin secretion
Insulin secretion is inhibited by IL-1 beta
Term recognition & extraction
Grammatical function annotation: subject, direct object, etc. (see further NLP layers)
GENIA Relation: inhibit
GENIA Term (Class)      interleukin 1 beta, IL-1beta    insulin secretion
Semantic Role           Agent                           Target
Grammatical Function    Subject                         Direct Object
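The three GENIA example sentences express the same relation with different term variants and different syntax. A minimal pattern-based sketch shows how they could be normalised to one relation instance; the regular expressions, the variant table and the function name are assumptions for this illustration and far simpler than the term recognition and parsing the tutorial has in mind.

```python
import re

# Term variants mapped to a canonical term (table assumed for this sketch).
VARIANTS = {
    "interleukin 1 beta": "interleukin 1 beta",
    "il-1beta": "interleukin 1 beta",
    "il-1 beta": "interleukin 1 beta",
    "insulin secretion": "insulin secretion",
}

# One active and one passive surface pattern for the relation "inhibit".
PATTERNS = [
    re.compile(r"^(?P<agent>.+?) (?:inhibits|is known to inhibit) (?P<target>.+)$", re.I),
    re.compile(r"^(?P<target>.+?) is inhibited by (?P<agent>.+)$", re.I),
]


def extract(sentence):
    """Return (relation, agent, target) with canonical terms, or None."""
    text = sentence.strip().rstrip(".")
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            agent = VARIANTS.get(match.group("agent").lower())
            target = VARIANTS.get(match.group("target").lower())
            if agent and target:
                return ("inhibit", agent, target)
    return None


for s in ["Interleukin 1 beta inhibits insulin secretion",
          "IL-1beta is known to inhibit insulin secretion",
          "Insulin secretion is inhibited by IL-1 beta"]:
    print(extract(s))
```

In the passive sentence the grammatical subject is the Target rather than the Agent, which is why the semantic roles, not the grammatical functions, are what the extracted relation should be built on.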
Semantic Tagging: Terms, Classes
Eurovoc Thesaurus
Terminology in all EU languages on all EU areas: politics, trade, law, science, energy, agriculture, ...
MT UF 3606 natural and applied sciences gene pool genetic resource genotype heredity biology life sciences DNA genetic engineering (6411)
Medical Subject Headings (MeSH)
Thesaurus with taxonomy of ~250,000 terms, representing medical subjects for retrieval purposes
MeSH Heading: Databases, Genetic
Entry Term: Genetic Databases
Entry Term: Genetic Sequence Databases
Entry Term: OMIM
Entry Term: Mendelian Inheritance in Man
Entry Term: Genetic Data Banks
Entry Term: Genetic Data Bases
Entry Term: Genetic Information Databases
See Also: Genetic Screening
Gene Ontology
Accession: GO:0009292
Synonyms: broad: genetic exchange
Term Lineage:
all : all (164142)
GO:0008150 : biological process (115947)
GO:0007275 : development (11892)
GO:0009292 : genetic transfer (69)
Semantic Tagging: Names
Named Entity Recognition
Originally intended as an extension of Tokenization, e.g. in recognizing Names and other specific tokens such as Dates and Times
Evolved into a more general identification and classification of names of People, Organisations, Companies, Countries, Cities, etc.
Currently merging with ontology-based information extraction
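The original, tokenization-level view of Named Entity Recognition can be sketched with two surface patterns: a title cue for person names and a closed list of weekday names for dates. The patterns, labels and function name are assumptions for this illustration; the sample text is shortened from the news passage used earlier in the tutorial.

```python
import re

# Two hand-written patterns (assumed for this sketch, not from any tool named here).
PATTERNS = [
    # "Sen." followed by one or more capitalised words is taken to be a person name.
    ("Person", re.compile(r"Sen\. ([A-Z][a-zA-Z]+(?: [A-Z][a-zA-Z]+)*)")),
    # Weekday names are treated as date expressions.
    ("Date", re.compile(r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b")),
]


def recognize(text):
    """Return (entity, label) pairs in order of appearance in the text."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(1), match.group(1), label))
    return [(name, label) for _, name, label in sorted(found)]


TEXT = ("Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night. "
        "He now faces another hard-fought battle, against Sen. John McCain, "
        "the presumptive Republican candidate.")
print(recognize(TEXT))
```

Patterns of this kind miss names without a cue word and cannot say which Person is meant, which is one reason the task moved towards classification against gazetteers and ontologies.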
Semantic Tagging: State of the Art cont.
Annotate biomedical text with the UMLS Metathesaurus
MetaMap (US National Library of Medicine), free license: http://mmtx.nlm.nih.gov/
Annotate business text with the KIM ontology
KIM (Ontotext), free research license: http://www.ontotext.com/kim/
Annotate football (soccer) text with the SWIntO ontology
SProUT (DFKI), free research license: http://www.dfki.de/sw-lt/heartofgold/ (web demo)
Parsing: Overview
Parsing
"parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given grammar" (Wikipedia)
Part of Speech tags
Non-recursive phrases (chunks)
Constituent structure
complete syntactic structure in terms of interconnected recursive phrases
predicate (mostly a verb) and one or more syntactic arguments (phrases)
grammatical functions for predicate arguments: subject, direct object, head-modifier analysis, semantic roles
Parsing: Full Parse Example
[Figure, built up over six slides: full parse of "He booked the large table in the corner", adding Phrases and then Head-Modifier Analysis]
he: Pronoun, 3rd person, Animate; NP, Subject, Agent
book: Verb, Past, 3rd person; Head, Predicate
large: Adjective; Modifier
table: Noun, Singular; Head; furniture_01
in: Preposition; Head, Predicate
the corner: NP
in the corner: PP, Modifier
Parsing: Dependency Structure
Parsing: Dependency Structure for IE
[Figure: dependency structure of the example sentence with the relation labels Size and Location]
he: Pronoun, 3rd person, Animate
table: Noun, Singular; Head; furniture_01
large: Adjective; Modifier; Size
corner: Noun, Singular; Modifier; Location
Parsing: State of the Art
Widely-used parsers
Discourse Analysis: Anaphora Resolution
With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt
Linking event participants (Semantic Role fillers) within and across sentences, i.e., an anaphor can be linked back to a discourse referent that serves as its antecedent, e.g.,
He bought a bottle of wine, sat down on a stone, and drank it.
"he" and "it" are anaphora
"a bottle of wine" and "a stone" introduce discourse referents
"it" can be linked back to the antecedent "a bottle of wine" or "a stone"
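The candidate search behind this example can be sketched as a filter over previously introduced discourse referents. The sketch checks only animacy agreement and order of mention; the Referent class, the pronoun table and the token positions are assumptions for this illustration, and real resolvers use many more constraints and preferences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Referent:
    text: str
    animate: bool
    position: int  # order of introduction in the discourse (assumed values)


# Pronoun -> whether its antecedent must be animate (simplified table).
PRONOUNS = {"he": True, "she": True, "it": False}


def candidates(pronoun, referents, pronoun_position):
    """Earlier referents that agree in animacy, most recent first."""
    need_animate = PRONOUNS[pronoun.lower()]
    earlier = [r for r in referents
               if r.position < pronoun_position and r.animate == need_animate]
    return sorted(earlier, key=lambda r: r.position, reverse=True)


# "He bought a bottle of wine, sat down on a stone, and drank it."
referents = [
    Referent("a bottle of wine", animate=False, position=2),
    Referent("a stone", animate=False, position=3),
]
print([r.text for r in candidates("it", referents, pronoun_position=4)])
print([r.text for r in candidates("he", referents, pronoun_position=1)])
```

Agreement alone leaves both "a stone" and "a bottle of wine" as antecedents for "it", matching the ambiguity stated on the slide; ruling out the stone needs lexical knowledge about what can be drunk. "He" has no candidate because its antecedent lies outside the sentence.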
Discourse Analysis: Anaphora Resolution
[Figure: the annotated parse of "He booked the large table in the corner ..." extended with discourse referents X, Y and Z]
he: Pronoun, 3rd Person, Animate; NP, Subject, Agent, X
book: Verb, Past, 3rd Person; Head, Predicate
direct object NP: Direct Object, Patient, Definite, Y
large: Adjective; Modifier
table: Noun, Singular; Head; furniture_01
in: Preposition; Head, Predicate
the corner: NP, Definite, Z
in the corner: PP
it: NP, Subject, Patient, Y
is: V, Past, 3rd Person; Head, Predicate
still available: AdvP
Discourse Analysis: Discourse Structure
With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt
John hid Bill's car keys as he had drunk too much. (causality)
Discourse Analysis: State of the Art
No readily available black-box tools
Anaphora resolution is often built-in functionality in NER, parsing, etc.
To experiment with discourse referents, anaphora resolution etc., try out e.g. Boxer
Further Relevant Pointers: General Tools
UIMA
"Open, Industrial-Strength Platform for Unstructured Information Analysis and Search" http://incubator.apache.org/uima/
NLTK
Open source Python modules for research and development in natural language processing; book (June 2009): Natural Language Processing with Python http://www.nltk.org/
MBT
can generate a sequence tagger on the basis of a training set of tagged sequences http://ilk.uvt.nl/mbt/
SProUT, DFKI
platform for development of multilingual shallow text processing and information extraction systems http://sprout.dfki.de/
Further Relevant Pointers: Publications
Conferences
Journals
Computational Linguistics, MIT Press
Natural Language Engineering, Cambridge University Press
Journal of Logic, Language and Information, Springer
Language Resources and Evaluation, Springer
Further Relevant Pointers: More Reading
Handbooks
Handbook of Natural Language Processing, CRC Press, 2000 (new edition in progress, 2009)
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008
The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
Foundations of Statistical Natural Language Processing, MIT Press, 2003
Relevant Mailing Lists
Corpora list: http://gandalf.aksis.uib.no/corpora/
Linguist list: http://linguistlist.org/
Other NLP sites: broad overviews of tools, resources, people
ACL Wiki: http://aclweb.org/aclwiki
LT World: http://www.lt-world.org/