
Ontology Matching for Linked Open Data

Data and Web Semantics
University of Illinois at Chicago - Fall 2010

Concept Introduction
What is Linked Data?

Set of best practices for publishing and connecting structured
data on the Web. [1]

Relationship between Linked Data, the Semantic Web, and the Web of Data

"Linked Data is the Semantic Web done right"
( http://www.w3.org/2008/Talks/0617-lod-tbl/#%281%29 )

W3C Linking Open Data Project

Grassroots community effort to publish existing open-license
datasets as Linked Data on the Web and interlink things
between different data sources
( http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData )

A feel for how LOD is growing

Reviewed Papers

Jain, P., Hitzler, P., Sheth, A., Verma, K.:
Ontology Alignment for Linked Open Data
Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov:
Discovering and Maintaining Links on the Web of Data
Andriy Nikolov, Victoria Uren, Enrico Motta:
Data Linking: Capturing and Utilising Implicit Schema-level Relations

Ontology Alignment for Linked Open Data
Main Challenges

LOD datasets are interlinked. These interlinks are mainly at the
instance level (owl:sameAs).
Schema-level information, that is, taxonomies built using
rdfs:subClassOf, is relatively scarce.
There is a lack of interlinks between the different schemas.
Applications based on LOD face difficulties due to loosely
connected pieces of information.
There are no established benchmarks or available baselines for
measuring precision and recall for LOD schema alignment.
Most competitive state-of-the-art ontology alignment systems
performed poorly on LOD schema datasets.

Detailed Analysis

Results

Some background information:

The chosen datasets give significant coverage of the
LOD cloud. They cover separate domains such as
Music and Publication.
Some of the dataset providers, such as LinkedMDB,
have not made their schema publicly available.
There are no established benchmarks or available
baselines for measuring precision and recall for LOD
schema alignment.
Human experts familiar with the domains created reference
alignments, identifying all possible mappings via a subclass
or an equivalence relationship.

BLOOMS approach
Preprocessing of the input ontologies
Remove property restrictions
Tokenize composite class names to obtain a list of all
simple words contained within them
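The paper does not spell out the tokenizer; a minimal sketch in Python, assuming composite class names use camel case or underscore separators (the function name is mine):

```python
import re

def tokenize_class_name(name):
    """Split a composite class name such as 'MusicalArtist' or
    'record_label' into its simple lowercase words."""
    # Insert a space at every lower-to-upper camel-case boundary,
    # then split on any non-alphanumeric separator ('_', '-', ' ').
    spaced = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', name)
    return [w.lower() for w in re.split(r'[^A-Za-z0-9]+', spaced) if w]

print(tokenize_class_name("MusicalArtist"))  # ['musical', 'artist']
print(tokenize_class_name("record_label"))   # ['record', 'label']
```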

Construction of the BLOOMS forest


The forest is built using information from Wikipedia

Comparison of the constructed BLOOMS forest


which yields alignment decisions, such as mappings between class names

Post Processing
Using a reasoner and Alignment API

Evaluation of Results
BLOOMS compares more generic schemas and uses Wikipedia to
handle the diverse domains of LOD. The following were the shortcomings of the other
ontology alignment systems, as reported by Jain et al.
Ontology Alignment System    Issues

RiMOM           Failed due to ontology size
AROMA           Unable to find any relevant relations
OMViaUO         Able to find only a few correct analogies
Alignment API   Able to find a few correct analogies, but found
                some wrong analogies as well
S-Match         Computed correct analogies, but in general returned
                many results, which led to low precision

Discovering and maintaining links on the web of data
The Gap

There are tools available for publishing Linked Data
on the Web, but there is still a lack of tools that
support data publishers in setting RDF links to
other data sources and in maintaining RDF links over
time as data sources change.

Silk Linking Framework

A toolkit for discovering and maintaining data links
between Web data sources

Discovering and maintaining links on the web of data
Components
A link discovery engine, which computes links
between data sources based on a declarative
specification of the conditions that entities must fulfill
in order to be interlinked.
A tool for evaluating the generated data links in order
to fine-tune the linking specification.
A protocol for maintaining data links between
continuously changing data sources.
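The declarative link condition the engine evaluates can be sketched in Python; the entities, properties, and the 0.9 threshold below are illustrative assumptions, and difflib stands in for Silk's similarity metrics:

```python
from difflib import SequenceMatcher

def string_similarity(a, b):
    # Normalised edit-based similarity in [0, 1]; stands in for
    # Silk's built-in string comparison metrics.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def avg(scores):
    # Analogue of an <AVG> aggregation: mean of the comparisons.
    return sum(scores) / len(scores)

# Hypothetical source/target entities with two comparable properties.
source = {"label": "Aspirin", "synonym": "acetylsalicylic acid"}
target = {"label": "Aspirin", "synonym": "Acetylsalicylic Acid"}

score = avg([
    string_similarity(source["label"], target["label"]),
    string_similarity(source["synonym"], target["synonym"]),
])
generate_link = score >= 0.9  # illustrative acceptance threshold
print(generate_link)  # True
```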

Silk Link Discovery engine


Main Features
Supports the generation of owl:sameAs links as well as other
types of RDF links
Flexible, declarative language for specifying link conditions
Can be employed in distributed environments without having to
replicate datasets locally
Capable of being used where terms from different vocabularies
are mixed and where no consistent RDFS or OWL schemata
exist.
Link Specification Language
Data Access
<DataSource> directive for data access
Silk Link Discovery engine


Main Features
Link Conditions
<LinkCondition> section is the heart of a Silk Link Specification
<LinkCondition>
<AVG>
<MAX>
<Compare>
<Param>

Pre-Matching
<PreMatchingDefinition sourcePath="?a/rdfs:label" hitLimit="10">
<Index targetPath="?b/rdfs:label" />
<Index targetPath="?b/drugbank:synonym" />
</PreMatchingDefinition>
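The pre-matching directive above limits detailed comparison to promising candidates; a rough Python analogue (the entity IDs and property names are made up for illustration):

```python
from collections import defaultdict

def build_index(targets, paths):
    """Index target entities by the (lowercased) values of the
    given properties, mimicking the <Index> directives above."""
    index = defaultdict(set)
    for entity_id, props in targets.items():
        for path in paths:
            for value in props.get(path, []):
                index[value.lower()].add(entity_id)
    return index

def prematch(source_label, index, hit_limit=10):
    """Return at most hit_limit candidate targets sharing a value,
    so that full comparison runs only on these candidates."""
    return sorted(index.get(source_label.lower(), set()))[:hit_limit]

# Made-up target entities keyed by ID.
targets = {
    "drug:1": {"rdfs:label": ["Aspirin"],
               "drugbank:synonym": ["Acetylsalicylic acid"]},
    "drug:2": {"rdfs:label": ["Ibuprofen"], "drugbank:synonym": []},
}
index = build_index(targets, ["rdfs:label", "drugbank:synonym"])
print(prematch("aspirin", index))  # ['drug:1']
```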

Evaluating Links
Resource Comparison

Link Maintenance Protocol


Link Transfer to target
Request for Target Change List
Subscription of Target Changes

Data Linking: Capturing and Utilizing Implicit Schema-level
Relations

Challenges:
The Web of Data is constantly growing [1], and the co-reference
links between data instances stored in different repositories
represent a major added value of the Linked Data approach.
(Co-reference resolution, or the determination of equivalent URIs
referring to the same concept or entity.)

Using an automatic co-reference resolution tool

Challenges arise for an automatic co-reference resolution tool in the light of
the heterogeneity of the schemas used by the repositories

Schema Matching and Co-reference Resolution in a Linked
Data Environment

Specific features of Linked Data Environment


Consider several interlinked datasets in combination
Involve information contained in third-party datasets
as background knowledge to support matching.
Exploit data patterns present in large volumes of
instance data
Develop methods to deal with relations like class
overlap and relation overlap rather than strict
equivalence
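One simple way to capture "class overlap rather than strict equivalence" is an overlap coefficient over instance sets; a sketch with made-up URIs (the metric choice is mine, not from the paper):

```python
def overlap_coefficient(a, b):
    """Degree of overlap between two instance sets, in [0, 1];
    1.0 means one class's instances are contained in the other's."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Made-up instance URIs of two classes from different datasets.
actors = {"uri:1", "uri:2", "uri:3"}
starring = {"uri:2", "uri:3", "uri:4", "uri:5"}
print(overlap_coefficient(actors, starring))  # 2/3
```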

Goal of this approach and related past work

Utilize the features of Linked Data to perform
schema-level matching between repositories
and in turn facilitate instance co-reference
resolution
Related Work
SILK Linking Framework
A lot of user effort required

Hub Repository Approach


May lead to loss of some data

Explanation
Background
LinkedMDB repository describes movies from the IMDB database
DBPedia describes Wikipedia entries

Using Background Data for Ontology Matching

Infer schema-level relations
movie:music_contributor and dbpedia:Artist
movie:actor and dbpedia:starring

Infer data patterns


Identical movies will have an overlap in release year and set of
actors

Explanation
Inferring schema-level mappings
Data-level evidence
Schema-level evidence
Establishing a relation between movie:music_contributor and
dbpedia:Artist via MusicBrainz
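The idea of inferring a schema-level relation from instance-level evidence can be sketched as follows; the URIs and the 0.8 threshold are illustrative, not from the paper:

```python
def infer_subclass(instances_a, instances_b, sameas_pairs, threshold=0.8):
    """Propose 'A subClassOf B' when most instances of class A are
    linked by owl:sameAs (possibly through a background dataset
    such as MusicBrainz) to instances of class B."""
    if not instances_a:
        return False, 0.0
    linked = {a for a, b in sameas_pairs
              if a in instances_a and b in instances_b}
    ratio = len(linked) / len(instances_a)
    return ratio >= threshold, ratio

# Made-up instances of movie:music_contributor and dbpedia:Artist.
contributors = {"lmdb:c1", "lmdb:c2", "lmdb:c3"}
artists = {"dbp:a1", "dbp:a2", "dbp:a3"}
links = [("lmdb:c1", "dbp:a1"), ("lmdb:c2", "dbp:a2"),
         ("lmdb:c3", "dbp:a3")]
print(infer_subclass(contributors, artists, links))  # (True, 1.0)
```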

Inferring data patterns and refining the set of
existing mappings
{movie:actor, dbpedia:starring} (sim = 0.98)
{movie:initial_release_date, dbpedia:releaseDate} (sim = 0.96)

Test Results
Finding equivalence links between music_contributor individuals in
LinkedMDB and corresponding individuals in DBPedia (auxiliary
dataset: MusicBrainz; gold standard size: 942)
Steps
Instance-based schema-matching algorithm
The relations obtained in the above step were passed as input to the data-level
co-reference resolution tool KnoFuss to discover owl:sameAs links between
instances

The two sets of results

Baseline: involves computing the transitive closure of already existing links
Aligned: combined set of existing results and new results obtained by the
algorithm after schema alignment
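The baseline's transitive closure of existing owl:sameAs links can be sketched with a union-find; the URIs below are made up:

```python
def transitive_closure(links):
    """Cluster URIs connected by owl:sameAs links, treating the
    relation as symmetric and transitive (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

# Made-up chain of existing links across three datasets.
links = [("lmdb:p1", "dbpedia:A"), ("dbpedia:A", "mbz:X")]
print(transitive_closure(links))  # one cluster of all three URIs
```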

What Do I Intend to Do?

Linked Open Data ontology alignment on
AgreementMaker
Taking the BLOOMS results into consideration, align the
ontologies on Linked Open Data.
Evaluate the results with respect to BLOOMS and the
other ontology alignment systems used in BLOOMS.
Implement or suggest a solution for co-reference
resolution.

Reference Papers
