
A Review: Ontology Mapping Techniques

Jawad Usman Arshed, Rab Nawaz, Mehtab Afzal


COMSATS Institute of Information Technology Abbottabad
{jawadusman, rabnawaz, mehtabafzal}@ciit.net.pk

Abstract:

Ontology heterogeneity is the primary obstacle to ontology interoperation. Ontology mapping is the best way to solve this problem. The major problem in ontology mapping is checking the similarity between concepts. Different similarity techniques are used for ontology mapping. In this paper we compare different similarity techniques and use the best technique to obtain better ontology mapping results.

1. Introduction:

1.1 Ontology:

Ontology is originally a philosophical concept that describes the nature of objects [1]. In the 1980s, researchers introduced ontology into the field of artificial intelligence (AI), where it took on new meanings. In computer science, an ontology is a formal specification of a shared conceptualization [3]. At present, there is no single, widely accepted criterion of classification, because of the great number of classification methods. In general, an ontology includes five basic primitives: class, relation, function, axiom and instance. Researchers have proposed many criteria for building ontologies [2]. The criteria proposed by T. R. Gruber are among the most influential and include five principles: clarity and objectivity, coherence, completeness, maximal extendibility, and minimal commitment. Today, many ontologies are widely used, such as WordNet, FrameNet, GUM, SENSUS and MIKROKOSMOS [4].

2. Ontology Mapping:

Currently, many people use the internet as a decision-making tool to collect information. For example, when making vacation plans, users search the internet for suitable lodging, routes, and sightseeing spots. However, these internet sites are operated by individual enterprises, which means that users are required to check the sites manually in order to collect information. To resolve this problem, the Semantic Web is expected to become a next-generation web standard capable of connecting different data resources. On the Semantic Web, the semantics of the data are provided by ontologies for interoperability of the resources. Because there is no unique criterion for ontology construction and ontologies are built in a distributed way, different users can build different ontologies. Therefore, many heterogeneous ontologies are constructed in the same or overlapping fields, even in a small domain. Within the same area, the problem of ontology heterogeneity must be solved in order to realize interoperation among different ontologies.

2.1 BASIC CONCEPTS OF ONTOLOGY MAPPING:

The Semantic Web community has achieved a good standing in recent years. As more and more people get involved, many individual ontologies are created, and interoperability among different ontologies becomes a major problem. This problem can be solved by ontology mapping.

As can easily be imagined, mapping cannot be done manually beyond a certain complexity, size, or number of ontologies. Automatic, or at least semi-automatic, techniques have to be developed to reduce the burden of manual creation and maintenance of mappings. Ontology mapping describes how to map and connect concepts between different ontologies. It is a specification of the correspondences between concepts and relations. The process of ontology mapping includes information ontology, obtaining similarity, semantic mapping execution, and mapping post-processing [5]. The key step in ontology mapping is the computation of conceptual similarity. We first define similarity as a function Sim: w1 × w2 × o1 × o2 → [0, 1], where the similarity value ranges from 0 to 1. Sim(A, B) denotes the similarity of A and B, w1 and w2 are two term sets, and o1 and o2 are two ontologies.

- Sim(e, f) = 1 denotes that concept e and concept f are exactly the same.
- Sim(e, f) = 0 denotes that concept e and concept f are completely dissimilar.

So if the similarity value is 1, the concepts can be mapped, and if the similarity value is 0, mapping cannot be done. Keeping these similarity techniques in view, Table 1 can be formed, which decides the mapping of concepts on the basis of multiple similarity values.

Different similarity techniques are used to check the similarity of concepts. However, a single similarity measure is insufficient for determining the table because of the diversity of ontologies. For example, consider the concept of a "bank" in two ontologies. The concepts seem to map when we use the string similarity measure. However, when one ontology has a super concept of "finance" and the other has a super concept of "construction," these two concepts should not be mapped, because each represents a different concept. In such a case, we should also use another similarity measure. Therefore, it is necessary to use multiple similarity measures to determine the correct mappings. The ID column in the table represents a pair of concepts, the Class column represents the validity of the mapping, and the columns in the middle contain the similarity values of the concept pairs. For example, the first line of the table represents the ontology mapping for Ca1 and Cb1, and has a similarity value of 0.75 for similarity measure 1. When we know some mappings, such as Ca1-Cb1 and Ca1-Cb2, we can use them to determine the importance of the similarity measures. Then we can make a decision on unknown pairs such as Ca5-Cb7 by using the importance of the similarity measures.
Table 1. Table formulation of the ontology mapping problem

ID         Similarity   Similarity   ....   Similarity   Class
           measure 1    measure 2           measure n
Ca1-Cb1    0.75         0.4          ....   0.38         1 (Positive)
Ca1-Cb2    0.52         0.7          ....   0.42         0 (Negative)
....       ....         ....         ....   ....         ....
Ca5-Cb7    0.38         0.6          ....   0.25         ?
....       ....         ....         ....   ....         ....
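The paper states that known mappings are used to determine the importance of the similarity measures but does not say how. The following Python sketch is one illustrative (and entirely our own) way to encode Table 1 as data and to weight each measure by how well it separates the known positive and negative pairs; the columns elided by "...." are simply left out.

```python
# Sketch of the Table 1 formulation (our own weighting scheme, not the paper's).
# Each row maps a concept pair to its similarity values and its known class
# (1 = positive, 0 = negative, None = unknown).
table = {
    ("Ca1", "Cb1"): ([0.75, 0.4, 0.38], 1),
    ("Ca1", "Cb2"): ([0.52, 0.7, 0.42], 0),
    ("Ca5", "Cb7"): ([0.38, 0.6, 0.25], None),
}

def measure_weights(table):
    """Weight each measure by the gap between its average similarity on known
    positive pairs and on known negative pairs (larger gap = more important)."""
    known = [(sims, cls) for sims, cls in table.values() if cls is not None]
    n_measures = len(next(iter(table.values()))[0])
    weights = []
    for m in range(n_measures):
        pos = [sims[m] for sims, cls in known if cls == 1]
        neg = [sims[m] for sims, cls in known if cls == 0]
        gap = sum(pos) / len(pos) - sum(neg) / len(neg)
        weights.append(max(gap, 0.0))
    total = sum(weights) or 1.0
    return [w / total for w in weights]

def decide(sims, weights, threshold=0.5):
    """Decide a mapping for an unknown pair from the weighted similarity score."""
    score = sum(w * s for w, s in zip(weights, sims))
    return 1 if score >= threshold else 0

w = measure_weights(table)
print(decide(table[("Ca5", "Cb7")][0], w))   # decision for the unknown pair Ca5-Cb7
```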

3. Concept Similarity Measures:

Many similarity measures have been proposed for concept similarity, including string-based similarity, graph-based similarity, instance-based classification similarity, and knowledge-based similarity. String-based similarity is widely used for ontology mapping. Graph-based similarity utilizes the similarity of the structures of the ontologies; because ontologies are organized as tree structures, we can calculate the graph similarity of the ontologies. Examples include Similarity Flooding [9] and S-Match [7]. Instance classification similarity uses the principle that if the classification of instances is similar for concepts in different ontologies, the concepts are similar. Knowledge-based similarity utilizes other knowledge resources, such as a dictionary or WordNet [6], to calculate the similarity. Although there are many similarity measures, we discuss four for use in our framework: "word similarity," "word list similarity," "concept hierarchy similarity," and "structure similarity."

3.1 Word Similarity:

In order to calculate word similarity, we introduce four string-based similarities and four knowledge-based similarities as the base. Both the string-based and the knowledge-based similarities are calculated for words. The following measures are used for string-based similarity:

• Prefix
• Suffix
• Edit distance
• n-gram

The prefix similarity measure captures the similarity of word prefixes, such as "Pak." and "Pakistan". The suffix similarity measure captures the similarity of word suffixes, such as "phone" and "telephone". The edit distance calculates the similarity as a count of string substitutions, deletions, and insertions. For the n-gram measure, the word is divided into strings of length n, and the similarity is calculated from the number of shared string sets. For example, the similarity of "word" and "ward" is counted as follows: the first word, "word," is divided into {wo, or, rd} for the 2-gram, and the second word, "ward," is divided into {wa, ar, rd}. As a result, we find the shared string "rd" as the 2-gram similarity. Similarly, we can use the similarity measure for the 3-gram.
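The paper does not give exact formulas for these four string-based measures, so the following Python sketch makes its own assumptions: each measure is normalized to [0, 1], the prefix and suffix measures divide the shared prefix or suffix length by the longer word, the edit distance similarity is one minus the normalized Levenshtein distance, and the n-gram measure is a Dice coefficient over n-gram sets. Function names are ours.

```python
# Illustrative normalizations of the four string-based word similarities.

def prefix_similarity(a: str, b: str) -> float:
    """Length of the common prefix, divided by the longer word."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n / max(len(a), len(b))

def suffix_similarity(a: str, b: str) -> float:
    """Common prefix of the reversed words, e.g. 'phone' / 'telephone'."""
    return prefix_similarity(a[::-1], b[::-1])

def edit_distance_similarity(a: str, b: str) -> float:
    """1 minus the normalized Levenshtein distance
    (substitutions, deletions, insertions)."""
    d = [[max(i, j) if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1,                              # deletion
                          d[i][j - 1] + 1,                              # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))     # substitution
    return 1 - d[len(a)][len(b)] / max(len(a), len(b), 1)

def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the sets of n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))

print(ngram_similarity("word", "ward"))   # only 'rd' is shared: 2*1/(3+3) ≈ 0.33
```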

The knowledge-based similarity is also calculated for words. We use WordNet as the knowledge resource for calculating the similarity. Although a wide variety of similarity measures has been proposed for WordNet, we discuss four of them:

• Synset
• Wu & Palmer
• Description
• Lin

The first similarity measure, synset, utilizes the path length between synsets in WordNet. WordNet is organized into synsets, so we can calculate the shortest path between different word pairs using synsets. The synset similarity measure uses this path length as the similarity.

3.1.1 Introduction of WordNet:

WordNet is an online dictionary reference system. It is an English dictionary organized according to psycholinguistic principles. The synset is its unit of information: a synset is a set of synonymous verbs, nouns, adjectives, and adverbs that can be exchanged for one another in a given context.

The Wu & Palmer similarity measure uses the depth and the least common super concept (LCS) of the words [10]. The similarity is calculated with the following equation:

Similarity(W1, W2) = 2 x depth(LCS) / (depth(W1) + depth(W2))

W1 and W2 denote the word labels of the concept pair whose similarity is calculated, depth is the depth from the root to the word, and LCS is the least common super concept of W1 and W2.

The third similarity measure, description, utilizes the description of a concept in WordNet. The similarity is calculated as the square of the common word length in the two descriptions of the words.

The last similarity measure was proposed by Lin [8]. It is calculated using a formula similar to that of Wu and Palmer, except that it uses information content instead of depth.
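The Wu & Palmer formula above can be made concrete with a small sketch. The paper computes it over WordNet; the hierarchy below is a hand-built toy taxonomy of our own, used only to show how depth and the least common super concept enter the formula.

```python
# Minimal sketch of the Wu & Palmer measure over a toy hierarchy
# (the paper uses WordNet; parent links here are hard-coded for clarity).

parent = {                      # child -> parent; 'entity' is the root
    "entity": None,
    "organization": "entity",
    "finance": "organization",
    "bank": "finance",
    "institution": "organization",
}

def path_to_root(word):
    path = []
    while word is not None:
        path.append(word)
        word = parent[word]
    return path                 # e.g. ['bank', 'finance', 'organization', 'entity']

def depth(word):
    return len(path_to_root(word))      # the root has depth 1

def lcs(w1, w2):
    """Least common super concept: the deepest shared ancestor."""
    ancestors1 = set(path_to_root(w1))
    for w in path_to_root(w2):          # walk upward from w2
        if w in ancestors1:
            return w
    return None

def wu_palmer(w1, w2):
    return 2 * depth(lcs(w1, w2)) / (depth(w1) + depth(w2))

print(wu_palmer("bank", "institution"))   # LCS = 'organization': 2*2/(4+3) ≈ 0.57
```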
3.2 Word List Similarity:

The word similarity measures are designed for single words and are not applicable to a word list such as "Food Wine." Such word lists are often used as concept labels. If we split such a label on a hyphen or underscore, we obtain a word list. Two types of similarities, maximum word similarity and word edit distance, can be defined for word list similarity.

3.2.1 Maximum word similarity:

For every combination of words in the two lists, we can calculate the similarity of each pair of words with the word similarity measures. We use the maximum value of the word similarity over these word pairs as the maximum word similarity. We can obtain eight maximum word similarities by using the eight word similarities defined above.

3.2.2 Word edit distance:

The second similarity measure, word edit distance, is derived from the edit distance. In the edit distance definition, the similarity is calculated character by character; we extend this method by treating words as the units of the edit distance. Let us assume two word lists, "Atomic" and "Atomic, Theory"; the similarity between the two lists is readily apparent. If we consider each word as one component, we can calculate the edit distance for the word lists. In this case, "Atomic" is the same in both word lists, so the word edit distance is one. On the other hand, if we assume "Top" and "Atomic, Theory," the word edit distance is two. Consequently, we can calculate the similarity by the word edit distance. However, another problem occurs for similar word lists. For example, when we assume "Pakistan, Study" and "Pak, Study," how do we decide the similarity? The problem is the calculation of the similarity of "Pakistan" and "Pak": that is, we have to decide whether the two words are the same word or not. If we decide that the two words are the same, the word edit distance is zero, but if not, the word edit distance is one. In order to calculate the similarity of the words, we employ the word similarity measures with a particular threshold once more. For example, if we use the prefix as the word similarity measure, we can consider the two words to be the same for calculating the word edit distance. However, if we use the synset as the word similarity measure, we cannot consider the two words to be the same, because "Pak" is not in WordNet. From the above discussion, we have decided to define the word edit distance for the eight word similarity measures. As a result, we define 16 similarity measures for word lists, consisting of eight maximum word similarities and eight word edit distance similarities.
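The paper does not fix the threshold or the exact normalizations for these word list measures, so the sketch below makes its own choices: a prefix-based word similarity stands in for any of the eight word similarity measures, and a threshold of 0.3 decides when two words count as the same for the word edit distance. Function names and the threshold are ours.

```python
# Illustrative sketch of the two word list similarities of Section 3.2.

def prefix_sim(a: str, b: str) -> float:
    """Shared prefix length over the longer word, e.g. 'Pak' vs 'Pakistan'."""
    i = 0
    while i < min(len(a), len(b)) and a[i].lower() == b[i].lower():
        i += 1
    return i / max(len(a), len(b))

def max_word_similarity(list1, list2, word_sim=prefix_sim):
    """Maximum word similarity over all word pairs of the two lists."""
    return max(word_sim(w1, w2) for w1 in list1 for w2 in list2)

def word_edit_distance(list1, list2, word_sim=prefix_sim, threshold=0.3):
    """Edit distance over whole words; two words count as the same when their
    word similarity reaches the threshold (the 'Pakistan' vs 'Pak' case)."""
    m, n = len(list1), len(list2)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = word_sim(list1[i - 1], list2[j - 1]) >= threshold
            d[i][j] = min(d[i - 1][j] + 1,                       # delete a word
                          d[i][j - 1] + 1,                       # insert a word
                          d[i - 1][j - 1] + (0 if same else 1))  # substitute a word
    return d[m][n]

print(word_edit_distance(["Atomic"], ["Atomic", "Theory"]))          # 1
print(word_edit_distance(["Pakistan", "Study"], ["Pak", "Study"]))   # 0 with prefix_sim
print(max_word_similarity(["Pakistan", "Study"], ["Pak", "Study"]))  # 1.0 ('Study', 'Study')
```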
3.3 Concept Hierarchy Similarity:

Ontologies are organized as concept hierarchies. In order to utilize the similarity of a concept hierarchy, we introduce concept hierarchy similarity measures. The concept hierarchy similarity measure is calculated for the path from the root to the concept. Let us explain using the example shown in Table 2. We assume the calculation of the path "Top/Pak_study" in ontology A and "Top/Pakistan_study" in ontology B. For the calculation of the similarity, we divide the path into a list of concepts, as shown in the middle column of Table 2. Then the similarity can be calculated by the edit distance if we consider each concept as a component. For example, the concept "Top" is the same in both ontologies, but the second concept is different. Then we can calculate the edit distance for the path. However, how do we decide whether the concepts are the same or not? To decide this, we divide each concept into a word list and calculate the similarity using the word list similarity. In this case, if "Pak_study" and "Pakistan_study" are considered to be similar concepts according to the word list similarity, the edit distance is zero; if the two concepts are not considered similar, the edit distance is one. As we can use any of the word list similarity measures for deciding the similarity of word lists, we obtain sixteen concept hierarchy similarity measures.
Table 2. Example of concept hierarchies

              Path                  Path List                 Word List
Ontology A    Top/Pak_study         {Top, Pak_study}          {Top}, {Pak, study}
Ontology B    Top/Pakistan_study    {Top, Pakistan_study}     {Top}, {Pakistan, study}
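As an illustration of the Table 2 example, here is a small Python sketch of our own construction: paths are split into concept word lists, concept equality is decided by a simple prefix-based test standing in for any of the sixteen word list similarity measures, and the edit distance over concepts is turned into a similarity.

```python
# Sketch of the concept hierarchy similarity of Section 3.3, applied to Table 2.

def to_word_lists(path):
    """'Top/Pak_study' -> [['Top'], ['Pak', 'study']] (split on '/', '_', '-')."""
    return [c.replace("-", "_").split("_") for c in path.split("/")]

def word_list_same(l1, l2):
    # Hypothetical decision rule: the lists match when every word of the shorter
    # list shares a prefix relation with some word of the other list
    # (so 'Pak' matches 'Pakistan').
    shorter, longer = sorted((l1, l2), key=len)
    return all(any(w2.lower().startswith(w1.lower()) or
                   w1.lower().startswith(w2.lower()) for w2 in longer)
               for w1 in shorter)

def hierarchy_similarity(path_a, path_b):
    """Edit distance over the concept lists, normalized to a [0, 1] similarity."""
    a, b = to_word_lists(path_a), to_word_lists(path_b)
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if word_list_same(a[i - 1], b[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 1 - d[m][n] / max(m, n)

# 'Top' matches 'Top' and 'Pak_study' matches 'Pakistan_study',
# so the edit distance is 0 and the similarity is 1.0.
print(hierarchy_similarity("Top/Pak_study", "Top/Pakistan_study"))
```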
3.4 Structure Similarity:

The similarity measures presented above cannot handle the similarity of graph structures. We therefore define similarity measures that use the structure of the ontologies. In order to make use of graphically close concepts, we utilize the parent concept label for calculating the similarity. Because this similarity is calculated with the word list similarity, we can obtain 16 similarity measures for parents. This similarity can be seen as one of the variations of graph similarity.
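Section 3.4 describes structure similarity only in words; the sketch below is our reading of it: compare the labels of the parent concepts with any word list similarity. The parents dictionaries and the word_list_similarity argument are hypothetical placeholders.

```python
# Illustrative sketch of structure similarity: the parent concept labels of the
# two concepts are compared with a word list similarity measure.

def structure_similarity(concept_a, concept_b, parents_a, parents_b,
                         word_list_similarity):
    """Similarity of two concepts based on their parent concept labels."""
    parent_label_a = parents_a[concept_a].split("_")   # parent label as a word list
    parent_label_b = parents_b[concept_b].split("_")
    return word_list_similarity(parent_label_a, parent_label_b)

# With the "bank" example of Section 2.1, the parents "finance" and
# "construction" give a low word list similarity, so the two "bank"
# concepts are kept apart by this measure.
```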
4. Conclusion:

Ontology mapping is one of the main challenges for the Semantic Web. We defined various similarity measures in this paper. If we apply these similarity measures to ontology mapping, we can obtain a powerful framework for ontology mapping problems. In future work, we plan to explore new similarity measures and to investigate the best combination of similarity measures to improve performance.

References:

1. A. Kivela and E. Hyvonen. "Ontological theories for the Semantic Web". Helsinki: HIIT Publications, 2002, pp. 111-136.

2. T. R. Gruber. "Towards principles for the design of ontologies used for knowledge sharing". International Journal of Human-Computer Studies, Vol. 43(5/6), pp. 907-928, 1995.

3. T. R. Gruber. "A translation approach to portable ontology specification". Knowledge Acquisition, Vol. 5(2), pp. 199-220, 1993.

4. Deng Zhihong, Tang Shiwei, and Zhang Ming. "Research overview of ontology". Journal of Beijing University (Natural Science Edition), Vol. 38(5), pp. 730-738, 2002.

5. Alexander Maedche and Boris Motik. "Ontologies for Enterprise Knowledge Management". IEEE Intelligent Systems, 2003, pp. 26-33.

6. C. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.

7. F. Giunchiglia, P. Shvaiko, and M. Yatskevich. "S-Match: an algorithm and an implementation of semantic matching". In C. Bussler, J. Davies, D. Fensel, and R. Studer, editors, Proceedings of the 1st European Semantic Web Symposium, volume 3053 of Lecture Notes in Computer Science, pages 61-75. Springer, 2004.
8. D. Lin. "An information-theoretic definition of similarity". In Proceedings of the 15th International Conference on Machine Learning, pages 296-304. Morgan Kaufmann, San Francisco, CA, 1998.

9. S. Melnik, H. Garcia-Molina, and E. Rahm. "Similarity flooding: A versatile graph matching algorithm and its application to schema matching". In Proceedings of the 18th International Conference on Data Engineering, San Jose, CA, Feb. 2002.

10. Z. Wu and M. Palmer. "Verb semantics and lexical selection". In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133-138, New Mexico State University, Las Cruces, New Mexico, 1994.
