Many similarity measures have been proposed for concept similarity, including string-based similarity, graph-based similarity, instance classification similarity, and knowledge-based similarity. String-based similarity is widely used for ontology mapping. Graph-based similarity exploits the similarity of the structures of ontologies. Because ontologies are organized as tree structures, we can calculate the graph similarity of the ontologies; examples include Similarity Flooding [9] and S-Match [7]. Instance classification similarity rests on the principle that if instances are classified similarly under concepts in different ontologies, those concepts are similar. Knowledge-based similarity uses other knowledge resources, such as a dictionary or WordNet [6], to calculate the similarity. Although there are many similarity measures, we discuss four for use in our framework: “word similarity,” “word list similarity,” “concept hierarchy similarity,” and “structure similarity.”

In order to calculate the word similarity, we introduce four string-based similarities and four knowledge-based similarities as the base. Both the string-based similarity and the knowledge-based similarity are calculated for words. The following similarities are used for string-based similarity.

• Prefix
• Suffix
• Edit distance
• n-gram

The prefix similarity measure captures the similarity of word prefixes, such as “Pak.” and “Pakistan.”

The suffix similarity measure captures the similarity of word suffixes, such as “phone” and “telephone.”

Edit distance calculates the similarity as a count of the string substitutions, deletions and additions needed to turn one word into the other.

For n-gram, the word is divided into substrings of n characters, and the similarity is calculated from the number of substrings the two words have in common. For example, the similarity of “word” and “ward” is counted as follows: the first word, “word,” is divided into “wo, or, rd” for the 2-gram, and the second word, “ward,” is divided into “wa, ar, rd” for the 2-gram. As a result, we find the shared string “rd” as the similarity measure for the 2-gram. The same measure can be used for the 3-gram.

$$\mathrm{Similarity}(W_1, W_2) = \frac{2 \times \mathrm{depth}(\mathrm{LCS})}{\mathrm{depth}(W_1) + \mathrm{depth}(W_2)}$$

Here $W_1$ and $W_2$ denote the word labels of the concept pair whose similarity is calculated, depth is the depth from the root to the word, and LCS is the least common super concept of $W_1$ and $W_2$.
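The string-based measures and the depth-based formula can be illustrated with a minimal Python sketch. It is not the framework's implementation: the function names, the toy hierarchy `PARENT`, and the convention that the root has depth 1 are all assumptions made for illustration, and the text does not say how the raw counts are normalized, so the string measures return raw counts only.

```python
# Illustrative sketch only; names and the toy hierarchy are hypothetical.

def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters shared by a and b."""
    return common_prefix_len(a[::-1], b[::-1])

def edit_distance(a: str, b: str) -> int:
    """Count of substitutions, deletions and additions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # addition
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def ngrams(word: str, n: int) -> set:
    """Set of all substrings of length n in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def shared_ngrams(a: str, b: str, n: int = 2) -> set:
    """n-grams that occur in both words."""
    return ngrams(a, n) & ngrams(b, n)

# Hypothetical concept hierarchy stored as child -> parent; "thing" is the root.
PARENT = {"vehicle": "thing", "car": "vehicle", "truck": "vehicle"}

def path_to_root(word: str, parent: dict) -> list:
    """Nodes from word up to the root, inclusive."""
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def depth_similarity(w1: str, w2: str, parent: dict) -> float:
    """2 * depth(LCS) / (depth(W1) + depth(W2)), assuming the root has depth 1
    and both words belong to the same tree."""
    p1, p2 = path_to_root(w1, parent), path_to_root(w2, parent)
    in_p2 = set(p2)
    lcs = next(node for node in p1 if node in in_p2)  # deepest shared ancestor
    return 2 * len(path_to_root(lcs, parent)) / (len(p1) + len(p2))

print(shared_ngrams("word", "ward"))           # {'rd'}
print(edit_distance("word", "ward"))           # 1
print(depth_similarity("car", "truck", PARENT))
```

With the root counted as depth 1, two sibling concepts at depth 3 whose least common super concept sits at depth 2 score 2 × 2 / (3 + 3) = 2/3. If the root were counted as depth 0 instead, any pair whose only common super concept is the root would score 0.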