Yasser F. Hassan
Maged El-Sayed
Department of Information Systems and Computers
Faculty of Commerce, Alexandria University
Alexandria, Egypt
maged@alexu.edu.eg
I. INTRODUCTION
An important goal of the semantic web is to make the
meaning of information explicit and machine-accessible, enabling more effective access
to knowledge contained in heterogeneous information
environments such as the web [1]. Current search engines are
not very effective because they rely on simple query-document matching without considering the semantics of the
keywords [2]. Moreover, the intended semantics of a keyword is
difficult to determine when the keyword has more than one meaning
depending on the context.
Fuzzy sets are sets whose boundaries are imprecise: an element
belongs to a fuzzy set to a degree given by its membership value
rather than simply belonging or not belonging. As the web
grows, the thorny problem of data heterogeneity also
grows. A review and comparison of fuzzy
sets, crisp sets and semantic web technologies, based on their
extensions, is given in [3].
Synonyms are different words with similar meanings. Words
that are synonyms are said to be synonymous, so synonymous
words have the same membership values to a category. For
example, the keywords movies and films are synonymous.
Short forms such as msg and err are also considered to
be synonymous with the original words message and error,
respectively. Due to polysemy, a homonym is one of a group
of words that share the same spelling and the same
978-1-4673-2824-1/12/$31.00 2012 IEEE
(2)
Let TS be a set of training examples consisting of n queries
and their classifications.
$$TS = \{\langle q_1, c(q_1)\rangle, \langle q_2, c(q_2)\rangle, \ldots, \langle q_n, c(q_n)\rangle\}$$
Each query in the training set is represented by a set of
term-frequency pairs as follows.
$$q = \{\langle k_1, w_1\rangle, \langle k_2, w_2\rangle, \ldots, \langle k_n, w_n\rangle\}$$
where $w_j$ is the occurrence frequency of keyword $k_j$ in the
query. Given a set of training queries TS, the membership
value $\mu(k_i, c_j)$ is calculated as the total number of
occurrences of keyword $k_i$ in category $c_j$ divided by the total
number of occurrences of keyword $k_i$ in all categories, as follows.
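$$\mu(k_i, c_j) = \frac{\sum \{ w_i \mid \langle k_i, w_i \rangle \in q \wedge \langle q, c_j \rangle \in TS \}}{\sum \{ w_i \mid \langle k_i, w_i \rangle \in q \wedge \langle q, c(q) \rangle \in TS \}} \tag{3}$$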
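The calculation in (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `membership_values` and the dictionary-based query representation are hypothetical, and the three training queries reproduce the small example of this section (q1 and q2 labelled C1, q3 labelled C2).

```python
from collections import defaultdict

def membership_values(training_set):
    """Compute mu(k, c) for every keyword k and category c.

    training_set is a list of (query, category) pairs, where each query
    is a dict mapping a keyword to its occurrence frequency (assumed layout).
    """
    per_category = defaultdict(lambda: defaultdict(int))  # keyword -> category -> frequency
    totals = defaultdict(int)                              # keyword -> frequency over all categories
    for query, category in training_set:
        for keyword, freq in query.items():
            per_category[keyword][category] += freq
            totals[keyword] += freq
    # Divide the per-category frequency by the overall frequency of the keyword.
    return {
        keyword: {cat: count / totals[keyword] for cat, count in cats.items()}
        for keyword, cats in per_category.items()
        if totals[keyword] > 0
    }

# Training set taken from the worked example: q1, q2 -> C1 and q3 -> C2.
TS = [
    ({"K1": 1, "K2": 2, "K3": 0}, "C1"),
    ({"K1": 2, "K2": 1, "K3": 0}, "C1"),
    ({"K1": 0, "K2": 1, "K3": 3}, "C2"),
]

mu = membership_values(TS)
print(mu["K2"])  # {'C1': 0.75, 'C2': 0.25}
```

Keyword K2 occurs three times in C1 queries and once in C2 queries, so its membership values are 3/4 and 1/4, which agrees with the worked example.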
Figure 1. Layered approach for the proposed methodology

Query   K1   K2   K3   Category
q1      1    2    0    C1
q2      2    1    0    C1
q3      0    1    3    C2

Category (keyword frequency)   K1   K2   K3
C1                             3    3    0
C2                             0    1    3

Category (membership value)    K1   K2     K3
C1                             1    0.75   0
C2                             0    0.25   1
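$$sim(q, c_j) = \frac{\sum_{k \in q} \left( \mu(k, c_j) \otimes \mu_q(k) \right)}{\sum_{k \in q} \left( \mu(k, c_j) \oplus \mu_q(k) \right)} \tag{4}$$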
where $\otimes$ and $\oplus$ are the fuzzy conjunction (t-norm) and
disjunction (t-conorm) operators and $\mu_q(k)$ is the membership value
of a keyword to a query. The conjunction operator can be
generalized to any triangular norm, also called a t-norm and
denoted by t(x, y), which is a mapping from [0, 1] × [0, 1] into [0,
1]. In the same way, fuzzy disjunction operators can be defined
as triangular conorms, also called t-conorms and
denoted by s(x, y), which are mappings from [0, 1] × [0, 1] into [0, 1]
[10]. Commonly used conjunction and disjunction
operators are the Einstein product and the Einstein sum, which are used here to
calculate the t-norm and t-conorm as shown in (5) and (6).
$$t(x, y) = \frac{x \cdot y}{2 - (x + y - x \cdot y)} \tag{5}$$
$$s(x, y) = \frac{x + y}{1 + x \cdot y} \tag{6}$$
(7)
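As a quick illustration of the Einstein product (5) and Einstein sum (6), the following Python sketch implements both operators and checks their boundary behaviour. The function names are hypothetical and chosen only for this example.

```python
def einstein_product(x: float, y: float) -> float:
    """Einstein product t-norm: t(x, y) = xy / (2 - (x + y - xy))."""
    return (x * y) / (2.0 - (x + y - x * y))

def einstein_sum(x: float, y: float) -> float:
    """Einstein sum t-conorm: s(x, y) = (x + y) / (1 + xy)."""
    return (x + y) / (1.0 + x * y)

# A t-norm has 1 as its neutral element, a t-conorm has 0.
assert abs(einstein_product(0.7, 1.0) - 0.7) < 1e-12
assert abs(einstein_sum(0.7, 0.0) - 0.7) < 1e-12

# For equal arguments of 0.5 the conjunction is lower and the disjunction higher.
t_half = einstein_product(0.5, 0.5)  # 0.25 / 1.25
s_half = einstein_sum(0.5, 0.5)      # 1.0 / 1.25
print(round(t_half, 3), round(s_half, 3))
```

For inputs in [0, 1] the denominator of the product is never below 1 and the denominator of the sum is never below 1, so neither function can divide by zero.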
2) Query Classification
The query classification is performed using the k-nearest-neighbor (KNN) classifier according to the query-category
similarity. KNN is selected as a baseline classifier because it
is easy to implement and often gives very good
classification performance in practical applications
[11]. Fig. 3 illustrates how classification is performed. Given
the similarity measure $sim(q, C_i)$ of the query to each category,
the final classification is $C_c$ according to (8), where
$C_c \in \{C_1, C_2, \ldots, C_M\}$ [12].
$$d(q, C_c) = \min_{i = 1, 2, \ldots, M} d(q, sim(q, C_i)) \tag{8}$$
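The decision rule in (8) can be sketched as picking the category at minimum distance from the query. The sketch below is not a full KNN classifier and is not the authors' code: it assumes the distance is simply 1 minus the similarity, and the function name `classify` and the similarity values are hypothetical.

```python
def classify(similarities):
    """Return the category with the smallest distance to the query.

    similarities maps a category name to sim(q, C_i) in [0, 1].
    Assumption for this sketch: distance = 1 - similarity.
    """
    if not similarities:
        raise ValueError("at least one category is required")
    distances = {category: 1.0 - sim for category, sim in similarities.items()}
    return min(distances, key=distances.get)

# Hypothetical similarity values for a single query.
example = {"NETWORK": 0.82, "DATABASE": 0.31}
print(classify(example))  # NETWORK
```

Under this assumption, minimising the distance is the same as choosing the category with the highest similarity to the query.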
3) Query reformation
The intended semantics of each keyword $k_i$ is extracted
from the ontology according to the final classification of the
query in order to formulate the enriched query according to (9). For
example, consider the keyword "switch" in the two query
statements "how to switch the HDD Rack on" and I have a
A. Inverse Property
The property hasCategory has an inverse property,
categoryOf. That is, if switch hasCategory Network, then
because of the inverse property it can be inferred that Network
is categoryOf switch.
B. Transitive Properties
Fig. 6 shows an example of the transitive property
hasSynonym. If the individual Network has a synonym that is
Net, and Net has a synonym that is Mesh, then we can infer
that Network has a synonym that is Mesh. Inverse property is
indicated by the dashed line in Fig. 6.
C. Symmetric Property
Fig. 7 shows an example of a symmetric property. If the
individual Fuzzy is related to the individual Uncertain via the
hasSynonym property, then, it could be inferred that Uncertain
must also be related to Fuzzy via the hasSynonym property. In
other words, if Fuzzy is the synonym of Uncertain, then
Uncertain must be the synonym of Fuzzy.
D. Reasoning
One of the main services offered by a reasoner is to test
whether or not one class is a subclass of another class. By
performing such tests on the classes in an ontology it is
possible for a reasoner to compute the inferred ontology class
hierarchy. Another standard service that is offered by
reasoners is consistency checking. Based on the description
(conditions) of a class the reasoner can check whether or not it
is possible for the class to have any instances. Protégé 4
allows different OWL reasoners to be plugged in; the reasoner
shipped with Protégé is called FaCT++ [14]. The ontology can
be sent to the reasoner to automatically compute the
classification hierarchy, and also to check the logical
consistency of the ontology.
The W3C has proposed the Resource Description
Framework (RDF) [15] for exposing the meaning of a
document to the web community of people, machines, and
intelligent agents. Rules provide a natural and widely accepted mechanism for automated reasoning, with
mature and available theory and technology. This has been
identified as a design issue for the semantic web, as clearly
stated by Tim Berners-Lee et al. [16]. The mapping rules that
we consider are the following.
$$\forall x, y, z:\ hasSynonym(x, y) \wedge hasSynonym(y, z) \rightarrow hasSynonym(x, z)$$
$$\forall x, y:\ hasSynonym(x, y) \rightarrow hasSynonym(y, x)$$
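The effect of the transitive and symmetric rules, together with the inverse property categoryOf, can be illustrated with a small forward-chaining sketch in Python. This is only an illustration of the inference pattern, not how an OWL reasoner such as FaCT++ works internally. The function names are hypothetical, and reflexive pairs such as (Net, Net) are deliberately left out, which is a simplification made for this example.

```python
def synonym_closure(pairs):
    """Apply the symmetric and transitive hasSynonym rules until nothing new is inferred."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Symmetric rule: hasSynonym(x, y) -> hasSynonym(y, x)
        inferred = {(y, x) for (x, y) in closure}
        # Transitive rule: hasSynonym(x, y) and hasSynonym(y, z) -> hasSynonym(x, z)
        inferred |= {
            (x, z)
            for (x, y) in closure
            for (y2, z) in closure
            if y == y2 and x != z  # skip reflexive pairs (simplifying assumption)
        }
        if not inferred <= closure:
            closure |= inferred
            changed = True
    return closure

def inverse(pairs):
    """Inverse property: hasCategory(k, c) -> categoryOf(c, k)."""
    return {(c, k) for (k, c) in pairs}

# Asserted facts taken from the examples in the text.
has_synonym = {("Network", "Net"), ("Net", "Mesh")}
has_category = {("switch", "Network")}

synonyms = synonym_closure(has_synonym)
category_of = inverse(has_category)

print(("Network", "Mesh") in synonyms)     # True, by transitivity
print(("Mesh", "Network") in synonyms)     # True, by symmetry
print(("Network", "switch") in category_of)  # True, by the inverse property
```

Starting from the two asserted synonym facts, the closure contains every ordered pair of distinct individuals among Network, Net and Mesh.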
Search query            Classification
Dell network switch     NETWORK
Dell database server    DATABASE
network router          NETWORK
DHCP                    NETWORK
IP                      NETWORK
switch table            NETWORK
query table             DATABASE
queries that are correctly classified. For a query in the test set
that was assigned to several classes, the classification is
considered correct if it is the same as the given classification. The
classification accuracy of our model is 89.2%.
We compared the results returned by a keyword-based search engine (we chose Google) for a given query with the results
returned for the enriched query produced by our solution. The enriched query
excludes many irrelevant results, as shown in Table V. The
comparison between the original query and the enriched query in
this regard is shown in Fig. 9.
The experimental results show that submitting the enriched query
produced by our solution improves the results retrieved by keyword-based
search engines compared with submitting the original query.
Figure 9. Comparison between precision of results using original query and enriched query.
VI. CONCLUSION
EnrichSearch was designed as a plug-in for any search
engine. Thus, the implementation of EnrichSearch focused on
modifying the query string itself, which is easier than
modifying the target search engine directly.
The proposed solution provides a new method for query
statement classification that uses the semantics of the query
keywords. Classification relies not only on keyword-level
semantics but also on sentence-level semantics, which means
that the system can use the relationships between keywords
(if any) to obtain the intended semantics of a keyword. Another
benefit of this approach is that the algorithm continuously
improves itself by adjusting the weights of keywords, which
increases the classification accuracy over time. Ambiguity
that comes from homonyms is reduced by obtaining the intended
meaning of keywords.
Our proposed solution facilitates online searching for a user
query by combining the already-implemented keyword-based
search engines (such as YAHOO! and Google), which
are produced by many companies and research labs, with the
benefits of the semantic web and fuzzy logic to get better results
than submitting the original query. This avoids many results that are
irrelevant to the intended meaning of the user and saves the time needed to
search for the required information in the huge result set
returned by keyword-based search engines for the original
query. Our future work is to apply ontology learning to enrich
the ontology and to use multiple ontologies in different domains
rather than a single domain.
REFERENCES
[1] G. Antoniou and F. van Harmelen, A Semantic Web Primer. Cambridge,
MA: MIT Press, 2008.
[2] M. Daoud, L. Tamine-Lechani, and M. Boughanem, "Using a concept-based user context for search personalization," in Proc. of the 2008
International Conference of Data Mining and Knowledge Engineering,
2008.
[3] U. Kamaluddeen, J. Jafreezal, and L. Shahir, "Comparability between fuzzy
sets and crisp sets: a semantic web approach," 2010.
[4] M. Lan, C. Lim Tan, J. Su, and Y. Lu, "Supervised and traditional term
weighting methods for automatic text categorization," IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 31, no. 4, pp. 721-735, 2009.
[5] A. Sieg, B. Mobasher, R. Burke, G. Prabu, and S. Lytinen, "Representing user
information context with ontologies," in uahci05, 2005.
[6] S. Verberne, L. Boves, N. Oostdijk, and P. Coppen, "What is not in the
bag of words for why-QA?," ACL, pp. 719-727, 2010.
[7] P. Mika, "Microsearch: An interface for semantic search," in Semantic
Search, International Workshop located at the 5th European Semantic
Web Conference (ESWC 2008), vol. 334 of CEUR Workshop
Proceedings, pp. 79-88, 2008.
[8] E. Prud'hommeaux and A. Seaborne, "SPARQL Query Language for
RDF," http://www.w3.org/TR/rdf-sparql-query/, 2011.
[9] Jena, A Semantic Web Framework for Java. http://jena.sourceforge.net/,
2012.
[10] E. Massad, N. Regina Siqueira Ortega, L. Carvalho de Barros, and C. José
Struchiner, Fuzzy Logic in Action: Applications in Epidemiology and
Beyond, Studies in Fuzziness and Soft Computing, 2008.
[11] F. Sebastiani, "Machine learning in automated text categorization,"
ACM Computing Surveys, vol. 34, pp. 1-47, 2002.