[Figure 1: System architecture. Parallel processes query different search engines and obtain the results in TreeMaps sorted on rank; a ranking algorithm merges the TreeMaps into an array of aggregated results.]
dealing with. This request is sent to the search engine via the Java URL object, and the results are obtained in the form of an HTML page. This HTML results page is parsed by the process and, for each result, the URL, Title, Description, Rank and SearchSource are stored, creating a Result object. These results are entered into a TreeMap data structure with the URL as the key and the Result object as the value.

The GUI also provides advanced search options: entering Boolean queries and phrase searches, selecting the number of results per search engine, and selecting the search engines to be queried.

5.1 Design Decisions
During the design of Tadpole, various design decisions were taken. Some of them are listed below:

5.2 Ranking Aggregation Methods Implemented

Take the Best Rank
In this algorithm, we try to place a URL at the best rank it gets in any of the search engine rankings. That is,

MetaRank(x) = Min(Rank1(x), Rank2(x), …, Rankn(x))

Clashes are avoided by an ordering of the search engines based on popularity. That is, if two results claim the same position in the meta-rank list, the result from a more popular search engine (say, Google) is preferred to the result from a less popular one.
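As a sketch, the rule above can be implemented as follows, assuming each engine reports a URL-to-rank map and that the engine list is ordered most popular first; the class and method names are illustrative, not Tadpole's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BestRank {
    /**
     * Take-the-Best-Rank aggregation: MetaRank(x) is the minimum of
     * Rank_i(x) over all engines. The engines list is assumed to be
     * ordered from most to least popular, which supplies the tie-break.
     */
    static List<String> metaRank(List<Map<String, Integer>> engines) {
        Map<String, Integer> bestRank = new HashMap<>();   // url -> best rank seen
        Map<String, Integer> bestEngine = new HashMap<>(); // url -> engine that gave it
        for (int i = 0; i < engines.size(); i++) {
            for (Map.Entry<String, Integer> e : engines.get(i).entrySet()) {
                Integer cur = bestRank.get(e.getKey());
                if (cur == null || e.getValue() < cur) {
                    bestRank.put(e.getKey(), e.getValue());
                    bestEngine.put(e.getKey(), i);
                }
            }
        }
        // Sort by MetaRank; when two URLs clash on the same best rank, the
        // one whose rank came from the more popular engine (lower index) wins.
        List<String> urls = new ArrayList<>(bestRank.keySet());
        urls.sort(Comparator.comparing((String u) -> bestRank.get(u))
                            .thenComparing(u -> bestEngine.get(u)));
        return urls;
    }
}
```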
[Figure: Time (in milliseconds) per query for the Naïve, Borda's and Footrule ranking methods.]

[Figure: Distribution of results across the search engines Google, Altavista and MSN (pie segments of 59%, 22% and 19%).]
We have plotted the precision of the ranking strategies with respect to both the number of search results and the recall.

In considering the recall, we have taken the total number of relevant documents based on user evaluation of all the top 10 results retrieved by each search engine. The recall is calculated as the number of relevant documents retrieved divided by the total number of relevant results thus judged.

We have taken the relevance feedback from two different judges. The Kappa measure of this relevance feedback is 0.78. In the following graphs, we present the results for two out of the 38 queries run. We also present the average of the results obtained over the 38 queries.

6.4.1 Precision with respect to Number of Results Returned
[Figure: Precision vs. number of results returned for the query "Gardening" (Borda and Footrule ranking methods).]

[Figure: Precision vs. number of results returned for the query "Alcoholism" (Borda and Footrule ranking methods).]

[Figure: Precision vs. number of results returned, averaged over the queries (Borda and Footrule ranking methods).]
It can be observed that, on average, the footrule distance ranking aggregation method gives better precision for the given set of results. Also, the easily computable Borda's method does a good job when compared to the Naïve ranking method.
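For reference, a minimal sketch of a generic Borda count (not necessarily Tadpole's exact scoring): each engine's list awards n points to its first result, n − 1 to its second, and so on, with absent URLs scoring zero from that list; URLs are then sorted by total score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Borda {
    /** Merge ranked lists by Borda count: position 0 in a list of n earns n points. */
    static List<String> aggregate(List<List<String>> rankings) {
        Map<String, Integer> score = new HashMap<>();
        for (List<String> ranking : rankings) {
            int n = ranking.size();
            for (int pos = 0; pos < n; pos++) {
                score.merge(ranking.get(pos), n - pos, Integer::sum);
            }
        }
        List<String> urls = new ArrayList<>(score.keySet());
        urls.sort((a, b) -> score.get(b) - score.get(a)); // highest total score first
        return urls;
    }
}
```

Each list contributes a single linear pass, which is what makes the method "easily computable" relative to matching-based aggregation.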
[Figure: Precision at a given recall for the query "Alcoholism" (Naïve, Borda's and Footrule rankers).]

[Figure: Precision at a given recall for the query "Gardening" (Naïve, Borda's and Footrule rankers).]
A similar observation can be made with respect to the precision at a given recall for each of the
ranking strategies.
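The quantity plotted in these graphs can be sketched as follows: precision is measured at the first depth where a ranked list's recall reaches a given level. The method name and the relevance judgments below are invented for illustration.

```java
public class PrecisionAtRecall {
    /**
     * Precision at the first depth where recall reaches recallLevel, for
     * one ranked list with binary relevance judgments. totalRelevant is
     * the judged number of relevant documents for the query.
     */
    static double precisionAtRecall(boolean[] relevant, int totalRelevant,
                                    double recallLevel) {
        int hits = 0;
        for (int k = 0; k < relevant.length; k++) {
            if (relevant[k]) hits++;
            double recall = (double) hits / totalRelevant;
            if (recall >= recallLevel) {
                return (double) hits / (k + 1); // precision at depth k + 1
            }
        }
        return 0.0; // the recall level is never reached in this list
    }
}
```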
8. Conclusion and Future Work
In the context of our project, we have studied some of the trade-offs involved in the design of meta-search engines. We have observed that the computational complexity of the ranking algorithms used and the performance of the meta-search engine are conflicting parameters. A compromise must be achieved between the two, based on the perceived applications and environment in which the meta-search engine will be used.

Future work involves incorporating more search engines into the study, studying the performance on the most popular queries published by the various search engines, incorporating local lemmatization to eliminate spam, and incorporating methods for avoiding mirrored search results.