Introduction
What is Deep Web
Deep Resources
Dynamic Web Pages
returned in response to a submitted query or accessed only
through a form
Unlinked Contents
Pages without any backlinks
Private Web
sites requiring registration and login (password-protected
resources)
Scripted Pages
Pages produced by JavaScript, Flash, AJAX, etc.
Approach towards Crawling the Deep Web
Federated Search
Federated search is the process of
performing a real-time search of multiple
diverse and distributed sources from a
single search page, with the federated
search engine acting as intermediary.
Why federated?
Content from different sources is combined
into one result set, instead of searching the
sources one at a time.
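A minimal sketch of the idea, assuming three hypothetical in-memory sources in place of real remote search services. The names SOURCES, search_source and federated_search are illustrative only and do not come from any particular product.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources; a real engine would query remote search services.
SOURCES = {
    "library_catalog": ["Deep Web Basics", "Crawling Hidden Databases"],
    "journal_index": ["Surfacing the Deep Web", "Federated Search Primer"],
    "news_archive": ["Exploring the Deep Web"],
}


def search_source(name, query):
    """Return (source, title) pairs whose title contains the query (case-insensitive)."""
    q = query.lower()
    return [(name, title) for title in SOURCES[name] if q in title.lower()]


def federated_search(query):
    """Send the same query to every source concurrently and combine the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        # pool.map keeps the order of SOURCES, so the output is deterministic.
        per_source = pool.map(lambda name: search_source(name, query), SOURCES)
        return [hit for hits in per_source for hit in hits]


results = federated_search("deep web")
for source, title in results:
    print(f"{source}: {title}")
```

The user issues one query; the engine fans it out and returns a single combined list, which is the intermediary role described above.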
Federated Search:
Properties (1)
Real Time
Federated search occurs live, so results are
current.
Federated Search:
Properties (2)
Single Search page
Federated search engines provide a single point of
searching.
A web form that a normal search engine cannot crawl. This involves entering a query
in the textbox, clicking search, and retrieving the results.
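As a rough illustration, filling in a textbox and clicking search on a simple GET form amounts to requesting a URL with the query encoded in it. The form address and the field name "q" below are assumptions made for the example, not taken from any real site.

```python
from urllib.parse import urlencode


def build_search_url(action_url, field_name, query):
    """Build the URL a browser would request when a GET form is submitted."""
    return f"{action_url}?{urlencode({field_name: query})}"


# Hypothetical form: action "http://example.com/search", one textbox named "q".
url = build_search_url("http://example.com/search", "q", "deep web")
print(url)
```

A federated search engine would then fetch this URL and parse the returned result page; that step is omitted here because it depends on each source's page layout.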
Federated Search in Action
Metasearch example
Federated Search
(Advantages)
Efficiency, Time Savings
Instead of the user querying many search engines
one at a time, the federated search
engine does it on the user's behalf
Quality of results
Searches only the authoritative sources
it has been configured to query.
Most Current content
Searches in real time.
Federated Search
(Challenges)
Aggregation
The process of combining search results
from different sources in some helpful
way
e.g. sorting by date, title, or author
Ranking
Ordering results by their relevance to the search query
De-duplication
A federated search engine may retrieve
the same result from multiple sources
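The three challenges can be sketched together as one small pipeline. This is only an illustration: the records are made up, duplicates are detected by a normalized URL, and relevance is a naive count of query terms found in the title. Real engines use far more elaborate matching and scoring.

```python
from datetime import date

# Hypothetical results gathered from two sources, "A" and "B".
results = [
    {"title": "The Deep Web: Surfacing Hidden Value",
     "url": "http://example.org/a", "date": date(2001, 8, 1), "source": "A"},
    {"title": "Accessing the Deep Web: A Survey",
     "url": "http://example.org/b", "date": date(2007, 5, 1), "source": "A"},
    {"title": "The Deep Web: Surfacing Hidden Value",
     "url": "http://EXAMPLE.org/a/", "date": date(2001, 8, 1), "source": "B"},
]


def dedup_key(record):
    """Assumed duplicate test: same URL ignoring case and a trailing slash."""
    return record["url"].lower().rstrip("/")


def deduplicate(records):
    """Keep the first record seen for each normalized URL."""
    seen, unique = set(), []
    for record in records:
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def score(record, query):
    """Naive relevance: number of query terms that occur in the title."""
    title = record["title"].lower()
    return sum(term in title for term in query.lower().split())


def aggregate(records, query, sort_by="relevance"):
    """De-duplicate, then order by relevance or by a field such as date or title."""
    unique = deduplicate(records)
    if sort_by == "relevance":
        return sorted(unique, key=lambda r: score(r, query), reverse=True)
    return sorted(unique, key=lambda r: r[sort_by])


ranked = aggregate(results, "deep web survey")
by_date = aggregate(results, "deep web survey", sort_by="date")
```

Here the copy of the first record returned by source "B" is dropped, and the survey ranks first by relevance because its title matches all three query terms.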
Case Study:
Google's Crawling
mediated form
semantic mappings
deep-web sources
Google's approach:
Selecting wildcards for form submission
Some fields are mandatory
Query template
Testing with all possible values in select
menu
Predicting form values from datatypes
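The steps above can be sketched roughly as follows. This is not Google's implementation: the form description, the field names and the use of an empty string as the wildcard are all assumptions made for illustration. A query template is modelled as the subset of fields that receive concrete values, while every other field is left at its wildcard.

```python
from itertools import product

# Hypothetical description of a search form with two select menus and a textbox.
FORM_FIELDS = {
    "category": {"type": "select", "options": ["books", "music"]},
    "state": {"type": "select", "options": ["CA", "NY", "TX"]},
    "keyword": {"type": "text"},
}


def submissions_for_template(form, binding_fields):
    """Yield one form submission per combination of values of the bound select fields."""
    option_lists = [form[name]["options"] for name in binding_fields]
    for combo in product(*option_lists):
        submission = {name: "" for name in form}  # "" stands for the wildcard
        submission.update(zip(binding_fields, combo))
        yield submission


# Template binding both select menus: 2 * 3 = 6 submissions.
subs = list(submissions_for_template(FORM_FIELDS, ["category", "state"]))
for sub in subs:
    print(sub)
```

Binding fewer fields gives fewer submissions, which is why the choice of template matters: testing every value of every select menu at once grows multiplicatively with the number of menus.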
THANK YOU