Over 12 years in the software world: Israeli Air Force, Israel Discount Bank, SAP
Apache Solr
In this talk we will use Apache Solr as an example of a search engine, but most of the concepts and mechanisms apply to most of the products available on the market.
Agenda
History
Market at a glance
Anatomy of a typical search system
Scenarios and problems
Scaling the search scenario
Handling large data-sets
Handling request load
Achieving high availability
History
1994 - Lycos
1995 - AltaVista, Yahoo!
1997 - Yandex
1998 - Google, MSN Search
2000 - First Lucene version (marks the rise of custom search implementations)
2006 - Ask.com, AOL Search
2009 - Bing
Market at a Glance
Many open source offerings: Apache Lucene, Apache Solr (built on Lucene), Nutch, Sphinx, Elasticsearch (built on Lucene), Xapian, and many more.
Some enterprise solutions:
Google (Google Search Appliance, Google Mini)
SAP (TREX, Enterprise Search)
IBM (OmniFind)
Oracle (Oracle Secure Enterprise Search)
Microsoft (FAST Search Server)
Almost no standards: OpenSearch, Robots Exclusion Standard
Anatomy of a typical search system
2 main scenarios
Search scenario: search for a term.
Problems:
How to execute a search on a large data-set, fast.
How to scale the solution to serve any given number of concurrent requests.
How to provide a highly available service.
Indexing scenario: build indexes via add/delete/update document operations.
Problems:
How to index a large number of documents, fast.
We will discuss only the search scenario; if time permits, we will touch on the indexing scenario too.
"Reduce" step - The master node then collects the answers to all the subproblems and combines them in some way to form the output the answer to the problem it was originally trying to solve.
Summary
To handle more data, split the system into shards. To handle more requests, add more slave nodes. We achieved a highly available and fully scalable (data- and load-wise) search system.
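The two scaling levers above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Solr implements it: `shard_for` and `ReplicaPool` are hypothetical names, documents are assigned to shards by a CRC32 hash of their id, and requests are spread round-robin over a shard's slave nodes while skipping nodes that have been marked down.

```python
import itertools
import zlib
from typing import List, Set


def shard_for(doc_id: str, num_shards: int) -> int:
    """Stable hash, so the same document always maps to the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ReplicaPool:
    """Round-robin over the slave nodes of one shard, skipping nodes marked down."""

    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)
        self.down: Set[str] = set()  # nodes currently considered unavailable
        self._cycle = itertools.cycle(self.replicas)

    def pick(self) -> str:
        # Try each replica at most once per call before giving up.
        for _ in range(len(self.replicas)):
            node = next(self._cycle)
            if node not in self.down:
                return node
        raise RuntimeError("no live replica for this shard")


if __name__ == "__main__":
    pool = ReplicaPool(["slave-1", "slave-2"])
    print(shard_for("doc-42", num_shards=4), pool.pick(), pool.pick())
```

Adding a shard reduces the data each node holds, adding a slave node to a pool increases the request rate the shard can serve, and a failed slave costs capacity rather than availability as long as one replica in the pool stays up.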
Questions?
Thank you