Elasticsearch full text search
Data In Elasticsearch
Elasticsearch is document oriented, meaning that it stores entire objects, or
documents, and uses JSON as their serialization format. Each document belongs
to a type, types live inside an index, and each document has one or more
fields. Compared to a relational database, an index roughly corresponds to a
database, a type to a table, a document to a row, and a field to a column.
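To make the hierarchy concrete, the sketch below keeps whole documents, serialized as JSON, in a nested Python dictionary keyed by index, type and id. The names blog, post, put_document and get_document are hypothetical, and the dictionary is only a toy stand-in for the hierarchy, not how Elasticsearch actually stores data.

```python
import json

# Toy stand-in for the hierarchy: index -> type -> id -> JSON source.
store = {}

def put_document(index, doc_type, doc_id, document):
    """Serialize a whole document to JSON and file it under index/type/id."""
    source = json.dumps(document)
    store.setdefault(index, {}).setdefault(doc_type, {})[doc_id] = source
    return source

def get_document(index, doc_type, doc_id):
    """Deserialize the stored JSON back into the original object."""
    return json.loads(store[index][doc_type][doc_id])

post = {"title": "Hello", "tags": ["intro", "search"], "author": {"name": "Jane"}}
put_document("blog", "post", "1", post)
print(get_document("blog", "post", "1") == post)  # True: the entire object round-trips
```

In Elasticsearch versions that still have types, the same three coordinates show up in the REST path of a document, such as /blog/post/1.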
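While working with Elasticsearch you will come across the term index a lot,
and it has different meanings depending on the context, so let's clarify it
before we move on:
1. Index (noun): a place to store related documents, much like a database in
a relational database.
2. Index (verb): to store a document in an index so that it can be retrieved
and queried, much like the INSERT keyword in SQL.
3. Inverted index: the data structure Elasticsearch builds to make full text
searches fast (covered below).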
Java API
If you are using Java, Elasticsearch comes with two built-in clients:
1. Node Client: this client joins the cluster as a non-data node. It doesn't
hold any data itself, but it knows which data lives on which node in the
cluster and can forward requests directly to the correct node.
2. Transport Client: the lighter-weight transport client can be used to send
requests to a remote cluster. It doesn't join the cluster itself, but simply
forwards requests to a node in the cluster.
Both clients talk to the cluster over port 9300. A node is a running instance
of Elasticsearch, and a cluster is a group of nodes with the same cluster name
that work together to share data and to provide failover and scale.
RESTful API
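Besides the Java API, Elasticsearch also provides a RESTful API that speaks
JSON over HTTP on port 9200. A request has the following form; the body in
this example is a match_all query, which matches every document:
curl -X<VERB> 'http://<HOST>:9200/<PATH>?<QUERY_STRING>' -d '
{
    "query": {
        "match_all": {}
    }
}
'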
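As a rough Python equivalent, the sketch below uses only the standard library's urllib to build the same match_all search. The host localhost, the _search endpoint and the helper name build_search_request are assumptions for illustration. It only constructs the request, so it runs without a live cluster.

```python
import json
import urllib.request

def build_search_request(host="localhost", port=9200, index=None, query=None):
    """Build (but do not send) a search request for the REST API on port 9200."""
    path = f"/{index}/_search" if index else "/_search"
    # Default to match_all, which matches every document.
    body = {"query": query if query is not None else {"match_all": {}}}
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # the search endpoint accepts POST as well as GET
    )

req = build_search_request()
print(req.get_method(), req.full_url)  # POST http://localhost:9200/_search
print(req.data.decode("utf-8"))        # {"query": {"match_all": {}}}

# Against a running node, the request could be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```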
Document Mapping
Elasticsearch supports the following simple (core) field types:
String: string
Whole number: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date
Besides these core types, Elasticsearch also supports custom mappings for
inner objects, defined using object notation.
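When a document contains a field that has no mapping yet, Elasticsearch guesses a core type from the JSON value (dynamic mapping). The sketch below imitates that guess in Python for the types listed above and uses object notation for inner objects. The function names are hypothetical, date detection and arrays are left out, and the output only mimics the shape of a properties mapping.

```python
import json

def infer_core_type(value):
    """Guess a core field type for one JSON value (simplified; no date detection)."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value for this sketch: {value!r}")

def build_mapping(document):
    """Return a {"properties": ...} mapping, using object notation for inner objects."""
    properties = {}
    for field, value in document.items():
        if isinstance(value, dict):
            # Inner object: its own fields are mapped under a nested "properties" key.
            properties[field] = {"type": "object", **build_mapping(value)}
        else:
            properties[field] = {"type": infer_core_type(value)}
    return {"properties": properties}

doc = {"title": "Inverted index", "views": 42, "rating": 4.5,
       "published": True, "author": {"name": "Jane"}}
print(json.dumps(build_mapping(doc), indent=2))
```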
Exact values & Full Text
Data in Elasticsearch can be broadly divided into two types: exact values and
full text. Exact values are exactly what they sound like: for example, the
exact value "Foo" is not the same as the exact value "foo". Full text, on the
other hand, refers to textual data, usually written in some human language.
Exact values are easy to query. The decision is binary: a value either matches
the query or it doesn't, much like a WHERE clause in SQL. Querying full text
data is different: instead of asking "Does this document match the query?",
it asks "How well does this document match the query?". Full text search is
what sets Elasticsearch apart from traditional SQL databases. To facilitate
these queries on full text fields, Elasticsearch first analyzes the text, then
uses the results to build an inverted index.
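To make the contrast concrete, here is a small Python sketch: exact matching returns a plain yes or no, while the hypothetical full_text_score returns a graded answer, the fraction of query words found in the text. This naive score only illustrates the idea of "how well"; it is not how Elasticsearch computes relevance.

```python
def exact_match(stored_value, query_value):
    """Exact values: a binary, case-sensitive yes/no, like SQL's WHERE col = 'Foo'."""
    return stored_value == query_value

def full_text_score(text, query):
    """Naive graded answer: the fraction of query words that appear in the text."""
    text_words = set(text.lower().split())
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    found = sum(1 for word in query_words if word in text_words)
    return found / len(query_words)

print(exact_match("Foo", "foo"))                        # False
print(full_text_score("Quick brown fox", "brown dog"))  # 0.5
```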
Analysis & Analyzer
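Analysis is the process of:
1. Tokenizing a block of text into individual terms suitable for use in an
inverted index.
2. Normalizing these terms into a standard form to improve their
"searchability".
These two steps are performed by an analyzer, which combines three
functions:
1. Character filters: tidy up the string before it is tokenized, for example
by stripping out HTML or converting & to and.
2. Tokenizer: splits the string into individual terms, for example on
whitespace or punctuation.
3. Token filters: change terms (for example, lowercasing), remove terms (for
example, stopwords such as a, and, the) or add terms (for example, synonyms).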
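The three stages can be chained into a small analyzer. The sketch below is a hypothetical, ASCII-only illustration in plain Python, not Elasticsearch's own standard analyzer: the character filter strips HTML tags and rewrites & as and, the tokenizer splits on anything that is not a letter or digit, and the token filters lowercase terms and drop a tiny, made-up stopword list.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"a", "an", "and", "the"}

def char_filter(text):
    """Character filter: strip HTML tags and turn '&' into 'and' before tokenizing."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")

def tokenizer(text):
    """Tokenizer: split on any run of characters that is not an ASCII letter or digit."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]

def token_filters(tokens):
    """Token filters: lowercase every term, then remove stopwords."""
    lowered = [tok.lower() for tok in tokens]
    return [tok for tok in lowered if tok not in STOPWORDS]

def analyze(text):
    """Run the three stages in order: character filters, tokenizer, token filters."""
    return token_filters(tokenizer(char_filter(text)))

print(analyze("<p>Tom & Jerry are the BEST</p>"))  # ['tom', 'jerry', 'are', 'best']
```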
Full text queries understand how each field is defined, so they can do the
right thing: when you query a full text field, the query applies the same
analyzer to the query string to produce the correct list of terms to search
for. By default, an analyzer is applied to every field of type string, which
turns that field into a full text field.
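The reason for analyzing the query string with the same analyzer is easy to see in a short sketch. Below, a hypothetical minimal analyzer (split on non-alphanumerics, then lowercase) is applied to a field value at index time; the query "QUICK Fox" finds both terms only when it goes through the same analyzer, and finds nothing when it is merely split on whitespace.

```python
import re

def analyze(text):
    """Hypothetical minimal analyzer: split on non-alphanumerics, then lowercase."""
    return [tok.lower() for tok in re.findall(r"[0-9A-Za-z]+", text)]

# Index time: the field value is analyzed once and its terms are stored.
indexed_terms = set(analyze("The Quick Brown Fox"))

def matching_terms(query, analyze_query=True):
    """Return the query terms that are found among the indexed terms."""
    terms = analyze(query) if analyze_query else query.split()
    return [term for term in terms if term in indexed_terms]

print(matching_terms("QUICK Fox"))                       # ['quick', 'fox']
print(matching_terms("QUICK Fox", analyze_query=False))  # []
```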
Inverted Index
Elasticsearch uses a structure called an inverted index, which is designed to
allow very fast full text searches. An inverted index consists of a list of
all the unique words that appear in any document and, for each word, a list of
the documents in which it appears. For example, suppose two documents, Doc 1
and Doc 2, each contain a text field. Splitting each field into terms gives
the following inverted index, where x marks the documents containing a term:
| Term | Doc 1 | Doc 2 |
|------|-------|-------|
| No | x | |
| anime | x | x |
| no | x | |
| life | x | |
| The | | x |
| Term | | x |
| is | | x |
| a | | x |
| japanese | | x |
| word | | x |
| for | | x |
| animation | | x |
| video | | x |
Now if we search for the terms anime word, we get two results (hits), because
the term anime appears in both Doc 1 and Doc 2. But because the term word
appears only in Doc 2, Doc 2 matches more of the query and is ranked higher
(receives a higher score) in the result list.
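The table and the search above can be reproduced with a few lines of Python. The sketch builds the inverted index from the terms listed in the table (in table order, since the original sentences are not shown) and ranks documents by how many query terms they contain. That count is a deliberately naive stand-in for relevance: Elasticsearch's actual scoring (TF/IDF in older versions, BM25 by default since 5.0) also weighs how rare and how frequent each term is.

```python
from collections import defaultdict

# Terms from the table, as-is: no normalization, so "No" and "no" stay distinct.
docs = {
    "Doc 1": ["No", "anime", "no", "life"],
    "Doc 2": ["anime", "The", "Term", "is", "a", "japanese", "word",
              "for", "animation", "video"],
}

def build_inverted_index(docs):
    """Map each unique term to the set of documents in which it appears."""
    inverted = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            inverted[term].add(doc_id)
    return inverted

def search(inverted, query):
    """Naive scoring: count how many query terms each document contains."""
    scores = defaultdict(int)
    for term in query.split():
        for doc_id in inverted.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

inverted = build_inverted_index(docs)
print(search(inverted, "anime word"))  # [('Doc 2', 2), ('Doc 1', 1)]
```

Because these terms were never normalized, searching for Anime returns nothing: the index only contains anime. That is exactly the problem the normalization step of analysis addresses.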
Conclusion
In this part we covered basic Elasticsearch concepts: how it stores and
structures data, and how analysis is performed on data to build the inverted
index that is later used to run full text queries. In the next part we will
dive into searching and take a look at the Query DSL, a powerful feature of
Elasticsearch.