Florian Gilcher, asquera GmbH
Skim the surface of Elasticsearch and give you pointers on where to start and what not to ignore.
Downloads are found at http://elasticsearch.org/downloads.
$ curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.0.zip
$ unzip elasticsearch-1.3.0.zip
$ cd elasticsearch-1.3.0
$ bin/elasticsearch
[Tempus] version[1.3.0], pid[17392], build[c8714e8/2013-09-17T12:50:20Z]
A sample document, saved as person.json:
{
"id": 123,
"name": "Florian Gilcher",
"place": "Berlin",
"birthdate": "1983-10-04T00:00:00+01:00",
"interests": ["code", "data", "elasticsearch"],
"age": 30
}
Elasticsearch handles:
Let’s put the stuff in the database.
$ export HOST=http://localhost:9200
$ curl -XPOST $HOST/my_index/person/123 -d @person.json
{"ok":true,"_index":"my_index","_type":"person","_id":"123","_version":1}
This operation is part of the Document API and is called “Index”.
Here is how we get it back:
$ curl -XGET $HOST/my_index/person/123
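For comparison, the same Index and Get calls can be sketched in Python with only the standard library. This assumes a node at http://localhost:9200, as in the curl examples; the helper names are hypothetical, and the requests are only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Assumption: a local node on the default port, as in the curl examples.
HOST = "http://localhost:9200"


def index_request(index, doc_type, doc_id, doc):
    """Build (but do not send) an Index request: POST /{index}/{type}/{id}."""
    body = json.dumps(doc).encode("utf-8")
    url = f"{HOST}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(url, data=body, method="POST")


def get_request(index, doc_type, doc_id):
    """Build (but do not send) a Get request: GET /{index}/{type}/{id}."""
    url = f"{HOST}/{index}/{doc_type}/{doc_id}"
    return urllib.request.Request(url, method="GET")


person = {"id": 123, "name": "Florian Gilcher", "place": "Berlin", "age": 30}
put_req = index_request("my_index", "person", 123, person)
get_req = get_request("my_index", "person", 123)

# With a node running, urllib.request.urlopen(put_req) would send the request
# and return a JSON response.
```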
We can also search for content:
$ curl -XGET "$HOST/my_index/person/_search?q=florian&pretty"
$ curl -XGET "$HOST/my_index/_search?q=florian&pretty"
This searches all fields!
"hits" : [ {
"_index" : "test",
"_type" : "person",
"_id" : "1",
"_score" : 1.0,
"_source" : { "id": 123, "name": "Florian Gilcher", "place": "Berlin", "birthdate": "1983-10-04T00:00:00+01:00", "interests": ["code", "data", "elasticsearch"], "age": 30}
} ]
...
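In a shell, a URL containing & has to be quoted, otherwise the shell runs the command in the background and treats the rest as a separate command. Building the URL programmatically sidesteps that. A minimal sketch, assuming the same local node; search_url is a hypothetical helper, and q=name:florian uses query-string syntax to restrict the search to the name field.

```python
from urllib.parse import urlencode

HOST = "http://localhost:9200"  # assumption: local node, as in the curl examples


def search_url(index, query, doc_type=None, pretty=True):
    """Build a URI-search URL; urlencode escapes characters such as ':'."""
    path = f"{HOST}/{index}"
    if doc_type is not None:
        path += f"/{doc_type}"  # restrict the search to one type
    params = {"q": query}
    if pretty:
        params["pretty"] = "true"
    return f"{path}/_search?{urlencode(params)}"


print(search_url("my_index", "florian"))
# prints http://localhost:9200/my_index/_search?q=florian&pretty=true
print(search_url("my_index", "name:florian", doc_type="person", pretty=False))
# prints http://localhost:9200/my_index/person/_search?q=name%3Aflorian
```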
URI search is fine for debugging; for real search queries, send the Query DSL in the request body:
{
"query": { "match" : { "name" : "florian" } }
}
Queries can be constrained by filtering the data before running the query, for example with a filtered query:
{
"query": { "filtered": {
"query": { "match" : { "name" : "florian" } },
"filter": { "range" : { "age" : { "gte": 25, "lte": 35 } } }
} }
}
The Query DSL is a small programming language in itself and is worth learning properly.
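The filter-then-query idea can be sketched locally, without a cluster. This is a toy model, not how Elasticsearch executes queries: the filter is a plain yes/no check that produces no score, and the "match" is a crude count of shared lowercase terms. All people except the first are hypothetical.

```python
# Hypothetical toy data; only the first person comes from the slides.
people = [
    {"id": 1, "name": "Florian Gilcher", "age": 30},
    {"id": 2, "name": "Felix Gilcher", "age": 40},
    {"id": 3, "name": "Florian Muster", "age": 20},
]


def range_filter(docs, field, gte, lte):
    """A filter is a yes/no decision: keep docs with gte <= value <= lte, no score."""
    return [d for d in docs if gte <= d[field] <= lte]


def match_query(docs, field, text):
    """Rough stand-in for a match query: score = number of shared lowercase terms."""
    wanted = set(text.lower().split())
    hits = []
    for d in docs:
        score = len(wanted & set(d[field].lower().split()))
        if score > 0:
            hits.append((d["id"], score))
    # Highest score first, like a search response (sort is stable for ties).
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


candidates = range_filter(people, "age", 25, 35)  # only id 1 survives the filter
hits = match_query(candidates, "name", "florian")
```

Filtering first means the (more expensive) scoring step only ever sees documents that can qualify.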
All results are ranked by a score. The score represents how well a document matches, based on:
There are multiple queries that can influence scoring.
The go-to option is the function_score query, which can, for example:
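To make scoring more concrete, here is a simplified TF/IDF-style score, loosely modelled on Lucene's classic similarity (square-rooted term frequency, an idf of 1 + ln(N / (df + 1)), and a 1/sqrt(length) field norm) but leaving out query norm, coordination factor and boosts. It is followed by a function_score-style step that multiplies the query score by a per-document value (multiply is function_score's default boost_mode). The documents and the likes field are hypothetical.

```python
import math

# Hypothetical name fields.
docs = {
    1: "florian gilcher berlin",
    2: "felix gilcher",
    3: "florian florian",
}


def tokenize(text):
    return text.lower().split()


def idf(term, corpus):
    """Rarer terms weigh more."""
    df = sum(1 for text in corpus.values() if term in tokenize(text))
    return 1.0 + math.log(len(corpus) / (df + 1))


def tfidf_score(query, doc_id, corpus):
    """Simplified: sum over query terms of sqrt(freq) * idf * length norm."""
    terms = tokenize(corpus[doc_id])
    length_norm = 1.0 / math.sqrt(len(terms))  # matches in short fields count more
    score = 0.0
    for term in tokenize(query):
        freq = terms.count(term)
        if freq:
            score += math.sqrt(freq) * idf(term, corpus) * length_norm
    return score


def boosted(score, likes):
    """function_score-style multiply with a hypothetical popularity value."""
    return score * math.log1p(likes)


base = {d: tfidf_score("florian", d, docs) for d in docs}
likes = {1: 100, 2: 5, 3: 1}  # hypothetical popularity signal
final = {d: boosted(base[d], likes[d]) for d in docs}
```

Document 3 wins on text relevance alone (two occurrences in a short field), but the popularity factor lets document 1 overtake it.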
Mappings describe how incoming values are stored in the Lucene index.
Elasticsearch automatically detects the mapping of newly added types and fields.
$ curl "$HOST/my_index/person/_mapping?pretty"
...
"person" : {
"properties" : {
"age" : { "type" : "long" },
"birthdate" : { "type" : "date", "format" : "dateOptionalTime" },
...
"name" : { "type" : "string" },
}
}
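Dynamic mapping boils down to guessing a field type from the first value seen. The sketch below mimics that for the sample document and lands on long, date and string, consistent with the _mapping output above. It is an approximation: Python's datetime.fromisoformat stands in for Elasticsearch's own date detection, and guess_type is a hypothetical helper.

```python
from datetime import datetime


def guess_type(value):
    """Rough stand-in for dynamic mapping: pick a field type from a JSON value."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):
        # Arrays take the type of their elements; an empty array maps nothing yet.
        return guess_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        try:
            datetime.fromisoformat(value)  # stand-in for ES date detection
            return "date"
        except ValueError:
            return "string"
    return None


person = {
    "id": 123,
    "name": "Florian Gilcher",
    "place": "Berlin",
    "birthdate": "1983-10-04T00:00:00+01:00",
    "interests": ["code", "data", "elasticsearch"],
    "age": 30,
}
mapping = {field: guess_type(value) for field, value in person.items()}
```

A wrong first guess sticks (e.g. a numeric-looking string), which is why explicit mappings are worth defining for important fields.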
Analysis is the step of breaking text data into terms that can be indexed.
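Full-text queries such as match are analyzed too, so their terms line up with the indexed terms.
Lucene builds an inverted index of your data: a map from each term to the documents containing it.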
keyword | documents |
---|---|
florian | 1 |
gilcher | 1,2 |
felix | 2 |
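The table above can be reproduced with a few lines of Python. This assumes document 2 is "Felix Gilcher" (inferred from the table) and uses simple lowercase whitespace splitting in place of a real analyzer.

```python
from collections import defaultdict

# Assumption: document 2 is "Felix Gilcher", inferred from the table.
docs = {1: "Florian Gilcher", 2: "Felix Gilcher"}


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document ids containing it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):  # set: one posting per doc
            index[term].append(doc_id)
    return dict(index)


def lookup(index, term):
    """Finding documents for a term is a dictionary lookup, not a scan of all docs."""
    return index.get(term.lower(), [])


inverted = build_inverted_index(docs)
```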
step | Input | Output |
---|---|---|
Whitespace Tokenizer | “The quick fox” | “The” “quick” “fox” |
lowercase filter | “The” “quick” “fox” | “the” “quick” “fox” |
stopword filter | “the” “quick” “fox” | “quick” “fox” |
synonym filter | “quick” “fox” | “quick” “fast” “fox” |
A match query for “quick”, “fast” or “fox” will find this document.
Getting analysis right is the difference between a good and a bad search.
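The analysis chain from the table can be sketched as a tokenizer followed by a list of token filters. The stopword list and the single synonym rule (quick expands to quick and fast) are illustrative assumptions chosen to mirror the table, not Elasticsearch defaults. The key point is that the query runs through the same chain before terms are compared.

```python
# Assumptions: a tiny stopword list and one synonym rule that expands
# "quick" into "quick" and "fast", mirroring the table.
STOPWORDS = {"the", "a", "an"}
SYNONYMS = {"quick": ["quick", "fast"]}


def whitespace_tokenizer(text):
    return text.split()


def lowercase_filter(tokens):
    return [t.lower() for t in tokens]


def stopword_filter(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def synonym_filter(tokens):
    out = []
    for t in tokens:
        out.extend(SYNONYMS.get(t, [t]))  # unknown tokens pass through unchanged
    return out


def analyze(text):
    """Run the tokenizer, then each token filter in order."""
    tokens = whitespace_tokenizer(text)
    for token_filter in (lowercase_filter, stopword_filter, synonym_filter):
        tokens = token_filter(tokens)
    return tokens


indexed_terms = set(analyze("The quick fox"))


def matches(query):
    """The query goes through the same analyzer, then terms are compared."""
    return bool(indexed_terms & set(analyze(query)))
```

A query for “The” finds nothing: the stopword filter removes it on both sides.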
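Aggregations come in 2 kinds: bucket aggregations, which group documents into buckets, and metric aggregations, which compute values (such as a minimum) over a set of documents.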
{ "aggregations": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"ranges": [ { "to": "now" }, { "from": "now" } ]
...
{ "aggregations": {
"range": {
"date_range": { ... },
"aggregations": {
"monthly" : {
"date_histogram" : {
"field" : "date",
"interval" : "1M",
...
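What the nested bucket aggregation computes can be sketched in plain Python: first split documents at “now” (a range's "to" bound is exclusive, "from" is inclusive), then count documents per month inside each bucket. The documents, the fixed “now” and the bucket labels are hypothetical; real responses use their own key format.

```python
from collections import Counter
from datetime import date

# Hypothetical documents with a "date" field.
docs = [
    {"date": date(2014, 6, 3)},
    {"date": date(2014, 6, 20)},
    {"date": date(2014, 7, 9)},
    {"date": date(2014, 9, 1)},
]


def date_range(docs, field, now):
    """Two buckets like ranges [{"to": now}, {"from": now}]; "to" is exclusive."""
    buckets = {"to_now": [], "from_now": []}  # bucket labels are made up
    for d in docs:
        buckets["to_now" if d[field] < now else "from_now"].append(d)
    return buckets


def monthly_histogram(docs, field):
    """Count documents per calendar month, keyed by the first day of the month."""
    counts = Counter(d[field].replace(day=1) for d in docs)
    return {month.isoformat(): n for month, n in sorted(counts.items())}


now = date(2014, 8, 1)  # fixed "now" so the example is reproducible
result = {
    label: monthly_histogram(bucket, "date")
    for label, bucket in date_range(docs, "date", now).items()
}
```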
{
"aggs" : {
"min_price" : { "min" : { "field" : "price" } }
}
}
Metric aggregations can also be nested as the innermost level below bucket aggregations, e.g. to get the minimum price per month.
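A sketch of both uses of the min metric: once over all documents, as in the min_price example, and once nested inside monthly buckets. The sales documents are hypothetical and min_per_month is an illustrative helper, not an Elasticsearch API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales documents.
sales = [
    {"date": date(2014, 6, 3), "price": 20.0},
    {"date": date(2014, 6, 20), "price": 12.5},
    {"date": date(2014, 7, 9), "price": 30.0},
]

# Top-level metric, like {"min_price": {"min": {"field": "price"}}}.
min_price = min(d["price"] for d in sales)


def min_per_month(docs, date_field, value_field):
    """A monthly bucket aggregation with a min metric nested inside it."""
    buckets = defaultdict(list)
    for d in docs:
        buckets[d[date_field].replace(day=1)].append(d[value_field])
    return {month.isoformat(): min(values) for month, values in sorted(buckets.items())}


monthly_min = min_per_month(sales, "date", "price")
```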
Distribution works at the index level: each index is broken into shards, which are distributed over the cluster.
A node is a running instance of Elasticsearch. A production system should consist of at least 2 nodes.
An index stores documents and consists of shards. An “Elasticsearch index” is not a Lucene index: each shard is a Lucene index of its own.
An index is split into multiple shards (5 by default). For reliability and performance, each shard can be copied one or more times (1 copy by default). These copies are called “replicas”.
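Which shard a document lands on is decided by hashing its routing value (the document id by default) modulo the number of primary shards. The sketch below shows the idea; zlib.crc32 is a stand-in, since Elasticsearch uses its own hash function, and shard_for is a hypothetical helper.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # the default number of shards per index mentioned above


def shard_for(routing, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a primary shard: hash(routing) % number of primary shards.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


doc_ids = [str(i) for i in range(100)]
placement = {doc_id: shard_for(doc_id) for doc_id in doc_ids}

# The same id always lands on the same shard. Changing num_shards would move
# many documents to different shards, which is why the number of primary
# shards is fixed when the index is created.
```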
The shards are distributed using the following strategy:
Replicas allow two things:
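As a rough illustration of shard distribution, here is a toy allocator. It is not Elasticsearch's allocation algorithm; it only encodes one rule Elasticsearch does follow: two copies of the same shard never share a node. Replicas that cannot be placed stay unassigned, which is what happens on a single node with 1 replica (cluster health yellow) and one reason to run at least 2 nodes. The node names are hypothetical.

```python
def allocate(num_shards, num_replicas, nodes):
    """Toy allocator: spread primaries and replicas round-robin over nodes.

    Returns ({node: [(shard, role), ...]}, [(shard, role), ...unassigned]).
    """
    layout = {n: [] for n in nodes}
    unassigned = []
    for shard in range(num_shards):
        for copy in range(num_replicas + 1):
            role = "primary" if copy == 0 else "replica"
            if copy >= len(nodes):
                # Every node already holds a copy of this shard.
                unassigned.append((shard, role))
                continue
            # Copies of one shard go to consecutive, therefore distinct, nodes.
            node = nodes[(shard + copy) % len(nodes)]
            layout[node].append((shard, role))
    return layout, unassigned


two_nodes, missing = allocate(5, 1, ["node1", "node2"])
one_node, missing_single = allocate(5, 1, ["node1"])
```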
The main pitfalls when starting Elasticsearch are:
Cover: Bonsai Rock, Lake Tahoe
http://www.flickr.com/photos/tensafefrogs/4513403767/