The document discusses using Elasticsearch, D3.js, Angular.js, and Google Refine to create a full stack data visualization of open data from Bordeaux, France. It focuses on data from the CAPC contemporary art museum, importing the data into Elasticsearch for scalable search and then using D3.js, Angular.js, and Yeoman to build the front-end visualization with JavaScript. The goal is to make the data more accessible and understandable through interactive visualization.
This document discusses various tools for data visualization, including D3.js, WebGL, the ELK stack, R, Processing, Open Refine, and 3D printing. It provides examples of visualizations created with each tool and suggests when each tool may be best to use. D3.js is described as a low-level library that provides full control but requires more work, while tools like the ELK stack allow for quickly visualizing system and business data. R is presented as useful for exploring and analyzing large datasets, and Open Refine is recommended for cleaning and preparing CSV files for export.
Data visualisation: d3.js + sinatra + elasticsearch - Mathieu Elie
Live screencast on my tech blog (in French):
http://www.mathieu-elie.net/screencast-video-d3-js-sinatra-elasticsearch-capucine/
Other tech slides on my blog: http://www.mathieu-elie.net
Roddy Lindsay discusses how Facebook generates large amounts of user data daily and the challenges of analyzing this data at scale. Facebook initially used Oracle and Hadoop to analyze data but developed its own SQL-like query language called Hive to allow business analysts to access data. Hive distributed queries across large Hadoop clusters, enabling decentralized access. This allowed text analytics like sentiment analysis and associations mapping. Lindsay believes such analytics could help individuals understand their own happiness patterns from personal data.
The document discusses NoSQL databases and CouchDB. It provides an overview of NoSQL, the different types of NoSQL databases, and when each type would be used. It then focuses on CouchDB, explaining its features like document centric modeling, replication, and fail fast architecture. Examples are given of how to interact with CouchDB using its HTTP API and tools like Resty.
ElasticSearch - index server used as a document database - Robert Lujo
Presentation held on 5.10.2014 at http://2014.webcampzg.org/talks/.
Although ElasticSearch's (ES) primary purpose is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database - better said, a document database.
Why this could be interesting and how this could be used effectively?
Talk overview:
- ES - history, background, philosophy, featureset overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieving
- A database should provide the following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to use ES additionally as a document database (store and retrieve)
- a use-case will be presented where ES can be used as the single database in the system (benefits and drawbacks)
- what happens if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)
ES is a nice, genuinely ready-to-use example that can change your perspective on developing certain types of software systems.
Cool bonsai cool - an introduction to ElasticSearch - Clinton Gormley
An introduction by Clinton Gormley to the search engine Elasticsearch. It discusses how Elasticsearch works by tokenizing text, creating an inverted index, and using relevance scoring. It also summarizes how to install and use Elasticsearch for indexing, retrieving, and searching documents.
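The mechanics mentioned above — tokenizing text, building an inverted index, and scoring by relevance — can be sketched in a few lines of Python. This is an illustrative toy, not Elasticsearch's actual implementation; the sample documents are made up:

```python
from collections import defaultdict
import math
import re

def tokenize(text):
    # Lowercase and split on non-word characters, like a simple analyzer
    return [t for t in re.split(r"\W+", text.lower()) if t]

docs = {
    1: "Cool bonsai cool",
    2: "An introduction to Elasticsearch",
    3: "Elasticsearch creates an inverted index",
}

# Build the inverted index: term -> set of doc ids containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search(query):
    # Score each matching doc by summing an IDF weight per matched term:
    # rarer terms contribute more to relevance
    n = len(docs)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, set())
        if postings:
            idf = math.log(n / len(postings)) + 1.0
            for doc_id in postings:
                scores[doc_id] += idf
    return sorted(scores, key=scores.get, reverse=True)
```

Real engines add term frequency, field-length normalization, and positional data on top of this basic shape.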
This document summarizes a presentation about Apache CouchDB. Some key points:
- CouchDB is a scalable, distributed key-value database that uses peer-to-peer replication. It has an append-only file structure and is designed to handle crashes well.
- Data is stored in JSON documents with dynamic schemas. Views are built using JavaScript map-reduce functions.
- The API is RESTful HTTP and works natively with the web. Data can be queried and rendered directly in the browser using JavaScript.
- CouchDB embraces web technologies and can scale from smartphones to server clusters. It is open source with an open development philosophy.
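The map-reduce views mentioned above can be sketched in Python to show the data flow. Real CouchDB views are JavaScript functions stored in design documents; the documents and field names here are invented for illustration:

```python
# Toy illustration of a CouchDB-style map/reduce view.
docs = [
    {"_id": "a", "type": "album", "artist": "Miles", "tracks": 8},
    {"_id": "b", "type": "album", "artist": "Miles", "tracks": 10},
    {"_id": "c", "type": "single", "artist": "Trane", "tracks": 1},
]

def map_fn(doc):
    # Emit (key, value) pairs, like a CouchDB map function's emit()
    if doc["type"] == "album":
        yield doc["artist"], doc["tracks"]

def reduce_fn(values):
    # A '_sum'-style reduce over the emitted values
    return sum(values)

# Build the view: run map over every document, then group-reduce by key
rows = {}
for doc in docs:
    for key, value in map_fn(doc):
        rows.setdefault(key, []).append(value)
view = {key: reduce_fn(vals) for key, vals in rows.items()}
```

CouchDB computes such views incrementally and persists them, so only changed documents are re-mapped.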
This document discusses best practices for migrating from Drupal 6 to Drupal 7. It introduces Webbie the Zombie, who outlines the main topics of auditing a site, cleaning it up before migrating, and practicing the migration. The document provides tips for auditing modules, users, content types, URLs, and files. It recommends cleaning by removing redundant, outdated, and trivial components. It emphasizes practicing the migration without doing it live, having frequent backups, and planning a content freeze for the live site. The presenter asks for comments, questions, or zombie jokes.
Big Data consists of several issues: data collecting, storage, computing, analysis and visualization. Python is a popular scripting language with good code readability and thus is suitable for fast development. In this slides, the author shares how to solve Big Data issues using Python open source tools.
Code decoupling from Symfony (and other frameworks) - PHP Conference Brasil ... - Miguel Gallardo
Frameworks are very helpful for solving common problems when developing an application. But what happens when we have to move to another framework? In this talk I will show how my company tries to stay independent of any framework, decoupling our business logic from Symfony.
The document introduces Bixo, an open source web mining toolkit built on Hadoop, Cascading, and Tika. It provides an example of how Bixo was used to analyze Apache Hadoop mailing list archives to determine the most helpful contributors. The workflow involved collecting email data, parsing it, analyzing messages to score contributors, and producing a ranked list. Bixo allows building custom workflows through a pipe model to extract structured data from unstructured web sources for business intelligence, competitive analysis, and other applications.
Spiders, Chatbots, and the Future of Metadata: A look inside the BNC BiblioSh... - BookNet Canada
This document discusses the future of metadata and explores opportunities with new technologies like chatbots, virtual reality, and automated metadata extraction. It notes that digital technologies have disrupted industries like photography and that digital experiences will be important in 2017. Various metadata fields and standards are listed. Challenges and opportunities for the BiblioShare platform are mentioned, including a lack of certain book records and the potential for more sample chapters and automated metadata.
Web History 101, or How the Future is Unwritten - BookNet Canada
In 1989 computer scientist Tim Berners-Lee wrote “Information Management: A Proposal” to persuade CERN management that a global hypertext system was in their interests. That proposal gradually grew into what we now call the World Wide Web. This originating document contains not only the bits that would later become the Web, but also features for a future we’ve yet to realize. In this talk, we’ll take a look at some of those highlights and focus them on the world of publishing, proposing solutions to problems we’re still attempting to solve and fostering ideas for further daydreaming.
ELK is a stack consisting of the open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch provides a distributed, multitenant-capable full-text search engine. Logstash is used to collect, process, and forward events and log messages. Kibana provides visualization capabilities on top of Elasticsearch. The document discusses how each tool in the ELK stack works and can be configured using inputs, filters, and outputs in Logstash or through the Elasticsearch REST API. It also provides examples of using ELK for log collection, processing, and visualization.
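The inputs, filters, and outputs mentioned above are declared in a Logstash pipeline configuration. A minimal sketch, with a hypothetical log path and a local Elasticsearch instance assumed:

```
input {
  file { path => "/var/log/app/*.log" }   # hypothetical application log path
}
filter {
  # Parse Apache-style access log lines into structured fields
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  # Use the parsed timestamp as the event's timestamp
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```

Kibana then builds its visualizations on the indices Logstash writes to Elasticsearch.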
Elasticsearch is a distributed, RESTful search and analytics engine that allows for fast searching, filtering, and analysis of large volumes of data. It is document-based and stores structured and unstructured data in JSON documents within configurable indices. Documents can be queried using a simple query string syntax or more complex queries using the domain-specific query language. Elasticsearch also supports analytics through aggregations that can perform metrics and bucketing operations on document fields.
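The query DSL and aggregations described above are expressed as a JSON request body. A hedged sketch of the general shape — the index and field names ("articles", "title", "status", "author") are made up for illustration:

```python
import json

# A bool query combining a full-text match with an exact-value filter,
# plus a terms aggregation bucketing results by author.
body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "elasticsearch"}}],
            "filter": [{"term": {"status": "published"}}],
        }
    },
    "aggs": {
        "by_author": {"terms": {"field": "author"}}
    },
}

# This body would be POSTed to an endpoint like /articles/_search
payload = json.dumps(body)
```

The same search could be written as a query string (`title:elasticsearch AND status:published`), but the DSL form composes better as queries grow.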
The document discusses the percolator feature in Elasticsearch. It begins by explaining what a percolator is and how it works at a high level. It then provides more technical details on how to index queries, perform percolation searches, and the benefits of the redesigned percolator. Key points covered include how the percolator works in distributed environments, examples of how percolator can be used, and new features like filtering, sorting, scoring, and highlighting.
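Percolation reverses the usual flow: queries are stored, and each incoming document is matched against them. A toy sketch of that data flow (Elasticsearch indexes the queries and matches far more efficiently; the query ids and predicates here are invented):

```python
# Registered "queries", keyed by id -- stand-ins for indexed percolator queries
stored_queries = {
    "alert-elasticsearch": lambda doc: "elasticsearch" in doc["body"].lower(),
    "alert-couchdb": lambda doc: "couchdb" in doc["body"].lower(),
}

def percolate(doc):
    # Return the ids of all registered queries that match this document
    return sorted(qid for qid, match in stored_queries.items() if match(doc))
```

This is the pattern behind alerting and saved-search notification features: a new document arrives, and percolation tells you which subscriptions it satisfies.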
We went over what Big Data is and its value. This talk covers the details of Elasticsearch, a Big Data solution: a NoSQL-backed search engine using an HDFS-based filesystem.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
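On the "loading data" point above: bulk indexing uses a newline-delimited JSON body of alternating action and source lines. A sketch of building one in Python — the index name "products" and the documents are made up:

```python
import json

docs = [
    {"id": 1, "name": "widget"},
    {"id": 2, "name": "gadget"},
]

# The _bulk format: an action/metadata line, then the document source line,
# repeated for each document, newline-delimited.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "products", "_id": doc["id"]}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"   # bulk bodies must end with a newline
```

This body would be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`; batching this way is far faster than indexing documents one request at a time.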
Global introduction to Elasticsearch presented at a BigData meetup.
Use cases, getting started, REST CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Amazing Speed: Elasticsearch for the .NET Developer - Adrian Carr, Codestock 2015
The document summarizes a presentation about using Elasticsearch to improve search performance for applications with large amounts of data. It describes how the presenter used Elasticsearch at a previous job to speed up searches of a growing product catalog, then demonstrates how to install and use Elasticsearch with .NET applications using the Nest client library. Issues that may arise when integrating Elasticsearch into existing applications are also discussed, such as differences from relational databases and potential rework of user interfaces.
This document summarizes web scraping and introduces the Scrapy framework. It defines web scraping as extracting information from websites when APIs are not available or data needs periodic extraction. The speaker then discusses experiments with scraping in Python using libraries like BeautifulSoup and lxml. Scrapy is introduced as a fast, high-level scraping framework that allows defining spiders to extract needed data from websites and run scraping jobs. Key benefits of Scrapy like simplicity, speed, extensibility and documentation are highlighted.
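The extraction step at the heart of scraping can be shown with nothing but the standard library; real projects would reach for Scrapy spiders or BeautifulSoup/lxml selectors. The HTML snippet below is invented for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

html = '<ul><li><a href="/talks">Talks</a></li><li><a href="/blog">Blog</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
```

A Scrapy spider wraps this same idea — fetch, parse, extract — in a framework that also handles scheduling, retries, throttling, and pipelines for the extracted items.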
ArangoDB is an open source multi-model NoSQL database that can be used as a document store, key-value store, and graph database. It provides a query language called AQL that is similar to SQL. Documents and data can be easily extended and manipulated using JavaScript. ArangoDB is highly performant, space efficient, and can scale horizontally. It has been in development since 2011 with the goal of providing a full-featured database while avoiding the downsides of other NoSQL solutions.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
More info at http://www.mathieu-elie.net/eventmachine-introduction-pres-rubybdx-screencast-fr
Ruby EventMachine is a really good option for building scalable real-time servers and more...
This document summarizes the technology stack and use of websockets at oneplaylist.fm. The key aspects are:
- The stack includes Ruby on Rails, Redis, EventMachine, HAProxy, Resque, MongoDB, CoffeeScript, and Elasticsearch.
- HAProxy is used for TCP load balancing and handles HTTP as well, distributing traffic across multiple Rails app servers, Elasticsearch instances, and the EventMachine websocket server.
- Websockets are handled via a TCP connection to the EventMachine server through a separate subdomain, keeping HTTP requests on the main app domain.
- Redis is used for centralized communication and state management via Pub/Sub, with tokens mapping users to channels and event data pushed to connected clients.
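The token-to-channel Pub/Sub pattern in the last bullet can be sketched in-memory; Redis provides the same semantics across processes and servers. The token and channel names below are invented:

```python
from collections import defaultdict

# token -> channel mapping, e.g. established at websocket handshake time
user_channels = {"token-abc": "playlist:42"}

subscribers = defaultdict(list)  # channel -> subscribed tokens
inboxes = defaultdict(list)      # token -> delivered events

def subscribe(user_token):
    # Join the channel associated with this user's token
    channel = user_channels[user_token]
    subscribers[channel].append(user_token)

def publish(channel, event):
    # Deliver the event to every subscriber of the channel
    for token in subscribers[channel]:
        inboxes[token].append(event)

subscribe("token-abc")
publish("playlist:42", {"type": "track_added", "track": "song.mp3"})
```

With Redis, `subscribe`/`publish` become `SUBSCRIBE`/`PUBLISH` commands, so the Rails app servers and the EventMachine websocket server can exchange events without talking to each other directly.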
The molecules of a gas exert constant pressure on the walls of their container due to their continuous motion. The pressure of a gas is defined as the force applied per unit area. According to the kinetic theory of gases, the pressure and temperature of a gas are directly related to the average speed and kinetic energy of its molecules.
Sourabh Vohra is a network security analyst with over 3 years of experience in networking, team management, and customer relationship management in the telecom domain. He is currently working with TNS Telecom UAE under DU Telecom. He has expertise in Cisco, Juniper, Huawei firewalls, IPS devices, DDoS solutions and more. He is seeking a networking role where he can utilize his strong technical, troubleshooting, and customer service skills.
The document summarizes the GE Healthcare Discovery CT750 HD computed tomography (CT) scanner. It highlights the scanner's capabilities including:
- Using Adaptive Statistical Iterative Reconstruction (ASiR) to reduce radiation dose by up to 50% while maintaining image quality.
- Offering the highest cardiac spatial resolution in the industry at 18.2 lp/cm for accurate coronary artery imaging.
- Enabling low dose cardiac imaging below 1 mSv using techniques like SnapShot Pulse and ASiR.
- Its proprietary Gemstone Spectral Imaging which uses dual energy to provide material decomposition and virtual non-contrast imaging, aiding in lesion characterization and reducing metal artifacts.
The document discusses Brazil's failure to legally recognize same-sex parenthood, which it argues is unconstitutional. It contends that legal parent-child bonds must be recognized between children and adolescents and their two same-sex parents, grounded in the best interests of the child and in affection rather than biology.
Indigenous forests are being degraded due to a lack of alternative livelihoods, forcing people to fell trees for low-return charcoal production. Complex regulations and a lack of coordination among government ministries have made conservation difficult. The document presents a conceptual framework for analyzing the charcoal value chain within a landscape context, including production, transport, use and the various actors and factors involved at each stage. It aims to develop a landscape approach for sustainable charcoal management through multi-stakeholder coordination, improved policies, regeneration of woodlands and alternative livelihoods.
Water pollution caused by toxic substances - shenaemhe14
Toxic substances can pollute water through various means such as discharge of industrial and commercial wastewater and waste, release of contaminants in surface runoff from urban and agricultural areas, waste disposal and groundwater leaching, and eutrophication. Other sources of water pollution include agriculture, land clearing, and littering. Some ways to help prevent water pollution are to conserve water, be careful about what is poured down drains, use environmentally friendly household products, limit overuse of pesticides and fertilizers, plant gardens to absorb runoff, and properly dispose of litter.
In this project we study the main trends in the discipline of marketing throughout its history, treated both from a theoretical-didactic perspective and from the more practical side of the discipline. We begin with a conceptual study of the term, so that the text can be followed by any reader, whether or not they are a marketing professional. The research then takes on a more anthropological character as it deepens, contextualizing each stage in the development of the discipline and thereby aiding comprehension. To ease reading, the sections have also been divided into chapters, each covering the trends of the discipline in a different period, so that readers can focus on whichever stage interests them and revisit it later; this also supports my work as a coolhunter in this project. Once we reach the present era, we go deeper into the trends currently setting the course of marketing, also analyzing the sectors that directly influence them. We do not, of course, intend this to be an exhaustive market study: the aim of the project is to understand the development of the discipline through the main trends in its history. This seemed necessary given the literal reading circulating in the media about the imminent death of "traditional" marketing, which many seem to treat as the one and only practice of marketing, going so far as to flatly assert that "marketing is dead."
Well, if you read on, you will be able to verify with documented facts that marketing is not dead. Rather, it has today reached a stratospheric paradigm, becoming a tool for social collaboration, capable even of making this world a better place.
This document provides an overview of Hadoop and big data concepts. It discusses Hadoop core components like HDFS, YARN, MapReduce and how they work. It also covers related technologies like Hive, Pig, Sqoop and Flume. The document discusses common Hadoop configurations, deployment modes, use cases and best practices. It aims to help developers get started with Hadoop and build big data solutions.
Hadoop is a distributed processing framework. It includes components like HDFS for storage, YARN for resource management, and MapReduce for distributed computations. HDFS stores large files across clusters with replication for reliability. YARN separates resource management from job scheduling and supports multiple programming models. MapReduce uses map and reduce functions to process large datasets in parallel. Tez and Hive provide higher-level abstractions over MapReduce. Zookeeper enables coordination between distributed services. Kafka is a distributed messaging system.
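The map and reduce flow described above can be sketched in miniature. This is a hedged, single-process Python sketch of the programming model only, not Hadoop's actual API: a map function emits key/value pairs, the framework shuffles pairs by key, and a reduce function folds each group.

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in doc.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each group of counts into a total.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clusters", "data pipelines"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])   # 2 2
```

In real Hadoop the map and reduce tasks run on different nodes and the shuffle moves data over the network; the data flow, however, is exactly this.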
A digital signature allows one to verify the identity of the sender of a message and that the message content has not been altered. It involves a key generation algorithm that produces a private key and public key pair. The signing algorithm uses the private key to generate a signature for a message. The signature verification algorithm uses the public key to verify the signature and authenticity of the message. Digital signatures provide security as long as the private key remains confidential to the owner.
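The three algorithms named above (key generation, signing, verification) can be illustrated with textbook RSA. This is a deliberately toy, insecure sketch: the tiny primes and the `toy_digest` helper are hypothetical stand-ins chosen for readability, and real systems use vetted cryptographic libraries.

```python
def toy_digest(message, n):
    # Stand-in for a cryptographic hash, reduced mod n (NOT collision resistant).
    value = 0
    for byte in message.encode():
        value = (value * 256 + byte) % n
    return value

def keygen():
    # Toy parameters (INSECURE): real keys use primes of 1024+ bits.
    p, q, e = 61, 53, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: inverse of e
    return (n, e), (n, d)               # (public key, private key)

def sign(private_key, message):
    n, d = private_key
    return pow(toy_digest(message, n), d, n)   # signature = digest^d mod n

def verify(public_key, message, signature):
    n, e = public_key
    return pow(signature, e, n) == toy_digest(message, n)

pub, priv = keygen()
sig = sign(priv, "pay Bob 1")
print(verify(pub, "pay Bob 1", sig))   # True
print(verify(pub, "pay Bob 2", sig))   # False: message content was altered
```

Note how verification needs only the public key, while forging a signature would require the private exponent: exactly the property the summary describes.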
This document provides an overview of HBase, including its architecture and how it compares to relational databases and HDFS. Some key points:
- HBase is a non-relational, distributed, column-oriented database that runs on top of Hadoop. It uses a master-slave architecture with an HMaster and multiple HRegionServers.
- Unlike relational databases, HBase is schema-less, column-oriented, and designed for denormalized data in wide, sparsely populated tables.
- Compared to HDFS, HBase provides low-latency random reads/writes instead of batch processing. Data is accessed via APIs instead of MapReduce.
- HBase uses LSM trees for its write path: writes are buffered in an in-memory store and flushed to immutable sorted files on disk, which are periodically merged by compactions.
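The LSM (log-structured merge) write path can be sketched in a few lines. This is a hedged toy, with a hypothetical `TinyLSM` class; it shows only the core idea (memory buffer, immutable flushed segments, reads checking newest data first), not HBase's actual implementation.

```python
class TinyLSM:
    """Toy LSM sketch: writes hit an in-memory table, which is
    flushed to an immutable sorted 'segment' when it fills up."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []              # flushed immutable segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: persist a sorted, immutable segment (an HFile, in HBase terms).
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then segments newest-to-oldest,
        # so fresh writes shadow older flushed values.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None

db = TinyLSM()
db.put("row1", "a")
db.put("row2", "b")   # second put triggers a flush
db.put("row1", "A")   # newer value shadows the flushed one
print(db.get("row1"), db.get("row2"))   # A b
```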
Elasticsearch is a distributed, open source search and analytics engine. It allows storing and searching of documents of any schema in real-time. Documents are organized into indices which can contain multiple types of documents. Indices are partitioned into shards and replicas to allow horizontal scaling and high availability. The document consists of a JSON object which is indexed and can be queried using a RESTful API.
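The partitioning into shards works because document placement is deterministic: the document's routing value (by default its id) is hashed and taken modulo the number of primary shards. A rough sketch of the principle; Elasticsearch's real hash function differs (this md5 stand-in is an assumption for illustration), which is also why the primary shard count cannot change after index creation.

```python
import hashlib

def shard_for(doc_id, num_primary_shards=5):
    # Deterministic routing: the same id always lands on the same shard.
    # md5 here is a stand-in for Elasticsearch's internal routing hash.
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_primary_shards

placement = {doc_id: shard_for(doc_id) for doc_id in ["1", "2", "3"]}
print(placement)
```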
This document outlines an agenda for a Hadoop workshop covering Cisco's use of Hadoop. The agenda includes introductions, presentations on Hadoop concepts and Cisco's Hadoop architecture, and two hands-on exercises configuring Hadoop and using Hive and Impala for analytics. Key topics to be covered are Hadoop and big data concepts, Cisco's Webex Hadoop architecture using Cisco UCS, and how Hadoop addresses the challenges of large volumes of structured and unstructured data across global data centers.
The document describes shafts and their elements. It explains that shafts are rotating or stationary elements that transmit power, on which devices such as turbomachines, crankshafts and pumps are mounted. Shafts come in different shapes, sizes and vertical or horizontal orientations, and can transmit power ranging from a fraction of a watt to billions of watts. It also covers the different types of rigid and flexible couplings between shafts.
This document discusses the 2016 Top 100 Mid-Sized Companies Survey in Kenya. It provides an introduction to the survey and some key findings. Specifically, it notes that the average revenue growth for companies on the list was 70% and that manufacturing companies made up the largest proportion of firms in the survey. It also recognizes the companies that were ranked in the top 20 positions and highlights some of the services provided by the number 11 ranked company, Polucon Services.
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
The document describes a presentation about rapidly prototyping with Solr. It will demonstrate ingesting documents into Solr, adjusting Solr's schema, and showcasing data in a flexible search UI. The presentation will cover faceting, highlighting, spellchecking, and debugging. Time will also be spent outlining next steps to develop and take the search application to production.
Elasticsearch – much more than search! [JavaZone 2013] (foundsearch)
Search engines can solve far more problems than a search box suggests. You may have a search problem without being aware of it.
Elasticsearch, an open source search engine built on Lucene, is getting more and more attention - not only because it is excellent at solving typical search problems, but also because it can be used for analytics and "big data" challenges.
The talk gives an overview of what search engines are good at, related problems you will run into, how Elasticsearch can help, and how it fits into your technology stack.
It is not a tutorial, but at a fairly brisk pace and with examples of realistic complexity it gives an overview of what is possible.
We round off with how Elasticsearch can be classified among the multitude of "NoSQL" databases.
This document summarizes a presentation about rapid prototyping with Solr. It discusses getting documents indexed into Solr quickly, adjusting Solr's schema to better match needs, and showcasing data in a flexible search UI. It outlines how to leverage faceting, highlighting, spellchecking and debugging in rapid prototyping. Finally, it discusses next steps in developing a search application and taking it to production.
1.) A graph database called Neo4j was created in the 1990s by three guys who had a problem related to language translation. They realized that graphs could model the relationships between concepts. Neo4j became popular for modeling social networks, recommendation systems, and other applications that involve interconnected data.
2.) Neo4j started as an idea, progressed to a prototype, and is now used in production by many companies to model complex relationships in domains like social media and knowledge graphs. It allows modeling data as nodes connected by relationships, and uses the Cypher query language.
3.) Some popular applications built using Neo4j include the digital paper app 53 and customer mapping tools that model how
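The property-graph model behind Neo4j (nodes connected by typed relationships) can be sketched in plain Python. A hedged illustration only: this hypothetical mini graph and `recommend` helper mimic the spirit of a one-hop Cypher recommendation query, not Neo4j's API.

```python
# A tiny property graph: labeled nodes and typed, directed relationships.
nodes = {"alice": {"label": "Person"}, "bob": {"label": "Person"},
         "neo4j": {"label": "Product"}, "cypher": {"label": "Product"}}
rels = [("alice", "KNOWS", "bob"), ("alice", "LIKES", "neo4j"),
        ("bob", "LIKES", "neo4j"), ("bob", "LIKES", "cypher")]

def neighbors(node, rel_type):
    # Roughly: MATCH (n)-[:REL]->(m) RETURN m
    return [dst for src, rel, dst in rels if src == node and rel == rel_type]

def recommend(person):
    # "Things my friends like that I don't yet": a one-hop graph query.
    liked = set(neighbors(person, "LIKES"))
    suggestions = set()
    for friend in neighbors(person, "KNOWS"):
        suggestions |= set(neighbors(friend, "LIKES")) - liked
    return suggestions

print(recommend("alice"))   # {'cypher'}
```

Interconnected-data queries like this are exactly where a graph database shines: the traversal follows relationships directly instead of joining tables.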
Approach to find critical vulnerabilities (Ashish Kunwar)
The document discusses an approach to finding critical vulnerabilities through reconnaissance techniques like port scanning, content discovery, and searching for unprotected assets. It provides 4 examples of vulnerabilities found, including taking over an unauthenticated Elastic search, leaking Kibana credentials, exploiting SSRF to achieve remote code execution via Ghostscript, and cracking an IKE hash to access a vulnerable VPN. The presentation aims to demonstrate methods for vulnerability research and responsible disclosure of issues found.
ElasticSearch is a flexible and powerful open source, distributed real-time search and analytics engine for the cloud. It is JSON-oriented, uses a RESTful API, and has a schema-free design. Logstash is a tool for collecting, parsing, and storing logs and events in ElasticSearch for later use and analysis. It has many input, filter, and output plugins to collect data from various sources, parse it, and send it to destinations like ElasticSearch. Kibana works with ElasticSearch to visualize and explore stored logs and data.
Mastering ElasticSearch with Ruby and Tire (Luca Bonmassar)
A tutorial on what is ElasticSearch and how to use it effectively in a real project.
The talk discusses how to integrate a search experience into an existing application, showing all the steps from downloading & configuring Elasticsearch to building the UI and wiring up the search logic (in a Rails application).
The talk was presented at RubyConf 2013.
The document discusses ElasticSearch, an open source search engine and database. It describes how ElasticSearch allows data to flow from various sources into an index using Rivers. It also explains key ElasticSearch concepts like shards, replicas, and index aliases that improve scalability and performance. The document provides examples of ElasticSearch REST API calls for indexing, searching, and retrieving documents.
Sprockets is an easy solution to managing large JavaScript codebases by letting you structure it, bundle it with related assets, and consolidate it as one single file, with pre-baked command-line tooling, CGI front and Rails plugin. It's a framework-agnostic open-source solution that makes for great serving performance while helping you structure and manage your codebase better.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.
Null Bachaav - May 07 Attack Monitoring workshop (Prajal Kulkarni)
This document provides an overview and instructions for setting up the ELK stack (Elasticsearch, Logstash, Kibana) for attack monitoring. It discusses the components, architecture, and configuration of ELK. It also covers installing and configuring Filebeat for centralized logging, using Kibana dashboards for visualization, and integrating osquery for internal alerting and attack monitoring.
This document provides a summary of Elasticsearch by Tom Chen. It discusses that Elasticsearch is a powerful open source search and analytics engine that is distributed, scalable and real-time. It can be used for storing, searching and analyzing large volumes of data. The document then highlights some of Elasticsearch's key features, including its powerful search capabilities using Lucene queries, and aggregations that allow faceted searches and results. Code examples are provided to demonstrate indexing data and running searches and aggregations. Finally, the document mentions a code example on GitHub that uses Elasticsearch to build a search function for a WordPress site.
Terrastore - A document database for developers (Sergio Bossa)
Sergio Bossa is a software architect and engineer who has worked on online gambling and casino software. He is an open source enthusiast who has contributed to projects like Spring, Terracotta, and Terrastore. Terrastore is a document database for developers that is document-based, consistent, distributed, scalable, and written in Java using Terracotta. It allows for easy installation, no complex configuration, and simple basic operations like putting and getting documents from buckets. It also supports features like range queries, predicate queries, server-side updates, and easy scalability. Terrastore is best suited for data hot spots, computational data, complex or variable data, and throw-away data.
Delivered at Velocity Europe in Barcelona, this talk introduces "ops" people to the idea of user centered design, touching on several techniques long used in the design world, and talks about how those ideas might be applied to software and processes that we use every day.
Why and How Powershell will rule the Command Line - Barcamp LA 4 (Ilya Haykinson)
PowerShell is a command shell for Windows in which commands are objects that interact through pipes. It provides a fully-fledged programming language where commands manipulate objects and share a common naming convention. PowerShell holds that commands should do one thing well and interact through a consistent environment, addressing the fragile text parsing between traditional command line programs.
This document provides an overview of Elasticsearch including:
- Elasticsearch is a distributed, real-time search and analytics engine. It allows storing, searching, and analyzing big volumes of data in near real-time.
- Documents are stored in indexes which can be queried using a RESTful API or with query languages like the Query DSL.
- CRUD operations allow indexing, retrieving, updating, and deleting documents. More operations can be performed efficiently using the bulk API.
- Documents are analyzed and indexed to support full-text search queries and structured queries against specific fields. Mappings and analyzers define how text is processed for searching.
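The bulk API mentioned above takes a newline-delimited JSON body: one action line, then the document source, per operation, with a trailing newline. A hedged sketch of building such a body (the `bulk_body` helper is hypothetical; the `_type` field in the action line matches the pre-1.x API era this overview describes):

```python
import json

def bulk_body(index, doc_type, docs):
    # Build the NDJSON payload for POST /_bulk:
    # an action/metadata line followed by the source for each document.
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the body must end with a newline

body = bulk_body("workshop", "site", {
    "1": {"url": "http://www.elasticsearch.org"},
    "2": {"url": "http://www.mathieu-elie.net"},
})
print(body)
```

Batching many operations into one request like this is what makes bulk indexing far more efficient than one HTTP call per document.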
Elasticsearch is recommended to create an archive to search ACM/BPM case and process data that is up to 7 years old. Elasticsearch allows storing and searching large volumes of data quickly and in near real-time. It was tested by uploading over 40,000 documents from a use case involving tweets. This allowed full-text search of case data and searching within office documents. While Elasticsearch is schema-less and easy to evolve with Oracle releases, its limitations regarding transactions and an overview of case history would need to be considered.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin... (Tatiana Kojar)
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also approaches that can lead to unnecessary spending, for example when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
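The closed-addressing property that lets deletes free slots instantly (no tombstones, unlike open addressing) is easy to see in a minimal chained hashtable. A hedged, single-threaded Python sketch of the data-structure idea only; DLHT itself is lock-free, cache-line-aware and prefetched, none of which this toy attempts.

```python
class ChainedHashTable:
    """Minimal closed-addressing hashtable: each bucket holds a chain
    of (key, value) pairs, so a delete removes its entry outright."""
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite existing key
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # The slot is freed immediately; no tombstone is left behind.
        i = self._index(key)
        self.buckets[i] = [(k, v) for k, v in self.buckets[i] if k != key]

table = ChainedHashTable()
table.put("a", 1)
table.put("b", 2)
table.delete("a")
print(table.get("a"), table.get("b"))   # None 2
```

In open addressing, by contrast, deleting "a" would have to leave a marker so later probes for colliding keys still terminate correctly, which is one of the blocking cases the talk calls out.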
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
A Comprehensive Guide to DeFi Development Services in 2024 (Intelisync)
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3 (Data Hops)
Free A4 downloadable and printable cyber security and social engineering safety training posters. Promote security awareness in the home or workplace. Lock them out. From training providers at datahops.com.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
2. speaker : @mathieuel
• freelance & founder @oneplaylist
• full stack skills
• see what i’ve done on http://www.mathieuelie.net
mardi 17 décembre 13
3. goal
• go through the first steps
• get over the first frustration
• give you the power needed to learn by yourself
4. install
• be sure you have a Java runtime
• apt-get install openjdk-6-jre-headless -y
• consider the Oracle JVM
5. unzip and run !
## Get the latest stable archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.zip
## Extract the archive
unzip elasticsearch-0.90.7.zip
cd elasticsearch-0.90.7
## run !
# This runs elasticsearch in the foreground.
./bin/elasticsearch -f
6. it's alive !
[2013-12-13 15:45:25,187][INFO ][node            ] [Bridge, George Washington] version[0.90.7], pid[37998], build[36897d0/2013-11-13T12:06:54Z]
[2013-12-13 15:45:25,189][INFO ][node            ] [Bridge, George Washington] initializing ...
[2013-12-13 15:45:25,202][INFO ][plugins         ] [Bridge, George Washington] loaded [], sites []
[2013-12-13 15:45:28,342][INFO ][node            ] [Bridge, George Washington] initialized
[2013-12-13 15:45:28,342][INFO ][node            ] [Bridge, George Washington] starting ...
[2013-12-13 15:45:28,491][INFO ][transport       ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.12:9300]}
[2013-12-13 15:45:31,545][INFO ][cluster.service ] [Bridge, George Washington] new_master [Bridge, George Washington][pKCdh1b_TP2TlurO1gm4_g][inet[/192.168.1.12:9300]], reason: zen-disco-join (elected_as_master)
[2013-12-13 15:45:31,577][INFO ][discovery       ] [Bridge, George Washington] elasticsearch/pKCdh1b_TP2TlurO1gm4_g
[2013-12-13 15:45:31,595][INFO ][http            ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.12:9200]}
[2013-12-13 15:45:31,596][INFO ][node            ] [Bridge, George Washington] started
[2013-12-13 15:45:31,629][INFO ][gateway         ] [Bridge, George Washington] recovered [0] indices into cluster_state
7. ping es on port 9200
curl http://127.0.0.1:9200
{
"ok" : true,
"status" : 200,
"name" : "Gideon, Gregory",
"version" : {
"number" : "0.90.6",
"build_hash" : "e2a24efdde0cb7cc1b2071ffbbd1fd874a6d8d6b",
"build_timestamp" : "2013-11-04T13:44:16Z",
"build_snapshot" : false,
"lucene_version" : "4.5.1"
},
"tagline" : "You Know, for Search"
}%
8. Store a Document
curl -XPUT http://localhost:9200/workshop/site/1 -d '
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}'
{"ok":true,"_index":"workshop","_type":"site","_id":"1","_version":1}%
9. retrieve the document
curl -XGET http://localhost:9200/workshop/site/1
{"_index":"workshop","_type":"site","_id":"1","_version":2,"exists":true,
"_source" :
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}}%
10. add more documents
curl -XPUT http://localhost:9200/workshop/site/2 -d '
{
  "url": "http://www.mathieu-elie.net",
  "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization",
  "description": "Freelance Consultant in Bordeaux, System & Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.",
  "tags": ["elasticsearch", "Data Visualization"]
}'
curl -XPUT http://localhost:9200/workshop/site/3 -d '
{
  "url": "http://www.giroll.org",
  "title": "Collectif Giroll - Gironde Logiciels Libres",
  "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'\''Install Party Linux tous les six",
  "tags": ["Open Source", "Collectif"]
}'
12. curl 'http://localhost:9200/workshop/_search?pretty=true'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.elasticsearch.org",
        "title": "Open Source Distributed Real Time Search & Analytics",
        "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
        "tags": ["Open Source", "elasticsearch", "Distributed"]
      }
    }, {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.giroll.org",
        "title": "Collectif Giroll - Gironde Logiciels Libres",
        "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'Install Party Linux tous les six",
13. OK great, but now I
want to search for
text!
14. step 1: pass the query as a
request body
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d
'{
"query" : {
"match_all" : { }
}
}'
15. It returns all documents
because we used the match_all query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html
16. The match_all query is part of the query DSL
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html
17. So let's use the
query_string query DSL
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
"query" : {
"query_string" : {
"query" : "elasticsearch"
}
}
}'
18. The result is a bit
verbose; let's fetch only the
title and tags fields
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
"fields" : ["title", "tags"],
"query" : {
"query_string" : {
"query" : "elasticsearch"
}
}
}'
20. Let's go for facets on tags!
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html
do you see the wall ??? ;)
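As a sketch of what the facets page describes (using the 0.90-era facets API that this deck is based on), a terms facet over the tags field could be requested with a body like:

```json
{
  "query": { "match_all": {} },
  "facets": {
    "tags": {
      "terms": { "field": "tags" }
    }
  }
}
```

This is exactly where the wall appears: with the default string analysis the facet buckets come back as the analyzed tokens, not the original tag values.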
23. • Hey! Look at "Open Source":
it has been lowercased
and split into multiple tokens!
• this is done by the default mapping and
analyzer
25. • tags is of type string and gets the default
analyzer
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
• An analyzer of type standard is built using
the Standard Tokenizer with the Standard
Token Filter, Lower Case Token Filter, and
Stop Token Filter.
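The effect of the standard tokenizer plus the lowercase filter can be roughly imitated with plain shell tools (an illustration only, not the Lucene implementation; the stop filter is omitted here):

```shell
# Approximate the standard analyzer on a tag value:
# split on runs of non-alphanumerics (tokenizer), then lowercase (filter).
echo "Open Source" | tr -cs 'A-Za-z0-9' '\n' | tr 'A-Z' 'a-z'
# open
# source
```

This mirrors why a tag like "Open Source" ends up as the two tokens open and source in the index.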
27. • What about the keyword analyzer?
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html
29. curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '
{
"site" : {
"properties" : {
"url" : {"type" : "string"},
"title" : {"type" : "string"},
"description" : {"type" : "string"},
"tags" : {"type" : "string", "analyzer": "keyword" }
}
}
}
'
{
"error" : "MergeMappingException[Merge failed with failures {[mapper
[tags] has different index_analyzer]}]",
"status" : 400
}
oops! we need to drop something...
30. curl -XDELETE 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}%
# the index must exist before we can put a mapping
curl -XPUT 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}%
curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '
{
"site" : {
"properties" : {
"url" : {"type" : "string"},
"title" : {"type" : "string"},
"description" : {"type" : "string"},
"tags" : {"type" : "string", "analyzer": "keyword" }
}
}
}
'
{"ok":true,"acknowledged":true}%
# test the analysis on the field
curl -XGET 'localhost:9200/workshop/_analyze?
pretty=true&field=site.tags' -d 'Open Source'
{
"tokens" : [ {
"token" : "Open Source",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 1
} ]
}
# congrats !
# let's push the data again
curl -XPUT http://localhost:9200/workshop/site/1 -d '
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}'
curl -XPUT http://localhost:9200/workshop/site/2 -d '
{
"url": "http://www.mathieu-elie.net",
"title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data
Visualization",
"description": "Freelance Consultant in Bordeaux, System & Software
Architect. Love dataviz, redis, elasticsearch, architecture scalability
recipes and playing with data.",
"tags": ["elasticsearch", "Data Visualization"]
}'
curl -XPUT http://localhost:9200/workshop/site/3 -d '
{
"url": "http://www.giroll.org",
"title": "Collectif Giroll - Gironde Logiciels Libres",
"description": "Giroll, collectif basÎ È Bordeaux, rÎunis autour
des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 √
mardi 17 décembre 13
35. If we want only the docs with the "Open Source" tag,
we use filters
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
and the term filter
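Following that page, a filtered query with a term filter might look like this (a sketch in the 0.90-era filtered-query syntax):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "tags": "Open Source" }
      }
    }
  }
}
```

Because tags now uses the keyword analyzer, the term filter can match the exact value "Open Source".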
45. RTFM WAY
• common mistake: the code examples in the docs
don't always show the whole query
• so you should put the snippet from the doc
back into the full DSL hierarchy
• think in terms of that hierarchy and everything
becomes much clearer
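For example, a reference page may only show the inner object, such as `"query_string" : { "query" : "elasticsearch" }`; in an actual request body it has to sit in its place in the hierarchy:

```json
{
  "query": {
    "query_string": { "query": "elasticsearch" }
  }
}
```

The outer "query" wrapper comes from the hierarchy, not from the snippet itself.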
46. the end for me...
the beginning for you...
47. questions and more
• twitter @mathieuel
• contact me via my freelance website
• http://www.mathieu-elie.net
• thanks to Giroll for hosting this workshop!