elasticsearch basics workshop

  • 1,473 views
Uploaded on

Quick install of elasticsearch, put documents, request, set a mapping and prepare yourself to read the doc !

Quick install of elasticsearch, put documents, request, set a mapping and prepare yourself to read the doc !

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
1,473
On Slideshare
0
From Embeds
0
Number of Embeds
2

Actions

Shares
Downloads
45
Comments
0
Likes
2

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. elasticsearch basics workshop mathieu Elie at giroll mardi 17 décembre 13
  • 2. speaker : @mathieuel • freelance & founder @oneplaylist • full stack skills • see what i’ve done on http://www.mathieuelie.net mardi 17 décembre 13
  • 3. goal • go from first steps • and get over first frustation • give the you the power needed to learn by yourself mardi 17 décembre 13
  • 4. install • be sure you have java runtime • apt-get install openjdk-6-jre-headless -y • consider oracle jvm mardi 17 décembre 13
  • 5. unzip and run ! ## Get the latest stable archive wget https://download.elasticsearch.org/elasticsearch/ elasticsearch/elasticsearch-0.90.7.zip ## Extract the archive unzip elasticsearch-0.90.7.zip cd elasticsearch-0.90.7 ## run ! # This will run elasticsearch on foreground. ./bin/elasticsearch -f mardi 17 décembre 13
  • 6. its alive ! [2013-12-13 15:45:25,187][INFO ][node ] [Bridge, George Washington] version[0.90.7], pid[37998], build[36897d0/2013-11-13T12:06:54Z] [2013-12-13 15:45:25,189][INFO ][node ] [Bridge, George Washington] initializing ... [2013-12-13 15:45:25,202][INFO ][plugins ] [Bridge, George Washington] loaded [], sites [] [2013-12-13 15:45:28,342][INFO ][node ] [Bridge, George Washington] initialized [2013-12-13 15:45:28,342][INFO ][node ] [Bridge, George Washington] starting ... [2013-12-13 15:45:28,491][INFO ][transport ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.12:9300]} [2013-12-13 15:45:31,545][INFO ][cluster.service ] [Bridge, George Washington] new_master [Bridge, George Washington][pKCdh1b_TP2TlurO1gm4_g][inet[/192.168.1.12:9300]], reason: zen-disco-join (elected_as_master) [2013-12-13 15:45:31,577][INFO ][discovery ] [Bridge, George Washington] elasticsearch/pKCdh1b_TP2TlurO1gm4_g [2013-12-13 15:45:31,595][INFO ][http ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.12:9200]} [2013-12-13 15:45:31,596][INFO ][node ] [Bridge, George Washington] started [2013-12-13 15:45:31,629][INFO ][gateway ] [Bridge, George Washington] recovered [0] indices into cluster_state mardi 17 décembre 13
  • 7. ping es on port 9200 curl http://127.0.0.1:9200 { "ok" : true, "status" : 200, "name" : "Gideon, Gregory", "version" : { "number" : "0.90.6", "build_hash" : "e2a24efdde0cb7cc1b2071ffbbd1fd874a6d8d6b", "build_timestamp" : "2013-11-04T13:44:16Z", "build_snapshot" : false, "lucene_version" : "4.5.1" }, "tagline" : "You Know, for Search" }% mardi 17 décembre 13
  • 8. Store a Document curl -XPUT http://localhost:9200/workshop/site/1 -d ' { "url": "http://www.elasticsearch.org", "title": "Open Source Distributed Real Time Search & Analytics", "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.", "tags": ["Open Source", "elasticsearch", "Distributed"] }' {"ok":true,"_index":"workshop","_type":"sites","_id":"1","_version":1}% mardi 17 décembre 13
  • 9. retreive the document curl -XGET http://localhost:9200/workshop/site/1 {"_index":"workshop","_type":"site","_id":"1","_version":2,"exists":true, "_source" : { "url": "http://www.elasticsearch.org", "title": "Open Source Distributed Real Time Search & Analytics", "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.", "tags": ["Open Source", "elasticsearch", "Distributed"] }}% mardi 17 décembre 13
  • 10. add more documents curl -XPUT http://localhost:9200/workshop/site/2 -d ' { "url": "http://www.mathieu-elie.net", "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization", "description": "Freelance Consultant in Bordeaux, System & Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.", tags: ["elasticsearch", "Data Visualization"] }' curl -XPUT http://localhost:9200/workshop/site/3 -d ' { "url": "http://www.giroll.org", "title": "Collectif Giroll - Gironde Logiciels Libres", "description": "Giroll, collectif bas√é √à Bordeaux, r√éunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 √à 20h30 et organisation d''Install Party Linux tous les six", tags: ["Open Source", "Collectif"] }' mardi 17 décembre 13
  • 11. now search ! mardi 17 décembre 13
  • 12. curl 'http://localhost:9200/workshop/_search?pretty=true' { "took" : 1, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 3, "max_score" : 1.0, "hits" : [ { "_index" : "workshop", "_type" : "site", "_id" : "1", "_score" : 1.0, "_source" : { "url": "http://www.elasticsearch.org", "title": "Open Source Distributed Real Time Search & Analytics", "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.", "tags": ["Open Source", "elasticsearch", "Distributed"] } }, { "_index" : "workshop", "_type" : "site", "_id" : "3", "_score" : 1.0, "_source" : { "url": "http://www.giroll.org", "title": "Collectif Giroll - Gironde Logiciels Libres", "description": "Giroll, collectif bas√é √à Bordeaux, r√éunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 √à 20h30 et organisation mardi 17 décembre 13 dInstall Party Linux tous les six",
  • 13. ok great, but now i want to search for text ! mardi 17 décembre 13
  • 14. step 1 : pass query as a request body curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{ "query" : { "match_all" : { } } }' mardi 17 décembre 13
  • 15. It returns all documents because we use the match all query http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl-match-all-query.html mardi 17 décembre 13
  • 16. match_all query is part of the queries dsl http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl-queries.html mardi 17 décembre 13
  • 17. so lets use the query_string query dsl curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{ "query" : { "query_string" : { "query" : "elasticsearch" } } }' mardi 17 décembre 13
  • 18. result is a a quiet verbose lets get only title and tags fields curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } } }' mardi 17 décembre 13
  • 19. { "took" : 6, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 2, "max_score" : 0.081366636, "hits" : [ { "_index" : "workshop", "_type" : "site", "_id" : "1", "_score" : 0.081366636, "fields" : { "tags" : [ "Open Source", "elasticsearch", "Distributed" ], "title" : "Open Source Distributed Real Time Search & Analytics" } }, { "_index" : "workshop", "_type" : "site", "_id" : "2", "_score" : 0.06780553, "fields" : { "tags" : [ "elasticsearch", "Data Visualization" ], "title" : "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization" } mardi 17 décembre 13
  • 20. lets go for facets on tags !! http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/search-facets.html do you see the wall ??? ;) mardi 17 décembre 13
  • 21. Facets dsl curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 22. "facets" : { "tags" : { "_type" : "terms", "missing" : 0, "total" : 7, "other" : 0, "terms" : [ { "term" : "elasticsearch", "count" : 2 }, { "term" : "visualization", "count" : 1 }, { "term" : "source", "count" : 1 }, { "term" : "open", "count" : 1 }, { "term" : "distributed", "count" : 1 }, { "term" : "data", "count" : 1 } ] } } mardi 17 décembre 13 ho no!!
  • 23. • hey ! see "Open Source" ! it is lower cased and exploded in multiple tokens ! • this is done by the defautl mapping and analyzer mardi 17 décembre 13
  • 24. curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' { "site" : { "properties" : { "description" : { "type" : "string" }, "tags" : { "type" : "string" }, "title" : { "type" : "string" }, "url" : { "type" : "string" } } } } mardi 17 décembre 13
  • 25. • tags is a type of string and we have a default analyzer • http://www.elasticsearch.org/guide/en/ elasticsearch/reference/current/analysisstandard-analyzer.html • An analyzer of type standard is built using the Standard Tokenizer with the Standard Token Filter, Lower Case Token Filter, and Stop Token Filter. mardi 17 décembre 13
  • 26. test the default analyzer curl -XGET 'localhost:9200/workshop/_analyze?pretty=true' -d 'Open Source' { "tokens" : [ { "token" : "open", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 1 }, { "token" : "source", "start_offset" : 5, "end_offset" : 11, "type" : "<ALPHANUM>", "position" : 2 } ] } mardi 17 décembre 13
  • 27. • what about keyword analyzer ? • http://www.elasticsearch.org/guide/en/ elasticsearch/reference/current/analysiskeyword-analyzer.html mardi 17 décembre 13
  • 28. curl -XGET 'localhost:9200/workshop/_analyze? analyzer=keyword&pretty=true' -d 'Open Source' { "tokens" : [ { "token" : "Open Source", "start_offset" : 0, "end_offset" : 11, "type" : "word", "position" : 1 } ] } got it ! now how to apply this to our tags field ? mardi 17 décembre 13
  • 29. curl { 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d ' "site" : { "properties" : { "url" : {"type" : "string"}, "title" : {"type" : "string"}, "description" : {"type" : "string"}, "tags" : {"type" : "string", "analyzer": "keyword" } } } } ' { "error" : "MergeMappingException[Merge failed with failures {[mapper [tags] has different index_analyzer]}]", "status" : 400 } oops ! we need to drop something.. mardi 17 décembre 13
  • 30. curl -XDELETE 'http://localhost:9200/workshop/' {"ok":true,"acknowledged":true}% # index should exists if we want to put mapping.. curl -XPUT 'http://localhost:9200/workshop/' {"ok":true,"acknowledged":true}% curl { 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d ' "site" : { "properties" : { "url" : {"type" : "string"}, "title" : {"type" : "string"}, "description" : {"type" : "string"}, "tags" : {"type" : "string", "analyzer": "keyword" } } } } ' {"ok":true,"acknowledged":true}% mardi 17 décembre 13
  • 31. # test on the field analysis curl -XGET 'localhost:9200/workshop/_analyze? pretty=true&field=site.tags' -d 'Open Source' { "tokens" : [ { "token" : "Open Source", "start_offset" : 0, "end_offset" : 11, "type" : "word", "position" : 1 } ] } # congrats ! mardi 17 décembre 13
  • 32. # lets push data again curl -XPUT http://localhost:9200/workshop/site/1 -d ' { "url": "http://www.elasticsearch.org", "title": "Open Source Distributed Real Time Search & Analytics", "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.", "tags": ["Open Source", "elasticsearch", "Distributed"] }' curl -XPUT http://localhost:9200/workshop/site/2 -d ' { "url": "http://www.mathieu-elie.net", "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization", "description": "Freelance Consultant in Bordeaux, System &amp; Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.", tags: ["elasticsearch", "Data Visualization"] }' curl -XPUT http://localhost:9200/workshop/site/3 -d ' { "url": "http://www.giroll.org", "title": "Collectif Giroll - Gironde Logiciels Libres", "description": "Giroll, collectif bas√é √à Bordeaux, r√éunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 √ mardi 17 décembre 13
  • 33. # faceting ok ??? curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 34. "facets" : { "tags" : { "_type" : "terms", "missing" : 0, "total" : 5, "other" : 0, "terms" : [ { "term" : "elasticsearch", "count" : 2 }, { "term" : "Open Source", "count" : 1 }, { "term" : "Distributed", "count" : 1 }, { "term" : "Data Visualization", "count" : 1 } ] } } cool ! our facets contains whole tags ! great jobs !! mardi 17 décembre 13
  • 35. if want only docs with "Open Source" tag we use filters http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl-filters.html and term filter mardi 17 décembre 13
  • 36. curl -XGET 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "query" : { "match_all" : { } }, "filter" : { "term" : { "tags" : "Open Source"} } }' • more efficient than full text search • cached / indexed • you can filter using facet items mardi 17 décembre 13
  • 37. RTFM WAY • elasticsearch doc is great • but it is exhaustive • so at the beguining its a bit frustrating mardi 17 décembre 13
  • 38. Think about json hierachy curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 39. your hitting the search api http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/search-search.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 40. your using the query dsl http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 41. your using different types of queries http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl-queries.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 42. this query is a query_string type with a query parameter set to elasticsearch http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/query-dsl-query-string-query.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 43. we also use faceting http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/search-facets.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 44. we use a terms facet http://www.elasticsearch.org/guide/en/elasticsearch/ reference/current/search-facets-terms-facet.html curl -XPOST 'http://localhost:9200/workshop/site/_search? pretty=true' -d '{ "fields" : ["title", "tags"], "query" : { "query_string" : { "query" : "elasticsearch" } }, "facets" : { "tags" : { "terms" : {"field" : "tags"} } } }' mardi 17 décembre 13
  • 45. RTFM WAY • common mistake: the code example are not showing always whole query • so you should replace the code in the doc in the whole dsl hierarchy • think about hierarchy and everything should be more clear mardi 17 décembre 13
  • 46. the end for me... the begguining for you... mardi 17 décembre 13
  • 47. questions and more • twitter @mathieuel • contact on my freelance website • http://www.mathieu-elie.net • thanks to giroll for hosting this workshop ! mardi 17 décembre 13