Elasticsearch - Devoxx France 2012 - English version

Elasticsearch presentation for Devoxx France 2012
English translation (feel free to correct my bad English ;-) )
The French version is available here: http://www.slideshare.net/dadoonet/elasticsearch-devoxx-france-2012

  1. Elasticsearch: a search engine designed for the cloud, by David Pilato (@dadoonet and @elasticsearchfr)
  2. { "speaker" : "David Pilato" }
      $ curl http://localhost:9200/devoxx/speaker/dpilato
      {
        "name" : "David Pilato",
        "jobs" : [
          { "company" : "SRA Europe (IT services company)", "mission" : "odd-job man", "duration" : 3 },
          { "company" : "SFR", "mission" : "jack of all trades", "duration" : 3 },
          { "company" : "e-Brands / Vivendi", "mission" : "project manager", "duration" : 4 },
          { "company" : "DGDDI (customs)", "mission" : "five-legged sheep (a rare all-rounder)", "duration" : 7 }
        ],
        "passions" : [ "family", "job", "deejay" ],
        "blog" : "http://dev.david.pilato.fr/",
        "twitter" : [ "@dadoonet", "@elasticsearchfr" ],
        "email" : "david@pilato.fr"
      }
  3. Abstract
      • Do we need a search engine?
      • Elasticsearch: a complete, simple and high-performance solution
      • What about indexing Twitter? Make some noise on @DevoxxFR with the #elasticsearch hashtag!
  4. DO WE NEED A SEARCH ENGINE?
      A search engine? What for?
  5. Usual use case with "old school" SQL
      A document persisted in a database (tables: doc with date, country and comment; country with code and label):
      • date attribute: 19/04/2012
      • coded country attribute: FR
      • code/label association table:
        • Code: FR
        • Label: France
      • comment attribute: "There is a typing error in the comment for this product. We should call David."
      Engine Elasticsearch Rivers Facets Demo Architecture Community
  6. Usual need with "old school" SQL
      • Find a document from December 2011 about France containing "error" and "david"
      • SQL (pays = country, libelle = label, commentaire = comment):
        SELECT doc.*, pays.*
        FROM doc, pays
        WHERE doc.pays_code = pays.code
          AND doc.date_doc >= to_date('2011-12', 'yyyy-mm')
          AND doc.date_doc < to_date('2012-01', 'yyyy-mm')
          AND lower(pays.libelle) = 'france'
          AND lower(doc.commentaire) LIKE '%error%'
          AND lower(doc.commentaire) LIKE '%david%';
  7. Performance impact of a leading wildcard: LIKE '%...'
      See also: http://www.cestpasdur.com/2012/04/01/elasticsearch-vs-mysql-recherche
  8. What is a search engine?
      • A search engine is:
        • an engine that indexes documents
        • an engine that searches those indexes
      • A search engine is better at searching: it's designed for it!
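To make the contrast with `LIKE '%error%'` concrete, here is a minimal Python sketch of an inverted index, the core structure behind Lucene: each term maps to the set of documents containing it, so a search becomes a few dictionary lookups and a set intersection instead of a scan of every comment. The tokenizer (lower-casing, splitting on non-word characters) and the sample documents are assumptions for illustration; Lucene's real analyzers and postings lists do much more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Naive analyzer (assumption): lower-case, then split on non-word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Inverted index: term -> ids of the documents containing that term.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[int]], *terms: str) -> set[int]:
    # AND semantics: intersect the posting sets of every query term.
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "There is a typing error in the comment for this product. We should call David.",
    2: "Great product, no error at all.",
    3: "Ask David about the delivery date.",
}
index = build_index(docs)
print(search_all(index, "error", "david"))  # {1}
```

The cost of building the index is paid once, at indexing time; every later query only touches the entries for its own terms.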
  9. ELASTICSEARCH
      Your Data, your Search!
  10. Elasticsearch
      • A search engine for the NoSQL generation
      • Based on the standard Apache Lucene library
      • Hides the Java / Lucene complexity behind standard HTTP / RESTful / JSON services
      • Usable from any language or platform
      • Adds the cloud layer that Lucene lacks
      • It's an engine, not a graphical user interface!
  11. Key points
      • Easy! In a few minutes (zero configuration), you get a full search engine ready to receive your documents and run your searches.
      • Efficient! Just start new Elasticsearch nodes to scale horizontally, with replication and load balancing.
      • Powerful! Built on Lucene, with parallel processing to keep response times acceptable (typically under 100 ms).
      • Complete! Many features: analysis and facets, percolation, rivers, plugins, …
  12. Storing your data
      • Document: a complete object holding all your data (in the NoSQL sense). To think "search", forget the RDBMS and think "documents".
        A tweet (the text is French for "Welcome to the #elasticsearch talk at #devoxxfr"):
        {
          "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
          "created_at": "2012-04-06T20:45:36.000Z",
          "source": "Twitter for iPad",
          "truncated": false,
          "retweet_count": 0,
          "hashtag": [
            { "text": "elasticsearch", "start": 27, "end": 40 },
            { "text": "devoxxfr", "start": 47, "end": 55 }
          ],
          "user": {
            "id": 51172224,
            "name": "David Pilato",
            "screen_name": "dadoonet",
            "location": "France",
            "description": "Soft Architect, Project Manager, Senior Developer.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
          }
        }
      • Type: groups all documents of the same type
      • Index: logical storage for related document types
  13. Playing with Elasticsearch
      REST API: http://host:port/[index]/[type]/[_action/id]
      HTTP methods: GET, POST, PUT, DELETE
      Documents
      • curl -XPUT http://localhost:9200/twitter/tweet/1
      • curl -XGET http://localhost:9200/twitter/tweet/1
      • curl -XDELETE http://localhost:9200/twitter/tweet/1
      Search
      • curl -XGET http://localhost:9200/twitter/tweet/_search
      • curl -XGET http://localhost:9200/twitter/_search
      • curl -XGET http://localhost:9200/_search
      Elasticsearch metadata
      • curl -XGET http://localhost:9200/twitter/_status
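The URL pattern of this slide fits in a few lines of code. This Python sketch (the `es_url` helper is hypothetical, not part of any client library) builds the `http://host:port/[index]/[type]/[_action/id]` URLs listed above, percent-encoding each path segment so that an id containing spaces or slashes stays a single segment.

```python
from urllib.parse import quote


def es_url(index=None, doc_type=None, tail=None, host="localhost", port=9200):
    """Build http://host:port/[index]/[type]/[_action/id]; omitted parts are skipped."""
    segments = [quote(str(part), safe="") for part in (index, doc_type, tail) if part is not None]
    return f"http://{host}:{port}/" + "/".join(segments)


# The slide's examples: one document, search at three scopes, index metadata.
print(es_url("twitter", "tweet", 1))          # document with id 1
print(es_url("twitter", "tweet", "_search"))  # search within one type
print(es_url("twitter", tail="_search"))      # search within one index
print(es_url(tail="_search"))                 # search across all indexes
print(es_url("twitter", tail="_status"))      # index metadata
```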
  14. Let's index a document
      $ curl -XPUT localhost:9200/twitter/tweet/1 -d '{
          "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
          "created_at": "2012-04-06T20:45:36.000Z",
          "source": "Twitter for iPad",
          "truncated": false,
          "retweet_count": 0,
          "hashtag": [
            { "text": "elasticsearch", "start": 27, "end": 40 },
            { "text": "devoxxfr", "start": 47, "end": 55 }
          ],
          "user": {
            "id": 51172224,
            "name": "David Pilato",
            "screen_name": "dadoonet",
            "location": "France",
            "description": "Soft Architect, Project Manager, Senior Developer.\r\nAt this time, enjoying NoSQL world : CouchDB, ElasticSearch.\r\nDeeJay 4 times a year, just for fun !"
          }
        }'
      Response:
      { "ok": true, "_index": "twitter", "_type": "tweet", "_id": "1" }
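Because the API is plain HTTP + JSON, any language can do what `curl` does here. This minimal Python sketch uses only the standard library: it builds (but does not send) the same `PUT` request, then parses the acknowledgement shown on the slide. The `index_request` helper is hypothetical; the `Content-Type` header is harmless for the server version on the slides and is required by recent Elasticsearch versions.

```python
import json
from urllib.request import Request


def index_request(doc, index, doc_type, doc_id, base="http://localhost:9200"):
    """Build the PUT /index/type/id request that indexes `doc` (nothing is sent)."""
    body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
    return Request(
        f"{base}/{index}/{doc_type}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


tweet = {
    "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
    "created_at": "2012-04-06T20:45:36.000Z",
    "source": "Twitter for iPad",
}
req = index_request(tweet, "twitter", "tweet", "1")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/twitter/tweet/1
# urllib.request.urlopen(req) would send it to a running node; it is not called here.

# Parsing the acknowledgement returned on the slide:
ack = json.loads('{ "ok":true, "_index":"twitter", "_type":"tweet", "_id":"1" }')
print(ack["ok"], ack["_id"])  # True 1
```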
  15. Let's search for documents
      $ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch"
      {
        "took" : 24,
        "timed_out" : false,
        "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 },
        "hits" : {
          "total" : 1,
          "max_score" : 0.227,
          "hits" : [ {
            "_index" : "twitter",
            "_type" : "tweet",
            "_id" : "1",
            "_score" : 0.227,
            "_source" : {
              "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
              "created_at": "2012-04-06T20:45:36.000Z",
              "source": "Twitter for iPad",
              […]
            }
          } ]
        }
      }
      Callouts: "hits.total" is the total number of matching documents; "_index", "_type" and "_id" give the document's location; "_score" is its relevance; "_source" is the document source.
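Reading such a response programmatically is just JSON navigation. This Python sketch parses the slide's response (leaving out the elided `[…]` part of `_source`) and extracts the fields the callouts point to. This is the response shape shown on the slide; later Elasticsearch versions turned `hits.total` into an object and removed mapping types.

```python
import json

RAW_RESPONSE = """
{ "took": 24, "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 1, "max_score": 0.227,
    "hits": [ { "_index": "twitter", "_type": "tweet", "_id": "1", "_score": 0.227,
                "_source": { "text": "Bienvenue à la conférence #elasticsearch pour #devoxxfr",
                             "created_at": "2012-04-06T20:45:36.000Z",
                             "source": "Twitter for iPad" } } ] } }
"""


def summarize(raw: str):
    """Return (total matching docs, [(location, relevance, text), ...])."""
    hits = json.loads(raw)["hits"]
    rows = [
        ((h["_index"], h["_type"], h["_id"]), h["_score"], h["_source"]["text"])
        for h in hits["hits"]
    ]
    return hits["total"], rows


total, rows = summarize(RAW_RESPONSE)
print(total)    # 1
print(rows[0])  # location tuple, score and tweet text of the single hit
```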
  16. Search results
      • Elasticsearch returns the first 10 results (even out of many millions): results are paginated
      • You can move through the result set:
        $ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch&from=10&size=10"
      • Scoring is based on how often a term occurs in a document, weighted against how many documents in the index contain it (TF-IDF); explain=true details how each score was computed:
        $ curl "localhost:9200/twitter/tweet/_search?q=elasticsearch&explain=true"
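The scoring idea can be sketched with Lucene's classic TF-IDF ingredients: `tf = sqrt(occurrences of the term in the document)` and `idf = 1 + ln(numDocs / (docFreq + 1))`. This Python sketch is deliberately simplified (whitespace tokenizer, plain sum of `tf × idf` over the query terms); Lucene's actual formula also squares `idf` and applies field-length norms, a coordination factor and boosts, so these numbers will not match `explain=true` output. The sample documents are made up.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()  # naive whitespace analyzer (assumption)


def idf(term: str, docs: list[str]) -> float:
    # Terms that appear in fewer documents of the index weigh more.
    doc_freq = sum(1 for d in docs if term in tokenize(d))
    return 1.0 + math.log(len(docs) / (doc_freq + 1))


def score(query: str, doc: str, docs: list[str]) -> float:
    counts = Counter(tokenize(doc))
    # More occurrences in this document weigh more, with diminishing returns (sqrt).
    return sum(math.sqrt(counts[t]) * idf(t, docs) for t in tokenize(query))


docs = [
    "elasticsearch rocks",
    "elasticsearch elasticsearch at devoxx",
    "java at devoxx paris",
]
for d in docs:
    print(round(score("elasticsearch", d, docs), 3), d)
```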
  17. Searches
      QueryDSL for advanced searches:
      Type            | Description
      Match All       | Search for everything (useful combined with filters)
      QueryString     | Search with term analysis, wildcards and Lucene syntax* (+, -, [a TO b], ^)
      Term            | Search for an individual term, without analysis
      Text            | Search for a text, with analysis (OR is applied between tokens by default)
      Wildcard        | Wildcard search (*, ?)
      Bool            | Combine many criteria (MUST, MUST NOT, SHOULD)
      Range           | Range search (>, >=, <, <=)
      Prefix          | Useful for autocomplete requirements
      Filtered        | Filter the results of a query
      Fuzzy like this | Find documents that are "like" the provided text
      More like this  | Find documents that are "like" the provided text, with a minimum constraint on the terms found
      * http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/queryparsersyntax.html
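As an illustration, the December 2011 / France / "error" / "david" search from the SQL slide could be written as a Bool query that combines a Range query with several Text queries. This Python sketch builds that request body. The field names (`date`, `country`, `comment`) are hypothetical, the country label is assumed to be denormalized into each document (so there is no join), and `text` is the slide-era name of the query that later versions call `match`.

```python
import json


def december_france_query() -> dict:
    """Bool query equivalent of the SQL slide (hypothetical field names)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # December 2011: lower bound inclusive, upper bound exclusive.
                    {"range": {"date": {"gte": "2011-12-01", "lt": "2012-01-01"}}},
                    # Analyzed field: "France" and "france" both match, no lower() needed.
                    {"text": {"country": "france"}},
                    # Full-text terms looked up in the index instead of LIKE '%...%' scans.
                    {"text": {"comment": "error"}},
                    {"text": {"comment": "david"}},
                ]
            }
        }
    }


print(json.dumps(december_france_query(), indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request against an index holding such documents.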
  18. AUTOMATICALLY PULLING DATA
      Or "life is a long quiet river!"
  19-24. Pulling documents
      (animated diagram: documents ("Doc") are pulled from a Database into the search Engine)
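The pull model behind rivers can be sketched as a loop that asks the source for every change newer than the last sequence number it has seen, applies those changes to the index, and remembers the new position. This Python sketch is loosely modeled on the way the CouchDB river follows CouchDB's `_changes` feed; `ChangeFeed` and `River` are hypothetical classes, and the "index" is just a dict.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeFeed:
    """Hypothetical source: ordered (seq, doc_id, doc) changes; doc=None means deleted."""
    changes: list = field(default_factory=list)

    def since(self, seq: int) -> list:
        return [c for c in self.changes if c[0] > seq]


@dataclass
class River:
    source: ChangeFeed
    index: dict = field(default_factory=dict)
    last_seq: int = 0

    def pull_once(self) -> int:
        """Fetch new changes, apply them to the index, return how many were applied."""
        new_changes = self.source.since(self.last_seq)
        for seq, doc_id, doc in new_changes:
            if doc is None:
                self.index.pop(doc_id, None)  # deletion in the source
            else:
                self.index[doc_id] = doc      # creation or update
            self.last_seq = seq               # remember where we stopped
        return len(new_changes)


feed = ChangeFeed([(1, "a", {"text": "hello"}), (2, "b", {"text": "world"})])
river = River(feed)
print(river.pull_once())  # 2
feed.changes.append((3, "a", None))
print(river.pull_once(), river.index)  # 1 {'b': {'text': 'world'}}
```

A real river would run `pull_once` in a background loop (or stream the feed) and keep `last_seq` somewhere durable, so that a restart resumes where it left off.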
  25. Rivers
      • CouchDB River
      • MongoDB River
      • Wikipedia River
      • Twitter River
      • RabbitMQ River
      • RSS River
      • Dick Rivers (a French rock singer: just kidding!)
  26. RESULT ANALYSIS (IN NEAR REAL TIME)
      Looking at your data from different points of view
  27. Facets
      Some tweets:
      ID | Username        | Date       | Hashtags
      1  | dadoonet        | 2012-04-18 | 1
      2  | devoxxfr        | 2012-04-18 | 5
      3  | elasticsearchfr | 2012-04-18 | 2
      4  | dadoonet        | 2012-04-18 | 2
      5  | devoxxfr        | 2012-04-18 | 6
      6  | elasticsearchfr | 2012-04-19 | 3
      7  | dadoonet        | 2012-04-19 | 3
      8  | devoxxfr        | 2012-04-19 | 7
      9  | elasticsearchfr | 2012-04-20 | 4
  28. Term Facet
      Counting the tweets per username:
      Username        | Count
      dadoonet        | 3
      devoxxfr        | 3
      elasticsearchfr | 3
  29. Term Facet
      Request:
      "facets" : { "users" : { "terms" : { "field" : "username" } } }
      Response:
      "facets" : {
        "users" : {
          "_type" : "terms",
          "missing" : 0,
          "total" : 9,
          "other" : 0,
          "terms" : [
            { "term" : "dadoonet", "count" : 3 },
            { "term" : "devoxxfr", "count" : 3 },
            { "term" : "elasticsearchfr", "count" : 3 }
          ]
        }
      }
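A term facet is essentially a group-by count. This Python sketch recomputes the slide's result over the nine sample tweets. Reading `missing`, `total` and `other` as "documents without the field", "values counted" and "counts left out of the top list" is an interpretation of the response, and the sketch ignores how Elasticsearch actually collects and merges counts across shards.

```python
from collections import Counter

# The nine sample tweets from the Facets slide: (id, username, date, hashtag count).
TWEETS = [
    (1, "dadoonet", "2012-04-18", 1),
    (2, "devoxxfr", "2012-04-18", 5),
    (3, "elasticsearchfr", "2012-04-18", 2),
    (4, "dadoonet", "2012-04-18", 2),
    (5, "devoxxfr", "2012-04-18", 6),
    (6, "elasticsearchfr", "2012-04-19", 3),
    (7, "dadoonet", "2012-04-19", 3),
    (8, "devoxxfr", "2012-04-19", 7),
    (9, "elasticsearchfr", "2012-04-20", 4),
]


def terms_facet(values, size=10):
    """Count each distinct value and keep the `size` most frequent ones."""
    present = [v for v in values if v is not None]
    top = Counter(present).most_common(size)
    return {
        "_type": "terms",
        "missing": len(values) - len(present),           # docs without a value
        "total": len(present),                           # values counted
        "other": len(present) - sum(c for _, c in top),  # counts outside the top list
        "terms": [{"term": t, "count": c} for t, c in top],
    }


print(terms_facet([username for _, username, _, _ in TWEETS]))
```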
  30. Date Histogram Facet
      Per month:
      Date    | Count
      2012-04 | 9
      Per day:
      Date       | Count
      2012-04-18 | 5
      2012-04-19 | 3
      2012-04-20 | 1
  31. Date Histogram Facet
      Request:
      "facets" : { "perday" : { "date_histogram" : { "field" : "date", "interval" : "day" } } }
      Response:
      "facets" : {
        "perday" : {
          "_type" : "date_histogram",
          "entries" : [
            { "time" : 1334700000000, "count" : 5 },
            { "time" : 1334786400000, "count" : 3 },
            { "time" : 1334872800000, "count" : 1 }
          ]
        }
      }
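A per-day date histogram groups documents by the start of their day, expressed in epoch milliseconds. The epoch values on the slide correspond to midnight in Paris summer time (UTC+2), so this Python sketch buckets in that offset; with UTC every key would be two hours later. It is a simplified re-computation, not Elasticsearch's implementation.

```python
from collections import Counter
from datetime import date, datetime, timedelta, timezone

# Assumption: days start at midnight Paris summer time (UTC+2), which is what
# the slide's epoch values correspond to.
PARIS_SUMMER = timezone(timedelta(hours=2))

# Date column of the nine sample tweets.
DATES = ["2012-04-18"] * 5 + ["2012-04-19"] * 3 + ["2012-04-20"]


def day_bucket_millis(day: str, tz: timezone = PARIS_SUMMER) -> int:
    """Epoch milliseconds of midnight at the start of `day` in time zone `tz`."""
    d = date.fromisoformat(day)
    return int(datetime(d.year, d.month, d.day, tzinfo=tz).timestamp()) * 1000


def date_histogram_per_day(days):
    counts = Counter(day_bucket_millis(d) for d in days)
    return {
        "_type": "date_histogram",
        "entries": [{"time": t, "count": c} for t, c in sorted(counts.items())],
    }


print(date_histogram_per_day(DATES))
```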
  32. Ranges Facet
      Hashtags per tweet: 1, 5, 2, 2, 6, 3, 3, 7, 4
      Range      | Count | Min | Max | Mean  | Total
      x < 3      | 3     | 1   | 2   | 1.667 | 5
      3 <= x < 5 | 3     | 3   | 4   | 3.333 | 10
      x >= 5     | 3     | 5   | 7   | 6     | 18
  33. Ranges Facet
      Request:
      "facets" : {
        "hashtags" : {
          "range" : {
            "field" : "hashtags",
            "ranges" : [ { "to" : 3 }, { "from" : 3, "to" : 5 }, { "from" : 5 } ]
          }
        }
      }
      Response:
      "facets" : {
        "hashtags" : {
          "_type" : "range",
          "ranges" : [
            { "to" : 3, "count" : 3, "min" : 1, "max" : 2, "total" : 5, "mean" : 1.667 },
            { "from" : 3, "to" : 5, "count" : 3, "min" : 3, "max" : 4, "total" : 10, "mean" : 3.333 },
            { "from" : 5, "count" : 3, "min" : 5, "max" : 7, "total" : 18, "mean" : 6 }
          ]
        }
      }
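Each range bucket counts the matching values and computes simple statistics on them. This Python sketch reproduces the slide's figures from the hashtag column; bucket bounds follow the slide (lower bound inclusive, upper bound exclusive), and an empty bucket simply carries a zero count.

```python
# Hashtag counts of the nine sample tweets (IDs 1 to 9).
HASHTAGS = [1, 5, 2, 2, 6, 3, 3, 7, 4]
RANGES = [{"to": 3}, {"from": 3, "to": 5}, {"from": 5}]


def in_range(x, r) -> bool:
    # "from" is inclusive and "to" exclusive, as in the slide's 3 <= x < 5.
    return ("from" not in r or x >= r["from"]) and ("to" not in r or x < r["to"])


def range_facet(values, ranges):
    entries = []
    for r in ranges:
        hits = [v for v in values if in_range(v, r)]
        entry = dict(r, count=len(hits))
        if hits:
            entry.update(
                min=min(hits),
                max=max(hits),
                total=sum(hits),
                mean=round(sum(hits) / len(hits), 3),  # rounded only to match the slide
            )
        entries.append(entry)
    return {"_type": "range", "ranges": entries}


for entry in range_facet(HASHTAGS, RANGES)["ranges"]:
    print(entry)
```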
  34. Commerce site usage
      (screenshot of an e-commerce site with its facets annotated: Ranges, Term, Term, Ranges)
  35. Faceted navigation
      (annotated screenshot: fixed criteria, Term facet, Ranges facet, Date histogram facet, results)
  36. Faceted navigation
      (screenshot: criteria)
  37. Near Real Time Data Visualization
      • Run a matchAll search on all the data
      • Refresh the screen every x seconds
      • While new documents are being indexed
      (screenshot: Date histogram and Term facets)
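One way to drive such a screen is a single request that matches everything and asks for both facets, re-sent every few seconds. This Python sketch builds that body with the facet syntax from the previous slides and runs a polling loop; `fetch` and `render` are hypothetical callbacks (for example an HTTP call to `_search` and a chart update), injected so the loop can run without a server. `"size": 0` asks for facet counts only, no hits.

```python
import time


def dashboard_request() -> dict:
    """match_all search returning only the facets (facet syntax from the slides)."""
    return {
        "query": {"match_all": {}},
        "size": 0,  # no hits needed, only the facet counts
        "facets": {
            "users": {"terms": {"field": "username"}},
            "perday": {"date_histogram": {"field": "date", "interval": "day"}},
        },
    }


def poll(fetch, render, interval_s=5.0, rounds=3, sleep=time.sleep):
    """Re-run the same search every `interval_s` seconds and redraw the screen."""
    body = dashboard_request()
    for _ in range(rounds):
        render(fetch(body))  # fresh counts include newly indexed documents
        sleep(interval_s)


# Demo with fake callbacks instead of a live cluster:
screens = []
poll(fetch=lambda body: sorted(body["facets"]), render=screens.append, sleep=lambda s: None)
print(screens)  # [['perday', 'users'], ['perday', 'users'], ['perday', 'users']]
```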
  38. DEMO APPLICATION
      Did we make some noise?
  39. Demo architecture
      (diagram: Twitter Streaming API, Twitter River, Chrome)
      $ curl -XPUT localhost:9200/_river/twitter/_meta -d '{
          "type" : "twitter",
          "twitter" : {
            "user" : "twitter_user",
            "password" : "twitter_password",
            "filter" : { "tracks" : ["devoxxfr"] }
          }
        }'
  40. ARCHITECTURE
      Let's go further: sharding / replicas / scalability
  41. Glossary
      • Node: an Elasticsearch instance (roughly, a server)
      • Cluster: a set of nodes
      • Shard: a slice of an index; documents are distributed across shards
      • Replica: one or more copies of a shard in the cluster
      • Primary shard: the shard copy elected as primary in the cluster; documents are indexed there first
      • Secondary shard: holds the replicas of primary shards
  120. Let’s create an index
  Diagram: Cluster with Node 1; Client (cURL)
  121. Let’s create an index
  $ curl -XPUT localhost:9200/twitter -d '{
      "index" : {
          "number_of_shards" : 2,
          "number_of_replicas" : 1
      }
  }'
  Diagram: Cluster with Node 1 holding Shard 0 and Shard 1; Client (cURL)
  The replication rule is not satisfied: a replica is never allocated on the same node as its primary, so with a single node the replicas stay unassigned.
  122. Let’s create an index
  $ curl -XPUT localhost:9200/twitter -d '{
      "index" : {
          "number_of_shards" : 2,
          "number_of_replicas" : 1
      }
  }'
  Diagram: Cluster with Node 1 (Shard 0, Shard 1) and Node 2 (Shard 0, Shard 1); Client (cURL)
  The replication rule is satisfied: each shard now has its replica on the other node.
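To make the replication rule concrete, here is a simplified Python model; it is not Elasticsearch's actual allocator. It places each primary and replica on a node that does not already hold a copy of the same shard, leaves copies unassigned when no such node exists, and derives a health colour the way the cluster health API reports it: green when every copy is assigned, yellow when only replicas are unassigned, red when a primary is missing. The function names are hypothetical.

```python
def allocate(number_of_shards, number_of_replicas, nodes):
    """Toy allocator returning (placement, unassigned).

    placement maps node -> list of (shard_id, role); unassigned lists (shard_id, role).
    """
    placement = {node: [] for node in nodes}
    unassigned = []
    for shard_id in range(number_of_shards):
        for role in ["primary"] + ["replica"] * number_of_replicas:
            # A node is eligible only if it holds no copy of this shard yet.
            candidates = [n for n in nodes
                          if all(s != shard_id for s, _ in placement[n])]
            if not candidates:
                unassigned.append((shard_id, role))
                continue
            # Prefer the least loaded eligible node (first one on ties).
            target = min(candidates, key=lambda n: len(placement[n]))
            placement[target].append((shard_id, role))
    return placement, unassigned


def health(unassigned):
    """Green: all assigned; yellow: only replicas missing; red: a primary is missing."""
    if any(role == "primary" for _, role in unassigned):
        return "red"
    return "yellow" if unassigned else "green"


if __name__ == "__main__":
    for nodes in (["node1"], ["node1", "node2"]):
        placement, unassigned = allocate(2, 1, nodes)
        print(nodes, placement, "unassigned:", unassigned, "->", health(unassigned))
```

With one node the two replicas cannot be placed (yellow, as on slide 121); adding a second node lets every copy be assigned (green, as on slide 122).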
  123. Dynamic reallocation
  Cluster: Node 1 (Shard 0, Shard 1), Node 2 (Shard 0, Shard 1)
  124. Dynamic reallocation
  Cluster: Node 1, Node 2 and a new Node 3; shard copies: Shard 0, Shard 0, Shard 1, Shard 1
  125. Dynamic reallocation
  Cluster: Node 1, Node 2, Node 3; a copy of Shard 0 is being relocated to Node 3 (Shard 0 appears three times, Shard 1 twice)
  126. Dynamic reallocation
  Cluster: Node 1, Node 2, Node 3; the four shard copies (Shard 0, Shard 0, Shard 1, Shard 1) are now spread over three nodes
  127. Dynamic reallocation
  Cluster: Node 1, Node 2, Node 3 and a new Node 4; shard copies: Shard 0, Shard 0, Shard 1, Shard 1
  128. Dynamic reallocation
  Cluster: Node 1, Node 2, Node 3, Node 4; a copy of Shard 1 is being relocated to Node 4 (Shard 0 appears twice, Shard 1 three times)
  129. Dynamic reallocation
  Cluster: Node 1, Node 2, Node 3, Node 4, one shard copy each (Shard 0, Shard 0, Shard 1, Shard 1)
  Tuning means finding the best numbers of nodes, shards and replicas!
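The reallocation frames can be approximated with a toy rebalancer, again a simplification rather than Elasticsearch's real balancing logic. When a node joins, shard copies move one at a time from the most loaded node to the least loaded one, never onto a node that already holds a copy of the same shard, until node loads differ by at most one. Copies are modelled as hypothetical (shard_id, copy_no) pairs, copy_no 0 being the primary.

```python
def rebalance(placement):
    """Move shard copies until node loads differ by at most one.

    placement: dict node -> set of (shard_id, copy_no); mutated in place.
    Returns the moves performed as (copy, from_node, to_node).
    """
    moves = []
    while True:
        busiest = max(placement, key=lambda n: len(placement[n]))
        idlest = min(placement, key=lambda n: len(placement[n]))
        if len(placement[busiest]) - len(placement[idlest]) <= 1:
            return moves  # balanced enough
        held = {shard_id for shard_id, _ in placement[idlest]}
        # Only copies of shards the target node does not hold yet may move.
        movable = sorted(c for c in placement[busiest] if c[0] not in held)
        if not movable:
            return moves  # toy simplification: stop if this pair allows no legal move
        copy = movable[0]
        placement[busiest].remove(copy)
        placement[idlest].add(copy)
        moves.append((copy, busiest, idlest))


if __name__ == "__main__":
    placement = {"node1": {(0, 0), (1, 0)}, "node2": {(0, 1), (1, 1)}}
    for new_node in ("node3", "node4"):
        placement[new_node] = set()  # a node joins the cluster
        print(new_node, "joined, moves:", rebalance(placement))
    print({n: sorted(c) for n, c in placement.items()})
```

The individual moves may differ from the animation, but the end state is the same as on slide 129: four nodes with one shard copy each, and no node holding two copies of the same shard.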
  130. Let’s index a document
  Cluster: Node 1, Node 2, Node 3, Node 4 holding Shard 0, Shard 0, Shard 1, Shard 1; the client (cURL) sends Doc 1
  $ curl -XPUT localhost:9200/twitter/tweet/1 -d '{
      "text": "Welcome to the #elasticsearch talk for #devoxxfr",
      "created_at": "2012-04-06T20:45:36.000Z",
      "source": "Twitter for iPad",
      ...
  }'
  131. Let’s index a document
  Doc 1 reaches the primary copy of Shard 0
  Request: the same PUT of tweet 1 as on slide 130
  133. Let’s index a document
  Doc 1 is now stored in both copies of Shard 0 (primary and replica)
  Request: the same PUT of tweet 1 as on slide 130
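Which shard receives a document? Elasticsearch essentially takes a hash of the routing value (the document _id by default) modulo the number of primary shards, which is also why number_of_shards cannot simply be changed on an existing index. The sketch below uses zlib.crc32 as a stand-in hash, an assumption for illustration only: the real hash function differs (and has changed across versions), so the shard numbers it prints will not match a real cluster or the slides. The function name is hypothetical.

```python
import zlib


def route(doc_id: str, number_of_shards: int) -> int:
    """Pick a primary shard for a document.

    Stand-in hash (CRC32), NOT Elasticsearch's real routing hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for doc_id in ("1", "2"):
        print("doc", doc_id, "-> shard", route(doc_id, number_of_shards=2))
    # Re-hashing with a different shard count sends many documents elsewhere,
    # which is why the number of primary shards is fixed at index creation.
    moved = sum(route(str(i), 2) != route(str(i), 3) for i in range(1000))
    print(moved, "of 1000 documents would change shard")
```

The same id always lands on the same shard as long as the shard count does not change, so a later GET for that id can be sent straight to the right shard.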
  134. Let’s index another document
  Cluster: Node 1, Node 2, Node 3, Node 4; Doc 1 is in both copies of Shard 0; the client (cURL) sends Doc 2
  $ curl -XPUT localhost:9200/twitter/tweet/2 -d '{
      "text": "I’m making some noise for #elasticsearch at #devoxxfr",
      "created_at": "2012-04-06T21:12:52.000Z",
      "source": "Twitter for iPad",
      ...
  }'
  136. Let’s index another document
  Doc 2 reaches the primary copy of Shard 1
  Request: the same PUT of tweet 2 as on slide 134
  138. Let’s index another document
  Doc 2 is now stored in both copies of Shard 1 (primary and replica)
  Request: the same PUT of tweet 2 as on slide 134
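The last frames show the write path: a document is indexed on the primary copy of its shard first, then copied to the replica. Below is a simplified Python model of that sequence, with hypothetical class and method names, in-memory dicts standing in for Lucene indexes, a CRC32 stand-in for the real routing hash, and no failure handling.

```python
import zlib


class ShardCopy:
    """One copy of a shard; a dict stands in for the Lucene index."""

    def __init__(self, shard_id, primary):
        self.shard_id = shard_id
        self.primary = primary
        self.docs = {}


class Index:
    def __init__(self, number_of_shards, number_of_replicas):
        self.number_of_shards = number_of_shards
        # copies[shard_id][0] is the primary, the others are its replicas.
        self.copies = [
            [ShardCopy(s, primary=True)]
            + [ShardCopy(s, primary=False) for _ in range(number_of_replicas)]
            for s in range(number_of_shards)
        ]

    def index(self, doc_id, source):
        # Stand-in routing hash, not Elasticsearch's.
        shard_id = zlib.crc32(doc_id.encode("utf-8")) % self.number_of_shards
        primary, *replicas = self.copies[shard_id]
        primary.docs[doc_id] = dict(source)      # 1. write on the primary
        for replica in replicas:                 # 2. then copy to each replica
            replica.docs[doc_id] = dict(source)
        return shard_id


if __name__ == "__main__":
    twitter = Index(number_of_shards=2, number_of_replicas=1)
    tweets = {
        "1": {"text": "Welcome to the #elasticsearch talk for #devoxxfr"},
        "2": {"text": "I'm making some noise for #elasticsearch at #devoxxfr"},
    }
    for doc_id, source in tweets.items():
        print("doc", doc_id, "-> shard", twitter.index(doc_id, source))
```

Every copy of the target shard ends up holding the document, while the other shard is untouched, which is what slides 133 and 138 show.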
