Elasto Mania

A gentle introduction to Elasticsearch

Published in: Technology


  1. elasto mania
     @about_andrefs, 2014
  2. what is it?
     Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.
     elasticsearch.org/overview/
  3. talk disclaimers
     • introduction to ES (sorry, no heavy stuff)
     • focused on Elasticsearch itself (not so much on integration with Kibana, Logstash, etc.)
     • heavily based on Andrew Cholakian’s book Exploring Elasticsearch
     • Tiririca method
     • not all disclaimers have necessarily been disclaimed
  4. getting started
  5. buzzword driven slide
     • real time analytics
     • conflict management
     • per-operation persistence
     • document oriented
     • built on top of Apache Lucene™
     • Apache 2 Open Source License
     • real time data
     • distributed
     • multi-tenancy
     • RESTful API
     • schema free
     • full text search
     • high availability
  6. use cases
     • search a large number of product descriptions for a specific phrase and return the best results
  7. use cases
     • search a large number of product descriptions for a specific phrase and return the best results
     • search for words that sound like a given word
  8. use cases
     • search a large number of product descriptions for a specific phrase and return the best results
     • search for words that sound like a given word
     • auto-complete a search box with previously searched issues, allowing misspellings
  9. use cases
     • search a large number of product descriptions for a specific phrase and return the best results
     • search for words that sound like a given word
     • auto-complete a search box with previously searched issues, allowing misspellings
     • store large quantities of semi-structured (JSON) data in a distributed fashion, with redundancy
  10. don’t use cases
      • calculate how many items are left in an inventory
  11. don’t use cases
      • calculate how many items are left in an inventory
      • figure out the sum of all items in a given month’s invoices
  12. don’t use cases
      • calculate how many items are left in an inventory
      • figure out the sum of all items in a given month’s invoices
      • execute operations transactionally with rollback support
  13. don’t use cases
      • calculate how many items are left in an inventory
      • figure out the sum of all items in a given month’s invoices
      • execute operations transactionally with rollback support
      • guarantee item uniqueness across multiple fields
  14. history
      2004: Shay Banon creates Compass (Java search engine framework)
      2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
      Feb 2010: Elasticsearch 0.4.0
      Mar 2012: Elasticsearch 0.19.0
      Apr 2013: Elasticsearch 0.90.0
      Feb 2014: Elasticsearch 1.0.0
      Mar 2014: Elasticsearch 1.1.0
  15. the basics
  16. JSON over HTTP
      • primary data format for ES is JSON
      • main protocol consists of HTTP requests with a JSON payload
      • _id is unique, and generated automatically if unassigned
      • internally, JSON is converted to flat fields for Lucene’s key/value API
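
To make the last bullet of slide 16 concrete, here is a minimal Python sketch of how a nested JSON document could be flattened into dotted key/value pairs. The `flatten` helper and the dotted-path naming are illustrative assumptions, not the actual Lucene or Elasticsearch internals.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a list of (dotted_path, value) pairs."""
    pairs = []
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # nested object: recurse, extending the path
            pairs.extend(flatten(value, path))
        elif isinstance(value, list):
            # array: every element becomes another value under the same path
            for item in value:
                if isinstance(item, dict):
                    pairs.extend(flatten(item, path))
                else:
                    pairs.append((path, item))
        else:
            pairs.append((path, value))
    return pairs


# Hypothetical document, modelled on the sample document from the slides.
song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
}

flat = flatten(song)
for path, value in flat:
    print(path, "=", value)
```

Note how the array yields two pairs with the same path: in this sketch a multi-valued field is simply a key that appears more than once.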
  17. mnemonic
      relational DB        Elasticsearch
      database             index
      table                type
      schema definition    mapping
      column               field
      row                  document
      elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html
  18. documents
      • like a row in a table in an RDB
      • JSON objects
      • each is stored in an index, has a type and an id
      • each contains zero or more fields
  19. sample document
      PUT /music/songs/1
      {
        "_id" : 1,
        "title" : "The Vampyre of Time and Memory",
        "author" : "Queens of the Stone Age",
        "album" : {
          "title" : "...Like Clockwork",
          "year" : 2013,
          "track" : 3
        },
        "genres" : ["alternative rock", "piano rock"]
      }
  20. fields
      • key-value pairs
      • value can be a scalar or a nested structure
      • each field has a type, defined in a mapping
  21. types
      type         definition
      string       text
      integer      32-bit integers
      long         64-bit integers
      float        IEEE floats
      double       double precision floats
      boolean      true or false
      date         UTC Date/Time
      geo_point    latitude/longitude
      null         the value null
      array        any field
      object       type omitted, properties field
      nested       separate document
  22. mapping
      • defines the types of a document’s fields
      • and the way they are indexed
      • scopes _ids (documents with different types may have identical _ids)
      • defines a bunch of index-wide settings
      • can be defined explicitly or automatically when a document is indexed
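
The "defined automatically" case in slide 22 can be sketched as type inference over a document's values. This is a simplified illustration in Python, not Elasticsearch's actual dynamic-mapping code: the helper names are hypothetical, date detection is left out, and whole numbers and floating-point values are mapped to `long` and `double`, which is an assumption about the defaults.

```python
def infer_field(value):
    """Guess a field mapping from one JSON value (None if nothing can be inferred)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, dict):
        # object: no "type" key, only nested properties
        return {"properties": infer_properties(value)}
    if isinstance(value, list) and value:
        # array: takes the type of its elements (first element in this sketch)
        return infer_field(value[0])
    return None  # null or empty array: nothing to infer yet


def infer_properties(doc):
    props = {}
    for key, value in doc.items():
        field = infer_field(value)
        if field is not None:
            props[key] = field
    return props


def infer_mapping(type_name, doc):
    return {type_name: {"properties": infer_properties(doc)}}


# Hypothetical document, modelled on the sample document from the slides.
song = {
    "title": "The Vampyre of Time and Memory",
    "album": {"title": "...Like Clockwork", "year": 2013},
    "genres": ["alternative rock", "piano rock"],
    "lyrics": None,
}

mapping = infer_mapping("songs", song)
print(mapping)
```

An explicit mapping is still the better choice when the guess would be wrong, for example when a numeric-looking string should stay a string.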
  23. sample mapping
      PUT /music/songs/_mapping
      {
        "songs" : {
          "properties" : {
            "title" : { "type" : "string" },
            "author" : { "type" : "string" },
            "album" : {
              "properties" : {
                "title" : { "type" : "string" },
                "year" : { "type" : "integer" },
                "track" : { "type" : "integer" }
              }
            },
            "genres" : { "type" : "string" }
          }
        }
      }
  24. indexes
      • like a database in an RDB
      • has a mapping which defines types
      • logical namespace
      • maps to one or more primary shards
      • can have zero or more replica shards
  25. CRUD I
      PUT /music
      PUT /music/songs/_mapping
      { "songs" : { "properties" : { ... } } }
  26. CRUD II
      PUT /music/songs/1
      { "title" : "The Vampyre of Time and Memory", ... }
      GET /music/songs/1
      POST /music/songs/1/_update
      { "doc" : { "year" : 2014 } }
      DELETE /music/songs/1
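
The CRUD calls in slides 25 and 26 are plain HTTP requests, so they can be built with nothing but the Python standard library. The sketch below only constructs the requests and prints them; the base URL `http://localhost:9200` and the `es_request` helper are assumptions for illustration, and the paths mirror the slides' 2014-era API.

```python
import json
from urllib.request import Request

BASE = "http://localhost:9200"  # assumption: a local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON payload."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(BASE + path, data=data, headers=headers, method=method)


crud = [
    es_request("PUT", "/music/songs/1", {"title": "The Vampyre of Time and Memory"}),
    es_request("GET", "/music/songs/1"),
    es_request("POST", "/music/songs/1/_update", {"doc": {"year": 2014}}),
    es_request("DELETE", "/music/songs/1"),
]

for req in crud:
    print(req.get_method(), req.full_url, req.data)

# To actually send one against a running node:
#   from urllib.request import urlopen
#   with urlopen(crud[1]) as resp:
#       print(json.load(resp))
```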
  27. search
  28. search fundamentals
      1. boolean search
      2. scoring
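
The two fundamentals in slide 28 can be shown with a toy inverted index in Python: first a boolean step decides which documents match, then a scoring step ranks them. The documents are made up, and the score here is just the summed term frequency, an assumed stand-in for Lucene's real relevance formula.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus: doc id -> already-lowercased text.
docs = {
    1: "the vampyre of time and memory",
    2: "time after time",
    3: "memory remains",
}

# Inverted index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf


def search(terms):
    """Boolean AND over the terms, then rank matches by summed term frequency."""
    postings = [set(index.get(t, {})) for t in terms]
    matches = set.intersection(*postings) if postings else set()
    scored = [(sum(index[t][d] for t in terms), d) for d in matches]
    # highest score first; ties broken by doc id for a stable order
    return [d for score, d in sorted(scored, key=lambda p: (-p[0], p[1]))]


print(search(["time"]))            # both docs 1 and 2 match; doc 2 mentions it twice
print(search(["time", "memory"]))  # only doc 1 contains both terms
```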
  29. ES Search API
      Includes:
      • Query DSL
      • Filter API
      • Facet API
      • Sort API
      • …
      Endpoints:
      • /index/_search
      • /index/type/_search
  30. filters
      filtered queries: nested in the query field; affect both query results and facet counts
      top-level filters: specified at the root of the search; affect only query results, not facet counts
      facet-level filters: pre-filter data before it is aggregated; affect only one specific facet
  31. search sample I
      POST /music/_search
      {
        "query" : {
          "fuzzy" : { "title" : "vampires" }
        }
      }
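
Why would a fuzzy query for "vampires" find a song titled "The Vampyre of Time and Memory"? Fuzzy matching is based on edit distance between the query term and the indexed terms. The Python sketch below uses plain Levenshtein distance and a cut-off of two edits; both the `fuzzy_match` helper and the `max_edits=2` default are assumptions for illustration, not the engine's exact behaviour.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep, if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(query, terms, max_edits=2):
    """Return the indexed terms within max_edits of the query term."""
    return [t for t in terms if edit_distance(query, t) <= max_edits]


# Terms an analyzer might produce for the song title (assumed, lowercased).
title_terms = ["the", "vampyre", "of", "time", "and", "memory"]

print(edit_distance("vampires", "vampyre"))
print(fuzzy_match("vampires", title_terms))
```

"vampires" becomes "vampyre" by substituting one letter and dropping the final "s", so the term is two edits away and still matches.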
  32. search sample II
      POST /planet/_search
      {
        "from" : 0,
        "size" : 15,
        "query" : { "match_all" : {} },
        "sort" : { "handle" : "desc" },
        "filter" : { "term" : { "_all" : "coding" } },
        "facets" : {
          "hobbies" : { "terms" : { "field" : "hobbies" } }
        }
      }
  33. analysis
      • performed when documents are added
      • manipulates data to ensure better indexing
      • 3 steps:
        1. character filtering
        2. tokenization
        3. token filtering
      • distinct analyzers for each field
      • multiple analyzers for each field
      • custom analyzers
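
The three analysis steps in slide 33 chain naturally as three small functions. This Python sketch is an assumed, simplified analyzer (tag stripping, word tokenization, lowercasing plus a tiny stopword list); none of it is one of Elasticsearch's built-in analyzers.

```python
import re

STOPWORDS = {"the", "of", "and"}  # assumption: a tiny illustrative stopword list


def char_filter(text):
    """Step 1, character filtering: replace HTML-like tags with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Step 2, tokenization: split into runs of word characters."""
    return re.findall(r"\w+", text)


def token_filter(tokens):
    """Step 3, token filtering: lowercase and drop stopwords."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


terms = analyze("<b>The Vampyre</b> of Time and Memory")
print(terms)
```

Swapping any one of the three functions gives a different analyzer, which is the idea behind using distinct or multiple analyzers per field.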
  34. analyzers
      PUT /music/songs/_mapping
      {
        "songs" : {
          "properties" : {
            "title" : {
              "type" : "string",
              "fields" : {
                "title_exact" : { "type" : "string", "index" : "not_analyzed" },
                "title_simple" : { "type" : "string", "analyzer" : "simple" },
                "title_snow" : { "type" : "string", "analyzer" : "snowball" }
              }
            },
            ...
          }
        }
      }
  35. highlighting
      POST /publications/books/_search
      {
        "query" : {
          "match" : { "text" : "spaceship" }
        },
        "fields" : ["title", "isbn"],
        "highlight" : {
          "fields" : {
            "text" : { "number_of_fragments" : 3 }
          }
        }
      }
  36. search phrases
      POST /publications/books/_search
      {
        "query" : {
          "match_phrase" : { "text" : "laser beam" }
        },
        "fields" : ["title", "isbn"],
        "highlight" : {
          "fields" : {
            "text" : { "number_of_fragments" : 3 }
          }
        }
      }
  37. going wild
  38. aggregations
      Unit of work that builds analytic information over a set of documents
      bucketing: documents are evaluated and placed into buckets according to previously defined criteria
      metric: keeps track of metrics which are computed over a set of documents
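
The two aggregation families in slide 38 compose: bucket first, then compute a metric per bucket. Below is a minimal Python sketch over made-up documents; the function names, field names and result layout are illustrative assumptions, not the aggregations API.

```python
from collections import defaultdict

# Hypothetical documents, invented for this example.
songs = [
    {"title": "song a", "genre": "rock", "year": 2013},
    {"title": "song b", "genre": "rock", "year": 2011},
    {"title": "song c", "genre": "jazz", "year": 1959},
]


def terms_buckets(docs, field):
    """Bucketing: place each document in the bucket named by its field value."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[field]].append(doc)
    return dict(buckets)


def avg_metric(docs, field):
    """Metric: average of a numeric field over a set of documents."""
    return sum(doc[field] for doc in docs) / len(docs)


result = {
    key: {"doc_count": len(bucket), "avg_year": avg_metric(bucket, "year")}
    for key, bucket in terms_buckets(songs, "genre").items()
}
print(result)
```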
  39. percolations
  40. more stuff
      • routing
      • uri search
      • suggesters
      • count API
      • validate API
      • explain API
      • more like this API
      • …
  41. scalability
  42. tools
  43. Logstash
  44. Kibana
  45. Marvel
  46. what about now
  47. new features
      2014
      Apr 3rd : count
      Mar 6th : Tribe nodes
      Jan 17th : the cat API
      Jan 29th : Marvel
      Jan 21st : snapshot & restore
      2013
      Sep 24th : official Elasticsearch clients for Ruby, Python, PHP and Perl
      Nov 28th : Lucene 4.x doc values
      …
  48. go read a book
      • Exploring Elasticsearch, Andrew Cholakian
      • Elasticsearch: The Definitive Guide, Clinton Gormley, Zachary Tong
  49. getting in touch
      • https://github.com/elasticsearch
      • @elasticsearch
      • irc.freenode.org #elasticsearch
      • irc.perl.org #elasticsearch
      • http://www.elasticsearch.org/blog/
      • Elasticsearch User mailing list
  50. references
      • Elastic Search Mega Manual
      • http://solr-vs-elasticsearch.com/
      • Elastic Search in Production
      • Exploring Elasticsearch, Andrew Cholakian
      • Elasticsearch: The Definitive Guide, Clinton Gormley, Zachary Tong
  51. job’s done
      questions?
