Elasto Mania

A gentle introduction to Elasticsearch

Elasto Mania Presentation Transcript

  • 1. elasto mania, @about_andrefs, 2014
  • 2. what is it?
    "Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine." (elasticsearch.org/overview/)
  • 3. talk disclaimers
    • introduction to ES (sorry, no heavy stuff)
    • focused on Elasticsearch itself (not so much on integration with Kibana, Logstash, etc.)
    • heavily based on Andrew Cholakian's book Exploring Elasticsearch
    • Tiririca method
    • not all disclaimers have necessarily been disclaimed
  • 4. getting started
  • 5. buzzword-driven slide
    • real-time analytics
    • conflict management
    • per-operation persistence
    • document oriented
    • built on top of Apache Lucene™
    • Apache 2 Open Source License
    • real-time data
    • distributed
    • multi-tenancy
    • RESTful API
    • schema free
    • full-text search
    • high availability
  • 6-9. use cases
    • search a large number of product descriptions for a specific phrase and return the best results
    • search for words that sound like a given word
    • auto-complete a search box based on previous searches, allowing for misspellings
    • store large quantities of semi-structured (JSON) data in a distributed fashion, with redundancy
  • 10-13. don't use cases
    • calculate how many items are left in an inventory
    • figure out the sum of all items in a given month's invoices
    • execute operations transactionally with rollback support
    • guarantee item uniqueness across multiple fields
  • 14. history
    • 2004: Shay Banon creates Compass (a Java search engine framework)
    • 2009: big parts of Compass would need to be rewritten to release a third version focused on scalability
    • Feb 2010: Elasticsearch 0.4.0
    • Mar 2012: Elasticsearch 0.19.0
    • Apr 2013: Elasticsearch 0.90.0
    • Feb 2014: Elasticsearch 1.0.0
    • Mar 2014: Elasticsearch 1.1.0
  • 15. the basics
  • 16. JSON over HTTP
    • the primary data format for ES is JSON
    • the main protocol consists of HTTP requests with a JSON payload
    • _id is unique, and is generated automatically if not assigned
    • internally, JSON is converted to flat fields for Lucene's key/value API
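Slide 16 says JSON is flattened into fields for Lucene. The sketch below illustrates that idea on the song document from slide 19. It rests on two assumptions: nested objects become dotted paths such as `album.title`, and arrays become multi-valued fields. `flatten` is a hypothetical helper, and ES's real internal representation is more involved.

```python
from typing import Any


def flatten(doc: dict[str, Any]) -> dict[str, list[Any]]:
    """Flatten a nested JSON document into dotted field names.

    Hypothetical helper: nested objects become paths such as "album.title",
    and arrays become multi-valued fields.
    """
    fields: dict[str, list[Any]] = {}

    def walk(path: str, value: Any) -> None:
        if isinstance(value, dict):
            for key, sub in value.items():
                walk(f"{path}.{key}" if path else key, sub)
        elif isinstance(value, list):
            for item in value:  # every array element lands in the same field
                walk(path, item)
        else:
            fields.setdefault(path, []).append(value)

    walk("", doc)
    return fields


song = {
    "title": "The Vampyre of Time and Memory",
    "author": "Queens of the Stone Age",
    "album": {"title": "...Like Clockwork", "year": 2013, "track": 3},
    "genres": ["alternative rock", "piano rock"],
}
print(flatten(song))
```

Here `album.year` maps to `[2013]` and `genres` becomes a two-valued field. An array of objects such as `{"a": [{"b": 1}, {"b": 2}]}` collapses to `{"a.b": [1, 2]}`, so it loses track of which values belonged together. That is the problem the `nested` type on slide 21 addresses.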
  • 17. mnemonic (relational DB → Elasticsearch)
    • database → index
    • table → type
    • schema definition → mapping
    • column → field
    • row → document
    (elasticsearch.org/guide/en/elasticsearch/reference/current/glossary.html)
  • 18. documents
    • like a row in a table in an RDB
    • JSON objects
    • each is stored in an index and has a type and an id
    • each contains zero or more fields
  • 19. sample document
    PUT /music/songs/1
    {
      "_id" : 1,
      "title" : "The Vampyre of Time and Memory",
      "author" : "Queens of the Stone Age",
      "album" : {
        "title" : "...Like Clockwork",
        "year" : 2013,
        "track" : 3
      },
      "genres" : ["alternative rock", "piano rock"]
    }
  • 20. fields
    • key-value pairs
    • a value can be a scalar or a nested structure
    • each field has a type, defined in a mapping
  • 21. types (type: definition)
    • string: text
    • integer: 32-bit integers
    • long: 64-bit integers
    • float: IEEE floats
    • double: double-precision floats
    • boolean: true or false
    • date: UTC date/time
    • geo_point: latitude/longitude
    • null: the value null
    • array: any field
    • object: type omitted, uses a "properties" field
    • nested: separate document
  • 22. mapping
    • defines the types of a document's fields
    • and the way they are indexed
    • scopes _ids (documents with different types may have identical _ids)
    • defines a bunch of index-wide settings
    • can be defined explicitly, or automatically when a document is indexed
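Slide 22 says a mapping can be created automatically when a document is indexed. The sketch below guesses a mapping from a JSON value. The rules are assumptions that roughly follow the 1.x-era defaults as we understand them:
- booleans become `boolean`;
- integers become `long`, and floats become `double`;
- `YYYY-MM-DD` strings become `date`, and other strings become `string`;
- objects get `properties`;
- arrays take the type of their first element;
- nulls and empty arrays add nothing.

Real date detection uses configurable formats. `infer_mapping` is a hypothetical name.

```python
import re
from typing import Any, Optional

# Assumption: only plain YYYY-MM-DD strings are detected as dates here;
# real date detection is driven by configurable date formats.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def infer_mapping(value: Any) -> Optional[dict[str, Any]]:
    """Guess a mapping for a JSON value (simplified dynamic mapping)."""
    if isinstance(value, bool):  # must precede int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "date" if ISO_DATE.fullmatch(value) else "string"}
    if isinstance(value, dict):
        # objects carry "properties" instead of a "type" (see slide 21)
        properties = {}
        for key, sub in value.items():
            mapping = infer_mapping(sub)
            if mapping is not None:
                properties[key] = mapping
        return {"properties": properties}
    if isinstance(value, list):
        # an array has no type of its own; its first element decides
        return infer_mapping(value[0]) if value else None
    return None  # null adds no field to the mapping
```

Under these rules, `album.year` would be guessed as `long`, whereas the explicit mapping on slide 23 declares it `integer`. This kind of mismatch is one reason to define mappings explicitly.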
  • 23. sample mapping
    PUT /music/songs/_mapping
    {
      "songs" : {
        "properties" : {
          "title" : { "type" : "string" },
          "author" : { "type" : "string" },
          "album" : {
            "properties" : {
              "title" : { "type" : "string" },
              "year" : { "type" : "integer" },
              "track" : { "type" : "integer" }
            }
          },
          "genres" : { "type" : "string" }
        }
      }
    }
  • 24. indexes
    • like a database in an RDB
    • have a mapping, which defines types
    • logical namespace
    • map to one or more primary shards
    • can have zero or more replica shards
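Slide 24 says an index maps to one or more primary shards. Elasticsearch picks a document's shard as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document's `_id`. This is why the number of primary shards is fixed when an index is created.

The sketch below shows that rule. `zlib.crc32` is only a stand-in, because ES uses its own hash function, and `pick_shard` is a hypothetical name.

```python
import zlib
from typing import Optional


def pick_shard(doc_id: str, number_of_primary_shards: int, routing: Optional[str] = None) -> int:
    """Return the primary shard a document would be routed to.

    zlib.crc32 is a stand-in; Elasticsearch uses its own hash function.
    """
    key = doc_id if routing is None else routing  # _routing defaults to _id
    return zlib.crc32(key.encode("utf-8")) % number_of_primary_shards


# Documents sharing a custom routing value always end up on the same shard.
print(pick_shard("1", 5, routing="user-42") == pick_shard("2", 5, routing="user-42"))
```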
  • 25. CRUD I
    PUT /music
    PUT /music/songs/_mapping
    { "songs" : { "properties" : { ... } } }
  • 26. CRUD II
    PUT /music/songs/1
    { "title" : "The Vampyre of Time and Memory", ... }
    GET /music/songs/1
    POST /music/songs/1/_update
    { "doc" : { "year" : 2014 } }
    DELETE /music/songs/1
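On slide 26, the `_update` call sends only a partial `"doc"`. Elasticsearch merges it into the stored source and reindexes the result. Under that model, nested objects are merged field by field, while scalars and arrays are replaced.

The sketch below shows that merge step in isolation. `merge_partial` is a hypothetical name.

```python
import copy
from typing import Any


def merge_partial(source: dict[str, Any], partial: dict[str, Any]) -> dict[str, Any]:
    """Merge a partial document into a stored one, returning a new dict."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge objects field by field
        else:
            merged[key] = copy.deepcopy(value)  # scalars and arrays are replaced
    return merged


song = {"title": "The Vampyre of Time and Memory",
        "album": {"title": "...Like Clockwork", "year": 2013, "track": 3}}
print(merge_partial(song, {"album": {"year": 2014}}))
```

In the slide 19 document, `year` lives under `album`. So the slide's `{ "doc" : { "year" : 2014 } }` would add a new top-level `year` field rather than change `album.year`.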
  • 27. search
  • 28. search fundamentals
    1. boolean search
    2. scoring
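Slide 28 splits search into two phases: a boolean step decides which documents match, and a scoring step ranks them. The sketch below shows that split in simplified form. It uses whitespace tokenization and OR semantics over the query terms. The score borrows the `tf = sqrt(freq)` and `idf = 1 + ln(N / (df + 1))` terms of Lucene's classic TF-IDF similarity, but it omits norms, boosts and coordination. It is not Lucene's exact formula.

```python
import math
from collections import Counter


def search(docs: dict[str, str], query: str) -> list[tuple[str, float]]:
    """Boolean match (OR over query terms), then rank by a simplified TF-IDF."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    terms = query.lower().split()
    n_docs = len(tokenized)
    doc_freq = {t: sum(1 for toks in tokenized.values() if t in toks) for t in terms}

    results = []
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        # 1. boolean search: skip documents that contain none of the terms
        if not any(counts[t] for t in terms):
            continue
        # 2. scoring: tf = sqrt(freq), idf = 1 + ln(N / (df + 1))
        score = sum(
            math.sqrt(counts[t]) * (1 + math.log(n_docs / (doc_freq[t] + 1)))
            for t in terms
            if counts[t]
        )
        results.append((doc_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


docs = {"1": "laser beam laser", "2": "laser", "3": "spaceship"}
print(search(docs, "laser"))
```

Document 3 never reaches the scoring step. Document 1 outranks document 2 because it mentions "laser" twice.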
  • 29. ES Search API
    • includes: Query DSL, Filter API, Facet API, Sort API, …
    • endpoints: /index/_search and /index/type/_search
  • 30. filters
    • filtered queries: nested in the query field; affect both the query results and the facet counts
    • top-level filters: specified at the root of the search request; affect only the returned hits, not the facet counts
    • facet-level filters: pre-filter the data before it is aggregated; affect only one specific facet
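The three filter scopes on slide 30 differ only in which stage they act on. The in-memory sketch below makes that concrete. It is not ES code, and it makes four assumptions:
- the query is `match_all`;
- filters are plain Python predicates;
- the facet is a simple terms count over one field;
- `run_search` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Callable, Optional

Doc = dict[str, Any]
Filter = Callable[[Doc], bool]


def run_search(
    docs: list[Doc],
    facet_field: str,
    query_filter: Optional[Filter] = None,
    top_level_filter: Optional[Filter] = None,
    facet_filter: Optional[Filter] = None,
) -> tuple[list[Doc], dict[Any, int]]:
    def passes(f: Optional[Filter], doc: Doc) -> bool:
        return f is None or f(doc)

    # filtered query: shrinks the set that both hits and facets come from
    matched = [d for d in docs if passes(query_filter, d)]
    # facet-level filter: narrows only this one facet's input
    facet = Counter(v for d in matched if passes(facet_filter, d) for v in d.get(facet_field, []))
    # top-level filter: applied to the hits only, after facets are counted
    hits = [d for d in matched if passes(top_level_filter, d)]
    return hits, dict(facet)


people = [{"name": "ana", "hobbies": ["coding", "chess"]}, {"name": "rui", "hobbies": ["chess"]}]
codes = lambda d: "coding" in d["hobbies"]
print(run_search(people, "hobbies", top_level_filter=codes))
```

With a top-level filter, only "ana" is returned, but the facet still counts chess twice. Passing the same predicate as `query_filter` returns the same hit but drops chess to one.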
  • 31. search sample I
    POST /music/_search
    { "query" : { "fuzzy" : { "title" : "vampires" } } }
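The fuzzy query on slide 31 matches indexed terms that lie within a small edit distance of the search term. The query term is not analyzed. It is compared against indexed tokens such as "vampyre" (lowercased at index time), and Lucene caps the distance at 2 edits.

The sketch below uses plain Levenshtein distance. Lucene actually uses Levenshtein automata and can count transpositions as single edits. `fuzzy_terms` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_terms(query_term: str, indexed_terms: list[str], max_edits: int = 2) -> list[str]:
    """Indexed terms close enough to the query term to count as a match."""
    return [t for t in indexed_terms if levenshtein(query_term, t) <= max_edits]


print(fuzzy_terms("vampires", ["vampyre", "time", "memory"]))
```

"vampires" reaches "vampyre" in two edits: one substitution (i to y) and one deletion (the final s).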
  • 32. search sample II
    POST /planet/_search
    {
      "from" : 0,
      "size" : 15,
      "query" : { "match_all" : {} },
      "sort" : { "handle" : "desc" },
      "filter" : { "term" : { "_all" : "coding" } },
      "facets" : {
        "hobbies" : { "terms" : { "field" : "hobbies" } }
      }
    }
  • 33. analysis
    • performed when documents are added
    • manipulates data to ensure better indexing
    • 3 steps:
      1. character filtering
      2. tokenization
      3. token filtering
    • distinct analyzers for each field
    • multiple analyzers for each field
    • custom analyzers
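Slide 33 lists the three analysis steps. The sketch below chains them in that order. The concrete choices are illustrative assumptions, not ES's built-in components:
- the character filter crudely strips HTML tags;
- the tokenizer splits on word characters;
- the token filters lowercase tokens and drop a tiny, made-up stopword list.

```python
import re

STOPWORDS = {"a", "an", "and", "of", "the"}  # assumption: tiny illustrative list


def char_filter(text: str) -> str:
    """Step 1: rewrite the raw character stream (here: strip HTML tags)."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text: str) -> list[str]:
    """Step 2: split the text into tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens: list[str]) -> list[str]:
    """Step 3: transform or drop tokens (lowercase, remove stopwords)."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text: str) -> list[str]:
    return token_filter(tokenize(char_filter(text)))


print(analyze("<b>The Vampyre</b> of Time and Memory"))
```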
  • 34. analyzers
    PUT /music/songs/_mapping
    {
      "songs" : {
        "properties" : {
          "title" : {
            "type" : "string",
            "fields" : {
              "title_exact" : { "type" : "string", "index" : "not_analyzed" },
              "title_simple" : { "type" : "string", "analyzer" : "simple" },
              "title_snow" : { "type" : "string", "analyzer" : "snowball" }
            }
          },
          ...
        }
      }
    }
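The multi-field mapping on slide 34 indexes the same title in different ways. A `not_analyzed` field stores the whole value as one term. The `simple` analyzer splits on anything that is not a letter and lowercases the pieces.

The sketch below imitates those two behaviours. It does not reproduce the snowball stemmer, and both function names are hypothetical.

```python
import re


def not_analyzed(value: str) -> list[str]:
    """The whole value is indexed as a single term."""
    return [value]


def simple_analyzer(value: str) -> list[str]:
    """Split on anything that is not a letter, then lowercase."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", value)]


print(not_analyzed("...Like Clockwork"), simple_analyzer("...Like Clockwork"))
```

An exact lookup for "...Like Clockwork" can only succeed against the `not_analyzed` variant. A search for "clockwork" needs the analyzed one.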
  • 35. highlighting
    POST /publications/books/_search
    {
      "query" : { "match" : { "text" : "spaceship" } },
      "fields" : ["title", "isbn"],
      "highlight" : {
        "fields" : { "text" : { "number_of_fragments" : 3 } }
      }
    }
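On slide 35, the highlight section returns up to `number_of_fragments` snippets of the `text` field with the matches marked. By default, ES wraps matches in `<em>` tags and targets fragments of about 100 characters.

The sketch below is a naive highlighter under simplifying assumptions:
- fragments are fixed-size character windows, so a word can be cut at a window edge;
- matching is a case-insensitive whole-word regex rather than analyzed terms.

`highlight` is a hypothetical name.

```python
import re


def highlight(text: str, term: str, number_of_fragments: int = 3, fragment_size: int = 100) -> list[str]:
    """Return up to number_of_fragments windows containing term, with matches in <em>."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    fragments = []
    for start in range(0, len(text), fragment_size):
        chunk = text[start:start + fragment_size]
        if pattern.search(chunk):
            fragments.append(pattern.sub(lambda m: f"<em>{m.group(0)}</em>", chunk))
            if len(fragments) == number_of_fragments:
                break
    return fragments


print(highlight("A spaceship and a Spaceship", "spaceship"))
```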
  • 36. search phrases
    POST /publications/books/_search
    {
      "query" : { "match_phrase" : { "text" : "laser beam" } },
      "fields" : ["title", "isbn"],
      "highlight" : {
        "fields" : { "text" : { "number_of_fragments" : 3 } }
      }
    }
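The two queries differ in strictness. The `match` query on slide 35 needs the terms to appear somewhere. The `match_phrase` query on slide 36 also needs them in order and at consecutive positions, which is why the index keeps term positions.

The sketch below checks phrases against a toy positional index. The tokenization is simplified and both names are hypothetical.

```python
import re
from collections import defaultdict


def positions(text: str) -> dict[str, list[int]]:
    """Toy positional index: token -> list of positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos, token in enumerate(re.findall(r"\w+", text.lower())):
        index[token].append(pos)
    return index


def matches_phrase(text: str, phrase: str) -> bool:
    """True if the phrase terms occur in order at consecutive positions."""
    index = positions(text)
    terms = re.findall(r"\w+", phrase.lower())
    if not terms:
        return False
    return any(
        all(start + offset in index.get(term, []) for offset, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


print(matches_phrase("fire the laser beam now", "laser beam"))
print(matches_phrase("a beam of laser light", "laser beam"))
```

The second text contains both words, so a plain `match` would accept it, but it fails the phrase check.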
  • 37. going wild
  • 38. aggregations
    a unit of work that builds analytic information over a set of documents
    • bucketing: documents are evaluated and placed into buckets according to previously defined criteria
    • metric: keeps track of metrics computed over a set of documents
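Slide 38's two aggregation families compose. A bucketing aggregation groups documents, and a metric aggregation then runs inside each bucket. The sketch below computes the in-memory equivalent of a terms bucket on `genres` with an average of `year` inside it. The flat `year` field and `terms_with_avg` are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Any


def terms_with_avg(docs: list[dict[str, Any]], bucket_field: str, metric_field: str) -> dict[Any, dict[str, Any]]:
    """Terms-style buckets on bucket_field, each with a doc_count and an avg metric."""
    counts: Counter = Counter()
    values: dict[Any, list[float]] = defaultdict(list)
    for doc in docs:
        keys = doc.get(bucket_field, [])
        if not isinstance(keys, list):
            keys = [keys]
        for key in keys:  # bucketing: a multi-valued doc can fall into several buckets
            counts[key] += 1
            if metric_field in doc:  # metric: only docs that have the field contribute
                values[key].append(doc[metric_field])
    return {
        key: {"doc_count": counts[key], "avg": mean(values[key]) if values[key] else None}
        for key in counts
    }


songs = [
    {"genres": ["alternative rock", "piano rock"], "year": 2013},
    {"genres": ["alternative rock"], "year": 2007},
    {"genres": ["piano rock"]},
]
print(terms_with_avg(songs, "genres", "year"))
```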
  • 39. percolations
  • 40. more stuff
    • routing
    • URI search
    • suggesters
    • count API
    • validate API
    • explain API
    • more like this API
    • …
  • 41. scalability
  • 42. tools
  • 43. Logstash
  • 44. Kibana
  • 45. Marvel
  • 46. what about now
  • 47. new features
    2014
    • Apr 3rd: count
    • Mar 6th: tribe nodes
    • Jan 17th: the cat API
    • Jan 29th: Marvel
    • Jan 21st: snapshot & restore
    2013
    • Sep 24th: official Elasticsearch clients for Ruby, Python, PHP and Perl
    • Nov 28th: Lucene 4.x doc values
    • …
  • 48. go read a book
    • Exploring Elasticsearch, Andrew Cholakian
    • Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong
  • 49. getting in touch
    • https://github.com/elasticsearch
    • @elasticsearch
    • irc.freenode.org #elasticsearch
    • irc.perl.org #elasticsearch
    • http://www.elasticsearch.org/blog/
    • Elasticsearch User mailing list
  • 50. references
    • Elastic Search Mega Manual
    • http://solr-vs-elasticsearch.com/
    • Elastic Search in Production
    • Exploring Elasticsearch, Andrew Cholakian
    • Elasticsearch – The Definitive Guide, Clinton Gormley, Zachary Tong
  • 51. job's done. questions?