Introduction to elasticsearch

Published in: Technology

  1. Introduction to Elasticsearch. Praveen Manvi, July 2016
  2. Agenda
     • Overview
       – History, product overview
       – ES vocabulary
       – Feature set
     • Demo
       – Setup/configuration
       – Ecosystem
       – APIs for index/search and monitoring
  3. What is Elasticsearch?
     – Document (JSON) oriented search engine
     – Distributed
     – Horizontally scalable and highly available
     – Multi-tenancy enabled
     – API-centric and RESTful
     – Built on the Lucene search engine library and used for full-text search, structured search, analytics, or all three in combination
  4. Elasticsearch has become the de facto search solution. A few popular examples:
     • GitHub uses Elasticsearch to query 130 billion lines of code.
     • Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, plus search-as-you-type and did-you-mean suggestions.
     • Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
  5. History
     • Doug Cutting (@cutting): started Lucene in 1999; it became a top-level Apache project in 2005. He is now at Cloudera, which supports the rival solution Solr and commercial offerings.
     • Shay Banon (@kimchy): released Elasticsearch in February 2010 after working on it for 6 years (starting with Compass). It is now part of the http://elastic.co commercial offerings.
  6. Building Blocks (term, approximate relational-database analogy, description)
     • Cluster (~database cluster): group of nodes
     • Node (~instance of database): a JVM process, usually one per machine
     • Index (~database schema): hosts mapping types and their definitions; contains many shards
     • Mapping type (~database table): field descriptions and indexing requirements
     • Document (~database row): JSON document
     • Shard: a Lucene index; the scalable unit and heart of the search engine (primary and replica)
  7. Physical Layout
  8. Logical Layout
  9. Lucene Inverted Index
  10. Value add over Lucene
      • Distributed
        – Combines results with a fork/join across multiple Lucene indexes (shards), using the new building blocks
      • Transaction log
        – Guarantees durability: operations are automatically replayed when a shard is reopened
        – Simplifies shard relocation and recovery: when a shard moves from one node to another, changes can be replayed while the committed segments are transferred
      • Flush/Refresh/Monitor APIs
        – For managing cluster, node, and index status
      • Query DSL
        – Provides a large grammar for expressing searches
  11. Mapping/index/search docs
  12. Document Metadata Fields
      • _id: the id of the document
      • _type: the document type
      • _source (enabled by default): stores the original document that was indexed
      • _all (enabled by default): indexes all values of all document fields
      • _timestamp (disabled by default): timestamp associated with the document
      • _ttl (disabled by default): optionally defines an expiration time
      • _size (disabled by default): indexes the size of the uncompressed
  13. Search Controller
  14. Query DSL
  15. Search request in place
  16. Search Types
      • COUNT: returns no hits, only the total count matching the query, and therefore executes in a single round trip to the shards.
      • SCAN: iterates over large amounts of data using a cursor to paginate, which keeps memory use low. Helpful for re-indexing and for decorating data outside ES.
      • SEARCH: general search.
  17. Aggregation
  18. Aggregations
  19. Nested Aggregations
  20. A few interesting features
      • Bulk indexing: send multiple docs to ES in one request
      • Multi Get API: get multiple documents in a single API call
      • Percolator: ES notifies your application when new content matches your filters, so you do not have to poll the search engine constantly for updates. Great for building alerts.
      • Pagination
      • Highlighting
  21. Ecosystem (debug tools/development)
  22. Client SDKs
  23. Plugins
      • head
      • Elastic HQ
      • Marvel
      • BigDesk
      Install from ES_HOME/bin, for example: ./plugin install head
  24. Configuration
      • Enabling store compression (LZF/snappy) uses 55% less storage.
      • Disabling the '_all' field saves 13% in storage.
      • Removing _source saves ~26% storage on disk.
      • Set ES_HEAP_SIZE to half of the machine's memory, leaving the rest for the OS file cache.
      • Setting bootstrap.mlockall to true avoids swapping.
  25. References
      • https://www.youtube.com/watch?v=5444z-L2V2A&spfreload=1 - "Lucene now and then", a talk at Twitter by Lucene creator Doug Cutting. Gives the history of Lucene and how it evolved.
      • https://www.youtube.com/watch?v=lpZ6ZajygDY - Talk by Elasticsearch creator Shay Banon. It's 3 years old, but the content on data design patterns is very good.
      • https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html - Official Elasticsearch documentation.
      • https://www.manning.com/books/elasticsearch-in-action - Source of the diagrams used in this presentation.
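
Slide 9 names the Lucene inverted index, but its diagram did not survive in this transcript. As a rough illustration only, the idea can be sketched in plain Python: each term maps to the list of documents that contain it, and a multi-term query intersects those lists. The function names and sample documents below are hypothetical, and real Lucene adds analysis, scoring, and compressed on-disk segments that this sketch ignores.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace tokenization stands in for a real analyzer (assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all_terms(index, query):
    """Return ids of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search engine library",
    3: "Solr is also built on Lucene",
}
index = build_inverted_index(docs)
matches = search_all_terms(index, "built lucene")
```

Looking up a term is a dictionary access rather than a scan over every document, which is the property that makes full-text search fast.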
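
Slide 10 describes the distributed layer as a fork/join over multiple Lucene indexes. A minimal sketch of that scatter-gather pattern, assuming a toy term-frequency score and in-memory dictionaries in place of shards (both are illustrative assumptions, not how Elasticsearch scores or stores data): each shard returns its own top hits, and the coordinating step merges them into a global top list. The shards are queried sequentially here for simplicity, where a real cluster would query them in parallel.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Score one shard locally; term frequency stands in for a real relevance score."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def scatter_gather(shards, term, size):
    """Fork the query to every shard, then join the per-shard top hits."""
    partial = [search_shard(shard, term, size) for shard in shards]
    merged = heapq.nlargest(size, (hit for hits in partial for hit in hits))
    return [doc_id for _score, doc_id in merged]


# Two hypothetical shards, each holding its own documents.
shards = [
    {"a1": "lucene lucene index", "a2": "cluster node"},
    {"b1": "lucene shard", "b2": "lucene lucene lucene"},
]
top = scatter_gather(shards, "lucene", size=2)
```

Each shard only needs to return `size` hits, so the merge step handles at most `size` times the number of shards, regardless of how many documents each shard holds.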
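
Slide 20 mentions bulk indexing without showing a request. The `_bulk` endpoint takes a newline-delimited body in which each document is preceded by an action line. The helper below is a hypothetical sketch that only builds such a body as a string; the index name and documents are made up, and sending the body over HTTP is left out. In the Elasticsearch versions current when this deck was written, a document type also had to be supplied, for example in the request URL.

```python
import json


def build_bulk_body(index_name, docs):
    """Serialize (doc_id, document) pairs into a newline-delimited _bulk body."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells ES to index the document that follows.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical index name and documents.
body = build_bulk_body("talks", [("1", {"title": "Intro"}), ("2", {"title": "Demo"})])
```

Because every action and every document occupies exactly one line, the receiving node can split the body on newlines without parsing the whole payload as one JSON value.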
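
The Query DSL and aggregation slides (14 and 17 to 19) carried screenshots that are missing from this transcript. As a hedged stand-in, the sketch below builds a search request body that combines a `match` query with a `terms` aggregation. The helper name, field names, and aggregation name are hypothetical; only the general shape of the body (`query`, `aggs`, `size`) follows the Query DSL.

```python
import json


def build_search_body(field, text, agg_field, size=10):
    """Build a search body: full-text match plus a per-value bucket count."""
    return {
        "size": size,
        # Full-text query against one field.
        "query": {"match": {field: text}},
        # Bucket the matching documents by the values of agg_field.
        "aggs": {"by_" + agg_field: {"terms": {"field": agg_field}}},
    }


# Hypothetical field names for illustration.
search_body = build_search_body("title", "elasticsearch", "author")
payload = json.dumps(search_body)
```

The resulting `payload` string is what a client would send as the body of a search request against an index.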
