ElasticSearch as a distributed NoSQL DB

Slides from Moscow BigData/Cassandra September 2013 meetup

Published in: Technology

  1. 1. ElasticSearch as a distributed NoSQL DB
  2. 2. Agenda 1. ElasticSearch architecture overview 2. How data is stored in ElasticSearch 3. Using ElasticSearch to store semi-structured data
  3. 3. Overview ● ElasticSearch is a distributed inverted index ● Built on top of Apache Lucene ○ Lucene is the most popular Java-based full-text search index implementation ■ it is used not only for text
  4. 4. ElasticSearch cluster
  5. 5. Index request
  6. 6. Search request
  7. 7. Routing ● Any request can be manually routed ○ index request ○ search request ● Both master and slave replicas can process search requests
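The routing idea on slide 7 can be sketched as a hash of a routing key taken modulo the number of shards. This is a minimal illustration, not ElasticSearch's actual implementation: CRC32 is a stand-in hash, and `NUM_SHARDS`, `shard_for`, and the sample ids are hypothetical names chosen for the example.

```python
import zlib

NUM_SHARDS = 5  # assumed number of primary shards in the index


def shard_for(routing_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing key to a shard number (CRC32 is only a stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


# (document id, owning user) pairs; the ids are made up for the example.
docs = [("post-1", "user-42"), ("post-2", "user-42"), ("post-3", "user-7")]

# Routing by document id spreads documents across shards.
by_doc_id = {doc_id: shard_for(doc_id) for doc_id, _ in docs}

# Manual routing by user id puts all documents of one user on the same shard,
# so a search routed with the same key only has to visit that shard.
by_user = {doc_id: shard_for(user) for doc_id, user in docs}
```

With manual routing, `post-1` and `post-2` always map to the same shard because they share the routing key `user-42`.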
  8. 8. Replication ● Indexed documents are replicated to the nodes holding slave replicas of a shard ● Sync replication (all nodes holding the shard copies must acknowledge the request) ● Optional async replication
  9. 9. Indexing ● New documents are not indexed immediately; instead, they are stored in memory and indexed in batches ○ Queued documents do not appear in search results ● Any change means that the whole document is marked as deleted and reindexed
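The batching behaviour on slide 9 can be illustrated with a toy index that queues documents in memory and only makes them searchable on an explicit refresh. This is a sketch under stated assumptions: `BufferedIndex` and its methods are hypothetical names, and a plain dictionary stands in for the real index.

```python
class BufferedIndex:
    """Toy index: documents are queued first and become searchable on refresh."""

    def __init__(self):
        self._buffer = []      # queued documents, not visible to search yet
        self._searchable = {}  # doc_id -> document, visible to search

    def index(self, doc_id, doc):
        # Indexing only queues the document.
        self._buffer.append((doc_id, doc))

    def refresh(self):
        # Process the whole queue as one batch. A re-indexed id replaces the
        # old document as a whole, mirroring "delete and reindex".
        for doc_id, doc in self._buffer:
            self._searchable[doc_id] = doc
        self._buffer.clear()

    def search(self, field, value):
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )


idx = BufferedIndex()
idx.index("1", {"name": "Ivan"})
before_refresh = idx.search("name", "Ivan")  # queued document is not found
idx.refresh()
after_refresh = idx.search("name", "Ivan")   # now it is
```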
  10. 10. Agenda 1. ElasticSearch architecture overview 2. How data is stored in ElasticSearch 3. Using ElasticSearch to store semi-structured data
  11. 11. Lucene inverted index structure
  12. 12. Lucene index updates ● Index is immutable ○ All changes are added to an auxiliary index (segment) in batches ○ Search is done simultaneously in all segments of an index ● Segments are eventually merged into larger ones ○ Deleted documents are actually removed during the merge process
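The update model on slide 12 can be sketched with a toy index made of immutable segments, a set of delete markers, and a merge step. This is a simplified illustration, not Lucene's data structures: `Segment` and `SegmentedIndex` are hypothetical names, and each document version is assumed to have its own unique id.

```python
from collections import defaultdict


class Segment:
    """Immutable mini-index: term -> sorted list of document ids."""

    def __init__(self, docs):
        postings = defaultdict(set)
        for doc_id, terms in docs.items():
            for term in terms:
                postings[term].add(doc_id)
        self.postings = {t: sorted(ids) for t, ids in postings.items()}
        self.docs = dict(docs)  # kept so that segments can be merged later


class SegmentedIndex:
    def __init__(self):
        self.segments = []
        self.deleted = set()  # delete markers; segments are never modified

    def add_batch(self, docs):
        # Every batch of changes becomes a new segment.
        self.segments.append(Segment(docs))

    def delete(self, doc_id):
        # Deleting only records a marker.
        self.deleted.add(doc_id)

    def search(self, term):
        # A search consults every segment and filters out deleted documents.
        hits = []
        for seg in self.segments:
            hits.extend(d for d in seg.postings.get(term, []) if d not in self.deleted)
        return sorted(hits)

    def merge(self):
        # Merging rewrites the live documents into one larger segment;
        # deleted documents are physically dropped only here.
        live = {}
        for seg in self.segments:
            for doc_id, terms in seg.docs.items():
                if doc_id not in self.deleted:
                    live[doc_id] = terms
        self.segments = [Segment(live)]
        self.deleted.clear()
```

A usage example: after `add_batch({1: ["blue", "jeans"]})`, `add_batch({2: ["blue", "shirt"]})`, and `delete(1)`, a search for `"blue"` returns only document 2, while document 1 stays on disk in its segment until `merge()` runs.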
  13. 13. Agenda 1. ElasticSearch architecture overview 2. How data is stored in ElasticSearch 3. Using ElasticSearch to store semi-structured data
  14. 14. Why use ElasticSearch for semi-structured data? ● Efficient search by many conditions ○ type: jeans AND color: [+blue +brown] AND price: [10 TO 100] AND brand: [+levis +colins] ● Inverted index has a column-oriented layout ○ less disk I/O ○ only the data required to handle a request is processed ○ effective compression is possible for the DocId lists ● Document-oriented, no strict schema
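The multi-condition query on slide 14 can be answered by combining per-term DocId lists, which is why only the data for the queried fields needs to be read. The sketch below uses made-up posting lists and prices, and it assumes that the bracketed lists in the slide's query mean "any of these values"; `any_of` and `price_between` are hypothetical helper names.

```python
# Made-up posting lists: (field, term) -> sorted DocId list.
postings = {
    ("type", "jeans"): [1, 2, 3, 5, 8],
    ("color", "blue"): [1, 3, 4],
    ("color", "brown"): [5, 9],
    ("brand", "levis"): [1, 5, 7],
    ("brand", "colins"): [3],
}

# Made-up numeric field values: DocId -> price.
prices = {1: 50, 2: 20, 3: 150, 4: 30, 5: 99, 7: 10, 8: 60, 9: 40}


def any_of(field, values):
    """Union of the DocId lists for several terms of one field."""
    result = set()
    for value in values:
        result.update(postings.get((field, value), []))
    return result


def price_between(low, high):
    """DocIds whose price lies in the inclusive range [low, high]."""
    return {doc_id for doc_id, price in prices.items() if low <= price <= high}


# AND between conditions is a set intersection of DocId sets.
matches = sorted(
    set(postings[("type", "jeans")])
    & any_of("color", ["blue", "brown"])
    & price_between(10, 100)
    & any_of("brand", ["levis", "colins"])
)
```

A real engine would intersect sorted, compressed DocId lists instead of building Python sets, but the boolean logic is the same.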
  15. 15. Example document JSON { "name": "Ivan", "age": 18, "likes": [ { "title": "The Lord of the Rings", "type": "book" }, { "title": "The Matrix", "type": "movie" } ] }
  16. 16. ElasticSearch fields ● name ● age ● likes.title ● likes.type
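The field list on slide 16 follows from flattening the example document of slide 15 into dotted field names. A minimal sketch of such a flattening step is shown below; `flatten` is a hypothetical helper written for this illustration, not an ElasticSearch API.

```python
from collections import defaultdict


def flatten(value, prefix="", out=None):
    """Collect leaf values of a JSON-like object under dotted field names."""
    if out is None:
        out = defaultdict(list)
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Array items share the field name of the array itself.
        for item in value:
            flatten(item, prefix, out)
    else:
        out[prefix].append(value)
    return out


doc = {
    "name": "Ivan",
    "age": 18,
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}

fields = dict(flatten(doc))
```

Note that `likes.title` and `likes.type` each end up as an independent list of values, so the pairing between a title and its type is no longer recorded.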
  17. 17. Mapping JSON to index ● Field values of array elements are just a flat list of terms ○ how to search for users who like the movie "The Lord of the Rings"? ● Separate document for each array item ○ store them on the same shard (data affinity) ● Add type prefix to field names ● Add type prefix to title term value
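The problem raised on slide 17, and the last workaround it lists, can be shown on the example user. The sketch assumes the flattened fields are plain lists of terms and uses a `type:title` string as the prefixed term; that format and the function names are choices made for this illustration.

```python
likes = [
    {"title": "The Lord of the Rings", "type": "book"},
    {"title": "The Matrix", "type": "movie"},
]

# Flattened representation: two independent lists of terms.
flat = {
    "likes.title": [like["title"] for like in likes],
    "likes.type": [like["type"] for like in likes],
}


def naive_match(fields, title, kind):
    """Checks each field separately, so title and type may come from different items."""
    return title in fields["likes.title"] and kind in fields["likes.type"]


def prefixed_terms(items):
    """Workaround: put the type into the title term itself."""
    return [f'{item["type"]}:{item["title"]}' for item in items]


def prefixed_match(terms, title, kind):
    return f"{kind}:{title}" in terms


terms = prefixed_terms(likes)

# The user likes the book, not the movie, yet the naive query matches.
false_positive = naive_match(flat, "The Lord of the Rings", "movie")
fixed = prefixed_match(terms, "The Lord of the Rings", "movie")
```

Here `false_positive` is `True` while `fixed` is `False`; a query for the book still matches through the term `book:The Lord of the Rings`.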
  18. 18. Using ElasticSearch with BigData storage systems ● Index in ElasticSearch, data blobs on S3 ○ user profiles in ElasticSearch ○ user wall dumps on S3 ● Index in ElasticSearch, data blobs in HBase ○ user post summaries in ElasticSearch ○ wall post contents in HBase
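The split described on slide 18 amounts to searching small summaries and then fetching the full content by key from a separate store. The sketch below uses two dictionaries as stand-ins for the search index and the blob store; the keys, field names, and helper functions are all hypothetical.

```python
# Stand-in for the search index: small, searchable summaries with a blob key.
summaries = {
    "post-1": {"author": "ivan", "summary": "trip to Moscow", "blob_key": "walls/ivan/post-1"},
    "post-2": {"author": "ivan", "summary": "meetup notes", "blob_key": "walls/ivan/post-2"},
    "post-3": {"author": "anna", "summary": "book review", "blob_key": "walls/anna/post-3"},
}

# Stand-in for the blob store (S3 or HBase in the slides): key -> full content.
blob_store = {
    "walls/ivan/post-1": b"Full text of the Moscow trip post",
    "walls/ivan/post-2": b"Full text of the meetup notes",
    "walls/anna/post-3": b"Full text of the book review",
}


def search(field, value):
    """Step 1: find matching summaries in the index."""
    return [doc for _, doc in sorted(summaries.items()) if doc[field] == value]


def fetch_full(field, value):
    """Step 2: load the large payloads for the hits from the blob store."""
    return [blob_store[hit["blob_key"]] for hit in search(field, value)]


ivan_posts = fetch_full("author", "ivan")
```

The index stays small because it holds only the searchable fields, while the large payloads are read by key and only for documents that actually matched.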
  19. 19. The end
