ElasticSearch as a distributed NoSQL DB

Slides from Moscow BigData/Cassandra September 2013 meetup

Transcript

  • 1. ElasticSearch as a distributed NoSQL DB
  • 2. Agenda
      1. ElasticSearch architecture overview
      2. How data is stored in ElasticSearch
      3. Using ElasticSearch to store semi-structured data
  • 3. Overview
      ● ElasticSearch is a distributed inverted index
      ● Built on top of Apache Lucene
        ○ Lucene is the most popular Java-based full-text search index implementation
          ■ it is used not only for text
  • 4. ElasticSearch cluster
  • 5. Index request
  • 6. Search request
  • 7. Routing
      ● Any request can be manually routed
        ○ index request
        ○ search request
      ● Both primary (master) and replica (slave) shard copies can process search requests
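
A minimal sketch of what manual routing amounts to, assuming the target shard is chosen by hashing a routing value modulo the number of shards. The slides do not name the hash function, so `zlib.crc32` is only a stand-in, and `pick_shard` is a hypothetical helper rather than an ElasticSearch API.

```python
import zlib


def pick_shard(routing_value: str, num_shards: int) -> int:
    """Map a routing value to a shard number with a stable hash."""
    # crc32 is a stand-in (assumption); the real hash is not given in the slides.
    return zlib.crc32(routing_value.encode("utf-8")) % num_shards


# Requests that carry the same routing value resolve to the same shard,
# so a routed search only has to visit that one shard.
shard_for_index = pick_shard("user-42", 5)
shard_for_search = pick_shard("user-42", 5)
```

Routing by a value such as a user id is also what keeps related documents on one shard, which a later slide calls data affinity.
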
  • 8. Replication
      ● Indexed documents are replicated to the nodes holding replica (slave) copies of a shard
      ● Sync replication (all nodes holding the shard copies must acknowledge the request)
      ● Optional async replication
  • 9. Indexing
      ● New documents are not indexed immediately; instead, they are stored in memory and indexed in batches
        ○ Queued documents do not appear in search results
      ● Any change means that the whole document is marked as deleted and reindexed
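
The batching behaviour on the indexing slide can be sketched with a toy in-memory model. `BufferedIndex` and its methods are hypothetical names for illustration, not part of ElasticSearch or Lucene.

```python
class BufferedIndex:
    """Toy index: writes are queued and become searchable only after refresh()."""

    def __init__(self):
        self.buffer = []       # queued documents, not searchable yet
        self.searchable = {}   # doc_id -> document, visible to search

    def index(self, doc_id, doc):
        self.buffer.append((doc_id, doc))

    def refresh(self):
        # Index the whole batch at once; a changed document replaces
        # the old copy wholesale instead of being patched in place.
        for doc_id, doc in self.buffer:
            self.searchable[doc_id] = doc
        self.buffer.clear()

    def search(self, field, value):
        return sorted(d for d, doc in self.searchable.items()
                      if doc.get(field) == value)


idx = BufferedIndex()
idx.index(1, {"name": "Ivan"})
before = idx.search("name", "Ivan")  # [] because the document is still queued
idx.refresh()
after = idx.search("name", "Ivan")   # [1]
```
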
  • 10. Agenda
      1. ElasticSearch architecture overview
      2. How data is stored in ElasticSearch
      3. Using ElasticSearch to store semi-structured data
  • 11. Lucene inverted index structure
  • 12. Lucene index updates
      ● Index is immutable
        ○ All changes are added to an auxiliary index (segment) in batches
        ○ Search is done simultaneously in all segments of an index
      ● Segments are eventually merged into larger ones
        ○ Deleted documents are actually removed during the merge process
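
A toy model of the segment scheme described on this slide, as a sketch only: `Segment` and `SegmentedIndex` are hypothetical classes, and real Lucene segments are far more involved. Segments are never modified after construction; deletes are tracked beside them as tombstones and are physically dropped only on merge.

```python
class Segment:
    """Built once from a batch and never changed afterwards."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> list of terms
        self.postings = {}      # term -> sorted list of doc ids
        for doc_id in sorted(self.docs):
            for term in set(self.docs[doc_id]):
                self.postings.setdefault(term, []).append(doc_id)


class SegmentedIndex:
    def __init__(self):
        self.segments = []  # list of (Segment, set of deleted doc ids)

    def add_batch(self, docs):
        # An update is a delete of the old copy plus a new copy in a new segment.
        for segment, tombstones in self.segments:
            tombstones.update(d for d in docs if d in segment.docs)
        self.segments.append((Segment(docs), set()))

    def search(self, term):
        hits = []
        for segment, tombstones in self.segments:  # every segment is consulted
            hits += [d for d in segment.postings.get(term, [])
                     if d not in tombstones]
        return sorted(hits)

    def merge(self):
        # Copy only live documents into one larger segment.
        live = {}
        for segment, tombstones in self.segments:
            live.update({d: t for d, t in segment.docs.items()
                         if d not in tombstones})
        self.segments = [(Segment(live), set())]


idx = SegmentedIndex()
idx.add_batch({1: ["blue", "jeans"], 2: ["brown", "jeans"]})
idx.add_batch({1: ["black", "jeans"]})  # reindex document 1
jeans_hits = idx.search("jeans")        # [1, 2]
blue_hits = idx.search("blue")          # [] because the old copy is tombstoned
idx.merge()
```
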
  • 13. Agenda
      1. ElasticSearch architecture overview
      2. How data is stored in ElasticSearch
      3. Using ElasticSearch to store semi-structured data
  • 14. Why use ElasticSearch for semi-structured data?
      ● Efficient when searching by many conditions
        ○ type: jeans AND color: [+blue +brown] AND price: [10 TO 100] AND brand: [+levis +colins]
      ● Inverted index has a column-oriented layout
        ○ less disk IO
        ○ only the data required to handle a request is processed
        ○ effective compression is possible for the DocId lists
      ● Document-oriented, no strict schema
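
The two claims on this slide (cheap multi-condition search and compressible DocId lists) can be illustrated with a small sketch. The posting lists below are made-up sample data, and gap encoding is shown as one common way to make sorted id lists compress well; the slides do not say which scheme Lucene uses.

```python
def intersect(a, b):
    """Intersect two sorted doc-id lists in a single linear pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def delta_encode(doc_ids):
    """Store gaps between sorted doc ids; small numbers need fewer bits."""
    prev = 0
    gaps = []
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


# Illustrative posting lists: (field, term) -> sorted doc ids.
postings = {
    ("type", "jeans"): [1, 2, 5, 8, 9],
    ("color", "blue"): [2, 3, 8],
    ("brand", "levis"): [2, 8, 9],
}

# Only the three lists named in the query are read; other fields are never touched.
hits = intersect(
    intersect(postings[("type", "jeans")], postings[("color", "blue")]),
    postings[("brand", "levis")],
)                                                  # [2, 8]
gaps = delta_encode(postings[("type", "jeans")])   # [1, 1, 3, 3, 1]
```
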
  • 15. Example document JSON
      {
        "name": "Ivan",
        "age": 18,
        "likes": [
          { "title": "The Lord of the Rings", "type": "book" },
          { "title": "The Matrix", "type": "movie" }
        ]
      }
  • 16. ElasticSearch fields
      ● name
      ● age
      ● likes.title
      ● likes.type
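
As a sketch of how the example document turns into the four fields listed above, the hypothetical `flatten` helper below collapses nested objects and arrays into dotted field names, each holding a flat list of values. It illustrates the idea only and is not ElasticSearch's actual mapping code.

```python
def flatten(doc, prefix=""):
    """Flatten nested JSON into {dotted field name: list of values}."""
    fields = {}
    for key, value in doc.items():
        name = prefix + key
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                nested = flatten(item, name + ".")
                for sub_name, sub_values in nested.items():
                    fields.setdefault(sub_name, []).extend(sub_values)
            else:
                fields.setdefault(name, []).append(item)
    return fields


doc = {
    "name": "Ivan",
    "age": 18,
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}
fields = flatten(doc)
# fields["likes.title"] is ["The Lord of the Rings", "The Matrix"]
# fields["likes.type"] is ["book", "movie"]
```

Note that the flattened form no longer records which title belongs to which type.
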
  • 17. Mapping JSON to index
      ● Field values of array elements are just a list of terms
        ○ how to search for users who like the movie "The Lord of the Rings"?
      ● Separate document for each array item
        ○ store them on the same shard (data affinity)
      ● Add type prefix to field names
      ● Add type prefix to title term value
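
The problem and the three workarounds on this slide can be sketched as follows. All helper names, the `routing` key, and the exact prefix formats (`movie.title`, `movie:The Matrix`) are assumptions chosen for illustration; the slides do not specify them.

```python
user = {
    "name": "Ivan",
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}

# Problem: once flattened, titles and types are independent term lists,
# so "title = The Lord of the Rings AND type = movie" wrongly matches Ivan.
flat_titles = [like["title"] for like in user["likes"]]
flat_types = [like["type"] for like in user["likes"]]
false_hit = "The Lord of the Rings" in flat_titles and "movie" in flat_types


def split_likes(user):
    """Option 1: one document per array item, routed by user for data affinity."""
    return [
        {"routing": user["name"], "user": user["name"],
         "title": like["title"], "type": like["type"]}
        for like in user["likes"]
    ]


def prefix_field_names(user):
    """Option 2: put the type into the field name."""
    fields = {}
    for like in user["likes"]:
        fields.setdefault(like["type"] + ".title", []).append(like["title"])
    return fields


def prefix_term_values(user):
    """Option 3: put the type into the term value."""
    return [like["type"] + ":" + like["title"] for like in user["likes"]]


per_item_hit = any(
    d["title"] == "The Lord of the Rings" and d["type"] == "movie"
    for d in split_likes(user)
)
```

With any of the three options the title and its type are matched together, so the false positive disappears.
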
  • 18. Using ElasticSearch with BigData storages
      ● Index in ElasticSearch, data blobs on S3
        ○ user profiles in ElasticSearch
        ○ user wall dumps on S3
      ● Index in ElasticSearch, data blobs in HBase
        ○ user post summaries in ElasticSearch
        ○ wall post contents in HBase
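
The split between a searchable index and a separate blob store can be sketched with two dictionaries standing in for the real systems. No ElasticSearch, S3, or HBase client is used here, and `save_post`, `find_posts`, and the key layout are hypothetical.

```python
index = {}       # stand-in for ElasticSearch: post id -> small searchable entry
blob_store = {}  # stand-in for S3 or HBase: blob key -> raw bytes


def save_post(post_id, author, summary, content):
    """Write the large payload to the blob store and index only the summary."""
    blob_key = "posts/" + post_id  # assumed key layout
    blob_store[blob_key] = content.encode("utf-8")
    index[post_id] = {"author": author, "summary": summary, "blob_key": blob_key}


def find_posts(author):
    """Search the index first, then fetch full contents by key."""
    entries = [e for _, e in sorted(index.items()) if e["author"] == author]
    return [blob_store[e["blob_key"]].decode("utf-8") for e in entries]


save_post("p1", "Ivan", "about Lucene", "A long wall post about Lucene segments")
save_post("p2", "Petr", "about HBase", "A long wall post about HBase regions")
ivan_posts = find_posts("Ivan")
```
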
  • 19. The end