"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presented at The Yelp Engineering Open House 11/20/13)

Slides from the Yelp Open House presentation showing how Yelp uses ElasticSearch to quickly build near real-time search applications.

  1. Using ElasticSearch to scale near real-time search (John Billings, 2013-11-20)
  2. Problem: Finding reviews
  3. Problem: Finding reviews
  4. Where are reviews stored?
  5. Searching using SQL

          select * from reviews
          where content like '%chicken tikka masala%'
            and id = 123456
  6. And we’re done
  7. Not so fast...
  8. What did we forget?
       ● Analysis
           ○ Tokenization
           ○ Stemming (‘curries’ vs ‘curry’)
           ○ Stop words (‘the’, ‘and’, ‘a’, …)
       ● Performance
       ● Highlighting / snippetting
       ● Faceting
  9. Version 2
  10. Query

          curl -w'\n' -XGET 'host:14900/reviews/_search' -d '{
            "fields": [],
            "highlight": {"fields": {"review_comment": {}}},
            "query": {
              "bool": {
                "must": [
                  {"match": {"review_comment": "chicken tikka masala"}},
                  {"match": {"business_id": "123456"}}
                ]
              }
            }
          }'
  11. Response

          {
            "took": 10,
            "timed_out": false,
            "_shards": {
              "total": 5,
              "successful": 5,
              "failed": 0
            },
            "hits": {
              "total": 341,
              "max_score": 2.7088916,
              "hits": [
                // Hit objects
              ]
            }
          }
  12. Response

          {
            "_index": "reviews",
            "_type": "review",
            "_id": "123456",
            "_score": 2.0553625,
            "highlight": {
              "review_comment": [
                " I've found. We usually order saag paneer, <em>chicken</em> <em>tikka</em> <em>masala</em>, bengan bharta would recommend them all"
              ]
            }
          }
  13. Aside: Clients
  14. Indexing (diagram labels: Updates, JSON docs, Gearman, Indexer modules)
  15. Indexing

          class IndexerEvent:
            table: String
            id: int
            action: ['insert', 'update', 'delete']
            timestamp: float

          class IndexRequest:
            index_name: String
            doc_type: String
            doc_id: int
            field_values: Map<String, String>

          class Indexer:
            tables_to_watch: List<String>
            handle_event: IndexerEvent -> List<IndexRequest>
  16. Replication and sharding

          curl -w'\n' -XPUT 'host:14900/reviews/' -d '{
            "settings": {
              "index": {
                "number_of_shards": 3,
                "number_of_replicas": 2
              }
            }
          }'
  17. Using ElasticSearch to discover local businesses
  18. And we’re done
  19. Not so fast…
  20. Performance can be unpredictable
  21. Problem: Disks are slow
  22. Problem: Memory usage is unpredictable
  23. Problem: Tenants can be noisy
  24. Any questions?
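
The indexing interface from slide 15 is given only as pseudocode, so the following is a minimal Python sketch of how one indexer module might look, not the implementation Yelp uses. The three class shapes follow the slide. The table name, column names, index name and field names are borrowed from the SQL and query slides. Everything else is an assumption made for illustration: the `ReviewIndexer` class, the `load_row` callable that fetches a database row, and the convention that a delete is signalled by an empty `field_values` map.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IndexerEvent:
    table: str
    id: int
    action: str  # one of 'insert', 'update', 'delete'
    timestamp: float


@dataclass
class IndexRequest:
    index_name: str
    doc_type: str
    doc_id: int
    field_values: Dict[str, str]


class ReviewIndexer:
    """Hypothetical indexer module that turns review row changes into index requests."""

    tables_to_watch: List[str] = ["reviews"]

    def __init__(self, load_row: Callable[[str, int], Dict[str, object]]):
        # load_row is an assumed helper: (table, row id) -> column values.
        self._load_row = load_row

    def handle_event(self, event: IndexerEvent) -> List[IndexRequest]:
        # Ignore changes to tables this module does not watch.
        if event.table not in self.tables_to_watch:
            return []
        # Assumption: an empty field map tells the consumer to delete the doc.
        if event.action == "delete":
            return [IndexRequest("reviews", "review", event.id, {})]
        row = self._load_row(event.table, event.id)
        fields = {
            "review_comment": str(row["content"]),
            "business_id": str(row["business_id"]),
        }
        return [IndexRequest("reviews", "review", event.id, fields)]
```

A usage example with an in-memory stand-in for the database:

```python
rows = {("reviews", 42): {"content": "great chicken tikka masala", "business_id": 123456}}
indexer = ReviewIndexer(lambda table, row_id: rows[(table, row_id)])
requests = indexer.handle_event(IndexerEvent("reviews", 42, "update", 0.0))
```

Keeping `handle_event` free of any direct ElasticSearch calls means each module only describes what should be indexed, and a separate worker can decide how and when to send the resulting requests.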
