Using ElasticSearch to scale near real-time search
John Billings

2013-11-20

"Using ElasticSearch to Scale Near Real-Time Search" by John Billings (Presented at The Yelp Engineering Open House 11/20/13)

Slides from the Yelp Open House presentation showing how Yelp uses ElasticSearch to quickly build near real-time search applications.

  1. Using ElasticSearch to scale near real-time search
     John Billings, 2013-11-20
  2. Problem: Finding reviews
  3. Problem: Finding reviews
  4. Where are reviews stored?
  5. Searching using SQL

         select * from reviews where
         content like '%chicken tikka masala%'
         and id = 123456

  6. And we’re done
  7. Not so fast...
  8. What did we forget?
     ● Analysis
       ○ Tokenization
       ○ Stemming (‘curries’ vs ‘curry’)
       ○ Stop words (‘the’, ‘and’, ‘a’, …)
     ● Performance
     ● Highlighting / snippetting
     ● Faceting
  9. Version 2
  10. Query

         curl -w'\n' -XGET 'host:14900/reviews/_search' -d '{
           "fields" : [],
           "highlight": {"fields": {"review_comment" : {}}},
           "query": {
             "bool": {
               "must": [
                 {"match": {"review_comment": "chicken tikka masala"}},
                 {"match": {"business_id": "123456"}}
               ]
             }
           }
         }'

  11. Response

         {
           "took" : 10,
           "timed_out" : false,
           "_shards" : {
             "total" : 5,
             "successful" : 5,
             "failed" : 0
           },
           "hits" : {
             "total" : 341,
             "max_score" : 2.7088916,
             "hits" : [
               // Hit objects
             ]
           }
         }

  12. Response (a single hit object)

         {
           "_index" : "reviews",
           "_type" : "review",
           "_id" : "123456",
           "_score" : 2.0553625,
           "highlight" : {
             "review_comment" : [
               " I've found. We usually order saag paneer, <em>chicken</em> <em>tikka</em> <em>masala</em>, bengan bharta would recommend them all"
             ]
           }
         }

  13. Aside: Clients
  14. Indexing (diagram labels: Updates, JSON docs, Gearman, Indexer modules)
  15. Indexing

         class IndexerEvent:
             table: String
             id: int
             action: ['insert', 'update', 'delete']
             timestamp: float

         class IndexRequest:
             index_name: String
             doc_type: String
             doc_id: int
             field_values: Map<String, String>

         class Indexer:
             tables_to_watch: List<String>
             handle_event: IndexerEvent -> List<IndexRequest>

  16. Replication and sharding

         curl -w'\n' -XPUT 'host:14900/reviews/' -d '{
           "settings": {
             "index": {
               "number_of_shards": 3,
               "number_of_replicas": 2
             }
           }
         }'

  17. Using ElasticSearch to discover local businesses
  18. And we’re done
  19. Not so fast…
  20. Performance can be unpredictable
  21. Problem: Disks are slow
  22. Problem: Memory usage is unpredictable
  23. Problem: Tenants can be noisy
  24. Any questions?
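
Slide 8 names three analysis steps that the SQL `like` query skips: tokenization, stemming and stop-word removal. The sketch below is a toy illustration of those three steps only. The function names `toy_stem` and `analyze` are made up for this example, the stemming rules are a deliberately crude stand-in, and none of this is ElasticSearch's actual analyzer.

```python
import re

# The three stop words named on slide 8; real stop-word lists are longer.
STOP_WORDS = {"the", "and", "a"}


def toy_stem(token):
    """Crude suffix stripping for illustration; not Porter, not Lucene."""
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "i"      # 'curries' -> 'curri'
    if token.endswith("y") and len(token) > 2:
        return token[:-1] + "i"      # 'curry' -> 'curri'
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]            # strip a plain plural 's'
    return token


def analyze(text):
    """Tokenize on letters/digits, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


print(analyze("The curries and a curry"))
```

Because both 'curries' and 'curry' reduce to the same token here, a search for one can match a review containing the other, which a substring match cannot do.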
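
The query on slide 10 and the responses on slides 11 and 12 can also be handled as plain data structures. The following is a minimal sketch, assuming the request body is built as a Python dict and the response has the shape shown on the slides. The helper names `build_review_query` and `extract_snippets` are hypothetical, and no HTTP request is made.

```python
import json


def build_review_query(text, business_id):
    """Return a request body with the same shape as the query on slide 10."""
    return {
        "fields": [],
        "highlight": {"fields": {"review_comment": {}}},
        "query": {
            "bool": {
                "must": [
                    {"match": {"review_comment": text}},
                    {"match": {"business_id": business_id}},
                ]
            }
        },
    }


def extract_snippets(response):
    """Collect highlighted review_comment fragments from every hit."""
    snippets = []
    for hit in response.get("hits", {}).get("hits", []):
        snippets.extend(hit.get("highlight", {}).get("review_comment", []))
    return snippets


# JSON string that could be passed to curl via -d.
body = json.dumps(build_review_query("chicken tikka masala", "123456"))

# Hand-written sample in the shape of slides 11 and 12 (shortened snippet).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "123456",
                "highlight": {
                    "review_comment": ["<em>chicken</em> <em>tikka</em> <em>masala</em>"]
                },
            }
        ],
    }
}
print(extract_snippets(sample_response))
```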
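
The type signatures on slide 15 map fairly directly onto Python dataclasses. The sketch below is one possible reading of that slide, not Yelp's implementation. `ReviewIndexer` and its `load_row` callback are hypothetical, the `content` column comes from the SQL on slide 5, and skipping deletes is an assumption because the slide does not show how deletes are turned into requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class IndexerEvent:
    table: str
    id: int
    action: str        # 'insert', 'update' or 'delete'
    timestamp: float


@dataclass
class IndexRequest:
    index_name: str
    doc_type: str
    doc_id: int
    field_values: Dict[str, str]


class Indexer:
    """Base interface from slide 15: watch tables, turn events into requests."""
    tables_to_watch: List[str] = []

    def handle_event(self, event: IndexerEvent) -> List[IndexRequest]:
        raise NotImplementedError


class ReviewIndexer(Indexer):
    """Hypothetical indexer module for the reviews table."""
    tables_to_watch = ["reviews"]

    def __init__(self, load_row: Callable[[int], Dict[str, str]]):
        # load_row is an assumed callback that fetches the current row by id.
        self.load_row = load_row

    def handle_event(self, event: IndexerEvent) -> List[IndexRequest]:
        if event.table not in self.tables_to_watch:
            return []
        if event.action == "delete":
            return []  # assumption: delete handling is not shown on the slide
        row = self.load_row(event.id)
        return [IndexRequest("reviews", "review", event.id,
                             {"review_comment": row["content"]})]


rows = {123456: {"content": "great chicken tikka masala"}}
indexer = ReviewIndexer(rows.__getitem__)
print(indexer.handle_event(IndexerEvent("reviews", 123456, "update", 0.0)))
```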
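
For the settings on slide 16, it may help to spell out the arithmetic: `number_of_replicas` counts replica copies per primary shard, so 3 primaries with 2 replicas each give 9 shard copies in total. A small sketch follows; the helper names `index_settings` and `total_shard_copies` are made up for this example.

```python
import json


def index_settings(shards, replicas):
    """Build a settings body with the same shape as the one on slide 16."""
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def total_shard_copies(shards, replicas):
    """Each primary shard plus its replicas: shards * (1 + replicas)."""
    return shards * (1 + replicas)


print(json.dumps(index_settings(3, 2), indent=2))
print(total_shard_copies(3, 2))  # 9
```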