Eventually Elasticsearch: Eventual Consistency in the Real World

Based on the experience of an ElasticSearch implementation at bol.com, we'll discuss the consequences of different modes of operation of ElasticSearch in a landscape of existing SQL databases. How can you connect ElasticSearch to the change queues of other databases, how can the versioning mechanism be used to implement optimistic locking, and what are the consistency consequences of using ElasticSearch as a free-text index on external data, as a data cache, or as the single source of truth?

  1. eventually elasticsearch: dealing with temporal inconsistencies in the real world™ | Anne Veling | @anneveling | March 25, 2015
  2. agenda • Introduction • bol.com Plaza / Square project • Using ElasticSearch in a mixed DB landscape – ES as a DB free-text index or as a separate DB • Consistency issues and solutions • Lessons learned
  3. bol.com • Leading e-commerce platform in the Netherlands and Belgium – 5M active customers – 1M visits every day – 9M products – €680M revenue • Growing (pains) – 750 employees, 37 scrum teams – moving towards continuous deployment, team independence • Plaza / Square seller platform – 7k sellers, 16% of total revenue
  4. Square ElasticSearch • Using ElasticSearch to combine Offer and Product information – Offers from Oracle – Products from MongoDB • Replacing Oracle SQL queries – Too slow for faceting and result sets (for sellers with over 2k offers) • About 12M productoffer documents • Scala, Team 1B • ElasticSearch 1.4 – With Search, Master and Data nodes • In production now, rolling out to sellers
  5. data model (diagram: products and offers combined into productoffers)
  6. architecture (diagram: SDD, PCS, STEP, SSY and ES, with products, offers and productoffers marked "??")
  7. option: right • ElasticSearch as a free-text DB index on Offers • DB update → update ES too – In the same 'transaction' • Benefits – Easier • Drawbacks – Less service independence – Slower (because of refresh) (diagram: SDD, PCS, STEP, SSY, ES)
  8. option: left • ElasticSearch as a separate database • Updates from DB sent to ES via async queues • Benefits – Architecture more loosely coupled – Search performance • Drawbacks – Some latency between DB and ES: eventual consistency (diagram: SDD, PCS, STEP, SSY, ES)
  9. architecture (diagram: SDD, PCS, STEP, SSY and ES, with products, offers and productoffers)
  10. (diagram: SDD, PCS, STEP, SSY and ES, with "update Offer" and "update Product" flows)
  11. (diagram: SDD and PCS supply offer data and product data to ES, which serves facets and results)
  12. eventual consistency (diagram: user view and db view over time: consistent, inconsistent, consistent)
  13. temporal inconsistency
  14. “immediate” consistency? • Relational databases – User view vs. DB view – Take it or leave it – Only vertical scaling • ElasticSearch – Read snapshots by refresh interval – Caching – Write once, read many (diagram: user 1 and user 2 around a db, with: START TRANSACTION; UPDATE OFFERS SET STOCK=1 WHERE ID=42; COMMIT TRANSACTION;)
  15. sources of temporal inconsistencies • Internal inconsistencies – within ElasticSearch • External inconsistencies – nature of ElasticSearch – between database and ElasticSearch – between user expectations and application behavior
  16. send data to index (diagram: user, app, master and replica; the API receives new data, updates the index, the quorum says 'ok', the app got 'ok') curl -XPOST localhost:9200/demo/drinks -d '{"brand":"Glenlivet", "age":18}' → {"_index":"demo","_type":"drinks","_id":"AUxKuw5pxgWzNUrImnD4","_version":1,"created":true}
  17. (diagram: user, app, master and search; search sees a new refresh every index.refresh_interval) A search issued right after indexing returns no hits until the next refresh: curl -XPOST localhost:9200/demo/drinks -d '{"brand":"Glenlivet", "age":18}' → {"_index":"demo","_type":"drinks","_id":"AUxKuw5pxgWzNUrImnD4","_version":1,"created":true} curl -XPOST localhost:9200/demo/drinks/_search -d '{"query":{"match":{"brand":"Glenlivet"}}}' → {"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
  18. influencing search refresh • Set index.refresh_interval: curl -XPUT localhost:9200/demo/_settings -d '{"index":{"refresh_interval":"30s"}}' • Refresh on demand: curl -XPOST localhost:9200/demo/_refresh • Refresh after index (be careful!): curl -XPOST 'localhost:9200/demo/drinks?refresh=true' -d '{"brand":"Famous Grouse", "age":12}'
  19. dealing with search delay: for a user updating a single item in the UI • On the client – Wait until refresh_interval has passed before searching again – Do a get-by-id for the changed item (GET is real-time) • And only change that single item (but: aggregations get out of sync) • On the server – Wait until refresh_interval has passed – Show a “done” message and hope the user is slow – Refresh all searchers upon index (all searches slower!) – Add queue priority – Update ES too • Or: accept eventual consistency
  20. async queue issue (diagram: app, db, queue, ES) • Measure DB → ES latency with a stored _timestamp field in the mapping: {"drinks": {"_timestamp": {"enabled": true, "store": "yes"}}} • Retrieve it with localhost:9200/demo/_search?fields=_timestamp,_version,_source
  21. measuring DB → ES latency: POST /productoffer-005/_search?fields=_timestamp,_source { "size":0, "query": { "range": { "modificationDate": { "from": "now-7d" } } }, "aggs": { "hokje": { "date_histogram": { "field": "modificationDate", "interval": "10m" }, "aggs": { "q": { "stats": { "script": "doc['_timestamp'].value - doc['modificationDate'].value" } } } } } }
  22. async queue issue (diagram: app, db, ES)
  23. queue order issue (diagram: app, db, queue, ES) • Only update if newer (with optimistic locking) – read (with _version) → update → index (with expected _version) → retry • version_type=external, using the DB last-modified timestamp as the version: curl -XPUT 'localhost:9200/demo/drinks/1?version=1427279177904&version_type=external' -d '{"brand": "Glenlivet", "age": 12}'
  24. conclusions • Compromises hurt someone • Are you sure you want an eventually consistent database? – Lots of patchwork needed by bol.com… – Choose left, make it look like you chose right • In real life, consistency concerns – more than just ES writes – also ES reads – how you get data in and keep it fresh influences consistency (diagram: DB → ES twice; right: as a free-text index, left: as a separate DB)
  25. ES consistency knobs to control the “consistency level” (diagram: a scale from eventual/faster to immediate/slower, with positions 1 to 4) 1. Optimistic locking & refresh=true 2. - 3. - 4. Eventually consistent
  26. (diagram: an indexer (CUD) and a searcher (R) between the DB and the ES nodes, annotated with the knobs refresh_interval, ?consistency, _version, action.write_consistency and ?refresh)
  27. (diagram: consistency, from immediate to eventual, against read & write performance, from slower to faster)
  28. lessons learned • Make assumptions even more clear • There is more to eventual consistency than you think – User-oriented round-trip consistency latency in a mixed DB context • Use the ES knobs and dials to make it – as consistent as you need – while keeping it as fast as you can • You have to know what you’re doing
  29. thank you @anneveling • “It's a matter of patience, calmly waiting for the day that all of Holland talks Elasticsearch, that all of Holland talks Elasticsearch.” eventually: Elasticsearch.
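Slides 17 and 18 describe near-real-time search: a freshly indexed document is invisible to search until the next refresh, while a get-by-id sees it at once. The toy model below is a hedged Python sketch of that behavior, not Elasticsearch code; the class name ToyNrtIndex, the injectable clock and the buffer/snapshot split are assumptions made for illustration, and the periodic refresh is only checked lazily when someone searches.

```python
import time


class ToyNrtIndex:
    """Toy model (not Elasticsearch code) of near-real-time search.

    Writes go into a buffer; search only sees the snapshot built by the
    last refresh. get() is real-time and also reads the buffer.
    """

    def __init__(self, refresh_interval=1.0, clock=time.monotonic):
        self.refresh_interval = refresh_interval  # like index.refresh_interval (seconds)
        self.clock = clock                        # injectable so the demo is deterministic
        self._buffer = {}                         # indexed, not yet searchable
        self._snapshot = {}                       # what search can see
        self._last_refresh = clock()

    def index(self, doc_id, source, refresh=False):
        self._buffer[doc_id] = source
        if refresh:  # like ?refresh=true: make this write searchable right away
            self.refresh()

    def refresh(self):
        # like POST /demo/_refresh: publish buffered writes to search
        self._snapshot.update(self._buffer)
        self._buffer.clear()
        self._last_refresh = self.clock()

    def get(self, doc_id):
        # real-time get-by-id: unrefreshed writes are visible here
        if doc_id in self._buffer:
            return self._buffer[doc_id]
        return self._snapshot.get(doc_id)

    def search(self, field, value):
        # periodic refresh, checked lazily whenever someone searches
        if self.clock() - self._last_refresh >= self.refresh_interval:
            self.refresh()
        return [src for src in self._snapshot.values() if src.get(field) == value]


if __name__ == "__main__":
    now = [0.0]
    idx = ToyNrtIndex(refresh_interval=1.0, clock=lambda: now[0])
    idx.index("1", {"brand": "Glenlivet", "age": 18})
    print(len(idx.search("brand", "Glenlivet")))  # 0: not refreshed yet
    print(idx.get("1"))                            # found: get is real-time
    now[0] = 1.0                                   # one refresh interval later
    print(len(idx.search("brand", "Glenlivet")))  # 1
```

Passing refresh=True mimics ?refresh=true: the write becomes searchable immediately, but every such write pays for its own refresh, which is why slide 18 says to be careful with it.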
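The client-side option on slide 19 (get the changed item by id, then patch only that item into the result list the user already sees) might look like the following sketch. The hit layout mirrors Elasticsearch's _id/_source fields; the function name patch_hits and the sample offer data are hypothetical.

```python
def patch_hits(stale_hits, doc_id, fresh_source):
    """Return a copy of stale_hits where the hit with _id == doc_id carries
    fresh_source (e.g. from a real-time GET). Other hits, and any facet or
    aggregation counts computed from the stale snapshot, are left as they were."""
    patched = []
    for hit in stale_hits:
        if hit["_id"] == doc_id:
            hit = {**hit, "_source": fresh_source}  # copy, don't mutate the input
        patched.append(hit)
    return patched


if __name__ == "__main__":
    # hypothetical search result taken before the refresh
    hits = [
        {"_id": "42", "_source": {"title": "offer 42", "stock": 0}},
        {"_id": "43", "_source": {"title": "offer 43", "stock": 5}},
    ]
    # hypothetical real-time GET result for offer 42 after the user set stock to 1
    fresh = {"title": "offer 42", "stock": 1}
    for hit in patch_hits(hits, "42", fresh):
        print(hit["_id"], hit["_source"]["stock"])
```

Only the list entry is corrected; counts such as "n offers in stock" still come from the stale snapshot, which is the "aggregations out of sync" caveat on the slide.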
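The query on slide 21 buckets documents per 10 minutes of modification date and computes stats over the difference between the ES _timestamp and the DB modificationDate. The same arithmetic in plain Python, as a sketch over hypothetical (modification_ms, es_timestamp_ms) pairs in epoch milliseconds, makes explicit what each bucket reports; it only imitates the date_histogram plus stats aggregation and is not Elasticsearch's implementation.

```python
from collections import defaultdict

BUCKET_MS = 10 * 60 * 1000  # "interval": "10m"


def latency_stats(pairs):
    """pairs: iterable of (modification_ms, es_timestamp_ms).

    Mimics a date_histogram on the modification date with a stats
    sub-aggregation over es_timestamp - modification (milliseconds)."""
    buckets = defaultdict(list)
    for modified, indexed in pairs:
        bucket_start = modified - modified % BUCKET_MS  # start of the 10-minute bucket
        buckets[bucket_start].append(indexed - modified)
    return {
        start: {
            "count": len(lat),
            "min": min(lat),
            "max": max(lat),
            "avg": sum(lat) / len(lat),
            "sum": sum(lat),
        }
        for start, lat in sorted(buckets.items())
    }


if __name__ == "__main__":
    # hypothetical samples: (DB modificationDate, ES _timestamp), epoch millis
    samples = [(0, 500), (1_000, 2_500), (600_000, 602_000)]
    for start, s in latency_stats(samples).items():
        print(start, s)
```

Watching max per bucket, rather than only avg, shows whether the queue occasionally falls far behind during busy periods.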
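Slide 23 uses version_type=external with the DB last-modified timestamp as the version, so that an older change arriving late from the queue cannot overwrite a newer one. In Elasticsearch 1.x, external versioning accepts a write only if the supplied version is higher than the stored one and otherwise reports a version conflict. The sketch below models that rule in memory; ToyExternalVersionStore, VersionConflict and apply_queue are hypothetical names, not the Elasticsearch client API.

```python
class VersionConflict(Exception):
    pass


class ToyExternalVersionStore:
    """Toy model (not Elasticsearch code) of version_type=external:
    a write is accepted only if its version is strictly higher than the
    stored one; otherwise it is rejected as a version conflict."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version):
        stored = self.docs.get(doc_id)
        if stored is not None and version <= stored[0]:
            raise VersionConflict(f"{doc_id}: stored version {stored[0]}, got {version}")
        self.docs[doc_id] = (version, source)


def apply_queue(store, messages):
    """Apply DB change messages in arrival order, using the DB last-modified
    timestamp (epoch millis) as the external version. Returns how many stale
    messages were dropped."""
    dropped = 0
    for doc_id, last_modified_ms, source in messages:
        try:
            store.index(doc_id, source, version=last_modified_ms)
        except VersionConflict:
            dropped += 1  # a newer change already reached the index
    return dropped


if __name__ == "__main__":
    store = ToyExternalVersionStore()
    # the queue delivers the newer change first and the older one second
    queue = [
        ("1", 1427279177904, {"brand": "Glenlivet", "age": 12}),
        ("1", 1427279170000, {"brand": "Glenlivet", "age": 18}),
    ]
    print(apply_queue(store, queue))  # 1
    print(store.docs["1"])            # (1427279177904, {'brand': 'Glenlivet', 'age': 12})
```

A conflict here is the expected outcome, not an error to retry. One caveat of timestamp versions: two DB updates within the same millisecond get the same version, so the second one would be dropped.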
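The optimistic-locking cycle from slides 23 and 25 (read with _version → update → index with the expected _version → retry) relies on internal versioning: in Elasticsearch an index request carrying ?version=N only succeeds if the stored document is still at version N. The sketch below simulates that loop in memory, with one concurrent writer sneaking in; the store class, update_with_retry and max_attempts are hypothetical names chosen for illustration.

```python
class VersionConflict(Exception):
    pass


class ToyInternalVersionStore:
    """Toy model (not Elasticsearch code) of internal versioning: every write
    bumps the version; a write that names an expected version only succeeds
    if that is still the current one."""

    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current = self.docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"{doc_id}: expected {expected_version}, found {current}")
        self.docs[doc_id] = (current + 1, source)
        return current + 1


def update_with_retry(store, doc_id, change, max_attempts=3):
    """read (with _version) -> update -> index (with expected _version) -> retry."""
    for _ in range(max_attempts):
        version, source = store.get(doc_id)
        new_source = change(dict(source))  # work on a copy of what we read
        try:
            return store.index(doc_id, new_source, expected_version=version)
        except VersionConflict:
            continue  # someone else wrote in between: re-read and try again
    raise VersionConflict(f"{doc_id}: gave up after {max_attempts} attempts")


if __name__ == "__main__":
    store = ToyInternalVersionStore()
    store.index("42", {"stock": 5})
    interfered = []

    def sell_one(src):
        if not interfered:  # simulate a concurrent writer, once
            interfered.append(True)
            store.index("42", {"stock": 4})
        src["stock"] -= 1
        return src

    print(update_with_retry(store, "42", sell_one))  # 3
    print(store.get("42"))                           # (3, {'stock': 3})
```

The change function must be safe to run more than once, because it is re-applied to freshly read data on every retry. Combining this with ?refresh=true gives the "1. Optimistic locking & refresh=true" end of the scale on slide 25.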
