More Related Content

Similar to No C-QL (Or how I learned to stop worrying, and love eventual consistency) (NoSQL Matters September 2014)(20)

More from Brian Brazil(20)

No C-QL (Or how I learned to stop worrying, and love eventual consistency) (NoSQL Matters September 2014)

  1. No C-QL (Or how I learned to stop worrying, and love eventual consistency) Brian Brazil Senior Software Engineer Boxever
  2. Overview NoSQL and eventual consistency are complementary. Why would you need it, how do you take advantage of it, and when is it useful in distributed systems?
  3. A Simple Example Website that counts how many times a user visits the site, and each page.
  4. First visit User Total Count Brian 1 User Page Count Brian index 1 Webserver DB index Read Old Total Count Insert New Total Count Read Count Insert New Count DB DB
  5. Second visit User Total Count Brian 2 User Page Count Brian index 1 Brian sales 1 Webserver sales Read Old Total Count Update New Total Count Read Count Insert New Count DB DB DB
  6. What do you think?
  7. Initial thoughts Looks like a simple, obvious solution
  8. Tabbed Browsing
  9. Tabbed Browsing Webserver A: Read Old Total Count B: Read Old Total Count A: Update New Total Count B: Update New Total Count . . . Webserversales index DB DB DB
  10. Tabbed Browsing User Total Count Brian 3 User Page Count Brian index 2 Brian sales 2 Webserver A: Read Old Total Count B: Read Old Total Count A: Update New Total Count B: Update New Total Count . . . Webserversales index DB DB DB
  11. What happened? Concurrent updates clashed. Need to protect against that.
  12. ACID to the rescue! ● Apply each set of updates atomically ● Updates don’t interfere with each other
  13. ACID to the rescue? ● Bottleneck for writes ● Webserver needs to retry ● Databases need to coordinate for quorum
  14. Aside: Cost of Reads ● Hard disk seek: 5 - 10ms
  15. Aside: Cost of Reads ● Hard disk seek: 5 - 10ms ● Cluster quorum read: 1 - 200ms
  16. Aside: Cost of Reads ● Hard disk seek: 5 - 10ms ● Cluster quorum read: 1 - 200ms ● RTT to webserver: 1 - 200ms
  17. Aside: Cost of Reads ● Hard disk seek: 5 - 10ms ● Cluster quorum read: 1 - 200ms ● RTT to webserver: 1 - 200ms Avoid reads on the write path.
  18. Eventual consistency Being slightly out of date doesn’t matter for many applications. Can we take advantage of this to avoid bottlenecks and latency?
  19. First visit User UniqueID Page Brian 1409778671-Server1-PID index Webserver DB index Insert Visit DB DB
  20. Second visit User UniqueID Page Brian 1409778671-Server1-PID index Brian 1409778689-Server1-PID sales Webserver DB sales Insert Visit DB DB
  21. Tabbed Browsing User UniqueID Page Brian 1409778671-Server1-PID index Brian 1409778689-Server1-PID sales Brian 1409778691-Server1-PID index Brian 1409778691-Server2-PID sales DB A: Insert Visit B: Insert Visit DB DB Webserver Webserversales index
  22. Analysis ● Unique IDs avoid clashes ● No bottlenecks due to consistency ● Data always internally consistent ● Fire & forget ● Only need to talk to nearest database(s) ● Resilient to network partitions
  23. Not all good ● Reads have to look at more data ● Need to make sure data makes it everywhere ● What if something depends on having data up to time X?
  24. NoSQL? Everything thus far applies equally to relational and non-relational databases.
  25. NoSQL Advantage ● Data locality on disk, multiple entries per row ● Replication options beyond master/slave ● Writes are cheap ● Can also have consistency where needed
  26. When is it good? ● Writes dominate ● Need high throughput and scalability ● Inter/intra continental databases ● 100% correct data in realtime not essential
  27. When is it not so good? ● Writes are relatively rare ● Scale isn’t a concern ● Data must be completely consistent
  28. Idempotency Each entry is unique, so can be safely replayed again and again. Gives you options for disaster recovery: Log requests to local webserver disk, replay if database goes down
  29. Improvements ● Take advantage of entry ordering ● Do rollups as a batch task ○ Reduces data to be processed, cutting latency ● Row per day/week/month ○ Avoids rows getting too big ● Handling updates and deletes ○ Can be done using only puts
  30. Advanced This is a very simple example, only doing a count. For more see “Extreme availability and self-healing data with CRDTs” at 11am with Uwe Friedrichsen in Room 2
  31. Summary Consistency brings bottlenecks. Eventual consistency allows for high- throughput, reliable writes. NoSQL combined with eventual consistency can scale very well.
  32. Questions?