Consistency in Distributed Systems

A stable data model provides numerous advantages when developing Big Data and NoSQL systems, especially when sharing data over the cloud. Adopting an immutable model eases much of the pain of achieving consistency, particularly at large scale. There are trade-offs, however, that you need to be aware of.

This webinar will present details and examples of immutable data models as applied to various NoSQL systems, including MongoDB, Cloudant, Riak and Cassandra. The emphasis will be on the impact on application designers and architects, as well as the technical trade-offs and advantages. The discussion will be grounded in real-world examples from within and beyond the enterprise.

Published in: Technology, Business

Consistency in Distributed Systems

  1. Consistency in Distributed Systems. Mike Miller, Co-Founder, Chief Scientist, @mlmilleratmit
  2. 2014-06-12. Want to learn more? P. Bailis: “Coordination and the Art of Scaling”
  3. {Introductions: ‘Me’} Background -- Big Systems
  4. Mobile, Big Data => Stress models for consistency, transactional reasoning
  5. This is your problem when…
       … data doesn’t fit on one server.
       … data replicated between servers (e.g. read slaves).
       … data spread between data centers.
       … state spread across more than one device (mobile!)
       … mixed workloads with concurrency.
       … state spread across more than one process.
  6. This is now everyone’s problem
  7. Good news, market response: NewSQL, NoSQL, Cloud, …
  8. Let’s view this from the developer’s perspective
  9. ships with a mobile strategy
  10. {Install: ‘Cloudant’} You do this: Sign Up. We give you: https://<username>.cloudant.com. Done! (Step 1, Step 2, Step 3)
  11. {Cloudant: ‘API’} JSON Documents, Primary Index, Secondary Indexes, Search & Geospatial
  12. {Write: ‘Local’, Sync: ‘Later’} Embedded, Edge, Satellites; Desktop, Browser; Cloud
  13. {Grow: ‘More’} Multitenant or Dedicated. 30+ Locations: Softlayer, Rackspace, Azure, AWS, …
  14. So… How do you code for that? How does that compare to <X>? What about transactions?
  15. You do need to understand your datastore.
  16. http://www.wired.com/wiredenterprise/2012/08/google-as-xerox-parc/
  17. Google File System (2003) http://research.google.com/archive/gfs.html
      Google MapReduce (2004) http://research.google.com/archive/mapreduce.html
      Google BigTable (2006) http://research.google.com/archive/bigtable.html
      Amazon’s Dynamo (2007) http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
  18.
  19. {Sacrificed: ‘SPOFs’} Replaced with self-healing systems
  20. {Sacrificed: ‘Manual Sharding’} http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/spanner-osdi2012.pdf
  21. {Sacrificed: ‘Locks’} Block (optionally) on reads, not writes
  22. {Sacrificed: ‘Schemas’} “Schema on read”
  23. {Sacrificed: ‘Tables’} SQL and JSON poorly matched
  24. MySQL, MongoDB, CouchDB, SOLR, … Dynamo, Cloudant, Cassandra, Riak, …
  25. {Sacrificed: ‘Transactions’} Fundamental reason: CAP Theorem. http://www.bailis.org/papers/ramp-sigmod2014.pdf
  26. {Consistency: ‘Eventual’} Excellent high-level overview: https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
  27. {Consistency: ‘Eventual’} https://amplab.cs.berkeley.edu/wp-content/uploads/2012/06/p776_peterbailis_vldb2012.pdf
  28. {Consistency: ‘Eventual’} “AP” “C”
  29. {Consistency: ‘Eventual’} FoM = Benefit - Cost*Rate. https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
  30. {Consistency: ‘Eventual’} 3 minutes, 100 points (Dow Jones)
  31. {Consistency: ‘Eventual’} What is the penalty? Hedge strategy?
  32. {Strategy: ‘Immutability’} Write-only state machine
  33. {Spokesperson: ‘Rich Hickey’} http://www.infoq.com/presentations/Value-Values
  34. Immutability isn’t new
      ‣ “Accountants don’t use erasers”
      ‣ Functional, concurrent, distributed languages (e.g. Erlang)
      ‣ File systems (e.g. ZFS)
      ‣ Storage engines (e.g. LevelDB) & Databases (CouchDB, Datomic, …)
      ‣ Data model
  35. {Strategy 1: ‘Write Only’}
      ‣ Don’t update in place
      ‣ Keep old versions
      ‣ Query for newest version
      ‣ Even works for deletions (write a “tombstone”)
  36. {Strategy 2: ‘Minimize Contention’}
      ‣ Break out one-to-many, many-to-many relationships using foreign keys and links.
      ‣ Normalize! Learn your indexing options!
  37. {Strategy 3: ‘Think Commutative’}
      ‣ Store “deltas”, just like your checkbook
      Account Value via Materialized View
  38. {Strategy 3: ‘Think Commutative’} Commutative Replicated Data Types (2010) http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf
  39. Future Work
      ‣ Additional explicit data modeling examples
      ‣ Advanced reasoning for “AP” systems
      ‣ CRDTs
      ‣ Secondary index consistency, maintenance (RAMP)
      ‣ “New” transactional systems (HAT, Google Spanner)
      ‣ “Call me maybe”: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
  40. Keep Learning
  41. AMP on Consistency https://amplab.cs.berkeley.edu/tag/consistency/
  42. Thanks! cloudant.com, mike@cloudant.com, @mlmilleratmit, #Cloudant, IRC
  43.
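
The "Write Only" strategy on slide 35 (don't update in place, keep old versions, query for the newest version, write a tombstone for deletions) might look roughly like the sketch below. This is a minimal in-memory illustration in Python, not the API of Cloudant or any other datastore named in the slides; `AppendOnlyStore`, `Version`, `TOMBSTONE` and all method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional

# Hypothetical sentinel: a deletion is recorded as just another version.
TOMBSTONE = object()


@dataclass(frozen=True)
class Version:
    key: str
    seq: int    # position in the log; larger means newer
    value: Any  # the payload, or TOMBSTONE for a deletion


class AppendOnlyStore:
    """Toy write-only store: versions are appended, never modified."""

    def __init__(self) -> None:
        self._log: List[Version] = []

    def put(self, key: str, value: Any) -> Version:
        # Never update in place: every write appends a new version.
        version = Version(key, len(self._log) + 1, value)
        self._log.append(version)
        return version

    def delete(self, key: str) -> Version:
        # Deleting is also a write: append a tombstone.
        return self.put(key, TOMBSTONE)

    def history(self, key: str) -> List[Version]:
        # Old versions stay available, oldest first.
        return [v for v in self._log if v.key == key]

    def get(self, key: str) -> Optional[Any]:
        # Query for the newest version; a tombstone reads as "not found".
        versions = self.history(key)
        if not versions or versions[-1].value is TOMBSTONE:
            return None
        return versions[-1].value


store = AppendOnlyStore()
store.put("user:1", {"name": "Ada"})
store.put("user:1", {"name": "Ada L."})
store.delete("user:1")
```

After these three writes, `store.get("user:1")` returns `None` because the newest version is a tombstone, while `store.history("user:1")` still holds all three versions. A real system would use an index rather than a log scan to find the newest version; the scan here only keeps the sketch short.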

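The "Think Commutative" strategy on slide 37 (store deltas like a checkbook, derive the account value from a view) can be illustrated in the same spirit. The sketch below assumes each delta carries a unique transaction id so that a re-delivered delta is not counted twice; `Delta`, `Replica` and their fields are hypothetical names, and this is not the construction from the paper cited on slide 38.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Delta:
    txn_id: str   # assumed unique per transaction
    account: str
    amount: int   # in cents; positive = deposit, negative = withdrawal


class Replica:
    """Toy replica holding an unordered set of immutable deltas."""

    def __init__(self) -> None:
        self._deltas: Set[Delta] = set()

    def record(self, delta: Delta) -> None:
        # Adding to a set is commutative and idempotent.
        self._deltas.add(delta)

    def merge(self, other: "Replica") -> None:
        # Replication is a set union, so merge order does not matter.
        self._deltas |= other._deltas

    def balance(self, account: str) -> int:
        # The "materialized view": the balance is derived, never stored.
        return sum(d.amount for d in self._deltas if d.account == account)


deltas = [
    Delta("t1", "acct-42", 10_000),
    Delta("t2", "acct-42", -2_500),
    Delta("t3", "acct-42", 700),
]

a, b = Replica(), Replica()
for d in deltas:
    a.record(d)
for d in reversed(deltas):   # same deltas, opposite arrival order
    b.record(d)
b.record(deltas[0])          # duplicate delivery is harmless
a.merge(b)
```

Both replicas report the same balance for `acct-42` regardless of arrival order or duplicates, because addition commutes and the set ignores repeats. What this sketch does not give you is an invariant such as "the balance never goes negative"; enforcing that would need coordination or a compensating delta.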