Consistency in Distributed Systems

A stable data model provides numerous advantages in developing Big Data and NoSQL systems, especially when sharing data over the cloud. Adopting an immutable model eases much of the pain of achieving consistency, especially at great scale. There are trade-offs, however, that you need to be aware of.

This webinar will present details and examples of immutable data models as applied to various NoSQL systems, including MongoDB, Cloudant, Riak and Cassandra. The emphasis will be on the impact on application designers and architects, as well as the technical trade-offs and advantages. The discussion will be well grounded in real-world examples from within and beyond the enterprise.

Transcript

  • 1. Consistency in Distributed Systems. Mike Miller, Co-Founder, Chief Scientist (@mlmilleratmit)
  • 2. 2014-06-12 Want to learn more? P. Bailis: “Coordination and the Art of Scaling”
  • 3. {Introductions: ‘Me’} Background -- Big Systems
  • 4. Mobile, Big Data => Stress models for consistency, transactional reasoning
  • 5. This is your problem when…
      ‣ … data doesn’t fit on one server.
      ‣ … data replicated between servers (e.g. read slaves).
      ‣ … data spread between data centers.
      ‣ … state spread across more than one device (mobile!)
      ‣ … mixed workloads with concurrency.
      ‣ … state spread across more than one process.
  • 6. This is now everyone’s problem
  • 7. Good news. Market response: NewSQL, NoSQL, Cloud, …
  • 8. Let’s view this from the developer’s perspective
  • 9. ships with a mobile strategy
  • 10. {Install: ‘Cloudant’} You do this: We give you: https://<username>.cloudant.com Done! Sign Up Step 1 Step 2 Step 3
  • 11. {Cloudant: ‘API’} JSON Documents, Primary Index, Secondary Indexes, Search & Geospatial
  • 12. {Write: ‘Local’, Sync: ‘Later’} Embedded, Edge, Satellites; Desktop, Browser; Cloud
  • 13. {Grow: ‘More’} Multitenant or Dedicated. 30+ Locations: SoftLayer, Rackspace, Azure, AWS, …
  • 14. So… How do you code for that? How does that compare to <X>? What about transactions?
  • 15. You do need to understand your datastore.
  • 16. http://www.wired.com/wiredenterprise/2012/08/google-as-xerox-parc/
  • 17. Google File System (2003) http://research.google.com/archive/gfs.html
        Google MapReduce (2004) http://research.google.com/archive/mapreduce.html
        Google BigTable (2006) http://research.google.com/archive/bigtable.html
        Amazon’s Dynamo (2007) http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
  • 19. {Sacrificed: ‘SPOFs’} Replaced with self-healing systems
  • 20. {Sacrificed: ‘Manual Sharding’} http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/spanner-osdi2012.pdf
  • 21. {Sacrificed: ‘Locks’} Block (optionally) on reads, not writes
  • 22. {Sacrificed: ‘Schemas’} “Schema on read”
  • 23. {Sacrificed: ‘Tables’} SQL and JSON are poorly matched
  • 24. MySQL, MongoDB, CouchDB, SOLR, … Dynamo, Cloudant, Cassandra, Riak, …
  • 25. {Sacrificed: ‘Transactions’} Fundamental reason: CAP Theorem. http://www.bailis.org/papers/ramp-sigmod2014.pdf
  • 26. {Consistency: ‘Eventual’} Excellent high-level overview: https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
  • 27. {Consistency: ‘Eventual’} https://amplab.cs.berkeley.edu/wp-content/uploads/2012/06/p776_peterbailis_vldb2012.pdf
  • 28. {Consistency: ‘Eventual’} “AP” “C”
  • 29. {Consistency: ‘Eventual’} FoM = Benefit - Cost*Rate. https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
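A rough worked example of the figure of merit (FoM) formula on slide 29, assuming Benefit is the gain from running with weak consistency, Cost is the cost of one consistency anomaly, and Rate is how often anomalies occur. The function name and every number below are hypothetical and chosen only to show the arithmetic.

```python
def figure_of_merit(benefit, cost_per_anomaly, anomaly_rate):
    """FoM = Benefit - Cost * Rate, with all terms over the same time period."""
    return benefit - cost_per_anomaly * anomaly_rate


# Hypothetical daily figures in dollars: weak consistency is worth $500/day,
# each anomaly costs $20 to compensate, and 10 anomalies happen per day.
fom = figure_of_merit(benefit=500.0, cost_per_anomaly=20.0, anomaly_rate=10.0)

# The anomaly rate at which the benefit is fully consumed by anomaly costs.
break_even_rate = 500.0 / 20.0

print(fom, break_even_rate)
```

Under these assumed numbers, eventual consistency stays worthwhile as long as the anomaly rate remains below the break-even rate.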
  • 30. {Consistency: ‘Eventual’} 3 minutes, 100 points (Dow Jones)
  • 31. {Consistency: ‘Eventual’} What is the penalty? Hedge strategy?
  • 32. {Strategy: ‘Immutability’} Write-only state machine
  • 33. {Spokesperson: ‘Rich Hickey’} http://www.infoq.com/presentations/Value-Values
  • 34. Immutability isn’t new
      ‣ “Accountants don’t use erasers”
      ‣ Functional, concurrent, distributed languages (e.g. Erlang)
      ‣ File systems (e.g. ZFS)
      ‣ Storage engines (e.g. LevelDB) & Databases (CouchDB, Datomic, …)
      ‣ Data model
  • 35. {Strategy 1: ‘Write Only’}
      ‣ Don’t update in place
      ‣ Keep old versions
      ‣ Query for newest version
      ‣ Even works for deletions (write a “tombstone”)
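The four rules on the write-only slide can be sketched as a toy in-memory store. This is a minimal illustration, not any particular database's API: the `Version` record and the `AppendOnlyStore` class are hypothetical names, and a real system would persist the log and index it rather than scan it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    """One immutable version of a document (hypothetical record layout)."""
    key: str
    seq: int               # position in the log; larger means newer
    value: Optional[Any]   # unused when deleted is True
    deleted: bool = False  # True marks a tombstone


class AppendOnlyStore:
    """Toy store that only ever appends new versions."""

    def __init__(self):
        self._log = []

    def put(self, key, value):
        version = Version(key, len(self._log), value)
        self._log.append(version)  # never overwrite an earlier entry
        return version

    def delete(self, key):
        tombstone = Version(key, len(self._log), None, deleted=True)
        self._log.append(tombstone)  # a delete is just another write
        return tombstone

    def history(self, key):
        """All versions of a key, oldest first; old versions are kept."""
        return [v for v in self._log if v.key == key]

    def get(self, key):
        """Return the newest live value, or None if absent or tombstoned."""
        versions = self.history(key)
        if not versions or versions[-1].deleted:
            return None
        return versions[-1].value


store = AppendOnlyStore()
store.put("user:1", {"email": "a@example.com"})
store.put("user:1", {"email": "b@example.com"})
current = store.get("user:1")
store.delete("user:1")
after_delete = store.get("user:1")
versions_kept = len(store.history("user:1"))
```

Because nothing is overwritten, readers never observe a half-applied update, and the full history stays available for auditing or conflict resolution.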
  • 36. {Strategy 2: ‘Minimize Contention’}
      ‣ Break out one-to-many, many-to-many relationships using foreign keys and links.
      ‣ Normalize! Learn your indexing options!
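One way to read the contention advice: rather than embedding a growing list inside a parent document that every writer must update, write each child as its own document that links back to the parent, and find the children through an index. The sketch below assumes a hypothetical blog-post and comment model held in plain dictionaries; none of the names come from a real datastore API.

```python
import uuid
from collections import defaultdict

# Hypothetical in-memory "database": document id -> document.
docs = {}

# Hypothetical secondary index: post id -> ids of comments linking to it.
comments_by_post = defaultdict(list)


def add_post(title):
    post_id = f"post:{uuid.uuid4().hex}"
    docs[post_id] = {"type": "post", "title": title}
    return post_id


def add_comment(post_id, author, text):
    # Each comment is a separate document holding a foreign key (post_id),
    # so concurrent commenters never write to the same document.
    comment_id = f"comment:{uuid.uuid4().hex}"
    docs[comment_id] = {
        "type": "comment",
        "post_id": post_id,
        "author": author,
        "text": text,
    }
    comments_by_post[post_id].append(comment_id)
    return comment_id


def comments_for(post_id):
    # Read side: follow the index instead of reading an embedded list.
    return [docs[cid] for cid in comments_by_post.get(post_id, [])]


post = add_post("Consistency in Distributed Systems")
add_comment(post, "alice", "Great talk")
add_comment(post, "bob", "What about transactions?")
authors = [c["author"] for c in comments_for(post)]
```

The post document is written once and never touched again, so adding comments cannot produce conflicting revisions of it.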
  • 37. {Strategy 3: ‘Think Commutative’}
      ‣ Store “deltas”, just like your checkbook
      ‣ Account value via materialized view
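The checkbook idea can be illustrated with a small ledger: store each change as a delta and derive the balance by folding over the deltas. The account names and amounts are hypothetical, and `materialize` stands in for whatever materialized-view mechanism the datastore provides.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical ledger entries: (account, delta in cents).
deltas = [
    ("acct-1", 10_000),
    ("acct-1", -2_500),
    ("acct-2", 4_000),
    ("acct-1", 750),
]


def materialize(entries):
    """Fold deltas into per-account balances (the "materialized view")."""
    balances = defaultdict(int)
    for account, delta in entries:
        balances[account] += delta
    return dict(balances)


view = materialize(deltas)

# Addition commutes, so every arrival order produces the same view.
order_independent = all(materialize(p) == view for p in permutations(deltas))
```

Since the result does not depend on the order in which replicas receive the deltas, replicas that have seen the same set of writes agree without coordinating.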
  • 38. {Strategy 3: ‘Think Commutative’} Commutative Replicated Data Types (2010) http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf
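As a small illustration of a replicated data type whose merges commute, here is a grow-only counter. It is a commonly used teaching example and is not taken from the paper cited on the slide; the `GCounter` class and the replica names are assumptions made for this sketch.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        mine = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = mine + amount

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)   # applied on replica a
b.increment(2)   # applied concurrently on replica b
a.merge(b)
b.merge(a)
b.merge(a)       # repeating a merge changes nothing
```

After the merges both replicas report the same total, even though neither coordinated with the other before accepting writes.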
  • 39. Future Work
      ‣ Additional explicit data modeling examples
      ‣ Advanced reasoning for “AP” systems
      ‣ CRDTs
      ‣ Secondary index consistency, maintenance (RAMP)
      ‣ “New” transactional systems (HAT, Google Spanner)
      ‣ “Call me maybe”: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
  • 40. Keep Learning
  • 41. AMP on Consistency https://amplab.cs.berkeley.edu/tag/consistency/
  • 42. Thanks! cloudant.com, mike@cloudant.com, @mlmilleratmit, #Cloudant (IRC)