Consistency in Distributed Systems
 


A stable data model provides numerous advantages in developing Big Data and NoSQL systems, especially when sharing data over the cloud. Adopting an immutable model eases much of the pain of achieving consistency, especially at great scale. There are trade-offs, however, that you need to be aware of.

This webinar will present details and examples of immutable data models as applied to various NoSQL systems, including MongoDB, Cloudant, Riak and Cassandra. The emphasis will be on the impact on application designers and architects, as well as the technical trade-offs and advantages. The discussion will be well grounded in real-world examples from within and beyond the enterprise.

Usage Rights: CC Attribution-NonCommercial-NoDerivs License


Consistency in Distributed Systems: Presentation Transcript

  • Consistency in Distributed Systems. Mike Miller, Co-Founder, Chief Scientist. @mlmilleratmit
  • 2014-06-12 2 Want to learn more? P. Bailis: “Coordination and the Art of Scaling”
  • {Introductions: ‘Me’} Background -- Big Systems
  • Mobile, Big Data => Stress models for consistency, transactional reasoning
  • This is your problem when…
    … data doesn’t fit on one server.
    … data replicated between servers (e.g. read slaves).
    … data spread between data centers.
    … state spread across more than one device (mobile!)
    … mixed workloads with concurrency.
    … state spread across more than one process.
  • This is now everyone’s problem
  • Good news - market response: NewSQL, NoSQL, Cloud, …
  • Let’s view this from the developer’s perspective
  • ships with a mobile strategy
  • {Install: ‘Cloudant’}
    Step 1. You do this: Sign Up
    Step 2. We give you: https://<username>.cloudant.com
    Step 3. Done!
  • {Cloudant: ‘API’} JSON Documents; Primary Index; Secondary Indexes; Search & Geospatial
  • {Write: ‘Local’, Sync: ‘Later’} Embedded, Edge, Satellites; Desktop, Browser; Cloud
  • {Grow: ‘More’} Multitenant or Dedicated. 30+ Locations: Softlayer, Rackspace, Azure, AWS, …
  • So… How do you code for that? How does that compare to <X>? What about transactions?
  • You do need to understand your datastore.
  • http://www.wired.com/wiredenterprise/2012/08/google-as-xerox-parc/
  • Google File System (2003) http://research.google.com/archive/gfs.html
    Google MapReduce (2004) http://research.google.com/archive/mapreduce.html
    Google BigTable (2006) http://research.google.com/archive/bigtable.html
    Amazon’s Dynamo (2007) http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
  • {Sacrificed: ‘SPOFs’} Replaced with self-healing systems
  • {Sacrificed: ‘Manual Sharding’} http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/spanner-osdi2012.pdf
  • {Sacrificed: ‘Locks’} Block (optionally) on reads, not writes
  • {Sacrificed: ‘Schemas’} “Schema on read”
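
“Schema on read” can be illustrated with a minimal sketch: documents are stored as written, and a shape is imposed only when they are read. The example documents, the field names and the `read_user` helper are all hypothetical and are not taken from the talk or from any datastore's API.

```python
import json

# Raw documents stored without a fixed schema; older and newer writers
# used different shapes (hypothetical example data).
stored = [
    '{"name": "Ada", "email": "ada@example.com"}',
    '{"name": "Grace", "emails": ["grace@example.com", "gh@example.org"]}',
    '{"name": "Linus"}',
]


def read_user(raw: str) -> dict:
    """Apply the schema at read time: normalize every document to
    {"name": str, "emails": list of str}, however it was written."""
    doc = json.loads(raw)
    if "emails" in doc:
        emails = list(doc["emails"])
    elif "email" in doc:
        emails = [doc["email"]]
    else:
        emails = []
    return {"name": doc.get("name", ""), "emails": emails}


users = [read_user(raw) for raw in stored]
```

The writers never had to agree on a shape up front; the cost moves to the reader, which must handle every shape that was ever written.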
  • {Sacrificed: ‘Tables’} SQL and JSON poorly matched
  • MySQL, MongoDB, CouchDB, SOLR, … Dynamo, Cloudant, Cassandra, Riak, …
  • {Sacrificed: ‘Transactions’} Fundamental reason: CAP Theorem … … http://www.bailis.org/papers/ramp-sigmod2014.pdf
  • {Consistency: ‘Eventual’} Excellent high-level overview: https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
  • {Consistency: ‘Eventual’} https://amplab.cs.berkeley.edu/wp-content/uploads/2012/06/p776_peterbailis_vldb2012.pdf
  • {Consistency: ‘Eventual’} “AP” “C”
  • {Consistency: ‘Eventual’} FoM = Benefit - Cost * Rate. https://amplab.cs.berkeley.edu/wp-content/uploads/2013/04/p20-bailis.pdf
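
One possible reading of the figure of merit (FoM) above is: the benefit of running with weak consistency, minus the cost of each anomaly multiplied by how often anomalies occur. The sketch below assumes that reading; the parameter names and all numbers are hypothetical and exist only to show the arithmetic.

```python
def figure_of_merit(benefit: float, cost_per_anomaly: float, anomaly_rate: float) -> float:
    """FoM = Benefit - Cost * Rate, with all terms measured over the same period."""
    return benefit - cost_per_anomaly * anomaly_rate


# Hypothetical: weak consistency is worth 1000 units per day,
# and 4 anomalies per day are expected.
cheap_anomalies = figure_of_merit(benefit=1000.0, cost_per_anomaly=50.0, anomaly_rate=4.0)
costly_anomalies = figure_of_merit(benefit=1000.0, cost_per_anomaly=500.0, anomaly_rate=4.0)
```

With these made-up inputs the first case stays positive and the second goes negative, which is the point at which stronger consistency (or a compensation strategy) would be worth paying for.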
  • {Consistency: ‘Eventual’} 3 minutes, 100 points (Dow Jones)
  • {Consistency: ‘Eventual’} What is the penalty? Hedge strategy?
  • {Strategy: ‘Immutability’} Write-only state machine
  • {Spokesperson: ‘Rich Hickey’} http://www.infoq.com/presentations/Value-Values
  • Immutability isn’t new
    ‣ “Accountants don’t use erasers”
    ‣ Functional, concurrent, distributed languages (e.g. Erlang)
    ‣ File systems (e.g. ZFS)
    ‣ Storage engines (e.g. LevelDB) & Databases (CouchDB, Datomic, …)
    ‣ Data model
  • {Strategy 1: ‘Write Only’}
    ‣ Don’t update in place
    ‣ Keep old versions
    ‣ Query for newest version
    ‣ Even works for deletions (write a “tombstone”)
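
The four points of the write-only strategy can be sketched as a small in-memory store. This is an illustration under stated assumptions, not how any of the databases named in the talk implement it; `AppendOnlyStore`, `Version` and the account data are hypothetical names.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    seq: int               # monotonically increasing write order
    key: str
    value: Any
    deleted: bool = False  # True marks a tombstone


class AppendOnlyStore:
    """Every write appends a new immutable version; nothing is overwritten."""

    def __init__(self) -> None:
        self._log: list[Version] = []
        self._seq = itertools.count(1)

    def put(self, key: str, value: Any) -> Version:
        version = Version(next(self._seq), key, value)
        self._log.append(version)
        return version

    def delete(self, key: str) -> Version:
        # A deletion is just another write: a tombstone version.
        version = Version(next(self._seq), key, None, deleted=True)
        self._log.append(version)
        return version

    def history(self, key: str) -> list[Version]:
        # Old versions are kept and remain queryable.
        return [v for v in self._log if v.key == key]

    def get(self, key: str) -> Optional[Any]:
        # Query for the newest version; a tombstone reads as "not found".
        versions = self.history(key)
        if not versions or versions[-1].deleted:
            return None
        return versions[-1].value


store = AppendOnlyStore()
store.put("acct:1", {"owner": "ada", "plan": "free"})
store.put("acct:1", {"owner": "ada", "plan": "pro"})
current = store.get("acct:1")
store.delete("acct:1")
after_delete = store.get("acct:1")
```

After the delete, `get` returns nothing for the key, yet `history` still holds all three versions, including the tombstone.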
  • {Strategy 2: ‘Minimize Contention’}
    ‣ Break out one-to-many, many-to-many relationships using foreign keys and links.
    ‣ Normalize! Learn your indexing options!
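
As a rough sketch of breaking out a one-to-many relationship: rather than embedding comments in a post document, which every commenter would then have to update, each comment becomes its own document that links to the post by a foreign key, and an index gathers them at read time. The document shapes, the `insert` helper and the in-memory index are hypothetical stand-ins for a real datastore and its secondary indexes.

```python
from collections import defaultdict

docs = {}                             # primary index: _id -> document
comments_by_post = defaultdict(list)  # secondary index: post_id -> comment ids


def insert(doc: dict) -> None:
    """Insert a new document; existing documents are never modified."""
    if doc["_id"] in docs:
        raise KeyError(f"document {doc['_id']} already exists")
    docs[doc["_id"]] = doc
    if doc["type"] == "comment":
        comments_by_post[doc["post_id"]].append(doc["_id"])


insert({"_id": "post:1", "type": "post", "title": "CAP in practice"})
# Two writers add comments concurrently; neither touches the post document.
insert({"_id": "comment:1", "type": "comment", "post_id": "post:1", "text": "Nice"})
insert({"_id": "comment:2", "type": "comment", "post_id": "post:1", "text": "+1"})

thread = [docs[cid]["text"] for cid in comments_by_post["post:1"]]
```

Because each write creates a distinct document, concurrent commenters never contend on the same record; the relationship is reassembled through the index.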
  • {Strategy 3: ‘Think Commutative’}
    ‣ Store “deltas”, just like your checkbook
    Account Value via Materialized View
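
The checkbook idea can be sketched in a few lines: each transaction is stored as an immutable delta, and the account value is a view computed over them. Because integer addition is commutative, replicas that receive the deltas in different orders arrive at the same value. The amounts and the `balance` function are hypothetical.

```python
import random

# Each transaction is an immutable delta (hypothetical amounts, in cents).
deltas = [+10000, -2550, -1200, +500]


def balance(entries) -> int:
    """Materialized view: the account value is the sum of all deltas."""
    return sum(entries)


# Another replica may see the same deltas in a different order.
shuffled = deltas[:]
random.shuffle(shuffled)

same_value = balance(shuffled) == balance(deltas)
```

Amounts are kept as integer cents so the sum is exact regardless of order; floating-point addition would not give that guarantee.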
  • {Strategy 3: ‘Think Commutative’} Commutative Replicated Data Types (2010) http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf
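
To give a feel for replicated data types, here is a minimal sketch of a grow-only counter (often called a G-Counter). It is a state-based design chosen for brevity and is not taken from the paper linked above; the class and method names are illustrative. Each replica increments only its own slot, and merging takes the element-wise maximum, so merges can be applied in any order and any number of times.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # max() is commutative, associative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("a"), GCounter("b")
a.increment(3)   # replica a counts 3 events locally
b.increment(2)   # replica b counts 2 events locally
a.merge(b)
b.merge(a)
b.merge(a)       # merging again changes nothing
```

Both replicas accept writes without coordinating and still converge on the same total once they have exchanged state.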
  • Future Work
    ‣ Additional explicit data modeling examples
    ‣ Advanced reasoning for “AP” systems
    ‣ CRDTs
    ‣ Secondary index consistency, maintenance (RAMP)
    ‣ “New” transactional systems (HAT, Google Spanner)
    ‣ “Call me maybe”: http://aphyr.com/posts/281-call-me-maybe-carly-rae-jepsen-and-the-perils-of-network-partitions
  • Keep Learning
  • AMP on Consistency https://amplab.cs.berkeley.edu/tag/consistency/
  • Thanks! cloudant.com, mike@cloudant.com, @mlmilleratmit, IRC: #Cloudant