Automated conflict resolution - enabling masterless data distribution (Rune Skou Larsen, Trifork)
 


This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.


    Presentation Transcript

    • Automated conflict resolution - enabling masterless data distribution. GOTO Aarhus 2012: premier software development conference created by developers for developers. Conference: Oct. 1-3 // Training: Sept. 30, Oct. 4-5. See more at www.gotocon.com/aarhus-2012
    • About the speaker (RuneSkouLarsen)
      ■ Working with databases since 1997
      ■ NoSQL since 2008
        ■ Danish Shared Medication Record
        ■ Migrating data from MySQL to Riak
        ■ Developing Riak clients
      ■ NoSQL architect on various international projects
        ■ Seoul, Korea
        ■ Toulouse, France
        ■ Leeds, UK
        ■ Copenhagen, Denmark
    • Agenda
      ■ Polyglot persistence landscape
      ■ Distribute those data!
      ■ Types of Consistency
      ■ Moving towards consistency
      ■ Introduction to CRDTs
      ■ Consistency models of OLTP databases
    • Polyglot persistence landscape [Diagram. Categories: In-memory, OLTP, Analytics. Products shown: Neo4j, Riak, VoltDB, Cassandra, Redis, Voldemort, CouchBase, EasyDB, MongoDB, Hadoop, CouchDB.]
    • Why distributed databases?
      ■ Redundancy
      ■ Availability
      ■ Scaling
      ■ Getting closer to your users
    • Types of Consistency
      ■ Consistency: All nodes see the same data at the same time
      ■ Eventual consistency → Autonomous consistency
      ■ Sequential consistency → Bureaucratic consistency
    • When to be Consistent with what
      ■ Eventual consistency
        Supports disconnected operations
          – Better to read a stale value than nothing
          – Better to save writes somewhere than nothing
        Potentially anomalous application behavior
          – Stale reads and conflicting writes…
      ■ Sequential consistency
        Requires highly available connections
        Not suitable for certain scenarios:
          – Disconnected clients (e.g. your phone)
          – Apps might prefer potential inconsistency to loss of availability
    • Conflicting updates [Diagram: User A and User B write values A and B; the replicas synchronize asynchronously.]
    • Last Write Wins (LWW) [Diagram: User A writes A at t=t0 and User B writes B at t=t1; asynchronous synchronization.]
      ■ Assign a timestamp to all objects
      ■ Simple but fragile: depends on precise synchronization of clocks
      ■ Data is lost
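The Last Write Wins rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Stamped` type, the `lww_merge` function and the tie-breaking rule are hypothetical names and choices, not taken from the talk or from any particular database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    value: str
    timestamp: float  # wall-clock time assigned by the writing node (assumption)


def lww_merge(a: Stamped, b: Stamped) -> Stamped:
    """Keep the write with the later timestamp; the other write is discarded."""
    # The value acts as a tie-breaker so every replica picks the same winner.
    return max(a, b, key=lambda s: (s.timestamp, s.value))


write_a = Stamped("A", timestamp=100.0)  # User A writes at t0
write_b = Stamped("B", timestamp=101.0)  # User B writes at t1
winner = lww_merge(write_a, write_b)     # B wins, A's write is silently lost

# If B's clock runs 5 seconds slow, the later write loses instead.
skewed_b = Stamped("B", timestamp=101.0 - 5.0)
skewed_winner = lww_merge(write_a, skewed_b)
```

The second call shows why the slide calls the approach fragile: the outcome depends on how well the nodes' clocks agree, not on which write actually happened last.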
    • Detecting conflicts using Vector Clocks (1) [Diagram: values A and B from User A and User B, with vector clocks vclock=a:1 and vclock=a:1,b:1; asynchronous synchronization.]
      ■ Assign vector clock to objects
      ■ Spawn siblings when causality chain is broken
      ■ Data is never lost
    • Detecting conflicts using Vector Clocks (2) [Diagram: values A and B from User A and User B, with vector clocks vclock=a:1 and vclock=b:1; asynchronous synchronization.]
      ■ Assign vector clock to objects
      ■ Spawn siblings when causality chain is broken
      ■ Data is never lost
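A minimal sketch of how two vector clocks can be compared, using the clock values shown on the two slides above. The function names `descends` and `compare` are illustrative assumptions, and a clock is modelled as a plain dict from node name to counter.

```python
VClock = dict[str, int]  # node name -> number of updates seen from that node


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a descends from b"
    if b_ge:
        return "b descends from a"
    return "concurrent"  # causality chain is broken: keep both as siblings


# Slide (1): B's write was made on top of A's, so it simply replaces A's.
case_1 = compare({"a": 1}, {"a": 1, "b": 1})

# Slide (2): neither write has seen the other, so both become siblings.
case_2 = compare({"a": 1}, {"b": 1})
```

Only the "concurrent" outcome needs conflict resolution; in the other cases the descendant value can safely replace its ancestor.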
    • Conflict-free Replicated Data Types [Diagram: values A and B from User A and User B are merged into AB; asynchronous synchronization.]
      ■ Data structure intrinsically merges objects
      ■ No data loss
      ■ Limited applicability
    • Semantic resolution [Diagram: values A, B and C with User A and User B; asynchronous synchronization.]
      ■ Keep both values as siblings
      ■ User does the merging
      ■ Only solution for complex, important data
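One way to picture semantic resolution is an application-supplied function that receives all siblings and decides what to keep. The sketch below is hypothetical: the contact-record shape and the `merge_contacts` function are invented for illustration and do not come from the talk. Fields on which all siblings agree are merged automatically, and fields with differing values are handed back for a user to decide.

```python
def merge_contacts(siblings: list[dict[str, str]]) -> tuple[dict[str, str], dict[str, list[str]]]:
    """Merge sibling records field by field.

    Returns (merged, needs_user): fields with a single distinct value are
    merged; fields with several distinct values are left for the user.
    """
    merged: dict[str, str] = {}
    needs_user: dict[str, list[str]] = {}
    all_fields = sorted({field for record in siblings for field in record})
    for field in all_fields:
        values = {record[field] for record in siblings if record.get(field)}
        if len(values) == 1:
            merged[field] = values.pop()
        elif len(values) > 1:
            needs_user[field] = sorted(values)
    return merged, needs_user


siblings = [
    {"name": "Ann", "phone": "111"},
    {"name": "Ann", "phone": "222", "email": "ann@example.org"},
]
merged, needs_user = merge_contacts(siblings)
```

Here the name and email merge without help, while the two phone numbers stay as a conflict that only someone who knows the data can settle.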
    • Methods for resolving conflicts
      ■ Last Write Wins
        ■ Easy
        ■ Data is lost
        ■ Depends on timestamps
      ■ Conflict-free Data Types
        ■ Data structure has built-in convergence
        ■ Limited ability to model real-world problems
      ■ Semantic resolution
        ■ Requires application/user involvement
        ■ Generic solution
    • Conflict-free Replicated Data Types
      ■ Convergent (CvRDT)
        ■ State is replicated
        ■ Moves towards one value
      ■ Commutative (CmRDT)
        ■ Operations to the state are replicated
        ■ The order of operations is insignificant: a*b = b*a
      ■ CvRDT and CmRDT can emulate each other
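As a sketch of the convergent (state-based) style, here is a grow-only counter. The counter is not named on the slide; it is chosen here as an assumption because it is small, and the function names are illustrative. Each node increments only its own entry, and merging takes the per-node maximum, so replicas move towards one value regardless of merge order.

```python
GCounter = dict[str, int]  # node name -> increments made by that node


def increment(state: GCounter, node: str) -> GCounter:
    """Return a new state with this node's own entry incremented."""
    new_state = dict(state)
    new_state[node] = new_state.get(node, 0) + 1
    return new_state


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Per-node maximum: commutative, associative and idempotent."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


def value(state: GCounter) -> int:
    return sum(state.values())


replica_a = increment(increment({}, "a"), "a")  # node a counted twice
replica_b = increment({}, "b")                  # node b counted once
merged = merge(replica_a, replica_b)
total = value(merged)
```

Because `merge` gives the same result in either order and merging the same state twice changes nothing, replicas can exchange state as often and in whatever order the network allows.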
    • CvRDT examples: G-Set and 2P-Set [Diagram: tombstone marked "RIP".]
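The two set types named on this slide can be sketched as follows. A G-Set only grows, so merging is a plain set union. A 2P-Set pairs an add-set with a set of tombstones; an element counts as present only if it was added and never removed. The class and method names below are illustrative assumptions, not an API from the talk.

```python
from dataclasses import dataclass


def gset_merge(a: frozenset, b: frozenset) -> frozenset:
    """G-Set: elements are only ever added, so merge is set union."""
    return a | b


@dataclass(frozen=True)
class TwoPSet:
    added: frozenset = frozenset()
    removed: frozenset = frozenset()  # tombstones: removals are permanent

    def add(self, item) -> "TwoPSet":
        return TwoPSet(self.added | {item}, self.removed)

    def remove(self, item) -> "TwoPSet":
        # Only an element that is currently present can be removed.
        if not self.contains(item):
            return self
        return TwoPSet(self.added, self.removed | {item})

    def contains(self, item) -> bool:
        return item in self.added and item not in self.removed

    def merge(self, other: "TwoPSet") -> "TwoPSet":
        return TwoPSet(self.added | other.added, self.removed | other.removed)


base = TwoPSet().add("x")
replica_a = base.add("y")     # one replica adds y
replica_b = base.remove("x")  # another replica removes x
merged = replica_a.merge(replica_b)
```

After the merge, `y` is present and `x` is gone for good: adding `x` again has no visible effect because its tombstone remains, which is the price of conflict-free removal in this design.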
    • CRDT References
      ■ CRDTs: Consistency without concurrency control, 2009, Institut National de Recherche en Informatique et en Automatique (INRIA)
      ■ A comprehensive study of Convergent and Commutative Replicated Data Types, 2009, Institut National de Recherche en Informatique et en Automatique (INRIA)
      ■ Sean Cribbs - Eventually Consistent Data Structures, http://vimeo.com/43903960
    • Consistency models of OLTP databases ■ Hinted handoff with sloppy quorums ■ Last write wins (highest write-availability) Riak Riak CouchDB/CouchBase Cassandra Cassandra ■ Strong consistency (read your own writes + strict quorums) ■ User resolvable conflicts Riak Riak Voldemort Voldemort Cassandra CouchDB/CouchBase (but unreliable) CouchBase ■ Active anti-entropy MongoDB Riak (Soon) Traditional SQL databases (Oracle, MySQL, etc.)
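The "strict quorums" mentioned on this slide rest on a simple counting argument: with N replicas, R replicas consulted per read and W replicas acknowledged per write, every read overlaps every write exactly when R + W > N. The brute-force check below is a small sketch of that argument; the function name and the parameters `n`, `r`, `w` are illustrative assumptions and do not correspond to any specific database's configuration keys.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Check by enumeration that every read set shares a node with every write set."""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


strict = quorums_always_overlap(n=3, r=2, w=2)  # 2 + 2 > 3: reads see the latest write
loose = quorums_always_overlap(n=3, r=1, w=1)   # 1 + 1 <= 3: a read may miss a write
```

With a sloppy quorum, a write may be accepted by a fallback node outside the usual replica set and handed off later, so this overlap is no longer guaranteed; that trades consistency for write availability.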
    • Thank you
      Rune Skou Larsen
      rsl@trifork.com
      Twitter: RuneSkouLarsen