Automated conflict resolution - enabling masterless data distribution (Rune Skou Larsen, Trifork)

This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Published in: Technology, Design

  1. Automated conflict resolution - enabling masterless data distribution
       GOTO Aarhus 2012
       Premier software development conference created by developers for developers.
       Conference: Oct. 1-3 // Training: Sept. 30, Oct. 4-5
       See more at www.gotocon.com/aarhus-2012
  2. About the speaker
       ■ Working with databases since 97
       ■ NoSQL since 2008
       ■ Danish Shared Medication Record
           ■ Migrating data from MySQL to Riak
           ■ Developing Riak clients
       ■ NoSQL architect on various international projects
           ■ Seoul, Korea
           ■ Toulouse, France
           ■ Leeds, UK
           ■ Copenhagen, Denmark
       RuneSkouLarsen
  3. Agenda
       ■ Polyglot persistence landscape
       ■ Distribute those data!
       ■ Types of Consistency
       ■ Moving towards consistency
       ■ Introduction to CRDTs
       ■ Consistency models of OLTP databases
  4. Polyglot persistence landscape
       [Diagram labels: In-memory, OLTP, Neo4j, Riak, VoltDB, Cassandra, Redis, Voldemort, CouchBase, EasyDB, Analytics, MongoDB, Hadoop, CouchDB]
  5. Why distributed databases?
       ■ Redundancy
       ■ Availability
       ■ Scaling
       ■ Getting closer to your users
  6. Types of Consistency
       ■ Consistency: All nodes see the same data at the same time
       ■ Eventual consistency → Autonomous consistency
       ■ Sequential consistency → Bureaucratic consistency
  7. When to be Consistent with what
       ■ Eventual consistency
           Support disconnected operations
             – Better to read a stale value than nothing
             – Better to save writes somewhere than nothing
           Potentially anomalous application behavior
             – Stale reads and conflicting writes…
       ■ Sequential consistency
           Requires highly available connections
           Not suitable for certain scenarios:
             – Disconnected clients (e.g. your phone)
             – Apps might prefer potential inconsistency to loss of availability
  8. Conflicting updates
       [Diagram: User A updates replica A, User B updates replica B; the replicas synchronize asynchronously]
  9. Last Write Wins (LWW)
       [Diagram: User A writes to replica A at t=t0, User B writes to replica B at t=t1; the replicas synchronize asynchronously]
       ■ Assign timestamp to all objects
       ■ Simple but fragile: depends on precise synchronization of timers
       ■ Data is lost
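
A minimal Python sketch of the last-write-wins rule from the slide above. `Versioned` and `lww_merge` are hypothetical names for illustration, not taken from the talk or from any database API. Breaking ties on the value is an assumption of this sketch, made so the merge gives the same answer on every replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # wall-clock time assigned by the node that accepted the write


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the larger timestamp; the other write is discarded."""
    # Assumption: ties are broken on the value so both replicas pick the same winner.
    return max(a, b, key=lambda v: (v.timestamp, v.value))


write_a = Versioned("address=Aarhus", timestamp=100.0)
write_b = Versioned("address=Copenhagen", timestamp=100.5)
winner = lww_merge(write_a, write_b)

# If node A's clock runs fast, its older write can beat a newer one from node B.
skewed = lww_merge(Versioned("old", timestamp=200.0), Versioned("new", timestamp=150.0))
```

The second call shows the fragility noted on the slide: the result depends only on the timestamps, so a skewed clock silently drops the more recent write.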
  10. Detecting conflicts using Vector Clocks (1)
       [Diagram: replicas A and B synchronize asynchronously; vector clocks shown: vclock=a:1 and vclock=a:1,b:1]
       ■ Assign vector clock to objects
       ■ Spawn siblings when causality chain is broken
       ■ Data is never lost
  11. Detecting conflicts using Vector Clocks (2)
       [Diagram: replicas A and B synchronize asynchronously; vector clocks shown: vclock=a:1 and vclock=b:1]
       ■ Assign vector clock to objects
       ■ Spawn siblings when causality chain is broken
       ■ Data is never lost
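
The two vector clock slides above can be sketched in Python as follows. This is an illustrative comparison routine under the usual vector clock definition, with clocks modeled as plain dicts; `descends` and `compare` are hypothetical helper names, not the API of Riak or any other database.

```python
from typing import Dict

VClock = Dict[str, int]  # node id -> number of updates seen from that node


def descends(a: VClock, b: VClock) -> bool:
    """True if clock `a` has seen every update that clock `b` has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    """Classify two versions by their clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    # Neither clock descends from the other: the causality chain is broken,
    # so both versions are kept as siblings.
    return "conflict"


case_1 = compare({"a": 1}, {"a": 1, "b": 1})  # clocks from slide (1)
case_2 = compare({"a": 1}, {"b": 1})          # clocks from slide (2)
```

With the clocks from slide (1), the second version descends from the first and can replace it. With the clocks from slide (2), neither descends from the other, which is the case where siblings are spawned.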
  12. Conflict-free Replicated DataTypes
       [Diagram: User A holds value A, User B holds value B; after asynchronous synchronization both replicas hold AB]
       ■ Datastructure intrinsically merges objects
       ■ No data loss
       ■ Limited applicability
  13. Semantic resolution
       [Diagram: values A and B from User A and User B synchronize asynchronously; both are kept, and a value C also appears in the diagram]
       ■ Keep both values as siblings
       ■ User does the merging
       ■ Only solution for complex, important data
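
One possible shape for the semantic resolution described above, as a hedged Python sketch. It assumes siblings are flat dicts and splits them into fields that all siblings agree on and fields that need a decision from the application or user. `split_fields` and the sample record are hypothetical and not taken from the talk.

```python
from typing import Any, Dict, List, Tuple


def split_fields(siblings: List[Dict[str, Any]]) -> Tuple[Dict[str, Any], Dict[str, List[Any]]]:
    """Separate fields all siblings agree on from fields a user has to resolve."""
    agreed: Dict[str, Any] = {}
    disputed: Dict[str, List[Any]] = {}
    keys = set().union(*(s.keys() for s in siblings))
    for key in sorted(keys):
        values = [s.get(key) for s in siblings]
        if all(v == values[0] for v in values):
            agreed[key] = values[0]
        else:
            disputed[key] = values  # keep every candidate; nothing is dropped
    return agreed, disputed


siblings = [
    {"name": "Ann", "city": "Aarhus"},
    {"name": "Ann", "city": "Leeds"},
]
agreed, disputed = split_fields(siblings)
```

Only the disputed fields would then be shown to the user, who picks or writes the merged value.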
  14. Methods for resolving conflicts
       ■ Last Write Wins
           ■ Easy
           ■ Data is lost
           ■ Depends on timestamps
       ■ Conflict-free Data Types
           ■ Data structure has built-in convergence
           ■ Limited ability to model real-world problems
       ■ Semantic resolution
           ■ Requires application/user involvement
           ■ Generic solution
  15. Conflict-free Replicated Data Types
       ■ Convergent (CvRDT)
           ■ State is replicated
           ■ Moves towards one value
       ■ Commutative (CmRDT)
           ■ Operations to the state are replicated
           ■ The order of operations is insignificant: a*b = b*a
       ■ CvRDT and CmRDT can emulate each other
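
To make the two flavors above concrete, here is a small Python sketch. The grow-only counter (G-Counter) is a standard CvRDT example chosen for illustration; it does not appear on the slides, and the class and function names are hypothetical. The second part shows the CmRDT idea with plain increments, which commute.

```python
from itertools import permutations
from typing import Dict, Iterable, Optional


class GCounter:
    """State-based (convergent) grow-only counter: one slot per node."""

    def __init__(self, node_id: str, counts: Optional[Dict[str, int]] = None) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = dict(counts or {})

    def increment(self, n: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas move towards one value regardless of merge order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(1)
a.merge(b)
b.merge(a)
a.merge(b)  # merging the same state again changes nothing


def apply_ops(start: int, ops: Iterable[int]) -> int:
    """Operation-based (commutative) view: replicate the increments themselves."""
    total = start
    for op in ops:
        total += op
    return total


# Every delivery order of the same operations yields the same state.
results = {apply_ops(0, order) for order in permutations([1, 2, 3])}
```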
  16. CvRDT examples: G-Set and 2P-Set
       [Diagram labels: Tombstone, RIP]
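
A minimal Python sketch of the two set types named on this slide, following their usual definitions. The `TwoPSet` class is a hypothetical illustration rather than code from the talk. Its `added` half alone behaves as a G-Set (grow-only set), and `removed` holds the tombstones.

```python
class TwoPSet:
    """Two-phase set: a G-Set of additions plus a G-Set of tombstones."""

    def __init__(self) -> None:
        self.added = set()    # grow-only set of everything ever added
        self.removed = set()  # tombstones: elements that have been removed

    def add(self, item) -> None:
        self.added.add(item)

    def remove(self, item) -> None:
        # Only elements this replica has seen can be removed.
        if item in self.added:
            self.removed.add(item)

    def __contains__(self, item) -> bool:
        return item in self.added and item not in self.removed

    def merge(self, other: "TwoPSet") -> None:
        # Set union on both halves; tombstones are never deleted.
        self.added |= other.added
        self.removed |= other.removed


replica_a, replica_b = TwoPSet(), TwoPSet()
replica_a.add("x")
replica_b.merge(replica_a)
replica_b.remove("x")
replica_a.merge(replica_b)

# Once tombstoned, an element cannot come back, even if it is added again.
replica_a.add("x")
still_removed = "x" not in replica_a
```

The last lines show the known limitation of a 2P-Set: removal wins permanently, which is one reason the talk lists "limited applicability" for CRDTs.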
  17. CRDT References
       ■ CRDTs: Consistency without concurrency control, 2009, INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
       ■ A comprehensive study of Convergent and Commutative Replicated Data Types, 2009, INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
       ■ Sean Cribbs - Eventually Consistent Data Structures, http://vimeo.com/43903960
  18. Consistency models of OLTP databases ■ Hinted handoff with sloppy quorums ■ Last write wins (highest write-availability) Riak Riak CouchDB/CouchBase Cassandra Cassandra ■ Strong consistency (read your own writes + strict quorums) ■ User resolvable conflicts Riak Riak Voldemort Voldemort Cassandra CouchDB/CouchBase (but unreliable) CouchBase ■ Active anti-entropy MongoDB Riak (Soon) Traditional SQL databases (Oracle, MySQL, etc.)
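
The "strict quorums" mentioned on this slide are commonly expressed as the condition R + W > N, where N is the number of replicas, W the number of acknowledgements required for a write, and R the number of replicas consulted on a read. The Python sketch below only checks that arithmetic; the function name and parameters are illustrative and do not correspond to any specific database's configuration options.

```python
def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """True if every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


overlapping = is_strict_quorum(n=3, r=2, w=2)  # reads always see the latest acknowledged write
fast_but_stale = is_strict_quorum(n=3, r=1, w=1)  # a read may miss the latest write
```

When the condition holds, at least one replica in any read set has the most recent acknowledged write, which is what makes "read your own writes" possible.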
  19. Thank you
       Rune Skou Larsen
       rsl@trifork.com
       Twitter: RuneSkouLarsen
