Automated conflict resolution - enabling masterless data distribution (Rune Skou Larsen, Trifork)

This talk was held at the third meeting of the Swiss Big Data User Group on September 17 at ETH Zürich.

Published in: Technology, Design

  1. Automated conflict resolution - enabling masterless data distribution
     GOTO Aarhus 2012: premier software development conference created by developers for developers.
     Conference: Oct. 1-3 // Training: Sept. 30, Oct. 4-5
     See more at www.gotocon.com/aarhus-2012
  2. About the speaker
     ■ Working with databases since 97
     ■ NoSQL since 2008
     ■ Danish Shared Medication Record
       ■ Migrating data from MySQL to Riak
       ■ Developing Riak clients
     ■ NoSQL architect on various international projects
       ■ Seoul, Korea
       ■ Toulouse, France
       ■ Leeds, UK
       ■ Copenhagen, Denmark
     Twitter: RuneSkouLarsen
  3. Agenda
     ■ Polyglot persistence landscape
     ■ Distribute those data!
     ■ Types of Consistency
     ■ Moving towards consistency
     ■ Introduction to CRDTs
     ■ Consistency models of OLTP databases
  4. Polyglot persistence landscape
     (Diagram. Categories: In-memory, OLTP, Analytics. Products shown: Neo4j, Riak, VoltDB, Cassandra, Redis, Voldemort, CouchBase, EasyDB, MongoDB, Hadoop, CouchDB.)
  5. Why distributed databases?
     ■ Redundancy
     ■ Availability
     ■ Scaling
     ■ Getting closer to your users
  6. Types of Consistency
     ■ Consistency: All nodes see the same data at the same time
     ■ Eventual consistency → Autonomous consistency
     ■ Sequential consistency → Bureaucratic consistency
  7. When to be Consistent with what
     ■ Eventual consistency
       Supports disconnected operations
         – Better to read a stale value than nothing
         – Better to save writes somewhere than nothing
       Potentially anomalous application behavior
         – Stale reads and conflicting writes…
     ■ Sequential consistency
       Requires highly available connections
       Not suitable for certain scenarios:
         – Disconnected clients (e.g. your phone)
         – Apps might prefer potential inconsistency to loss of availability
  8. Conflicting updates
     (Diagram: User A writes value A and User B writes value B, each to their own replica; the replicas synchronize asynchronously.)
  9. Last Write Wins (LWW)
     (Diagram: User A writes A at t=t0 and User B writes B at t=t1; the replicas synchronize asynchronously.)
     ■ Assign timestamp to all objects
     ■ Simple but fragile: depends on precisely synchronized clocks
     ■ Data is lost
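
The Last Write Wins rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the mechanism of any particular database: the `Stamped` class, the `lww_merge` function, and the timestamps are all hypothetical names and values chosen for the example. Ties are broken by value so that the merge gives the same result on every replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    """A value plus the wall-clock timestamp assigned by the writing node."""
    value: str
    timestamp: float


def lww_merge(x: Stamped, y: Stamped) -> Stamped:
    """Keep the write with the higher timestamp; the other write is discarded.

    Ties are broken by value so the result does not depend on argument order.
    """
    return max(x, y, key=lambda s: (s.timestamp, s.value))


# User A writes first, User B writes one second later.
write_a = Stamped("A", timestamp=100.0)
write_b = Stamped("B", timestamp=101.0)
merged = lww_merge(write_a, write_b)  # B is kept, A is silently lost

# Same two writes, but User B's node clock runs 5 seconds slow (assumed skew):
# the write that really happened last now loses.
skewed_b = Stamped("B", timestamp=101.0 - 5.0)
merged_with_skew = lww_merge(write_a, skewed_b)
```

Both weaknesses named on the slide show up here: one of the two writes always disappears, and which one disappears depends on how well the node clocks agree.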
  10. Detecting conflicts using Vector Clocks (1)
      (Diagram: User A's object A carries vclock=a:1 and User B's object B carries vclock=a:1,b:1; the replicas synchronize asynchronously.)
      ■ Assign vector clock to objects
      ■ Spawn siblings when causality chain is broken
      ■ Data is never lost
  11. Detecting conflicts using Vector Clocks (2)
      (Diagram: User A's object A carries vclock=a:1 and User B's object B carries vclock=b:1; the replicas synchronize asynchronously.)
      ■ Assign vector clock to objects
      ■ Spawn siblings when causality chain is broken
      ■ Data is never lost
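
The difference between the two vector clock slides can be made concrete with a small comparison function. The sketch below assumes a vector clock is a plain mapping from node name to counter; `descends` and `compare` are illustrative names, not the API of Riak or any other store.

```python
from typing import Dict

VClock = Dict[str, int]


def descends(x: VClock, y: VClock) -> bool:
    """True if x has seen every event recorded in y (x >= y for each node)."""
    return all(x.get(node, 0) >= count for node, count in y.items())


def compare(x: VClock, y: VClock) -> str:
    """Classify the causal relationship between two vector clocks."""
    x_ge_y = descends(x, y)
    y_ge_x = descends(y, x)
    if x_ge_y and y_ge_x:
        return "equal"
    if x_ge_y:
        return "x descends y"   # x supersedes y, no conflict
    if y_ge_x:
        return "y descends x"   # y supersedes x, no conflict
    return "concurrent"         # causality chain broken: keep both as siblings


# Slide (1): B was written on top of A, so B simply replaces A.
case_1 = compare({"a": 1}, {"a": 1, "b": 1})

# Slide (2): A and B were written independently, so both are kept as siblings.
case_2 = compare({"a": 1}, {"b": 1})
```

Only the `"concurrent"` outcome needs sibling handling; in the other cases the newer object can replace the older one without losing data.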
  12. Conflict-free Replicated Data Types
      (Diagram: User A writes A and User B writes B; after asynchronous synchronization both replicas hold the merged value AB.)
      ■ Data structure intrinsically merges objects
      ■ No data loss
      ■ Limited applicability
  13. Semantic resolution
      (Diagram: User A writes A and User B writes B; after asynchronous synchronization the two values are merged into a new value C.)
      ■ Keep both values as siblings
      ■ User does the merging
      ■ Only solution for complex, important data
  14. Methods for resolving conflicts
      ■ Last Write Wins
        ■ Easy
        ■ Data is lost
        ■ Depends on timestamps
      ■ Conflict-free Data Types
        ■ Data structure has built-in convergence
        ■ Limited ability to model real-world problems
      ■ Semantic resolution
        ■ Requires application/user involvement
        ■ Generic solution
  15. Conflict-free Replicated Data Types
      ■ Convergent (CvRDT)
        ■ State is replicated
        ■ Moves towards one value
      ■ Commutative (CmRDT)
        ■ Operations to the state are replicated
        ■ The order of operations is insignificant: a*b = b*a
      ■ CvRDT and CmRDT can emulate each other
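
As an illustration of a convergent (state-based) data type, here is a sketch of a grow-only counter. The counter is not taken from the slides; it is used here only because it is about the smallest CvRDT that shows the idea. Each node increments its own slot, and replicas merge by taking the per-node maximum. The function names are hypothetical.

```python
from typing import Dict

GCounter = Dict[str, int]  # node name -> number of increments seen from that node


def increment(state: GCounter, node: str, amount: int = 1) -> GCounter:
    """Return a new state in which `node` has counted `amount` more events."""
    new_state = dict(state)
    new_state[node] = new_state.get(node, 0) + amount
    return new_state


def merge(x: GCounter, y: GCounter) -> GCounter:
    """Element-wise maximum: commutative, associative and idempotent."""
    return {node: max(x.get(node, 0), y.get(node, 0)) for node in x.keys() | y.keys()}


def value(state: GCounter) -> int:
    """The counter value is the sum over all nodes."""
    return sum(state.values())


# Two replicas count independently while disconnected.
replica_a = increment(increment({}, "a"), "a")  # {"a": 2}
replica_b = increment({}, "b")                  # {"b": 1}

# Whichever direction the state is shipped, both replicas converge.
merged_ab = merge(replica_a, replica_b)
merged_ba = merge(replica_b, replica_a)
total = value(merged_ab)
```

Because the merge is commutative and idempotent, replicas can exchange state in any order, and repeatedly, and still move towards one value.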
  16. CvRDT examples: G-Set and 2P-Set
      (Diagram: a tombstone marked "RIP".)
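
A G-Set (grow-only set) merges by set union. A 2P-Set (two-phase set) pairs an "added" G-Set with a "removed" G-Set of tombstones. The following is a minimal sketch under those definitions; the class and method names are illustrative and not taken from the talk.

```python
from dataclasses import dataclass
from typing import FrozenSet


def g_set_merge(x: FrozenSet[str], y: FrozenSet[str]) -> FrozenSet[str]:
    """A G-Set only grows, so merging two replicas is a plain union."""
    return x | y


@dataclass(frozen=True)
class TwoPSet:
    added: FrozenSet[str] = frozenset()
    removed: FrozenSet[str] = frozenset()  # tombstones: never shrink

    def add(self, item: str) -> "TwoPSet":
        return TwoPSet(self.added | {item}, self.removed)

    def remove(self, item: str) -> "TwoPSet":
        # Only items this replica has seen added can be removed.
        if item not in self.added:
            return self
        return TwoPSet(self.added, self.removed | {item})

    def merge(self, other: "TwoPSet") -> "TwoPSet":
        # Both halves are G-Sets, so each is merged by union.
        return TwoPSet(g_set_merge(self.added, other.added),
                       g_set_merge(self.removed, other.removed))

    def value(self) -> FrozenSet[str]:
        return self.added - self.removed


base = TwoPSet().add("x").add("y")
replica_a = base.add("x")       # User A re-adds "x"
replica_b = base.remove("x")    # User B removes "x", leaving a tombstone
merged = replica_a.merge(replica_b)
visible = merged.value()        # the tombstone wins: "x" stays removed
```

The tombstone is what makes removal converge, and it is also the limitation: once removed, an element cannot be added back to a 2P-Set.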
  17. CRDT References
      ■ CRDTs: Consistency without concurrency control, 2009, Institut National de Recherche en Informatique et en Automatique
      ■ A comprehensive study of Convergent and Commutative Replicated Data Types, 2009, Institut National de Recherche en Informatique et en Automatique
      ■ Sean Cribbs - Eventually Consistent Data Structures, http://vimeo.com/43903960
  18. Consistency models of OLTP databases
      ■ Hinted handoff with sloppy quorums (highest write-availability)
        Riak
        Cassandra
      ■ Strong consistency (read your own writes + strict quorums)
        Riak
        Voldemort
        Cassandra
        CouchBase
        MongoDB
        Traditional SQL databases (Oracle, MySQL, etc.)
      ■ Last write wins
        Riak
        CouchDB/CouchBase
        Cassandra
      ■ User resolvable conflicts
        Riak
        Voldemort
        CouchDB/CouchBase (but unreliable)
      ■ Active anti-entropy
        Riak (Soon)
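
The "strict quorums" in the strong-consistency entry rest on a simple counting argument: with N replicas, W write acknowledgements and R read responses, every read set overlaps every write set when R + W > N. The sketch below checks that rule by brute force for small clusters. The function names and the N/R/W parameter names are assumptions made for this example, not settings of a specific database.

```python
from itertools import combinations


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """The counting rule: read and write sets must overlap."""
    return r + w > n


def every_read_overlaps_every_write(n: int, r: int, w: int) -> bool:
    """Brute-force check over all possible read and write sets of n nodes."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


# N=3 with R=2, W=2: any read reaches at least one node holding the latest write.
strict = every_read_overlaps_every_write(3, 2, 2)

# N=3 with R=1, W=1: a read can miss the only node that took the write.
weak = every_read_overlaps_every_write(3, 1, 1)
```

With sloppy quorums and hinted handoff, a write may be accepted by a fallback node outside the usual replica set, so this overlap guarantee no longer holds; that is the trade for higher write availability.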
  19. Thank you
      Rune Skou Larsen
      rsl@trifork.com
      Twitter: RuneSkouLarsen