Eventual Consistency - Designing Fail Proof Systems

Each day our systems deal with more data, more traffic and more complex tasks. Every large-scale system has to integrate with other internal or external systems, and at the same time it should provide up-to-date information. On the other hand, the business demands high availability and reliability. Together these requirements make such systems difficult to design; in particular, it is difficult to keep the data consistent.

It often happens that not all requirements can be met technically. Fortunately, there are theoretical results that define what can and cannot be achieved, and under what circumstances. During the talk I will present some data consistency problems typical of high-traffic systems and sketch possible solutions. I will also say a few words about the CAP theorem, Eventual Consistency, At-Least-Once Delivery and Self-Healing Systems.

  1. EVENTUAL CONSISTENCY: DESIGNING FAILPROOF SYSTEMS
     Grzegorz Skorupa, Software Architect
     Illustration: Getty Images
  2. PROBLEMS EVERYWHERE …
     • The system fails about once every two days for no visible reason …
     • The system failed because a developer expected each post to have an author, but the author was not in the DB
     • The problem occurs for Max … but not for any other user …
     • One can find any article except for the one about …
     Data consistency?
  3. AGENDA
     • Problem
     • CAP Theorem
     • Eventual Consistency
     • Building an Eventually Consistent App
       • At Least Once
       • Source of Truth
       • Restoring Consistency
  4. IT SYSTEMS TODAY
     Business requirements:
     • Highly available
     • Serve a large number of users
     • Do complex tasks …
     • … on large data sets
     • Provide correct data
     • Provide up-to-date data
     Technical challenges:
     • Scalable
     • Distributed
     • No Single Point of Failure
     • Big Data
     • Data consistency
  5. FRIENDS EXAMPLE
     [Diagram: four views: friends of Alice, friends of Bob, friend invitations for Alice (none), friend invitations for Bob.]
  6. FRIENDS EXAMPLE
     [Same diagram, with the friend records Alice -[friend]-> ??? and Bob -[friend]-> ??? added.]
  7. FRIENDS EXAMPLE
     [Same diagram, with the invitation records ??? -[invite]-> Bob and ??? -[invite]-> Alice added.]
  8. SOME CONSTRAINTS
     • We do not want to see two friend invitations from the same person
     • We do not want to be friends twice
     • If we are friends, both of us should see the other person in our list of friends
  9. FRIENDSHIP RELATIONSHIP
     Alice invites Bob: Alice -[invite]-> Bob
  10. FRIENDSHIP RELATIONSHIP
      Alice invites Bob: Alice -[invite]-> Bob
      Bob accepts invitation: Alice -[friend]-> Bob
  11. FRIENDSHIP RELATIONSHIP
      Alice invites Bob: Alice -[invite]-> Bob
      Bob accepts invitation: Alice -[friend]-> Bob, Bob -[friend]-> Alice
  12. FRIENDSHIP RELATIONSHIP
      Alice invites Bob: Alice -[invite]-> Bob
      Bob accepts invitation: Alice -[friend]-> Bob, Bob -[friend]-> Alice, and the invitation Alice -[invite]-> Bob is removed
  13. THE NIGHTMARE – INCONSISTENCIES IN DATA
      • Alice is friends with Bob
  14. THE NIGHTMARE – INCONSISTENCIES IN DATA
      • Alice is friends with Bob, but Bob is not friends with Alice
  15. THE NIGHTMARE – INCONSISTENCIES IN DATA
      • Alice is friends with Bob, but Bob is not friends with Alice
      • Alice is friends with Bob
  16. THE NIGHTMARE – INCONSISTENCIES IN DATA
      • Alice is friends with Bob, but Bob is not friends with Alice
      • Alice is friends with Bob, but Bob still sees the friend invitation from Alice
  17. WELL, WE HAVE THE ACID APPROACH
      Begin transaction
      • Alice -[friend]-> Bob
      • Remove invitation Alice -[invite]-> Bob
      • Bob -[friend]-> Alice
      Commit transaction
  18. NO ACID? WHAT TO DO?
      A)
      • File upload: 1. Save file 2. Store file path in DB
      • Deleting file: 1. Remove from DB 2. Remove file
      VS
      B)
      • File upload: 1. Store file path in DB 2. Save file
      • Deleting file: 1. Remove file 2. Remove from DB
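The two orderings on slide 18 differ in what a crash between the two steps leaves behind. The sketch below simulates such a crash for both upload variants, using a plain dict as a stand-in for the DB. The names `upload_a`, `upload_b`, `dangling_rows` and `CrashError` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


class CrashError(Exception):
    """Simulated failure between step 1 and step 2."""


def upload_a(db, directory, name, data, crash=False):
    """Option A: save the file first, then store its path in the DB."""
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    if crash:
        raise CrashError("failed after saving the file")
    db[name] = path


def upload_b(db, directory, name, data, crash=False):
    """Option B: store the path in the DB first, then save the file."""
    path = os.path.join(directory, name)
    db[name] = path
    if crash:
        raise CrashError("failed after writing the DB row")
    with open(path, "wb") as handle:
        handle.write(data)


def dangling_rows(db):
    """DB rows that point at a file which does not exist."""
    return [name for name, path in db.items() if not os.path.exists(path)]


def demo():
    with tempfile.TemporaryDirectory() as directory:
        db_a, db_b = {}, {}
        for upload, db in ((upload_a, db_a), (upload_b, db_b)):
            try:
                upload(db, directory, upload.__name__ + ".txt", b"x", crash=True)
            except CrashError:
                pass  # the "system" went down between the two steps
        return dangling_rows(db_a), dangling_rows(db_b)
```

In this simulation option A leaves at worst an orphaned file that nothing references, while option B leaves a DB row pointing at a file that was never written.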
  19. CAP THEOREM
      • Consistency
      • Availability
      • Partition tolerance
      We can have only 2 out of 3
      Gilbert S., Lynch N.: Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services, SIGACT News v. 33, n. 2, 2002
  20. TWO PHASE COMMIT
      [Sequence diagram between Coordinator and Cohort]
      1. Coordinator → Cohort: QUERY TO COMMIT
      2. Cohort → Coordinator: VOTE YES/NO (cohort state: prepare / abort)
      3. Coordinator → Cohort: COMMIT/ROLLBACK (coordinator state: commit / abort)
      4. Cohort → Coordinator: ACKNOWLEDGEMENT (cohort state: commit / abort)
      5. Coordinator: end
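The message exchange on slide 20 can be sketched in a few lines. This is a minimal in-memory illustration, not a production protocol: it ignores timeouts, crashes and persistence, and the `Cohort` class and `two_phase_commit` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    will_vote_yes: bool = True
    state: str = "init"

    def query_to_commit(self):
        """Phase 1: prepare and vote."""
        self.state = "prepared" if self.will_vote_yes else "aborted"
        return self.will_vote_yes

    def finish(self, commit):
        """Phase 2: apply the coordinator's decision and acknowledge it."""
        self.state = "committed" if commit else "aborted"
        return "ack"


def two_phase_commit(cohorts):
    """Coordinator: commit only if every cohort voted yes."""
    votes = [cohort.query_to_commit() for cohort in cohorts]
    decision = all(votes)
    acks = [cohort.finish(decision) for cohort in cohorts]
    if any(ack != "ack" for ack in acks):
        raise RuntimeError("missing acknowledgement")
    return decision
```

A single "no" vote aborts the transaction on every cohort. In a real deployment the coordinator has to wait for every vote and acknowledgement, which is where the availability cost discussed on the next slide comes from.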
  21. CAP AND TWO PHASE COMMIT
      When you do a Two Phase Commit:
      • You are sacrificing Availability: it blocks when a node is down
      • Scalability suffers, performance suffers
      • It has a Single Point of Failure
      • It does not guarantee Consistency
      http://blog.thislongrun.com/2015/04/the-unclear-cp-vs-ca-case-in-cap.html
  22. SO … OUR HIGH TRAFFIC SYSTEM CAN’T BE CONSISTENT
      I am done, we have no consistency anyway
      But what will happen WHEN 1 succeeds and 2 fails?
      [Diagram: the three operations: Alice -[friend]-> Bob, remove invitation Alice -[invite]-> Bob, Bob -[friend]-> Alice]
  23. CAP: AVAILABILITY AND CONSISTENCY
      Availability: every request received by a non-failing node must result in a (successful) response
      Consistency: there exists a total order of all operations such that each operation looks as if it were completed at a single instant
      SET V = old value
      SET V = new value
      READ V → old value (from node 1)
      READ V → new value (from node 2)
  24. UNDERSTANDING CAP THEOREM
      [Diagram: two applications, each writing to and reading from its own node (labelled 1 and 3); the nodes synchronize data (2).]
  25. BAD CAP, BAD MATH (1)
      [Diagram: an application (Nginx, PHP) writing to and reading from a database.]
  26. WHAT IS A NODE?
      [Diagram: writes and reads flowing between a Load Balancer, two Nginx (PHP) servers, MySQL (Master), MySQL (Replica), Redis, GoLang and the Facebook API.]
  27. BAD CAP, BAD MATH (2)
      [Diagram: one Primary receiving writes and four Secondaries serving reads.]
  28. BAD CAP, BAD MATH (2)
      [Diagram: one Primary receiving writes and four Secondaries serving reads.]
  29. BAD CAP, BAD MATH (2)
      [Same diagram with two elements crossed out (x).]
  30. PARTITION VS. FAILURE
      [Diagram: four applications exchanging reads/writes; one element is crossed out (x).]
  31. WHAT ARE OUR HIGH TRAFFIC SYSTEMS?
      • They are consistent most of the time
      • They tolerate partitioning to some extent
      • They are available most of the time
      From a mathematical standpoint they are neither CA, nor CP, nor AP
  32. DESIGNING SYSTEMS – RELATION TO CAP
      Available & Partition Tolerant:
      • POST /news
      • GET /news
      Consistent & Partition Tolerant:
      • POST /friends/requests
      • GET /friends/requests
  33. BUT … WE NEED SOME SORT OF CONSISTENCY: EVENTUAL CONSISTENCY
      "The system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value."
      Vogels, Werner. "Eventually consistent." Queue 6.6 (2008): 14-19.
  34. USEFUL APPROACHES
      • Eventually consistent instead of Consistent
      • At least once + Idempotency instead of Exactly once
      • Source of truth instead of Absolute truth
      • Controlled inconsistency – A can live without B but B can’t live without A
      • Restore Consistency procedures
      • Minimize probability of inconsistent data
  35. WHAT ORDER TO APPLY?
      [Diagram: the three operations: Alice -[friend]-> Bob, remove invitation Alice -[invite]-> Bob, Bob -[friend]-> Alice]
  36. FAILURE SCENARIOS – REMOVE INVITATION FIRST (1)
      If the system fails after the first operation:
      • Friend request will be gone
      • Alice will not be friends with Bob
      If the system fails after the second operation:
      • Alice will see Bob in her friends list
      • Bob will not see Alice in his friends list
  37. FAILURE SCENARIOS – BOB FIRST (2)
      If the system fails after the first operation:
      • Bob will see Alice in his friends list
      • Alice will not see Bob in her friends list
      • Bob will still see the friend request from Alice
      If the system fails after the second operation:
      • Alice and Bob will see each other in their friends lists
      • Bob will still see the friend request from Alice
  38. FAILURE SCENARIOS – ALICE FIRST (3)
      If the system fails after the first operation:
      • Alice will see Bob in her friends list
      • Bob will not see Alice in his friends list
      • Bob will still see the friend request from Alice
      If the system fails after the second operation:
      • Alice and Bob will see each other in their friends lists
      • Bob will still see the friend request from Alice
  39. 1ST STEP: CHOOSE THE BEST ORDER
      BOB FIRST:
      • Failure after the first operation: Bob will see Alice in his friends list, Alice will not see Bob in her friends list, Bob will still see the friend request from Alice
      • Failure after the second operation: Alice and Bob will see each other in their friends lists, Bob will still see the friend request from Alice
      ALICE FIRST:
      • Failure after the first operation: Alice will see Bob in her friends list, Bob will not see Alice in his friends list, Bob will still see the friend request from Alice
      • Failure after the second operation: Alice and Bob will see each other in their friends lists, Bob will still see the friend request from Alice
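The failure scenarios on slides 36 to 39 can be enumerated mechanically. The sketch below tries every ordering of the three writes, simulates a crash after the first and after the second write, and keeps the orderings in which the invitation always survives the crash, so that the acceptance can be retried. Treating "the invitation still exists" as the retry criterion is an assumption of this sketch, and all names are hypothetical.

```python
from itertools import permutations

# The three writes performed when Bob accepts Alice's invitation.
STEPS = {
    "alice_sees_bob": lambda s: s["friends"].add(("Alice", "Bob")),
    "bob_sees_alice": lambda s: s["friends"].add(("Bob", "Alice")),
    "remove_invite": lambda s: s["invites"].discard(("Alice", "Bob")),
}


def state_after_crash(order, completed_steps):
    """Apply only the first `completed_steps` writes, as if the system failed right after them."""
    state = {"friends": set(), "invites": {("Alice", "Bob")}}
    for name in order[:completed_steps]:
        STEPS[name](state)
    return state


def can_retry(state):
    """A retry of the acceptance needs the invitation to still be there."""
    return ("Alice", "Bob") in state["invites"]


def retry_safe_orders():
    """Orderings where a crash after step 1 or step 2 can still be retried."""
    safe = set()
    for order in permutations(STEPS):
        if all(can_retry(state_after_crash(order, n)) for n in (1, 2)):
            safe.add(order)
    return safe
```

Under this criterion only the two orderings that remove the invitation last survive, which matches the "Bob first" and "Alice first" candidates compared on slide 39.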
  40. TWO GENERALS PROBLEM
      User:
      1. POST /posts "my new post"
      2. Wait for response
      5. Where is my response?
      6. Should I resend?
      Server:
      3. Create the post
      4. Send success response (marked X: the response never reaches the user)
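One common way to make "Should I resend?" safe is to let the client attach a request identifier that the server uses to deduplicate. The slides do not prescribe this technique; the sketch below is an assumption-laden illustration, with `PostServer`, `create_post` and `post_with_retry` as hypothetical names and an in-memory dict in place of real storage.

```python
import uuid


class PostServer:
    """In-memory stand-in for the server behind POST /posts."""

    def __init__(self):
        self.posts = []
        self._responses = {}  # request id -> response already produced

    def create_post(self, request_id, text):
        # A resend with a known id returns the stored response
        # instead of creating the post a second time.
        if request_id in self._responses:
            return self._responses[request_id]
        self.posts.append(text)
        response = {"status": 201, "post_id": len(self.posts)}
        self._responses[request_id] = response
        return response


def post_with_retry(server, text, lose_first_response=True):
    """Client: generate the id once and reuse it for every resend."""
    request_id = str(uuid.uuid4())
    response = server.create_post(request_id, text)
    if lose_first_response:
        # The first response was "lost", so the client resends the same request.
        response = server.create_post(request_id, text)
    return response
```

The client delivers the request at least once, and the server-side deduplication makes the repeated delivery harmless.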
  41. 2ND STEP: MAKE THE ACTION IDEMPOTENT
      IF (EXISTS(invitation {from: Alice, to: Bob})) {
          UPSERT {f1: Alice, f2: Bob} // NOT INSERT!!
          UPSERT {f1: Bob, f2: Alice} // NOT INSERT!!
          DELETE invitation {from: Alice, to: Bob}
      }
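The pseudocode on slide 41 translates almost directly into runnable form. In this sketch Python sets play the role of the friends and invitations tables (adding to a set behaves like an upsert), and the `crash_after` parameter is a hypothetical hook for simulating a failure after a given step.

```python
class Crash(Exception):
    """Simulated failure in the middle of the operation."""


def accept_invitation(friends, invites, sender, receiver, crash_after=None):
    """Idempotent acceptance: safe to call again after a partial failure."""
    if (sender, receiver) not in invites:
        return False  # nothing to do: already handled or never invited
    steps = [
        lambda: friends.add((sender, receiver)),      # upsert, not insert
        lambda: friends.add((receiver, sender)),      # upsert, not insert
        lambda: invites.discard((sender, receiver)),  # delete the invitation last
    ]
    for index, step in enumerate(steps, start=1):
        step()
        if crash_after == index:
            raise Crash(f"failed after step {index}")
    return True
```

Calling the function again after a crash repeats the steps that already ran without duplicating anything and then completes the remaining ones.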
  42. 3RD STEP: MAKE SURE YOUR SYSTEM CAN WORK WITH INCONSISTENT DATA
      • What if Alice -[friend]-> Bob is there but Bob -[friend]-> Alice is not?
      • What if Alice -[friend]-> Bob is there but the invitation Alice -[invite]-> Bob is also there?
  43. 4TH STEP: DON’T LET OTHERS BREAK THE SYSTEM
      Write a contract:
      • Alice and Bob mentioned in the invitation MUST have respective user accounts in the system
      • The friend relation SHOULD always be both ways
      • When there is a friend relation there SHOULD be no invitation
      Should – respects eventual consistency
      Must – is always consistent
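A contract like the one on slide 43 can be turned into an executable check that separates MUST violations (never acceptable) from SHOULD violations (temporarily acceptable under eventual consistency). The function name, the data shapes and the message texts below are assumptions made for illustration.

```python
def check_contract(users, friends, invites):
    """Return (must_violations, should_violations) for the friends contract."""
    must, should = [], []
    for sender, receiver in sorted(invites):
        # MUST: both sides of an invitation have user accounts.
        for user in (sender, receiver):
            if user not in users:
                must.append(f"invitation {sender}->{receiver}: unknown user {user}")
        # SHOULD: no invitation once the two are friends.
        if (sender, receiver) in friends or (receiver, sender) in friends:
            should.append(f"invitation {sender}->{receiver} exists although they are friends")
    for first, second in sorted(friends):
        # SHOULD: the friend relation goes both ways.
        if (second, first) not in friends:
            should.append(f"friend {first}->{second} has no reverse entry")
    return must, should
```

A monitoring job could treat any MUST violation as an alert, while SHOULD violations become input for a repair procedure.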
  44. TEST AGAINST CONTRACT
      • FIT: https://medium.com/netflix-techblog/fit-failure-injection-testing-35d8e2a9bb2
      • Chaos Monkey: https://github.com/Netflix/chaosmonkey

      // Code
      function acceptInvitation($from, $to) {
          $invite = $this->invites->find($from, $to);
          if ($invite) {
              // befriend() has to behave like an upsert so that a retry is harmless
              $this->friends->befriend($from, $to);
              $this->friends->befriend($to, $from);
              // the invitation is removed last, so a failed attempt can be retried
              $this->invites->remove($invite);
          }
      }

      // Test
      function testIdempotency() {
          $from = 'Alice';
          $to = 'Bob';
          // simulate an earlier attempt that failed after the first befriend()
          $this->invites->create($from, $to);
          $this->friends->befriend($from, $to);

          $response = $this->controller->acceptInvitation($from, $to);

          $this->assertEquals(200, $response->code());
          $this->assertTrue($this->friends->isFriend($from, $to));
          $this->assertTrue($this->friends->isFriend($to, $from));
          $this->assertNull($this->invites->find($from, $to));
      }
  45. YOU HAVE TO THINK ABOUT THE WHOLE FUNCTIONALITY
      • What about deleting a friend?
      • What if Alice could see her pending invitations?
      • What if Alice could cancel the invitation?
      • What if both send an invitation to each other?
      • And finally: what about race conditions?
  46. EVENT SOURCING, CQRS
      Change log (write model):
      • Alice invited Bob
      • Bob declined invitation from Alice
      • Alice cancelled invitation to Bob
      • Bob invited Alice
      • Alice accepted invitation from Bob
      Result (read model): Alice and Bob are friends
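The relationship between the change log and the result on slide 46 is a fold: the current state is derived by replaying the events in order. The sketch below encodes the five events from the slide as tuples; the tuple layout and the `replay` function are hypothetical, and the handling of each event type is one plausible interpretation.

```python
CHANGE_LOG = [
    ("Alice", "invited", "Bob"),
    ("Bob", "declined", "Alice"),
    ("Alice", "cancelled", "Bob"),
    ("Bob", "invited", "Alice"),
    ("Alice", "accepted", "Bob"),
]


def replay(events):
    """Rebuild the read model (invitations, friendships) from the change log."""
    invites, friends = set(), set()
    for actor, action, other in events:
        if action == "invited":
            invites.add((actor, other))
        elif action == "declined":    # actor declines the invitation from other
            invites.discard((other, actor))
        elif action == "cancelled":   # actor cancels the invitation sent to other
            invites.discard((actor, other))
        elif action == "accepted":    # actor accepts the invitation from other
            if (other, actor) in invites:
                invites.discard((other, actor))
                friends.update({(actor, other), (other, actor)})
    return invites, friends
```

Replaying the log from the slide leaves no open invitations and records the friendship in both directions.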
  47. CHANGE LOG
      1. Add pending operation to change log
      2. Handle operation (create friend associations, remove invitation)
      3. Commit operation in change log
      Change log entry: "Bob accepts invitation from Alice" PENDING → "Bob accepts invitation from Alice" SUCCESS
  48. FAILURE IN THE MIDDLE OF OPERATION: REPLAY …
      1. Read pending operations from change log
      2. Handle operation (create friend associations, remove invitation) ← Idempotent!
      3. Commit operation in change log
      Change log entry: "Bob accepts invitation from Alice" PENDING → "Bob accepts invitation from Alice" SUCCESS
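The PENDING/SUCCESS protocol of slides 47 and 48 can be sketched with a list as the change log. Everything here is in memory and the names (`submit`, `replay_pending`, `handle_accept`, `crash_midway`) are hypothetical; a real system would persist the log before handling the operation.

```python
def handle_accept(friends, invites, sender, receiver):
    """Idempotent handler: create both friend associations, remove the invitation."""
    friends.update({(sender, receiver), (receiver, sender)})
    invites.discard((sender, receiver))


def submit(log, friends, invites, sender, receiver, crash_midway=False):
    entry = {"op": (sender, receiver), "status": "PENDING"}
    log.append(entry)                      # 1. add pending operation to the log
    if crash_midway:
        friends.add((sender, receiver))    # only part of step 2 ran before the failure
        return entry
    handle_accept(friends, invites, sender, receiver)  # 2. handle operation
    entry["status"] = "SUCCESS"            # 3. commit operation in the log
    return entry


def replay_pending(log, friends, invites):
    """Recovery: re-run every operation that never reached SUCCESS."""
    for entry in log:
        if entry["status"] == "PENDING":
            sender, receiver = entry["op"]
            handle_accept(friends, invites, sender, receiver)
            entry["status"] = "SUCCESS"
```

Because the handler is idempotent, replaying an operation that had already been partly applied leaves the data in the same state as an uninterrupted run.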
  49. PARTITION TOLERANCE: MERGING OF CHANGE LOGS
      [Diagram: change logs on Node 1 and Node 2. Entries shown: "Alice invites Bob" SUCCESS; "Bob accepts invitation from Alice" PENDING; "Alice deletes friend invitation sent to Bob" PENDING. After merge / conflict solving: "Bob accepts invitation from Alice" SUCCESS; "Alice deletes friend invitation sent to Bob" DECLINED.]
  50. CHEAPER SOLUTION: TRAFFIC REDIRECTION
      [Diagram: change logs on Node 1 and Node 2. Entries shown: "Alice invites Bob" SUCCESS; "Bob accepts invitation from Alice" PENDING, then SUCCESS; "Alice deletes friend invitation sent to Bob" PENDING. One element is crossed out (x).]
  51. READING STALE DATA
      [Diagram: a Master with two Replicas, replicated asynchronously (ASYNC). "Bob accepts invitation from Alice" is written; one read returns "Alice and Bob are friends", another returns "Alice and Bob are NOT friends".]
      Synchronous replication is slow
  52. FAST STORAGE THAT IS UP TO DATE
      [Diagram: a Master with two Replicas plus a cache with TTL. "Bob accepts invitation from Alice" is written in two numbered steps (1, 2); the read returns "Alice and Bob are friends".]
      https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/#keys-only-global-query-followed-by-lookup-by-key
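One plausible reading of slide 52 is that each write goes to the master and to a short-lived cache, so that reads see fresh data while the asynchronous replicas catch up. The sketch below follows that reading, which is an assumption rather than something the slide spells out; `TtlCache`, `write` and `read` are hypothetical names and dicts stand in for the master and a replica.

```python
import time


class TtlCache:
    """Tiny cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value


def write(master, cache, key, value):
    master[key] = value    # durable write to the master
    cache.put(key, value)  # fresh copy for reads during the replication lag


def read(cache, replica, key):
    cached = cache.get(key)
    return cached if cached is not None else replica.get(key)
```

For this to hide stale reads, the TTL has to be longer than the usual replication lag; once an entry expires, reads fall back to whatever the replica holds.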
  53. MASTER FAILS
      [Same diagram as slide 52 with the failed element crossed out (x).]
      ELECT NEW MASTER
  54. CACHE FAILS
      [Same diagram as slide 52 with the failed element crossed out (x).]
      (A) READ FROM REPLICAS
      (B) SPIN UP A NEW CACHE
  55. 5TH STEP: RETURN TO CONSISTENT STATE
      A: Wait for Bob to fix it
      B: Write a vacuum script
      • Search the most recently processed friend requests
      • Verify consistency
        • Add missing {f1: Alice, f2: Bob} entry
        • Or maybe remove it?
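A vacuum script along the lines of option B might look like the sketch below. It assumes the system keeps a list of recently accepted friend requests and chooses to add the missing entry rather than remove the existing one; that choice is a policy decision the slide leaves open. The function and parameter names are hypothetical.

```python
def vacuum(friends, recently_accepted):
    """Add missing friend entries for recently accepted requests; return the repairs made."""
    repaired = []
    for sender, receiver in recently_accepted:
        for pair in ((sender, receiver), (receiver, sender)):
            if pair not in friends:
                friends.add(pair)      # repair policy of this sketch: add, do not remove
                repaired.append(pair)  # keep a record so the repair can be logged
    return repaired
```

Running the script twice is harmless: the second pass finds nothing to repair.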
  56. A SELF-HEALING SYSTEM
      A system that can work with inconsistent data AND applies various strategies to ensure eventual consistency
      It basically does three things:
      1. Allow for inconsistency
      2. Discover inconsistencies
      3. Fix inconsistencies
  57. SELF-HEALING DONE BADLY (1)
      try {
          INSERT A
          INSERT B
      } catch (Exception $e) {
          // rollback
          DELETE A
          DELETE B
      }
      Rollback must be a separate process
  58. SELF-HEALING DONE BADLY (2)
      $tries = 0;
      while (true) {
          $succeeded = A();
          if ($succeeded) {
              break;
          }
          $tries++;
          if ($tries > MAX_TRIES) {
              // log it
              // throw exception or break
          }
      }

      while (true) {
          $succeeded = A(); // A may fail due to race conditions
          if ($succeeded) {
              break;
          }
      }
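The difference between the two loops on slide 58 is whether the retry ever gives up. A bounded retry that logs each failure and raises once the limit is reached could be sketched as follows; the `retry` helper, its parameters and the logger name are hypothetical.

```python
import logging
import time

logger = logging.getLogger("retry")


def retry(action, max_tries=3, delay_seconds=0.0):
    """Call `action` until it returns a truthy value; give up after `max_tries` attempts."""
    for attempt in range(1, max_tries + 1):
        if action():
            return attempt
        logger.warning("attempt %d of %d failed", attempt, max_tries)
        time.sleep(delay_seconds)
    raise RuntimeError(f"action failed {max_tries} times")
```

The function returns the number of the attempt that succeeded, and the raised error makes a persistent failure visible instead of spinning forever.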
  59. SOURCE OF TRUTH APPROACH
      Each user should have a unique http://example.com/name.surname address
      We are using Mongo and Redis
      Register user:
      1. Find the next free unique name.surname.X (name.surname.1, name.surname.2, name.surname.3, …)
      2. Try to store in Mongo (it has a unique index) – N tries max
      3. Store the name in Redis (for performance)
      But what if our system fails before 3?
  60. SOURCE OF TRUTH
      If one asks for http://example.com/name.surname
      • We check name.surname against Redis
      • Suppose it is not there
      • Do we know there is no such user? No!
  61. AD HOC SELF-HEALING
      • Ask MongoDB for the user with name.surname
      • Not there? Then no such user → return 404
      • Is there? Apply self-healing:
        1. Revert to source of truth
        2. Fix data according to source of truth
        3. Return valid result
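The lookup described on slides 60 and 61 can be sketched with two dicts standing in for Redis (the fast copy) and Mongo (the source of truth). No real database client is used here; `find_user`, the dict parameters and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("self_healing")


def find_user(slug, redis_cache, mongo_users):
    """Return (status, user); repair the cache from the source of truth when needed."""
    cached = redis_cache.get(slug)
    if cached is not None:
        return 200, cached
    user = mongo_users.get(slug)  # 1. revert to the source of truth
    if user is None:
        return 404, None          # not in the source of truth: no such user
    # Log the repair so that someone can monitor how often it happens.
    logger.warning("cache entry for %s was missing; repairing", slug)
    redis_cache[slug] = user      # 2. fix data according to the source of truth
    return 200, user              # 3. return a valid result
```

A missing cache entry is therefore never interpreted as "no such user"; only the source of truth can answer that question.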
  62. DO NOT NINJA CODE IT
      • When there is an inconsistency you must be informed of it
      • Multiple inconsistencies suggest a bigger problem
      • Your consistency checking/fixing should log properly – someone has to monitor it
  63. REMEMBER
      Design Consistent and Available systems that:
      • Become Eventually Consistent during failures
      • Ensure that Consistency is restored
      CAP does not disallow this
  64. CHEAT SHEET
      1. Allow for inconsistency
      2. Design for inconsistent data
      3. Test against the contract
      4. Ensure eventual consistency (build a self-healing system)
      5. Know the behavior of functionality when failures happen
  65. EVENTUAL CONSISTENCY: DESIGNING FAILPROOF SYSTEMS
      Grzegorz Skorupa, Software Architect
      Illustration: Getty Images
