Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016

You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote, even if nobody else wrote after you. One way this can happen is if the time on the machines in your cluster is not synchronized closely enough. This is called clock skew, and it is just one of the ways this anomaly can occur.
In this talk we'll dive into how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.

About the Speaker
Donny Nadolny, Senior Developer, PagerDuty

Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.

Published in: Software

  1. Clock Skew, and other annoying realities in distributed systems. Donny Nadolny, donny@pagerduty.com, #CassandraSummit, 2016-09-08
  2. Clock Skew and Other Annoying Realities in Distributed Systems
  3. Should I Care?
     Probably not: • user tracking / metrics • hit counter / impressions • log data
     Yes: • incident management (PagerDuty) • financial info / banking / stocks • online store
  4. Should I Care?
     Probably not (individual data is low impact): • user tracking / metrics • hit counter / impressions • log data
     Yes (individual data is high impact): • incident management (PagerDuty) • financial info / banking / stocks • online store
  5. Introduction to Reads & Writes
  6. Cassandra Write • Cluster: 5 nodes • Replication factor: 3 • Consistency: QUORUM
  7. Cassandra Write: INSERT INTO table1 …
  8. Cassandra Write: INSERT INTO table1 … [diagram: three "write foo" messages]
  9. Cassandra Write: INSERT INTO table1 … [diagram: three "write foo" messages; one replica holds value: foo]
  10. Cassandra Write: INSERT INTO table1 … [diagram: three "write foo" messages; two replicas hold value: foo]
  11. Cassandra Write: INSERT INTO table1 … Success [diagram: three "write foo" messages; two replicas hold value: foo]
  12. Cassandra Write: INSERT INTO table1 … Success [diagram: three "write foo" messages; two replicas hold value: foo]
  13. Cassandra Read: SELECT * FROM table1 WHERE … [diagram: two replicas hold value: foo]
  14. Cassandra Read: SELECT * FROM table1 WHERE … [diagram: two "read" messages; two replicas hold value: foo]
  15. Cassandra Read: SELECT * FROM table1 WHERE … [diagram: two "read" messages; two replicas hold value: foo]
  16. Cassandra Read: SELECT * FROM table1 WHERE … [diagram: two "read" messages; two replicas hold value: foo]
  17. Cassandra Read: SELECT * FROM table1 WHERE … Success, value: foo [diagram: two "read" messages; two replicas hold value: foo]
  18. Cassandra Update: UPDATE table1 … [diagram: two replicas hold value: foo, t=5]
  19. Cassandra Update: UPDATE table1 … [diagram: three "write bar, t=7" messages; two replicas hold value: foo, t=5]
  20. Cassandra Update: UPDATE table1 … [diagram: three "write bar, t=7" messages; two replicas hold both value: foo, t=5 and value: bar, t=7]
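
The write, read, and update walkthrough above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not Cassandra's implementation: `quorum`, `Cell`, `reconcile`, `write`, and `read` are hypothetical names, the three replicas are a plain list, and tie-breaking between equal timestamps is ignored.

```python
from dataclasses import dataclass
from typing import Optional


def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int


def reconcile(a: Optional[Cell], b: Optional[Cell]) -> Optional[Cell]:
    """Last write wins: keep the cell with the higher timestamp."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a.timestamp >= b.timestamp else b


# Three replicas own the row (replication factor 3); None means "no data yet".
replicas = [None, None, None]


def write(cell: Cell, reachable: list) -> bool:
    """Apply the write on the reachable replicas; succeed if a quorum has it."""
    for i in reachable:
        replicas[i] = reconcile(replicas[i], cell)
    return len(reachable) >= quorum(len(replicas))


def read(contacted: list) -> Optional[Cell]:
    """Ask the contacted replicas and return the newest cell among them."""
    result = None
    for i in contacted:
        result = reconcile(result, replicas[i])
    return result


insert_ok = write(Cell("foo", 5), reachable=[0, 1])  # replica 2 missed it
update_ok = write(Cell("bar", 7), reachable=[1, 2])  # replica 0 missed it
latest = read(contacted=[0, 2])  # any 2 of 3 replicas overlap the last write
print(insert_ok, update_ok, latest)
```

With a replication factor of 3 the quorum is 2, so any two replicas contacted by a read include at least one that took part in the latest successful quorum write. The model relies entirely on the timestamps being trustworthy.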
  21. Successful Write?
  22. Bank Example: INSERT INTO balances … [diagram: clocks read t=5 and t=2; three "write …" messages; replicas hold savings: 10000, t=5]
  23. Bank Example: INSERT INTO balances … Success [diagram: clocks read t=5 and t=2; replicas hold savings: 10000, t=5]
  24. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 [diagram: clocks read t=6 and t=3; two "read" messages; replicas hold savings: 10000, t=5]
  25. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 • Update to 2,000 [diagram: clocks read t=7 and t=4; "write savings: 2000, t=4" messages; replicas hold both savings: 10000, t=5 and savings: 2000, t=4]
  26. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 • Update to 2,000 • Dispense 8,000 cash. Success [diagram: clocks read t=7 and t=4; replicas hold both savings: 10000, t=5 and savings: 2000, t=4]
  27. Clock Skew • A successful write can really fail • Your clocks are not perfectly synchronized • “I’m running NTP, I’m good” - oh really?
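
The bank example above can be reproduced with a small simulation. It is a sketch under assumptions: `Replica`, `quorum_write`, and `quorum_read` are hypothetical names, and the only behaviour modelled is "the higher timestamp wins", with the timestamps 5 and 4 taken from the slides.

```python
class Replica:
    """Stores one cell and keeps whichever write has the higher timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
        return True  # the replica acknowledges even when the write loses


def quorum_write(replicas, value, timestamp):
    acks = sum(replica.write(value, timestamp) for replica in replicas)
    return acks >= len(replicas) // 2 + 1


def quorum_read(replicas):
    # Contact a quorum of replicas (here simply the first two).
    contacted = replicas[: len(replicas) // 2 + 1]
    newest = max(contacted, key=lambda r: r.timestamp)
    return newest.value


replicas = [Replica(), Replica(), Replica()]

# The insert is stamped by a clock that reads t=5.
insert_ok = quorum_write(replicas, 10000, timestamp=5)

# The ATM reads 10,000 and withdraws 8,000. The update happens later in real
# time, but it is stamped by a clock that lags behind and reads t=4.
balance = quorum_read(replicas)
update_ok = quorum_write(replicas, balance - 8000, timestamp=4)

print(insert_ok, update_ok, quorum_read(replicas))  # True True 10000
```

Both writes report success, yet the balance still reads 10,000 after the cash has been dispensed, because the later write carries the older timestamp.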
  28. Failed Write?
  29. Failed Write? INSERT INTO stock_trades … [diagram: "write trade 123 …" messages; replicas hold trade 123: buy 100 BRKA]
  30. Failed Write? INSERT INTO stock_trades … [diagram: "write trade 123 …" messages; replicas hold trade 123: buy 100 BRKA]
  31. Failed Write? Connection error [diagram: "write trade 123 …" messages; replicas hold trade 123: buy 100 BRKA]
  32. Failed Write? INSERT INTO stock_trades …
  33. Failed Write? Connection Error / Write Timeout
  34. Failed Write? INSERT INTO stock_trades … [diagram: replicas hold trade 245: buy 100 BRKA]
  35. Failed Write? [diagram: replicas hold trade 245: buy 100 BRKA; hints: tell nodeA trade 123 …, tell nodeB trade 123 …, tell nodeC trade 123 …]
  36. Failed Write? [diagram: "write trade 123 …" messages; replicas hold both trade 245: buy 100 BRKA and trade 123: buy 100 BRKA]
  37. Eventual Consistency • Full repair • Read repair chance • Hinted handoff
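
The stock trade sequence above can be sketched as a toy model in which a write fails from the client's point of view but is delivered later. `Cluster`, `insert`, `replay_hints`, and `place_order` are hypothetical names, and the hint queue is a deliberately simplified stand-in for hinted handoff. The second call shows one possible mitigation, retrying with the same trade id so that the retry is idempotent.

```python
class Cluster:
    """Toy model: a write may fail for the client yet still land later."""

    def __init__(self):
        self.trades = {}  # trade_id -> order, as seen by a later read
        self.hints = []   # writes waiting to be replayed

    def insert(self, trade_id, order, fail=False):
        if fail:
            # The client gets an error, but the mutation is kept as a hint.
            self.hints.append((trade_id, order))
            raise TimeoutError("write timeout")
        self.trades[trade_id] = order

    def replay_hints(self):
        for trade_id, order in self.hints:
            self.trades[trade_id] = order
        self.hints.clear()


def place_order(cluster, first_id, retry_id):
    """Try once, retry on failure, then let the hints arrive."""
    try:
        cluster.insert(first_id, "buy 100 BRKA", fail=True)
    except TimeoutError:
        cluster.insert(retry_id, "buy 100 BRKA")
    cluster.replay_hints()
    return len(cluster.trades)


print(place_order(Cluster(), first_id=123, retry_id=245))  # 2
print(place_order(Cluster(), first_id=123, retry_id=123))  # 1
```

Retrying under a fresh id leaves two trades once the "failed" write arrives; reusing the id leaves one.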
  38. Multiple Writes, aka “I wish I had transactions”
  39. Another Bank Example • Rule: minimum $10,000 end-of-day balance, monthly fee otherwise
  40. Another Bank Example • Rule: minimum $10,000 end-of-day balance, monthly fee otherwise
      Balance checker, for each user:
          s = read savings
          c = read checking
          if s + c < 10000
              mark user for monthly fee
  41. Another Bank Example (same rule and balance checker)
      Transfer money:
          amount = …
          s = read savings
          c = read checking
          write_savings(s - amount)
          write_checking(c + amount)
  42. Another Bank Example (same rule and balance checker)
      Transfer money:
          amount = 5000
          s = read savings //7000
          c = read checking //6000
          write_savings(2000)
          write_checking(11000)
  43. Another Bank Example (same rule)
      Balance checker, for each user:
          s = read savings //2000
          c = read checking //6000
          if s + c < 10000 //true
              mark user for monthly fee
      Transfer money:
          amount = 5000
          s = read savings //7000
          c = read checking //6000
          write_savings(2000)
          write_checking(11000)
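
The interleaving in the example above can be replayed deterministically. This is a minimal sketch, assuming an in-memory dictionary in place of the two Cassandra reads and writes; `transfer_steps` and `balance_check` are hypothetical names, and the generator's `yield` marks the point where another client can run.

```python
def transfer_steps(accounts, amount):
    """Transfer in two writes, yielding between them so a reader can run."""
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    yield  # window of vulnerability: only the first write is visible
    accounts["checking"] = c + amount
    yield


def balance_check(accounts, minimum=10000):
    """Return True if the user would be marked for the monthly fee."""
    return accounts["savings"] + accounts["checking"] < minimum


accounts = {"savings": 7000, "checking": 6000}
transfer = transfer_steps(accounts, 5000)

next(transfer)                         # savings written: 2000
fee_during = balance_check(accounts)   # 2000 + 6000 = 8000 < 10000
next(transfer)                         # checking written: 11000
fee_after = balance_check(accounts)    # 2000 + 11000 = 13000

print(fee_during, fee_after)  # True False
```

The user never held less than 13,000 in total, but a checker that runs between the two writes still marks them for the fee.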
  44. Solutions?
      1. “Window of vulnerability is small, hope it doesn’t happen”
         • The client (your application) can crash
      2. “Do the writes in reverse order”
         • Works for balance checker, but allows overdrawing your account
      3. “Use a lock!”
         • The write can propagate out anyway
         • How long will you hold the lock for a failed write?
  45. Isolation Guarantees in Cassandra
      • Writes to multiple columns in the same row (when issued at the same time)
      • Writes to multiple rows in one table that have the same partition key (when issued at the same time)
      Partition key: the primary key of a table, or the first part of the primary key if it is a compound key
  46. Atomic Batches
  47. Atomicity: “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  48. Atomicity: “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” “An example of an atomic transaction is a monetary transfer from bank account A to account B.” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  49. Atomic Batch Write:
      BEGIN BATCH
        INSERT INTO table1 …
        INSERT INTO table2 …
        INSERT INTO table1 …
      APPLY BATCH;
  50. Atomic Batch Write (same batch) [diagram: two "write batch" messages]
  51. Atomic Batch Write (same batch) [diagram: two "write batch" messages]
  52. Atomic Batch Write (same batch) [diagram: "write table2", "write table1", "write table1" messages]
  53. Atomic Batch Write (same batch) Success [diagram: "write table2", "write table1", "write table1" messages]
  54. Atomic Batch Write (same batch) [diagram: two "delete batch" messages]
  55. Atomic Batch Write (same batch) [diagram: two "write table1" messages]
  56. Atomic Batch Write (same batch) Connection error
  57. Atomic Batch Write (same batch) [diagram: "write table2", "write table1", "write table1" messages]
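
The batch sequence above (write the batch, apply the individual writes, delete the batch) can be sketched as a toy model. Everything here is an assumption made for illustration: `BatchCoordinator`, `apply_batch`, `replay_batchlog`, and the `crash_after` parameter are hypothetical, and a single in-process list stands in for the nodes that hold the batch.

```python
class BatchCoordinator:
    """Toy model of a logged batch: record it, apply it, then delete it."""

    def __init__(self):
        self.batchlog = []  # batches recorded but not yet known to be applied
        self.tables = {"table1": [], "table2": []}

    def apply_batch(self, mutations, crash_after=None):
        entry = list(mutations)
        self.batchlog.append(entry)               # "write batch"
        for n, (table, row) in enumerate(entry):
            if crash_after is not None and n == crash_after:
                raise ConnectionError("coordinator lost mid-batch")
            self.tables[table].append(row)        # "write table1/table2"
        self.batchlog.remove(entry)               # "delete batch"

    def replay_batchlog(self):
        for entry in self.batchlog:
            for table, row in entry:
                if row not in self.tables[table]:  # replay must be idempotent
                    self.tables[table].append(row)
        self.batchlog.clear()


node = BatchCoordinator()
batch = [("table1", "a"), ("table2", "b"), ("table1", "c")]
try:
    node.apply_batch(batch, crash_after=1)
except ConnectionError:
    pass

partial = {table: list(rows) for table, rows in node.tables.items()}
node.replay_batchlog()

print(partial)      # {'table1': ['a'], 'table2': []}
print(node.tables)  # {'table1': ['a', 'c'], 'table2': ['b']}
```

In this model every write in the batch is eventually applied, but a reader that arrives before the replay sees only part of it.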
  58. Summary
  59. Summary • No isolation - you can read partial results • … even without any failures
  60. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering
  61. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering • A write can say it failed but really it succeeded • or it didn’t yet, but will hours later
  62. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering • A write can say it failed but really it succeeded • or it didn’t yet, but will hours later • A write can say it succeeded but really it failed • :(
  63. Questions? donny@pagerduty.com
  64. How do you deal with it?
      • Idempotency - useful overall in distributed systems
      • Avoid modifying data
      • Critical deletes get a new delete column written + row delete
      • Truly mutable data can be written to a new column (incrementing a version number in the column name)
      • Monitor NTP
      • Distributed locks with ZooKeeper and a sleep(100) before release
      • Think hard about ordering & partial failure
      • Test by adding “if (rng < …) exit or sleep” in between various writes
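
The last suggestion above, injecting a random exit or sleep between writes, might look like the following sketch. It assumes an in-memory dictionary instead of a real cluster; `SimulatedCrash`, `maybe_crash`, `transfer`, and `run_trial` are hypothetical names, and the crash probability is a parameter because the slide leaves the threshold unspecified.

```python
import random


class SimulatedCrash(Exception):
    """Stands in for the process exiting between two writes."""


def maybe_crash(rng, probability):
    """Fault injection point: crash with the given probability."""
    if rng.random() < probability:
        raise SimulatedCrash()


def transfer(accounts, amount, rng, crash_probability):
    s = accounts["savings"]
    c = accounts["checking"]
    accounts["savings"] = s - amount
    maybe_crash(rng, crash_probability)  # fault injected between the writes
    accounts["checking"] = c + amount


def run_trial(crash_probability, seed=0):
    """Run one transfer and return the total balance left behind."""
    accounts = {"savings": 7000, "checking": 6000}
    try:
        transfer(accounts, 5000, random.Random(seed), crash_probability)
    except SimulatedCrash:
        pass
    return accounts["savings"] + accounts["checking"]


print(run_trial(0.0))  # 13000: both writes applied
print(run_trial(1.0))  # 8000: crashed after the first write
```

Running many trials with an intermediate probability and checking an invariant (here, that the total stays at 13,000) is one way to surface partial-failure bugs before production does.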
