Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny, PagerDuty) | Cassandra Summit 2016

You write with QUORUM, you read with QUORUM. You're safe, right?
Although it may seem that way, you could read a different value than the one you wrote, even if nobody else wrote after you. One way this can happen is if the clocks on the machines in your cluster are not synchronized closely enough. This is called clock skew, and it is just one of the ways this anomaly can occur.
In this talk we'll dive into how Cassandra handles conflicting data, walk through several weird and seemingly impossible situations that can happen (both with and without clock skew), and see what we can do to work around them.
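
One way to picture the anomaly described above is a toy model of timestamp-based, last-write-wins reconciliation. The sketch below is not Cassandra code: the `Cell`, `Replica`, `reconcile`, `quorum_write`, and `quorum_read` names are hypothetical, the tie-break rule is an assumption, and the numbers are borrowed from the talk's bank example (a balance of 10000 written at t=5, then an update to 2000 stamped t=4 by a writer whose clock runs behind).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int  # taken from the writer's (possibly skewed) clock


def reconcile(a, b):
    """Last-write-wins: keep the cell with the higher timestamp.

    Ties keep `a`; that tie-break is an assumption of this sketch.
    """
    return a if a.timestamp >= b.timestamp else b


class Replica:
    def __init__(self):
        self.cell = None

    def write(self, cell):
        # The replica accepts the write, but only the highest timestamp survives.
        self.cell = cell if self.cell is None else reconcile(self.cell, cell)

    def read(self):
        return self.cell


def quorum_size(replicas):
    return len(replicas) // 2 + 1


def quorum_write(replicas, cell):
    """Send the write to every replica; succeed once a quorum has acknowledged."""
    acks = 0
    for replica in replicas:
        replica.write(cell)
        acks += 1
    return acks >= quorum_size(replicas)


def quorum_read(replicas):
    """Read from a quorum of replicas and reconcile (assumes they hold data)."""
    cells = [r.read() for r in replicas[:quorum_size(replicas)]]
    winner = cells[0]
    for cell in cells[1:]:
        winner = reconcile(winner, cell)
    return winner


replicas = [Replica() for _ in range(3)]
quorum_write(replicas, Cell(value=10000, timestamp=5))  # writer clock reads 5
ok = quorum_write(replicas, Cell(value=2000, timestamp=4))  # skewed clock reads 4
print(ok, quorum_read(replicas).value)
```

In this model the second write is acknowledged by every replica, yet a later quorum read still returns the older balance, because the newer write carries the smaller timestamp.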

About the Speaker
Donny Nadolny, Senior Developer, PagerDuty

Donny Nadolny is a Scala developer at PagerDuty, working on improving the reliability of their backend systems. He spends a large amount of time investigating problems experienced with distributed systems like Cassandra and ZooKeeper.

  1. Clock Skew, and other annoying realities in distributed systems. Donny Nadolny, donny@pagerduty.com, #CassandraSummit, 2016-09-08
  2. CLOCK SKEW AND OTHER ANNOYING REALITIES IN DISTRIBUTED SYSTEMS 2016-09-08
  3. Should I Care? Probably not: • user tracking / metrics • hit counter / impressions • log data. Yes: • incident management (PagerDuty) • financial info / banking / stocks • online store
  4. Should I Care? Probably not: • user tracking / metrics • hit counter / impressions • log data. Individual data is low impact. Yes: • incident management (PagerDuty) • financial info / banking / stocks • online store. Individual data is high impact.
  5. Introduction to Reads & Writes
  6. Cassandra Write • Cluster: 5 nodes • Replication factor: 3 • Consistency: QUORUM
  7. Cassandra Write: INSERT INTO table1 …
  8. Cassandra Write: INSERT INTO table1 … (diagram: write foo ×3)
  9. Cassandra Write: INSERT INTO table1 … (diagram: value: foo; write foo ×3)
  10. Cassandra Write: INSERT INTO table1 … (diagram: value: foo ×2; write foo ×3)
  11. Cassandra Write: INSERT INTO table1 … Success (diagram: value: foo ×2; write foo ×3)
  12. Cassandra Write: INSERT INTO table1 … Success (diagram: value: foo ×2; write foo ×3)
  13. Cassandra Read: SELECT * FROM table1 WHERE … (diagram: value: foo ×2)
  14. Cassandra Read: SELECT * FROM table1 WHERE … (diagram: value: foo ×2; read ×2)
  15. Cassandra Read: SELECT * FROM table1 WHERE … (diagram: value: foo ×2; read ×2)
  16. Cassandra Read: SELECT * FROM table1 WHERE … (diagram: value: foo ×2; read ×2)
  17. Cassandra Read: SELECT * FROM table1 WHERE … Success, value: foo (diagram: value: foo ×2; read ×2)
  18. Cassandra Update: UPDATE table1 … (diagram: value: foo, t=5 ×2)
  19. Cassandra Update: UPDATE table1 … (diagram: value: foo, t=5 ×2; write bar, t=7 ×3)
  20. Cassandra Update: UPDATE table1 … (diagram: two replicas each showing value: foo, t=5 and value: bar, t=7; write bar, t=7 ×3)
  21. Successful Write?
  22. Bank Example: INSERT INTO balances … (diagram: labels t=5 and t=2; savings: 10000, t=5 ×3; write … ×3)
  23. Bank Example: INSERT INTO balances … Success (diagram: labels t=5 and t=2; savings: 10000, t=5 ×3)
  24. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 (diagram: labels t=6 and t=3; savings: 10000, t=5 ×3; read ×2)
  25. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 • Update to 2,000 (diagram: labels t=7 and t=4; write savings: 2000, t=4 ×3; three replicas each showing savings: 10000, t=5 and savings: 2000, t=4)
  26. Bank Example • Withdraw 8,000 from ATM: • Read current balance: 10,000 • Update to 2,000 • Dispense 8,000 cash. Success (diagram: labels t=7 and t=4; three replicas each showing savings: 10000, t=5 and savings: 2000, t=4)
  27. Clock Skew • A successful write can really fail • Your clocks are not perfectly synchronized • “I’m running NTP, I’m good” - oh really?
  28. Failed Write?
  29. Failed Write? INSERT INTO stock_trades … (diagram: trade 123: buy 100 BRKA; trade 123… ×2; write trade 123 … ×3)
  30. Failed Write? INSERT INTO stock_trades … (diagram: trade 123: buy 100 BRKA; trade 123… ×2; write trade 123 … ×3)
  31. Failed Write? Connection error (diagram: trade 123: buy 100 BRKA; trade 123… ×2; write trade 123 … ×3)
  32. Failed Write? INSERT INTO stock_trades …
  33. Failed Write? Connection Error. Write Timeout.
  34. Failed Write? INSERT INTO stock_trades … (diagram: trade 245: buy 100 BRKA; trade 245… ×2)
  35. Failed Write? (diagram: trade 245: buy 100 BRKA; trade 245… ×2; hints: tell nodeA trade 123 …, tell nodeB trade 123 …, tell nodeC trade 123 …)
  36. Failed Write? (diagram: trade 245: buy 100 BRKA and trade 123: buy 100 BRKA; trade 245… and trade 123… ×2; write trade 123 … ×3)
  37. Eventual Consistency • Full repair • Read repair chance • Hinted handoff
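
The "failed write that later appears" sequence in slides 29 to 36 can be imitated with a small in-memory model. This is only a sketch of the idea of hinted handoff, not Cassandra's implementation: the `Node` and `Coordinator` classes, the `replay_hints` method, and the choice to raise `TimeoutError` are all assumptions made for illustration.

```python
class Node:
    """A replica that may be temporarily unreachable."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.rows = {}


class Coordinator:
    """Toy coordinator that keeps hints for replicas it could not reach."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (node, key, value) tuples waiting to be delivered

    def write(self, key, value):
        quorum = len(self.replicas) // 2 + 1
        acks = 0
        for node in self.replicas:
            if node.up:
                node.rows[key] = value
                acks += 1
            else:
                # Remember the write so it can be delivered later.
                self.hints.append((node, key, value))
        if acks < quorum:
            # The client sees a failure even though hints were stored.
            raise TimeoutError(f"write of {key!r} did not reach a quorum")

    def replay_hints(self):
        remaining = []
        for node, key, value in self.hints:
            if node.up:
                node.rows[key] = value
            else:
                remaining.append((node, key, value))
        self.hints = remaining


nodes = [Node(name) for name in ("nodeA", "nodeB", "nodeC")]
coordinator = Coordinator(nodes)

for node in nodes:
    node.up = False
try:
    coordinator.write("trade 123", "buy 100 BRKA")
except TimeoutError as error:
    print("client sees:", error)

for node in nodes:
    node.up = True
coordinator.write("trade 245", "buy 100 BRKA")  # the client's replacement trade
coordinator.replay_hints()  # the "failed" trade 123 now lands as well
print(sorted(nodes[0].rows))
```

In this model the client is told that trade 123 failed and submits trade 245 instead, yet after the hints are replayed every replica holds both trades.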
  38. Multiple Writes, aka “I wish I had transactions”
  39. Another Bank Example • Rule: minimum $10,000 end of day balance, monthly fee otherwise
  40. Another Bank Example • Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker: for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee
  41. Another Bank Example • Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker: for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee. Transfer money: amount = …; s = read savings; c = read checking; write_savings(s - amount); write_checking(c + amount)
  42. Another Bank Example • Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker: for each user: s = read savings; c = read checking; if s + c < 10000: mark user for monthly fee. Transfer money: amount = 5000; s = read savings //7000; c = read checking //6000; write_savings(2000); write_checking(11000)
  43. Another Bank Example • Rule: minimum $10,000 end of day balance, monthly fee otherwise. Balance checker: for each user: s = read savings //2000; c = read checking //6000; if s + c < 10000 //true: mark user for monthly fee. Transfer money: amount = 5000; s = read savings //7000; c = read checking //6000; write_savings(2000); write_checking(11000)
  44. Solutions? 1. “Window of vulnerability is small, hope it doesn’t happen” • The client (your application) can crash 2. “Do the writes in reverse order” • Works for balance checker, but allows overdrawing your account 3. “Use a lock!” • The write can propagate out anyway • How long will you hold the lock for a failed write?
  45. Isolation Guarantees in Cassandra • Writes to multiple columns in the same row (when issued at the same time) • Writes to multiple rows in one table that have the same partition key (when issued at the same time). Partition key: the primary key of a table, or the first part of the primary key if it is a compound key
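
The interleaving in the bank example (slides 41 to 43) can be replayed deterministically. The sketch below uses a plain dictionary in place of the database and a generator to pause the transfer between its two writes; `transfer_steps` and `below_minimum` are hypothetical names, and the amounts are the ones shown on the slides.

```python
def transfer_steps(accounts, amount):
    """Move `amount` from savings to checking, pausing between the two writes."""
    savings = accounts["savings"]
    checking = accounts["checking"]
    accounts["savings"] = savings - amount
    yield  # another client may read here and see only the first write
    accounts["checking"] = checking + amount


def below_minimum(accounts, minimum=10000):
    """Balance checker: True if the user should be marked for the monthly fee."""
    return accounts["savings"] + accounts["checking"] < minimum


accounts = {"savings": 7000, "checking": 6000}
transfer = transfer_steps(accounts, 5000)

next(transfer)                                # write_savings(2000) has happened
flagged_mid_transfer = below_minimum(accounts)  # reads 2000 + 6000
next(transfer, None)                          # write_checking(11000) completes
flagged_after = below_minimum(accounts)       # reads 2000 + 11000

print(flagged_mid_transfer, flagged_after)
```

The checker flags the user only when it happens to run between the two writes, even though the combined balance is 13,000 both before and after the transfer.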
  46. Atomic Batches
  47. Atomicity: “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  48. Atomicity: “An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or nothing occurs… the transaction cannot be observed to be in progress by another database client” “An example of an atomic transaction is a monetary transfer from bank account A to account B.” (https://en.wikipedia.org/wiki/Atomicity_(database_systems))
  49. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH;
  50. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: write batch ×2)
  51. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: write batch ×2)
  52. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: write table2; write table1 ×2)
  53. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; Success (diagram: write table2; write table1 ×2)
  54. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: delete batch ×2)
  55. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: write table1 ×2)
  56. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; Connection error
  57. Atomic Batch Write: BEGIN BATCH INSERT INTO table1 … INSERT INTO table2 … INSERT INTO table1 … APPLY BATCH; (diagram: write table2; write table1 ×2)
  58. Summary
  59. Summary • No isolation - you can read partial results • … even without any failures
  60. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering
  61. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering • A write can say it failed but really it succeeded • or it didn’t yet, but will hours later
  62. Summary • No isolation - you can read partial results • … even without any failures • Atomic batches aren't really atomic • also, you give up sequential ordering • A write can say it failed but really it succeeded • or it didn’t yet, but will hours later • A write can say it succeeded but really it failed • :(
  63. Questions? donny@pagerduty.com
  64. How do you deal with it? • Idempotency - useful overall in distributed systems • Avoid modifying data • Critical deletes get a new delete column written + row delete • Truly mutable data can be written to a new column (incrementing a version number in the column name) • Monitor NTP • Distributed locks with ZooKeeper and a sleep(100) before release • Think hard about ordering & partial failure • Test by adding “if (rng < …) exit or sleep” in between various writes
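
The idempotency advice on the final slide can be illustrated with a small sketch. It assumes a hypothetical in-memory `FlakyStore` whose writes are applied but whose first few acknowledgements are reported as timeouts, loosely imitating the slide's suggestion to inject failures between writes; a deterministic failure count stands in for the random check so the outcome is repeatable. None of these names come from the talk or from any Cassandra driver.

```python
class FlakyStore:
    """Applies every write, but reports the first few as timeouts."""

    def __init__(self, reported_failures):
        self.rows = {}
        self.counter = 0
        self._remaining_failures = reported_failures

    def _acknowledge(self):
        if self._remaining_failures > 0:
            self._remaining_failures -= 1
            raise TimeoutError("write timed out (but it was applied)")

    def put(self, key, value):
        """Idempotent: writing the same key and value twice changes nothing."""
        self.rows[key] = value
        self._acknowledge()

    def increment(self):
        """Not idempotent: every retry adds one more."""
        self.counter += 1
        self._acknowledge()


def retry(operation, attempts=10):
    """Call `operation` until it stops raising TimeoutError."""
    for _ in range(attempts):
        try:
            operation()
            return True
        except TimeoutError:
            continue
    return False


idempotent = FlakyStore(reported_failures=2)
retry(lambda: idempotent.put("trade 123", "buy 100 BRKA"))

non_idempotent = FlakyStore(reported_failures=2)
retry(non_idempotent.increment)

print(len(idempotent.rows), non_idempotent.counter)
```

Retrying the keyed write leaves a single row, while retrying the increment counts the same logical operation three times, which is why a write that "failed but really succeeded" is only safe to retry when it is idempotent.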
