How shit works: Time

A talk given at the Wix Engineering Conference, 2017 in Israel

The third talk in this series breaks the mold to take a hard look at one of the most commonly used, and at least as commonly misunderstood, elements in software engineering: time. Time is so fundamental to the way humans experience reality that we don't normally give it a second thought, but it's just as fundamental to software systems. Without a correct model for working with time, BAD THINGS HAPPEN: data is persisted out of order, exceptions occur where they shouldn't be possible, and production systems blow up.

We'll cover the various common representations of time, acknowledge their caveats and deficiencies, and hopefully learn a few new tools and practices along the way.

Published in: Engineering

  1. 1. THE STORY OF ADI
  2. 2. Meet Adi Awanta.
  3. 3. He has a dream.
  4. 4. Adi is fashionable • Event sourcing is the future, they say • Event sourcing is the shit, he agrees
  5. 5. Adi is fashionable • Event sourcing is the future, they say • Event sourcing is the shit, he agrees • … and designs an event model Created Modified Published Archived Restored
  6. 6. Adi is fashionable • Table (Column, Type, Key, Null): site_id binary(16); event_time timestamp; event_type enum(…); payload mediumblob
  7. 7. HEY, KIDS Can you guess what happens next?
  8. 8. Adi is confused • Bad shit happens • Event streams exhibit: – Out of order events – Conflicting events – Impossible states
  9. 9. How shit works: Time Tomer Gabel Wix Engineering Conference May 2017 Image: Vera Kratochvil (public domain)
  10. 10. Time (noun) “… the system of those sequential relations that any event has to any other, as past, present, or future; indefinite and continuous duration regarded as that in which events succeed one another.” -- dictionary.com
  11. 11. Time (noun) “… the system of those sequential relations that any event has to any other, as past, present, or future; indefinite and continuous duration regarded as that in which events succeed one another.” -- dictionary.com
  12. 12. Modeling time • Encoding – Resolution – Epoch “Real” time Epoch Resolution T1 T2 T3 … Tn
  13. 13. Modeling time • Encoding – Resolution – Epoch T3 Instant (or “event”)
  14. 14. Modeling time • Encoding – Resolution – Epoch T3 A B Conflicting events
  15. 15. Modeling time • Encoding – Resolution – Epoch • Bootstrapping – Manual – Battery-backed – NTP Epoch ???
  16. 16. Modeling time • Encoding – Resolution – Epoch • Bootstrapping – Manual – Battery-backed – NTP • Updating T1 T2
  17. 17. Modeling time • Encoding – Resolution – Epoch • Bootstrapping – Manual – Battery-backed – NTP • Updating T = T+1 T1 T2 When?
  18. 18. Modeling time • Encoding – Resolution – Epoch • Bootstrapping – Manual – Battery-backed – NTP • Updating – System Clock (RTC) Image: Jeremy Saglimbeni on Vimeo (CC BY-SA 3.0)
  19. 19. RTC isn’t perfect • Alas, clocks drift • Subtle causes – Temperature – Power supply – General relativity – Cosmic radiation – Alien gamma rays Image showing ~220µs clock drift by Luke Bigum
  20. 20. Distributed time Host A Host C Host B Host D RTC RTC RTC RTC Event Stream Event Store RTC event_time = ?
  21. 21. Time source: Application Host A Host C Host B Host D RTC RTC RTC RTC Event Stream Event Store RTC 1 2 3 4
  22. 22. Time source: Application Host A Host C Host B Host D RTC RTC RTC RTC Event Stream Event Store RTC 1 2 3 4 Clocks must be synchronized!
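To make the "clocks must be synchronized" point concrete, here is a minimal sketch of two hosts stamping events from their own clocks. The skew values, timestamps and the `local_clock` helper are hypothetical and chosen purely for illustration; they are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    host: str
    event_type: str
    event_time: float  # stamped from the writing host's own clock


def local_clock(true_time: float, skew: float) -> float:
    """What a host's clock reads at a given 'true' time (hypothetical model)."""
    return true_time + skew


# Assumed skews in seconds: host B's clock runs 50 ms behind host A's.
SKEW = {"A": 0.0, "B": -0.050}

# Real-world order: A creates the site, then B updates it 10 ms later.
created = Event("A", "Created", local_clock(100.000, SKEW["A"]))
updated = Event("B", "Updated", local_clock(100.010, SKEW["B"]))

# A reader that orders the stream by event_time sees the update first.
stream = sorted([created, updated], key=lambda e: e.event_time)
order = [e.event_type for e in stream]
print(order)
```

With these numbers the update carries an earlier timestamp than the creation it depends on, which is the kind of out-of-order stream the earlier slides describe.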
  23. 23. What about NTP? Host A Host C Host B Host D RTC RTC RTC RTC Event Store RTC NTP
  24. 24. What about NTP? Host A Host C Host B Host D RTC RTC RTC RTC Event Store RTC NTP Accurate within ~10ms Fig. 1, “Characterizing Quality of Time and Topology in a Time Synchronization Network”, Murta et al.
  25. 25. TL;DR: DEALING WITH TIME IS HARD
  26. 26. Reframing the problem • When reading events – Do we need wall time? – We don’t care about it – It’s just metadata • We only care about ordering events Created (2017-05-03 10:15) Updated (2017-05-03 10:27) Archived (2017-05-03 10:55)
  27. 27. Reprise: Time (noun) “… the system of those sequential relations that any event has to any other, as past, present, or future; indefinite and continuous duration regarded as that in which events succeed one another.” -- dictionary.com
  28. 28. Reprise: Time (noun) “… the system of those sequential relations that any event has to any other, as past, present, or future; indefinite and continuous duration regarded as that in which events succeed one another.” -- dictionary.com
  29. 29. Causality • Take any two events • What relationship can they have? Created Updated Updated Archived
  30. 30. Causality • Take any two events • What relationship can they have? – Happens before: A→B A. Created B. Updated Updated Archived
  31. 31. Causality • Take any two events • What relationship can they have? – Happens before: A→B – Concurrent: A↛B and B↛A Created A. Updated B. Updated Archived
  32. 32. Lamport timestamps • Provides partial ordering of events • Advantages: – Respects causality – Low overhead – Simple to implement Event Store Host A Host C Host B “Time, Clocks, and the Ordering of Events in a Distributed System”, Leslie Lamport, 1978
  33. 33. Lamport timestamps • Each host maintains local logical clock • Start with T=0 Event Store Host A Host C Host B T=0 T=0 T=0
  34. 34. Lamport timestamps • Each host maintains local logical clock • Start with T=0 • On send (i.e. write): – Increment local clock – Attach to message Event Store Host A Host C Host B T=0 T=0 T=0 T=1
  35. 35. Lamport timestamps • On receive (i.e. read): – Process event if T < Tin Event Store Host A Host C Host B T=0 Tin=1
  36. 36. Lamport timestamps • On receive (i.e. read): – Process event if T < Tin – Update clock past the latest event: T = max(T, Tin) + 1 • All done! Event Store Host A Host C Host B T=0 Tin=1 T=2
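The send and receive rules from the slides can be sketched as a small class. This is an illustrative sketch only; the class and method names are assumptions and not code from the talk.

```python
class LamportClock:
    """Per-host logical clock following the send/receive rules on the slides."""

    def __init__(self) -> None:
        self.time = 0  # every host starts with T=0

    def send(self) -> int:
        """On send (write): increment the local clock and attach it to the message."""
        self.time += 1
        return self.time

    def is_newer(self, t_in: int) -> bool:
        """The slide's read-side check: process the event only if T < Tin."""
        return self.time < t_in

    def receive(self, t_in: int) -> int:
        """On receive (read): move the clock past the latest event seen."""
        self.time = max(self.time, t_in) + 1
        return self.time


host_a = LamportClock()
host_b = LamportClock()

t_write = host_a.send()             # A writes an event stamped T=1
should_process = host_b.is_newer(t_write)
t_after_read = host_b.receive(t_write)  # B reads it: max(0, 1) + 1 = 2
t_next_write = host_b.send()        # anything B writes next is stamped later
```

Because B's clock jumps past the timestamp it read, any event B writes afterwards gets a larger timestamp than the event it depends on, which is the A→B ⇒ T(A) < T(B) property.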
  37. 37. … well, almost • This is a partial order • Causality is dealt with – A→B ⇒ T(A) < T(B) • What about concurrency? – A↛B and B↛A ⇒ ?
  38. 38. Final touches • We want total ordering – Must be stable – Need not correspond to real time Created (T=1) Updated (T=2) Updated (T=2) Archived (T=5)
  39. 39. Final touches • We want total ordering – Must be stable – Need not correspond to real time • Breaking the tie: – Use any consistent key – Host name, MAC, IP… Created (T=1) Updated (T=2) Updated (T=2) Archived (T=5)
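One way to picture the tie-break is to treat the version as a (timestamp, host) pair and compare pairs lexicographically. In this sketch the host names are hypothetical; the timestamps mirror the slide's example (1, 2, 2, 5).

```python
from typing import NamedTuple


class Version(NamedTuple):
    t: int     # Lamport timestamp
    host: str  # any consistent key works as a tie-breaker (host name, MAC, IP...)


# Two concurrent updates share T=2; the host names are made up for illustration.
events = [
    ("Archived", Version(5, "host-a")),
    ("Updated", Version(2, "host-c")),
    ("Created", Version(1, "host-a")),
    ("Updated", Version(2, "host-b")),
]

# Tuples compare field by field: timestamp first, then the tie-breaking key.
ordered = sorted(events, key=lambda e: e[1])
for event_type, version in ordered:
    print(event_type, version.t, version.host)
```

The resulting order is the same no matter which order the events arrive in, so every reader agrees on it. It says nothing about which of the two concurrent updates "really" happened first in wall-clock terms.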
  40. 40. What did we gain? • We can assign versions that… – Respect causality – Are totally ordered – Don’t rely on the RTC • We’ve enabled serializability!
  41. 41. Enough theory. Let’s MAKE ADI GREAT AGAIN!
  42. 42. Versioning • Our versioning mechanism must: – Respect causality – Detect conflicting writes – Not rely on wall time • We’re almost there!
  43. 43. Relax • It’s fairly simple in practice • First, the schema Table (Column, Type, Key, Null): site_id binary(16); event_time timestamp; event_type enum(…); payload mediumblob
  44. 44. Relax • It’s fairly simple in practice • First, the schema – Don’t key on timestamp Table (Column, Type, Key, Null): site_id binary(16); event_time timestamp; event_type enum(…); payload mediumblob
  45. 45. Relax • It’s fairly simple in practice • First, the schema – Don’t key on timestamp – Add explicit version Table (Column, Type, Key, Null): site_id binary(16); version mediumint; event_time timestamp; event_type enum(…); payload mediumblob
  46. 46. Finishing touches • On writes: MySQL Server Client C1-Cn
  47. 47. Finishing touches • On writes: – Read latest version V0 MySQL Server Client C1-Cn V0
  48. 48. Finishing touches • On writes: – Read latest version V0 – Generate events V1...Vn – Write atomically MySQL Server Client C1-Cn V1-Vn
  49. 49. Finishing touches • On writes: – Read latest version V0 – Generate events V1...Vn – Write atomically • Success! – Return new version MySQL Server Client C1-Cn V1-Vn Vn
  50. 50. Finishing touches • On writes: – Read latest version V0 – Generate events V1...Vn – Write atomically • Duplicate primary key? – Optimistic lock failure! – Retry, propagate, resolve MySQL Server Client C1-Cn V1-Vn Error PK violation
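The write path above can be sketched end to end. The slides use MySQL; this sketch swaps in Python's built-in `sqlite3` so it runs without a server. The composite primary key `(site_id, version)`, the column types, and all function and exception names are assumptions made for illustration.

```python
import sqlite3


class OptimisticLockFailure(Exception):
    """Another writer already claimed one of the versions we tried to write."""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Assumed key: (site_id, version). event_time is kept as plain metadata.
    conn.execute("""
        CREATE TABLE events (
            site_id    BLOB    NOT NULL,
            version    INTEGER NOT NULL,
            event_time TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
            event_type TEXT    NOT NULL,
            payload    BLOB,
            PRIMARY KEY (site_id, version)
        )""")
    return conn


def latest_version(conn: sqlite3.Connection, site_id: bytes) -> int:
    """Read latest version V0 (0 if the site has no events yet)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM events WHERE site_id = ?",
        (site_id,),
    ).fetchone()
    return row[0]


def append_events(conn, site_id: bytes, base_version: int, event_types) -> int:
    """Generate V1..Vn on top of base_version and write them atomically."""
    rows = [
        (site_id, base_version + i, event_type, b"")
        for i, event_type in enumerate(event_types, start=1)
    ]
    try:
        with conn:  # one transaction: commit everything or roll everything back
            conn.executemany(
                "INSERT INTO events (site_id, version, event_type, payload) "
                "VALUES (?, ?, ?, ?)",
                rows,
            )
    except sqlite3.IntegrityError as exc:  # duplicate primary key
        raise OptimisticLockFailure(f"stale base version {base_version}") from exc
    return base_version + len(rows)  # success: return the new version Vn


conn = open_store()
site = bytes(16)
v0 = latest_version(conn, site)
vn = append_events(conn, site, v0, ["Created", "Modified"])

# A second writer that read V0 before the write above is now stale.
try:
    append_events(conn, site, v0, ["Archived"])
    conflict = False
except OptimisticLockFailure:
    conflict = True
```

What the caller does on `OptimisticLockFailure` (retry from a fresh read, propagate the error, or resolve the conflict) is a policy choice that the sketch leaves open.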
  51. 51. GREAT SUCCESS!
  52. 52. A round of applause for our special guest: KFIR ‘ADI’ BLOCH ... and for the terrific camera work I butchered: VAIDAS PILKAUSKAS
  53. 53. QUESTIONS? Thank you for listening tomer@tomergabel.com @tomerg http://engineering.wix.com On GitHub: https://github.com/holograph This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
