QCon London 2015 Protocols of Interaction

Slide deck from QCon London 2015 session on Protocols of Interaction.

Published in: Software

  1. 1. Protocols of Interaction Best Current Practices Todd L. Montgomery @toddlmontgomery
  2. 2. What is a Protocol? Why should I care!?
  3. 3. pro·to·col (noun) ˈprō-tə-ˌkȯl, -ˌkōl, -ˌkäl, -kəl ... 3 b: a set of conventions governing the treatment and especially the formatting of data in an electronic communications system <network protocols> ... 3 a: a code prescribing strict adherence to correct etiquette and precedence (as in diplomatic exchange and in the military services) <a breach of protocol>
  4. 4. Protocols of Interaction Matter
  5. 5. In an emerging era of micro-services, protocols of interaction matter Protocols are a rich source of solutions to micro-service problems
  6. 6. Multi-Disciplinary: Algorithms, Performance, Concurrency, Security, Number Theory, Statistics, Graph Theory, Biology ?!
  7. 7. Networks, and especially the Internet, are Hostile Environments
  8. 8. Data can be lost, duplicated, and re-ordered!!
  9. 9. TCP connections can… be closed unexpectedly; end in an unknown state; be intercepted by idiots, er, Proxies
  10. 10. Which means Data over TCP* might be… Duplicated, Re-Ordered, Lost (* when connections are re-established)
  11. 11. Case Study 1 Loose Ordering = The New Normal (De)multiplexing
  12. 12. Sync Requests & Responses [diagram: three Requests, three Responses] Throughput limited by Round-Trip Time (RTT)!
  13. 13. Async Requests & Responses [diagram: three Requests, three Responses] Throughput less limited by Round-Trip Time!
  14. 14. Async Requests & Responses: Correlation! [diagram: Request 0, 1, 2; Response 0, 1, 2]
  15. 15. Aside…
  16. 16. Ordering is an Illusion!!
  17. 17. Compiler can re-order; Runtime can re-order; CPU can re-order
  18. 18. Ordering has to be imposed!
  19. 19. Async Requests & Responses: Correlation! [diagram: Request 0, 1, 2; Response 0, 1, 2]
  20. 20. Correlation! Ordering [diagram: Request 0, 1, 2; Response 0, 1, 2]
  21. 21. Correlation! (Valid) Re-Ordering [diagram: Request 0, 1, 2; Response 0, 1, 2]
  22. 22. Handling the Unexpected [diagram: Request 0; Response 1] Invalid, Drop. We only know of 0. 1 is unknown!
  23. 23. SCTP, HTTP/2 (SPDY), … most OSI Layer 4 protocols
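The correlation idea from Case Study 1 can be sketched as a small table of outstanding request IDs. This is only an illustration, not code from the talk: `correlator_t`, `MAX_OUTSTANDING`, and the function names are hypothetical. Responses may arrive in any order, and a response whose ID is not outstanding is reported as invalid so the caller can drop it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8 /* hypothetical bound on in-flight requests */

typedef struct {
    uint32_t ids[MAX_OUTSTANDING];
    bool in_use[MAX_OUTSTANDING];
    uint32_t next_id; /* next correlation ID to hand out */
} correlator_t;

void correlator_init(correlator_t *c) {
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        c->in_use[i] = false;
    }
    c->next_id = 0;
}

/* Record a new outstanding request and return its correlation ID via id_out.
 * Returns false when the table is full, so the caller has to wait or drop. */
bool correlator_send(correlator_t *c, uint32_t *id_out) {
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (!c->in_use[i]) {
            c->in_use[i] = true;
            c->ids[i] = c->next_id;
            *id_out = c->next_id++;
            return true;
        }
    }
    return false;
}

/* Match a response to an outstanding request, in any arrival order.
 * Returns false for an unknown ID: the response is invalid and should be dropped. */
bool correlator_on_response(correlator_t *c, uint32_t id) {
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (c->in_use[i] && c->ids[i] == id) {
            c->in_use[i] = false; /* retire the request */
            return true;
        }
    }
    return false;
}
```

A duplicate response falls out of the same check: once an ID is retired, a second response carrying it no longer matches anything and is dropped.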
  24. 24. Case Study 2 Can you hear me now? Timeouts & Retries
  25. 25. Handling the unexpected [diagram: Request, Processing, ACK]
  26. 26. [diagram: Request, X, Timeout Interval]
  27. 27. Retransmit at end of interval [diagram: Request, X, Timeout Interval, Processing, ACK]
  28. 28. Spurious Retransmits [diagram: Original, Timeout Interval, Retransmit, Processing, …, ACK]
  29. 29. Interval = N × “typical” RTT (“Average”). Account for processing delay [diagram: X, Timeout Interval]
  30. 30. Measure! But very “noisy”? Variances in processing, transmission, etc. [diagram: RTT Measurement]
  31. 31. TCP Retransmit Timeout (RTO): $Err = M - A$; $A \leftarrow A + g \cdot Err$; $D \leftarrow D + h\,(|Err| - D)$; $RTO = A + 4D$, where $M$ = measurement, $A$ = smoothed average, $D$ = smoothed mean deviation, $g$ and $h$ = gain constants (0 to 1)
  32. 32. TCP Retransmit Timeout (RTO): $Err = M - A$; $A \leftarrow A + g \cdot Err$; $D \leftarrow D + h\,(|Err| - D)$; $RTO = A + 4D$. Do you measure on a Retransmit? NO!
  33. 33. Does processing twice hurt? [diagram: Original, X, Timeout Interval, Retrans, ACK; Process Once, Process Twice]
  34. 34. Are Original & Retransmit treated the same? [diagram: Original, X, Timeout Interval, Retrans, ACK; Process Once, Process Twice]
  35. 35. TCP, SCTP, Aeron, … anything with reliability
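The RTO update from Case Study 2 translates almost line for line into code. The sketch below is an illustration only: `rto_estimator_t` and the function names are hypothetical, the gains g = 1/8 and h = 1/4 are the values conventionally used for TCP, and starting D at half the first measurement is an assumption borrowed from common TCP practice rather than something stated on the slides. Measurements taken on a retransmitted request are skipped, because the reply cannot be attributed to a specific transmission.

```c
#include <stdbool.h>

typedef struct {
    double a; /* A: smoothed average of the RTT */
    double d; /* D: smoothed mean deviation */
    double g; /* gain for the average (0 to 1) */
    double h; /* gain for the deviation (0 to 1) */
} rto_estimator_t;

/* Seed the estimator from the first RTT measurement (assumed initial values). */
void rto_init(rto_estimator_t *e, double first_measurement) {
    e->a = first_measurement;
    e->d = first_measurement / 2.0;
    e->g = 0.125;
    e->h = 0.25;
}

/* Fold in one RTT measurement M. Samples from retransmitted requests are
 * ignored: it is unknown which transmission the reply answers. */
void rto_on_measurement(rto_estimator_t *e, double m, bool was_retransmitted) {
    if (was_retransmitted) {
        return;
    }
    double err = m - e->a;                    /* Err = M - A */
    double abs_err = err < 0.0 ? -err : err;  /* |Err| */
    e->a += e->g * err;                       /* A <- A + g * Err */
    e->d += e->h * (abs_err - e->d);          /* D <- D + h * (|Err| - D) */
}

/* RTO = A + 4D */
double rto_timeout(const rto_estimator_t *e) {
    return e->a + 4.0 * e->d;
}
```

The units are whatever the measurements use (milliseconds, for instance); the estimator itself is unit-agnostic.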
  36. 36. Case Study 3 What I Need! When I Need It! “Lifetime” Management
  37. 37. “Managing” Application Working Set
  38. 38. Caching Algorithms: LRU, MRU, PLRU, RR, SLRU, LFU, … “Liveness” is essential
  39. 39. Consequence of Processing [diagram: Service A, Service B; Request, ACK] Service A is Alive! Service B is Alive!
  40. 40. Absence of Processing [diagram: Service A, Service B; Keepalive, Keepalive] Service A is Alive! Service B is Alive!
  41. 41. RIP Route Deletion. Step 0: route info broadcast every 30 seconds. Step 1 (3 min): Set Distance to Infinity (16). Step 2 (+1 min): Delete Route. Aside… RIP… aptly named
  42. 42. Aeron Driver Keepalive: Time of Last Activity = Shared Variable. Doesn’t need to be a message…
  43. 43. [diagram: Service A, Service B; Bye, Bye] Service A is gone! Service B is gone! Optimization, but insufficient with arbitrary failures
  44. 44. Liveness often exists across transient connectivity
  45. 45. So… Don’t conflate transport state (like TCP connection state) with liveness!
  46. 46. BGP, OSPF, Transports, … almost every protocol
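Liveness tracking in the style of Case Study 3 needs little more than a "time of last activity" value and a timeout. The sketch below is an assumption-laden illustration, not Aeron's implementation: `liveness_t` and its functions are hypothetical names, and the caller supplies timestamps from a monotonic clock. Any observed activity (a request, an ACK, or a keepalive) refreshes the timestamp, and liveness is judged independently of whether a transport connection happens to be open.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t last_activity_ms; /* time of last activity from the peer */
    uint64_t timeout_ms;       /* how long silence is tolerated */
} liveness_t;

void liveness_init(liveness_t *l, uint64_t now_ms, uint64_t timeout_ms) {
    l->last_activity_ms = now_ms;
    l->timeout_ms = timeout_ms;
}

/* Call for any sign of life: a request, an ACK, or an explicit keepalive. */
void liveness_on_activity(liveness_t *l, uint64_t now_ms) {
    l->last_activity_ms = now_ms;
}

/* The peer counts as alive while the silence is within the timeout.
 * Assumes now_ms comes from a monotonic clock, so it never runs
 * behind last_activity_ms. */
bool liveness_is_alive(const liveness_t *l, uint64_t now_ms) {
    return (now_ms - l->last_activity_ms) <= l->timeout_ms;
}
```

A "Bye" message could mark the peer as gone immediately, but the timeout has to remain as the fallback because a crashed peer never sends one.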
  47. 47. Case Study 4 Elasti-What? Self-Similar Behavior
  48. 48. Multiple same/similar requests at the same time [diagram: three Request X arrive as Request X, X, X; Response X, X, X]
  49. 49. Similar Problem… Reliable Multicast
  50. 50. Non-correlated loss [diagram: 1, 2, 3 sent to three receivers; X marks a loss at each]
  51. 51. Request individual lost data [diagram: NAK 2, NAK 1, NAK 3 arrive as NAK 1, 2, 3; Retransmit 1, 2, 3]
  52. 52. Temporally/Spatially Correlated Loss [diagram: 1, 2, 3 sent to three receivers; X marks a loss at each]
  53. 53. Multiple requests for same data [diagram: three NAK 2 arrive as NAK 2, 2, 2; Retransmit 2, 2, 2]
  54. 54. It’s a generic problem! [diagram: three Request 2 arrive as Request 2, 2, 2]
  55. 55. Overloading Responder & Network [diagram: three Request 2 arrive as Request 2, 2, 2]
  56. 56. Don’t Immediately Request, Listen first [diagram: Timeout!, Request 2 sent; other Request 2 marked Suppress Request]
  57. 57. How long to wait & listen for? [diagram: Timeout!, Request 2 sent; other Request 2 marked Suppress Request]
  58. 58. Statistics to the Rescue!
  59. 59. SRM Backoff: RandomBackoff = Uniform[C1, C1+C2] × 1-way delay. Random is more than good enough
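  60. 60. Optimal Multicast Feedback (Truncated Exponential Distribution)
        /* UniformRand(max) is not defined on the slide; it is assumed to
           return a uniformly distributed value in [0, max]. Needs <math.h>. */
        double RandomBackoff(double T_maxBackoff, double groupSize) {
            double lambda = log(groupSize) + 1;
            double x = UniformRand(lambda / T_maxBackoff) + lambda / (T_maxBackoff * (exp(lambda) - 1));
            /* Result lies in [0, T_maxBackoff]. */
            return ((T_maxBackoff / lambda) * log(x * (exp(lambda) - 1) * (T_maxBackoff / lambda)));
        }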
  61. 61. Must also shed duplicates on the responder. Shed second “Request 2” if too soon [diagram: Request 2, Request 2 arrive as Request 2, 2; X, X; Response 2, 2]
  62. 62. SRM, PGM, Aeron, … http://en.wikipedia.org/wiki/Scalable_Reliable_Multicast http://www.eurecom.fr/en/publication/107/detail/optimal-multicast-feedback
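The "listen first, then request" pattern from Case Study 4 can be sketched as a backoff timer that is cancelled when someone else's request for the same data is overheard. Everything here is a simplified illustration with hypothetical names (`srm_backoff`, `nak_timer_t`, and so on). The uniform sample `u` in [0, 1] is passed in by the caller so the sketch stays deterministic, and cancelling outright is an assumption made for brevity: the actual SRM scheme backs the timer off and keeps waiting for the repair.

```c
#include <stdbool.h>

/* SRM-style backoff: a delay in [C1, C1 + C2] * one-way delay.
 * u is a uniform random sample in [0, 1] supplied by the caller. */
double srm_backoff(double c1, double c2, double one_way_delay, double u) {
    return (c1 + c2 * u) * one_way_delay;
}

typedef struct {
    double fire_at; /* time at which the request should be sent */
    bool pending;   /* false once sent or suppressed */
} nak_timer_t;

/* Loss detected: do not request immediately, schedule the request instead. */
void nak_schedule(nak_timer_t *t, double now, double backoff) {
    t->fire_at = now + backoff;
    t->pending = true;
}

/* Another node requested the same data while we were listening: suppress ours
 * (simplification, see the note above the code). */
void nak_on_overheard(nak_timer_t *t) {
    t->pending = false;
}

/* Returns true exactly once, when the timer expires without being suppressed. */
bool nak_should_send(nak_timer_t *t, double now) {
    if (t->pending && now >= t->fire_at) {
        t->pending = false;
        return true;
    }
    return false;
}
```

The responder needs the mirror image of this: remember when a given item was last retransmitted and shed a second request for it that arrives too soon.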
  63. 63. Case Study 5 Hey, Slow Down! Flow (& Congestion) Control
  64. 64. Stop-And-Wait Flow Control [diagram: Data, ACK exchanged three times; RTT] Throughput = Data Length / RTT
  65. 65. Bandwidth-Delay Product: BDP = Bandwidth × Delay = (Bytes / sec) × sec = Bytes. BDP (Buffer)
  66. 66. N Data “Blobs” [diagram: Data, …, ACK; RTT] Throughput = N × Data Length / RTT
  67. 67. So… How big is N? This is surprisingly hard to answer
  68. 68. It depends…
  69. 69. Big… but Don’t overflow receiver; Don’t overflow “network”
  70. 70. TCP Flow Control: Receiver advertises N
  71. 71. TCP Congestion Control: Sender probes for network N
  72. 72. TCP Sender: min(Receiver N, Network N). Only go as fast as Network & Receiver
  73. 73. TCP, Aeron, … http://en.wikipedia.org/wiki/TCP_congestion-avoidance_algorithm
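The arithmetic in Case Study 5 is small enough to write down directly. The helpers below are a sketch with hypothetical names; they only restate the formulas from the slides (BDP = bandwidth × delay, throughput = window / RTT, sender window = min of the receiver's and the network's N) and do not model how TCP actually probes for the network limit.

```c
#include <stdint.h>

/* Bandwidth-delay product: (bytes / sec) * sec = bytes that fit "in flight". */
double bdp_bytes(double bandwidth_bytes_per_sec, double rtt_sec) {
    return bandwidth_bytes_per_sec * rtt_sec;
}

/* The sender may only go as fast as both the receiver and the network allow. */
uint32_t effective_window(uint32_t receiver_window, uint32_t network_window) {
    return receiver_window < network_window ? receiver_window : network_window;
}

/* Throughput with a window of N * data length bytes per round trip.
 * With a window of one data unit this is stop-and-wait. */
double throughput_bytes_per_sec(double window_bytes, double rtt_sec) {
    return window_bytes / rtt_sec;
}
```

For example, a 1,000,000 bytes/sec path with a 0.5 s RTT holds 500,000 bytes in flight, so a 14,600-byte window on that path yields only 29,200 bytes/sec.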
  74. 74. One more thing…
  75. 75. Queue Management Perhaps the single most useful thing!
  76. 76. Effective management of queues cannot be overlooked
  77. 77. Unbounded Queues are bad, m’kay
  78. 78. Bounding implies Back pressure and/or Dropping
  79. 79. CoDel: locally minimize delay in queue; combat bufferbloat. http://en.wikipedia.org/wiki/CoDel
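A bounded queue makes the back pressure or dropping decision explicit at the point where the queue fills up. The ring buffer below is a minimal single-threaded sketch with hypothetical names (`bounded_queue_t`, `QUEUE_CAPACITY`). It is not CoDel, which manages queueing delay rather than just length; it only shows that an offer to a full queue fails and forces the producer to choose between waiting and dropping.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 4 /* hypothetical bound */

typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head;  /* index of the oldest item */
    size_t count; /* number of items currently queued */
} bounded_queue_t;

void queue_init(bounded_queue_t *q) {
    q->head = 0;
    q->count = 0;
}

/* Returns false when the queue is full: the caller must apply back pressure
 * (retry later) or drop the item. The queue never grows. */
bool queue_offer(bounded_queue_t *q, int item) {
    if (q->count == QUEUE_CAPACITY) {
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = item;
    q->count++;
    return true;
}

/* Removes the oldest item into *out. Returns false when the queue is empty. */
bool queue_poll(bounded_queue_t *q, int *out) {
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```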
  80. 80. Just a taste…
  81. 81. Takeaways!
  82. 82. Protocols are a rich source of solutions to complicated problems. Protocols of interaction matter & can be tremendously impactful, for better or worse…
  83. 83. Questions? • IETF http://www.ietf.org/ • Aeron https://github.com/real-logic/Aeron • SlideShare http://www.slideshare.com/toddleemontgomery • Twitter @toddlmontgomery Thank You!
