Microservice Protocols of Interaction

CRAFT 2016

Published in: Software

  1. Microservice Protocols of Interaction. Todd L. Montgomery, @toddlmontgomery
  2. About me…
  3. What is a Protocol? Why should we care!?
  4. pro·to·col (noun) ˈprō-tə-ˌkȯl, -ˌkōl, -ˌkäl, -kəl ... 3 b: a set of conventions governing the treatment and especially the formatting of data in an electronic communications system <network protocols> ... 3 a: a code prescribing strict adherence to correct etiquette and precedence (as in diplomatic exchange and in the military services) <a breach of protocol>
  5. Protocols of Interaction: Wire Protocol, Method Calls, Shared Memory Interactions, etc.
  6. Microservice Architectures
  7. Forced Decoupling via an “Asynchronous, Binary Boundary”
  8. Forced Loose Coupling
  9. The truth is… Protocols can and do Couple
  10. Protocols of Interaction are quite important! Protocols of Interaction Matter!
  11. The Environment
  12. Networks, and especially the Internet, are Hostile Environments
  13. Data can be lost, duplicated, and re-ordered!!
  14. TCP connections can… be closed unexpectedly; end in an unknown state; be intercepted by idiots, er, Proxies
  15. Which means Data over TCP* might be… Duplicated, Re-Ordered, Lost. (* When connections are re-established)
  16. Don’t assume the network is reliable. https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
  17. Case Studies
  18. Case Study 1: Loose Ordering
  19. Sync Requests & Responses. Diagram: Request ×3, Response ×3. Throughput limited by Round-Trip Time (RTT)!
  20. Async Requests & Responses. Diagram: Request ×3, Response ×3. Throughput less limited by Round-Trip Time!
  21. Async Requests & Responses: Correlation! Diagram: Requests 0, 1, 2 and Responses 0, 1, 2.
  22. Aside…
  23. Ordering is an Illusion!!
  24. Compiler can re-order. Runtime can re-order. CPU can re-order.
  25. Ordering has to be imposed!
  26. Async Requests & Responses: Correlation! Diagram: Requests 0, 1, 2 and Responses 0, 1, 2.
  27. Correlation! Ordering. Diagram: Requests 0, 1, 2 and Responses 0, 1, 2.
  28. Correlation! (Valid) Re-Ordering (one of many). Diagram: Requests 0, 1, 2 and Responses 0, 1, 2.
  29. Handling the Unexpected. Diagram: Request 0, Response 1. Invalid, Drop: we only know of 0. 1 is unknown!
  30. SCTP, HTTP/2 (SPDY), … most OSI Layer 4 protocols
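
The correlation slides in Case Study 1 can be illustrated with a minimal Python sketch. It assumes each request carries an integer correlation ID that the response echoes back; the `Correlator` class and its method names are hypothetical, made up for this illustration and not taken from the talk or from any library.

```python
import itertools


class Correlator:
    """Tracks outstanding requests by correlation ID (hypothetical helper)."""

    def __init__(self):
        self._next_id = itertools.count()
        self._pending = {}  # correlation ID -> request payload

    def send(self, payload):
        """Register a request and return the ID to put on the wire."""
        correlation_id = next(self._next_id)
        self._pending[correlation_id] = payload
        return correlation_id

    def on_response(self, correlation_id, body):
        """Return (request, body) for a known ID, or None to drop the response."""
        request = self._pending.pop(correlation_id, None)
        if request is None:
            return None  # unknown or already-answered ID: invalid, drop
        return request, body


c = Correlator()
ids = [c.send(f"request {i}") for i in range(3)]
# Responses may arrive in any order; the ID, not arrival order, pairs them up.
matched = [c.on_response(i, f"response {i}") for i in (2, 0, 1)]
unknown = c.on_response(7, "response 7")  # never sent, so it is dropped
```

Because matching is by ID, any re-ordering of the three responses is valid, while a response for an ID that was never issued (or was already answered) is discarded rather than trusted.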
  31. Case Study 2: Can you hear me now? Timeouts & Retries
  32. Handling the unexpected. Diagram: Request, ACK, Processing.
  33. Diagram: Request, X (lost), TimeoutInterval.
  34. Retransmit at end of interval. Diagram: Request, X (lost), TimeoutInterval, ACK, Processing.
  35. Avoid Spurious Retransmits. Diagram: Original, TimeoutInterval, Retransmit, ACK, Processing, …
  36. Interval = N x “typical” RTT. Account for processing delay. Diagram: X, TimeoutInterval, “Average”.
  37. Measure! But very “noisy”? RTT Measurement: variances in processing, transmission, etc.
  38. TCP Retransmit Timeout (RTO): Err = M - A; A <- A + g * Err; D <- D + h * (|Err| - D); RTO = A + 4D. M = measurement, A = smoothed average, D = smoothed mean deviation, g and h = gain constants (0 to 1).
  39. TCP Retransmit Timeout (RTO): Err = M - A; A <- A + g * Err; D <- D + h * (|Err| - D); RTO = A + 4D. Do you measure on a Retransmit? NO!
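
The RTO formulas on the two slides above can be worked through in a few lines of Python. This is a sketch under stated assumptions: the gains g = 1/8 and h = 1/4 and the first-sample initialization (A = M, D = M / 2) follow common TCP practice rather than anything stated on the slides, and the `RtoEstimator` name is hypothetical.

```python
class RtoEstimator:
    """Smoothed RTT estimator using the slide's formulas (illustrative sketch)."""

    def __init__(self, g=0.125, h=0.25):
        self.g = g      # gain for the smoothed average A (assumed value)
        self.h = h      # gain for the smoothed mean deviation D (assumed value)
        self.a = None   # A: smoothed average RTT
        self.d = None   # D: smoothed mean deviation

    def on_measurement(self, m, retransmitted=False):
        """Feed one RTT measurement M and return the current RTO."""
        if retransmitted:
            # "Do you measure on a Retransmit? NO!": the sample is ambiguous,
            # since the ACK may belong to the original or to the retransmit.
            return self.rto()
        if self.a is None:
            self.a, self.d = m, m / 2  # assumed initialization on first sample
        else:
            err = m - self.a                        # Err = M - A
            self.a += self.g * err                  # A <- A + g * Err
            self.d += self.h * (abs(err) - self.d)  # D <- D + h * (|Err| - D)
        return self.rto()

    def rto(self):
        """RTO = A + 4D, or None before any measurement."""
        return None if self.a is None else self.a + 4 * self.d


est = RtoEstimator()
rto_1 = est.on_measurement(100.0)                      # A=100, D=50  -> 300.0
rto_2 = est.on_measurement(120.0)                      # A=102.5, D=42.5 -> 272.5
rto_3 = est.on_measurement(900.0, retransmitted=True)  # ignored -> 272.5
```

The deviation term is what lets the timeout widen when measurements are noisy and tighten when they are steady, which a plain average cannot do.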
  40. Does processing twice hurt? Diagram: Original, X, TimeoutInterval, Retrans, ACK; Process Once, Process Twice.
  41. Are Original & Retransmit treated the same? Diagram: Original, X, TimeoutInterval, Retrans, ACK; Process Once, Process Twice.
  42. TCP, SCTP, Aeron, … anything with reliability
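
One way to make "process twice" harmless is for the receiver to remember which request IDs it has already handled and to answer a retransmit with the cached result. The sketch below assumes every request carries a unique ID that the retransmit reuses; `OnceOnlyReceiver` and `handle` are hypothetical names, and the cache is deliberately unbounded to keep the example short.

```python
class OnceOnlyReceiver:
    """Processes each request ID at most once (illustrative sketch)."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # request ID -> cached result (unbounded in this sketch)

    def on_request(self, request_id, payload):
        """Handle an original or a retransmit and return the ACK to send."""
        if request_id not in self._results:
            # First time this ID is seen: do the real work exactly once.
            self._results[request_id] = self._handler(payload)
        # A retransmit gets the same ACK without re-running the handler.
        return ("ACK", request_id, self._results[request_id])


processed = []


def handle(payload):
    processed.append(payload)  # stands in for a side effect such as a debit
    return payload.upper()


rx = OnceOnlyReceiver(handle)
first = rx.on_request(1, "debit 10")   # original
again = rx.on_request(1, "debit 10")   # retransmit after a lost ACK
```

A real receiver would need to bound or expire the cache; how long to remember an ID is itself a lifetime-management question.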
  43. Case Study 3: What I Need! When I Need It! “Lifetime” Management
  44. “Managing” Application Working Set or Service Liveness
  45. Caching Algorithms: LRU, MRU, PLRU, RR, SLRU, LFU, … “Liveness” is essential
  46. Consequence of Processing. Diagram: Request and ACK between Service A and Service B. Service A is Alive! Service B is Alive!
  47. Absence of Processing. Diagram: Keepalives between Service A and Service B. Service A is Alive! Service B is Alive!
  48. RIP Route Deletion. Step 0: route info broadcast every 30 seconds. Step 1 (3 min): Set Distance to Infinity (16). Step 2 (+1 min): Delete Route. Aside… RIP… aptly named
  49. Aeron Driver Keepalive: Time of Last Activity = Shared Variable. Doesn’t need to be a message…
  50. Diagram: Bye messages between Service A and Service B. Service A is gone! Service B is gone! Optimization, but insufficient with arbitrary failures.
  51. Liveness often exists across transient connectivity
  52. So… Don’t conflate transport state with liveness! Like TCP connection state
  53. Dead TCP connection != Dead Service
  54. Live TCP connection != Live Service
  55. BGP, OSPF, Transports, … almost every protocol
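
The liveness slides in Case Study 3 boil down to tracking a time of last activity per service, independent of any connection state. Here is a minimal sketch of that idea; `LivenessTracker` and its methods are hypothetical names, and the caller is assumed to pass in a monotonic clock reading (for example from `time.monotonic()`).

```python
class LivenessTracker:
    """Judges liveness from time of last activity (illustrative sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_activity = {}  # service name -> time of last activity

    def on_activity(self, service, now):
        """Any request, ACK, or keepalive from the service counts as activity."""
        self._last_activity[service] = now

    def on_bye(self, service):
        """An explicit Bye is an optimization: forget the service immediately."""
        self._last_activity.pop(service, None)

    def is_alive(self, service, now):
        """Alive only if something was heard within the timeout."""
        last = self._last_activity.get(service)
        return last is not None and (now - last) <= self.timeout


tracker = LivenessTracker(timeout=10.0)
tracker.on_activity("B", now=0.0)           # e.g. an ACK from Service B
alive_early = tracker.is_alive("B", now=5.0)
alive_late = tracker.is_alive("B", now=11.0)   # silence exceeded the timeout
tracker.on_activity("B", now=12.0)          # a keepalive revives it
alive_again = tracker.is_alive("B", now=15.0)
tracker.on_bye("B")
alive_after_bye = tracker.is_alive("B", now=15.0)
```

Nothing here looks at a TCP connection: a service can stay alive across a reconnect, and a service that crashes without sending Bye is still caught by the timeout.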
  56. Case Study 4: Elasti-What? Self-Similar Behavior
  57. Multiple same/similar requests at the same time. Diagram: Request X (three times), arriving as Request X, X, X; Response X, X, X.
  58. Similar Problem… Reliable Multicast
  59. Non-correlated loss. Diagram: 1, 2, 3 sent to three receivers (1, 2, 3; 1, 2, 3; 1, 2, 3), with losses X X X.
  60. Request individual lost data. Diagram: NAK 2, NAK 1, NAK 3, arriving as NAK 1, 2, 3; Retransmit 1, 2, 3.
  61. Temporally/Spatially Correlated Loss. Diagram: 1, 2, 3 sent to three receivers (1, 2, 3; 1, 2, 3; 1, 2, 3), with losses X X X.
  62. Multiple requests for same data. Diagram: NAK 2 (three times), arriving as NAK 2, 2, 2; Retransmit 2, 2, 2.
  63. It’s a generic problem! Diagram: Request 2 (three times), arriving as Request 2, 2, 2.
  64. Overloading Responder & Network. Diagram: Request 2 (three times), arriving as Request 2, 2, 2.
  65. Publish Requests. Don’t Immediately Request, Listen first. Diagram: Timeout!, Request 2, Request 2, Request 2, Suppress Request.
  66. How long to wait & listen for? Diagram: Timeout!, Request 2, Request 2, Request 2, Suppress Request.
  67. Statistics to the Rescue!
  68. SRM Backoff: RandomBackoff = [C1, C1+C2] * 1-way delay. Random is more than good enough
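
The SRM backoff formula can be tried out with a small simulation. The sketch below draws each receiver's timer uniformly from [C1, C1 + C2] times the one-way delay, and assumes (as a simplification not stated on the slides) that every receiver hears the first published request exactly one one-way delay after it is sent. The function names and parameter values are hypothetical.

```python
import random


def srm_backoff(c1, c2, one_way_delay, rng=random):
    """Pick a wait time uniformly from [C1, C1 + C2] * one-way delay."""
    return rng.uniform(c1, c1 + c2) * one_way_delay


def simulate_requests(num_receivers, c1, c2, one_way_delay, seed=0):
    """Count receivers that still send a request for the same lost item."""
    rng = random.Random(seed)
    timers = sorted(
        srm_backoff(c1, c2, one_way_delay, rng) for _ in range(num_receivers)
    )
    # Simplifying assumption: the earliest request is heard by everyone
    # one one-way delay after it goes out.
    heard_at = timers[0] + one_way_delay
    # Receivers whose timer fires before they hear it send anyway;
    # everyone else suppresses their request.
    return sum(1 for t in timers if t < heard_at)


# With no random spread (C2 = 0) every timer fires together: nobody suppresses.
sent_no_spread = simulate_requests(50, c1=1.0, c2=0.0, one_way_delay=0.01)
# With a wide spread most receivers hear the first request and stay quiet.
sent_spread = simulate_requests(50, c1=1.0, c2=10.0, one_way_delay=0.01)
```

A larger C2 suppresses more duplicates at the cost of a longer wait before the first request goes out, which is the trade-off behind "how long to wait & listen for?".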
  69. Must also shed duplicates on the responder. Diagram: Request 2, Request 2, arriving as Request 2, 2; Response 2, 2 (X X). Shed second “Request 2” if too soon.
  70. SRM, PGM, Aeron, … http://en.wikipedia.org/wiki/Scalable_Reliable_Multicast http://www.eurecom.fr/en/publication/107/detail/optimal-multicast-feedback
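
Shedding on the responder side, as described at the end of Case Study 4, can be sketched as a per-item hold-off timer: a repeat request for the same item is ignored if a response went out too recently. The `SheddingResponder` name and the hold-off value are assumptions made for illustration.

```python
class SheddingResponder:
    """Ignores repeat requests that arrive too soon (illustrative sketch)."""

    def __init__(self, holdoff):
        self.holdoff = holdoff
        self._last_sent = {}  # item -> time the last response was sent

    def on_request(self, item, now):
        """Return True if a response should be sent, False if shed."""
        last = self._last_sent.get(item)
        if last is not None and now - last < self.holdoff:
            return False  # too soon: the earlier response likely covers it
        self._last_sent[item] = now
        return True


responder = SheddingResponder(holdoff=5)
answered_first = responder.on_request(2, now=0)
answered_second = responder.on_request(2, now=1)   # duplicate, shed
answered_later = responder.on_request(2, now=6)    # hold-off expired
answered_other = responder.on_request(3, now=1)    # different item
```

Requester-side suppression and responder-side shedding complement each other: the first reduces how many duplicates are sent, the second bounds the damage from those that get through.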
  71. Case Study 5: Hey, Slow Down! Flow (& Congestion) Control
  72. Stop-And-Wait Flow Control. Diagram: Data ×3, ACK ×3, RTT. Throughput = Data Length / RTT
  73. Bandwidth Delay Product: BDP = (Bytes / sec) * sec = Bytes. Diagram: Delay, Bandwidth, BDP (Buffer).
  74. Throughput = N * Data Length / RTT. Diagram: Data … N Data “Blobs”, ACK, RTT.
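
The throughput and BDP formulas above are easy to check with numbers. The figures in this sketch (a 1,000,000 bytes/sec link, a 0.25 second RTT, and 1,000-byte data blobs) are made up purely for illustration, and the function names are hypothetical.

```python
import math


def stop_and_wait_throughput(data_length, rtt):
    """Throughput = Data Length / RTT (one blob in flight per round trip)."""
    return data_length / rtt


def windowed_throughput(n, data_length, rtt):
    """Throughput = N * Data Length / RTT (N blobs in flight per round trip).

    The link bandwidth remains the ceiling; this formula alone does not cap it.
    """
    return n * data_length / rtt


def bandwidth_delay_product(bandwidth, rtt):
    """BDP = (bytes / sec) * sec = bytes the path can hold in flight."""
    return bandwidth * rtt


def window_to_fill_pipe(bandwidth, rtt, data_length):
    """Smallest N whose N * Data Length covers the BDP."""
    return math.ceil(bandwidth_delay_product(bandwidth, rtt) / data_length)


BANDWIDTH = 1_000_000   # bytes/sec (illustrative)
RTT = 0.25              # seconds (illustrative)
BLOB = 1_000            # bytes per data blob (illustrative)

slow = stop_and_wait_throughput(BLOB, RTT)       # 4000.0 bytes/sec
bdp = bandwidth_delay_product(BANDWIDTH, RTT)    # 250000.0 bytes
n = window_to_fill_pipe(BANDWIDTH, RTT, BLOB)    # 250 blobs
fast = windowed_throughput(n, BLOB, RTT)         # 1000000.0 bytes/sec
```

With these numbers stop-and-wait uses 0.4% of the link, and it takes 250 blobs in flight to fill it.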
  75. So… How big is N? This is surprisingly hard to answer
  76. It depends…
  77. Big… but: Don’t overflow receiver. Don’t overflow “network”.
  78. TCP Flow Control: Receiver advertises N
  79. TCP Congestion Control: Sender probes for network N
  80. TCP Sender: min(Receiver N, Network N). Only go as fast as Network & Receiver
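
The rule on the TCP Sender slide, min(Receiver N, Network N), can be sketched as a sender that tracks two windows. The probing shown here is a deliberately simplified additive-increase, halve-on-loss scheme chosen for illustration; it is not how any real TCP stack is implemented, and `WindowedSender` is a hypothetical name.

```python
class WindowedSender:
    """Limits in-flight data to min(receiver N, network N) (illustrative sketch)."""

    def __init__(self, receiver_window, congestion_window):
        self.receiver_window = receiver_window      # N advertised by the receiver
        self.congestion_window = congestion_window  # N the sender probes for
        self.in_flight = 0

    def effective_window(self):
        return min(self.receiver_window, self.congestion_window)

    def can_send(self):
        """How many more units may be sent right now."""
        return max(0, self.effective_window() - self.in_flight)

    def on_send(self, n):
        self.in_flight += n

    def on_ack(self, n, advertised_window):
        """An ACK frees in-flight data and carries the receiver's latest N."""
        self.in_flight -= n
        self.receiver_window = advertised_window
        self.congestion_window += 1  # simplified probe: grow by one per ACK

    def on_loss(self):
        """Simplified back-off: halve the network estimate, never below 1."""
        self.congestion_window = max(1, self.congestion_window // 2)


s = WindowedSender(receiver_window=8, congestion_window=4)
room_initial = s.can_send()          # network-limited: 4
s.on_send(4)
room_full = s.can_send()             # window is full: 0
s.on_ack(4, advertised_window=8)
room_after_ack = s.can_send()        # probe grew the window: 5
s.on_loss()
room_after_loss = s.can_send()       # halved to 2
s.congestion_window = 20
room_receiver_limited = s.can_send() # receiver-limited: 8
```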
  81. ReactiveStreams: Subscriber uses explicit request(N). Publisher assumes best case. http://www.reactive-streams.org/
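
The explicit request(N) idea can be sketched in Python as a subscription that only delivers as many items as the subscriber has asked for. This is an illustration of demand-based flow control, not the Reactive Streams API itself; `DemandDrivenSubscription` and its callback are hypothetical names.

```python
from collections import deque


class DemandDrivenSubscription:
    """Delivers items only against outstanding demand (illustrative sketch)."""

    def __init__(self, items, on_next):
        self._queue = deque(items)  # items the publisher has ready
        self._on_next = on_next     # subscriber callback for each item
        self._demand = 0            # how many items the subscriber has asked for

    def request(self, n):
        """Subscriber signals it can accept n more items."""
        if n <= 0:
            raise ValueError("n must be positive")
        self._demand += n
        self._drain()

    def _drain(self):
        # Deliver while there is both demand and data; never exceed demand.
        while self._demand > 0 and self._queue:
            self._demand -= 1
            self._on_next(self._queue.popleft())


received = []
sub = DemandDrivenSubscription(range(5), received.append)
sub.request(2)
after_first = list(received)    # only two items delivered
sub.request(10)
after_second = list(received)   # remaining three delivered, demand left over
```

The receiver sets the pace here, which is the same "don't overflow receiver" rule as TCP flow control, expressed at the application level.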
  82. Takeaways!
  83. Protocols of interaction are important & can be tremendously impactful, for better or worse…
  84. Questions? • IETF http://www.ietf.org/ • Aeron https://github.com/real-logic/Aeron • Twitter @toddlmontgomery Thank You!