
Running with the Devil: Mechanical Sympathetic Networking


  1. Running with the Devil: Mechanical Sympathetic Networking. Todd L. Montgomery (@toddlmontgomery), Informatica Ultra Messaging Architecture
  2. Tail of a Networking Stack: the BSD Beastie's direct descendants: FreeBSD, NetBSD, OpenBSD, ..., Darwin (Mac OS X); also Windows, Solaris, even Linux, Android, ...
  3. Domain: TCP & UDP
  4-12. It’s a Trap! Symptom: overly long round-trip time for a request + response. A big request (256 KB) goes out over TCP (MSS 1448 bytes) and the response comes back with an overly long RTT. Only for specific request sizes? Only on some OSs? Solution 1: set SO_RCVBUF ... but it pops up again! Solution 2: set TCP_NODELAY ... YES! but CPU skyrockets! A well-understood bad interaction!
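The two workarounds named on the slides are single setsockopt() calls. A minimal C sketch (the wrapper function name is mine; SO_RCVBUF and TCP_NODELAY are the standard option names):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch of the two slide "solutions": enlarge the receive buffer (so
 * the advertised window can cover a 256 KB request) and disable Nagle
 * (so small trailing segments are not held back waiting for an ACK).
 * Returns 0 on success, -1 on the first failing setsockopt(). */
int apply_trap_workarounds(int fd, int rcvbuf_bytes)
{
    int one = 1;

    /* Solution 1: a larger SO_RCVBUF. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
        return -1;

    /* Solution 2: TCP_NODELAY turns Nagle off; latency improves but,
     * as the slides warn, CPU use can skyrocket on many tiny sends. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0)
        return -1;

    return 0;
}
```

On some stacks SO_RCVBUF only influences window scaling if set before connect()/listen(), so call this early.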
  13-17. Challenges with TCP. Nagle: "don't send 'small' segments if unacknowledged segments exist." Delayed ACKs: don't acknowledge data immediately; wait a small period of time (200 ms) for responses to be generated and piggyback the response with the acknowledgement. Nagle + Delayed ACKs = temporary deadlock: the sender is waiting on an acknowledgement to send any waiting small segment, but the acknowledgement is delayed waiting on more data or a timeout. Solutions?
  18-28. Little Experiment: a BIG request (32 MB) and a response (1 KB) over loopback (TCP MSS 16344 bytes), with the request written in chunks. Question: does the size of a send matter that much?

      Chunk Size (bytes)   RTT (msec)
      1500                 32
      4096                 16
      8192                 12
      (with dramatically higher CPU at the smaller chunk sizes)

  Takeaways: "Small" messages are evil? Chunks smaller than the MSS are evil? ... no, or not quite ... The OS page size (4096 bytes) matters! Why? Kernel boundary crossings matter! What about sendfile(2) and FileChannel.transferTo()?
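The "kernel boundary crossings matter" takeaway can be made concrete: the number of write() calls needed to push the 32 MB request is just a ceiling division. A trivial helper (the name is mine):

```c
/* Number of kernel boundary crossings (write() calls) needed to push
 * total_bytes through a socket in chunk_bytes pieces (ceiling division). */
long crossings_for(long total_bytes, long chunk_bytes)
{
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}
```

At 1500-byte chunks the 32 MB request costs 22,370 crossings versus 4,096 at 8192 bytes, which lines up with the RTT and CPU trend in the table above.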
  29-35. Challenges with UDP. Not a stream: message boundaries matter (kernel boundary crossings). No Nagle: small messages are not batched. Not reliable: loss recovery is the application's responsibility; the causes of loss are receiver buffer overrun and network congestion (neither is strictly the app's fault). No flow control: potential to overrun a receiver. No congestion control: potential impact to all competing traffic (an unconstrained flow)!
  36-41. Network Utilization & Datagrams. Utilization = Data / (Data + Control): "the percentage of traffic that is data."

      No. of 200-byte app messages   Utilization (%)
      1                              87.7
      5                              97.3
      20                             99.3
      (IP header = 20 bytes, UDP header = 8 bytes, no response)

  Plus fewer interrupts! Batching?
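The utilization figures in the table follow directly from the definition. A sketch of the arithmetic (the function name is mine; the 20 + 8 byte overhead is from the slide's footnote):

```c
/* Utilization = data / (data + control), as a percentage, for n_msgs
 * application messages of msg_bytes each packed into ONE datagram.
 * Control overhead per datagram: 20-byte IP header + 8-byte UDP header. */
double utilization_pct(int n_msgs, int msg_bytes)
{
    double data = (double)n_msgs * msg_bytes;
    return 100.0 * data / (data + 20.0 + 8.0);
}
```

For 200-byte messages this reproduces the slide's 87.7%, 97.3%, and 99.3% for 1, 5, and 20 messages per datagram.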
  42-51. Application-Level Batching? Application-specific knowledge (applications sometimes know when to send small and when to batch: HTTP headers + body, etc.) + performance limitations & tradeoffs (Nagle, delayed ACKs, chunk sizes, UDP network utilization, etc.) = batching by the application: applications can optimize and make the tradeoffs necessary at the time they are needed. Addressing: ‣request/response idiosyncrasies ‣send-side optimizations
  52-63. Batching setsockopt()s.
  TCP_CORK ‣Linux only ‣only send when the MSS is full, when unCORKed, or ... ‣... after 200 msec ‣unCORKing requires a kernel boundary crossing ‣intended to work with TCP_NODELAY
  TCP_NOPUSH ‣BSD (some) only ‣only send when SO_SNDBUF is full ‣mostly broken on Darwin
  When to batch? When to flush?
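A hedged sketch of the Linux-only cork/uncork dance (the helper names are mine; TCP_CORK is the real option):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Linux-only: hold small writes back with TCP_CORK so they coalesce
 * into MSS-sized segments, then pop the cork to flush what remains.
 * Each returns 0 on success, -1 on error (like setsockopt itself). */
int cork(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

int uncork(int fd)
{
    int off = 0;
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof(off));
}
```

Typical use: cork(fd), write the header and body pieces, then uncork(fd). Note that the uncork is itself one more kernel boundary crossing, exactly as the slide points out.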
  64-80. Flush? Batch?
  Batch when... 1. application logic 2. more data is likely to follow 3. unlikely to get the data out before the next one.
  Flush when... 1. application logic 2. more data is unlikely to follow 3. timeout (200 msec?) 4. likely to get the data out before the next one.
  An Automatic Transmission for Batching: 1. always default to flushing 2. batch when mean time between sends < mean time to send (EWMA?) 3. flush on timeout as a safety measure.
  Question: can you batch too much? YES! Large UDP datagrams (fragmentation) + a non-trivial loss probability.
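The "automatic transmission" rules above can be sketched as a pair of EWMAs plus one comparison. Everything here (the struct, the alpha value, the names) is illustrative, not from the talk:

```c
/* Keep exponentially weighted moving averages of the time between
 * application sends (MTBS) and the time a socket send takes (MTTS);
 * batch only while MTBS < MTTS. Alpha of 0.2 is an arbitrary choice. */
#define EWMA_ALPHA 0.2

struct auto_batch {
    double mtbs_usec;  /* mean time between sends */
    double mtts_usec;  /* mean time to send on the socket */
};

static double ewma(double mean, double sample)
{
    return mean == 0.0 ? sample
                       : (1.0 - EWMA_ALPHA) * mean + EWMA_ALPHA * sample;
}

void observe_gap(struct auto_batch *b, double usec)  { b->mtbs_usec = ewma(b->mtbs_usec, usec); }
void observe_send(struct auto_batch *b, double usec) { b->mtts_usec = ewma(b->mtts_usec, usec); }

/* Default to flushing; batch only when sends arrive faster than the
 * socket drains them. A timeout-driven flush (not shown) is still
 * needed as the safety measure. */
int should_batch(const struct auto_batch *b)
{
    return b->mtbs_usec > 0.0 && b->mtts_usec > 0.0
        && b->mtbs_usec < b->mtts_usec;
}
```

With no samples yet, should_batch() is false, which is the "always default to flushing" rule.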
  81-92. A Batching Architecture. Application threads make blocking sends into a shared structure; "smart" batching pulls off all waiting data when possible, automatically batching when MTBS < MTTS (MTBS: mean time between sends; MTTS: mean time to send on the socket).
  Advantages ‣non-contended send threads ‣decoupled API and socket sends ‣single-writer principle for sockets ‣built-in back pressure (bounded ring buffer) ‣easy to add (async) send notification ‣easy to add rate limiting
  Can be re-used for other batching tasks (like file I/O, DB writes, and pipeline requests)!
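A minimal sketch of the queue at the heart of this architecture: a bounded ring buffer where producers get back pressure when it is full, and the single socket-writer drains everything waiting in one pass, which is where the automatic batching comes from. Single-producer/single-consumer indices only; all names and sizes are mine:

```c
#define RING_CAP 8  /* bounded capacity, power of two */

struct ring {
    long head, tail;             /* head: next to drain, tail: next free */
    const char *slots[RING_CAP]; /* queued messages */
};

/* Returns 0 on success, -1 when full: the built-in back pressure. */
int ring_offer(struct ring *r, const char *msg)
{
    if (r->tail - r->head == RING_CAP)
        return -1;
    r->slots[r->tail & (RING_CAP - 1)] = msg;
    r->tail++;
    return 0;
}

/* Drain every waiting message into out[] in one pass; the caller then
 * issues one batched socket write. Returns the number drained. */
int ring_drain(struct ring *r, const char *out[], int max)
{
    int n = 0;
    while (r->head != r->tail && n < max)
        out[n++] = r->slots[r->head++ & (RING_CAP - 1)];
    return n;
}
```

A production version would add the memory ordering needed between the producer and consumer threads; this sketch only shows the drain-all shape.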
  93-106. Multi-Message Send/Receive.
  sendmmsg(2) ‣Linux 3.x only ‣send multiple datagrams with a single call ‣fits nicely with the batching architecture. Complements gather send (sendmsg, writev) - which you can do in the same call!
  recvmmsg(2) ‣Linux 3.x only ‣receive multiple datagrams with a single call ‣so, so, sooo SMOKIN' HOT! Scatter receive (recvmsg, readv) is usually not worth the trouble.
  Advantages ‣reduced kernel boundary crossings
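A Linux-only sketch of the send side: push a batch of small datagrams with one kernel boundary crossing via sendmmsg(2) instead of one sendto() each. Each entry here uses a single buffer, but every msg_hdr could carry an iovec of several fragments, the gather send in the same call. The function name and batch cap are mine:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Send up to MAX_BATCH datagrams in one sendmmsg(2) call on a
 * connected datagram socket. Returns the number actually sent, or -1. */
int send_batch(int fd, const char *msgs[], const size_t lens[], int count)
{
    enum { MAX_BATCH = 16 };
    struct mmsghdr hdrs[MAX_BATCH];
    struct iovec iovs[MAX_BATCH];

    if (count > MAX_BATCH)
        count = MAX_BATCH;

    memset(hdrs, 0, sizeof(hdrs));
    for (int i = 0; i < count; i++) {
        iovs[i].iov_base = (void *)msgs[i];
        iovs[i].iov_len  = lens[i];
        hdrs[i].msg_hdr.msg_iov    = &iovs[i];
        hdrs[i].msg_hdr.msg_iovlen = 1;  /* could be >1: gather send */
    }
    return sendmmsg(fd, hdrs, count, 0);
}
```

This slots straight under the ring-buffer drain: drain all waiting messages, then hand the whole batch to one sendmmsg() call.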
  107. Domain: Protocol Design
  108-118. Application-Level Framing. Split a file to transfer (0 to X bytes) into Application Data Units (constant size, except maybe the last):
  S: ADU 1  ADU 2  ADU 3  ADU 4  ADU 5  ADU 6  ADU 7  ADU 8  ADU 9
  R: ADU 1  ADU 2  ......  ADU 4  ADU 5  ADU 6  ADU 7  ADU 8  ......
  Recover ADU 3 and ADU 9.
  Advantages ‣optimize recovery until the end (or checkpoints) ‣works well with multicast and unicast ‣works best over UDP (message boundaries)
  Clark and Tennenhouse, ACM SIGCOMM CCR, Volume 20, Issue 4, Sept. 1990
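On the receive side, ALF reduces loss handling to hole bookkeeping: because each constant-size ADU stands alone, repair can be deferred to the end (or to checkpoints) and then only the holes are requested. A small sketch (helper names are mine):

```c
/* How many ADUs a file of file_bytes splits into (last may be short). */
long adu_count(long file_bytes, long adu_bytes)
{
    return (file_bytes + adu_bytes - 1) / adu_bytes;
}

/* After the transfer (or at a checkpoint), list the ADUs still needing
 * recovery. received[i] is nonzero if ADU i+1 arrived; out[] gets the
 * 1-based ADU numbers to request, as on the slides. Returns the count. */
int missing_adus(const int received[], int total, int out[])
{
    int n = 0;
    for (int i = 0; i < total; i++)
        if (!received[i])
            out[n++] = i + 1;
    return n;
}
```

For the slide's scenario (ADU 3 lost in transit, ADU 9 never arriving), this yields exactly the two ADUs to recover.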
  119-129. PGM Router Assist. A sender (S) multicasts down a forwarding tree to receivers (R). Loss on one upstream link affects both downstream links, while the other subtrees see no loss at all. NAK path: NAKs traverse hop-by-hop back up the forwarding tree, saving state in each router for the retransmits to follow. Retransmit path: the retransmit only needs to traverse the lossy link once for both downstream links.
  Advantages ‣NAKs follow the reverse path ‣retransmissions are localized ‣optimization of bandwidth!
  Pragmatic General Multicast (PGM), IETF RFC 3208
  130. Questions?
