Multipath TCP Upstreaming
Mat Martineau (Intel) and Matthieu Baerts (Tessares)
Plan
● Multipath TCP Overview
● First Patch Set Upstreaming Roadmap
● Advanced Features Roadmap
● Conclusion and links
2
What is MPTCP?
3
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
4Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
● More bandwidth:
5Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
● More mobility (walk-out):
6Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
● More mobility (walk-out):
7Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
● More mobility (walk-out):
8Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP (MPTCP)
● Exchange data for a single connection over different paths, simultaneously
● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group
● More mobility (walk-out):
9Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Multipath TCP Use Cases
● Smartphones (Apple, Samsung, LG, others)
○ Support failover / “walk-out” scenario.
○ More Bandwidth
● Residential Gateways (LTE + DSL, for example)
○ More Bandwidth
● Multipath TCP is part of 5G standardisation:
○ Access Traffic Steering, Switching and Splitting: ATSSS
10
Multipath TCP Use Cases: ATSSS
11
Steering
best network
selection
OR
Switching
seamless
handover
FROM TO
and vice
versa
Splitting
network
aggregation
AND
improved
end-user
experience
Defined in 3GPP Release 16, ATSSS is a core network function in 5G networks,
playing a key role in managing data traffic between 3GPP (5G, 4G) networks and non-3GPP (Wi-Fi) networks
5G
5G
5G
WiFi
WiFi
WiFi
Existing Linux implementation
● First implementation for Linux kernel in March 2009
○ Latest MPTCP out-of-tree Linux kernel version is v0.95
○ Generally used as a client / server in current deployments, for millions of users
● But not upstreamable
○ Built to support experiments and rapid changes but not generic enough
○ Special purpose implementation of MPTCP
12
Guidelines for upstream
● New implementation cannot affect existing TCP stack:
○ Without performance regressions. No code size change if CONFIG_MPTCP=n
○ Maintainable and configurable
○ Can be used in a variety of deployments
● Multipath TCP will be "opt-in"
● Proceed in steps:
○ Minimal features set
○ Optimisations and advanced features for later
13
Protocol Overview: RFC 6824
● Looks like TCP on the wire, similar usage for apps
14
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
Protocol Overview: RFC 6824
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number
15
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
AB
CD
EF
Dseq=0, seq=123,“A”
Dseq=1, seq=124,“B”
Dseq=2, seq=456,“C”
Dseq=4, seq=125,“E”
Dseq=5,
seq=126,“F”
Dseq=3, seq=456,“D”
Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number
Protocol Overview: RFC 6824
16
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
Protocol Overview: RFC 6824
17
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number
Protocol Overview: RFC 6824
18
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number
Protocol Overview: RFC 6824
19
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number + ACK
Protocol Overview: RFC 6824
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number + ACK
● MP_CAPABLE, MP_JOIN, DATA_FIN
20
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
In TCP
options
Protocol Overview: RFC 6824
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number + ACK
● MP_CAPABLE, MP_JOIN, DATA_FIN
● Signaling: Add/Remove Addresses, Fast Close
21
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
Via
TCP ACK
Protocol Overview: RFC 6824
● Looks like TCP on the wire, similar usage for apps
● Subflow mapping: Data Sequence Number + ACK
● MP_CAPABLE, MP_JOIN, DATA_FIN
● Signaling: Add/Remove Addresses, Fast Close
● Coupled receive windows across TCP subflows
22
MPTCP
TCP Subflow TCP Subflow
Socket Layer
IP Layer
Multiple versions of MPTCP
● RFC 6824: Experimental
○ All known implementations support it, only this version
● RFC 6824 bis: Standard
○ Submitted to IESG for publication
○ Behavioral changes: MPTCP v0 → MPTCP v1
○ Some parts easier to implement
○ Selected by 3GPP for 5G
23
First Patch Set Roadmap
24
MPTCP Socket architecture
25
MPTCP
TCP Subflow TCP Subflow
Socket Layer IP Proto: struct proto
SKB 1
SKB 2
...
SKB 1
SKB 2
...
TCP ULP: struct tcp_ulp_ops
We start from: tcp_request_sock_ops
SKB extension: struct mptcp_ext
To store Data Sequence Signal (25 bytes)
Userspace API
● MPTCP selected when creating the socket:
socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
26
Userspace API
● MPTCP selected when creating the socket:
socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */
27
Userspace API
● MPTCP selected when creating the socket:
socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */
● getsockopt() / setsockopt() with MPTCP socket or its TCP subflows?
28
Userspace API
● MPTCP selected when creating the socket:
socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP);
○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */
● getsockopt() / setsockopt() with MPTCP socket or its TCP subflows?
● Security: who can create MPTCP sockets?
○ Initial implementation will not be hardened by broad use yet (syzkaller, etc.)
○ sysctl per network namespace, MPTCP disabled by default: is it enough?
29
Diagnostics
● MPTCP will have a collection of counters for diagnostic and debug purposes
● Per-socket data will be shared with userspace via sock_diag(7)
○ TCP ULP framework has been extended to enable diag
● Some TCP counters are also found in /proc
○ Should MPTCP add to these as well?
30
Tests
● Kernel Self Tests
○ Between multiple namespaces (veth)
○ MPTCP ⇔ MPTCP, MPTCP ⇔ TCP, TCP ⇔ MPTCP
○ Various conditions including packet loss, reordering, and variations in routing
● Packetdrill
○ Background project ongoing to add MPTCP support
○ Out-of-tree Packetdrill with MPTCP support but old and limited
31
Initial use case
● Server role is a good place to start
● Simpler path management
○ Client side handles multiple interfaces (like cellular + Wi-Fi)
● Common server configuration uses one public
interface for clients
○ Advertising additional interfaces not required
● Client features all build on what’s needed for servers
32
Server
IP1
Client
IP2 IP3
NAT NAT
Internet
Code already merged upstream
● SKB extensions
○ Needed to carry MPTCP options that are tied to the data payload
○ Also used to remove sp (sec_path) and nf_bridge pointers from struct sk_buff
○ Suitable for data that can’t fit in sk_buff and justifies memory overhead
● Add inet_diag_ulp_info to socket diag format and ULP get_info hook
33
Change in TCP Code
Git Stat:
include/linux/skbuff.h | 11 ++
include/linux/tcp.h | 51 ++++++++++
include/net/sock.h | 6 +-
include/net/tcp.h | 20 ++++
include/trace/events/sock.h | 5 +-
include/uapi/linux/in.h | 2 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/ax25/af_ax25.c | 2 +-
net/core/skbuff.c | 7 ++
net/decnet/af_decnet.c | 2 +-
net/ipv4/inet_connection_sock.c | 2 +
net/ipv4/tcp.c | 8 +-
net/ipv4/tcp_input.c | 29 +++++-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv4/tcp_minisocks.c | 6 ++
net/ipv4/tcp_output.c | 62 +++++++++++-
net/ipv4/tcp_ulp.c | 12 +++
34
Changes to TCP code
● tcp_ulp_clone()
● Export two low-level TCP functions and one struct
● SKBs with MPTCP extensions can’t be coalesced or collapsed
● MPTCP option parsing and writing
● is_mptcp flag in tcp_sock and tcp_request_sock
35
Changes to TCP code, continued
● One MPTCP-specific branch in TCP minisocks
● Call out to MPTCP from tcp_data_queue to add SKB extension and process
ACKs
● Additional members in struct tcp_options_received
● Subflow receive window sharing will introduce changes too
36
Advanced Features Roadmap
37
Path Manager
Which path to create/remove? Which address to announce?
38
?
?
Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Userspace Path Manager
● Peers share ADD_ADDR and REMOVE_ADDR signals to advertise available
addresses for each MPTCP connection
● Path manager runs in userspace and uses generic netlink to track address and
local interface updates and request subflow changes
● Can be customized with different policies.
● Multipath TCP Daemon alpha release is available at github.com/intel/mptcpd
39
Packet Scheduler
On which available path packets will be sent? Reinject packets in another path?
40
A
A
?
?
Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
Packet scheduling
● Different connections may optimize for throughput, latency, or redundancy.
● Peers can set a ‘backup’ flag on each subflow to limit transmission on that flow
● Include basic scheduler options in the kernel
● Consider eBPF to define custom schedulers, instead of kernel modules
41
Using MPTCP with unmodified binaries
● Some organizations want to take advantage of MPTCP without recompiling
their userspace
● Can add BPF_CGROUP_SOCKET to attach an eBPF program that rewrites the
protocol number passed to socket()
● Similar attachment points exist for bind() and connect()
42
MPTCP Performance optimizations
● Initial emphasis is on correctness and reasonable MPTCP performance
○ While not disrupting TCP’s optimizations!
● Target performance optimizations based on data
● Protocol optimizations
○ Example: changing scheduler behavior for reinjection of data on different subflows
● TCP Fast Open support
43
Break-before-make
● MPTCP can keep a connection active even with zero subflows connected
○ Allows the session to continue by adding a subflow with MP_JOIN
● Can be useful to switch between access points
● Will add this capability if there’s demand for it
44
Subflow socket options
● One MPTCP socket manages a set of in-kernel subflow sockets
● Socket options that use TCP option space or change data flow could interfere
● The MPTCP socket can act as an intermediary for subflow options
● Will need to whitelist specific known-safe options
● Could expose file descriptors only good for getsockopt()/setsockopt()
45
Kernel TLS and MPTCP
● kTLS is built on top of TCP using ULP framework
● An MPTCP socket is not a TCP socket, so it doesn’t have ULP
● TLS needs to operate on the MPTCP data stream, not subflow streams
○ TLS records could be split across subflows
○ MPTCP DSS mappings are specific to TCP sequence numbers
● TLS_SW appears feasible but would need work to integrate with an MPTCP
socket type
46
Conclusion
47
Conclusion
● Build around TCP as much as we can.
● We are close to having an initial patch set ready.
48
This project is open to everybody.
● Wiki: https://is.gd/mptcp_upstream
● Mailing list: https://lists.01.org/mailman/listinfo/mptcp
● Git repository: https://github.com/multipath-tcp/mptcp_net-next
● Paper: https://linuxplumbersconf.org/event/4/contributions/435/
● mathew.j.martineau@linux.intel.com
● matthieu.baerts@tessares.net
Backup slides
49
Protocol challenges
50Used with Christoph Paasch’s permission
Relations between structures
kTLS record
51From: https://netdevconf.org/1.2/papers/netdevconf-TLS.pdf
Protocol challenges
Coupled receive windows across TCP subflows
52Used with Sébastien Barré’s permission
Multipath TCP (MPTCP)
Hybrid access network use-case (BBF TR-348 by Tessares - SwissCom - OVH)
53
Telco CloudDSL CPE
4G
CPE
TCP
TCP
MPTCP
copper (long line)
4G/LTE
network
fiber
xDSL
network
MPTCP
cloud
native
HAG
multi-
platform
Agent available
capacity
needed, for 4G/LTE
connectivity
Images from Tessares
Protocol challenges
54
Protocol challenges
Data sequence numbers and mappings
55
AB
C
Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
Protocol challenges
Data sequence numbers and mappings
56
AB
CD
EF
Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
Protocol challenges
Data sequence numbers and mappings
57
AB
C
D
EF
Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
Protocol challenges
Data sequence numbers and mappings
58
AB
C
D
EF
Dseq=0, seq=123,“A”
Dseq=1, seq=124,“B”
Dseq=2, seq=456,“C”
Dseq=4, seq=125,“E”
Dseq=5, seq=126,“F”
Dseq=3, seq=456,“D”
Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
Protocol challenges
Data sequence numbers and mappings
mptcp_sock
tcp_sock
Socket Layer
IP Layer
App sends data
59
Protocol challenges
Data sequence numbers and mappings
mptcp_sock
tcp_sock
Socket Layer
IP Layer
60
Select the TCP subflow
Protocol challenges
Data sequence numbers and mappings
mptcp_sock
tcp_sock
Socket Layer
IP Layer
TCP header: What DSS to set?
61
Protocol challenges
Sending of ACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
62
Notification: one iface is down
Protocol challenges
Sending of ACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
63
Select the subflow
Protocol challenges
Sending of ACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
64
Sending a ACK not from TCP stack
Protocol challenges
Reception of ACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
65
TCP ACK received
Protocol challenges
Reception of ACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
66
TCP ACK is not dropped
Protocol challenges
Reception of ACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK
mptcp_sock
tcp_sock tcp_sock
Socket Layer
IP Layer
Subflow ops Subflow ops
67
Protocol challenges
Signaling with MPTCP:
● MP_CAPABLE
● MP_JOIN
● DSEQ / DACK
● FAST_CLOSE
● ADD_ADDR
● REMOVE_ADDR
68
SYN
SYN
ALL
ACK followed by RST
ACK
ACK

Multipath TCP Upstreaming

  • 1.
    Multipath TCP Upstreaming MatMartineau (Intel) and Matthieu Baerts (Tessares)
  • 2.
    Plan ● Multipath TCPOverview ● First Patch Set Upstreaming Roadmap ● Advanced Features Roadmap ● Conclusion and links 2
  • 3.
  • 4.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group 4Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 5.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group ● More bandwidth: 5Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 6.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group ● More mobility (walk-out): 6Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 7.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group ● More mobility (walk-out): 7Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 8.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group ● More mobility (walk-out): 8Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 9.
    Multipath TCP (MPTCP) ●Exchange data for a single connection over different paths, simultaneously ● RFC-6824 and supported by IETF Multipath TCP (MPTCP) working group ● More mobility (walk-out): 9Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 10.
    Multipath TCP UseCases ● Smartphones (Apple, Samsung, LG, others) ○ Support failover / “walk-out” scenario. ○ More Bandwidth ● Residential Gateways (LTE + DSL, for example) ○ More Bandwidth ● Multipath TCP is part of 5G standardisation: ○ Access Traffic Steering, Switching and Splitting: ATSSS 10
  • 11.
    Multipath TCP UseCases: ATSSS 11 Steering best network selection OR Switching seamless handover FROM TO and vice versa Splitting network aggregation AND improved end-user experience Defined in 3GPP Release 16, ATSSS is a core network function in 5G networks, playing a key role in managing data traffic between 3GPP (5G, 4G) networks and non-3GPP (Wi-Fi) networks 5G 5G 5G WiFi WiFi WiFi
  • 12.
    Existing Linux implementation ●First implementation for Linux kernel in March 2009 ○ Latest MPTCP out-of-tree Linux kernel version is v0.95 ○ Generally used as a client / server in current deployments, for millions of users ● But not upstreamable ○ Built to support experiments and rapid changes but not generic enough ○ Special purpose implementation of MPTCP 12
  • 13.
    Guidelines for upstream ●New implementation cannot affect existing TCP stack: ○ Without performance regressions. No code size change if CONFIG_MPTCP=n ○ Maintainable and configurable ○ Can be used in a variety of deployments ● Multipath TCP will be "opt-in" ● Proceed in steps: ○ Minimal features set ○ Optimisations and advanced features for later 13
  • 14.
    Protocol Overview: RFC6824 ● Looks like TCP on the wire, similar usage for apps 14 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer
  • 15.
    Protocol Overview: RFC6824 ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number 15 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer AB CD EF Dseq=0, seq=123,“A” Dseq=1, seq=124,“B” Dseq=2, seq=456,“C” Dseq=4, seq=125,“E” Dseq=5, seq=126,“F” Dseq=3, seq=456,“D” Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
  • 16.
    ● Looks likeTCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number Protocol Overview: RFC 6824 16 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer
  • 17.
    Protocol Overview: RFC6824 17 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number
  • 18.
    Protocol Overview: RFC6824 18 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number
  • 19.
    Protocol Overview: RFC6824 19 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number + ACK
  • 20.
    Protocol Overview: RFC6824 ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number + ACK ● MP_CAPABLE, MP_JOIN, DATA_FIN 20 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer In TCP options
  • 21.
    Protocol Overview: RFC6824 ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number + ACK ● MP_CAPABLE, MP_JOIN, DATA_FIN ● Signaling: Add/Remove Addresses, Fast Close 21 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer Via TCP ACK
  • 22.
    Protocol Overview: RFC6824 ● Looks like TCP on the wire, similar usage for apps ● Subflow mapping: Data Sequence Number + ACK ● MP_CAPABLE, MP_JOIN, DATA_FIN ● Signaling: Add/Remove Addresses, Fast Close ● Coupled receive windows across TCP subflows 22 MPTCP TCP Subflow TCP Subflow Socket Layer IP Layer
  • 23.
    Multiple versions ofMPTCP ● RFC 6824: Experimental ○ All known implementations support it, only this version ● RFC 6824 bis: Standard ○ Submitted to IESG for publication ○ Behavioral changes: MPTCP v0 → MPTCP v1 ○ Some parts easier to implement ○ Selected by 3GPP for 5G 23
  • 24.
    First Patch SetRoadmap 24
  • 25.
    MPTCP Socket architecture 25 MPTCP TCPSubflow TCP Subflow Socket Layer IP Proto: struct proto SKB 1 SKB 2 ... SKB 1 SKB 2 ... TCP ULP: struct tcp_ulp_ops We start from: tcp_request_sock_ops SKB extension: struct mptcp_ext To store Data Sequence Signal (25 bytes)
  • 26.
    Userspace API ● MPTCPselected when creating the socket: socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP); 26
  • 27.
    Userspace API ● MPTCPselected when creating the socket: socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP); ○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */ 27
  • 28.
    Userspace API ● MPTCPselected when creating the socket: socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP); ○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */ ● getsockopt() / setsockopt() with MPTCP socket or its TCP subflows? 28
  • 29.
    Userspace API ● MPTCPselected when creating the socket: socket(AF_INET(6), SOCK_STREAM, IPPROTO_MPTCP); ○ IPPROTO_MPTCP = IPPROTO_TCP | 0x100; /* = 262 */ ● getsockopt() / setsockopt() with MPTCP socket or its TCP subflows? ● Security: who can create MPTCP sockets? ○ Initial implementation will not be hardened by broad use yet (syzkaller, etc.) ○ sysctl per network namespace, MPTCP disabled by default: is it enough? 29
  • 30.
    Diagnostics ● MPTCP willhave a collection of counters for diagnostic and debug purposes ● Per-socket data will be shared with userspace via sock_diag(7) ○ TCP ULP framework has been extended to enable diag ● Some TCP counters are also found in /proc ○ Should MPTCP add to these as well? 30
  • 31.
    Tests ● Kernel SelfTests ○ Between multiple namespaces (veth) ○ MPTCP ⇔ MPTCP, MPTCP ⇔ TCP, TCP ⇔ MPTCP ○ Various conditions including packet loss, reordering, and variations in routing ● Packetdrill ○ Background project ongoing to add MPTCP support ○ Out-of-tree Packetdrill with MPTCP support but old and limited 31
  • 32.
    Initial use case ●Server role is a good place to start ● Simpler path management ○ Client side handles multiple interfaces (like cellular + Wi-Fi) ● Common server configuration uses one public interface for clients ○ Advertising additional interfaces not required ● Client features all build on what’s needed for servers 32 Server IP1 Client IP2 IP3 NAT NAT Internet
  • 33.
    Code already mergedupstream ● SKB extensions ○ Needed to carry MPTCP options that are tied to the data payload ○ Also used to remove sp (sec_path) and nf_bridge pointers from struct sk_buff ○ Suitable for data that can’t fit in sk_buff and justifies memory overhead ● Add inet_diag_ulp_info to socket diag format and ULP get_info hook 33
  • 34.
    Change in TCPCode Git Stat: include/linux/skbuff.h | 11 ++ include/linux/tcp.h | 51 ++++++++++ include/net/sock.h | 6 +- include/net/tcp.h | 20 ++++ include/trace/events/sock.h | 5 +- include/uapi/linux/in.h | 2 + net/Kconfig | 1 + net/Makefile | 1 + net/ax25/af_ax25.c | 2 +- net/core/skbuff.c | 7 ++ net/decnet/af_decnet.c | 2 +- net/ipv4/inet_connection_sock.c | 2 + net/ipv4/tcp.c | 8 +- net/ipv4/tcp_input.c | 29 +++++- net/ipv4/tcp_ipv4.c | 4 +- net/ipv4/tcp_minisocks.c | 6 ++ net/ipv4/tcp_output.c | 62 +++++++++++- net/ipv4/tcp_ulp.c | 12 +++ 34
  • 35.
    Changes to TCPcode ● tcp_ulp_clone() ● Export two low-level TCP functions and one struct ● SKBs with MPTCP extensions can’t be coalesced or collapsed ● MPTCP option parsing and writing ● is_mptcp flag in tcp_sock and tcp_request_sock 35
  • 36.
    Changes to TCPcode, continued ● One MPTCP-specific branch in TCP minisocks ● Call out to MPTCP from tcp_data_queue to add SKB extension and process ACKs ● Additional members in struct tcp_options_received ● Subflow receive window sharing will introduce changes too 36
  • 37.
  • 38.
    Path Manager Which pathto create/remove? Which address to announce? 38 ? ? Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 39.
    Userspace Path Manager ●Peers share ADD_ADDR and REMOVE_ADDR signals to advertise available addresses for each MPTCP connection ● Path manager runs in userspace and uses generic netlink to track address and local interface updates and request subflow changes ● Can be customized with different policies. ● Multipath TCP Daemon alpha release is available at github.com/intel/mptcpd 39
  • 40.
    Packet Scheduler On whichavailable path packets will be sent? Reinject packets in another path? 40 A A ? ? Smartphone and WiFi icons by Blurred203 and Antü Plasma under CC-by-sa, others from Tango project, public domain
  • 41.
    Packet scheduling ● Differentconnections may optimize for throughput, latency, or redundancy. ● Peers can set a ‘backup’ flag on each subflow to limit transmission on that flow ● Include basic scheduler options in the kernel ● Consider eBPF to define custom schedulers, instead of kernel modules 41
  • 42.
    Using MPTCP withunmodified binaries ● Some organizations want to take advantage of MPTCP without recompiling their userspace ● Can add BPF_CGROUP_SOCKET to attach an eBPF program that rewrites the protocol number passed to socket() ● Similar attachment points exist for bind() and connect() 42
  • 43.
    MPTCP Performance optimizations ●Initial emphasis is on correctness and reasonable MPTCP performance ○ While not disrupting TCP’s optimizations! ● Target performance optimizations based on data ● Protocol optimizations ○ Example: changing scheduler behavior for reinjection of data on different subflows ● TCP Fast Open support 43
  • 44.
    Break-before-make ● MPTCP cankeep a connection active even with zero subflows connected ○ Allows the session to continue by adding a subflow with MP_JOIN ● Can be useful to switch between access points ● Will add this capability if there’s demand for it 44
  • 45.
    Subflow socket options ●One MPTCP socket manages a set of in-kernel subflow sockets ● Socket options that use TCP option space or change data flow could interfere ● The MPTCP socket can act as an intermediary for subflow options ● Will need to whitelist specific known-safe options ● Could expose file descriptors only good for getsockopt()/setsockopt() 45
  • 46.
    Kernel TLS andMPTCP ● kTLS is built on top of TCP using ULP framework ● An MPTCP socket is not a TCP socket, so it doesn’t have ULP ● TLS needs to operate on the MPTCP data stream, not subflow streams ○ TLS records could be split across subflows ○ MPTCP DSS mappings are specific to TCP sequence numbers ● TLS_SW appears feasible but would need work to integrate with an MPTCP socket type 46
  • 47.
  • 48.
    Conclusion ● Build aroundTCP as much as we can. ● We are close to having an initial patch set ready. 48 This project is open to everybody. ● Wiki: https://is.gd/mptcp_upstream ● Mailing list: https://lists.01.org/mailman/listinfo/mptcp ● Git repository: https://github.com/multipath-tcp/mptcp_net-next ● Paper: https://linuxplumbersconf.org/event/4/contributions/435/ ● mathew.j.martineau@linux.intel.com ● matthieu.baerts@tessares.net
  • 49.
  • 50.
    Protocol challenges 50Used withChristoph Paasch’s permission Relations between structures
  • 51.
  • 52.
    Protocol challenges Coupled receivewindows across TCP subflows 52Used with Sébastien Barré’s permission
  • 53.
    Multipath TCP (MPTCP) Hybridaccess network use-case (BBF TR-348 by Tessares - SwissCom - OVH) 53 Telco CloudDSL CPE 4G CPE TCP TCP MPTCP copper (long line) 4G/LTE network fiber xDSL network MPTCP cloud native HAG multi- platform Agent available capacity needed, for 4G/LTE connectivity Images from Tessares
  • 54.
  • 55.
    Protocol challenges Data sequencenumbers and mappings 55 AB C Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
  • 56.
    Protocol challenges Data sequencenumbers and mappings 56 AB CD EF Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
  • 57.
    Protocol challenges Data sequencenumbers and mappings 57 AB C D EF Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
  • 58.
    Protocol challenges Data sequencenumbers and mappings 58 AB C D EF Dseq=0, seq=123,“A” Dseq=1, seq=124,“B” Dseq=2, seq=456,“C” Dseq=4, seq=125,“E” Dseq=5, seq=126,“F” Dseq=3, seq=456,“D” Smartphone icon by Blurred203 under CC-by-sa, others from Tango project, public domain
  • 59.
    Protocol challenges Data sequencenumbers and mappings mptcp_sock tcp_sock Socket Layer IP Layer App sends data 59
  • 60.
    Protocol challenges Data sequencenumbers and mappings mptcp_sock tcp_sock Socket Layer IP Layer 60 Select the TCP subflow
  • 61.
    Protocol challenges Data sequencenumbers and mappings mptcp_sock tcp_sock Socket Layer IP Layer TCP header: What DSS to set? 61
  • 62.
    Protocol challenges Sending ofACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 62 Notification: one iface is down
  • 63.
    Protocol challenges Sending ofACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 63 Select the subflow
  • 64.
    Protocol challenges Sending ofACKs to signal options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 64 Sending a ACK not from TCP stack
  • 65.
    Protocol challenges Reception ofACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 65 TCP ACK received
  • 66.
    Protocol challenges Reception ofACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 66 TCP ACK is not dropped
  • 67.
    Protocol challenges Reception ofACKs with signaling options, e.g. REMOVE_ADDR in a TCP ACK mptcp_sock tcp_sock tcp_sock Socket Layer IP Layer Subflow ops Subflow ops 67
  • 68.
    Protocol challenges Signaling withMPTCP: ● MP_CAPABLE ● MP_JOIN ● DSEQ / DACK ● FAST_CLOSE ● ADD_ADDR ● REMOVE_ADDR 68 SYN SYN ALL ACK followed by RST ACK ACK