Making our networking stack
truly extensible
Olivier Bonaventure with
Quentin Deconinck, Cyril Dénos, Fabien Duchêne, Mathieu Jadin, David Lebrun, Francois Michel,
Maxime Piraux, Olivier Tilmans, Hoang Tran Viet, Thomas Wirtgen, Mathieu Xhonneux
http://inl.info.ucl.ac.be
LCN2019 Keynote, October 2019
Partially supported by FNRS, FRIA, MQUIC project (DG06 in cooperation with
Tessares), ARC-SDN and a Facebook grant
Agenda
• Evolution of the networking stack
• Making IPv6 Segment Routing programmable
• Making TCP extensible again
• Pluginizing Routing Protocols
• Pluginizing QUIC
TCP's complexity increases
But deploying TCP extensions
remains very difficult
• 20th century extensions took more than a
decade to be widely deployed
– TCP Window Scale
– TCP Timestamp
• Still not supported by Microsoft Windows
– TCP Selective Acknowledgements
– Explicit Congestion Notification
• Multipath TCP is being deployed, but getting it
everywhere will require lots of effort
Today's implementations are
black boxes
Protocol messages
Higher Layer
Lower Layer
API, e.g. socket
API
IETF
Tuning such an implementation
• Implementations typically expose a few
configuration knobs
– Socket options to enable/disable a given feature
– Socket options to set some limit (e.g. window)
– Sysctl variables for system-wide tuning
– Linux modules provide some flexibility
• Congestion control as loadable modules
• Path managers in Linux Multipath TCP
Agenda
• Evolution of the networking stack
• Making IPv6 Segment Routing programmable
• Making TCP extensible again
• Pluginizing QUIC
• Pluginizing Routing Protocols
IPv6 Segment Routing in one slide
• Each router advertises its loopback in IGP
– Packets contain a source route in the SRH and follow the
shortest path to the next address in the SRH
[Figure: example network with routers R1, R2, R3, R4, R5, R7, R8 and R9. Packets carry the segment list 3:7 or 8:4:7:3 in their IPv6 Segment Routing Header.]
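The forwarding rule on this slide can be sketched in C. This is a simplified model and not the Linux implementation: `struct srh_model` and `srh_advance` are hypothetical names, and small integers stand in for IPv6 addresses as in the figure. As in the standard Segment Routing Header, the segment list is stored in reverse order and Segments Left indexes the active segment.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 8

/* Simplified model of a Segment Routing Header (hypothetical type).
 * Real SRHs carry 128-bit IPv6 addresses; small integers stand in
 * for router identifiers here. */
struct srh_model {
    uint8_t segments_left;            /* index of the active segment */
    uint8_t last_entry;               /* index of the last list element */
    uint32_t segments[MAX_SEGMENTS];  /* stored in reverse order */
};

/* Called by the router that owns the active segment.
 * Writes the next destination into *next_dst and returns 0, or returns -1
 * when no segment remains (the packet reached the end of its source route). */
int srh_advance(struct srh_model *srh, uint32_t *next_dst)
{
    if (srh->segments_left == 0 || srh->segments_left > srh->last_entry)
        return -1;
    srh->segments_left--;
    *next_dst = srh->segments[srh->segments_left];
    return 0;
}
```

For the segment list 3:7, the list is stored as {7, 3} with Segments Left set to 1. Router 3 calls `srh_advance` and obtains 7 as the next destination, and the packet then follows the shortest path towards 7.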
IPv6 Segment Routing
Network Programming
• IPv6 SR enables more than non-shortest paths
– Each node advertises one or more prefixes
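[Figure: routers R2, R4, R5, R7, R8 and R9. A node advertises the prefix 2001:…:4/40 in the IGP. Each segment is structured as Locator | Function | Param, e.g. FCT1:param and FCT2:param.]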
C. Filsfils et al., SRv6 Network Programming, draft-filsfils-spring-srv6-
network-programming-03, Dec. 2017
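The Locator | Function | Param structure can be illustrated with a small C helper that extracts a bit field from a 128-bit segment identifier. The helper name `sid_bits` is hypothetical, and the field widths used in the usage note (40-bit locator as in the /40 prefix of the slide, 16-bit function, 32-bit parameter) are assumptions made for this sketch only, since the widths are left to the operator.

```c
#include <stdint.h>

/* Returns `len` bits (len <= 64) of a 128-bit segment identifier,
 * starting `start` bits after the most significant bit.
 * The caller must ensure that start + len <= 128. */
uint64_t sid_bits(const uint8_t sid[16], unsigned start, unsigned len)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < len; i++) {
        unsigned bit = start + i;
        unsigned byte = bit / 8;
        unsigned shift = 7 - (bit % 8);
        value = (value << 1) | ((sid[byte] >> shift) & 1u);
    }
    return value;
}
```

With these assumed widths, `sid_bits(sid, 0, 40)` returns the locator, `sid_bits(sid, 40, 16)` the function and `sid_bits(sid, 56, 32)` the parameter.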
Implementing SRv6
Network Programming
• First step
– Add support for IPv6 Segment Routing in Linux
– David Lebrun's PhD thesis
• Second step
– Find a simple way to enable network operators to
truly program their network
• Socket options ?
• Kernel modules ?
• Add eBPF support in Linux's IPv6 Segment Routing
implementation
Lebrun, D., & Bonaventure, O. (2017, July). Implementing IPv6 Segment Routing in
the Linux kernel. In ANRW 2017, ACM.
eBPF
• Lightweight virtual machine, in Linux kernel
since 2014
– RISC instruction set (~100)
• ALU, memory and branch purposes
• Bytecode recompiled to native architecture
• Verifier
– Checks absence of loops, stack usage, …
• Dedicated, isolated stack memory
– But no persistence
• Use cases
– Monitoring, SECCOMP, …
[Figure: eBPF bytecode is compiled to native x86_64 code]
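The idea of checking for the absence of loops can be illustrated with a toy verifier. This sketch is far simpler than the Linux verifier and does not reuse its data structures: the instruction model `struct toy_insn` and the function `toy_verify` are hypothetical. It accepts a program only if every jump goes forward and stays inside the program, and if the last instruction is an exit, which is enough to guarantee termination.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_op { TOY_ALU, TOY_LOAD, TOY_STORE, TOY_JUMP, TOY_EXIT };

struct toy_insn {
    enum toy_op op;
    int16_t off;   /* for TOY_JUMP: target = pc + 1 + off */
};

/* Returns 1 when the program is accepted, 0 otherwise.
 * Accepted programs have only forward, in-bounds jumps and end with EXIT,
 * so every execution terminates after at most `len` instructions. */
int toy_verify(const struct toy_insn *prog, size_t len)
{
    if (len == 0 || prog[len - 1].op != TOY_EXIT)
        return 0;
    for (size_t pc = 0; pc < len; pc++) {
        if (prog[pc].op != TOY_JUMP)
            continue;
        if (prog[pc].off < 0)
            return 0;                      /* backward jump: possible loop */
        if (pc + 1 + (size_t)prog[pc].off >= len)
            return 0;                      /* jump outside the program */
    }
    return 1;
}
```

Rejecting every backward jump is conservative: it also refuses loops that would terminate, which is the price paid for a check that runs in a single pass.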
Realising Network Programming :
the power of eBPF
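[Figure: an application passes eBPF bytecode and maps to the kernel through the bpf syscall; inside the kernel, the verifier checks the bytecode before the eBPF VM executes it]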
M. Xhonneux et al., Leveraging eBPF for programmable network functions
with IPv6 Segment Routing, Proc. CoNEXT 2018
eBPF for SRv6
• When are eBPF programs called ?
– Upon reception of a packet whose active address in the SRH
matches a local segment bound to the eBPF program
• Which features of the stack can eBPF programs use ?
– bpf_lwt_seg6_store_bytes
• update parts of SRH
– bpf_lwt_seg6_adjust_srh
• update TLVs in SRH.
– bpf_lwt_seg6_action
• execute basic SRv6 function (End.X, End.T, End.B6, End.B6.Encaps
and End.DT6)
• Each eBPF program returns specific code
– BPF_OK, BPF_DROP, BPF_REDIRECT
Performance impact compared
to native code
Demonstrated use cases
• Delay measurements
– Sender timestamps some packets and requests
routers to timestamp and tunnel them as well
• Hybrid Access Networks
– Segments are used to forward packets over
different paths and combine them at one router
• Failure Detection and recovery
– Uses eBPF to implement detection similar to BFD
and a simple fast reroute technique
Xhonneux, Mathieu, and Olivier Bonaventure. "Flexible failure detection and fast reroute
using eBPF and SRv6." CNSM'18, 2018.
Agenda
• Evolution of the networking stack
• Making IPv6 Segment Routing programmable
• Making TCP extensible again
• Pluginizing Routing Protocols
• Pluginizing QUIC
Debugging TCP performance problems
• Classical approaches
– Collect packet traces and ask Ph.D. student to
analyze them
– Look at SNMP MIB, output of netstat, ss, …
• Limitations
– Either limited visibility or scalability concerns
In-protocol debugging with eBPF
• eBPF probes can be attached at specific places
in the TCP stack to observe unusual events
– Retransmission of SYN packet
– Reception of out-of-order packets
– Peak in measured round-trip-time
– Application too slow to recv data from kernel
– …
• Daemon collects stats and sends them via
IPFIX
O. Tilmans, O. Bonaventure, COP2: Continuously Observing
Protocol Performance, Feb. 2019, arXiv:1902.04280v1
Example : SYN retransmissions
SYN retransmissions in our campus
TCP can be made more extensible
• Starting point
– Lawrence Brakmo's TCP-BPF patches
– Adds various hooks inside the TCP stack
• Callbacks
– BPF_SOCK_OPS_TCP_CONNECT_CB,
BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB,
BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB
• Access to socket options
– BPF_SETSOCKOPT and BPF_GETSOCKOPT
• Read and write TCP state variables (rtt, cwnd, …)
– Main use case is to configure TCP parameters
on a per connection basis
Brakmo, L. (2017). TCP-BPF: Programmatically tuning TCP behavior through BPF.
NetDev 2.2.
What does TCP-BPF bring ?
• Callbacks: triggered by specific events (e.g. protocol messages)
• Helpers: an API used by eBPF programs
User Timeout TCP option
• Defined in RFC 5482 but not supported by Linux
• Sender side changes
– Add eBPF hooks in tcp_transmit_skb and
tcp_options_write
– eBPF code controls the transmission of this option
• Receiver side changes
– Add eBPF hook to tcp_parse_options
– eBPF code interprets the received option and adjusts
TCP state
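The wire format that the sender-side and receiver-side hooks have to handle is small. The sketch below encodes and decodes the User Timeout option as specified in RFC 5482 (kind 28, length 4, one granularity bit followed by a 15-bit timeout). It is plain user-space C rather than eBPF code, and the function names `uto_write` and `uto_parse_seconds` are hypothetical.

```c
#include <stdint.h>

#define TCPOPT_UTO      28   /* option kind assigned by RFC 5482 */
#define TCPOPT_UTO_LEN   4

/* Writes the 4-byte option into buf. `timeout` must fit in 15 bits.
 * Returns the number of bytes written, or 0 if the value is out of range. */
int uto_write(uint8_t buf[4], uint16_t timeout, int in_minutes)
{
    if (timeout > 0x7fff)
        return 0;
    uint16_t field = (uint16_t)((in_minutes ? 0x8000 : 0) | timeout);
    buf[0] = TCPOPT_UTO;
    buf[1] = TCPOPT_UTO_LEN;
    buf[2] = (uint8_t)(field >> 8);    /* network byte order */
    buf[3] = (uint8_t)(field & 0xff);
    return TCPOPT_UTO_LEN;
}

/* Parses the option and returns the timeout in seconds,
 * or -1 if the bytes do not form a valid User Timeout option. */
long uto_parse_seconds(const uint8_t *opt, int len)
{
    if (len < TCPOPT_UTO_LEN || opt[0] != TCPOPT_UTO || opt[1] != TCPOPT_UTO_LEN)
        return -1;
    uint16_t field = (uint16_t)((opt[2] << 8) | opt[3]);
    long value = field & 0x7fff;
    return (field & 0x8000) ? value * 60 : value;
}
```

The receiver would then use the parsed value to adjust its own TCP state, which is the part that depends on the hooks described above.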
Minor performance impact
CPU utilisation on receiver
Different use cases
• New TCP option to specify initial congestion
window
– Could be sent by an iPhone as a function of wireless
conditions
• New TCP option to specify delayed
acknowledgment strategy (delayed ack timer)
• Various improvements to Multipath TCP
– eBPF-based path manager
Tran, Viet-Hoang, and Olivier Bonaventure. "Beyond socket options: making the Linux TCP
stack truly extensible." IFIP Networking (2019).
Agenda
• Evolution of the networking stack
• Making IPv6 Segment Routing programmable
• Making TCP extensible again
• Pluginizing Routing Protocols
• Pluginizing QUIC
Difficult to innovate in BGP/OSPF
• How long does it take for ISPs to get new
features in BGP ?
• Example: BGP large communities
– December 2002: draft-lange-flexible-bgp-communities-00
– 2009: first AS with a 32-bit AS number
– September 2016: draft-ietf-idr-large-community-00
– February 2017: RFC 8092
– ~March 2018: router implementation !
9 years before ISPs can use BGP large communities
Faster deployment of new routing
features
[Figure: a plugin (bytecode) runs in a VM inside the routing protocol implementation. Through an API it reaches the protocol memory: the RIB (route1 → via R2, …, routeN → via R4), the internal data structure and the neighbor routers context. The implementation is managed through CLI, SNMP and NetConf.]
T. Wirtgen et al., “The Case for Pluginized Routing Protocols”, ICNP 2019, Chicago
How to safely execute plugins ?
● Userspace eBPF VM
○ Same RISC instruction set (~100) as in Linux
kernel
■ ALU, memory and branch purposes
● Bytecode recompiled to native architecture
● Dedicated, isolated stack memory
○ But no persistence
● Rely on a user-space implementation
○ With relaxed verifier
○ With persistent heap memory
Example: Adding Monitoring to BGP
[Figure: a PRE pluglet and a POST pluglet, each executed in the VM, are attached to a protocol function]
PRE pluglet: time_t start = time(NULL);
POST pluglet: time_t diff = time(NULL) - start;
int bgp_update(args)
{
// code
r = decision_process(args);
// ...
// end of function
}
int decision_process(args)
{
// code
// ...
// ...
// ...
return something;
}
Example: protocol function replacement
[Figure: a REPLACE pluglet, executed in the VM, takes the place of a protocol function]
int bgp_update(args)
{
// code
r = decision_process(args);
// ...
// end of function
}
int decision_process(args)
{
// code
// ...
// ...
// ...
return something;
}
Summary : plugin structure
[Figure: a plugin groups PRE, REPLACE and POST pluglets. Each pluglet is bytecode that runs in a VM with its own heap, stack and context (ctx). Access labels in the figure: Read Only, Read Only and Write Access. Pluglets reach the protocol memory (RIB, internal data structure, neighbor context) through the API and can use a shared memory.]
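The PRE / REPLACE / POST structure can be sketched with plain C function pointers. This is only an analogy: in the pluginized protocols the pluglets are bytecode executed in a VM, whereas here they are native functions, and all names (`struct insertion_point`, `run_operation`, the example pluglets) are hypothetical. PRE and POST pluglets receive a read-only view of the argument, while the REPLACE pluglet computes the result instead of the native code.

```c
#include <stddef.h>

/* Hypothetical pluglet types: PRE and POST only observe the argument,
 * REPLACE computes the result instead of the native function. */
typedef void (*observe_fn)(const int *arg);
typedef int (*compute_fn)(int *arg);

struct insertion_point {
    observe_fn pre;       /* may be NULL */
    compute_fn replace;   /* may be NULL: the native code runs */
    observe_fn post;      /* may be NULL */
};

/* Runs one protocol operation through its insertion point. */
int run_operation(const struct insertion_point *p, compute_fn native, int *arg)
{
    if (p->pre)
        p->pre(arg);
    int result = p->replace ? p->replace(arg) : native(arg);
    if (p->post)
        p->post(arg);
    return result;
}

/* Example pluglets and native function used to exercise the sketch. */
int calls_seen;
void count_call(const int *arg) { (void)arg; calls_seen++; }
int native_decision(int *arg) { return *arg + 1; }
int custom_decision(int *arg) { return *arg * 2; }
```

Attaching `count_call` as both PRE and POST leaves the result of `native_decision` unchanged and only counts invocations, while attaching `custom_decision` as REPLACE changes the result without touching the native function.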
Use case: Flexible BGP filters
• BGP filters are key for ISPs,
– But they need to be written in special languages
router bgp 64512
bgp router-id 10.236.87.1
neighbor 10.0.0.1 remote-as 64515
neighbor 10.0.0.1 filter-list IN in
!
! IN list accepts routes originated from odd AS only
as-path access-list IN permit ^(.+_+)*(.*)1$
as-path access-list IN permit ^(.+_+)*(.*)3$
as-path access-list IN permit ^(.+_+)*(.*)5$
as-path access-list IN permit ^(.+_+)*(.*)7$
as-path access-list IN permit ^(.+_+)*(.*)9$
as-path access-list IN deny any
C-based filter:
uint64_t
filter_routes_from_even_as(bpf_full_args_t *args)
{
as_t a = bpf_get_args(args, 2);
// from even AS → DENY
if (a % 2 == 0) return FILTER_DENY;
// the route is originated from odd AS → ACCEPT
return FILTER_PERMIT;
}
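The C-based filter relies on plugin helpers that are not shown on the slide. A self-contained variant, which can be compiled and tested outside a router, is sketched below. The type `as_t`, the numeric values of `FILTER_DENY` and `FILTER_PERMIT`, the function name and the choice to deny routes with an empty AS path are all assumptions of this sketch. The origin AS is taken as the last element of the AS path, which is what the regular expressions in the configuration match.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t as_t;
enum filter_result { FILTER_DENY = 0, FILTER_PERMIT = 1 };

/* as_path[0] is the neighbor AS, as_path[len - 1] the AS that
 * originated the route. Routes originated by an even AS are denied. */
enum filter_result filter_odd_origin(const as_t *as_path, size_t len)
{
    if (len == 0)
        return FILTER_DENY;             /* no origin AS to check */
    as_t origin = as_path[len - 1];
    return (origin % 2 == 0) ? FILTER_DENY : FILTER_PERMIT;
}
```

A single arithmetic test replaces the five regular expressions of the configuration, which illustrates why a general-purpose language can be more convenient for such filters.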
Performance evaluation
Experiment : Injection of 200K routes to router R via Exabgp
[Figure: router R connected to Exabgp over a 1 Gbps link]
Agenda
• Evolution of the networking stack
• Making IPv6 Segment Routing programmable
• Making TCP extensible again
• Pluginizing Routing Protocols
• Pluginizing QUIC
The QUIC revolution
• What are the benefits ?
– Deploy without convincing kernel developers/ SDO
[Figure: two protocol stacks side by side: Application / HTTP/2 / TLS / TCP / IP and Application / QUIC / UDP / IP]
Pluginized QUIC
• Key ideas
– Include an eBPF VM inside PQUIC to enable it to
be dynamically extended with bytecode
– Expose a richer set of callback functions and
helpers than inside TCP
– Leverage QUIC's flexible packet format to support
a wide range of extensions
– Leverage QUIC's multistream and security features
to allow clients and servers to exchange bytecode
over QUIC connections
Q. Deconinck et al., “Pluginized QUIC”, SIGCOMM’19, August 2019, Beijing
Exchanging plugins
First connection
– Initial: Client Hello - “Hey, I support multipath”
– Initial: Server Hello - “I want to inject monitoring”
– Encrypted - PLUGIN_REQUEST(monitoring)
– Encrypted - PLUGIN(monitoring)
– ...
[Figure annotations: “Let’s monitor the client state.”, “Let’s request monitoring”, “Bytecode”]
Exchanging plugins
Next connections
– Initial: Client Hello - “Hey, I support multipath and monitoring”
– Initial: Server Hello - “Let’s use monitoring”
– Encrypted - STREAM, STAT(info about RTT, reordering,...)
– Encrypted - STREAM
– ...
[Figure annotations: “Let’s use monitoring”, “Added by the monitoring plugin”]
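The decision behind PLUGIN_REQUEST can be sketched as a set difference: the plugins the server wants to use, minus those the client already advertised. This is only an illustration of the exchange shown above and not the PQUIC code; the function names and the representation of plugins as strings are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears in the list of plugin names. */
static int has_plugin(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return 1;
    return 0;
}

/* Copies into `missing` the plugins wanted by the server that the client
 * did not advertise, and returns how many were copied (at most max_missing). */
size_t plugins_to_request(const char *const *advertised, size_t n_adv,
                          const char *const *wanted, size_t n_wanted,
                          const char **missing, size_t max_missing)
{
    size_t count = 0;
    for (size_t i = 0; i < n_wanted && count < max_missing; i++)
        if (!has_plugin(advertised, n_adv, wanted[i]))
            missing[count++] = wanted[i];
    return count;
}
```

On the first connection the client advertises only multipath, so monitoring has to be requested. On the next connections the client advertises both plugins and nothing needs to be transferred.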
Very Different Use Cases
● Monitoring
● A QUIC VPN
● Multipath
● Forward Erasure Correction
See our SIGCOMM’19 paper for more details

Plugin                       Lines of C code   Number of bytecodes
Monitoring                   500               14
QUIC VPN                     500               11
Multipath                    2600              32
Forward Erasure Correction   2500              51
Use case : Monitoring
• Plugin
– Collect statistics about various events in the QUIC
stack
• bytes/packets sent/received, lost, received out-of-
order, etc.
– Exports data to a monitoring server, but could also
transmit them over QUIC connection
– Passive: pluglets are attached in pre or post mode
Plugin: 500 lines of C code, 14 pluglets, 86 Kbytes of bytecode
Use case : Multipath QUIC
• Plugin
– Supports our proposed Multipath QUIC draft
• Connection id and path id, address advertisement
– Includes path manager, packet scheduler (round
robin and lowest rtt) as in MPTCP
– Provides performance similar to MPTCP
Plugin: 2600 lines of C code, 32 pluglets, 138 Kbytes of bytecode
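The two packet schedulers named on this slide, round robin and lowest RTT, can be sketched as follows. This is a simplified illustration and not the plugin's code: `struct path_state`, its fields and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct path_state {
    uint64_t srtt_us;    /* smoothed round-trip time in microseconds */
    int usable;          /* 0 when the path cannot carry data right now */
};

/* Lowest RTT: returns the index of the usable path with the smallest
 * smoothed RTT, or -1 when no path is usable. */
int pick_lowest_rtt(const struct path_state *paths, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!paths[i].usable)
            continue;
        if (best < 0 || paths[i].srtt_us < paths[best].srtt_us)
            best = (int)i;
    }
    return best;
}

/* Round robin: returns the first usable path after `last`, wrapping
 * around, or -1 when no path is usable. Pass last = -1 for the first call. */
int pick_round_robin(const struct path_state *paths, size_t n, int last)
{
    for (size_t step = 1; step <= n; step++) {
        size_t i = ((size_t)(last + 1) + step - 1) % n;
        if (paths[i].usable)
            return (int)i;
    }
    return -1;
}
```

Round robin spreads packets evenly over the usable paths, while lowest RTT favours the fastest path, so the two choices trade fairness between paths against latency.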
Use case: Forward Erasure Correction
• Objective
– Encode packets so that losses can be recovered at
the receiver without waiting for retransmissions
• Plugin
– Adds new frame to carry Repair Symbols
– Supports XOR and Random Linear Code (RLC)
• Complex computations are required
Plugin: 2500 lines of C code, 51 pluglets, 236 Kbytes of bytecode
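The XOR code is the simplest of the two schemes and can be sketched in a few lines of C. The repair symbol is the XOR of k source symbols, and any single lost symbol is recovered by XORing the repair symbol with the k - 1 symbols that did arrive. The fixed `SYMBOL_SIZE`, the flat buffer layout and the function names are assumptions of this sketch, which does not reproduce the plugin's frame format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYMBOL_SIZE 4   /* bytes per symbol, kept small for the example */

/* `src` holds k source symbols stored back to back.
 * Writes the XOR of all of them into `repair`. */
void xor_encode(const uint8_t *src, size_t k, uint8_t repair[SYMBOL_SIZE])
{
    memset(repair, 0, SYMBOL_SIZE);
    for (size_t i = 0; i < k; i++)
        for (size_t b = 0; b < SYMBOL_SIZE; b++)
            repair[b] ^= src[i * SYMBOL_SIZE + b];
}

/* Rebuilds the symbol at index `lost` from the repair symbol and the
 * k - 1 other symbols. The bytes stored at index `lost` are never read. */
void xor_recover(const uint8_t *src, size_t k, size_t lost,
                 const uint8_t repair[SYMBOL_SIZE], uint8_t out[SYMBOL_SIZE])
{
    memcpy(out, repair, SYMBOL_SIZE);
    for (size_t i = 0; i < k; i++) {
        if (i == lost)
            continue;
        for (size_t b = 0; b < SYMBOL_SIZE; b++)
            out[b] ^= src[i * SYMBOL_SIZE + b];
    }
}
```

A single XOR repair symbol protects against one loss per group of k symbols. Recovering several losses requires more repair symbols and a code such as RLC, at the cost of the more complex computations mentioned above.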
Performance overhead
• Some optimisations in the eBPF VM are possible to
reduce this performance overhead
Security and safety concerns
• PQUIC relies on several techniques to ensure
safety of plugins
• Plugins are isolated from PQUIC and each other
– The eBPF VM's JIT adds code to validate memory accesses
• We propose a system to certify plugins
– Manual certification like applications in a store
– Tool-assisted certification
• We have successfully used tools to prove termination
• Future work required to develop tools to verify more specific
properties than termination
– Cryptographic certificates are attached to plugins and
can be validated before injecting them
Conclusions …
• eBPF-based Protocol plugins bring benefits to
various protocols
– IPv6 Segment Routing for network programmability
– TCP to collect accurate measurement data, implement
new options, update key algorithms
– BGP with more flexible eBPF filters, OSPF for new LSAs
• Pluginized QUIC goes one step further by
exchanging eBPF plugins over QUIC connections
– Makes the protocol truly extensible
… next steps
• How to redesign network protocols to completely
leverage plugins ?
– A more efficient virtual machine
• WebAssembly, improved eBPF, other ?
– A simple base protocol that provides a clean API
• Similar to microkernels, offload more complex or less
frequently used functions to plugins
– Interoperable independent implementations
• The same plugin should work on different implementations
– Tools and techniques to validate plugins
• Not only termination, but other types of automated proofs

Cyber Security Services Unveiled: Strategies to Secure Your Digital PresencePC Doctors NET
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?Linksys Velop Login
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfappinfoedgeca
 

Recently uploaded (16)

iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
Thank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirtsThank You Luv I’ll Never Walk Alone Again T shirts
Thank You Luv I’ll Never Walk Alone Again T shirts
 
Topology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdfTopology of the Network class 8 .ppt pdf
Topology of the Network class 8 .ppt pdf
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
 
I’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 ShirtI’ll See Y’All Motherfuckers In Game 7 Shirt
I’ll See Y’All Motherfuckers In Game 7 Shirt
 
Development Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of appsDevelopment Lifecycle.pptx for the secure development of apps
Development Lifecycle.pptx for the secure development of apps
 
Pvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdfPvtaan Social media marketing proposal.pdf
Pvtaan Social media marketing proposal.pdf
 
The Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case StudyThe Use of AI in Indonesia Election 2024: A Case Study
The Use of AI in Indonesia Election 2024: A Case Study
 
Statistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdfStatistical Analysis of DNS Latencies.pdf
Statistical Analysis of DNS Latencies.pdf
 
Bug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's GuideBug Bounty Blueprint : A Beginner's Guide
Bug Bounty Blueprint : A Beginner's Guide
 
Reggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirtsReggie miller choke t shirts
Reggie miller choke t shirtsReggie miller choke t shirts
 
Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.
 
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital PresenceCyber Security Services Unveiled: Strategies to Secure Your Digital Presence
Cyber Security Services Unveiled: Strategies to Secure Your Digital Presence
 
How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?How Do I Begin the Linksys Velop Setup Process?
How Do I Begin the Linksys Velop Setup Process?
 
Premier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdfPremier Mobile App Development Agency in USA.pdf
Premier Mobile App Development Agency in USA.pdf
 

Making our networking stack truly extensible

  • 1. Making our networking stack truly extensible Olivier Bonaventure with Quentin Deconinck, Cyril Dénos, Fabien Duchêne, Mathieu Jadin, David Lebrun, Francois Michel, Maxime Piraux, Olivier Tilmans, Hoang Tran Viet, Thomas Wirtgen, Mathieu Xhonneux http://inl.info.ucl.ac.be LCN2019 Keynote, October 2019 Partially supported by FNRS, FRIA, MQUIC project (DG06 in cooperation with Tessares), ARC-SDN and a Facebook grant
  • 2. Agenda • Evolution of the networking stack • Making IPv6 Segment Routing programmable • Making TCP extensible again • Pluginizing Routing Protocols • Pluginizing QUIC
  • 4. But deploying TCP extensions remains very difficult • 20th century extensions took more than a decade to be widely deployed – TCP Window Scale – TCP Timestamp • Still not supported by Microsoft Windows – TCP Selective Acknowledgements – Explicit Congestion Notification • Multipath TCP is being deployed, but getting it everywhere will require lots of effort
  • 5. Today's implementations are black boxes Protocol messages Higher Layer Lower Layer API, e.g. socket API IETF
  • 6. Tuning such an implementation • Implementations typically expose a few configuration knobs – Socket options to enable/disable a given feature – Socket options to set some limit (e.g. window) – Sysctl variables for system-wide tuning – Linux modules provide some flexibility • Congestion control as loadable modules • Path managers in Linux Multipath TCP
  • 7. Agenda • Evolution of the networking stack • Making IPv6 Segment Routing programmable • Making TCP extensible again • Pluginizing QUIC • Pluginizing Routing Protocols
  • 8. IPv6 Segment Routing in one slide • Each router advertises its loopback in IGP – Packets contain a source route in the SRH and follow the shortest path to the next address in the SRH [Figure: topology with routers R1, R2, R3, R4, R5, R7, R8, R9; labels 100, 3:7 and 8:4:7:3]
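The forwarding rule on this slide can be sketched in plain C. This is a simplified model, not the Linux implementation: router ids stand in for IPv6 addresses, and the names `srh_model`, `srh_init`, `srh_active` and `srh_advance` are hypothetical. It follows the usual SRH convention of storing the segment list in reverse order, with a Segments Left index pointing at the active segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGMENTS 8

/* Simplified stand-in for a Segment Routing Header (hypothetical type). */
struct srh_model {
    uint8_t segments_left;            /* index of the active segment */
    uint8_t num_segments;
    uint16_t segments[MAX_SEGMENTS];  /* reverse order: segments[0] is the final one */
};

/* Fill the header from a path given in travel order. Returns 0, or -1 if it does not fit. */
int srh_init(struct srh_model *h, const uint16_t *path, size_t n)
{
    assert(h != NULL && path != NULL);
    if (n == 0 || n > MAX_SEGMENTS)
        return -1;
    h->num_segments = (uint8_t)n;
    h->segments_left = (uint8_t)(n - 1);
    for (size_t i = 0; i < n; i++)
        h->segments[n - 1 - i] = path[i];
    return 0;
}

/* Router id the packet is currently forwarded to along the shortest path. */
uint16_t srh_active(const struct srh_model *h)
{
    return h->segments[h->segments_left];
}

/* Processing at the active segment: returns 1 and sets *next_dst if another
 * segment follows, or 0 if this was the final segment. */
int srh_advance(struct srh_model *h, uint16_t *next_dst)
{
    if (h->segments_left == 0)
        return 0;
    h->segments_left--;
    *next_dst = h->segments[h->segments_left];
    return 1;
}
```

With the path {8, 4, 7}, the packet is first sent towards router 8; each call to `srh_advance` at a segment endpoint then yields 4, then 7, and finally reports that the list is exhausted.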
  • 10. IPv6 Segment Routing Network Programming • IPv6 SR enables more than non-shortest paths – Each node advertises one or more prefixes R4 R5 R2 R7 R8 R9 IGP : 2001:…:4/40 FCT1:param FCT2:param Locator Function Param C. Filsfils et al., SRv6 Network Programming, draft-filsfils-spring-srv6-network-programming-03, Dec. 2017
  • 11. Implementing SRv6 Network Programming • First step – Add support for IPv6 Segment Routing in Linux – David Lebrun's PhD thesis • Second step – Find a simple way to enable network operators to truly program their network • Socket options ? • Kernel modules ? • Add eBPF support in Linux's IPv6 Segment Routing implementation Lebrun, D., & Bonaventure, O. (2017, July). Implementing IPv6 Segment Routing in the Linux kernel. In ANRW 2017, ACM.
  • 12. eBPF • Lightweight virtual machine, in Linux kernel since 2014 – RISC instruction set (~100) • ALU, memory and branch purposes • Bytecode recompiled to native architecture • Verifier – Checks absence of loops, stack usage, … • Dedicated, isolated stack memory – But no persistence • Use cases – Monitoring, SECCOMP, … 01011 10010 x86_64
  • 13. Realising Network Programming : the power of eBPF [Diagram labels: application, eBPF bytecode, bpf syscall, verifier, map, eBPF VM, kernel] M. Xhonneux et al., Leveraging eBPF for programmable network functions with IPv6 Segment Routing, Proc. Conext 2018
  • 14. eBPF for SRv6 • When are eBPF programs called ? – Upon reception of a packet whose address in SRH matches • Which features of the stack can eBPF programs use ? – bpf_lwt_seg6_store_bytes • update parts of SRH – bpf_lwt_seg6_adjust_srh • update TLVs in SRH. – bpf_lwt_seg6_action • execute basic SRv6 function (End.X, End.T, End.B6, End.B6.Encaps and End.DT6) • Each eBPF program returns specific code – BPF_OK, BPF_DROP, BPF_REDIRECT
  • 16. Demonstrated use cases • Delay measurements – Sender timestamps some packets and requests routers to timestamp and tunnel them as well • Hybrid Access Networks – Segments are used to forward packets over different paths and combine them as one router • Failure Detection and recovery – Uses eBPF to implement detection similar to BFD and a simple fast reroute technique Xhonneux, Mathieu, and Olivier Bonaventure. "Flexible failure detection and fast reroute using eBPF and SRv6." CNSM'18E, 2018.
  • 17. Agenda • Evolution of the networking stack • Making IPv6 Segment Routing programmable • Making TCP extensible again • Pluginizing Routing Protocols • Pluginizing QUIC
  • 18. Debugging TCP performance problems • Classical approaches – Collect packet traces and ask Ph.D. student to analyze them – Look at SNMP MIB, output of netstat, ss, … • Limitations – Either limited visibility or scalability concerns
  • 19. In-protocol debugging with eBPF • eBPF probes can be attached at specific places in the TCP stack to observe unusual events – Retransmission of SYN packet – Reception of out-of-order packets – Peak in measured round-trip-time – Application too slow to recv data from kernel – … • Daemon collects stats and sends them via IPFIX O. Tilmans, O. Bonaventure, COP2: Continuously Observing Protocol Performance, Feb. 2019, arxiv 1902.04280v1
  • 20. Example : SYN retransmissions
  • 22. TCP can be made more extensible • Starting point – Lawrence Brakmo's TCP-BPF patches – Adds various hooks inside the TCP stack • Callbacks – BPF_SOCK_OPS_TCP_CONNECT_CB, BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB, BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB • Access to socket options – BPF_SETSOCKOPT and BPF_GETSOCKOPT • Read and write TCP state variables (rtt, cwnd, …) – Main use case is to configure TCP parameters on a per connection basis Brakmo, L. (2017). TCP-BPF: Programmatically tuning TCP behavior through BPF. NetDev 2.2.
  • 23. What does TCP-BPF bring ? Protocol messages Callbacks Helpers Triggered by specific events An API used by eBPF programs
  • 24. User Timeout TCP option • Defined in RFC 5482 but not supported by Linux • Sender side changes – Add eBPF hooks in tcp_transmit_skb and tcp_options_write – eBPF code controls the transmission of this option • Receiver side changes – Add eBPF hook to tcp_parse_options – eBPF code interprets the received option and adjusts TCP state
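To make the sender-side and receiver-side work concrete, here is a minimal sketch of the bytes such hooks would write and parse. It is ordinary userspace C, not the eBPF code from the talk. The wire format follows RFC 5482 (Kind 28, Length 4, one granularity bit, 15-bit timeout); the function names `uto_encode` and `uto_decode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_UTO     28  /* option kind assigned by RFC 5482 */
#define TCPOPT_UTO_LEN 4   /* kind + length + 16-bit value */

/* Write the option into buf. in_minutes selects the granularity bit.
 * Returns the number of bytes written, or 0 on error. */
size_t uto_encode(uint8_t *buf, size_t buflen, int in_minutes, uint16_t timeout)
{
    assert(buf != NULL);
    if (buflen < TCPOPT_UTO_LEN || timeout > 0x7fff)
        return 0;
    uint16_t field = (uint16_t)((in_minutes ? 0x8000u : 0u) | timeout);
    buf[0] = TCPOPT_UTO;
    buf[1] = TCPOPT_UTO_LEN;
    buf[2] = (uint8_t)(field >> 8);   /* network byte order */
    buf[3] = (uint8_t)(field & 0xff);
    return TCPOPT_UTO_LEN;
}

/* Parse a received option and report the timeout in seconds.
 * Returns 0 on success, -1 if the option is malformed. */
int uto_decode(const uint8_t *opt, size_t optlen, uint32_t *timeout_secs)
{
    assert(opt != NULL && timeout_secs != NULL);
    if (optlen < TCPOPT_UTO_LEN || opt[0] != TCPOPT_UTO || opt[1] != TCPOPT_UTO_LEN)
        return -1;
    uint16_t field = (uint16_t)((opt[2] << 8) | opt[3]);
    uint32_t value = field & 0x7fffu;
    *timeout_secs = (field & 0x8000u) ? value * 60u : value;
    return 0;
}
```

In the design described on the slide, the encoding step would run from the hook in tcp_options_write and the decoding step from the hook in tcp_parse_options; how the decoded value adjusts TCP state is left to the eBPF program.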
  • 26. CPU utilisation on receiver
  • 27. Different use cases • New TCP option to specify initial congestion window – Could be sent by iPhone as a function of wireless conditions • New TCP option to specify delayed acknowledgment strategy (delayed ack timer) • Various improvements to Multipath TCP – eBPF-based path manager Tran, Viet-Hoang, and Olivier Bonaventure. "Beyond socket options: making the Linux TCP stack truly extensible." IFIP Networking (2019).
  • 28. Agenda • Evolution of the networking stack • Making IPv6 Segment Routing programmable • Making TCP extensible again • Pluginizing Routing Protocols • Pluginizing QUIC
  • 29. Difficult to innovate in BGP/OSPF • How long does it take for ISPs to get new features in BGP ? • Example: BGP large communities – December 2002: draft-lange-flexible-bgp-communities-00 – 2009: First AS with 32-bit AS number – September 2016: draft-ietf-idr-large-community-00 – February 2017: RFC 8092 – ~March 2018: Router implementation • 9 years before ISPs can use BGP communities
  • 30. Faster deployment of new routing features [Diagram labels: plugin bytecode running in VMs; protocol with CLI, SNMP and NetConf interfaces; RIB (route1 → via R2, routeN → via R4); internal data structure; neighbor routers context; protocol memory API] T. Wirtgen et al., “The Case for Pluginized Routing Protocols”, ICNP 2019, Chicago
  • 31. How to safely execute plugins ? ● Userspace eBPF VM ○ Same RISC instruction set (~100) as in Linux kernel ■ ALU, memory and branch purposes ● Bytecode recompiled to native architecture ● Dedicated, isolated stack memory ○ But no persistence ● Rely on a user-space implementation ○ With relaxed verifier ○ With persistent heap memory
  • 32. Example: Adding Monitoring to BGP int bgp_update(args) { // code r = decision_process(args); // ... // end of function } int decision_process(args) { // code // ... // ... // ... return something; } [PRE VM, POST VM] time_t start = time(NULL); time_t diff = time(NULL) - start;
  • 33. Example: protocol function replacement int bgp_update(args) { // code r = decision_process(args); // ... // end of function } int decision_process(args) { // code // ... // ... // ... return something; }
  • 35. Example: protocol function replacement int bgp_update(args) { // code r = decision_process(args); // ... // end of function } int decision_process(args) { // code // ... // ... // ... return something; } [REPLACE VM]
  • 36. Summary : plugin structure [Diagram: a plugin groups several VMs attached to PRE, REPLACE and POST hooks; each VM has its own stack and heap and receives a context (ctx); PRE and POST VMs are read-only, the REPLACE VM has write access; the Protocol Memory API gives access to the RIB, the internal data structure and the neighbor context; shared memory]
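The PRE / REPLACE / POST structure can be illustrated with a small dispatcher in plain C. This is a sketch under stated assumptions: function pointers stand in for the eBPF VMs, read-only access is modelled with a `const` pointer, and every name here (`proto_state`, `protocol_op`, `run_op`, the RTO callbacks and their factors) is hypothetical rather than taken from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBSERVERS 4

/* Hypothetical protocol state touched by one protocol operation. */
struct proto_state {
    int rto_ms;
};

/* PRE/POST observers get read-only state and keep results in their own memory. */
typedef void (*observer_fn)(const struct proto_state *state, void *plugin_mem);
/* The operation body (built-in or REPLACE) may modify the state. */
typedef int (*operation_fn)(struct proto_state *state, int arg);

struct observer {
    observer_fn fn;
    void *plugin_mem;
};

struct protocol_op {
    operation_fn builtin;
    operation_fn replace;                /* NULL means: run the built-in code */
    struct observer pre[MAX_OBSERVERS];
    size_t n_pre;
    struct observer post[MAX_OBSERVERS];
    size_t n_post;
};

static int attach(struct observer *list, size_t *count, observer_fn fn, void *mem)
{
    if (*count == MAX_OBSERVERS)
        return -1;
    list[*count].fn = fn;
    list[*count].plugin_mem = mem;
    (*count)++;
    return 0;
}

int attach_pre(struct protocol_op *op, observer_fn fn, void *mem)
{
    return attach(op->pre, &op->n_pre, fn, mem);
}

int attach_post(struct protocol_op *op, observer_fn fn, void *mem)
{
    return attach(op->post, &op->n_post, fn, mem);
}

/* Run every PRE observer, then the REPLACE body if present (else the
 * built-in one), then every POST observer. */
int run_op(struct protocol_op *op, struct proto_state *state, int arg)
{
    assert(op->builtin != NULL);
    for (size_t i = 0; i < op->n_pre; i++)
        op->pre[i].fn(state, op->pre[i].plugin_mem);
    operation_fn body = op->replace != NULL ? op->replace : op->builtin;
    int ret = body(state, arg);
    for (size_t i = 0; i < op->n_post; i++)
        op->post[i].fn(state, op->post[i].plugin_mem);
    return ret;
}

/* Illustrative callbacks; the factors 2 and 3 are arbitrary. */
int builtin_update_rto(struct proto_state *state, int rtt_ms)
{
    state->rto_ms = 2 * rtt_ms;
    return state->rto_ms;
}

int plugin_update_rto(struct proto_state *state, int rtt_ms)
{
    state->rto_ms = 3 * rtt_ms;
    return state->rto_ms;
}

void count_call(const struct proto_state *state, void *plugin_mem)
{
    (void)state;
    (*(int *)plugin_mem)++;
}
```

Attaching `count_call` to both PRE and POST counts two observations per call, and setting `replace` to `plugin_update_rto` changes the result without touching the built-in function. Allowing several observers but a single REPLACE body matches the read-only versus write-access split shown on the slide.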
  • 37. Use case: Flexible BGP filters • BGP filters are key for ISPs, – But they need to be written in special languages uint64_t filter_routes_from_even_as(bpf_full_args_t *args) { as_t a = bpf_get_args(args, 2); // from even AS → DENY if (a % 2 == 0) return FILTER_DENY; return FILTER_PERMIT; // the route is originated from odd AS → ACCEPT } router bgp 64512 bgp router-id 10.236.87.1 neighbor 10.0.0.1 remote-as 64515 neighbor 10.0.0.1 filter-list IN in ! ! IN list accepts routes originated from odd AS only as-path access-list IN permit ^(.+_+)*(.*)1$ as-path access-list IN permit ^(.+_+)*(.*)3$ as-path access-list IN permit ^(.+_+)*(.*)5$ as-path access-list IN permit ^(.+_+)*(.*)7$ as-path access-list IN permit ^(.+_+)*(.*)9$ as-path access-list IN deny any C-based filter
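The C-based filter on this slide depends on types and helpers supplied by the plugin host. The sketch below adds stand-ins so the same logic compiles on its own: `as_t`, `bpf_full_args_t`, `bpf_get_args` and the `FILTER_*` values are assumptions made for illustration, not the real plugin API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for definitions the plugin host would provide (assumed). */
typedef uint32_t as_t;
enum { FILTER_PERMIT = 0, FILTER_DENY = 1 };

#define MAX_ARGS 4
typedef struct {
    uint64_t values[MAX_ARGS];
} bpf_full_args_t;

/* Assumed helper: returns argument number idx passed to the plugin. */
static uint64_t bpf_get_args(const bpf_full_args_t *args, unsigned idx)
{
    assert(idx < MAX_ARGS);
    return args->values[idx];
}

/* Same logic as the slide: deny routes originated by an even AS number. */
uint64_t filter_routes_from_even_as(bpf_full_args_t *args)
{
    as_t origin = (as_t)bpf_get_args(args, 2);

    if (origin % 2 == 0)
        return FILTER_DENY;    /* originated from an even AS */
    return FILTER_PERMIT;      /* originated from an odd AS */
}
```

A single modulo test replaces the five as-path regular expressions of the router configuration, which is the point the slide makes about writing filters in a general-purpose language.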
  • 38. Performance evaluation Experiment : Injection of 200K routes to router R via Exabgp R Exabgp 1Gbps
  • 41. Agenda • Evolution of the networking stack • Making IPv6 Segment Routing programmable • Making TCP extensible again • Pluginizing Routing Protocols • Pluginizing QUIC
  • 42. The QUIC revolution • What are the benefits ? – Deploy without convincing kernel developers/ SDO HTTP/2 TLS TCP IP Application QUIC IP Application UDP
  • 43. Pluginized QUIC • Key ideas – Include an eBPF VM inside PQUIC to enable it to be dynamically extended with bytecode – Expose a richer set of callback functions and helpers than inside TCP – Leverage QUIC's flexible packet format to support a wide range of extensions – Leverage QUIC's multistream and security features to allow client and servers to exchange bytecode over QUIC connections Q. Deconinck et al., “Pluginized QUIC”, SIGCOMM’19, August 2019, Beijing
  • 44. Exchanging plugins First connection Initial: Client Hello - “Hey, I support multipath” Initial: Server Hello - “I want to inject monitoring” Encrypted - PLUGIN_REQUEST(monitoring) Encrypted - PLUGIN(monitoring) ... Let’s monitor the client state. Let’s request monitoring Bytecode
  • 45. Exchanging plugins Next connections Initial: Client Hello - “Hey, I support multipath and monitoring” Initial: Server Hello - “Let’s use monitoring” Encrypted - STREAM, STAT(info about RTT, reordering,...) Encrypted - STREAM ... Let’s use monitoring Added by the monitoring plugin
  • 46. Very Different Use Cases ● Monitoring ● A QUIC VPN ● Multipath ● Forward Erasure Correction. See our SIGCOMM’19 paper for more details.
      Plugin | Lines of C Code | Number of bytecodes
      Monitoring | 500 | 14
      QUIC VPN | 500 | 11
      Multipath | 2600 | 32
      Forward Erasure Correction | 2500 | 51
  • 47. Use case : Monitoring • Plugin – Collect statistics about various events in the QUIC stack • bytes/packets sent/received, lost, received out-of- order, etc. – Exports data to a monitoring server, but could also transmit them over QUIC connection – Passive, pluglets are attached in pre or post Plugin 500 lines of C code 14 pluglets 86 Kbytes of bytecode See SIGCOMM’19 paper for more details
  • 48. Use case : Multipath QUIC • Plugin – Supports our proposed Multipath QUIC draft • Connection id and path id, address advertisement – Includes path manager, packet scheduler (round robin and lowest rtt) as in MPTCP – provides similar performance as MPTCP Plugin 2600 lines of C code 32 pluglets 138 Kbytes of bytecode See SIGCOMM’19 paper for more details
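The two packet schedulers named on this slide can be sketched in a few lines of C. This is an illustrative model only: the `path` structure and both function names are hypothetical, and a real scheduler would also take congestion window and path state into account.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path state kept by the scheduler. */
struct path {
    int usable;          /* non-zero if the path can carry data now */
    uint32_t srtt_us;    /* smoothed round-trip time in microseconds */
};

/* Lowest-RTT scheduler: index of the usable path with the smallest
 * smoothed RTT, or -1 if no path is usable. */
int pick_lowest_rtt(const struct path *paths, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!paths[i].usable)
            continue;
        if (best < 0 || paths[i].srtt_us < paths[best].srtt_us)
            best = (int)i;
    }
    return best;
}

/* Round-robin scheduler: next usable path starting at *cursor, which is
 * advanced past the chosen path. Returns -1 if no path is usable. */
int pick_round_robin(const struct path *paths, size_t n, size_t *cursor)
{
    assert(cursor != NULL);
    for (size_t step = 0; step < n; step++) {
        size_t i = (*cursor + step) % n;
        if (paths[i].usable) {
            *cursor = (i + 1) % n;
            return (int)i;
        }
    }
    return -1;
}
```

Round robin spreads packets evenly across usable paths, while lowest-RTT prefers the fastest one; in both cases an unusable path is skipped even if its RTT is the smallest.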
  • 49. Use case: Forward Erasure Correction • Objective – Encode packets so that losses can be recovered at the receiver without waiting for retransmissions • Plugin – Adds new frame to carry Repair Symbols – Supports XOR and Random Linear Code (RLC) • Complex computations are required Plugin 2500 lines of C code 51 pluglets 236 Kbytes of bytecode See SIGCOMM’19 paper for more details
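The XOR code mentioned on this slide is simple enough to sketch directly. The example below is a minimal model, not the plugin's code: symbols have a fixed, arbitrarily chosen size, one repair symbol protects a block, and the names `fec_xor_encode` and `fec_xor_recover` are hypothetical. One repair symbol can recover exactly one lost source symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SYMBOL_SIZE 8  /* bytes per symbol; arbitrary for this sketch */

/* Compute the repair symbol as the XOR of n source symbols stored
 * back to back in src (n * SYMBOL_SIZE bytes). */
void fec_xor_encode(const uint8_t *src, size_t n, uint8_t *repair)
{
    assert(src != NULL && repair != NULL);
    memset(repair, 0, SYMBOL_SIZE);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < SYMBOL_SIZE; j++)
            repair[j] ^= src[i * SYMBOL_SIZE + j];
}

/* Rebuild the single lost symbol at index lost by XORing the repair
 * symbol with every symbol that did arrive. The bytes stored at the
 * lost position in src are ignored. */
void fec_xor_recover(const uint8_t *src, size_t n, size_t lost,
                     const uint8_t *repair, uint8_t *out)
{
    assert(lost < n);
    memcpy(out, repair, SYMBOL_SIZE);
    for (size_t i = 0; i < n; i++) {
        if (i == lost)
            continue;
        for (size_t j = 0; j < SYMBOL_SIZE; j++)
            out[j] ^= src[i * SYMBOL_SIZE + j];
    }
}
```

The receiver recovers the missing symbol as soon as the repair symbol arrives, without waiting for a retransmission. Random Linear Codes generalise this idea to several repair symbols per block at a higher computational cost.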
  • 50. Performance overhead • Some optimisations in the eBPF VM are possible to reduce this performance overhead
  • 51. Security and safety concerns • PQUIC relies on several techniques to ensure safety of plugins • Plugins are isolated from PQUIC and each other – eBPF VM adds code in JIT to validate memory • We propose a system to certify plugins – Manual certification like applications in a store – Tool-assisted certification • We have successfully used tools to prove termination • Future work required to develop tools to verify more specific properties than termination – Cryptographic certificates are attached to plugins and can be validated before injecting them
  • 52. Conclusions … • eBPF-based Protocol plugins bring benefits to various protocols – IPv6 Segment Routing for network programmability – TCP to collect accurate measurement data, implement new options, update key algorithms – BGP with more flexible eBPF filters, OSPF for new LSAs • Pluginized QUIC goes one step further by exchanging eBPF plugins over QUIC connections – Makes the protocol truly extensible
  • 53. … next steps • How to redesign network protocols to completely leverage plugins ? – A more efficient virtual machine • WebAssembly, improved eBPF, other ? – A simple base protocol that provides a clean API • Similar to microkernels, offload more complex or less frequently used functions to plugins – Interoperable independent implementations • The same plugin should work on different implementations – Tools and techniques to validate plugins • Not only termination, but other types of automated proofs

Editor's Notes

  1. Something other than what the IETF proposes
  2. context memory api etc
  3. The idea of Pluginized QUIC is to revisit how protocol implementations are structured. In particular, the transport protocol is now viewed as a set of basic functions, which can easily be mapped to implementation methods. In this canvas, we can insert plugins, which are sets of modified or added protocol functions. For instance, the base implementation contains operations for RTO computation, for preparing headers, or for handling new data from the application. In PQUIC, a plugin can change the algorithms, for instance the RTO computation, or even add new functions such as support for unreliable messages. PQUIC provides dynamic per-connection customization, allowing different algorithms on different connections, thanks to the isolation between them.
  4. Now that we have a mechanism to exchange plugins, what is needed at the PQUIC implementations to run the bytecode?
  5. Now, to exemplify how protocol operations appear in PQUIC, consider the situation where a host receives a packet. In QUIC, and thus PQUIC, a packet is composed of a small cleartext header and an encrypted payload containing the frames. So first, the host needs to process the incoming packet, potentially decrypting it. Then it needs to parse the packet header and all of the frames contained in the packet. When receiving an ACK frame, the host can estimate the latency of the network and update its retransmit timer. All these operations are base operations of the protocol, which thus map onto implementation functions provided by the core of PQUIC.
  6. What if we want to recompute the retransmit timer? Just attach a VM at that place. How to handle the reception of STAT frames used to exchange monitoring information? Just create a new protocol operation to process the STAT frame and attach a VM.
  7. How can we insert the code here? The implementation of an operation is placed in a hook called REPLACE, which refers by default to the built-in implementation. If we want to change this code, we can simply attach a VM to this hook that will replace the behavior of the protocol operation. The code inserted at that hook has full read/write access to the connection state.
  8. Monitoring use case: to see how the code works, just look at the arguments and outputs of the protocol operations. PRE and POST hooks serve that purpose; VMs can be attached at those points. These hooks give read-only access to the connection state, but this enables multiple VMs, possibly for different purposes, to observe the operations.
  9. Materialize OR
  10. Monitoring case: the inserted observers write STAT frames; how do they communicate? They use shared heap memory. For another plugin serving another use case, memory is isolated.
  11. Mention Christian Huitema, link to GitHub
  12. Add LoC Table
  13. Do not gray out the table