FreeBSD VPC Introduction

Sean Chittenden
Sean ChittendenSoftware Engineering
FreeBSD/VPC
Introduction to Virtual Private Cloud
Virtualization Status
Compute Isolation Status
• bhyve(4) is a stable, performant hypervisor

• ZFS zvol passthrough to guests works well

TIP: use multiple zvols per guest and stripe IO across
devices using lvcreate(8)

• Good CPU isolation - CPU pinned for low-density
workloads

• Perfect memory isolation (guest memory is wired)
Compute Isolation
em0
Guest 1
Customer A
Guest 2
Customer B
Network Isolation Status
• Network isolation is not core to bhyve(4) today

• Use of VNET(9) for manipulating FIBS for tap(4)
interfaces is possible, but limited and not performant
Guest Workloads
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
bridge0
tap51 tap52
Incomplete Solution
• bhyve(4) guests run customer workloads

• Cloud providers need a single FIB for the underlay
network

• Guests run in isolated overlay networks

• How do you map guests to their respective overlay
network?
Multi-Host Network Isolation
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
???
Possible Solution
• Plug a tap(4) device into a guest
• Plug tap(4) device into a bridge(4)
• Plug the physical or cloned interface into the underlay
bridge(4)
if_bridge(4) Approach
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
tap51
bridge0
tap50 tap52
bridge2bridge1
Possible Solution++
• Plug a tap(4) device into a guest
• Plug tap(4) device into a bridge(4)
• Plug a vxlan(4) interface into a per-subnet bridge(4)
• Plug the vxlan(4) into an underlay bridge(4) instance
• Plug the physical or cloned interface into the underlay
bridge(4)
Fuster Cluck Isolation
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
bridge1
tap51 tap52tap50
bridge0
vxlan1vxlan0
bridge2
Sad Performance
• Performance was "uninteresting"
• 1-2Gbps?
Problems with

tap(4)/bridge(4)/vxlan(4)/VNET(9)
• tap(4) is slow

• bridge(4) is slow

• vxlan(4) sends received packets through ip_input()
twice (i.e. "sub-optimal")

• VNET(9) virtualizes underlay networks, not overlay networks

• How do you ARP between VMs in the overlay network?

• How do you perform vxlan(4) encap?
uvxbridge(8) POC
uvxbridge(8) POC provides a high PPS interface and
guest VXLAN encapsulation using netmap(9)/
ptnetmap(4):



https://github.com/joyent/uvxbridge
Guest Workloads
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
tap51 tap52tap50
uvxbridge
Reasonable Performance
• 21Gbps guest-to-guest on the same host
• 15Gbps guest-to-guest across the wire
• Supports AES PSK encryption for cross-DC traffic
• Incomplete, but a successful POC
New Approach
• Build a network isolation subsystem aligned with the
capabilities of network hardware
• Reduce the number of times a packet is copied
• Reduce context switches
• Build a bottom-up abstraction rooted in the capabilities of
hardware, not a hardware implementation defined in terms
of an administrative policy
FreeBSD/VPC
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vmnic1
ethlink0
vmnic0 vmnic2
vpcsw1vpcsw0
FreeBSD/VPC
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vmnic1
ethlink0
vmnic0 vmnic2
vpcsw1
vpcsw0
vpcp0
vpcp1
vpcp3 vpcp4
vpcp5
FreeBSD/VPC Multi-Host
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
???
VXLAN to the Rescue
• Encapsulates all IP packets as UDP

• Adds a preamble to IP packet

• Tags packets and with a VXLAN ID, known as a VNI

• VXLAN is similar to VLAN tagging, but embeds tagging in
the IP header, not in the L2 frame
VXLAN
Encapsulated Ethernet Packet
Physical Ethernet Frame
1500B
OuterFrameChecksum
OuterEthernetHeader
OuterIPHeader
OuterUDPHeader
VXLANHeader
InnerSourceMAC(SMAC)
InnerDestMAC(DMAC)
802.1QHeader
Payload
EtherType
FreeBSD/VPC Multi-Host
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
VNI 123
VNI 987
VNI 123 VNI 987
VXLAN
Packets
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
vpc(4) Interfaces
• vpcsw(4) - switches packets - one switch per customer per host,
multiple subnets supported in the same switch

• vmnic(4) - dedicated guest NIC, looks like a virtio network device
to guests

• vpcp(4) - plugs vmnic(4) ports into vpcsw(4) switches

• vpci(4) - Non-bhyve(4) interface, usable in jails(2)

• ethlink(4) - Performs unencapsulated packet forwarding, wraps
a cloned or physical ethernet interface

• vpclink(4) - Performs VXLAN encapsulation
Interface Design
ioctl(2)
• Initially used ioctl(2)
• Unable to completely secure via capsicum(4) - able to
specify flags, but not scope the target device
• ioctl(2) is the kernel

equivalent of an HTTP PUT,

the interface for everything

without a design



"Hi ifconfig(8), I'm

looking at you."
One Device Per Object?
• New device in /dev for every new interface:
/dev/vpcsw0

/dev/vpcsw1

/dev/vpcp0
• Query via reads to individual devices in/dev
• Control devices via writes
• Security via devd(8)?
• Using VFS ACLs to secure network primitives?



One Device Per Object?
• New device in /dev for every new interface:
/dev/vpcsw0

/dev/vpcsw1

/dev/vpcp0
• Query via reads to individual devices in/dev
• Control devices via writes
• Security via devd(8)?
• Using VFS ACLs to secure network primitives?
Da Vinci mashed up with ...

One Device Per Object?
• New device in /dev for every new interface:
/dev/vpcsw0

/dev/vpcsw1

/dev/vpcp0
• Query via reads to individual devices in/dev
• Control devices via writes
• Security via devd(8)?
• Using VFS ACLs to secure network primitives?
Da Vinci mashed up with Jackson Pollock.

One Device Per Object?
• New device in /dev for every new interface:
/dev/vpcsw0

/dev/vpcsw1

/dev/vpcp0
• Query via reads to individual devices in/dev
• Control devices via writes
• Security via devd(8)?
• Using VFS ACLs to secure network primitives?
Da Vinci mashed up with Jackson Pollock.

Kernel API design hate crime.
New System Calls
• vpc_open(2) - Creates a new VPC descriptor

• vpc_ctl(2) - Manipulates VPC descriptors

• Capsicum-like, intended for privilege separation

• Intended for idempotent tooling

• Makes aggressive use of UUIDs as operator handles to
be compatible with Triton
vpc_open(2)
int

vpc_open(const vpc_id_t *vpc_id,

vpc_type_t obj_type,

vpc_flags_t flags);
• Creates a new "VPC descriptor"
• Similar to open(2)
• Manipulate descriptor via vcp_ctl(2)
• Responds to close(2)
• Priv-sep native security semantics
• Version aware
• System Call 580
vpc_id_t
int

vpc_open(const vpc_id_t *vpc_id,

vpc_handle_type_t obj_type,

vpc_flags_t flags);
16 bytes, UUID-like:



type ID struct {

TimeLow uint32

TimeMid uint16

TimeHi uint16

ClockSeqHi uint8

ObjType ObjType

Node [6]byte // Default MAC address vpc(4)

}
vpc_id_t
int

vpc_open(const vpc_id_t *vpc_id,

vpc_handle_type_t obj_type,

vpc_flags_t flags);
• All VPC objects have a MAC address
• Reused the node component of the vpc_id to set the MAC
address
vpc_handle_type_t
int

vpc_open(const vpc_id_t *vpc_id,

vpc_type_t obj_type,

vpc_flags_t flags);
typedef struct {

uint64_t vht_version:4;

uint64_t vht_pad1:4;

uint64_t vht_obj_type:8;

uint64_t vht_pad2:48;

} vpc_handle_type_t;
vpc_obj_type
int

vpc_open(const vpc_id_t *vpc_id,

vpc_type_t obj_type,

vpc_flags_t flags);
enum vpc_obj_type {

VPC_OBJ_INVALID = 0,

VPC_OBJ_SWITCH = 1,

VPC_OBJ_PORT = 2,

VPC_OBJ_ROUTER = 3,

VPC_OBJ_NAT = 4,

VPC_OBJ_VPCLINK = 5,

VPC_OBJ_VMNIC = 6,

VPC_OBJ_MGMT = 7,

VPC_OBJ_ETHLINK = 8,

VPC_OBJ_META = 9,

VPC_OBJ_TYPE_ANY = 10,

VPC_OBJ_TYPE_MAX = 10,

};
vpc_id_t
int

vpc_open(const vpc_id_t *vpc_id,

vpc_handle_type_t obj_type,

vpc_flags_t flags);
• 16 Versions
• 255 Object Types
• 4080 Object Types ought to be enough for anybody.
vpc_flags_t
int

vpc_open(const vpc_id_t *vpc_id,

vpc_type_t obj_type,

vpc_flags_t flags);
#define VPC_F_CREATE (1ULL << 0)

#define VPC_F_OPEN (1ULL << 1)

#define VPC_F_READ (1ULL << 2)

#define VPC_F_WRITE (1ULL << 3)
NOTE: going to revisit flags to enable a bit field for
capabilities per VPC Object Type
vpc_ctl(2)
int

vpc_ctl(int vpcd, vpc_op_t op,

size_t keylen, const void *key,

size_t *vallen, void *buf);
• Only interface for manipulating a VPC object
• Available operations different per VPC Object
• Inspired by ioctl(2), capsicum(4), and HTTP
• System Call 581
vpc_ctl(2)
enum vpc_vpcsw_op_type {
VPC_VPCSW_INVALID = 0,
VPC_VPCSW_PORT_ADD = 1,
VPC_VPCSW_PORT_DEL = 2,
VPC_VPCSW_PORT_UPLINK_SET = 3,
VPC_VPCSW_PORT_UPLINK_GET = 4,
VPC_VPCSW_STATE_GET = 5,
VPC_VPCSW_STATE_SET = 6,
VPC_VPCSW_RESET = 7,
VPC_VPCSW_RESPONSE_NDV4 = 8,
VPC_VPCSW_RESPONSE_NDV6 = 9,
VPC_VPCSW_RESPONSE_DHCPV4 = 10,
VPC_VPCSW_RESPONSE_DHCPV6 = 11,
VPC_VPCSW_OP_TYPE_MAX = 11,
};
vpc_ctl(2)
enum vpc_vpcp_op_type {
VPC_VPCP_INVALID = 0,
VPC_VPCP_CONNECT = 1,
VPC_VPCP_DISCONNECT = 2,
VPC_VPCP_VNI_GET = 3,
VPC_VPCP_VNI_SET = 4,
VPC_VPCP_VTAG_GET = 5,
VPC_VPCP_VTAG_SET = 6,
VPC_VPCP_UNUSED7 = 7,
VPC_VPCP_UNUSED8 = 8,
VPC_VPCP_PEER_ID_GET = 9,
VPC_VPCP_MAX = 9,
};
vpc_ctl(2)
enum vpc_vmnic_op_type {
VPC_VMNIC_INVALID = 0,
VPC_VMNIC_NQUEUES_GET = 1,
VPC_VMNIC_NQUEUES_SET = 2,
VPC_VMNIC_UNUSED3 = 3,
VPC_VMNIC_UNUSED4 = 4,
VPC_VMNIC_UNUSED5 = 5,
VPC_VMNIC_UNUSED6 = 6,
VPC_VMNIC_ATTACH = 7,
VPC_VMNIC_MSIX = 8,
VPC_VMNIC_FREEZE = 9,
VPC_VMNIC_UNFREEZE = 10,
VPC_VMNIC_OP_TYPE_MAX = 10,
};
What's it all mean?
• Network interfaces are first-class objects
• Network interfaces are pluggable
• Configurations can be arbitrarily complex
• All vpc(4) interfaces are iflib(9) and can be
tcpdump(8)'ed
• Performance is nice
Tooling
Tooling
• Able to be cross-compiled
• Developer-friendly
• Stable ABI
• Idempotent Operations
• No a priori knowledge of the target
OS required (i.e. no headers required
to cross-compile tooling)
Tooling
• Able to be cross-compiled
• Developer-friendly
• Stable ABI
• Idempotent Operations
• No a priori knowledge of the target
OS required (i.e. no headers required
to cross-compile tooling)
ELI5: vpc(4) edition
ELI5: vpc(4) edition
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
VNI 123
VNI 987
VNI 123 VNI 987
VXLAN
Packets
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
ELI5: vpc(4) Assumptions
• Guest is running Ubuntu or CentOS Linux

• Multi-Queue TX/RX in Host

• Multiple network queues available in the guest

• Underlay hosts can pass traffic

• Overlay hosts are in the same subnet
ELI5: vpc(4) edition
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vmnic1
ethlink0
vmnic0 vmnic2
vpcsw1
vpcsw0
vpcp0
vpcp1
vpcp3 vpcp4
vpcp5
ELI5: vpc(4) edition
em0
Guest 3
Customer B
Guest 2
Customer B
vmnic1
ethlink0
vmnic2
vpcsw1
vpcp3 vpcp4
vpcp5
ELI5: vpcsw(4) Setup
VNI=123

VLAN=456

VPCSW0_ID=da64c3f3-095d-91e5-df01-5aabcfc52468
vpc switch create 

--switch-id=${VPCSW0_ID} 

--vni=${VNI} 

--vlan=${VLAN}
ELI5: vpc(4) edition
em0
Guest 3
Customer B
Guest 2
Customer B
vpcsw1
ELI5: vmnic(4) Setup
VMNIC0_ID=07f95a11-6788-2ae7-c306-ba95cff1db38

VPCP0_ID=fd436f9c-1f77-11e8-8002-0cc47a6c7d1e



vpc vmnic create 

--vmnic-id=${VMNIC0_ID}



vpc switch port add 

--switch-id=${VPCSW0_ID} 

--port-id=${VPCP0_ID}



vpc switch port connect 

--port-id=${VPCP0_ID} 

--interface-id=${VMNIC0_ID}
ELI5: vpc(4) edition
em0
Guest 3
Customer B
Guest 2
Customer B
vmnic1
vpcsw1
vpcp3
ELI5: vmnic(4) Setup
VMNIC1_ID=a774ba3a-1f77-11e8-8006-0cc47a6c7d1e

VPCP1_ID=0ebf50e1-1f79-11e8-8002-0cc47a6c7d1e



vpc vmnic create 

--vmnic-id=${VMNIC1_ID}



vpc switch port add 

--switch-id=${VPCSW0_ID} 

--port-id=${VPCP1_ID}



vpc switch port connect 

--port-id=${VPCP1_ID} 

--interface-id=${VMNIC1_ID}
ELI5: vpc(4) edition
em0
Guest 3
Customer B
Guest 2
Customer B
vmnic1 vmnic2
vpcsw1
vpcp3 vpcp4
ELI5: ethlink(4) Setup
ETHLINK0_ID=5c4acd32-1b8d-11e8-b408-0cc47a6c7d1e

UPLINK_PORT_ID=ea58b648-203b-a707-cdf6-7a552c8d5295

UPLINK_IF=em0



vpc switch port add 

--switch-id=${VPCSW0_ID} 

--uplink 

--port-id=${UPLINK_PORT_ID} 

--l2-name=${UPLINK_IF} 

--ethlink-id=${ETHLINK0_ID}
ELI5: vpc(4) edition
em0
Guest 3
Customer B
Guest 2
Customer B
vmnic1
ethlink0
vmnic2
vpcsw1
vpcp3 vpcp4
vpcp5
ELI5: vpc(4) + VXLAN
ELI5: vpc(4) edition
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
VNI 123
VLAN 456
VNI 987
VNI 123
VTAG 456 VNI 987
VXLAN
Packets
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
10.65.5.161
10.65.5.162
ELI5: vpclink(4) Setup
• How do you do ARP or IPv6 Neighbor Discovery?
• Broadcast?
• Multicast?
ELI5: vpc(4) edition
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
VNI 123
VLAN 456
VNI 987
VNI 123
VTAG 456 VNI 987
VXLAN
Packets
em0
Guest 1
Customer A
Guest 3
Customer B
Guest 2
Customer B
vpclink0
vmnic0
vpcsw1vpcsw0
vmnic1 vmnic2
10.65.5.161
10.65.5.162
???
VTEP: VXLAN Tunnel End Point
• vpcsw(4) traps broadcast packets
• If the packet is:
• a broadcast packet and
• either an IPv4 ARP or IPv6 ND packet

Packet is added to a knote (part of kqueue(2)) and passed to all
kqueue(2) subscribers filtering for EVFILT_VPC filters
• Packet payload is stored in user-allocated buffer pointed to by ext[0]
• Userland utility parses packet and performs lookup
• Overlay and underlay forwarding information written back to the kernel
via vpc_ctl(2)
• vpcsw(4) caches forwarding information for the src/dst MAC tuple
Ongoing Work
• Firewalling - integrated at vpcp(4)

• Routing

• NAT

• Userland Control Plane (including setup and teardown of
bhyve(4) guests via something not a shell script)
vpc(8)
% doas vpc vm create
% doas vpc vm run
• bhyve(4) VM creation and launch tool
• Unifies network isolation with compute isolation
Desktop Software + vpc(4)
• vpc(4) + NAT == Desktop Software Hotness

• Ability to sandbox VMs with no prior knowledge of the
host's networking

• Uses:

% vagrant up
% packer build
No more dependency on pf(4) or dnsmasq(8)
Code
• Kernel:

https://github.com/joyent/freebsd/tree/projects/VPC

• Kernel Libraries:

https://github.com/joyent/freebsd/tree/projects/VPC/
libexec/go/src/go.freebsd.org/sys/vpc

• Userland tooling:

https://github.com/sean-/vpc



Future home:



https://github.com/joyent-/vpc
Questions?
1 of 70

Recommended

Virtunoid: Breaking out of KVM by
Virtunoid: Breaking out of KVMVirtunoid: Breaking out of KVM
Virtunoid: Breaking out of KVMNelson Elhage
20.7K views54 slides
"One network to rule them all" - OpenStack Summit Austin 2016 by
"One network to rule them all" - OpenStack Summit Austin 2016"One network to rule them all" - OpenStack Summit Austin 2016
"One network to rule them all" - OpenStack Summit Austin 2016Phil Estes
955 views18 slides
Pipework: Software-Defined Network for Containers and Docker by
Pipework: Software-Defined Network for Containers and DockerPipework: Software-Defined Network for Containers and Docker
Pipework: Software-Defined Network for Containers and DockerJérôme Petazzoni
9.1K views25 slides
Virtualized network with openvswitch by
Virtualized network with openvswitchVirtualized network with openvswitch
Virtualized network with openvswitchSim Janghoon
19.4K views18 slides
How Networking works with Data Science by
How Networking works with Data Science How Networking works with Data Science
How Networking works with Data Science HungWei Chiu
608 views49 slides

More Related Content

What's hot

Open v switch20150410b by
Open v switch20150410bOpen v switch20150410b
Open v switch20150410bRichard Kuo
934 views19 slides
Docker networking basics & coupling with Software Defined Networks by
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined NetworksAdrien Blind
65.6K views28 slides
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa... by
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...Toshiaki Hatano
3.9K views18 slides
Docker Networking by
Docker NetworkingDocker Networking
Docker NetworkingKingston Smiler
3.4K views13 slides
Docker networking by
Docker networkingDocker networking
Docker networkinglakshman kumar Vit.Lakshman
154 views12 slides
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr... by
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...The Linux Foundation
2.4K views50 slides

What's hot(20)

Open v switch20150410b by Richard Kuo
Open v switch20150410bOpen v switch20150410b
Open v switch20150410b
Richard Kuo934 views
Docker networking basics & coupling with Software Defined Networks by Adrien Blind
Docker networking basics & coupling with Software Defined NetworksDocker networking basics & coupling with Software Defined Networks
Docker networking basics & coupling with Software Defined Networks
Adrien Blind65.6K views
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa... by Toshiaki Hatano
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...
Linux Native VXLAN Integration - CloudStack Collaboration Conference 2013, Sa...
Toshiaki Hatano3.9K views
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr... by The Linux Foundation
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...
LCNA14: Security in the Cloud: Containers, KVM, and Xen - George Dunlap, Citr...
How VXLAN works on Linux by Etsuji Nakai
How VXLAN works on LinuxHow VXLAN works on Linux
How VXLAN works on Linux
Etsuji Nakai26.7K views
Osdc2014 openstack networking yves_fauser by yfauser
Osdc2014 openstack networking yves_fauserOsdc2014 openstack networking yves_fauser
Osdc2014 openstack networking yves_fauser
yfauser3.5K views
XPDS14: OpenXT - Security and the Properties of a Xen Virtualisation Platform... by The Linux Foundation
XPDS14: OpenXT - Security and the Properties of a Xen Virtualisation Platform...XPDS14: OpenXT - Security and the Properties of a Xen Virtualisation Platform...
XPDS14: OpenXT - Security and the Properties of a Xen Virtualisation Platform...
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration by James Denton
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
2014 OpenStack Summit - Neutron OVS to LinuxBridge Migration
James Denton7.9K views
Linux Tag 2014 OpenStack Networking by yfauser
Linux Tag 2014 OpenStack NetworkingLinux Tag 2014 OpenStack Networking
Linux Tag 2014 OpenStack Networking
yfauser2.1K views
The Basic Introduction of Open vSwitch by Te-Yen Liu
The Basic Introduction of Open vSwitchThe Basic Introduction of Open vSwitch
The Basic Introduction of Open vSwitch
Te-Yen Liu67.6K views
XPDS14 - Xen as High-Performance NFV Platform - Jun Nakajima, Intel by The Linux Foundation
XPDS14 - Xen as High-Performance NFV Platform - Jun Nakajima, IntelXPDS14 - Xen as High-Performance NFV Platform - Jun Nakajima, Intel
XPDS14 - Xen as High-Performance NFV Platform - Jun Nakajima, Intel
OpenStack networking by Sim Janghoon
OpenStack networkingOpenStack networking
OpenStack networking
Sim Janghoon8.2K views
Single Host Docker Networking by allingeek
Single Host Docker NetworkingSingle Host Docker Networking
Single Host Docker Networking
allingeek1.1K views
SDNDS.TW Mininet by NCTU
SDNDS.TW MininetSDNDS.TW Mininet
SDNDS.TW Mininet
NCTU2.2K views
OpenStack Neutron Tutorial by mestery
OpenStack Neutron TutorialOpenStack Neutron Tutorial
OpenStack Neutron Tutorial
mestery19K views

Similar to FreeBSD VPC Introduction

VIO on Cisco UCS and Network by
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and NetworkYousef Morcos
160 views22 slides
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin... by
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...Jim St. Leger
3.8K views22 slides
Scaling the Container Dataplane by
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane Michelle Holley
1.7K views21 slides
20 - IDNOG03 - Franki Lim (ARISTA) - Overlay Networking with VXLAN by
20 - IDNOG03 - Franki Lim (ARISTA) - Overlay Networking with VXLAN20 - IDNOG03 - Franki Lim (ARISTA) - Overlay Networking with VXLAN
20 - IDNOG03 - Franki Lim (ARISTA) - Overlay Networking with VXLANIndonesia Network Operators Group
1.1K views32 slides
Module_2_Slides.pdf by
Module_2_Slides.pdfModule_2_Slides.pdf
Module_2_Slides.pdfgoldfer1
32 views26 slides
CloudStack and SDN by
CloudStack and SDNCloudStack and SDN
CloudStack and SDNSebastien Goasguen
1.8K views36 slides

Similar to FreeBSD VPC Introduction(20)

VIO on Cisco UCS and Network by Yousef Morcos
VIO on Cisco UCS and NetworkVIO on Cisco UCS and Network
VIO on Cisco UCS and Network
Yousef Morcos160 views
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin... by Jim St. Leger
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
Jim St. Leger3.8K views
Scaling the Container Dataplane by Michelle Holley
Scaling the Container Dataplane Scaling the Container Dataplane
Scaling the Container Dataplane
Michelle Holley1.7K views
Module_2_Slides.pdf by goldfer1
Module_2_Slides.pdfModule_2_Slides.pdf
Module_2_Slides.pdf
goldfer132 views
Openstack Networking Internals - first part by lilliput12
Openstack Networking Internals - first partOpenstack Networking Internals - first part
Openstack Networking Internals - first part
lilliput122.5K views
OpenVirtex (OVX) Tutorial by 동호 손
OpenVirtex (OVX) TutorialOpenVirtex (OVX) Tutorial
OpenVirtex (OVX) Tutorial
동호 손1.2K views
Implementing SR-IOv failover for Windows guests during live migration by Yan Vugenfirer
Implementing SR-IOv failover for Windows guests during live migrationImplementing SR-IOv failover for Windows guests during live migration
Implementing SR-IOv failover for Windows guests during live migration
Yan Vugenfirer296 views
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a... by VMworld
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld 2013: vSphere Networking and vCloud Networking Suite Best Practices a...
VMworld2.5K views
PLNOG15: Is there something less complicated than connecting two LAN networks... by PROIDEA
PLNOG15: Is there something less complicated than connecting two LAN networks...PLNOG15: Is there something less complicated than connecting two LAN networks...
PLNOG15: Is there something less complicated than connecting two LAN networks...
PROIDEA243 views
OpenStack and OpenContrail for FreeBSD platform by Michał Dubiel by eurobsdcon
OpenStack and OpenContrail for FreeBSD platform by Michał DubielOpenStack and OpenContrail for FreeBSD platform by Michał Dubiel
OpenStack and OpenContrail for FreeBSD platform by Michał Dubiel
eurobsdcon2.8K views
Virtualization 101 - DeepDive by Amit Agarwal
Virtualization 101 - DeepDiveVirtualization 101 - DeepDive
Virtualization 101 - DeepDive
Amit Agarwal419 views
NET4933_vDS_Best_Practices_For_NSX_Francois_Tallet_Shahzad_Ali by shezy22
NET4933_vDS_Best_Practices_For_NSX_Francois_Tallet_Shahzad_AliNET4933_vDS_Best_Practices_For_NSX_Francois_Tallet_Shahzad_Ali
NET4933_vDS_Best_Practices_For_NSX_Francois_Tallet_Shahzad_Ali
shezy22287 views
Unikernels: Rise of the Library Hypervisor by Anil Madhavapeddy
Unikernels: Rise of the Library HypervisorUnikernels: Rise of the Library Hypervisor
Unikernels: Rise of the Library Hypervisor
Anil Madhavapeddy10.5K views
Networking 101 AWS - VPCs, Subnets, NAT Gateways, etc by Stenio Ferreira
Networking 101 AWS - VPCs, Subnets, NAT Gateways, etcNetworking 101 AWS - VPCs, Subnets, NAT Gateways, etc
Networking 101 AWS - VPCs, Subnets, NAT Gateways, etc
Stenio Ferreira487 views
Advanced Docker Developer Workflows on MacOS X and Windows by Anil Madhavapeddy
Advanced Docker Developer Workflows on MacOS X and WindowsAdvanced Docker Developer Workflows on MacOS X and Windows
Advanced Docker Developer Workflows on MacOS X and Windows
Anil Madhavapeddy70.3K views
OSCON: Advanced Docker developer workflows on Mac OS and Windows by Docker, Inc.
OSCON: Advanced Docker developer workflows on Mac OS and WindowsOSCON: Advanced Docker developer workflows on Mac OS and Windows
OSCON: Advanced Docker developer workflows on Mac OS and Windows
Docker, Inc.3.3K views

More from Sean Chittenden

BSDCan '19 Core Update by
BSDCan '19 Core UpdateBSDCan '19 Core Update
BSDCan '19 Core UpdateSean Chittenden
385 views51 slides
pg_prefaulter: Scaling WAL Performance by
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL PerformanceSean Chittenden
1.1K views52 slides
Universal Userland by
Universal UserlandUniversal Userland
Universal UserlandSean Chittenden
1K views55 slides
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices by
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesLife Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesSean Chittenden
737 views109 slides
Codified PostgreSQL Schema by
Codified PostgreSQL SchemaCodified PostgreSQL Schema
Codified PostgreSQL SchemaSean Chittenden
1.9K views97 slides
PostgreSQL + ZFS best practices by
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesSean Chittenden
17.4K views110 slides

More from Sean Chittenden(14)

pg_prefaulter: Scaling WAL Performance by Sean Chittenden
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performance
Sean Chittenden1.1K views
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices by Sean Chittenden
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesLife Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Sean Chittenden737 views
PostgreSQL + ZFS best practices by Sean Chittenden
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
Sean Chittenden17.4K views
Incrementalism: An Industrial Strategy For Adopting Modern Automation by Sean Chittenden
Incrementalism: An Industrial Strategy For Adopting Modern AutomationIncrementalism: An Industrial Strategy For Adopting Modern Automation
Incrementalism: An Industrial Strategy For Adopting Modern Automation
Sean Chittenden652 views
Production Readiness Strategies in an Automated World by Sean Chittenden
Production Readiness Strategies in an Automated WorldProduction Readiness Strategies in an Automated World
Production Readiness Strategies in an Automated World
Sean Chittenden862 views
PostgreSQL on ZFS Lightning Talk by Sean Chittenden
PostgreSQL on ZFS Lightning TalkPostgreSQL on ZFS Lightning Talk
PostgreSQL on ZFS Lightning Talk
Sean Chittenden816 views
Dynamic Database Credentials: Security Contingency Planning by Sean Chittenden
Dynamic Database Credentials: Security Contingency PlanningDynamic Database Credentials: Security Contingency Planning
Dynamic Database Credentials: Security Contingency Planning
Sean Chittenden1.4K views
PostgreSQL High-Availability and Geographic Locality using consul by Sean Chittenden
PostgreSQL High-Availability and Geographic Locality using consulPostgreSQL High-Availability and Geographic Locality using consul
PostgreSQL High-Availability and Geographic Locality using consul
Sean Chittenden4.7K views
Modern tooling to assist with developing applications on FreeBSD by Sean Chittenden
Modern tooling to assist with developing applications on FreeBSDModern tooling to assist with developing applications on FreeBSD
Modern tooling to assist with developing applications on FreeBSD
Sean Chittenden861 views
Creating PostgreSQL-as-a-Service at Scale by Sean Chittenden
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
Sean Chittenden1.2K views

Recently uploaded

Automated Testing of Microsoft Power BI Reports by
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsRTTS
11 views20 slides
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...NimaTorabi2
17 views17 slides
Page Object Model by
Page Object ModelPage Object Model
Page Object Modelartembondar5
7 views5 slides
FOSSLight Community Day 2023-11-30 by
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30Shane Coughlan
8 views18 slides
Top-5-production-devconMunich-2023.pptx by
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptxTier1 app
10 views40 slides
Winter Projects GDSC IITK by
Winter Projects GDSC IITKWinter Projects GDSC IITK
Winter Projects GDSC IITKSahilSingh368445
416 views60 slides

Recently uploaded(20)

Automated Testing of Microsoft Power BI Reports by RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS11 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi217 views
FOSSLight Community Day 2023-11-30 by Shane Coughlan
FOSSLight Community Day 2023-11-30FOSSLight Community Day 2023-11-30
FOSSLight Community Day 2023-11-30
Shane Coughlan8 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app10 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views
tecnologia18.docx by nosi6702
tecnologia18.docxtecnologia18.docx
tecnologia18.docx
nosi67026 views
How Workforce Management Software Empowers SMEs | TraQSuite by TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuiteHow Workforce Management Software Empowers SMEs | TraQSuite
How Workforce Management Software Empowers SMEs | TraQSuite
TraQSuite7 views
aATP - New Correlation Confirmation Feature.pptx by EsatEsenek1
aATP - New Correlation Confirmation Feature.pptxaATP - New Correlation Confirmation Feature.pptx
aATP - New Correlation Confirmation Feature.pptx
EsatEsenek1222 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom16 views
Top-5-production-devconMunich-2023-v2.pptx by Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app9 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar58 views
predicting-m3-devopsconMunich-2023-v2.pptx by Tier1 app
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptx
Tier1 app14 views
Supercharging your Python Development Environment with VS Code and Dev Contai... by Dawn Wages
Supercharging your Python Development Environment with VS Code and Dev Contai...Supercharging your Python Development Environment with VS Code and Dev Contai...
Supercharging your Python Development Environment with VS Code and Dev Contai...
Dawn Wages5 views

FreeBSD VPC Introduction

  • 3. Compute Isolation Status • bhyve(4) is a stable, performant hypervisor • ZFS zvol passthrough to guests works well
 TIP: use multiple zvols per guest and stripe IO across devices using lvcreate(8) • Good CPU isolation - CPU pinned for low-density workloads • Perfect memory isolation (guest memory is wired)
  • 5. Network Isolation Status • Network isolation is not core to bhyve(4) today • Use of VNET(9) for manipulating FIBS for tap(4) interfaces is possible, but limited and not performant
  • 6. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge0 tap51 tap52
  • 7. Incomplete Solution • bhyve(4) guests run customer workloads • Cloud providers need a single FIB for the underlay network • Guests run in isolated overlay networks • How do you map guests to their respective overlay network?
  • 8. Multi-Host Network Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B ???
  • 9. Possible Solution • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug the physical or cloned interface into the underlay bridge(4)
  • 10. if_bridge(4) Approach em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 bridge0 tap50 tap52 bridge2bridge1
  • 11. Possible Solution++ • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug a vxlan(4) interface into a per-subnet bridge(4) • Plug the vxlan(4) into an underlay bridge(4) instance • Plug the physical or cloned interface into the underlay bridge(4)
  • 12. Fuster Cluck Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge1 tap51 tap52tap50 bridge0 vxlan1vxlan0 bridge2
  • 13. Sad Performance • Performance was "uninteresting" • 1-2Gbps?
  • 14. Problems with
 tap(4)/bridge(4)/vxlan(4)/VNET(9) • tap(4) is slow • bridge(4) is slow • vxlan(4) sends received packets through ip_input() twice (i.e. "sub-optimal") • VNET(9) virtualizes underlay networks, not overlay networks • How do you ARP between VMs in the overlay network? • How do you perform vxlan(4) encap?
  • 15. uvxbridge(8) POC uvxbridge(8) POC provides a high PPS interface and guest VXLAN encapsulation using netmap(9)/ ptnetmap(4):
 
 https://github.com/joyent/uvxbridge
  • 16. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 tap52tap50 uvxbridge
  • 17. Reasonable Performance • 21Gbps guest-to-guest on the same host • 15Gbps guest-to-guest across the wire • Supports AES PSK encryption for cross-DC traffic • Incomplete, but a successful POC
  • 18. New Approach • Build a network isolation subsystem aligned with the capabilities of network hardware • Reduce the number of times a packet is copied • Reduce context switches • Build a bottom-up abstraction rooted in the capabilities of hardware, not a hardware implementation defined in terms of an administrative policy
  • 19. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1vpcsw0
  • 20. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1 vpcsw0 vpcp0 vpcp1 vpcp3 vpcp4 vpcp5
  • 21. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 ???
  • 22. VXLAN to the Rescue • Encapsulates all IP packets as UDP • Adds a preamble to IP packet • Tags packets and with a VXLAN ID, known as a VNI • VXLAN is similar to VLAN tagging, but embeds tagging in the IP header, not in the L2 frame
  • 23. VXLAN Encapsulated Ethernet Packet Physical Ethernet Frame 1500B OuterFrameChecksum OuterEthernetHeader OuterIPHeader OuterUDPHeader VXLANHeader InnerSourceMAC(SMAC) InnerDestMAC(DMAC) 802.1QHeader Payload EtherType
  • 24. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VNI 987 VNI 123 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2
  • 25. vpc(4) Interfaces • vpcsw(4) - switches packets - one switch per customer per host, multiple subnets supported in the same switch • vmnic(4) - dedicated guest NIC, looks like a virtio network device to guests • vpcp(4) - plugs vmnic(4) ports into vpcsw(4) switches • vpci(4) - Non-bhyve(4) interface, usable in jails(2) • ethlink(4) - Performs unencapsulated packet forwarding, wraps a cloned or physical ethernet interface • vpclink(4) - Performs VXLAN encapsulation
  • 27. ioctl(2) • Initially used ioctl(2) • Unable to completely secure via capsicum(4) - able to specify flags, but not scope the target device • ioctl(2) is the kernel
 equivalent of an HTTP PUT,
 the interface for everything
 without a design
 
 "Hi ifconfig(8), I'm
 looking at you."
  • 28. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives?
 

  • 29. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with ...

  • 30. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.

  • 31. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.
 Kernel API design hate crime.
  • 32. New System Calls • vpc_open(2) - Creates a new VPC descriptor • vpc_ctl(2) - Manipulates VPC descriptors • Capsicum-like, intended for privilege separation • Intended for idempotent tooling • Makes aggressive use of UUIDs as operator handles to be compatible with Triton
  • 33. vpc_open(2) int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); • Creates a new "VPC descriptor" • Similar to open(2) • Manipulate descriptor via vcp_ctl(2) • Responds to close(2) • Priv-sep native security semantics • Version aware • System Call 580
  • 34. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); 16 bytes, UUID-like:
 
 type ID struct {
 TimeLow uint32
 TimeMid uint16
 TimeHi uint16
 ClockSeqHi uint8
 ObjType ObjType
 Node [6]byte // Default MAC address vpc(4)
 }
  • 35. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • All VPC objects have a MAC address • Reused the node component of the vpc_id to set the MAC address
  • 36. vpc_handle_type_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); typedef struct {
 uint64_t vht_version:4;
 uint64_t vht_pad1:4;
 uint64_t vht_obj_type:8;
 uint64_t vht_pad2:48;
 } vpc_handle_type_t;
  • 37. vpc_obj_type int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); enum vpc_obj_type {
 VPC_OBJ_INVALID = 0,
 VPC_OBJ_SWITCH = 1,
 VPC_OBJ_PORT = 2,
 VPC_OBJ_ROUTER = 3,
 VPC_OBJ_NAT = 4,
 VPC_OBJ_VPCLINK = 5,
 VPC_OBJ_VMNIC = 6,
 VPC_OBJ_MGMT = 7,
 VPC_OBJ_ETHLINK = 8,
 VPC_OBJ_META = 9,
 VPC_OBJ_TYPE_ANY = 10,
 VPC_OBJ_TYPE_MAX = 10,
 };
  • 38. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • 16 Versions • 255 Object Types • 4080 Object Types ought to be enough for anybody.
  • 39. vpc_flags_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); #define VPC_F_CREATE (1ULL << 0)
 #define VPC_F_OPEN (1ULL << 1)
 #define VPC_F_READ (1ULL << 2)
 #define VPC_F_WRITE (1ULL << 3) NOTE: going to revisit flags to enable a bit field for capabilities per VPC Object Type
  • 40. vpc_ctl(2) int
 vpc_ctl(int vpcd, vpc_op_t op,
 size_t keylen, const void *key,
 size_t *vallen, void *buf); • Only interface for manipulating a VPC object • Available operations different per VPC Object • Inspired by ioctl(2), capsicum(4), and HTTP • System Call 581
  • 41. vpc_ctl(2) enum vpc_vpcsw_op_type { VPC_VPCSW_INVALID = 0, VPC_VPCSW_PORT_ADD = 1, VPC_VPCSW_PORT_DEL = 2, VPC_VPCSW_PORT_UPLINK_SET = 3, VPC_VPCSW_PORT_UPLINK_GET = 4, VPC_VPCSW_STATE_GET = 5, VPC_VPCSW_STATE_SET = 6, VPC_VPCSW_RESET = 7, VPC_VPCSW_RESPONSE_NDV4 = 8, VPC_VPCSW_RESPONSE_NDV6 = 9, VPC_VPCSW_RESPONSE_DHCPV4 = 10, VPC_VPCSW_RESPONSE_DHCPV6 = 11, VPC_VPCSW_OP_TYPE_MAX = 11, };
  • 42. vpc_ctl(2) enum vpc_vpcp_op_type { VPC_VPCP_INVALID = 0, VPC_VPCP_CONNECT = 1, VPC_VPCP_DISCONNECT = 2, VPC_VPCP_VNI_GET = 3, VPC_VPCP_VNI_SET = 4, VPC_VPCP_VTAG_GET = 5, VPC_VPCP_VTAG_SET = 6, VPC_VPCP_UNUSED7 = 7, VPC_VPCP_UNUSED8 = 8, VPC_VPCP_PEER_ID_GET = 9, VPC_VPCP_MAX = 9, };
  • 43. vpc_ctl(2) enum vpc_vmnic_op_type { VPC_VMNIC_INVALID = 0, VPC_VMNIC_NQUEUES_GET = 1, VPC_VMNIC_NQUEUES_SET = 2, VPC_VMNIC_UNUSED3 = 3, VPC_VMNIC_UNUSED4 = 4, VPC_VMNIC_UNUSED5 = 5, VPC_VMNIC_UNUSED6 = 6, VPC_VMNIC_ATTACH = 7, VPC_VMNIC_MSIX = 8, VPC_VMNIC_FREEZE = 9, VPC_VMNIC_UNFREEZE = 10, VPC_VMNIC_OP_TYPE_MAX = 10, };
  • 44. What's it all mean? • Network interfaces are first-class objects • Network interfaces are pluggable • Configurations can be arbitrarily complex • All vpc(4) interfaces are iflib(9) and can be tcpdump(8)'ed • Performance is nice
  • 46. Tooling • Able to be cross-compiled • Developer-friendly • Stable ABI • Idempotent Operations • No a priori knowledge of the target OS required (i.e. no headers required to cross-compile tooling)
  • 47. Tooling • Able to be cross-compiled • Developer-friendly • Stable ABI • Idempotent Operations • No a priori knowledge of the target OS required (i.e. no headers required to cross-compile tooling)
  • 49. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VNI 987 VNI 123 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2
  • 50. ELI5: vpc(4) Assumptions • Guest is running Ubuntu or CentOS Linux • Multi-Queue TX/RX in Host • Multiple network queues available in the guest • Underlay hosts can pass traffic • Overlay hosts are in the same subnet
  • 51. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1 vpcsw0 vpcp0 vpcp1 vpcp3 vpcp4 vpcp5
  • 52. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic2 vpcsw1 vpcp3 vpcp4 vpcp5
  • 53. ELI5: vpcsw(4) Setup VNI=123
 VLAN=456
 VPCSW0_ID=da64c3f3-095d-91e5-df01-5aabcfc52468 vpc switch create 
 --switch-id=${VPCSW0_ID} 
 --vni=${VNI} 
 --vlan=${VLAN}
  • 54. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vpcsw1
  • 55. ELI5: vmnic(4) Setup VMNIC0_ID=07f95a11-6788-2ae7-c306-ba95cff1db38
 VPCP0_ID=fd436f9c-1f77-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create 
 --vmnic-id=${VMNIC0_ID}
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --port-id=${VPCP0_ID}
 
 vpc switch port connect 
 --port-id=${VPCP0_ID} 
 --interface-id=${VMNIC0_ID}
  • 56. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 vpcsw1 vpcp3
  • 57. ELI5: vmnic(4) Setup VMNIC1_ID=a774ba3a-1f77-11e8-8006-0cc47a6c7d1e
 VPCP1_ID=0ebf50e1-1f79-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create 
 --vmnic-id=${VMNIC1_ID}
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --port-id=${VPCP1_ID}
 
 vpc switch port connect 
 --port-id=${VPCP1_ID} 
 --interface-id=${VMNIC1_ID}
  • 58. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 vmnic2 vpcsw1 vpcp3 vpcp4
  • 59. ELI5: ethlink(4) Setup ETHLINK0_ID=5c4acd32-1b8d-11e8-b408-0cc47a6c7d1e
 UPLINK_PORT_ID=ea58b648-203b-a707-cdf6-7a552c8d5295
 UPLINK_IF=em0
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --uplink 
 --port-id=${UPLINK_PORT_ID} 
 --l2-name=${UPLINK_IF} 
 --ethlink-id=${ETHLINK0_ID}
  • 60. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic2 vpcsw1 vpcp3 vpcp4 vpcp5
  • 61. ELI5: vpc(4) + VXLAN
  • 62. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VLAN 456 VNI 987 VNI 123 VTAG 456 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 10.65.5.161 10.65.5.162
  • 63. ELI5: vpclink(4) Setup • How do you do ARP or IPv6 Neighbor Discovery? • Broadcast? • Multicast?
  • 64. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VLAN 456 VNI 987 VNI 123 VTAG 456 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 10.65.5.161 10.65.5.162 ???
  • 65. VTEP: VXLAN Tunnel End Point • vpcsw(4) traps broadcast packets • If the packet is: • a broadcast packet and • either an IPv4 ARP or IPv6 ND packet
 Packet is added to a knote (part of kqueue(2)) and passed to all kqueue(2) subscribers filtering for EVFILT_VPC filters • Packet payload is stored in user-allocated buffer pointed to by ext[0] • Userland utility parses packet and performs lookup • Overlay and underlay forwarding information written back to the kernel via vpc_ctl(2) • vpcsw(4) caches forwarding information for the src/dst MAC tuple
  • 66. Ongoing Work • Firewalling - integrated at vpcp(4) • Routing • NAT • Userland Control Plane (including setup and teardown of bhyve(4) guests via something not a shell script)
  • 67. vpc(8) % doas vpc vm create % doas vpc vm run • bhyve(4) VM creation and launch tool • Unifies network isolation with compute isolation
  • 68. Desktop Software + vpc(4) • vpc(4) + NAT == Desktop Software Hotness • Ability to sandbox VMs with no prior knowledge of the host's networking • Uses:
 % vagrant up % packer build No more dependency on pf(4) or dnsmasq(8)
  • 69. Code • Kernel:
 https://github.com/joyent/freebsd/tree/projects/VPC • Kernel Libraries:
 https://github.com/joyent/freebsd/tree/projects/VPC/ libexec/go/src/go.freebsd.org/sys/vpc • Userland tooling:
 https://github.com/sean-/vpc
 
 Future home:
 
 https://github.com/joyent-/vpc