Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

FreeBSD VPC Introduction

901 views

Published on

An introduction to FreeBSD/VPC.

Published in: Software
  • Be the first to comment

FreeBSD VPC Introduction

  1. 1. FreeBSD/VPC Introduction to Virtual Private Cloud
  2. 2. Virtualization Status
  3. 3. Compute Isolation Status • bhyve(4) is a stable, performant hypervisor • ZFS zvol passthrough to guests works well
 TIP: use multiple zvols per guest and stripe IO across devices using lvcreate(8) • Good CPU isolation - CPU pinned for low-density workloads • Perfect memory isolation (guest memory is wired)
  4. 4. Compute Isolation em0 Guest 1 Customer A Guest 2 Customer B
  5. 5. Network Isolation Status • Network isolation is not core to bhyve(4) today • Use of VNET(9) for manipulating FIBS for tap(4) interfaces is possible, but limited and not performant
  6. 6. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge0 tap51 tap52
  7. 7. Incomplete Solution • bhyve(4) guests run customer workloads • Cloud providers need a single FIB for the underlay network • Guests run in isolated overlay networks • How do you map guests to their respective overlay network?
  8. 8. Multi-Host Network Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B ???
  9. 9. Possible Solution • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug the physical or cloned interface into the underlay bridge(4)
  10. 10. if_bridge(4) Approach em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 bridge0 tap50 tap52 bridge2bridge1
  11. 11. Possible Solution++ • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug a vxlan(4) interface into a per-subnet bridge(4) • Plug the vxlan(4) into an underlay bridge(4) instance • Plug the physical or cloned interface into the underlay bridge(4)
  12. 12. Fuster Cluck Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge1 tap51 tap52tap50 bridge0 vxlan1vxlan0 bridge2
  13. 13. Sad Performance • Performance was "uninteresting" • 1-2Gbps?
  14. 14. Problems with
 tap(4)/bridge(4)/vxlan(4)/VNET(9) • tap(4) is slow • bridge(4) is slow • vxlan(4) sends received packets through ip_input() twice (i.e. "sub-optimal") • VNET(9) virtualizes underlay networks, not overlay networks • How do you ARP between VMs in the overlay network? • How do you perform vxlan(4) encap?
  15. 15. uvxbridge(8) POC uvxbridge(8) POC provides a high PPS interface and guest VXLAN encapsulation using netmap(9)/ ptnetmap(4):
 
 https://github.com/joyent/uvxbridge
  16. 16. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 tap52tap50 uvxbridge
  17. 17. Reasonable Performance • 21Gbps guest-to-guest on the same host • 15Gbps guest-to-guest across the wire • Supports AES PSK encryption for cross-DC traffic • Incomplete, but a successful POC
  18. 18. New Approach • Build a network isolation subsystem aligned with the capabilities of network hardware • Reduce the number of times a packet is copied • Reduce context switches • Build a bottom-up abstraction rooted in the capabilities of hardware, not a hardware implementation defined in terms of an administrative policy
  19. 19. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1vpcsw0
  20. 20. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1 vpcsw0 vpcp0 vpcp1 vpcp3 vpcp4 vpcp5
  21. 21. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 ???
  22. 22. VXLAN to the Rescue • Encapsulates all IP packets as UDP • Adds a preamble to IP packet • Tags packets and with a VXLAN ID, known as a VNI • VXLAN is similar to VLAN tagging, but embeds tagging in the IP header, not in the L2 frame
  23. 23. VXLAN Encapsulated Ethernet Packet Physical Ethernet Frame 1500B OuterFrameChecksum OuterEthernetHeader OuterIPHeader OuterUDPHeader VXLANHeader InnerSourceMAC(SMAC) InnerDestMAC(DMAC) 802.1QHeader Payload EtherType
  24. 24. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VNI 987 VNI 123 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2
  25. 25. vpc(4) Interfaces • vpcsw(4) - switches packets - one switch per customer per host, multiple subnets supported in the same switch • vmnic(4) - dedicated guest NIC, looks like a virtio network device to guests • vpcp(4) - plugs vmnic(4) ports into vpcsw(4) switches • vpci(4) - Non-bhyve(4) interface, usable in jails(2) • ethlink(4) - Performs unencapsulated packet forwarding, wraps a cloned or physical ethernet interface • vpclink(4) - Performs VXLAN encapsulation
  26. 26. Interface Design
  27. 27. ioctl(2) • Initially used ioctl(2) • Unable to completely secure via capsicum(4) - able to specify flags, but not scope the target device • ioctl(2) is the kernel
 equivalent of an HTTP PUT,
 the interface for everything
 without a design
 
 "Hi ifconfig(8), I'm
 looking at you."
  28. 28. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives?
 

  29. 29. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with ...

  30. 30. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.

  31. 31. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.
 Kernel API design hate crime.
  32. 32. New System Calls • vpc_open(2) - Creates a new VPC descriptor • vpc_ctl(2) - Manipulates VPC descriptors • Capsicum-like, intended for privilege separation • Intended for idempotent tooling • Makes aggressive use of UUIDs as operator handles to be compatible with Triton
  33. 33. vpc_open(2) int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); • Creates a new "VPC descriptor" • Similar to open(2) • Manipulate descriptor via vcp_ctl(2) • Responds to close(2) • Priv-sep native security semantics • Version aware • System Call 580
  34. 34. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); 16 bytes, UUID-like:
 
 type ID struct {
 TimeLow uint32
 TimeMid uint16
 TimeHi uint16
 ClockSeqHi uint8
 ObjType ObjType
 Node [6]byte // Default MAC address vpc(4)
 }
  35. 35. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • All VPC objects have a MAC address • Reused the node component of the vpc_id to set the MAC address
  36. 36. vpc_handle_type_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); typedef struct {
 uint64_t vht_version:4;
 uint64_t vht_pad1:4;
 uint64_t vht_obj_type:8;
 uint64_t vht_pad2:48;
 } vpc_handle_type_t;
  37. 37. vpc_obj_type int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); enum vpc_obj_type {
 VPC_OBJ_INVALID = 0,
 VPC_OBJ_SWITCH = 1,
 VPC_OBJ_PORT = 2,
 VPC_OBJ_ROUTER = 3,
 VPC_OBJ_NAT = 4,
 VPC_OBJ_VPCLINK = 5,
 VPC_OBJ_VMNIC = 6,
 VPC_OBJ_MGMT = 7,
 VPC_OBJ_ETHLINK = 8,
 VPC_OBJ_META = 9,
 VPC_OBJ_TYPE_ANY = 10,
 VPC_OBJ_TYPE_MAX = 10,
 };
  38. 38. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • 16 Versions • 255 Object Types • 4080 Object Types ought to be enough for anybody.
  39. 39. vpc_flags_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); #define VPC_F_CREATE (1ULL << 0)
 #define VPC_F_OPEN (1ULL << 1)
 #define VPC_F_READ (1ULL << 2)
 #define VPC_F_WRITE (1ULL << 3) NOTE: going to revisit flags to enable a bit field for capabilities per VPC Object Type
  40. 40. vpc_ctl(2) int
 vpc_ctl(int vpcd, vpc_op_t op,
 size_t keylen, const void *key,
 size_t *vallen, void *buf); • Only interface for manipulating a VPC object • Available operations different per VPC Object • Inspired by ioctl(2), capsicum(4), and HTTP • System Call 581
  41. 41. vpc_ctl(2) enum vpc_vpcsw_op_type { VPC_VPCSW_INVALID = 0, VPC_VPCSW_PORT_ADD = 1, VPC_VPCSW_PORT_DEL = 2, VPC_VPCSW_PORT_UPLINK_SET = 3, VPC_VPCSW_PORT_UPLINK_GET = 4, VPC_VPCSW_STATE_GET = 5, VPC_VPCSW_STATE_SET = 6, VPC_VPCSW_RESET = 7, VPC_VPCSW_RESPONSE_NDV4 = 8, VPC_VPCSW_RESPONSE_NDV6 = 9, VPC_VPCSW_RESPONSE_DHCPV4 = 10, VPC_VPCSW_RESPONSE_DHCPV6 = 11, VPC_VPCSW_OP_TYPE_MAX = 11, };
  42. 42. vpc_ctl(2) enum vpc_vpcp_op_type { VPC_VPCP_INVALID = 0, VPC_VPCP_CONNECT = 1, VPC_VPCP_DISCONNECT = 2, VPC_VPCP_VNI_GET = 3, VPC_VPCP_VNI_SET = 4, VPC_VPCP_VTAG_GET = 5, VPC_VPCP_VTAG_SET = 6, VPC_VPCP_UNUSED7 = 7, VPC_VPCP_UNUSED8 = 8, VPC_VPCP_PEER_ID_GET = 9, VPC_VPCP_MAX = 9, };
  43. 43. vpc_ctl(2) enum vpc_vmnic_op_type { VPC_VMNIC_INVALID = 0, VPC_VMNIC_NQUEUES_GET = 1, VPC_VMNIC_NQUEUES_SET = 2, VPC_VMNIC_UNUSED3 = 3, VPC_VMNIC_UNUSED4 = 4, VPC_VMNIC_UNUSED5 = 5, VPC_VMNIC_UNUSED6 = 6, VPC_VMNIC_ATTACH = 7, VPC_VMNIC_MSIX = 8, VPC_VMNIC_FREEZE = 9, VPC_VMNIC_UNFREEZE = 10, VPC_VMNIC_OP_TYPE_MAX = 10, };
  44. 44. What's it all mean? • Network interfaces are first-class objects • Network interfaces are pluggable • Configurations can be arbitrarily complex • All vpc(4) interfaces are iflib(9) and can be tcpdump(8)'ed • Performance is nice
  45. 45. Tooling
  46. 46. Tooling • Able to be cross-compiled • Developer-friendly • Stable ABI • Idempotent Operations • No a priori knowledge of the target OS required (i.e. no headers required to cross-compile tooling)
  47. 47. Tooling • Able to be cross-compiled • Developer-friendly • Stable ABI • Idempotent Operations • No a priori knowledge of the target OS required (i.e. no headers required to cross-compile tooling)
  48. 48. ELI5: vpc(4) edition
  49. 49. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VNI 987 VNI 123 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2
  50. 50. ELI5: vpc(4) Assumptions • Guest is running Ubuntu or CentOS Linux • Multi-Queue TX/RX in Host • Multiple network queues available in the guest • Underlay hosts can pass traffic • Overlay hosts are in the same subnet
  51. 51. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1 vpcsw0 vpcp0 vpcp1 vpcp3 vpcp4 vpcp5
  52. 52. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic2 vpcsw1 vpcp3 vpcp4 vpcp5
  53. 53. ELI5: vpcsw(4) Setup VNI=123
 VLAN=456
 VPCSW0_ID=da64c3f3-095d-91e5-df01-5aabcfc52468 vpc switch create 
 --switch-id=${VPCSW0_ID} 
 --vni=${VNI} 
 --vlan=${VLAN}
  54. 54. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vpcsw1
  55. 55. ELI5: vmnic(4) Setup VMNIC0_ID=07f95a11-6788-2ae7-c306-ba95cff1db38
 VPCP0_ID=fd436f9c-1f77-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create 
 --vmnic-id=${VMNIC0_ID}
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --port-id=${VPCP0_ID}
 
 vpc switch port connect 
 --port-id=${VPCP0_ID} 
 --interface-id=${VMNIC0_ID}
  56. 56. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 vpcsw1 vpcp3
  57. 57. ELI5: vmnic(4) Setup VMNIC1_ID=a774ba3a-1f77-11e8-8006-0cc47a6c7d1e
 VPCP1_ID=0ebf50e1-1f79-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create 
 --vmnic-id=${VMNIC1_ID}
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --port-id=${VPCP1_ID}
 
 vpc switch port connect 
 --port-id=${VPCP1_ID} 
 --interface-id=${VMNIC1_ID}
  58. 58. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 vmnic2 vpcsw1 vpcp3 vpcp4
  59. 59. ELI5: ethlink(4) Setup ETHLINK0_ID=5c4acd32-1b8d-11e8-b408-0cc47a6c7d1e
 UPLINK_PORT_ID=ea58b648-203b-a707-cdf6-7a552c8d5295
 UPLINK_IF=em0
 
 vpc switch port add 
 --switch-id=${VPCSW0_ID} 
 --uplink 
 --port-id=${UPLINK_PORT_ID} 
 --l2-name=${UPLINK_IF} 
 --ethlink-id=${ETHLINK0_ID}
  60. 60. ELI5: vpc(4) edition em0 Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic2 vpcsw1 vpcp3 vpcp4 vpcp5
  61. 61. ELI5: vpc(4) + VXLAN
  62. 62. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VLAN 456 VNI 987 VNI 123 VTAG 456 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 10.65.5.161 10.65.5.162
  63. 63. ELI5: vpclink(4) Setup • How do you do ARP or IPv6 Neighbor Discovery? • Broadcast? • Multicast?
  64. 64. ELI5: vpc(4) edition em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VLAN 456 VNI 987 VNI 123 VTAG 456 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 10.65.5.161 10.65.5.162 ???
  65. 65. VTEP: VXLAN Tunnel End Point • vpcsw(4) traps broadcast packets • If the packet is: • a broadcast packet and • either an IPv4 ARP or IPv6 ND packet
 Packet is added to a knote (part of kqueue(2)) and passed to all kqueue(2) subscribers filtering for EVFILT_VPC filters • Packet payload is stored in user-allocated buffer pointed to by ext[0] • Userland utility parses packet and performs lookup • Overlay and underlay forwarding information written back to the kernel via vpc_ctl(2) • vpcsw(4) caches forwarding information for the src/dst MAC tuple
  66. 66. Ongoing Work • Firewalling - integrated at vpcp(4) • Routing • NAT • Userland Control Plane (including setup and teardown of bhyve(4) guests via something not a shell script)
  67. 67. vpc(8) % doas vpc vm create % doas vpc vm run • bhyve(4) VM creation and launch tool • Unifies network isolation with compute isolation
  68. 68. Desktop Software + vpc(4) • vpc(4) + NAT == Desktop Software Hotness • Ability to sandbox VMs with no prior knowledge of the host's networking • Uses:
 % vagrant up % packer build No more dependency on pf(4) or dnsmasq(8)
  69. 69. Code • Kernel:
 https://github.com/joyent/freebsd/tree/projects/VPC • Kernel Libraries:
 https://github.com/joyent/freebsd/tree/projects/VPC/ libexec/go/src/go.freebsd.org/sys/vpc • Userland tooling:
 https://github.com/sean-/vpc
 
 Future home:
 
 https://github.com/joyent-/vpc
  70. 70. Questions?

×