© 2014 VMware Inc. All rights reserved.
L2 over L3 Encapsulations
VXLAN, NVGRE, STT, Geneve, etc.
Motonori Shindo
Network & Security Business Unit
VMware
July 13, 2014
Tunneling vs Encapsulation
• Tunneling Protocols
– Signaling + Encapsulation
• Usually come with some sort of “signaling” mechanism, which manages the tunnel.
• Encapsulation is the other part of a tunneling protocol.
– E.g., PPTP, L2TP, IPsec (IKE), etc.
• Encapsulations
– A way of wrapping (i.e. encapsulating) something
– E.g., GRE, VXLAN, NVGRE, STT, (Ethernet, IP, TCP, …)
• What I’m going to talk about today is “encapsulation”
• I am not going to talk about “control plane” today (though it’s very important)
CONFIDENTIAL 2
L2 over L3 Encapsulations Typically Seen in Network Virtualization
• GRE (Generic Routing Encapsulation) *
• VXLAN (Virtual Extensible LAN)
• NVGRE (Network Virtualization using GRE)
• STT (Stateless Transport Tunneling)
* Strictly speaking, GRE is not an L2 over L3 encapsulation, as it can encapsulate not only L2 but also L3
VXLAN
• Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / RedHat
– draft-mahalingam-dutt-dcops-vxlan-09.txt
• Extends the VLAN ID (12 bits) to a VNI (24 bits)
• Encapsulation by UDP/IP
– L3 overlay
– Multipath
• Encapsulates Ethernet frames only
• Simple enough to be implemented in hardware
• An “ecosystem” is forming
VXLAN Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
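The 8-byte VXLAN header can be sketched in a few lines of Python. This is only an illustration of the bit layout (8 flag bits with the I bit set, 24 reserved bits, a 24-bit VNI, 8 reserved bits); the function names `pack_vxlan_header` and `unpack_vni` are made up for this example and do not come from any library.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" bit in the flags byte: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI carried in an 8-byte VXLAN header (hypothetical helper)."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex())        # 0800000000138800
print(unpack_vni(header))  # 5000
```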
Fabric Network
• Service Oriented Architecture
• From 2- or 3-tier networks to leaf & spine
• High density and bandwidth required
• Layer 3 ECMP
• No oversubscription
• Low and uniform delay characteristics
• Wire-and-configure-once network
• Uniform network configuration
[Figure: two network topology diagrams, each with a WAN/Internet uplink]
Multipath Network
• Background
– To support the significant increase in East-West traffic, fabric networks based on multipath are getting popular
• Requisites
– All packets of a given flow must traverse the same path
– Must have enough “entropy” to make efficient use of the fabric
Multipath by VXLAN
Encapsulated packet (header sizes in bytes): IP (20) | UDP (8) | VXLAN (8) | original packet (Ether | IP | TCP | Data)
– dst port = 4789
– src port = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
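As a rough sketch of the idea, the outer UDP source port can be derived from a hash of the inner headers so that every packet of one flow gets the same port while different flows spread across the fabric. The choice of CRC32 and of the 49152-65535 port range are assumptions made for this example (the protocol leaves both to the implementation), and `vxlan_src_port` is a hypothetical name.

```python
import zlib

VXLAN_DST_PORT = 4789              # fixed destination port from the slide
PORT_MIN, PORT_MAX = 49152, 65535  # assumed source-port range for this sketch


def vxlan_src_port(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port):
    """Derive the outer UDP source port from the inner flow (hypothetical helper)."""
    key = "|".join(str(f) for f in
                   (src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port))
    digest = zlib.crc32(key.encode("ascii"))  # any stable hash would do
    return PORT_MIN + digest % (PORT_MAX - PORT_MIN + 1)


flow_a = ("00:00:00:00:00:01", "00:00:00:00:00:02",
          "10.0.0.1", "10.0.0.2", 33000, 80)
flow_b = flow_a[:4] + (33001, 80)  # same hosts, different inner source port

print(vxlan_src_port(*flow_a), VXLAN_DST_PORT)
print(vxlan_src_port(*flow_b), VXLAN_DST_PORT)
```

Because the hash depends only on the inner headers, a router doing ordinary 5-tuple ECMP on the outer packet keeps each inner flow on one path without having to parse the VXLAN header.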
VXLAN Ecosystem
• Switch / Router
– Arista, Brocade, Cisco, Cumulus, DELL, HP,
Huawei, Juniper, Open vSwitch, Pica8
• Operating System
– Linux, VMware
• Appliances
– A10, Citrix, F5
• Testers
– IXIA, Spirent
• ASIC / NIC
– Broadcom, Intel (Fulcrum), Emulex, Mellanox
• Cloud Orchestrator
– CloudStack, OpenStack, vCAC
Note: this is not an exhaustive list.
This is a list of vendors who participated in the VXLAN interoperability test at INTEROP Tokyo 2014, which was entirely successful.
NVGRE
• Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex
– draft-sridharan-virtualization-nvgre-04.txt
• 24-bit Virtual Subnet ID (VSID) and 8-bit FlowID
• Encapsulation is GRE as is:
– Put VSID + FlowID in the Key field
– L3 overlay
– Multipath possible (in theory) but difficult
• Affinity with Windows
NVGRE Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|   Reserved0     | Ver |   Protocol Type 0x6558        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Virtual Subnet ID (VSID)        |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
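A small Python sketch of how the VSID and FlowID share the 32-bit GRE Key field. It assumes the layout in the diagram (K bit set, C and S bits clear, version 0, protocol type 0x6558); `pack_nvgre_header` is an illustrative name, not an existing API.

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit: third bit of the first 16-bit word
PROTO_TEB = 0x6558        # Protocol Type shown in the diagram


def pack_nvgre_header(vsid: int, flow_id: int) -> bytes:
    """Build the 8-byte GRE header used by NVGRE (hypothetical helper)."""
    if not 0 <= vsid < (1 << 24):
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < (1 << 8):
        raise ValueError("FlowID must fit in 8 bits")
    key = (vsid << 8) | flow_id  # upper 24 bits: VSID, lower 8 bits: FlowID
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TEB, key)


print(pack_nvgre_header(0x123456, 0xAB).hex())  # 20006558123456ab
```

The FlowID is the only per-flow entropy in the outer packet, which is why a router has to look inside the GRE Key field to spread NVGRE flows well.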
Multipath in NVGRE
Encapsulated packet (header sizes in bytes): IP (20) | GRE (8) | original packet (Ether | IP | TCP | Data)
– FlowID = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
– Routers / switches need to look up the Key field in the GRE header to do ideal multipath!
* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
NVGRE ecosystem
• Switch / Router
– Huawei
– Arista and Brocade say they are going to support it, but products do not seem to have come out yet
• Operating System
– Microsoft (Windows Server 2012 R2)
• Appliances
– F5
• ASIC / NIC
– Emulex, Mellanox
• Cloud Orchestrator
– System Center 2012 R2
Note: this is not an exhaustive list
STT (Stateless Transport Tunneling)
• L2 over L3 encapsulation proposed by VMware
– draft-davie-stt-06.txt
• Why yet another L2 over L3 encapsulation?
– Performance
– Richer context information
– Multipath
– Software oriented
TSO (TCP Segmentation Offload)
• Modern NICs (shipped within the last 4-5 years) provide various hardware acceleration features:
– RSS, GSO/TSO, Checksum Offload, etc.
• With TSO, the NIC performs TCP segmentation on behalf of the operating system (which would otherwise do it in software)
– The operating system can now send packets of up to 64 KB. This significantly reduces the number of packets to process (i.e. interrupts), and hence far fewer context switches are needed.
• To take advantage of TSO in the NIC, STT encapsulates packets so that they look like “TCP”!
Encapsulation / Segmentation in STT
Before segmentation (header sizes in bytes):
IP (20) | TCP’ (20) | STT (18) | L2 Frame (up to 64K)
After segmentation by hardware:
IP (20) | TCP’ (20) | STT (18) | Payload 1
IP (20) | TCP’ (20) | Payload 2
...
IP (20) | TCP’ (20) | Payload n
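To make the arithmetic concrete, here is a small Python sketch that estimates how many segments the NIC produces for one large frame. It assumes a 1500-byte IP MTU, no IP or TCP options, and that the STT header is carried once, in the first segment, as in the figure; `stt_segments` is a made-up name for this illustration.

```python
import math

IP_HDR, TCP_HDR, STT_HDR = 20, 20, 18  # header sizes in bytes, from the slide


def stt_segments(frame_len: int, mtu: int = 1500):
    """Return (segment count, total bytes on the wire) for one L2 frame.

    Hypothetical helper: assumes the STT header appears only in the
    first segment and every segment carries its own IP + TCP' header.
    """
    mss = mtu - IP_HDR - TCP_HDR          # payload bytes per segment
    payload = STT_HDR + frame_len         # what the NIC has to segment
    count = math.ceil(payload / mss)
    wire_bytes = payload + count * (IP_HDR + TCP_HDR)
    return count, wire_bytes


print(stt_segments(64000))  # (44, 65778)
print(stt_segments(1442))   # (1, 1500)
```

With these assumptions, the operating system hands one 64,000-byte frame to the NIC instead of processing 44 separate packets itself.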
TCP-like Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number(*)                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Acknowledgment Number(*)                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |         Urgent Pointer        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields marked with * are repurposed in STT
STT Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version      | Flags         |  L4 Offset    |  Reserved     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Max. Segment Size          | PCP |V|     VLAN ID           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                     Context ID (64 bits)                      +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Padding                   |    data                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               |
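The field widths in the diagram add up to the 18 bytes quoted for the STT header elsewhere in the deck (1+1+1+1+2+2+8+2). The Python sketch below packs those fields in order; the function name and its defaults are assumptions for illustration, not part of any STT implementation.

```python
import struct


def pack_stt_header(context_id, mss, l4_offset=0, flags=0,
                    vlan_id=None, pcp=0, version=0):
    """Build the 18-byte STT frame header (hypothetical helper)."""
    if vlan_id is None:
        tci = 0  # V bit clear: no VLAN tag information carried
    else:
        # PCP (3 bits) | V (1 bit) | VLAN ID (12 bits)
        tci = ((pcp & 0x7) << 13) | (1 << 12) | (vlan_id & 0x0FFF)
    # Version, Flags, L4 Offset, Reserved, MSS, PCP/V/VLAN, Context ID, Padding
    return struct.pack("!BBBBHHQH",
                       version, flags, l4_offset, 0, mss, tci, context_id, 0)


hdr = pack_stt_header(context_id=0x1122334455667788, mss=1460, vlan_id=100, pcp=5)
print(len(hdr))   # 18
print(hdr.hex())
```

The 64-bit Context ID is where STT has room for richer context information than a 24-bit network identifier.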
Throughput and CPU Utilization
[Figure: bar chart comparing Linux Bridge, OVS Bridge, OVS-GRE, and OVS-STT. Series: Throughput (Gbps), CPU (Receive) (%), CPU (Send) (%).]
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/
Multipath in STT
Encapsulated packet (header sizes in bytes): IP (20) | TCP’ (20) | STT (18) | original packet (Ether | IP | TCP | Data)
– dst port = 7471 (TBD)
– src port = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) *
* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
Geneve (Generic Network Virtualization Encapsulation)
• New encapsulation being proposed by VMware, Microsoft, RedHat, Intel
– draft-gross-geneve-00.txt
• Goals
– Extensibility
• Service Chaining, Metadata support, etc.
– Leverage NIC offload
– Both of the above at the same time! (each is straightforward on its own, but achieving both together is difficult)
• Highlights
– Information can be added as Option fields in TLV format
– Format carefully designed so that NICs can perform TSO
– OAM and Critical flags (the latter indicates that parsing the option fields is mandatory)
Geneve Header & Option Header
Geneve Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|    Rsvd.  |          Protocol Type        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Virtual Network Identifier (VNI)       |    Reserved   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Option
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Option Class         |      Type     |R|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Variable Option Data                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
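The TLV structure can be illustrated with a short Python sketch that packs a base header followed by options. It rests on a few assumptions drawn from my reading of the draft rather than from the slide: both length fields count 4-byte words, the high bit of the option Type marks a critical option, and the C flag is set whenever such an option is present. The helper names and the option class value in the example are made up.

```python
import struct

PROTO_TEB = 0x6558  # Ethernet payload (Transparent Ethernet Bridging)


def pack_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """Pack one option TLV (hypothetical helper); data must be 4-byte aligned."""
    if len(data) % 4:
        raise ValueError("option data must be a multiple of 4 bytes")
    length = len(data) // 4          # assumed: length counts 4-byte words
    if length >= (1 << 5):
        raise ValueError("option data too long for the 5-bit Length field")
    return struct.pack("!HBB", opt_class, opt_type, length) + data


def pack_geneve(vni: int, options=(), oam=False, protocol=PROTO_TEB) -> bytes:
    """Pack a Geneve header plus options (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    opts = list(options)
    body = b"".join(pack_option(c, t, d) for c, t, d in opts)
    opt_len = len(body) // 4         # assumed: Opt Len counts 4-byte words
    if opt_len >= (1 << 6):
        raise ValueError("options too long for the 6-bit Opt Len field")
    critical = any(t & 0x80 for _, t, _ in opts)  # assumed critical-bit rule
    byte0 = (0 << 6) | opt_len                    # Ver = 0
    byte1 = (int(oam) << 7) | (int(critical) << 6)
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8) + body


pkt = pack_geneve(10, [(0x0102, 0x80, b"\x00\x00\x00\x01")])
print(pkt.hex())  # 0240655800000a000102800100000001
```

Because Opt Len sits in the fixed part of the header, a NIC can skip over options it does not understand and still find the inner frame, which is what makes offload and extensibility compatible.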
Geneve Implementation
• Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub
– VNI can be specified
– Geneve options can’t be specified (at this point)
– The OAM flag apparently can’t be set (I tried, but it didn’t work)
– The Critical flag appears to be supported as long as critical options are present
• A Geneve dissector for Wireshark has also been implemented and merged into the master branch on GitHub
• Geneve-aware NICs are not available yet
Running Geneve on Open vSwitch
# host-1: put eth0 under br0, move the address to br0, then add a Geneve port to br1
host-1:~$ sudo ovs-vsctl add-br br0
host-1:~$ sudo ovs-vsctl add-br br1
host-1:~$ sudo ovs-vsctl add-port br0 eth0
host-1:~$ sudo ifconfig eth0 0
host-1:~$ sudo dhclient br0
host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.149
# host-2: same steps, with its own br1 address and remote_ip
host-2:~$ sudo ovs-vsctl add-br br0
host-2:~$ sudo ovs-vsctl add-br br1
host-2:~$ sudo ovs-vsctl add-port br0 eth0
host-2:~$ sudo ifconfig eth0 0
host-2:~$ sudo dhclient br0
host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.151
Dissecting Geneve Packets by Wireshark (1)
Dissecting Geneve Packets by Wireshark (2)
Information about Geneve
• English
– http://tools.ietf.org/html/draft-gross-geneve-00
– http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
– http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html
– http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE
– http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
• Japanese
– http://blog.shin.do/2014/05/geneve-encapsulation/
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
Will Geneve replace VXLAN / STT / NVGRE?
• Will Geneve replace VXLAN?
– NO
– The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else
– VMware will continue to support VXLAN and its ecosystem partners
• Will Geneve replace STT?
– In the short term, NO. In the long run, maybe, if
• Geneve is accepted by the market and Geneve-aware NICs become widely available, at the same level as STT today.
• Will Geneve replace NVGRE?
– In the short term, NO. In the long run, maybe, if
• Geneve gets implemented on Windows and an ecosystem forms at the same level as NVGRE has today.
Encapsulation is like a wire: the right cable in the right place
http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
The world is not that simple
• Some people are against Geneve
• Their claims are more or less as follows:
– What Geneve tries to accomplish can be achieved by an existing encapsulation (such as L2TP static tunneling or VXLAN), as is or with a small extension!?
– Service Chaining and metadata should not be bound to a particular encapsulation. They should be independent of the encapsulation!?
– A 24-bit VNI is not long enough!?
L2TPv3 static tunneling
• As a tunneling protocol, L2TPv3 inherently has signaling. That said, it can be used as a plain encapsulation method (i.e. a pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where both ends are configured manually.
• L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |          Reserved             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Session ID                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Cookie (optional, maximum 64 bits)...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
L2TPv3 static tunneling as an L2 over L3 encapsulation
• Session ID (32 bits) corresponds to VNI
• L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP would be better.
• No explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID
– Therefore the 32-bit Session ID can’t be used entirely for the VNI
• Strictly speaking, L2TPv3 has no way to indicate (in the packet) where the payload that follows starts so that the NIC can do TSO. However, L2TPv2 had an “offset” option for this purpose. Many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2. So TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.
VXLAN Generic Protocol Extension (a.k.a. eVXLAN)
• Proposed by Cisco, Huawei, Intel, Microsoft
– draft-quinn-vxlan-gpe-03.txt
• An extension to VXLAN
– Supports protocols other than Ethernet
• IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04)
– Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specially encoded to fit into 8 bits.
– OAM support
– Version field
• Used by Cisco ACI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|P|R|O|Ver|   Reserved                |Next Protocol  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
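The difference from plain VXLAN shows up in the first 32-bit word. The Python sketch below follows the bit positions in the diagram (I, P, and O flags, a 2-bit version, and an 8-bit Next Protocol) and the protocol codes listed on the slide; setting the P bit whenever Next Protocol is filled in is my assumption, and the helper name is illustrative only.

```python
import struct

FLAG_I, FLAG_P, FLAG_O = 0x08, 0x04, 0x01  # bit positions from the diagram
NEXT_PROTOCOL = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}


def pack_vxlan_gpe(vni: int, payload: str = "ethernet",
                   oam: bool = False, version: int = 0) -> bytes:
    """Build the 8-byte VXLAN-gpe header (hypothetical helper)."""
    if not 0 <= vni < (1 << 24):
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= version < 4:
        raise ValueError("version must fit in 2 bits")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    # flags (8 bits) | Ver (2 bits) | Reserved (14 bits) | Next Protocol (8 bits)
    word1 = (flags << 24) | (version << 22) | NEXT_PROTOCOL[payload]
    return struct.pack("!II", word1, vni << 8)


print(pack_vxlan_gpe(5000, "nsh").hex())  # 0c00000400138800
```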
VXLAN-gpe as an L2 over L3 encapsulation
• Mostly identical to VXLAN
– VNI length (24bits)
– Multipath property
– Hardware friendliness
• The biggest motivation for VXLAN-gpe is probably to allow Service Chaining with NSH (Network Service Header)
• No further extensibility
Thank You!
