L2 over L3 Encapsulations (English)

© 2014 VMware Inc. All rights reserved.
L2 over L3 Encapsulations
VXLAN, NVGRE, STT, Geneve, etc.
Motonori Shindo
Network & Security Business Unit
VMware
July. 13, 2014

Tunneling vs Encapsulation
• Tunneling Protocols
– Signaling + Encapsulation
• Usually equipped with some sort of “signaling” mechanism that manages the tunnel.
• Encapsulation is the other part of a tunneling protocol.
– E.g., PPTP, L2TP, IPsec (IKE), etc.
• Encapsulations
– A way of wrapping (i.e. encapsulating) something
– E.g., GRE, VXLAN, NVGRE, STT (and also Ethernet, IP, TCP, ...)
• What I’m going to talk about today is “encapsulation”
• I am not going to talk about “control plane” today (though it’s very important)
CONFIDENTIAL 2

L2 over L3 encapsulations typically seen in Network Virtualization
• GRE (Generic Routing Encapsulation) *
• VXLAN (Virtual Extensible LAN)
• NVGRE (Network Virtualization using GRE)
• STT (Stateless Transport Tunneling)
* Strictly speaking, GRE is not an L2 over L3 encapsulation, as it can encapsulate L3 as well as L2

VXLAN
• Proposed by Cumulus / Arista / Broadcom / Cisco / VMware / Citrix / Red Hat
– draft-mahalingam-dutt-dcops-vxlan-09.txt
• Extends VLAN ID (12bit) to VNI (24bit)
• Encapsulation by UDP/IP
– L3 overlay
– Multipath
• Encapsulates Ethernet Frame only
• Simple enough to be implemented in hardware
• An “ecosystem” is forming

VXLAN Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|                   Reserved                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        VXLAN Network Identifier (VNI)         |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
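The 8-byte VXLAN header can be packed and parsed in a few lines. The following is a minimal Python sketch, not taken from any particular implementation; the function names are made up for illustration.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" bit in the flags byte: the VNI field is valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI, or raise if the I flag is not set (hypothetical helper)."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word2 >> 8
```

For example, `pack_vxlan_header(5000).hex()` gives `0800000000138800`: the flags byte `08`, three reserved bytes, the VNI `001388` and one reserved byte.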

Fabric Network
• Service Oriented Architecture
• From a 2- or 3-tier network to Leaf & Spine
• High density and bandwidth required
• Layer 3 ECMP
• No oversubscription
• Low and uniform delay characteristic
• Wire & configure once network
• Uniform network configuration
[Figure: network topology diagrams, labeled “WAN/Internet”]

Multipath Network
• Background
– To support the significant increase in East-West traffic, fabric networks based on multipath are becoming popular
• Requisites
– All packets of a given flow must traverse the same path
– There must be enough “entropy” to make efficient use of the fabric

Multipath by VXLAN
Encapsulated packet (header sizes in bytes):
IP (20) | UDP (8) | VXLAN (8) | Ether | IP | TCP | Data
(outer headers, followed by the original packet)
• dst port = 4789
• src port = Hash(src/dst MAC addr, src/dst IP addr, src/dst port number, etc.) * of the original packet
* Which fields to hash and which hash algorithm to use are not defined by the protocol. It is up to the implementation.
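Deriving the outer UDP source port from the inner flow can be sketched as follows. This is only an illustration: the choice of CRC32, the set of hashed fields and the 49152-65535 port range are assumptions made here, since the hash is left to the implementation. The function name is hypothetical.

```python
import zlib

VXLAN_DST_PORT = 4789  # destination port, as on the slide


def vxlan_src_port(src_mac, dst_mac, src_ip, dst_ip, src_port, dst_port,
                   port_min=49152, port_max=65535):
    """Map the inner flow's fields to a stable outer UDP source port.

    CRC32 and the port range are assumptions made for this sketch.
    """
    key = "|".join(map(str, (src_mac, dst_mac, src_ip, dst_ip,
                             src_port, dst_port))).encode()
    span = port_max - port_min + 1
    # Same inner flow -> same hash -> same source port -> same fabric path.
    return port_min + zlib.crc32(key) % span
```

Because the result depends only on the inner flow's fields, every packet of a flow gets the same source port and therefore follows the same path, while different flows spread across the fabric.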

VXLAN Ecosystem
• Switch / Router
– Arista, Brocade, Cisco, Cumulus, DELL, HP, Huawei, Juniper, Open vSwitch, Pica8
• Operating System
– Linux, VMware
• Appliances
– A10, Citrix, F5
• Testers
– IXIA, Spirent
• ASIC / NIC
– Broadcom, Intel (Fulcrum), Emulex, Mellanox
• Cloud Orchestrator
– CloudStack, OpenStack, vCAC
Note: this is not an exhaustive list
This is a list of vendors who participated in the VXLAN interoperability test at INTEROP Tokyo 2014, in which all tests succeeded.

NVGRE
• Proposed by Microsoft / Arista / Intel / Google / HP / Broadcom / Emulex
– draft-sridharan-virtualization-nvgre-04.txt
• 24bit Virtual Subnet ID (VSID) and 8bit FlowID
• Encapsulation is GRE as is:
– Put VSID + FlowID in Key Field
– L3 Overlay
– Multipath possible (in theory) but difficult
• Windows affinity

NVGRE Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| |1|0|    Reserved0    | Ver |     Protocol Type 0x6558      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Virtual Subnet ID (VSID)            |    FlowID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
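As a rough illustration of how VSID and FlowID share the GRE Key field, the header above could be built like this in Python. The helper name is hypothetical, and the sketch assumes the bit layout shown in the diagram (K bit set, version 0).

```python
import struct

GRE_KEY_PRESENT = 0x2000  # K bit set; C and S bits clear; Ver = 0
PROTO_TEB = 0x6558        # protocol type shown in the diagram


def pack_nvgre_header(vsid: int, flow_id: int) -> bytes:
    """Build the 8-byte NVGRE header (hypothetical helper)."""
    if not 0 <= vsid < 1 << 24:
        raise ValueError("VSID must fit in 24 bits")
    if not 0 <= flow_id < 1 << 8:
        raise ValueError("FlowID must fit in 8 bits")
    # The 32-bit GRE Key field carries VSID (upper 24 bits) + FlowID (lower 8).
    key = (vsid << 8) | flow_id
    return struct.pack("!HHI", GRE_KEY_PRESENT, PROTO_TEB, key)
```

`pack_nvgre_header(0x123456, 0xAB).hex()` gives `2000655812345 6ab` without the space, i.e. `20006558123456ab`. Note that the only per-flow entropy is the single FlowID byte inside the GRE header, which is the field a switch would have to inspect for multipathing.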

Multipath in NVGRE
Encapsulated packet (header sizes in bytes):
IP (20) | GRE (8) | Ether | IP | TCP | Data
(outer headers, followed by the original packet)
• FlowID = Hash(src/dst IP addr, src/dst port number, etc.) of the original packet
• Routers / switches need to look up the Key field in the GRE header to do ideal multipathing!

NVGRE ecosystem
• Switch / Router
– Huawei
– Arista and Brocade say they are going to support it, but products do not seem to have shipped yet (?)
• Operating System
– Microsoft (Windows Server 2012 R2)
• Appliances
– F5
• ASIC / NIC
– Emulex, Mellanox
• Cloud Orchestrator
– System Center 2012 R2
Note: this is not an exhaustive list

STT (Stateless Transport Tunneling)
• L2 over L3 encapsulation proposed by VMware
– draft-davie-stt-06.txt
• Why yet another L2 over L3 encapsulation?
– Performance
– Richer context information
– Multipath
– Software oriented

TSO (TCP Segmentation Offload)
• Modern NICs (shipped within the last 4-5 years) provide various hardware acceleration features:
– RSS, GSO/TSO, Checksum Offload, etc.
• With TSO, the NIC performs TCP segmentation on behalf of the operating system (which would otherwise do it in software)
– The operating system can now hand packets of up to 64 KB to the NIC. This significantly reduces the number of packets to process (i.e. interrupts), and hence the number of context switches.
• To take advantage of TSO in the NIC, STT encapsulates packets so that they look like “TCP”!

Encapsulation / Segmentation in STT
Before segmentation (header sizes in bytes):
IP (20) | TCP’ (20) | STT (18) | L2 Frame (up to 64K)
After segmentation by hardware:
IP (20) | TCP’ (20) | STT (18) | Payload 1
IP (20) | TCP’ (20) | Payload 2
...
IP (20) | TCP’ (20) | Payload n
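The effect of the segmentation can be mimicked in software with a short Python sketch. It only models the payload split, not the IP and TCP-like headers the NIC generates for each segment; the function names and the default MSS of 1460 bytes are assumptions made for the example.

```python
from typing import List

IP_HDR, TCP_HDR, STT_HDR = 20, 20, 18  # header sizes in bytes, as on the slide


def segment_stt_frame(stt_header: bytes, l2_frame: bytes,
                      mss: int = 1460) -> List[bytes]:
    """Split STT header + L2 frame into MSS-sized TCP payloads.

    Only the first segment carries the STT header, because the NIC treats
    it as ordinary leading bytes of one large TCP payload.
    """
    if len(stt_header) != STT_HDR:
        raise ValueError("STT header must be 18 bytes")
    if mss <= 0:
        raise ValueError("MSS must be positive")
    stream = stt_header + l2_frame
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


def wire_bytes(segments: List[bytes]) -> int:
    """Total bytes on the wire, counting IP + TCP' headers per segment."""
    return sum(IP_HDR + TCP_HDR + len(s) for s in segments)
```

With a 3000-byte L2 frame and the assumed MSS, `segment_stt_frame(bytes(18), bytes(3000))` yields three segments of 1460, 1460 and 98 bytes, and `wire_bytes` reports 3138 bytes including the three 40-byte IP + TCP’ headers.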

TCP-like Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Sequence Number(*)                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Acknowledgment Number(*)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Data |           |U|A|P|R|S|F|                               |
| Offset| Reserved  |R|C|S|S|Y|I|            Window             |
|       |           |G|K|H|T|N|N|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             data                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fields marked with * are repurposed in STT

Throughput and CPU Utilization
[Figure: chart comparing Linux Bridge, OVS Bridge, OVS-GRE and OVS-STT. Series: Throughput (Gbps, axis 0-10), CPU (Receive) and CPU (Send) (%, axis 0-100)]
Source: http://networkheresy.com/2012/06/08/the-overhead-of-software-tunneling/

Multipath in STT
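Encapsulated packet (header sizes in bytes):
IP (20) | TCP’ (20) | STT (18) | Ether | IP | TCP | Data
(outer headers, followed by the original packet)
• dst port = 7471 (TBD)
• src port = Hash(src/dst IP addr, src/dst port number, etc.) of the original packet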

Geneve (Generic Network Virtualization Encapsulation)
• New encapsulation being proposed by VMware, Microsoft, Red Hat, Intel
– draft-gross-geneve-00.txt
• Goals
– Extensibility
• Service Chaining, Metadata support, etc.
– Leverage NIC offload
– Both of the above at the same time! (each one is straightforward on its own, but achieving both together is difficult)
• Highlights
– Information can be added as Option fields in TLV format
– Format carefully designed so that the NIC can perform TSO
– OAM and Critical flags (the latter indicating that parsing the option fields is mandatory)

Geneve Header & Option Header
Geneve Header
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Ver|  Opt Len  |O|C|   Rsvd.   |         Protocol Type         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Virtual Network Identifier (VNI)        |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Variable Length Options                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Option
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Option Class          |     Type      |R|R|R| Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Variable Option Data                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
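To make the TLV layout concrete, here is a minimal Python sketch that packs a Geneve header with options according to the two diagrams above. It assumes that both length fields count 4-byte words, as in the Geneve draft; the helper names and the option class used in the example are hypothetical.

```python
import struct

PROTO_TEB = 0x6558  # Ethernet payload (Transparent Ethernet Bridging)


def pack_geneve_option(opt_class: int, opt_type: int, data: bytes) -> bytes:
    """Build one TLV option; Length counts 4-byte words of option data."""
    if len(data) % 4:
        raise ValueError("option data must be a multiple of 4 bytes")
    length = len(data) // 4
    if length >= 1 << 5:
        raise ValueError("option data too long for the 5-bit Length field")
    # The three R bits above Length stay zero.
    return struct.pack("!HBB", opt_class, opt_type, length) + data


def pack_geneve_header(vni: int, options: bytes = b"", oam: bool = False,
                       critical: bool = False,
                       protocol: int = PROTO_TEB) -> bytes:
    """Build the fixed 8-byte header followed by the packed options."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if len(options) % 4 or len(options) // 4 >= 1 << 6:
        raise ValueError("options must be 4-byte aligned and fit Opt Len")
    byte0 = len(options) // 4                     # Ver = 0 | Opt Len (words)
    byte1 = (int(oam) << 7) | (int(critical) << 6)  # O and C flags
    return struct.pack("!BBHI", byte0, byte1, protocol, vni << 8) + options
```

For instance, with a made-up option `pack_geneve_option(0x0102, 0x80, b"\x00\x00\x00\x01")`, calling `pack_geneve_header(10, option, critical=True)` produces the bytes `0240655800000a00` followed by `0102800100000001`. Because Opt Len is in the first byte, a parser (or a NIC) can find the start of the inner frame without understanding any of the options.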

Geneve Implementation
• Recently implemented in Open vSwitch (OVS) and merged into the master branch on GitHub
– VNI can be specified
– Geneve Options can’t be specified (at this point)
– Setting the OAM flag does not seem possible (I tried, but it didn’t work)
– The Critical flag appears to be supported as long as critical options are present
• A Geneve dissector for Wireshark has also been implemented and merged into the master branch on GitHub
• Geneve-aware NICs are not available yet

Running Geneve on Open vSwitch
# host-1: br0 carries the uplink (eth0), br1 carries the Geneve tunnel
host-1:~$ sudo ovs-vsctl add-br br0
# br1 must exist before it is configured below
host-1:~$ sudo ovs-vsctl add-br br1
host-1:~$ sudo ovs-vsctl add-port br0 eth0
host-1:~$ sudo ifconfig eth0 0
host-1:~$ sudo dhclient br0
host-1:~$ sudo ifconfig br1 10.0.0.1 netmask 255.255.255.0
host-1:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.149
# host-2: same setup, with remote_ip pointing back at host-1
host-2:~$ sudo ovs-vsctl add-br br0
host-2:~$ sudo ovs-vsctl add-br br1
host-2:~$ sudo ovs-vsctl add-port br0 eth0
host-2:~$ sudo ifconfig eth0 0
host-2:~$ sudo dhclient br0
host-2:~$ sudo ifconfig br1 10.0.0.2 netmask 255.255.255.0
host-2:~$ sudo ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.203.151

Dissecting Geneve Packets by Wireshark (1)

Dissecting Geneve Packets by Wireshark (2)

Information about Geneve
• English
– http://tools.ietf.org/html/draft-gross-geneve-00
– http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/
– http://www.enterprisenetworkingplanet.com/netsp/geneve-generic-network-virtualization-encapsulation-protocol-advances-video.html
– http://searchsdn.techtarget.com/news/2240219051/VMware-Microsoft-end-encapsulation-protocol-turf-war-with-GENEVE
– http://www.plexxi.com/2014/06/attention-overlay-tunnel-construction-ahead
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/
• Japanese
– http://blog.shin.do/2014/05/geneve-encapsulation/
– http://blog.shin.do/2014/07/geneve-on-open-vswitch/

Will Geneve replace VXLAN / STT / NVGRE?
• Will Geneve replace VXLAN?
– NO
– The VXLAN ecosystem has already grown big enough that it is unlikely to be replaced by something else
– VMware will continue to support VXLAN and its ecosystem partners
• Will Geneve replace STT?
– In the short term, NO. In the long run, maybe, if
• Geneve is accepted by the market and Geneve-aware NICs become as widely available as STT-capable ones are today.
• Will Geneve replace NVGRE?
– In the short term, NO. In the long run, maybe, if
• Geneve gets implemented on Windows and its ecosystem grows to the level of NVGRE’s today.

Encapsulation is like a wire: use the right cable in the right place
http://cto.vmware.com/geneve-vxlan-network-virtualization-encapsulations/

The world is not that simple
• Some people are against Geneve
• Their claims are more or less as follows:
– What Geneve tries to accomplish can be achieved by an existing encapsulation (such as L2TP static tunneling or VXLAN), as is or with a small extension!?
– Service Chaining and Metadata should not be bound to a particular encapsulation. They should be independent of the encapsulation!?
– A 24-bit VNI is not long enough!?

L2TPv3 static tunneling
• L2TPv3 is a tunneling protocol, so it inherently has signaling. That said, it can also be used as a plain encapsulation method (i.e. a pseudowire) without signaling. This is called “L2TPv3 static tunneling”, where both ends are configured manually.
• L2TPv3 became an RFC in 2005 (RFC 3931) and has been on the market for many years. Cisco IOS and Linux (l2tpd) support L2TPv3 static tunneling.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T|x|x|x|x|x|x|x|x|x|x|x|  Ver  |           Reserved            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Session ID                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               Cookie (optional, maximum 64 bits)...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                                                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

L2TPv3 static tunneling as a L2 over L3 encapsulation
• The Session ID (32 bits) corresponds to the VNI
• L2TPv3 can be transported directly over IP or over UDP. For multipath, UDP is the better choice.
• There is no explicit field for context information (metadata, etc.). It has to be configured manually on both ends (if possible) and expressed implicitly as part of the Session ID
– Therefore the 32-bit Session ID can’t be used entirely for the VNI
• Strictly speaking, L2TPv3 has no way to tell (in the packet) where the encapsulated packet starts, which a NIC needs to know in order to do TSO. However, L2TPv2 had an “offset” option for this purpose. Many L2TPv3 implementations still have this “offset” option for backward compatibility with L2TPv2, so TSO is possible (if the NIC understands this legacy option). Cisco and Linux l2tpd support the offset field.

VXLAN Generic Protocol Extension (a.k.a. eVXLAN)
• Proposed by Cisco, Huawei, Intel, Microsoft
– draft-quinn-vxlan-gpe-03.txt
• An extension to VXLAN
– Support protocols other than Ethernet
• IPv4 (0x01), IPv6 (0x02), Ethernet (0x03), Network Service Header [NSH] (0x04)
– Note that “Next Protocol” is only 8 bits wide. The protocol type (usually 16 bits) has to be specifically encoded to fit into 8 bits.
– OAM support
– Version field
• Used by Cisco ACI
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|P|R|O|Ver|         Reserved          | Next Protocol |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        VXLAN Network Identifier (VNI)         |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
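The difference from plain VXLAN is small enough to show in code. The Python sketch below follows the bit positions in the diagram above and the Next Protocol values listed on this slide; the function name is hypothetical, and setting both the I and P flags on every packet is an assumption of the example.

```python
import struct

# Next Protocol values as listed on the slide.
NEXT_PROTO = {"ipv4": 0x01, "ipv6": 0x02, "ethernet": 0x03, "nsh": 0x04}

FLAG_I = 0x08  # VNI field is valid
FLAG_P = 0x04  # Next Protocol field is present
FLAG_O = 0x01  # OAM packet


def pack_vxlan_gpe_header(vni: int, next_proto: str,
                          oam: bool = False, version: int = 0) -> bytes:
    """Build the 8-byte VXLAN-gpe header (hypothetical helper)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    if not 0 <= version < 1 << 2:
        raise ValueError("version must fit in 2 bits")
    flags = FLAG_I | FLAG_P | (FLAG_O if oam else 0)
    # Word 1: flags (8) | Ver (2) | Reserved (14) | Next Protocol (8).
    word1 = (flags << 24) | (version << 22) | NEXT_PROTO[next_proto]
    return struct.pack("!II", word1, vni << 8)
```

`pack_vxlan_gpe_header(100, "nsh").hex()` gives `0c00000400006400`; compared with a plain VXLAN header, only the P flag and the last byte of the first word differ.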

VXLAN-gpe as L2 over L3 encapsulation
• Mostly identical to VXLAN
– VNI length (24bits)
– Multipath property
– Hardware friendliness
• The biggest motivation for VXLAN-gpe is probably to allow Service Chaining by NSH (Network Service Header)
• No further extensibility
