The presentation "VXLAN Integration with CloudStack Advanced Zone" was given at CCCEU13 in Amsterdam on November 21, 2013. It discussed integrating VXLAN into CloudStack to overcome the VLAN ID limitation and allow more scalable network isolation. VXLAN was shown working with CloudStack to provide isolated guest networks and VPC tiers, with routing between tiers configurable inside a VPC. Basic functions such as VM connectivity, VR and VM restart, and live migration were tested under VXLAN and worked as expected. Feedback on the VXLAN integration was welcomed.
VXLAN Integration with CloudStack Advanced Zone
1. VXLAN Integration with CloudStack Advanced Zone
CCCEU13 - Amsterdam
Nov. 21, 2013
Shinya Adachi s.adachi@ntt.com
Yoshikazu Nojima y.nojima@verio.net
2. Why are we here?
• Open source community growth is important
  - Because we want to be free from specific vendor products.
• Contribute technology for cloud scalability to accelerate customer migration from on-premises to the cloud
  - By suggesting one possible solution for massive scalability.
3. Cloudn
• CloudStack-based public cloud service (Compute)
• Currently available in Japan and the US
• Two customer interfaces: customer portal GUI and APIs (over 150 APIs, including AWS-compatible ones)
• VPC type “Coming Soon” in Japan
4. Problem: VLAN ID limitation
• Advanced Zone
  o More functionality
    • NAT, FW, LB, VPN
    • VPC
  o Isolation required
    • For each guest network
    • For each VPC tier
• Isolation Method: VLAN
  o VLAN IDs are limited
    • Only 4096
    • Should be identical within a zone
  o # of Domains is limited by VLAN
    • Each domain requires at least one VLAN ID
[Figure: Advanced Zone with a Public Network; one Virtual Router serves an isolated Guest Network and another Virtual Router serves a VPC with two VPC Tiers; VMs are attached to each network.]
5. VXLAN Overview
VXLAN [Virtual eXtensible Local Area Network]
Objective: Overcome VLAN scalability limitation
NW Type: Overlay network
Envelope type: UDP packet (L4 packet)
Standardization Status: Under IETF standardization process
Implementation:
  Software-based: Cisco Nexus Series Switches, VMware vSphere Distributed Switch, Open vSwitch, and Linux kernel
  Hardware-based: Arista 7150, Brocade ADX series
Characteristics:
• 16M (2^24) isolated networks
• On top of UDP packet
  • Can utilize L4 port based ECMP load balancing solutions
  • Src UDP port is a hash of payload MAC addr
• Ethernet broadcast is mapped to IP multicast
  • L2: IGMP (or MLD) snooping, otherwise it floods a little
  • L3: multicast routing, if you want to communicate across L3 subnets
• Dynamic tunnel endpoint learning
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-06
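The point that the source UDP port is a hash of the payload MAC address can be illustrated with a minimal sketch. The function name, the use of CRC32 as the hash, and hashing both inner MAC addresses are assumptions made for illustration only; the slides do not say which hash a real VTEP uses. The port range 49152-65535 is the dynamic port range that RFC 7348 later recommended for the VXLAN source port.

```python
import zlib

# Dynamic/private port range, assumed here as the pool for source ports.
PORT_MIN = 49152
PORT_MAX = 65535


def vxlan_source_port(inner_src_mac: str, inner_dst_mac: str) -> int:
    """Derive a stable outer UDP source port from the inner MAC addresses.

    CRC32 is only a stand-in hash (an assumption); any deterministic hash
    gives the property that matters: one flow always maps to one port.
    """
    key = (inner_src_mac + inner_dst_mac).lower().encode("ascii")
    return PORT_MIN + zlib.crc32(key) % (PORT_MAX - PORT_MIN + 1)


if __name__ == "__main__":
    # Hypothetical MAC addresses for two VMs.
    print(vxlan_source_port("52:54:00:00:00:02", "52:54:00:00:00:03"))
```

Because packets of the same inner flow always get the same source port, an ECMP fabric that balances on the L4 port keeps each flow on one path while spreading different flows across paths.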
6. How does traffic flow with VXLAN?
[Figure: Host1, Host2, and Host3 are attached to the underlay network for VXLAN through ethX. On each host the path is VM, vnet, bridge brethY-M, vxlanM, ethX.]
1. For unicast, when the source KVM host has already learned the mapping between the destination VM and its KVM host, VXLAN uses unicast.
2. For broadcast, or for unicast when the source KVM host does not know the mapping, VXLAN uses multicast.
7. VTEP IP address resolution tables (ex. ping)
VTEP : VXLAN Tunnel End Point

Host2 VTEP IP address resolution table
VNI | Payload Dst MAC addr | Capsule Dst IP addr
N   | VM3 MAC addr         | Host3 IP addr

Host3 VTEP IP address resolution table
VNI | Payload Dst MAC addr | Capsule Dst IP addr
N   | VM2 MAC addr         | Host2 IP addr

[Figure: Host1, Host2, and Host3 are attached to the underlay network for VXLAN through ethX. On each host the path is VM, vnet, bridge, vxlanN, ethX.]

Packets exchanged for the ping (Src/Dst MAC addr belong to the payload; VNI, Src/Dst IP addr, and Src UDP port number belong to the VXLAN encapsulation)
# | Packet            | Src MAC addr | Dst MAC addr | VNI | Src IP addr   | Dst IP addr       | Src UDP port number
1 | ARP request       | VM2 MAC addr | Broadcast    | N   | Host2 IP addr | Multicast IP addr | Hash(VM2 MAC addr)
2 | ARP reply         | VM3 MAC addr | VM2 MAC addr | N   | Host3 IP addr | Host2 IP addr     | Hash(VM3 MAC addr)
3 | ICMP Echo request | VM2 MAC addr | VM3 MAC addr | N   | Host2 IP addr | Host3 IP addr     | Hash(VM2 MAC addr)
4 | ICMP Echo reply   | VM3 MAC addr | VM2 MAC addr | N   | Host3 IP addr | Host2 IP addr     | Hash(VM3 MAC addr)
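The ping example can be replayed with a small model of dynamic tunnel endpoint learning. This is a sketch under stated assumptions, not the Linux kernel implementation: the Vtep class, its method names, and all addresses are hypothetical. It only models the rule that an unknown or broadcast destination goes to the multicast group, and that a receiving VTEP learns the sender's inner MAC against the outer source IP.

```python
from typing import Dict, List, Tuple

BROADCAST = "ff:ff:ff:ff:ff:ff"


class Vtep:
    """Hypothetical VTEP holding a (VNI, inner MAC) -> remote VTEP IP table."""

    def __init__(self, ip: str, group: str) -> None:
        self.ip = ip
        self.group = group
        self.table: Dict[Tuple[int, str], str] = {}

    def outer_destination(self, vni: int, inner_dst_mac: str) -> str:
        # Broadcast or unknown unicast falls back to the multicast group.
        if inner_dst_mac == BROADCAST:
            return self.group
        return self.table.get((vni, inner_dst_mac), self.group)

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        # On receive, remember which VTEP the inner source MAC sits behind.
        self.table[(vni, inner_src_mac)] = outer_src_ip


def replay_ping(vni: int = 100) -> List[str]:
    """Return the outer destination IP used by each of the four packets."""
    host2 = Vtep("10.0.0.2", "239.0.0.100")
    host3 = Vtep("10.0.0.3", "239.0.0.100")
    vm2, vm3 = "52:54:00:00:00:02", "52:54:00:00:00:03"
    trace = []
    # (sender VTEP, receiver VTEP, inner src MAC, inner dst MAC)
    packets = [
        (host2, host3, vm2, BROADCAST),  # 1: ARP request
        (host3, host2, vm3, vm2),        # 2: ARP reply
        (host2, host3, vm2, vm3),        # 3: ICMP Echo request
        (host3, host2, vm3, vm2),        # 4: ICMP Echo reply
    ]
    for sender, receiver, src_mac, dst_mac in packets:
        trace.append(sender.outer_destination(vni, dst_mac))
        receiver.learn(vni, src_mac, sender.ip)
    return trace


if __name__ == "__main__":
    print(replay_ping())
```

Only the first packet is sent to the multicast group; from the ARP reply onward each VTEP already knows the peer, which matches the Dst IP addr column of the packet table.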
8. How does VXLAN shrink the broadcast domain size?
[Figure: Host1 to Host4 are spread over underlay network segments 1 to 3 of the underlay network for VXLAN. Host1 has no VM associated with VNI N; VM1, VM2, and VM3 run on the other hosts behind vxlanN or vxlanM devices.]
1. Since Host1 contains no VM belonging to VXLAN segment N, Host1 does not join multicast group N.
2. Since VM1 & VM2 belong to VXLAN segment N, Host2 & Host3 join the same multicast group N.
3. Since Host4 contains no VM belonging to VXLAN segment N, the path to Host4 is excluded from the multicast domain if the switch supports IGMP snooping.
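The membership rule in the three points above reduces to a simple filter: a host joins the multicast group of a VNI only while it runs at least one VM on that VNI. The sketch below is illustrative; the function name, the placement dictionary, and the VNI numbers are assumptions, including the guess that Host4 only carries a VM on another segment.

```python
from typing import Dict, List, Set


def hosts_joining_group(vnis_by_host: Dict[str, List[int]], vni: int) -> Set[str]:
    """Return the hosts that should join the multicast group for one VNI."""
    return {host for host, vnis in vnis_by_host.items() if vni in vnis}


if __name__ == "__main__":
    N, M = 100, 200  # hypothetical VNIs standing in for segments N and M
    placement = {
        "Host1": [],    # no VM on segment N
        "Host2": [N],
        "Host3": [N],
        "Host4": [M],   # assumed: only a VM on another segment
    }
    print(sorted(hosts_joining_group(placement, N)))
```

With IGMP snooping on the switches, broadcast traffic for segment N is then delivered only toward the hosts in this set.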
15. Functional test result overview
We tested the basic functions directly affected by VXLAN support
(ex. VM start/stop, Internet connectivity, inter-tier connectivity, and live migration in Isolated Network and VPC tier).

Case 1: VR & VM exist on the same hypervisor (network type: isolated)
Step # | Test target function | Procedure | Expected result | Result
1 | connectivity to VR | ping to VR | ping reaches the destination | Pass
2 | connectivity to the internet | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
3 | VR restart | stop VR | job finishes successfully | Pass
4 | VR restart | start VR | job finishes successfully | Pass
5 | connectivity to VR after VR restart | ping to VR | ping reaches the destination | Pass
6 | connectivity to the internet after VR restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
7 | VM restart | stop VM | job finishes successfully | Pass
8 | VM restart | start VM | job finishes successfully | Pass
9 | connectivity to VR after VM restart | ping to VR | ping reaches the destination | Pass
10 | connectivity to the internet after VM restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass

Case 2: VR & VM exist on different hypervisors (network type: isolated)
Step # | Test target function | Procedure | Expected result | Result
1 | connectivity to VR | ping to VR | ping reaches the destination | Pass
2 | connectivity to the internet | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
3 | VR restart | stop VR | job finishes successfully | Pass
4 | VR restart | start VR | job finishes successfully | Pass
5 | connectivity to VR after VR restart | ping to VR | ping reaches the destination | Pass
6 | connectivity to the internet after VR restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
7 | VM restart | stop VM | job finishes successfully | Pass
8 | VM restart | start VM | job finishes successfully | Pass
9 | connectivity to VR after VM restart | ping to VR | ping reaches the destination | Pass
10 | connectivity to the internet after VM restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
11 | VM migration | migrate VM to another hypervisor | job finishes successfully | Pass
12 | connectivity to VR after VM migration | ping to VR | ping reaches the destination | Pass
13 | connectivity to the internet after VM migration | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass

Case 3: VM1 & VM2 exist in different isolated networks (network type: isolated)
Step # | Test target function | Procedure | Expected result | Result
1 | inter isolated network isolation | ping from VM1 in one tier to the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
2 | inter isolated network isolation | tcpdump from VM2 in another tier | ping packet from VM1 cannot be captured | Pass

Case 4: VR & VM exist on different hypervisors (network type: VPC)
Step # | Test target function | Procedure | Expected result | Result
1 | connectivity to VR | ping to VR | ping reaches the destination | Pass
2 | connectivity to the internet | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
3 | VR restart | stop VR | job finishes successfully | Pass
4 | VR restart | start VR | job finishes successfully | Pass
5 | connectivity to VR after VR restart | ping to VR | ping reaches the destination | Pass
6 | connectivity to the internet after VR restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
7 | VM restart | stop VM | job finishes successfully | Pass
8 | VM restart | start VM | job finishes successfully | Pass
9 | connectivity to VR after VM restart | ping to VR | ping reaches the destination | Pass
10 | connectivity to the internet after VM restart | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
11 | VM migration | migrate VM to another hypervisor | job finishes successfully | Pass
12 | connectivity to VR after VM migration | ping to VR | ping reaches the destination | Pass
13 | connectivity to the internet after VM migration | ping to a host on the internet (ex. 8.8.8.8) | ping reaches the destination | Pass

Case 5: VM1 & VM2 exist in different tiers, and routing between the two tiers is allowed (network type: VPC)
Step # | Test target function | Procedure | Expected result | Result
1 | inter-tier connectivity | ping from VM1 in one tier to VM2 in another tier | ping reaches the destination | Pass

Case 6: VM1 & VM2 exist in different tiers, and routing between the two tiers is denied (network type: VPC)
Step # | Test target function | Procedure | Expected result | Result
1 | inter-tier isolation | ping from VM1 in one tier to the internet (ex. 8.8.8.8) | ping reaches the destination | Pass
2 | inter-tier isolation | tcpdump from VM2 in another tier | ping packet from VM1 cannot be captured | Pass
16. VXLAN plugin restriction
• VXLAN is not available for Public Network, Storage Network, and Management Network
  • These networks do not consume many VLAN IDs.
• KVM is the only supported hypervisor
  • Maybe we can add LXC support
• Mapping between VNI and multicast address is hardcoded.
  multicastAddress="239.$(( ($vni >> 16) % 256 )).$(( ($vni >> 8) % 256 )).$(( $vni % 256 ))"
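The hardcoded mapping can be checked with a direct Python transcription of the shell arithmetic. The function name and the range check are additions for illustration; the formula itself is the one shown on the slide.

```python
def vni_to_multicast_address(vni: int) -> str:
    """Map a 24-bit VNI to 239.x.y.z, one octet per VNI byte."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits (0-16777215)")
    return f"239.{(vni >> 16) % 256}.{(vni >> 8) % 256}.{vni % 256}"


if __name__ == "__main__":
    for vni in (1, 100, 0x010203, 16777215):
        print(vni, vni_to_multicast_address(vni))
```

Since the three low octets carry the full 24 bits of the VNI, every VNI gets its own group in 239.0.0.0/8 and no two VNIs share a multicast address.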
17. Resources
• CloudStack Plugin guide for VXLAN
  • http://jenkins.buildacloud.org/job/build-docs-vxlanmaster/lastSuccessfulBuild/artifact/Apache_CloudStack-4.3.0-CloudStack_VXLAN_Guide-en-US.pdf
• Design Doc
  • https://cwiki.apache.org/confluence/display/CLOUDSTACK/Linux+native+VXLAN+support+on+KVM+hypervisor
• JIRA ticket
  • https://issues.apache.org/jira/browse/CLOUDSTACK-2328
Bug reports, suggestions, and any feedback are welcome!
18. Wrap up
• The VXLAN integration for CloudStack that we contributed has been merged into the CloudStack 4.3 branch.
• We confirmed that basic functions work in Isolated Network and VPC Tier.
• Please evaluate the VXLAN integration; bug reports, suggestions, and any feedback are welcome!
Special Thanks:
Toshiaki Hatano, NTT Communications Corp.
Junji Arakawa, NTT Communications Corp.
Chris Cameron, Verio Inc.
20. NVGRE Overview
NVGRE [Network Virtualization using Generic Routing Encapsulation]
Objective: Overcome VLAN scalability limitation
NW Type: Overlay network
Envelope type: Extended GRE packet (L3 packet)
Standardization Status: Under IETF standardization process
Implementation: Microsoft Hyper-V 2012 R2, Intel Ethernet Switch FM6000 Series
Characteristics:
• 16M (2^24) isolated networks
• Extended GRE packet
  • Utilize GRE packet’s key option field as VSID and flow-ID.
  • ECMP load balancing solutions must be aware of NVGRE flow-ID
• Spec leaves Ethernet broadcast undefined.
  • Mapping to IP multicast is suggested.
  • Multicast network operation is required.
http://tools.ietf.org/html/draft-sridharan-virtualization-nvgre-03
21. STT Overview
STT [Stateless Transport Tunnel]
Objective: Overcome VLAN scalability limitation
NW Type: Overlay network
Envelope type: TCP-like original L3 packet (the protocol type is the same as TCP; it pretends to be a TCP packet)
Standardization Status: Under IETF standardization process
Implementation: VMware NSX (formerly Nicira NVP)
Characteristics:
• 2^64 isolated networks
• TCP-like header + STT header
  • Can utilize NIC’s TSO feature
  • FW/router may drop STT packets by stateful inspection.
• Spec leaves Ethernet broadcast undefined.
  • Mapping to IP multicast is suggested.
  • Multicast network operation is required.
http://tools.ietf.org/html/draft-davie-stt-04
22. Solutions comparison
Item | VXLAN | NVGRE | STT
Overhead Header Size | ○ (50 bytes) | ○ (42 bytes) | △ (76 bytes)
NIC Offloading | ○ (Special NIC is required) | ○ (Special NIC is required) | ◎ (able to utilize normal TSO)
Existing Assets Fitness | ◎ (MTU may need to be adjusted) | ◎ (MTU may need to be adjusted) | △ (FW/router may drop STT packets)
Interoperability | ○ (Spec left only minor undefined points) | × (Tunnel endpoint address resolution is undefined.) | × (Tunnel endpoint address resolution is undefined.)
Ethernet Broadcast | ◎ (Mapping to IP Multicast) | △ (Mapping to IP Multicast (suggestion)) | △ (Mapping to IP Multicast (suggestion))
ECMP | ○ (Able to utilize L2 fabric's L4 port based balancing) | △ (L2 fabric must be aware of NVGRE flow-ID to balance) | ○ (Able to utilize L2 fabric’s L4 port based balancing)
Multicast Operation | Required | Required (depends on implementation) | Required (depends on implementation)
Supporting Vendors | VMware/Citrix/Red Hat/Cisco/Intel/Broadcom/Arista | Microsoft/Arista/Emulex/Dell/HP | VMware (formerly Nicira)
Linux Integration | ◎ (kernel 3.7 or later) | × (no implementation exists) | ○ (Nicira’s Open vSwitch is required)
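The 50-byte VXLAN figure and the note that the MTU may need to be adjusted can be worked through with a little arithmetic. This sketch assumes the overhead is the sum of an untagged outer Ethernet header, an outer IPv4 header without options, the outer UDP header, and the VXLAN header; the slides give only the total, so the breakdown and the function names are assumptions.

```python
# Header sizes in bytes for the IPv4 case.
OUTER_ETHERNET = 14  # untagged; an 802.1Q tag would add 4 bytes
OUTER_IPV4 = 20      # no IP options
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # inner frame header; the inner FCS is omitted


def vxlan_overhead_bytes() -> int:
    """Bytes added in front of the original Ethernet frame."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def max_inner_ip_mtu(underlay_ip_mtu: int) -> int:
    """Largest guest IP MTU that fits in one underlay IP packet."""
    return underlay_ip_mtu - OUTER_IPV4 - OUTER_UDP - VXLAN_HEADER - INNER_ETHERNET


if __name__ == "__main__":
    print(vxlan_overhead_bytes())   # 50
    print(max_inner_ip_mtu(1500))   # 1450
```

With a standard 1500-byte underlay MTU, guests need an MTU of at most 1450 bytes, or the underlay MTU has to be raised by at least 50 bytes to keep guests at 1500.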
23. VXLAN Terminology
• VXLAN
  • Virtual eXtensible Local Area Network
• VXLAN Segment
  • VXLAN Layer 2 overlay network over which VMs communicate
• VTEP
  • VXLAN Tunnel End Point
  • an entity which originates and/or terminates VXLAN tunnels
• VNI
  • VXLAN Network Identifier (or VXLAN Segment ID)
• VXLAN Gateway
  • an entity which forwards traffic between VXLAN and non-VXLAN environments
24. VXLAN segment format
Outer Ethernet Header:
- FCS is newly calculated, inner FCS is omitted.
Outer IP header:
- If the inner dst MAC is a unicast MAC and the local VTEP knows the remote VTEP for the MAC address, dst IP is set to the remote VTEP’s IP address.
- If not, the packet will be sent out to the multicast group associated with the VNI.
- The VTEP will use (*,G) joins.
Outer UDP header:
- Source port: It is recommended to be calculated from the inner Ethernet header, for ECMP purposes.
- Destination port: 4789
- Checksum: SHOULD be 0, or the correct value.
VXLAN header:
- VNI is a 24-bit field
2013-04-17: IANA assigned udp/4789 for VXLAN port
http://www.iana.org/assignments/service-names-portnumbers/service-names-port-numbers.xml
From current draft: (IPv4 case)
http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-06

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
Outer Ethernet Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Outer Destination MAC Address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Outer Destination MAC Address | Outer Source MAC Address      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Outer Source MAC Address                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|OptnlEthtype = C-Tag 802.1Q    | Outer.VLAN Tag Information    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Ethertype = 0x0800            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer IPv4 Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |Protocl=17(UDP)|        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Outer Source IPv4 Address                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Outer Destination IPv4 Address                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Outer UDP Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       Source Port = xxxx      |    Dest Port = VXLAN Port     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           UDP Length          |        UDP Checksum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
VXLAN Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R|R|R|R|I|R|R|R|            Reserved                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                VXLAN Network Identifier (VNI) |   Reserved    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
(continuing to Inner Ethernet header, abbrev.)
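The 8-byte VXLAN header in the layout above can be built and parsed with Python's struct module. This is a minimal sketch: the function and constant names are chosen for illustration, and only the header is handled, not the outer Ethernet/IP/UDP headers or the inner frame.

```python
import struct

# The I flag (valid VNI) is the fifth bit of the first byte: R R R R I R R R.
VXLAN_FLAG_VNI_VALID = 0x08


def pack_vxlan_header(vni: int) -> bytes:
    """Build the header: 8 flag bits, 24 reserved, 24-bit VNI, 8 reserved."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits (0-16777215)")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI, rejecting headers without the I flag set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag is not set; VNI is not valid")
    return word2 >> 8


if __name__ == "__main__":
    header = pack_vxlan_header(100)
    print(header.hex(), unpack_vni(header))
```

Network byte order ("!") matches the big-endian bit layout of the diagram, so the VNI occupies bytes 4 to 6 of the header.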
26. Network concepts in CloudStack
• Guest Network
  • Virtual network to which VMs are connected
  • Guest networks are isolated from each other
  • There are two types of Guest networks
    • Isolated network
      – Traffic from VMs goes out to the Public Network through the Virtual Router.
      – A VR is created per Isolated network.
    • Shared network
      – Traffic from VMs goes out directly.
• VPC
  • In a VPC, the Virtual Router can have multiple Isolated Networks (each is called a VPC tier).
  • In a VPC, routing between tiers is configurable.
27. Network concepts in CloudStack (cont.)
• Isolation method
  • Method to isolate Guest Networks from each other.
  • The typical isolation method is VLAN.
  • VXLAN needs to be implemented as an isolation method.
• Physical Network
  • Underlay network of the Guest network
  • The isolation method of the guest network is specified while defining the physical network
    • The createPhysicalNetwork API has an isolationmethods parameter.
      – http://cloudstack.apache.org/docs/api/apidocs4.2/root_admin/createPhysicalNetwork.html
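A createPhysicalNetwork call that selects the isolation method might be assembled as in the sketch below. It only builds the query string: the helper name and the example zone ID and network name are hypothetical, the value "VXLAN" for isolationmethods is assumed from the plugin's name, and request authentication (API key and signature) is deliberately left out.

```python
from urllib.parse import urlencode


def create_physical_network_query(zone_id: str, name: str,
                                  isolation_method: str = "VXLAN") -> str:
    """Build the unsigned query string for createPhysicalNetwork."""
    params = {
        "command": "createPhysicalNetwork",
        "zoneid": zone_id,
        "name": name,
        "isolationmethods": isolation_method,  # assumed value for the plugin
        "response": "json",
    }
    # Sorted keys give a stable, easy-to-compare query string.
    return urlencode(sorted(params.items()))


if __name__ == "__main__":
    # Hypothetical zone ID and network name.
    print(create_physical_network_query("zone-uuid-placeholder", "guest-vxlan"))
```

The resulting string would be appended to the management server's API endpoint after signing; the slides do not cover that step, so it is not shown here.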
28. Advanced Zone Network Overview
[Figure: The Internet connects to the Public Network. On KVM hosts, a VR for VPC fronts VPC tiers and a VR fronts a Guest Network; VMs are attached to each VPC tier and to the Guest Network.]
* VR for VPC will be created per VPC
• User can create multiple VPCs (depending on settings, up to 20 by default)
• VPC can have multiple tiers (depending on settings, up to 3 by default)
35. VXLAN-based zone setup procedure (6)
Fill in the zone wizard.
KVM is the only supported hypervisor.
36. VXLAN-based zone setup procedure (7)
• Management network, Public network, and Storage network are not supported by VXLAN: select VLAN.
• Guest network is supported by VXLAN: select VXLAN.
• Set the underlay network I/F name (ex. “eth0”) to the traffic type.
37. VXLAN-based zone setup procedure (8)
Fill in the zone wizard.
There is no VXLAN-specific concern.
38. VXLAN-based zone setup procedure (9)
Fill in the zone wizard.
There is no VXLAN-specific concern.
39. VXLAN-based zone setup procedure (10)
You can use 0-16777215 as the VNI.
40. VXLAN-based zone setup procedure (11)
Fill in the zone wizard.
There is no VXLAN-specific concern.
41. VXLAN-based zone setup procedure (12)
Fill in the zone wizard.
There is no VXLAN-specific concern.
42. VXLAN-based zone setup procedure (13)
Fill in the zone wizard.
There is no VXLAN-specific concern.
43. VXLAN-based zone setup procedure (14)
Fill in the zone wizard.
There is no VXLAN-specific concern.
44. VXLAN-based zone setup procedure (15)
Fill in the zone wizard.
There is no VXLAN-specific concern.
45. VXLAN-based zone setup procedure (16)
Click the “Launch zone” button.