CC Attribution-NoDerivs 3.0 Unported License - Usage OK, no modifications, full attribution*
* All unlicensed or borrowed works retain their original licenses

OpenStack Scale-out Networking Architecture
Scaling to 1,000+ servers without Neutron

Abhishek Chanda, Software Engineer
OpenStack Juno Design Summit
May 13th, 2014
@rony358
Abhishek Chanda
• Engineer at Cloudscaling
• Started with a focus on software-defined networking and distributed systems
• Ended up as a jack of all trades!
• Still bad at configuring network devices
Today’s Goals
• Our Layer-3 (L3) network architecture
• Motivation
• Design goals
• Network topology
• Software components layout
• Under the hood
• Challenges faced
• Future directions (SDN, Neutron, etc.)
Networking Modes in Vanilla OpenStack
(using nova-network)
OpenStack Networking Options

Single-Host OpenStack Networking
• Scalability & Control: 0
• Reference Network Architecture: Layer-2 VLANs w/ STP
• Design Pattern: Centralized network stack
• Core Design Flaw: All traffic through a single x86 server
• Issues: SPOF; performance bottleneck; no SDN

Multi-Host OpenStack Networking
• Scalability & Control: 1
• Reference Network Architecture: Layer-2 VLANs w/ STP
• Design Pattern: Decentralized network stack
• Core Design Flaw: NAT at hypervisor; 802.1q tagging
• Issues: Security & QoS issues; no SDN

Flat OpenStack Networking
• Scalability & Control: 2
• Reference Network Architecture: Layer-2 VLANs w/ STP
• Design Pattern: Centralized network stack
• Core Design Flaw: All traffic through a single x86 server
• Issues: No control plane scalability; no SDN

OCS Classic Networking
• Scalability & Control: 4
• Reference Network Architecture: L3 spine & leaf + scale-out NAT
• Design Pattern: Fully distributed
• Core Design Flaw: Requires deeper networking savvy
• Issues: No virtual networks

OCS VPC Networking
• Scalability & Control: 4
• Reference Network Architecture: L3 spine & leaf underlay + network virtualization
• Design Pattern: Fully distributed
• Core Design Flaw: More bleeding edge (scale not proven)
• Issues: No VLAN support
Option 1 - OpenStack Single-Host
[Diagram: all VLANs terminate on a single central node (labeled Neutron Router / nova-network)]
All traffic flows through a centralized bottleneck and single point of failure.
Performance bottleneck; non-standard networking architecture.
Option 2 - OpenStack Multi-Host
[Diagram: each hypervisor runs nova-network, NATing between public IPs and private IPs, attached to the core network]
• Separate NICs cut available bandwidth by 50%
• Direct database access increases risk
• NAT at the hypervisor means either running routing protocols to the hypervisor, or VLANs + STP across racks for gratuitous ARP
• Must route public IPs across the core
Security problem; non-standard networking architecture.
Option 3 - OpenStack Flat
Networking scalability, but no control plane scalability
• Uses Ethernet adapters configured as bridges to allow network traffic between nodes
• Commonly used in PoC and development environments
[Diagram: Controller (nova-network, nova-scheduler, nova-api) and Compute (nova-compute) nodes, each with a br100 bridge on eth0 in 10.0.0.0/8; eth1 on each node faces your local network IP address space]
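For reference, a nova.conf fragment along these lines reproduces the flat topology in the diagram (a sketch: the flag names are nova-network's standard flat-mode options from that era, and the values simply mirror the diagram):

# nova.conf -- flat networking (illustrative values matching the diagram)
network_manager=nova.network.manager.FlatManager
flat_network_bridge=br100     # bridge created on eth0 on every node
flat_interface=eth0
public_interface=eth1         # faces your local network IP address space
fixed_range=10.0.0.0/8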
What Path Have Others Chosen?
• HP, CERN, ANL, and others
• CERN uses a custom driver which talks to a DB that maps IPs to MAC addresses (amongst other attributes)
  • Essentially flat, with manually created VLANs assigned to specific compute nodes
• ANL added InfiniBand and VXLAN support to nova-network
The L3 Network Architecture in OCS

Design Goals
• Mimic Amazon EC2 “classic” networking
• Pure L3 is blazing fast and well understood
  • Network/systems folks can troubleshoot easily
• No VLANs, no Spanning Tree Protocol (STP), no L2 mess
• No SPOFs, smaller failure domains
  • e.g. single-host & flat modes
• Distributed “scale-out” DHCP
• No virtual routers!
  • Path is VM -> host -> ToR -> core

This enables a horizontally scalable, stateless NAT layer that provides floating IPs.
Layer 3 (L3) Networks Scale
• The largest cloud operators are L3 with NO VLANs
• Cloud-ready apps don’t need or want VLANs (L2)
• An L3 networking model is the ideal underlay for an SDN overlay
The Internet Scales!
Why NOT L2?

L3                                                    | L2
Hierarchical topology                                 | Flat topology
Route aggregation                                     | No route aggregation / everything everywhere
Fast convergence times                                | Fast convergence times only when tuned properly
Locality matters                                      | Locality disappears
Uses all available bandwidth (via ECMP) over          | Uses half of available bandwidth, and most
multiple paths                                        | traffic takes a long route
Proven scale                                          | STP/VLANs work at small to medium scale only
The Internet (& ISPs) are L3 oriented                 | Typical enterprise datacenter is L2 oriented
Best practice for SDN “underlay”                      | SDN “overlay” designed to provide L2 virtual nets
Smaller Failure Domains
Would you rather have the whole cloud down, or just a small bit of it for a short time?
Spine & Leaf
[Diagram]

Simple Network View
[Diagram]
Software Components Schematic Layout
• nova-network is distributed & synchronized
  • Means we can have many running at once
  • This drives horizontal network scalability by adding more network managers
[Diagram - software components: nova-network and nova-db on the Zone Node; hardware components: Compute Node, ToRs, Core Network]
Software Components Schematic Layout
• VIF driver on each compute node
  • Bridge creation for each VM (/30)
  • Enhanced iptables rules
  • Per-VM udhcpd process (see the config sketch below)
  • Configures routing
[Diagram: adds a VM, a per-VM DHCP server, and the VIF driver on the Compute Node]
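To make the per-VM udhcpd idea concrete, each VM's DHCP server could be driven by a one-lease config like this (a sketch: the path, bridge name, addresses, and MAC are hypothetical; the directives are standard busybox udhcpd ones):

# /etc/nova/udhcpd-vm42.conf -- hypothetical per-VM config
interface br-vm42                # the VM's Linux bridge
max_leases 1
start 10.50.0.6                  # single-address pool: the VM's /30 address
end 10.50.0.6
opt subnet 255.255.255.252       # a /30
opt router 10.50.0.5             # the bridge side of the /30
opt dns 8.8.8.8 8.8.4.4
static_lease fe:ff:0a:32:00:06 10.50.0.6   # MAC derived from the IP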
Software Components Schematic Layout
• NAT service (“Natter”) on the edge
  • Provides on-demand elastic IP service
• l3addnet provides utilities to create a number of /30 networks per host and pin them to that host
[Diagram: adds the Natter between the Core Network and the Edge Network]
Under the Hood: Natter
[Diagram: the Natter sits between the Core Network (eth1) and the Edge Network (eth0), with a control-plane data link to nova-db on the Zone Node]
1. Polls nova-db for new <floating_ip, private_ip> tuples
2. Uses tc to install 1:1 NAT rules on eth0
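A minimal sketch of that control loop in Python (the tc stateless "nat" action is a real kernel feature; the nova-db helper new_mappings() and all other names below are illustrative assumptions, not the actual Natter code):

import subprocess
import time

def tc(*args):
    """Run a tc command, raising on failure."""
    subprocess.run(["tc", *args], check=True)

def install_nat(floating_ip, private_ip, dev="eth0"):
    """Install stateless 1:1 NAT for one <floating_ip, private_ip> tuple."""
    # Inbound: rewrite destination floating -> private.
    tc("filter", "add", "dev", dev, "parent", "ffff:", "protocol", "ip",
       "prio", "10", "u32", "match", "ip", "dst", floating_ip + "/32",
       "action", "nat", "ingress", floating_ip + "/32", private_ip)
    # Outbound: rewrite source private -> floating.
    tc("filter", "add", "dev", dev, "parent", "1:", "protocol", "ip",
       "prio", "10", "u32", "match", "ip", "src", private_ip + "/32",
       "action", "nat", "egress", private_ip + "/32", floating_ip)

def natter_loop(db, dev="eth0", interval=5):
    # One-time qdisc setup so the filters above have somewhere to attach.
    tc("qdisc", "add", "dev", dev, "ingress")
    tc("qdisc", "add", "dev", dev, "root", "handle", "1:", "prio")
    while True:
        # Stand-in for step 1: poll nova-db for new tuples.
        for floating_ip, private_ip in db.new_mappings():
            install_nat(floating_ip, private_ip, dev)  # step 2
        time.sleep(interval)

Because the NAT rules are stateless and keyed only on addresses, any number of Natters can install the same mappings, which is what makes the NAT layer horizontally scalable.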
Under the Hood: Natter
[Diagram]
Under the Hood: VIF Driver and nova-network
[Diagram: VIF driver on the Compute Node, nova-network on the Zone Node, ToRs, Core Network, Natter]

VM Provisioning
1) VIF: Build Linux bridge
2) NM: Get host
3) NM: Get all available networks for the host
4) NM: Find the first unused network for the host
5) VIF: Create a VIF
6) VIF: Set up and start udhcpd on the host, per VM; the MAC is calculated based on the IP
7) VIF: Create a /30 network for the VM; assign one address to the bridge, one to the VM
8) VIF: Add routes to enable VM-to-gateway traffic
9) VIF: Add iptables rules to enforce blocked networks and whitelisted hosts

VM Decommissioning
1) VIF: Stop udhcpd for the bridge the VM is attached to and remove its config
2) NM: Delete all IPs in all VIFs
3) NM: Clean up the Linux bridge
4) NM: Clean up all /30 networks
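Steps 6 and 7 are easy to illustrate (a Python sketch; the MAC scheme and prefix below are assumed for illustration, not necessarily the OCS driver's exact scheme):

import ipaddress

def mac_from_ip(ip):
    # Step 6: derive a deterministic MAC from the IP, e.g. by embedding the
    # four IPv4 octets behind a locally administered prefix.
    return "fe:ff:%02x:%02x:%02x:%02x" % tuple(ipaddress.ip_address(ip).packed)

def slash30_addresses(cidr):
    # Step 7: a /30 has exactly two usable hosts; one goes to the Linux
    # bridge (the VM's gateway), the other to the VM itself.
    bridge_ip, vm_ip = ipaddress.ip_network(cidr).hosts()
    return bridge_ip, vm_ip

bridge_ip, vm_ip = slash30_addresses("10.50.0.4/30")
print(bridge_ip, vm_ip, mac_from_ip(str(vm_ip)))
# -> 10.50.0.5 10.50.0.6 fe:ff:0a:32:00:06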
Under the Hood: l3addnet
Used by cloud admins to pin networks to hosts.
A wrapper around nova-manage network create:
• Breaks down the input CIDR into /30 blocks
• Loops through each block and calls the nova-manage API to create a network on that compute host

root@z2.b0.z1:~# l3addnet
Usage: l3addnet cidr host01 dns1 dns2
root@z2.b0.z1:~# l3addnet 10.50.0.0/24 10.18.1.12 8.8.8.8 8.8.4.4
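In sketch form, the wrapper might look like this (Python; the nova-manage flag spellings match that era's nova-network, while the labeling scheme is an assumption and the host-pinning step is omitted):

import ipaddress
import subprocess

def l3addnet(cidr, host, dns1, dns2):
    """Break the input CIDR into /30 blocks and create one nova network
    per block for the given compute host."""
    for i, block in enumerate(ipaddress.ip_network(cidr).subnets(new_prefix=30)):
        subprocess.run([
            "nova-manage", "network", "create",
            "--label", "%s-net%d" % (host, i),       # hypothetical naming
            "--fixed_range_v4", str(block),
            "--num_networks", "1",
            "--network_size", "4",                   # a /30: 4 addresses, 2 usable
            "--dns1", dns1, "--dns2", dns2,
        ], check=True)
        # (Pinning the new network to the host happens separately; not shown.)

# l3addnet("10.50.0.0/24", "10.18.1.12", "8.8.8.8", "8.8.4.4")
# creates 64 /30 networks for host 10.18.1.12.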
Network Driver Challenges
• OpenStack releases are moving targets
  • Plugin interfaces change
  • Library dependencies change
• Database API not rich/efficient enough
  • Went straight to SQL to get what we needed
• nova-network supposed to be deprecated?
  • First in Folsom or Grizzly? Then Havana??
  • We have to figure out our Neutron strategy
Why Not Neutron Now?
• Created in the Diablo timeframe
• Neutron is still not stable
  • API changes and interfaces are actively hostile
  • No multi-host support
  • Complicated, non-intuitive maintenance procedures
• Not all plugins are equally stable
  • Many are outright buggy
SDN in OCS
• OpenContrail is the only one to meet our requirements
• The OCS reference network architecture is an ideal “underlay”
  • SDN underlays are usually spine and leaf
  • L3 routing does not interfere with supporting encapsulation or tunneling protocols
• Customers can choose their network model
  • VPC or L3

A large customer who wants to seamlessly support autoscaling for its tenants is a perfect use case for VPC.
Example Packet Path - L3*
[Diagram: Internet -> Edge Routers (“egress”) -> Core Routers (“spine”) -> ToR Switches (“leaf”) -> Linux bridge on the compute node -> VMs. Legend: Router (L3), Switch (L2), Ethernet domain, IP traffic, encapsulated traffic (L3oL3)]
* Natters not shown for simplicity
Example Packet Path
[Diagram: the same physical path (Internet -> edge routers -> spine -> leaf -> VMs), with a vRouter/vSwitch on each compute node providing the network virtualization abstraction: Layer 3 virtual networks for customers & apps on top of the Layer 3 physical network]
Future Directions
• OCS L3 networking migrates to Neutron
  • As a networking plugin (beyond a nova-network replacement)
• OCS VPC with more advanced SDN capabilities
• NFV combined with Service Chaining for carriers
  • Support existing physical network assets with Service Chaining
Questions?
Abhishek Chanda
@rony358
