Managing Open vSwitch 
Across a large heterogeneous fleet 
Andy Hill @andyhky 
Systems Engineer, Rackspace 
Joel Preas @joelintheory 
Systems Engineer, Rackspace
Some Definitions 
Heterogeneous 
• Several different hardware manufacturers 
• Several XenServer major versions (sometimes on varying kernels) 
• Five hardware profiles 
Large Fleet 
• Six production public clouds 
• Six internal private clouds 
• Various non-production environments 
• Tens of thousands of hosts 
• Hundreds of thousands of instances
Quick OVS Introduction
History 
• Rackspace has used Open vSwitch since the pre-1.0 days 
• Behind most of First Generation Cloud Servers (Slicehost) 
• Powers 100% of Next Generation Cloud Servers 
• Upgraded OVS on Next Gen hypervisors 9 times over 2 
years
Upgrade Open vSwitch 
If you get nothing else from this talk, upgrade OVS!
Why upgrade? 
Reasons we upgraded: 
• Performance 
• Less impactful upgrades 
• NSX Controller version requirements 
• Nasty regression in 2.1 [96be8de] 
http://bit.do/OVS21Regression 
• Performance
Performance 
• Broadcast domain sizing 
• Special care with ingress broadcast flows 
• Craft flows that explicitly allow only the broadcast traffic actually destined 
for the host (sketch below)
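As a rough illustration only (not our production flow set), rules along these lines permit ARP broadcasts while dropping other broadcast frames; the bridge name and priorities are placeholders: 

    # Illustrative: allow ARP broadcasts, drop all other broadcast frames
    ovs-ofctl add-flow xenbr0 "priority=200,arp,dl_dst=ff:ff:ff:ff:ff:ff,actions=NORMAL"
    ovs-ofctl add-flow xenbr0 "priority=100,dl_dst=ff:ff:ff:ff:ff:ff,actions=drop"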
Performance 
• The Dark Ages (< 1.11) 
• Megaflows (>= 1.11) 
• Ludicrous Speed (>= 2.1)
The Dark Ages (< 1.11) 
• Flow-eviction-threshold = 2000 
• Single threaded 
• 12 point match for datapath flow 
• 8 upcall paths for datapath misses 
• Userspace hit per bridge (2x the lookups)
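For reference, the eviction threshold in those releases was just a database knob; depending on the release the key lived on the Bridge or the Open_vSwitch record, roughly like this (value shown is illustrative): 

    # Raise the eviction threshold from its old default of 2000
    # (key location varies by release; shown here on the bridge record)
    ovs-vsctl set Bridge xenbr0 other_config:flow-eviction-threshold=5000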
Megaflows (1.11+) 
• Wildcard matching on datapath 
• Less likely to hit flow-eviction-threshold 
• Some workloads still had issues 
• Most cases datapath flows cut in half or better
Ludicrous Speed (2.1+) 
• RIP flow-eviction-threshold 
• 200,000 datapath flows (configurable) 
• In the wild, we have seen over 72K datapath flows / 
260K pps
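On 2.1+ the datapath flow ceiling is the flow-limit setting; a quick way to adjust it and to count how many datapath flows a host is actually carrying: 

    # Raise the maximum number of datapath flows (2.1+ default is 200000)
    ovs-vsctl set Open_vSwitch . other_config:flow-limit=200000
    # Count the flows currently installed in the kernel datapath
    ovs-dpctl dump-flows | wc -l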
OVS 1.4 -> OVS 2.1 
Broadcast flows
Mission Accomplished! 
We moved the bottleneck! 
New bottlenecks: 
• Guest OS kernel configuration 
• Xen Netback/Netfront Driver
Upgrade, Upgrade, Upgrade 
If you package Open vSwitch, don’t leave 
your customers in The Dark Ages 
Open vSwitch 2.3 is LTS
Upgrade process 
• Ansible Driven (async - watch your SSH timeouts) 
• /etc/init.d/openvswitch force-reload-kmod 
• bonded: <= 30 sec of data plane impact 
• non-bonded: <= 5 sec of data plane impact 
http://bit.do/ovsupgrade
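The per-host step the playbook drives is essentially the init script's force-reload-kmod, run asynchronously so long reloads do not trip SSH timeouts; a bare sketch (package names are illustrative): 

    # Install the new userspace and kernel-module packages, then swap the
    # running module and daemons in place; expect a brief data plane blip.
    yum install -y openvswitch openvswitch-modules   # package names vary by build
    /etc/init.d/openvswitch force-reload-kmod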
Bridge Fail Modes 
Secure vs. normal (standalone) bridge fail mode 
In normal mode the bridge acts as a learning L2 switch; we override the default 
• Critical XenServer bug with Windows causing full host 
reboots (CTX140814) 
• A bridge fail mode change is a datapath-impacting event 
• Fail modes do not persist across reboots in XenServer unless recorded in the 
bridge other-config (sketch below)
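Checking and setting the fail mode is a one-liner; on XenServer it also has to be recorded in the bridge other-config, as noted above, or it is lost on reboot. Bridge name is illustrative: 

    # Inspect and set the fail mode; 'standalone' acts as a learning switch
    # when the controller is unreachable, 'secure' installs no flows on its own.
    ovs-vsctl get Bridge xenbr0 fail_mode
    ovs-vsctl set Bridge xenbr0 fail_mode=secure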
Patch Ports and Bridge Fail Modes 
• Misconfigured patch ports + 'normal' bridge fail mode 
• Patch ports do not persist across reboots; they are re-created from cron.reboot 
since no hypervisor hook is available (sketch below)
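For context, patch ports are created as a paired port on each bridge, which is what the cron.reboot job re-creates; bridge and port names below are placeholders: 

    # Create a patch-port pair connecting br-int and xenbr0 (illustrative names)
    ovs-vsctl add-port br-int patch-out -- set Interface patch-out type=patch options:peer=patch-in
    ovs-vsctl add-port xenbr0 patch-in -- set Interface patch-in type=patch options:peer=patch-out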
Bridge Migration 
OVS upgrades required all bridges to be in secure fail mode 
1. Create new bridge 
2. Move VIFs from old bridge to new bridge (loss of a 
couple of packets) 
3. Upgrade OVS 
4. Ensure bridge fail mode change persists across reboot 
5. Clean up 
Entire process orchestrated with Ansible
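Step 2 boils down to deleting the VIF from the old bridge and adding it to the new one, done as a single ovs-vsctl transaction to keep the gap to a couple of packets; device and bridge names are illustrative: 

    # Move one VIF between bridges atomically, then pin the new bridge to
    # secure fail mode so the change survives the upgrade.
    ovs-vsctl -- del-port xapi1 vif5.0 -- add-port xenbr1 vif5.0
    ovs-vsctl set Bridge xenbr1 fail_mode=secure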
Kernel Modules 
Running Kernel | OVS Kernel Module(s) | Staged Kernel | Reboot Outcome 
vABC           | OVSvABC              | None          | Everything's Fine 
vABC           | OVSvABC              | vDEF          | No Networking 
vABC           | OVSvABC, OVSvDEF     | vDEF          | Everything's Fine
Kernel Modules 
• Ensure the proper OVS kernel modules are in place (pre-flight check sketched below) 
• Kernel upgrade = OVS kernel module upgrade 
• More packaging work in a heterogeneous environment 
• Failure to do so can force a trip to a Java (out-of-band) console
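A simple pre-flight check along these lines (assuming a grub-managed dom0 with grubby available) catches the "No Networking" row in the table above before the reboot: 

    # Verify an openvswitch module exists for the kernel that will boot next,
    # not just the one currently running.
    next=$(basename "$(grubby --default-kernel)" | sed 's/^vmlinuz-//')
    modinfo -k "$next" openvswitch >/dev/null 2>&1 \
        || echo "WARNING: no OVS kernel module staged for $next"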
Other Challenges with OVS 
• Tied to old version because $REASON 
• VLAN Splinters/ovs-vlan-bug-workaround 
• Hypervisor Integration 
• Platforms: LXC, KVM, XenServer 5.5 and beyond
Measuring OVS 
PavlOVS sends these metrics to StatsD/Graphite: 
• Per-bridge byte_count, packet_count, flow_count 
• Instance count 
• OVS CPU utilization 
• Aggregate datapath flow_count and missed/hit/lost rates 
These are aggregated per region -> cell -> host 
Useful for DDoS detection (Graphite highestCurrent()) 
Scaling issues with Graphite/StatsD
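PavlOVS itself is internal, but the shape of one collection cycle is roughly the following; metric names and the StatsD endpoint are placeholders: 

    # Datapath-wide counters: "lookups: hit:X missed:Y lost:Z" and "flows: N"
    ovs-dpctl show
    flows=$(ovs-dpctl dump-flows | wc -l)
    # Emit a StatsD gauge over UDP (metric name and host are illustrative)
    echo "ovs.datapath.flow_count:${flows}|g" | nc -u -w1 statsd.example.com 8125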
OVS in Compute Host Lifecycle 
Ovsulate: an Ansible module that checks a host into the NVP/NSX controllers. It can 
fail if routes are bad or the host certificate changes, e.g. when a host is re-kicked. We 
first made sure it failed explicitly, and later added logic to delete the existing record 
on provisioning (sketch below).
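The check-in itself ultimately comes down to pointing OVS at the controllers' management addresses; a hedged sketch of what a module like Ovsulate drives (addresses and port are placeholders): 

    # Register this hypervisor with the NVP/NSX controller cluster
    ovs-vsctl set-manager ssl:198.51.100.10:6632 ssl:198.51.100.11:6632
    # Confirm the manager connections came up
    ovs-vsctl --columns=target,is_connected list Manager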
Monitoring OVS 
Connectivity to SDN controller 
• ovs-vsctl find manager is_connected=false 
• ovs-vsctl find controller is_connected=false 
SDN integration process (ovs-xapi-sync) 
• pgrep -f ovs-xapi-sync 
Routes (to the controllers)
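Those checks roll up into a very small health script; a sketch (exit non-zero so the monitoring agent alarms): 

    # Alarm if any manager or controller connection is down, or if the
    # XAPI integration daemon has died.
    [ -z "$(ovs-vsctl --bare find Manager is_connected=false)" ] || exit 1
    [ -z "$(ovs-vsctl --bare find Controller is_connected=false)" ] || exit 1
    pgrep -f ovs-xapi-sync >/dev/null || exit 1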
Reboots 
Will the host networking survive a reboot? (kernel modules) 
http://bit.do/iwillsurvive
Monitoring OVS 
XSA-108 - AKA Rebootpocalypse 2014 
• Incorrect kmods on reboot may require OOB access to fix! 
• Had monitoring in place 
• Pre-flight check for kmods, just in case
Questions? 
THANK YOU 
