Can you trust Neutron? 
A tour of scalability and reliability 
improvements from Havana to Juno 
Salvatore Orlando (@taturiello) 
Aaron Rosen (@aaronorosen)
From Havana to Juno 
● 12 months 
● 1672 commits 
● +147765 -70127 lines of code 
(excluding changes in neutron/locale/*) 
But... did it really get any better?
Measuring scalability - Process 
● Goal: Validate agent scalability under varying load 
o In this talk we’ll discuss the L2 agent only, sorry! 
● Testbed: single server OpenStack installation 
● Methodology: run several experiments increasing 
the number of servers concurrently created 
o Number of servers ranging from 1 to 20 
o Every experiment is repeated 20 times 
o For each metric, study mean, median, and variance
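For illustration only (this is not the actual measurement harness), the per-experiment aggregation of a metric could look like the Python sketch below; the sample values are made up. 

import statistics 

# One experiment = N servers created concurrently, repeated 20 times. 
# 'samples' holds the repeated measurements (in seconds) of a single metric, 
# e.g. t_active for the 5-server experiment. 
def summarize(samples): 
    return {"mean": statistics.mean(samples), 
            "median": statistics.median(samples), 
            "variance": statistics.variance(samples)} 

print(summarize([12.3, 11.8, 13.1, 12.9]))  # illustrative values only 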
Measuring scalability - Metrics 
Instance metrics (t_start = instance created): 
● t_active - time until the instance reaches active state 
● t_ping - time until the instance can be pinged 
● t_allocate_net - time spent configuring networking for instance 
Port metrics (t_start = VIF plugged): 
● t_proc: time until the agent starts processing the port 
● t_up: time until the port is wired 
● t_dhcp: time for adding DHCP info for the new port
Measuring scalability - Results 
t_up in Havana and Juno - a rather remarkable difference!
Measuring scalability - Results 
t_allocate_net is almost constant in Juno: its growth trend is only 15% of the one seen in Havana
Measuring scalability - Results 
● VM failure rate analysis 
o Failure == error while creating the VM, or failure to ping it within a 3 minute timeout 
● Juno is decently reliable (Havana not as much…)
Analysing progress 
[release timeline graphic: Folsom, Grizzly, Havana, Icehouse, Juno]
How the software improved 
● Boot VMs only once network is wired 
● Remove choke points from L2 agents 
● Streamline security group RPC 
● Better router processing in L3 agents 
● Reporting floating IP processing status 
● many others… which unfortunately won’t fit into the time 
allocated to this talk
More results
● Virtually no improvements in time to ping an instance 
- As the tests are executed on a single host, IO contention between instances is the main bottleneck. 
- “Time to ping” is slowed down by longer instance boot times 
● Instances are slower to go to “ACTIVE” than they were in Havana 
- This is actually a desired feature 
- Indeed, it is the reason why the failure rate in Juno is 0 even with 20 concurrent instances
Nova/Neutron Event reporting 
Problem: 
Nova displays cached IPAM info about the instance from Neutron, and the cache is updated slowly… 
[diagram: neutron-api, nova-api, and a puzzled user - “Wat? No floating ip?”]
Nova/Neutron Event reporting 
Solution: 
Neutron sends events to Nova on IPAM changes, causing Nova to update its cache. 
[diagram: neutron-api notifies nova-api / nova-compute - “I haz floating ip”]
Nova/Neutron Event reporting 
Problem: 
Instances would go active before the network was wired. Some DHCP clients (such as the one in CirrOS images) don't keep retrying... 
[diagram: nova-api reports “W00T Active!”, but the instance's DHCP request times out - “Hrm?!?”]
Nova/Neutron Event reporting 
Solution: 
Neutron sends an event to Nova when the network for the instance is ready. 
[sequence diagram: nova-api / nova-scheduler / nova-compute - 2. allocate network for instance; Neutron backend - 1B. port X active; neutron-api - 2B. event: network-vif-plugged: port X; only then does the VM boot]
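For illustration, the notification is an external event posted to Nova's os-server-external-events API. The sketch below is not Neutron's actual notifier code; the endpoint, token and IDs are placeholders, but it shows the shape of the call. 

import requests 

NOVA_URL = "http://nova-api:8774/v2/<tenant-id>"  # placeholder endpoint 
TOKEN = "<keystone-token>"                        # placeholder credential 

def notify_vif_plugged(instance_uuid, port_id): 
    # "network-vif-plugged" tells nova-compute that the port is wired, 
    # so the instance can finally be reported as ACTIVE. 
    body = {"events": [{"name": "network-vif-plugged", 
                        "server_uuid": instance_uuid, 
                        "tag": port_id, 
                        "status": "completed"}]} 
    resp = requests.post(NOVA_URL + "/os-server-external-events", 
                         json=body, headers={"X-Auth-Token": TOKEN}) 
    resp.raise_for_status() 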
Enabling/disabling event reporting 
Settings in nova.conf 
vif_plugging_timeout = 300 
vif_plugging_is_fatal = True
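Both options live in the [DEFAULT] section of nova.conf; the values above are also the defaults. 

[DEFAULT] 
# How long (in seconds) nova-compute waits for the network-vif-plugged event 
vif_plugging_timeout = 300 
# If True, a timeout puts the instance in ERROR; if False, nova boots it anyway 
vif_plugging_is_fatal = True 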
Speeding up L2 interface processing 
Problem - device processing delayed by: 
- inefficient server/agent interface 
- preemptive behaviour of security group callbacks 
- pedantic polling of interfaces on the integration bridge 
- superficial analysis of devices to process 
Solution: 
- ovsdb-monitor triggers interface processing only when changes are detected (see the sketch after this list) 
- The Neutron server performs at most 2 RPC calls over AMQP for each API operation 
- only 1 call in most cases 
- The L2 agent queries the server only once to retrieve interface details 
- Security group updates are processed in the same loop as interfaces, thus avoiding starvation. 
- The agent only processes interfaces which are ready to be used - and, most importantly, processes them only once!
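A minimal sketch of the ovsdb-monitor idea (not the agent's actual code, and the processing loop itself is omitted): the agent only rescans the integration bridge when ovsdb-client reports a change in the Interface table. 

import subprocess 

def interface_change_events(): 
    # ovsdb-client streams one JSON row each time the Interface table changes 
    proc = subprocess.Popen( 
        ["ovsdb-client", "monitor", "Interface", "name,ofport", 
         "--format=json"], 
        stdout=subprocess.PIPE) 
    return iter(proc.stdout.readline, b"") 

for change in interface_change_events(): 
    # any output means "something changed": only now rescan br-int and wire 
    # new interfaces, instead of polling on every loop iteration 
    print("interface change detected:", change.strip()) 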
Streamlining security group RPCs 
Problem - exponential complexity 
The payload of the RPC call to retrieve security group rules grows exponentially when the 
number of devices increases 
Solution: 
Restructure the format of the payload exchanged between agent and server, removing 
data redundancy. 
With the new payload format, security group rules are not repeated anymore.
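To make the redundancy concrete, here is an illustrative before/after shape of the reply (heavily simplified, not the exact wire format): 

# Old reply: every device carried a full copy of its expanded rule set, 
# so the payload grew with (number of devices) x (number of rules). 
old_reply = { 
    "port-1": {"security_group_rules": [{"protocol": "tcp", "port_range_max": 22}]}, 
    "port-2": {"security_group_rules": [{"protocol": "tcp", "port_range_max": 22}]}, 
} 

# New reply: rules are sent once per security group and each device only 
# references the groups it belongs to. 
new_reply = { 
    "security_groups": {"sg-web": [{"protocol": "tcp", "port_range_max": 22}]}, 
    "devices": { 
        "port-1": {"security_groups": ["sg-web"]}, 
        "port-2": {"security_groups": ["sg-web"]}, 
    }, 
} 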
Streamlining security group RPCs 
[charts: RPC message payload size vs # of ports; RPC execution time vs # of ports] 
Credits: Miguel Angel Ajo Pelayo 
http://www.ajo.es/post/95269040924/neutron-security-group-rules-for-devices-rpc-rewrite
Reducing router processing times 
Problems: 
● Router synchronization starves RPC handling 
● Not enough parallelism in router and floating IP processing 
Solution: 
● Router synchronization tasks and RPC messages are added to a priority queue; items pulled from the queue are processed in separate threads (a minimal sketch follows this list) 
● Apply iptables commands in a non-blocking fashion
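A minimal sketch of the queueing idea, using plain threads and made-up router IDs (the actual L3 agent code is different): 

import queue 
import threading 

work = queue.PriorityQueue() 

def worker(): 
    while True: 
        priority, router_id = work.get() 
        # process the router (sync interfaces, floating IPs, iptables, ...) 
        print("processing", router_id, "at priority", priority) 
        work.task_done() 

for _ in range(4):  # several workers so routers are processed in parallel 
    threading.Thread(target=worker, daemon=True).start() 

work.put((0, "router-updated-via-rpc"))  # RPC updates: higher priority 
work.put((1, "router-from-full-sync"))   # periodic resync: lower priority 
work.join() 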
Know your floating IP status 
Problem: 
There was no way to know whether your floating IP is ready or not 
(beyond pinging it, obviously) 
Solution: 
- Introducing the concept of operational status for floating IPs. 
- The L3 agent calls back the server to confirm successful floating IP creation (ACTIVE), or an 
error (DOWN) 
- The state defaults to DOWN. Goes ACTIVE upon floating IP association, and DOWN when the 
floating IP is disassociated.
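With this status field a client can poll the floating IP instead of pinging it. A hedged example using python-neutronclient (credentials and the UUID are placeholders): 

import time 
from neutronclient.v2_0 import client 

neutron = client.Client(username="admin", password="secret", 
                        tenant_name="admin", 
                        auth_url="http://keystone:5000/v2.0") 

fip_id = "<floating-ip-uuid>"  # placeholder 
while True: 
    status = neutron.show_floatingip(fip_id)["floatingip"]["status"] 
    if status == "ACTIVE":  # stays DOWN until the L3 agent confirms the wiring 
        break 
    time.sleep(1) 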
Other enhancements (in brief) 
● Multiple REST API workers 
● Multiple RPC over AMQP workers 
● Better IP address recycling 
● Removal of several locking queries 
o i.e. SELECT ... FOR UPDATE statements 
● Removal of conditions triggering LOCK WAIT timeout errors 
o bug triggered by eventlet yielding within a transaction
Where we are... 
● The L2 agent scalability considerably improved over the past 12 months 
o Results measured with OVS only but the same considerations apply to Linux 
Bridge as well 
● Security groups can now be used even in very large deployments 
● Nova/Neutron interface much more reliable 
o Boot a server only when the network for it is wired 
o Faster, less chatty communication 
● Some progress on resource status tracking 
o Far from being optimal, but at least now you can know when your floating IP is ready to use...
… and where we want to be 
● There is still a lot of room for improvement in the agents 
o E.g.: the OVS agent still scans all ports on the integration bridge at each iteration 
● The Nova/Neutron interface is better, but still far from ideal 
o Enhanced caching on the nova side could avoid a lot of round trips to neutron 
● Little to nothing has been done for tracking async operations and resource status. For example: 
o there is no way to know whether DHCP info is ready for a port 
o security group updates are processed asynchronously, but it is impossible to know when processing completes
Final thoughts 
● “Much better” is different from “ideal” 
o ≅ 3 seconds for wiring an interface might not be ideal for many applications 
o scalability limits should be addressed even if they involve architectural 
changes 
● What about data plane scalability? 
● What about API usability?