OpenContrail deployment experience at Cloudwatt
About me
● Network engineer since 2006
● Working on OpenStack since its beginning in 2010
● Working on OpenContrail for a year as a developer and integrator
Cloudwatt IaaS
● French public cloud provider
● 3 years of experience with OpenStack
● 1 year of experience with OpenContrail
○ 1 data center
■ 200 compute nodes
■ 3 PB of raw Swift storage
○ OpenStack Icehouse release
Contrail in Cloudwatt
● Started with Contrail release 1.06 in June 2014
● Runs on a Cisco Nexus FabricPath fabric
● L2VPN tunnels terminate on two Juniper MX routers
Contrail in Cloudwatt
Contrail logical view
[Diagram showing Config, Neutron API, Analytics, Control, IF-MAP and three vrouters]
Contrail in Cloudwatt
● 2 Neutron API nodes: neutron-server with the Contrail plugin
● 2 config nodes: discovery, API, SVC monitor, schema, IF-MAP server
● 2 control nodes
● 2 analytics nodes
● 2 webUI nodes
Contrail in Cloudwatt
[Diagram showing two each of Config, Neutron API, Analytics, Control and WebUI, with IF-MAP and XMPP links and three vrouters]
Contrail in Cloudwatt
● Load balancing in front of the APIs and WebUI
● 2 Cassandra clusters of 3 nodes each
● RabbitMQ cluster of 2 nodes
● ZooKeeper cluster composed of 3 nodes
Contrail in Cloudwatt
[Diagram showing two each of Config, Neutron API, Analytics, Control and WebUI, two Cassandra clusters, AMQP + ZK, IF-MAP and XMPP links, and three vrouters]
Issues on 1.06
● Difficult to operate, and to upgrade or maintain without downtime
● The Neutron-to-Contrail API translator needed stabilization and compatibility work
● Analytics does not work
● Some memory leaks on the compute nodes
Upgrade to 1.10
● After nine months with 1.06
● New version to fix issues and bring new
features (SNAT/LBaaS)
● Following the upstream
Upgrade to 1.10
Created a tool to monitor the Contrail cluster status
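The slides do not describe the tool itself. As a minimal sketch of the idea, assuming each node can report a state per service, the snippet below aggregates those reports and flags unhealthy services. The node names, service names, states and input format are all hypothetical and do not reflect Contrail's actual output.

```python
from collections import defaultdict


def summarize(reports):
    """Group reports per service and list every node whose instance is not 'active'.

    `reports` is an iterable of (node, service, state) tuples. This format is
    an assumption made for the sketch, not something Contrail produces.
    """
    total = defaultdict(int)
    down = defaultdict(list)
    for node, service, state in reports:
        total[service] += 1
        if state != "active":
            down[service].append(node)
    return {
        service: {"total": total[service], "down": sorted(down[service])}
        for service in total
    }


def is_healthy(summary):
    """The cluster counts as healthy when no service has a failed instance."""
    return all(not info["down"] for info in summary.values())


# Hypothetical reports collected from the control plane nodes.
reports = [
    ("config-1", "api", "active"),
    ("config-2", "api", "active"),
    ("control-1", "control", "active"),
    ("control-2", "control", "failed"),
]
summary = summarize(reports)
print(summary)
print("healthy:", is_healthy(summary))
```

Running such a check before and after each upgrade step gives a simple go/no-go signal.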
Upgrade to 1.10
We decided to do it in 2 steps:
1. Control plane (in one night)
○ Config (slave schema first)
○ Control
○ Analytics
○ WebUI
○ Neutron API
Upgrade to 1.10
2. Data plane (over a few days)
○ upgrade/bootstrap spare compute nodes to 1.10 and add them to the available compute pool
○ remove all running 1.06 compute nodes from the available pool
○ give clients on those 1.06 nodes a time slot to move their VMs before upgrading each node to 1.10 (no live migration)
○ then open champagne bottles!
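The data plane procedure above can be sketched as a small model. This is only an illustration under stated assumptions: the `ComputeNode` class, its fields and the node names are hypothetical, and "the client time slot has elapsed" is simplified to "the node no longer hosts any VM".

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    name: str
    version: str              # "1.06" or "1.10"
    vm_count: int = 0
    schedulable: bool = True  # True when the node is in the available pool


def start_rollout(nodes, spares):
    """Bring spares up in 1.10, add them to the pool, and stop scheduling on 1.06 nodes."""
    for spare in spares:
        spare.version = "1.10"
        spare.schedulable = True
    for node in nodes:
        if node.version == "1.06":
            node.schedulable = False
    return nodes + spares


def upgrade_drained(nodes):
    """Upgrade every 1.06 node that clients have emptied and return the upgraded names."""
    upgraded = []
    for node in nodes:
        if node.version == "1.06" and node.vm_count == 0:
            node.version = "1.10"
            node.schedulable = True
            upgraded.append(node.name)
    return upgraded


# Hypothetical fleet: two running 1.06 nodes and one spare.
fleet = start_rollout(
    [ComputeNode("cn-1", "1.06", vm_count=3), ComputeNode("cn-2", "1.06")],
    [ComputeNode("spare-1", "1.06", schedulable=False)],
)
first_pass = upgrade_drained(fleet)
print(first_pass)
```

Here only `cn-2` is upgraded on the first pass, because `cn-1` still hosts VMs and the spare is already in 1.10. Calling `upgrade_drained` again after clients have moved their VMs picks up the remaining nodes.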
Bugs met during the upgrade
● vrouter 1.06 cannot coexist with 1.10 using MPLSoUDP encapsulation => switch to MPLSoGRE during the cohabitation
● SNAT/LBaaS does not take the vrouter version into account
● The whole Contrail API slowed down because the Neutron Contrail plugin code moved from neutron-server to the Contrail API
● ZooKeeper timeouts
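The encapsulation workaround from the first bullet can be expressed as a simple rule. The sketch below is hypothetical: the function name is invented, and the order returned for a single-version fleet is an arbitrary illustration, not a Contrail default.

```python
def encapsulation_priority(vrouter_versions):
    """Return the tunnel encapsulations to allow for the given vrouter versions.

    Assumption taken from the slide: 1.06 and 1.10 vrouters do not
    interoperate over MPLSoUDP, so a mixed fleet is limited to MPLSoGRE.
    """
    versions = set(vrouter_versions)
    if {"1.06", "1.10"} <= versions:
        return ["MPLSoGRE"]
    # Illustrative order only for a fleet running a single version.
    return ["MPLSoUDP", "MPLSoGRE"]


print(encapsulation_priority(["1.06", "1.10", "1.10"]))
print(encapsulation_priority(["1.10", "1.10"]))
```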
Bugs met after the upgrade
● Memory leak in the kernel module data path
● Hold flow count leak in the kernel module data path (workaround: restart the vrouter agent)
● 13 Cloudwatt patches added to the 1.10 upstream release:
https://review.opencontrail.org/#/q/status:open+branch:R1.10,n,z
Bugs still present in 1.10
● Schema slave->master switchover takes ~20 min
● Logging configuration
● Some 5xx errors still appear on the Contrail API
● Live upgrade of a compute node without downtime (do we need it?)
My wishlist to Santa SDN
● That people make more use of https://blueprints.launchpad.net/opencontrail
● A stable master before pulling a new branch
● Use http://semver.org to number releases
● The Contrail team to be more community oriented
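To illustrate the semver wish: under Semantic Versioning a release is numbered MAJOR.MINOR.PATCH, with no leading zeros, and versions compare numerically field by field. A minimal sketch follows, handling only the core three-part version and ignoring pre-release and build metadata; the helper name `parse` is hypothetical.

```python
import re

# Core semver only: three numeric fields, no leading zeros.
SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def parse(version):
    """Return (major, minor, patch) as integers, or raise ValueError."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a core semver version: {version!r}")
    return tuple(int(part) for part in match.groups())


# Tuples compare field by field, so 1.10.0 sorts after 1.9.3.
print(parse("1.10.0") > parse("1.9.3"))

# A two-part number with a leading zero, such as "1.06", is rejected.
try:
    parse("1.06")
except ValueError as error:
    print(error)
```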
2015 S2 to-do
● Improve Neutron Contrail plugin code
https://review.opencontrail.org/10123
● Upgrade to 2.x branch
● Build a CI/CD on master
○ build and deploy daily
○ run opencontrail sanity
○ run functional non-regression tests
○ run performance non-regression tests
● OpenStack L3VPN integration
Questions?