How Adobe Has Built An OpenStack Cloud
Jun Park (Ph.D., MBA), Solutions Architect At Adobe
Arghya Banerjee, Sr. Systems Engineer At Adobe
OpenStack Utah Meetup, Sept 24
Swiss Cheese Model
(Figure from Wikipedia: flaws in defense layers; if aligned, the flaws would allow an accident to occur)
Two More Factors That Complicate Things
• Space-Time Continuum (Einstein)
• Interactions: Higgs Field & Boson
(Figures from Wikipedia and YouTube)
Our Template
Time
Components
Dependencies
OpenStack Survey, May 2015
Adobe OpenStack Architecture
VM1, VM2, VM3 (each with eth0 and eth1)
Private Networks: VxLAN-based
External Provider Networks: VLAN-based
Adobe Network Firewall
Adobe Corporate Networks
Storage: Ceph RBD
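The two network types above differ in who chooses the segment. The sketch below shows, as Neutron API request bodies, a private network that leaves the choice to ML2 and a provider network pinned to a VLAN. It assumes ML2 is configured with `tenant_network_types = vxlan`; the names "private-net", "physnet1", and VLAN 100 are placeholders, not Adobe's values. Either body would be POSTed to Neutron's `/v2.0/networks` endpoint, and the provider attributes normally require admin credentials.

```python
"""Request bodies for the two kinds of Neutron networks on the architecture slide.

Assumptions (not from the slides): ML2 uses tenant_network_types = vxlan, and
"physnet1" / VLAN 100 are placeholder values for a provider network mapped
onto the hosts' physical VLANs.
"""


def vxlan_tenant_network(name: str) -> dict:
    # Private network: no provider attributes, so ML2 picks the tenant
    # network type (assumed vxlan) and allocates a VNI itself.
    return {"network": {"name": name}}


def vlan_provider_network(name: str, physical_network: str, vlan_id: int,
                          shared: bool = True) -> dict:
    # External provider network: an admin pins it to an existing VLAN on a
    # named physical network.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID must be between 1 and 4094, got {vlan_id}")
    return {
        "network": {
            "name": name,
            "provider:network_type": "vlan",
            "provider:physical_network": physical_network,
            "provider:segmentation_id": vlan_id,
            "shared": shared,
        }
    }


if __name__ == "__main__":
    import json
    print(json.dumps(vxlan_tenant_network("private-net")))
    print(json.dumps(vlan_provider_network("corp-vlan-100", "physnet1", 100)))
```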
Adobe OpenStack Architecture
VM1 (eth0, eth1)
External Provider Networks: VLAN-based
Adobe Network Firewall
Adobe Corporate Networks
Host networking: Linux Bridge -> Open vSwitch -> bond0 -> Physical VLANs
Volume Management in OpenStack
1. Copy
2. Snapshot
3. Volumes
(Diagram: a set of images is copied into a Ceph volume that serves as the base volume for all three VMs; each VM then gets its own individual copy-on-write (COW) volume.)
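In Ceph RBD, the copy/snapshot/volumes flow corresponds to snapshotting the base image, protecting the snapshot, and cloning it once per VM. The sketch below models only the copy-on-write read/write semantics in memory, to show why the three VMs can share one base volume. The class and function names are hypothetical, and blocks are modeled as a dict of block index to bytes, which is not how RBD stores data.

```python
"""Minimal copy-on-write (COW) volume sketch (in-memory model only)."""
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class BaseVolume:
    """Read-only base volume, e.g. a protected snapshot of a copied image."""
    blocks: Dict[int, bytes]


@dataclass
class CowVolume:
    """Per-VM clone that stores only the blocks this VM has written."""
    parent: BaseVolume
    overlay: Dict[int, bytes] = field(default_factory=dict)

    def read(self, index: int) -> Optional[bytes]:
        # Prefer this VM's own copy; otherwise fall back to the shared base.
        if index in self.overlay:
            return self.overlay[index]
        return self.parent.blocks.get(index)

    def write(self, index: int, data: bytes) -> None:
        # Writes never touch the base; they land in this clone's overlay.
        self.overlay[index] = data


def clone_for_vms(base: BaseVolume, count: int) -> List[CowVolume]:
    """Create one COW clone per VM; no base blocks are copied."""
    return [CowVolume(parent=base) for _ in range(count)]


if __name__ == "__main__":
    base = BaseVolume(blocks={0: b"boot", 1: b"os"})
    vm1, vm2, vm3 = clone_for_vms(base, 3)
    vm1.write(1, b"patched")
    print(vm1.read(1), vm2.read(1), len(vm3.overlay))  # b'patched' b'os' 0
```

Creating a clone costs nothing up front; each VM's storage grows only with the blocks it changes, which is what makes per-VM volumes cheap.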
Live Demo
Possible Combinations
• Containers
• VMs
• Bare Metal
• Containers In VMs
Mesos Cluster Via Heat
VM1 (Host1): mesos master
VM2 (Host2): mesos slave1, running an http server
VM3 (Host3): mesos slave2, running an http server
Master:
-> Ubuntu-mesos image, available via diskimage-builder
-> Post-configuration for the master
-> Starting services
Slaves:
-> Ubuntu-mesos image
-> Post-configuration for each slave, using the mesos master IP
-> Starting services
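The key step above is that each slave is configured with the master's IP, which Heat can resolve at deploy time. Below is a sketch that builds such a Heat (HOT) template as a Python dict and prints it as JSON (JSON is valid YAML, so Heat can read it). Assumptions not taken from the slides: the image is named "ubuntu-mesos", the flavor is "m1.medium", all VMs sit on a network named "private", and the Mesos packages read their ZooKeeper URL from /etc/mesos/zk as Mesosphere's Ubuntu packages do. The user_data scripts are hypothetical placeholders.

```python
"""Build a Heat (HOT) template for a three-VM Mesos cluster (sketch)."""
import json

MASTER_SCRIPT = """#!/bin/bash
# Hypothetical post-configuration for the master; ZooKeeper runs here too.
service zookeeper restart
service mesos-master restart
"""

SLAVE_SCRIPT = """#!/bin/bash
# The master address below is filled in by Heat's str_replace before boot.
echo zk://MASTER_IP:2181/mesos > /etc/mesos/zk
service mesos-slave restart
"""


def server(image, flavor, network, user_data):
    """One OS::Nova::Server resource with raw user_data."""
    return {
        "type": "OS::Nova::Server",
        "properties": {
            "image": image,
            "flavor": flavor,
            "networks": [{"network": network}],
            "user_data_format": "RAW",
            "user_data": user_data,
        },
    }


def mesos_stack_template(image="ubuntu-mesos", flavor="m1.medium",
                         network="private"):
    resources = {"mesos_master": server(image, flavor, network, MASTER_SCRIPT)}
    # Slaves get the master's first IP on `network`; this get_attr reference
    # also makes Heat create the master before the slaves.
    master_ip = {"get_attr": ["mesos_master", "networks", network, 0]}
    for name in ("mesos_slave1", "mesos_slave2"):
        user_data = {"str_replace": {"template": SLAVE_SCRIPT,
                                     "params": {"MASTER_IP": master_ip}}}
        resources[name] = server(image, flavor, network, user_data)
    return {"heat_template_version": "2014-10-16", "resources": resources}


if __name__ == "__main__":
    print(json.dumps(mesos_stack_template(), indent=2))
```

Saved to a file, the output could be passed to `heat stack-create` like any other template.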
Mesos Cluster with Marathon
Marathon
Mesos Master With ZooKeeper
Mesos Slave1: http server
Mesos Slave2: http server
Flow: a request to run a micro-service is sent to Marathon via its REST API.
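Marathon's REST API creates an app from a JSON app definition POSTed to `/v2/apps`. The sketch below builds that request with the standard library but does not send it. The Marathon address (10.0.0.10 on Marathon's default port 8080), the app id, and the resource numbers are hypothetical; the command assumes Python 3 is present on the slaves and relies on Marathon exporting the assigned port as `$PORT0`.

```python
"""Build (but do not send) a Marathon request that launches a micro-service."""
import json
import urllib.request


def marathon_app_request(marathon_url: str, app_id: str, cmd: str,
                         instances: int = 2, cpus: float = 0.1,
                         mem: float = 64.0) -> urllib.request.Request:
    app = {
        "id": app_id,            # e.g. "/http-server"
        "cmd": cmd,              # shell command Mesos runs on a slave
        "instances": instances,  # Marathon keeps this many tasks running
        "cpus": cpus,
        "mem": mem,              # in MB
    }
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = marathon_app_request("http://10.0.0.10:8080",
                               "/http-server",
                               "python3 -m http.server $PORT0")
    print(req.get_method(), req.full_url)
    # To actually submit it: urllib.request.urlopen(req)
```

With `instances=2`, Marathon would normally spread the two http servers across the available slaves, matching the two-slave layout in the diagram.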
eBay’s CI Approach With Mesos
Marathon
Mesos Master With ZooKeeper
Mesos Slave1: Jenkins Master
Mesos Slave2: Jenkins Slaves
Flow:
-> Create the Jenkins master via Marathon's REST API
-> Create Jenkins slaves via API
Takeaways From Mesos Demo
• Flexible & Powerful
• No External Dependencies
• Towards Maximizing Efficiency and Productivity
• Good Hints for Better Services? Murano, Magnum, and so on…
Heat Templates In Magnum
Time
Components
Dependencies
What Happened At Networking?
A New Bug: OVS Sporadically Crashes In Adding A Port
(https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
Open vSwitch (OVS):
• Jun ‘13: This bug introduced with OVS Mega Flow
• Dec ‘13: OVS 2.0.1 released (Mega Flow, multiprocessing)
• Aug ‘14: OVS 2.3.0, OVS 2.1.3, and OVS 2.0.2 released; bug fixed in all OVS 2.x
Ubuntu 14.04:
• Apr ‘14: Ubuntu 14.04 (Trusty) released with OVS 2.0.1
• Jul ‘14: Bug reported against OVS 2.0.1 in Ubuntu 14.04
• May ‘15: Fix cherry-picked onto OVS 2.0.2 in Ubuntu 14.04.2
Neutron:
• Security group O(N^2) issue: enhancement patch not yet integrated (e.g., 270 secs to 3 secs for 25K rules)
• Restarting agents re-establishes entire flows: fix ready, not added
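To see why an O(N^2) step matters at 25K rules, compare two ways of finding rules that are no longer wanted. This is an illustration only, not Neutron's code and not the enhancement patch mentioned above: a list-based membership test makes the diff quadratic, while hashing the desired rules once makes it linear on average. The rule strings are hypothetical iptables-style lines.

```python
"""Quadratic vs. linear rule diff (illustration only)."""
import time


def stale_rules_quadratic(current, desired):
    # `rule not in desired` scans the whole list for every rule: O(N^2).
    return [rule for rule in current if rule not in desired]


def stale_rules_linear(current, desired):
    # Hash the desired rules once; each lookup is then O(1) on average.
    wanted = set(desired)
    return [rule for rule in current if rule not in wanted]


def make_rules(n):
    # Hypothetical per-source-IP rules, unique for n up to 65536.
    return [f"-A sg-chain -s 10.0.{i // 256}.{i % 256}/32 -j RETURN"
            for i in range(n)]


if __name__ == "__main__":
    current = make_rules(5000)
    desired = current[: len(current) // 2]
    for fn in (stale_rules_quadratic, stale_rules_linear):
        start = time.perf_counter()
        stale = fn(current, desired)
        print(f"{fn.__name__}: {len(stale)} stale rules, "
              f"{time.perf_counter() - start:.3f} s")
```

Both functions return the same rules in the same order; only the cost differs, and the gap widens with the square of the rule count.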
What Happened At Networking?
A New Bug: OVS Sporadically Crashes In Adding A Port
(https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1336555 and 1449012)
• Some companies reverted OVS to LinuxBridge
• Some pundits spread FUD about Neutron!
OVS:
• Dec ‘13: OVS 2.0.1 released (Mega Flow, multiprocessing)
Ubuntu 14.04:
• Apr ‘14: Ubuntu 14.04 (Trusty) released with OVS 2.0.1
• Fix cherry-picked onto OVS 2.0.2 in Ubuntu 14.04
OpenStack Summits:
• May ‘14: Atlanta (Icehouse)
• Nov ‘14: Paris (Juno)
• May ‘15: Vancouver (Kilo)
What Happened At Storage?
Ubuntu 14.04:
• Apr ‘14: Ubuntu 14.04 (Trusty) released with Ceph Firefly 0.79
• July ‘15: Ubuntu 14.04 updates with Ceph Firefly 0.80.10
Ceph:
• Failover instability with Firefly
• Ceph operational instability, Cinder scalability issue
• Hammer?
Cinder:
• Cinder is stuck when Ceph is stuck (e.g., use local drive for copying an image)
• Enhancement solution not yet integrated (e.g., APIs stacked up -> multiprocessing)
What Happened At Data Node?
Ubuntu 14.04:
• Apr ‘14: Ubuntu 14.04 (Trusty) released with Kernel…
Kernel:
• XFS deadlock bug
• Kernel memory bug, security issue
• KVM security issue
• Security patch, bug fix
(Timeline: Dec ‘13 to July ‘15)
Our Workarounds
• Networks
  - Understand OVS and find a stable OVS version
  - Cherry-pick fixes for Neutron scalability: firewall rules
  - Our own out-of-band rate limiting on networks, e.g., 200 Mbps
  - Set the right MTU size across the OVS setup
  - Turn off GRO/LRO on hosts
• Storage
  - Cinder scalability
  - Ceph stability: Hammer, reconfigure toward optimal settings
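Getting the MTU right with VxLAN comes down to simple arithmetic. An interface MTU counts the IP packet, not the Ethernet header, and a VxLAN packet on the wire carries the outer IPv4, UDP, and VxLAN headers plus the whole inner Ethernet frame. The sketch assumes an IPv4 underlay and untagged inner frames; the slides do not say which MTU values Adobe chose. A mismatch typically shows up as hung connections once large packets appear.

```python
"""VxLAN MTU arithmetic (IPv4 underlay, untagged inner frames assumed)."""
OUTER_IPV4 = 20      # outer IPv4 header, no options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VxLAN header
INNER_ETHERNET = 14  # inner Ethernet header, no 802.1Q tag
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET  # 50


def guest_mtu(physical_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Largest guest MTU whose encapsulated packet still fits the physical MTU."""
    if physical_mtu <= overhead:
        raise ValueError("physical MTU is too small to carry VxLAN traffic")
    return physical_mtu - overhead


def physical_mtu_for(guest: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Physical MTU needed so guests can keep the given MTU (e.g., 1500)."""
    return guest + overhead


if __name__ == "__main__":
    print(guest_mtu(1500))         # 1450
    print(physical_mtu_for(1500))  # 1550
```

The two functions reflect the two usual choices: lower the guest MTU (commonly pushed to guests via DHCP) or raise the MTU of the physical network.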
How To Test at Scale
• Emulate the future production environment
  - Create hundreds of VMs, inject workloads, and destroy them all
  - Repeat this entire test over and over again
  - Findings: dead tokens stacked up
• Test each component's scalability
  - Neutron: OVS
  - Cinder: Ceph
  - Nova: KVM
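The create/load/destroy cycle can be sketched as a loop that checks for leaks after every round. Every name here (`FakeCloud`, `churn`, and the method names) is hypothetical; a real run would back these calls with Nova and Keystone, for example through openstacksdk. `FakeCloud` deliberately stores one token per API call and never purges them, to show how repeating the test surfaces accumulating state like the "dead tokens stacked up" finding. The slides do not say what caused that finding.

```python
"""Create/load/destroy churn loop against a hypothetical in-memory cloud."""
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeCloud:
    servers: Dict[str, str] = field(default_factory=dict)
    tokens: List[str] = field(default_factory=list)  # never purged
    _next: int = 0

    def _authenticate(self) -> None:
        # Models a client that gets a fresh persisted token for every call.
        self.tokens.append(f"token-{len(self.tokens)}")

    def create_server(self, name: str) -> str:
        self._authenticate()
        self._next += 1
        server_id = f"srv-{self._next}"
        self.servers[server_id] = name
        return server_id

    def run_workload(self, server_id: str) -> None:
        self._authenticate()

    def delete_server(self, server_id: str) -> None:
        self._authenticate()
        del self.servers[server_id]

    def list_servers(self) -> List[str]:
        return list(self.servers)

    def token_count(self) -> int:
        return len(self.tokens)


def churn(cloud, vms_per_cycle: int, cycles: int) -> List[int]:
    """Run create -> load -> destroy repeatedly; return token count per cycle."""
    token_history = []
    for cycle in range(cycles):
        ids = [cloud.create_server(f"scale-{cycle}-{i}")
               for i in range(vms_per_cycle)]
        for server_id in ids:
            cloud.run_workload(server_id)
        for server_id in ids:
            cloud.delete_server(server_id)
        # Each cycle must leave no servers behind; a leak is a finding too.
        leftovers = cloud.list_servers()
        if leftovers:
            raise RuntimeError(f"cycle {cycle}: {len(leftovers)} servers leaked")
        token_history.append(cloud.token_count())
    return token_history


if __name__ == "__main__":
    print(churn(FakeCloud(), vms_per_cycle=100, cycles=3))  # [300, 600, 900]
```

A count that keeps climbing even though every cycle cleans up its VMs is the signal to look for. If the tokens in question are Keystone tokens persisted in SQL, `keystone-manage token_flush` is the standard way to purge expired ones.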
Have We Done Enough?
4?
3?
It's not that I'm so smart, it's just
that I stay with problems longer.
- Albert Einstein
New Efforts In OpenStack
• OpenStack Product Working Group
  - Link up between contributors and users
• Governance/DefCore Committee
  - Defining OpenStack Core
• Large Deployment Team
  - Operational issues for large deployments
• Open Virtual Network (OVN)
  - In-kernel conntrack, DPDK, etc.; will run atop OVS
Milestone
• Murano
  - Application catalog service: Cloud Foundry, Kubernetes, Jenkins, Tomcat, etc.
• Magnum
  - Docker Swarm, Kubernetes, and Mesos (for our live demo)
• Advanced Networking
  - DVR, Load Balancer, IPv6