OpenStack Nova Liberty
John Garbutt, Principal Engineer, Nova PTL
OpenStack Summit, Tokyo, October 2015
2
Nova’s Mission
3
to provide massively scalable,
on demand, self service access
to compute resources
4
Priorities
• Good API with a Strong Ecosystem
• Robust and Reliable
• Live Upgrades and Scale out
• Maintain Open Culture
• Stop Scope Creep
5
External Server “HA”
Help External HA Tool
• Disable Host
• Live-migrate
• Mark host down
• Evacuate
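A minimal sketch of how an external HA tool might combine these four operations. The pairing is an assumption for illustration: disable plus live-migrate for a host that is still running, mark down plus evacuate for a host that has failed. `HostState` and `plan_actions` are hypothetical names, not Nova APIs.

```python
from enum import Enum
from typing import List


class HostState(Enum):
    HEALTHY = "healthy"    # nothing to do
    DEGRADED = "degraded"  # host still runs, but should be drained
    FAILED = "failed"      # host is unreachable


def plan_actions(state: HostState) -> List[str]:
    """Return the ordered operations an external HA tool might request."""
    if state is HostState.DEGRADED:
        # Host is still up: stop new placements, then move running guests.
        return ["disable host", "live-migrate"]
    if state is HostState.FAILED:
        # Host is gone: tell Nova it is down, then rebuild guests elsewhere.
        return ["mark host down", "evacuate"]
    return []
```

The decision about host health stays in the external tool; Nova only supplies the operations.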
6
Liberty Update
7
Architecture Evolution
• Ongoing Project
• Maintaining Stability and Increasing Velocity
• API v2.1
• Upgrades
• Scheduler and Resource Tracking
8
API Users
The Absent
• Cloud upgrades
• But old script works
The Active
• Uses newest APIs
• Check availability
Multi-Cloud
• Multiple clouds
• Different versions
• Single script
Ops & Dev
• Who is using what?
• How to evolve API?
9
API Evolution
v2.0
• First API
• Alias for v1.1
• Base + Extensions
• Deprecated Legacy Code
v2.1
• No Extensions
• Better Validation
• Backwards compatible mode
• Evolve using “Microversions”
Third Party APIs
• Replaced by External Project
• Deprecated
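A minimal sketch of how a single script could target clouds running different versions: pick the highest microversion that both the client and the cloud support. The helper names (`parse`, `negotiate`, `request_headers`) are hypothetical; the header name is the one Nova's v2.1 API uses to select a microversion. The server's supported range is assumed to have been read from its version document beforehand.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]


def parse(version: str) -> Version:
    """Turn a microversion string such as '2.11' into a comparable tuple."""
    major, minor = version.split(".")
    return int(major), int(minor)


def negotiate(client_min: str, client_max: str,
              server_min: str, server_max: str) -> Optional[str]:
    """Pick the highest microversion both sides understand, or None."""
    low = max(parse(client_min), parse(server_min))
    high = min(parse(client_max), parse(server_max))
    if low > high:
        return None
    return f"{high[0]}.{high[1]}"


def request_headers(version: str) -> Dict[str, str]:
    # Header used by Nova's v2.1 API to select a microversion.
    return {"X-OpenStack-Nova-API-Version": version}
```

Versions are compared as integer pairs rather than as floats or strings, so that 2.10 sorts after 2.3.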
10
Upgrade
• Data plane and Control plane independence
• Upgrade from:
– Last stable branch
– Previous commit in same cycle
• Existing configuration “just works”
• Warning before Deprecating
11
Upgrade Architecture
[Diagram: API Nodes Behind LB; Conductor(s); Other Control Nodes; Database; Message Queue; five Compute nodes]
• oslo.versionedobjects
• RPC Version Pin
• Schema and Data Migrations
• Tests: partial-ncpu with Grenade
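A toy sketch of two of the mechanisms named on this slide, kept separate as they are in Nova: an RPC version pin that caps the message version a new node sends, and an object backport that downgrades a versioned object for older code. This is not the oslo.versionedobjects API; every name below, and the example `tags` field, is hypothetical.

```python
from typing import Dict, Optional, Tuple


def as_tuple(version: str) -> Tuple[int, int]:
    """Parse 'major.minor' so that versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def pick_message_version(newest: str, pin: Optional[str]) -> str:
    """Send at the pinned version when a pin is set and is older than ours."""
    if pin is not None and as_tuple(pin) < as_tuple(newest):
        return pin
    return newest


def backport(primitive: Dict, target_version: str) -> Dict:
    """Downgrade a toy 1.1 object (which added 'tags') for an older receiver."""
    data = dict(primitive["data"])
    if as_tuple(target_version) < (1, 1):
        data.pop("tags", None)  # field unknown to 1.0 code
    return {"version": target_version, "data": data}
```

With the pin in place, upgraded control services keep talking to compute nodes that still run the older code.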
12
Reduced Control Plane Downtime
• Update DB Schema
• Restart API & Control Plane
• Restart nova-compute
• Update RPC Pins
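A minimal sketch of the restart ordering, assuming services are identified by simple name prefixes (`conductor`, `compute`), which is a hypothetical convention for this example. It puts the conductor first, then the rest of the control plane, then the compute nodes, and only allows the RPC pin to be lifted once nothing is left on the old release.

```python
from typing import Dict, List


def upgrade_order(services: List[str]) -> List[str]:
    """Conductor first, then the rest of the control plane, then computes."""
    def rank(name: str) -> int:
        if name.startswith("conductor"):
            return 0
        if name.startswith("compute"):
            return 2
        return 1
    return sorted(services, key=lambda name: (rank(name), name))


def can_unpin(versions: Dict[str, str], new_release: str) -> bool:
    """The RPC pin can go only once no service runs the old release."""
    return all(v == new_release for v in versions.values())
```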
13
Upgrade Process
[Diagram: API Nodes Behind LB; Conductor(s); Other Control Nodes; Database; Message Queue; five Compute nodes; annotated with steps 1, 2, 3, 4 and two “??” markers]
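A minimal sketch of a data migration that runs gradually in small chunks and can be driven to completion before the next schema change. The row layout and the `instance_type` to `flavor` rename are hypothetical, chosen only to have something to migrate; this is not Nova's migration code.

```python
from typing import Dict, List


def migrate_chunk(rows: List[Dict], limit: int) -> int:
    """Convert up to `limit` old-format rows in place; return how many."""
    done = 0
    for row in rows:
        if done == limit:
            break
        if "instance_type" in row:  # row is still in the old format
            row["flavor"] = row.pop("instance_type")
            done += 1
    return done


def migrate_all(rows: List[Dict], chunk_size: int = 50) -> int:
    """Keep migrating in small chunks until nothing is left to convert."""
    total = 0
    while True:
        migrated = migrate_chunk(rows, chunk_size)
        if migrated == 0:
            return total
        total += migrated
```

Because each chunk is small, the work can proceed while the cloud keeps running, and a zero return value signals that the migration is complete.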
15
Reducing Scope Creep
• http://docs.openstack.org/developer/nova/project_scope.html
• Containers
– Nova is better for VMs
– But does support LXC+libvirt and Ironic
• nova-docker
– Removed from master due to lack of testing
– Currently unmaintained
16
Liberty
• Continued Architecture Evolution
• Upgrades
• API v2.1
• Reducing Scope Creep
• Over 60 Blueprints
• Over 400 Bug fixes
17
Mitaka and Beyond
18
Cells
v1
• Cells is optional
• Not all features supported
• Sync instance between DBs
v2
• Default is one v2 cell
• New API database
• Tools to migrate from cells v1
[Diagram: API; Compute Cell 1; Compute Cell 2; Compute Cell 3]
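A toy sketch of the v2 idea: an API-level database records which cell owns each instance, so a request is routed to one cell's own records instead of instance data being synced between databases. `CellRouter` and its in-memory dictionaries are hypothetical stand-ins, not Nova code.

```python
from typing import Dict


class CellRouter:
    def __init__(self) -> None:
        # Stand-in for the API database: instance UUID -> cell name.
        self.instance_to_cell: Dict[str, str] = {}
        # Stand-in for each cell's own database: cell name -> records.
        self.cells: Dict[str, Dict[str, dict]] = {}

    def add_cell(self, name: str) -> None:
        self.cells[name] = {}

    def create(self, uuid: str, cell: str, record: dict) -> None:
        """Store the record in one cell and remember where it lives."""
        self.cells[cell][uuid] = record
        self.instance_to_cell[uuid] = cell

    def get(self, uuid: str) -> dict:
        """Look up the owning cell first, then read from that cell only."""
        cell = self.instance_to_cell[uuid]
        return self.cells[cell][uuid]
```

A deployment with a single cell behaves the same way, which matches one v2 cell being the default.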
19
More User Experience Focus
• API Documentation
• Progress Reporting and Error Handling
• Scheduler Improvements
• Feature Classification
20
Continued Process Evolution
• Mentoring and Explaining Why
• Product Working Group: Increases alignment
• Review bottleneck
– Focus existing review efforts
– Encourage more non-core reviews
• Releasing more often
5 MILLINGTON ROAD | HAYES, UNITED KINGDOM UB3 4AZ
US SALES: +44 (0)20 8712 6507 | UK SUPPORT: 0800 988 0300 | WWW.RACKSPACE.COM
© RACKSPACE LTD. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. |
WWW.RACKSPACE.COM
Thank you
@johnthetubaguy

Editor's Notes

  • #3 Mission hasn’t changed. Lack of alignment is a big cause of friction.
  • #4 Lack of alignment around the mission causes friction. Between developers, and between developers and users.
  • #5 https://upload.wikimedia.org/wikipedia/commons/7/76/Blue_Linckia_Starfish.JPG https://images.unsplash.com/photo-1431794062232-2a99a5431c6c?q=80&fm=jpg&s=2a0c6cb067ffaef134e053d94f555d91 To get the strong ecosystem, API needs to be interoperable and useful.
  • #6 Pet VMs want Server “HA”. This is out of scope for Nova, but we have work to add supporting APIs.
  • #8 https://upload.wikimedia.org/wikipedia/commons/7/71/Florence_dome_Cigoli_drawing_colour_corrected.jpg API work is part of a wider effort. Look at what our users need, and work out how to achieve it.
  • #9 Let’s take a look at our users, and what they want. Reference: https://dague.net/2015/06/05/the-nova-api-in-kilo-and-beyond-2/
  • #11 https://upload.wikimedia.org/wikipedia/commons/7/78/Airforce_forklift.jpg https://images.unsplash.com/photo-1429497419816-9ca5cfb4571a?q=80&fm=jpg&s=4bf1164d23eea4f04aeefe1732149cf3 This talk will focus on the control plane
  • #12 It’s been a long road; reviews have to check every patch set for upgrade issues.
    – Structure: API, Compute, Conductor, Scheduler, etc.
    – RPC: messages and the data in them. New nodes are able to send old messages to old nodes, and new nodes accept old messages from old nodes. The RPC version pin lets us pick which version of messages to send, so we can work with old nodes.
    – oslo.versionedobjects: strongly typed fields and versioning; the desired schema, independent of the DB. Used in RPC messages instead of dicts; not included in the RPC version.
    – nova-conductor: always the newest version. Isolates nova-compute from the DB. Now able to backport objects when new code talks to old code.
    – DB: separated data migrations (in objects) and schema migrations. Many data migrations happen gradually, with a nova-manage command to force completion, in small chunks. Trying to automate expand/messy bits/contract migrations, but currently attempted by hand.
    – Tests: the nova partial-ncpu job upgrades all the control services but leaves the compute service on the older code.
  • #13 http://www.danplanet.com/blog/2015/06/26/upgrading-nova-to-kilo-with-minimal-downtime/ Aim: zero downtime. Note: no rollback
  • #14
    (1) Expand DB; check all data migrations are complete; remove any cruft from previous releases.
    (2) Pin RPC; upgrade all the control plane together, but conductor first.
    (3) Talk about graceful compute shutdown, and its limitations.
    (4) Unpin RPC.
    (??) Can’t I do the API last?
    (??) What is all this nonsense about RPC versions?
  • #15 Summary of new upgrade strategy
  • #16 Something about what we have to stop doing, to make time for all the good stuff. Note the heat project was created because it was considered out of scope for Nova. Searchlight is very similar, in some ways.