Nova states summit
Nova states summit: Presentation Transcript

  • Moving to structured state management in OpenStack
      Yahoo! and NTT Data
  • Deployer use cases
      • As a deployer I want to ensure that an instance is reserved & provisioned without falling back and/or reporting internal OpenStack errors to users.
      • As a deployer I want to be able to allocate, schedule and reserve resources before they are consumed so that I can make advanced/complex/custom scheduling decisions using the combination of those resources as a whole.
      • I want to convey to my users that OpenStack is a reliable and dependable system that is resilient to API outages, resource failures…
  • Developer use cases
      • I want to be able to add new (and improved!) states to OpenStack and know, in an easy-to-understand manner, what the impact will be on the other states in OpenStack.
      • I want to be able to undo (and redo) resource allocation decisions in a transactional and verifiably correct manner on errors or on other ‘smart’ algorithmic placement logic.
      • I want to be able to quickly and easily understand an API request from start to finish & I want other developers to have a single place to understand the same.
  • User use cases
      • I want to ensure that my instances are reliably brought up without involving myself to resolve (or raise to support) errors inside of OpenStack.
      • I want to ensure that my instances (and associated resources) are optimally scheduled in a reliable and correct manner or not have them scheduled to begin with.
      • I want my resources to be fully utilized, and not have zombie resources being ‘locked’ due to the lack of transactional semantics (and recovery) in the underlying code.
  • The problem
      • Ad-hoc distributed state transitions are hard to [follow, recover from, debug, extend, audit, make reliable and correct…].
          – Created by continual placement of new features without revisiting the underlying state management system.
      • The never-ending battle between new hotness and stability
          – Majority of focus (understandably) on getting OpenStack operational.
          – Typical technical debt.
      • Acceptable for a new project like OpenStack to get off the ground, but now is the time to focus on features that add stability/scalability...
  • The problem
      • Inter-state ‘cutting’ results in instances which require manual or periodic tasks to recover.
          – Distributed systems should always be able to automatically recover from failures, and not require manual/periodic intervention.
      • Continually adding local [solutions, fixes, patches]
      • Lack of [focus, time, desire] to fix the system as a whole?
      • How many inter-state race conditions are hiding underneath the covers??
          – Can verification even be done with the current codebase (in a reasonable time period)?
  • CREATE SERVER API
      [Diagram: numbered request flow, steps 1 to 16. Labels: admin/user, nova-request, nova-api, keystone, glance, MySQL, RabbitMQ, nova-scheduler, compute, Libvirt, Volume Service, Network Service.]
  • Create Server - Transitions and States

      ID  Service             Operation              vm_state  task_state            power_state
      1   Nova API            Initial State          -         -                     -
      2   Keystone            Authenticate user      -         -                     -
      3   Nova API/Glance     Show image             -         -                     -
      4   Nova API/MySQL      Create entry           BUILDING  SCHEDULING            -
      5   Nova API/RabbitMQ   Cast to Scheduler      BUILDING  SCHEDULING            -
      6   Scheduler           Received at Scheduler  BUILDING  SCHEDULING            -
      7   Scheduler/RabbitMQ  Cast to Compute        BUILDING  SCHEDULING            -
      8   Compute             Received at Compute    BUILDING  SCHEDULING            -
      9   Compute/Glance      Show image             BUILDING  SCHEDULING            -
      10  Compute/MySQL       Update DB              BUILDING  NETWORKING            -
      11  Compute/RabbitMQ    Call on Network        BUILDING  NETWORKING            -
      12  Network             Allocate Network       BUILDING  NETWORKING            -
      13  Compute/Volume      Attach volume          BUILDING  BLOCK_DEVICE_MAPPING  -
      14  Compute/MySQL       Update DB              BUILDING  SPAWNING              -
      15  Compute/Libvirt     Spawn instance         BUILDING  SPAWNING              -
      16  Compute/MySQL       Update DB              ACTIVE    None                  RUNNING
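
The table above can also be read as an ordered list of state snapshots. The following is a minimal Python sketch of that reading, not Nova code: the names `STEPS` and `distinct_transitions` are illustrative, and the rows are transcribed from the slide (the strings "-" and "None" are kept exactly as the table prints them). It collapses the 16 steps into the points where the recorded state actually changes.

```python
# Illustrative only: rows transcribed from the slide's table, not taken from Nova.
# Each row: (id, service, operation, vm_state, task_state, power_state).
STEPS = [
    (1, "Nova API", "Initial State", "-", "-", "-"),
    (2, "Keystone", "Authenticate user", "-", "-", "-"),
    (3, "Nova API/Glance", "Show image", "-", "-", "-"),
    (4, "Nova API/MySQL", "Create entry", "BUILDING", "SCHEDULING", "-"),
    (5, "Nova API/RabbitMQ", "Cast to Scheduler", "BUILDING", "SCHEDULING", "-"),
    (6, "Scheduler", "Received at Scheduler", "BUILDING", "SCHEDULING", "-"),
    (7, "Scheduler/RabbitMQ", "Cast to Compute", "BUILDING", "SCHEDULING", "-"),
    (8, "Compute", "Received at Compute", "BUILDING", "SCHEDULING", "-"),
    (9, "Compute/Glance", "Show image", "BUILDING", "SCHEDULING", "-"),
    (10, "Compute/MySQL", "Update DB", "BUILDING", "NETWORKING", "-"),
    (11, "Compute/RabbitMQ", "Call on Network", "BUILDING", "NETWORKING", "-"),
    (12, "Network", "Allocate Network", "BUILDING", "NETWORKING", "-"),
    (13, "Compute/Volume", "Attach volume", "BUILDING", "BLOCK_DEVICE_MAPPING", "-"),
    (14, "Compute/MySQL", "Update DB", "BUILDING", "SPAWNING", "-"),
    (15, "Compute/Libvirt", "Spawn instance", "BUILDING", "SPAWNING", "-"),
    (16, "Compute/MySQL", "Update DB", "ACTIVE", "None", "RUNNING"),
]


def distinct_transitions(steps):
    """Return (from_state, to_state, step_id) for every step that changes the state.

    A state is the (vm_state, task_state, power_state) triple of a row.
    Steps that leave the triple untouched are skipped.
    """
    changes = []
    previous = steps[0][3:]
    for step in steps[1:]:
        current = step[3:]
        if current != previous:
            changes.append((previous, current, step[0]))
            previous = current
    return changes


if __name__ == "__main__":
    for before, after, step_id in distinct_transitions(STEPS):
        print(f"step {step_id}: {before} -> {after}")
```

Read this way, only a handful of the 16 steps change the recorded state; the remaining steps do work across services while the record stays the same.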
  • What happens if we cut here?? Or here?? Or here??
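
One way to make the question concrete is to stop the create flow at an arbitrary step and look at the state left behind. The sketch below is a hypothetical simulation, not Nova code: `STATE_AT_STEP`, `run_until_cut`, and `is_stranded` are illustrative names, and the step numbers and states are taken from the transitions table.

```python
# Hypothetical simulation; step numbers and states follow the slide's table.
# Maps the first step at which the table shows a new (vm_state, task_state) pair.
STATE_AT_STEP = {
    4: ("BUILDING", "SCHEDULING"),
    10: ("BUILDING", "NETWORKING"),
    13: ("BUILDING", "BLOCK_DEVICE_MAPPING"),
    14: ("BUILDING", "SPAWNING"),
    16: ("ACTIVE", None),
}


def run_until_cut(cut_after_step):
    """Run steps 1..cut_after_step and return the state a later observer would see."""
    state = (None, None)  # nothing is recorded before step 4
    for step in range(1, cut_after_step + 1):
        state = STATE_AT_STEP.get(step, state)
    return state


def is_stranded(cut_after_step):
    """True if the cut leaves the instance in BUILDING with no step left to advance it."""
    vm_state, _task_state = run_until_cut(cut_after_step)
    return vm_state == "BUILDING"


if __name__ == "__main__":
    for cut in (3, 7, 12, 15, 16):
        print(cut, run_until_cut(cut), "stranded" if is_stranded(cut) else "ok")
```

In this model, a cut after step 12 leaves the record at BUILDING/NETWORKING even though the table shows a network already allocated at that point, and nothing in the flow itself is responsible for finishing or undoing that work.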
  • Solutions solutions solutions
      • Nova has mostly stabilized (code-wise)
          – It appears to be a good time to rethink, and rework, some of the foundations (with as little impact as we can).
          – Eventually, as other core components (Quantum) stabilize, similar analysis can be done there (if needed).
      • Prototype a potential solution and discuss next steps with the community.
          – That’s why we are here, folks
  • Create request without orchestration
  • Create request with orchestration
  • Key Benefits
      • Less scattering of state management
          – Makes it easier to understand…
      • Less scattering of recovery scenarios
          – Clearly defined rollbacks…
      • Faster and more dependable resource acquisition
          – Compute node will perform initialization and final acquisition of resources.
          – Reservations and initial acquisitions will be done before the request to provision instances, hence faster VM spawns.
      • Scheduler can make better ‘overall’ scheduling decisions.
          – Ex. no need for compute <-> scheduler retry hacks
          – Can make advanced scheduling decisions based on volume choices, locality, network choices... When you are able to acquire/release resources before their use, anything is possible…
          – No more need for hinting...
      • Creates a single place where others can extend or alter nova state transitions to plug in their own ‘custom/internal’ state transitions.
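
The "clearly defined rollbacks" and "reserve before use" points can be sketched as a small orchestrator that applies reservation steps in order and, when a later step fails, reverts the completed ones in reverse order. This is a minimal illustration under stated assumptions, not the prototype discussed in the talk: `Step`, `run_with_rollback`, and the in-memory resource names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One unit of work with an explicit way to undo it (hypothetical type)."""
    name: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def run_with_rollback(steps: List[Step]) -> bool:
    """Apply steps in order; on failure, revert the completed steps in reverse order."""
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception:
            for finished in reversed(done):
                finished.revert()
            return False
        done.append(step)
    return True


# Hypothetical in-memory stand-in for reserved resources.
reserved = set()


def reserve(resource: str) -> Callable[[], None]:
    return lambda: reserved.add(resource)


def release(resource: str) -> Callable[[], None]:
    return lambda: reserved.discard(resource)


def fail() -> None:
    raise RuntimeError("no capacity on compute host")


# Network and volume are reserved first; the compute step then fails,
# so both earlier reservations are released again.
ok = run_with_rollback([
    Step("network", reserve("network"), release("network")),
    Step("volume", reserve("volume"), release("volume")),
    Step("compute", fail, lambda: None),
])
print(ok, reserved)
```

Keeping every step's undo next to its apply is what puts recovery logic in one place: the failure path is the same list walked backwards, instead of cleanup code scattered across services.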