
Learning to Scale OpenStack: An Update from the Rackspace Public Cloud


At the Portland and Hong Kong Summits, Rackspace invited the OpenStack community into their experiences deploying OpenStack trunk to their Public Cloud infrastructure. In this presentation, Rackspace's Deployment Systems Team provides an update on the latest challenges, triumphs, and lessons learned deploying and operating a production OpenStack public cloud during the Icehouse cycle. We conclude by sharing the vision for our next steps in OpenStack deployments during the Juno cycle and beyond.



  1. Learning to Scale OpenStack – An Update from the Rackspace Public Cloud – Rainya Mosher and Jesse Keating, Deployment Engineering – @rainyamosher @iamjkeating
  2. #rackstackatl The Rackspace Public Cloud • 6 public regions • 3 pre-production regions • Tens of thousands of nodes • Growing continually • Frequent deployments • Staying aligned with upstream
  3. #rackstackatl Our Old Challenges • We could not deploy code in a reasonable window of time • We did not have confidence in the code we were deploying • We could not keep up with upstream
  4. #rackstackatl Old Challenges Met – Then: • Deploys taking 6+ hours • Deploys often failed the first time • Migrations were an unknown factor • Deploys roughly 2 months behind upstream – Now: • Deploys take an hour, as short as 10 minutes • Deploys rarely fail the first time • Migrations tested upstream and timed downstream • Still up to 2 months behind
  5. #rackstackatl It is by riding a bicycle that you learn the contours of a country best, since you have to sweat up the hills and coast down them. ~ Ernest Hemingway
  6. #rackstackatl Our New Challenges • Scaling Services • Scaling Deployments • Scaling Frequency
  7. #rackstackatl Scaling Services
  8. #rackstackatl Scaling Glance • Scheduled Images feature went live • Glance saw much more usage • Glance servers became saturated • Builds and snapshots slowed down, eventually piling up faster than they could be consumed • Resolved by: – Scaling the number of glance-api nodes – Scaling the size of glance-api nodes – Scaling use of the glance-bypass feature (see sketch below)
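
Purely as an illustration of the first two "Resolved by" items (the group, template, and variable names below are assumptions, not Rackspace's actual playbooks), scaling a stateless service like glance-api in Ansible can be as simple as growing an inventory group and raising the per-node worker count:

```yaml
# Hypothetical Ansible play showing the two glance-api scaling knobs mentioned
# above: a larger inventory group (more nodes) and more worker processes per
# node. Group, template, and variable names are illustrative assumptions.
- hosts: glance_api            # inventory group grown as demand grew
  vars:
    glance_api_workers: 16     # per-node API workers (glance-api.conf "workers" option)
  tasks:
    - name: render glance-api.conf with the larger worker count
      template:
        src: templates/glance-api.conf.j2
        dest: /etc/glance/glance-api.conf
      notify: restart glance-api
  handlers:
    - name: restart glance-api
      service:
        name: glance-api
        state: restarted
```
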
  9. #rackstackatl Scaling Nova Cells • Performance Cells went live • More and more cells added to regions • The nova-cells service became a single funnel slowing down the exchange of data • Eventually our single nova-cells service could not consume messages faster than they were being produced • Resolved by: – Scaling the number of nova-cells services (see sketch below) – Optimizing instance healing calls – Optimizing database usage from the cells service
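
A minimal sketch of what "scaling the number of nova-cells services" can look like, assuming a per-instance init script and an Ansible inventory group (both hypothetical, not the production deployment): several nova-cells processes consume from the same message queue, so the backlog drains in parallel.

```yaml
# Hypothetical sketch of running several nova-cells consumers per cell host so
# messages are drained by more than one process. The unit name, group name,
# and instance count are assumptions for illustration only.
- hosts: nova_cells
  tasks:
    - name: start multiple nova-cells workers sharing the same message queue
      service:
        name: "nova-cells-{{ item }}"   # assumes per-instance init scripts exist
        state: started
        enabled: yes
      with_sequence: count=4            # arbitrary illustrative consumer count
```
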
  10. #rackstackatl How do we anticipate where our growth will hurt and proactively scale to match?
  11. #rackstackatl Scaling Deployments
  12. #rackstackatl Higher Form Orchestration • Pre-staging content outside of the deploy window • Increased tolerance of "downed" hosts • Targeted bring-up of services – API first, then computes • More deployment options – Factonly – Cellonly – No migrations • Reduced complexity – Single entry point: bin/deploy – Single orchestration system: Ansible (see sketch below)
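
Most of these bullets map onto stock Ansible features. A minimal sketch, assuming hypothetical group and role names (this is not the real bin/deploy playbook): max_fail_percentage gives the tolerance of downed hosts, and play ordering plus tags give the targeted bring-up of APIs before computes.

```yaml
# Minimal sketch of the targeted bring-up and downed-host tolerance described
# above, using stock Ansible features. Group names, role names, batch sizes,
# and thresholds are assumptions, not the actual Rackspace bin/deploy playbook.
- hosts: api_nodes
  serial: 10                   # roll the control plane in small batches
  max_fail_percentage: 20      # keep deploying even if a few hosts are down
  tags: [api]
  roles:
    - nova_api
    - glance_api

- hosts: compute_nodes
  max_fail_percentage: 5
  tags: [compute]
  roles:
    - nova_compute
```

Deploy modes such as factonly, cellonly, or "no migrations" could then be expressed as tags or extra variables passed through the single entry point, for example something like ansible-playbook site.yml --tags api (again, illustrative only; the real bin/deploy wrapper hides these details).
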
  13. #rackstackatl We still treat OpenStack as a legacy software deployment. As a community we need to treat it more like a cloud application, but that requires collaboration!
  14. #rackstackatl Scaling Frequency
  15. #rackstackatl It never gets easier, you just go faster. ~ Greg LeMond
  16. #rackstackatl Scaling Change • New features coming • New configurations coming • Accommodate without interrupting the customer experience • Change faster, change frequently, on an ever-growing fleet of systems • Resolved by: – Understanding change before it happens – Scheduling changes to not conflict – Dedicating release iterations to risky change on top of known-good code – Custom deploy modes per change type
  17. #rackstackatl Customer experience is our most important measurement of how fast we can scale.
  18. #rackstackatl The Next Iteration
  19. #rackstackatl Zero Perceived Downtime • Leverage the object model in Icehouse for mixed-version services • Implement the Nova conductor service (see sketch below) • Investigate read-only states
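
At the configuration level, "implement the Nova conductor service" means routing compute-node database access through nova-conductor instead of direct database calls, via the Icehouse-era [conductor] use_local option. A hedged sketch, with hypothetical group names and rollout details:

```yaml
# Hedged illustration of pointing nova-compute at the dedicated nova-conductor
# service instead of local database access (the Icehouse-era [conductor]
# use_local option). Group names and the rollout itself are assumptions.
- hosts: compute_nodes
  tasks:
    - name: route compute database access through nova-conductor
      ini_file:
        dest: /etc/nova/nova.conf
        section: conductor
        option: use_local
        value: "False"
      notify: restart nova-compute
  handlers:
    - name: restart nova-compute
      service:
        name: nova-compute
        state: restarted
```
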
  20. #rackstackatl Individual Service Deployment Pipelines • Can we give Glance its own pipeline and deployment capability, independent of Nova or other services? • How do we combat the exponential growth of service version combinations? • Does this actually make the whole pipeline any faster?
  21. #rackstackatl Fully Automated Environments • Creating not just ephemeral environments, but production ones as well • Upgrades are easy; initial setups are a lot harder • Validation is critical • Developers and operators need to collaborate on this use case when services are being designed
  22. #rackstackatl I have always struggled to achieve excellence. One thing that cycling has taught me is that if you can achieve something without a struggle it's not going to be satisfying. ~ Greg LeMond
  23. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 | US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC., REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.
