Learning to Scale OpenStack: An Update from the Rackspace Public Cloud

At the Portland and Hong Kong Summits, Rackspace invited the OpenStack community into their experiences deploying OpenStack trunk to their Public Cloud infrastructure. In this presentation, Rackspace's Deployment System Team will provide an update on the latest challenges, triumphs, and lessons learned deploying and operating a production OpenStack public cloud during the Icehouse cycle. We'll conclude by sharing the vision for our next steps in OpenStack deployments during the Juno cycle and beyond.

  • Introductions, welcome to the talk.
  • A review of the Rackspace Public Cloud – sets the context for the conversation
  • This is our third summit presenting on this topic. Here is a brief review of some of the scale issues we were facing back at the Havana Summit in Portland
    Our window was 30 minutes of perceived downtime within 4-hour deploy windows
    Code coverage wasn't great; lots of errors were discovered in production
    Upstream moved very fast, and we couldn't keep up with all the testing downstream
  • Here is a comparison of how we met some of our challenges
    Our deploys are much faster – some as short as 10 minutes total in our largest environment, with 3 minutes of API interruption
    Deploys are now more reliable
    Migration timing is known ahead of time, and bad migrations are blocked upstream (a rough timing sketch follows this note)
    We still haven't solved keeping up with upstream. Many factors there.
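    A minimal sketch, not from the talk, of one way to learn migration timing downstream: run the schema migration against a restored copy of production data and record how long it takes, so the deploy window can be planned around it. The nova-manage command is real; pointing it at a production-sized database copy is assumed.

        # Time a database migration so the deploy window can be planned
        # around the measured duration.
        import subprocess
        import time

        def time_migration(command=("nova-manage", "db", "sync")):
            """Run the migration command and return how long it took in seconds."""
            start = time.monotonic()
            subprocess.check_call(command)
            return time.monotonic() - start

        if __name__ == "__main__":
            print("migration took %.1f seconds" % time_migration())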
  • We are also learning the contours of OpenStack by being the largest public cloud operator. We get to sweat up the hills and coast back down.
  • Some of our new challenges – scaling is about more than just deploying bits onto nodes as fast as we can.
    Scaling Services
    Scaling Deployments
    Scaling Frequency
    While we are trying to be a thought leader and front-runner, collaboration is the key to success. The developer, operator, and testing communities need to be aware of these scaling challenges.
  • Scaling Services – As the size of our cloud grows and the features of our cloud grow, the services used need to scale along with them. Here we will walk through two scaling scenarios that highlight the challenge.
  • Glance is an interesting case. Our Glance acts as an intermediary between hypervisors and Swift. As Glance got used more, a bottleneck emerged – partly due to our own configuration, but partly due to the nature of Glance. (A rough sketch of spreading image traffic across glance-api nodes follows this note.)
    Once we resolve the Glance issues, Swift could be the next bottleneck; care will be needed to make sure we don't just kick performance problems down the line to the next group.
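    A minimal sketch, assuming hypothetical endpoints, of the idea behind scaling the number of glance-api nodes: spread image API traffic across several nodes instead of saturating one. In practice nova-compute is simply pointed at a list of glance-api servers in its configuration; this stand-alone example only illustrates the fan-out.

        # Round-robin image metadata requests across several glance-api nodes.
        # Endpoint hostnames are hypothetical.
        import itertools
        import requests

        GLANCE_ENDPOINTS = [
            "http://glance01.example.com:9292",
            "http://glance02.example.com:9292",
            "http://glance03.example.com:9292",
        ]
        _endpoints = itertools.cycle(GLANCE_ENDPOINTS)

        def fetch_image_metadata(image_id, token):
            """Fetch image metadata from the next glance-api node in the rotation."""
            endpoint = next(_endpoints)
            resp = requests.get(
                "%s/v2/images/%s" % (endpoint, image_id),
                headers={"X-Auth-Token": token},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()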
  • The nova-cells service is responsible for relaying data between the global cell and all the child cells. Doing this with just a single instance was never going to scale; we just ran out of runway before the pain hit. (A sketch of the consumer-scaling pattern follows this note.)
    Through collaboration with upstream, we are now better able to scale out nova-cells as our cell counts grow.
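    A minimal, generic sketch of the scaling pattern described above: run several consumers against one work queue so messages are drained faster than a single process can manage. This is plain Python, not the nova-cells code itself.

        # Multiple workers draining one queue of cell messages; adding workers
        # is the analogue of running more nova-cells services per region.
        import multiprocessing as mp
        import queue

        def cell_message_worker(worker_id, messages):
            """Consume messages until the queue is empty."""
            while True:
                try:
                    msg = messages.get(timeout=1)
                except queue.Empty:
                    return
                print("worker %d handled %s" % (worker_id, msg))

        if __name__ == "__main__":
            work = mp.Queue()
            for i in range(100):
                work.put({"instance_update": i})

            workers = [mp.Process(target=cell_message_worker, args=(n, work))
                       for n in range(4)]
            for w in workers:
                w.start()
            for w in workers:
                w.join()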
  • These challenges will repeat. New bottlenecks will be found and new resource limits will be discovered. Staying ahead of the pain is key. We will not be the only ones to experience this; we are looking for collaboration on how best to manage this kind of scale.
  • Our next scale challenge involves deployments.
    We made great strides around Havana; what have we been doing since?
  • Orchestration has been our theme around deployments. We continue to iterate on the parts of the deployment causing the most pain, always making improvements for the next time.
    Walk through each block and explain why the change was made
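    A minimal sketch, with hypothetical host groups and commands, of the targeted bring-up described on the orchestration slide: pre-stage content outside the window, then bring services up in a deliberate order – API nodes first, computes afterwards, tolerating a few downed hosts. The real orchestration is Ansible behind a single bin/deploy entry point; this only illustrates the ordering.

        # Illustrative deploy ordering; "stage-release" and "activate-release"
        # are hypothetical commands standing in for the real tooling.
        import subprocess

        API_HOSTS = ["api01", "api02"]
        COMPUTE_HOSTS = ["compute01", "compute02", "compute03"]

        def run(host, command):
            subprocess.check_call(["ssh", host, command])

        def deploy(release):
            # 1. Pre-stage the release everywhere before the window opens.
            for host in API_HOSTS + COMPUTE_HOSTS:
                run(host, "stage-release %s" % release)

            # 2. Bring the API up on the new code first.
            for host in API_HOSTS:
                run(host, "activate-release %s" % release)

            # 3. Then roll the computes, tolerating a few downed hosts.
            failed = []
            for host in COMPUTE_HOSTS:
                try:
                    run(host, "activate-release %s" % release)
                except subprocess.CalledProcessError:
                    failed.append(host)
            return failed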
  • Even with the improvements, we still treat OpenStack like a legacy application: upgrading in place, not utilizing load balancers, stopping everything to migrate databases, preventing mixed versions, etc. There are many things preventing us from getting to zero downtime, and that's where we can all work together! (A sketch of the load-balancer-driven alternative follows this note.)
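    A hypothetical sketch of the "cloud application" style upgrade the note says we are not yet doing: drain one API node at a time behind a load balancer, upgrade it, and return it to the pool, so the API as a whole never disappears. The LoadBalancer class and upgrade() call are illustrative stand-ins, not real tooling.

        # Rolling upgrade of API nodes behind a (simulated) load balancer.
        class LoadBalancer:
            def __init__(self, hosts):
                self.active = set(hosts)

            def drain(self, host):
                self.active.discard(host)   # stop routing new requests to host

            def restore(self, host):
                self.active.add(host)       # put host back into rotation

        def upgrade(host, release):
            print("upgrading %s to %s" % (host, release))  # placeholder for real work

        def rolling_upgrade(lb, hosts, release):
            for host in hosts:
                lb.drain(host)
                upgrade(host, release)
                lb.restore(host)

        if __name__ == "__main__":
            api_hosts = ["api01", "api02", "api03"]
            rolling_upgrade(LoadBalancer(api_hosts), api_hosts, "2014.1")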
  • A third scale challenge is frequency. This is the scale of doing things much more often.
  • A very relevant quote – though unlike cycling, when you do something more often in the DevOps world it does tend to get easier. Still, there are challenges to going faster!
  • Change comes from many sources. These changes need to be distributed to the environments, but with as little customer impact as possible. If we can't deploy changes often enough, we fall behind upstream, we fall behind our features, and we have larger deployments to consume. A snowball effect.
    Our work on creating multiple release pipelines, improving our deployment methods, and moving our tests upstream has enabled us to move faster, but not fast enough.
  • This is our limit. We absolutely have to make this better. This is a global need, throughout the community of developers, operators, and testers.
  • A quick look at what we've got cooking for the Juno cycle
  • In Icehouse, Nova made great strides toward live upgrade with the object model and conductor, which give us the ability to run multiple versions of OpenStack at the same time. Notably, we could run a newer nova-api against an older version in the rest of the environment and shield nova-compute from migrations. This could let us roll the update through without API downtime and with less interruption to the computes. (A sketch of the read-only idea follows this note.)
    Investigate putting API nodes in read-only mode during migrations to satisfy some requests and queue others
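    A hypothetical WSGI middleware sketching the read-only idea above: while a migration window is open, let GET/HEAD/OPTIONS requests through and turn writes away with a Retry-After (queuing them, as the note suggests, would be a further step). This is not an existing OpenStack component.

        READ_METHODS = {"GET", "HEAD", "OPTIONS"}

        class ReadOnlyDuringMigration(object):
            """WSGI middleware that rejects writes while a migration runs."""

            def __init__(self, app, migration_in_progress):
                self.app = app
                # Callable reporting whether a migration window is open.
                self.migration_in_progress = migration_in_progress

            def __call__(self, environ, start_response):
                if (environ["REQUEST_METHOD"] not in READ_METHODS
                        and self.migration_in_progress()):
                    start_response(
                        "503 Service Unavailable",
                        [("Content-Type", "text/plain"), ("Retry-After", "120")],
                    )
                    return [b"API is read-only while a database migration runs.\n"]
                return self.app(environ, start_response)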
  • This is an ongoing conversation. If we allow each service to work independently, what does that do to the version test matrix? Can we reliably validate anything? While individual projects/services might go faster, does that allow the entire pipeline to go faster? This ties into the discussions happening now at the design summit about cross project interactions.
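    A back-of-the-envelope illustration of the version matrix concern: with S independently deployed services and V versions of each allowed in production at once, the combinations to validate grow as V ** S. The service list and version count below are only an example.

        # How quickly per-service pipelines inflate the test matrix.
        services = ["nova", "glance", "neutron", "cinder", "swift", "keystone"]
        versions_in_flight = 3  # e.g. previous, current, next

        combinations = versions_in_flight ** len(services)
        print("%d service-version combinations to validate" % combinations)  # 729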
  • Yeah, we need fully automated environments. Setting them up is hard; let's work together to make them easier.
    The ops meetups are great for collaborating on the issues at hand.
  • We do a lot of things that are hard, but if it wasn't hard, it wouldn't be as satisfying. That's what keeps us coming back.
    Scaling is more than just tossing code on nodes. There are a lot more considerations to take into account.
    The development, operator, and tester communities need to collaborate more on where the painful parts are, particularly at scale, and work together on solutions.
  • Learning to Scale OpenStack: An Update from the Rackspace Public Cloud

    1. An Update from the Rackspace Public Cloud – Learning to Scale OpenStack
       Rainya Mosher and Jesse Keating – Deployment Engineering
       @rainyamosher @iamjkeating
    2. The Rackspace Public Cloud
       6 Public Regions
       3 Pre-Production Regions
       10s of Thousands of nodes
       Growing continually
       Frequent deployments
       Staying aligned with upstream
    3. Our Old Challenges
       • We could not deploy code in a reasonable window of time
       • We did not have confidence in the code we were deploying
       • We could not keep up with upstream
    4. Old Challenges Met
       • Deploys taking 6+ hours → Deploys take an hour, as short as 10 minutes
       • Deploys often failed the first time → Deploys rarely fail the first time
       • Migrations were an unknown factor → Migrations tested upstream and timed downstream
       • Deploys roughly 2 months behind upstream → Still up to 2 months behind
    5. "It is by riding a bicycle that you learn the contours of a country best, since you have to sweat up the hills and coast down them." ~ Ernest Hemingway
    6. Our New Challenges
       • Scaling Services
       • Scaling Deployments
       • Scaling Frequency
    7. Scaling Services
    8. Scaling Glance
       • Scheduled Images feature went live
       • Glance saw much more usage
       • Glance servers became saturated
       • Builds and snapshots slowed down, eventually piling up faster than could be consumed
       • Resolved by:
         – Scaling number of glance-api nodes
         – Scaling size of glance-api nodes
         – Scaling use of glance-bypass feature
    9. Scaling Nova Cells
       • Performance Cells went live
       • More and more cells added to regions
       • Nova cells service became a single funnel slowing down the exchange of data
       • Eventually our single nova-cells service could not consume messages faster than they were being produced
       • Resolved by:
         – Scaling number of nova-cells services
         – Optimizing instance healing calls
         – Optimizing database usage from cells service
    10. How do we anticipate where our growth will hurt and proactively scale to match?
    11. Scaling Deployments
    12. Higher Form Orchestration
       • Pre-staging content outside of deploy window
       • Increased tolerance of "downed" hosts
       • Targeted bring up of services
         – API first, then computes
       • More deployment options
         – Factonly
         – Cellonly
         – No migrations
       • Reduced complexity
         – Single entry point: bin/deploy
         – Single orchestration system: Ansible
    13. We still treat OpenStack as a legacy software deployment. As a community we need to treat it more like a cloud application, but that requires collaboration!
    14. Scaling Frequency
    15. "It never gets easier, you just go faster." ~ Greg LeMond
    16. Scaling Change
       • New features coming
       • New configurations coming
       • Accommodate without interrupting customer experience
       • Change faster, change frequently, on an ever growing fleet of systems
       • Resolved by:
         – Understanding change before it happens
         – Scheduling changes to not conflict
         – Dedicating release iterations to risky change on top of known good code
         – Custom deploy modes per change type
    17. Customer Experience is our most important measurement of how fast we can scale.
    18. The Next Iteration
    19. Zero Perceived Downtime
       • Leverage object model in Icehouse for mixed-version services
       • Implement Nova conductor service
       • Investigate read-only states
    20. Individual Service Deployment Pipelines
       • Can we give Glance its own pipeline and deployment capability, independent of Nova or other services?
       • How do we combat the exponential growth of service version combinations?
       • Does this actually make the whole pipeline any faster?
    21. Fully Automated Environments
       • Creating not just ephemeral environments, but production ones as well
       • Upgrades are easy, initial setups are a lot harder
       • Validation is critical
       • Developers and Operators need to collaborate on this use case when services are being designed
    22. "I have always struggled to achieve excellence. One thing that cycling has taught me is that if you can achieve something without a struggle it's not going to be satisfying." ~ Greg LeMond
    23. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218
       US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM
       RACKSPACE® HOSTING | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES. | WWW.RACKSPACE.COM
