3. Benchmarking OpenStack
● How to ensure that OpenStack works at scale?
● How to detect performance issues quickly and
improve OpenStack scalability?
4. A straightforward way to benchmark OpenStack
● Generate load from concurrent users
● Capture key metrics: avg/max time, failure rate
○ VM provisioning
○ Floating IP allocation
○ Snapshot creation
● Verify that the cloud works fine
...
● PROFIT!!!
5. A straightforward way to benchmark OpenStack
● Generate load from concurrent users
● Capture key metrics: avg/max time, failure rate
○ VM provisioning
○ Floating IP allocation
○ Snapshot creation
● Verify that the cloud works fine
...
● PROFIT!!!
… but what if it breaks apart?
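The "straightforward" approach above can be sketched in a few lines of Python. This is a minimal illustration, not Rally itself: `provision_vm` is a hypothetical stand-in for a real provisioning call (e.g. booting a VM through the Nova API), simulated here with a short sleep.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

def provision_vm(user_id):
    """Hypothetical stand-in for a real provisioning call."""
    time.sleep(0.01)  # simulate API latency
    return True       # a real call would raise or return False on failure

def run_benchmark(concurrency, iterations):
    """Fire `iterations` provisioning calls from `concurrency` workers
    and report avg/max duration plus failure rate."""
    def timed_call(i):
        start = time.monotonic()
        try:
            ok = provision_vm(i)
        except Exception:
            ok = False
        return time.monotonic() - start, ok

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(iterations)))

    durations = [d for d, _ in results]
    failures = sum(1 for _, ok in results if not ok)
    return {
        "avg": statistics.mean(durations),
        "max": max(durations),
        "failure_rate": failures / iterations,
    }
```

Real benchmarking needs far more than this (SLA checks, per-action timing breakdowns, reporting, repeatability across deployments), which is exactly the gap the following slides describe.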
13. Improve OS cloud performance and scalability
● 3 common approaches:
○ Use better hardware
○ Deploy better
○ Make the code better
14. Improve OS cloud performance and scalability
● 3 common approaches:
○ Use better hardware
○ Deploy better
○ Make the code better
● But first we need data points:
○ Which part of the code is a bottleneck?
○ What hardware limits are hit, if any?
○ How does deployment topology influence performance?
16. What is Rally?
● Rally is a community-based project that allows
OpenStack developers and operators to get
relevant and repeatable benchmarking data of
how their cloud operates at scale.
● Wiki https://wiki.openstack.org/wiki/Rally
17. Relevant to both devs and operators
● Different types of user-defined workloads
○ For developers: synthetic tests, stress tests
○ For operators: real-life cloud usage patterns
● Flexible reporting
○ For developers: low-level profiling data, bottlenecks
○ For operators: high-level data about cloud
performance, highlights of bottlenecks within their
use case
18. How Rally works
● Input parameters:
○ Number of users
○ Number of tenants
○ Concurrency
○ Type of workload
○ Duration
● Rally deploys an OpenStack cloud through pluggable deploy engines
(DevStack, Fuel, Dummy, …) running on top of server providers
(Virsh, OpenStack, LXC, Amazon, …)
● Rally then runs the specified scenarios against that cloud
● Results:
○ Execution time breakdown
○ Failure rates
○ Graphics
○ Profiling data
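The input parameters above map directly onto Rally's JSON task format. A minimal sketch, roughly following the task samples Rally shipped at the time: `NovaServers.boot_and_delete_server` is one of Rally's stock scenarios, while the flavor and image names are illustrative placeholders.

```json
{
  "NovaServers.boot_and_delete_server": [
    {
      "args": {
        "flavor": {"name": "m1.tiny"},
        "image": {"name": "cirros"}
      },
      "runner": {
        "type": "constant",
        "times": 100,
        "concurrency": 10
      },
      "context": {
        "users": {
          "tenants": 2,
          "users_per_tenant": 5
        }
      }
    }
  ]
}
```

A file like this is passed to `rally task start`: the runner's `times` and `concurrency` control the load, and the `users` context creates the temporary tenants and users the scenario runs under.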
19. Benchmarking scenarios
● Both synthetic and real-life workloads (Workload 1, 2, 3, …) are run
against the OpenStack cloud
● The results yield data for developers:
○ Low-level profiling
○ Tomograph results
○ Graphs
● … and data for stakeholders:
○ Historical data
○ SLAs
○ Bottlenecks
20. Synthetic tests for developers
● Stress-test various OpenStack components
○ Large number of provisioned VMs per second
○ Large number of provisioned volumes per second
○ Large number of uploaded images per second
○ Large amount of active resources (VMs/images/volumes)
● Expose bottlenecks and uncover design issues in OpenStack
● Create a golden standard for everyone in the community to validate
against
21. How did we deploy OpenStack?
● Using Fuel
● On real hardware
● 3 physical controllers
● 500+ physical compute nodes
● In HA deployment mode with Galera, HAProxy, Corosync, Pacemaker
22. Large number of active VMs
A large number of active VMs shouldn’t affect the provisioning of new VMs
23. Large number of concurrent users
Average time of booting and deleting VMs with different numbers of
concurrent users
24. Profiling with Tomograph and Zipkin
Highlights:
● Launching 3 VMs
○ 336 DB queries
○ 74 RPC calls
● Deleting 3 VMs under high load
○ 1-minute global DB lock on the quotas table
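The kind of per-operation query counting behind a number like "336 DB queries" can be approximated with a statement trace hook. A minimal sketch using the stdlib `sqlite3` module rather than Tomograph/Zipkin (the real toolchain traces across services and message queues); `count_queries` and `sample_operation` are illustrative names, not part of any of those tools.

```python
import sqlite3

def count_queries(conn, operation):
    """Run operation() and return how many SQL statements it issued.

    sqlite3's trace callback fires once per statement handed to the
    SQLite engine, so everything the driver emits is counted.
    """
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        operation()
    finally:
        conn.set_trace_callback(None)
    return len(statements)

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # autocommit: the driver emits no implicit BEGIN

def sample_operation():
    conn.execute("CREATE TABLE vms (id INTEGER)")
    conn.execute("INSERT INTO vms VALUES (1)")
    conn.execute("SELECT id FROM vms")

n = count_queries(conn, sample_operation)  # 3 statements traced
```

In a real deployment the same idea is applied at the service's database layer, attributing each statement to the API request that triggered it.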
25. Why real workloads in addition to synthetic?
● Rationale
○ In the real world, scenarios are more complicated than an immediate
“boot-destroy”
○ Workloads rarely change, while OpenStack and its topology/configuration
change often
○ Usage profiles are specific to each business
● Expected outcome
○ Let companies specify their existing workload and benchmark the cloud
against it
○ Let companies share these workloads
26. What to benchmark
● VM lifecycle: provision VMs → use VMs → destroy VMs
● For both provisioning and destroying:
1. How long (on average)?
2. How long (maximum)?
3. Success rate?
28. Another workload representation
What it shows:
● Areas of biggest concern
● A baseline for all future changes (OpenStack version, deployment
topology, Neutron plugin)
29. What we ultimately want to achieve
● Provide a mechanism to easily define workloads
● Let users benchmark their cloud under a specified workload
● Provide historical data on all applied optimizations to see whether
they lead to better performance
30. Roadmap
● Greatly improve profiling capabilities to quickly
pinpoint problem location
● Extend workload definitions to support richer and
more realistic tests, combine workloads
● Support historical data and provide means of
comparison/analytics
● Better correlation between business KPIs and
reporting
32. Join Rally community
● It’s up to you to make Rally better
● Join our team:
○ Wiki: https://wiki.openstack.org/wiki/Rally
○ Project space: https://launchpad.net/rally
○ IRC chat: #openstack-rally on irc.freenode.net