Rally--OpenStack Benchmarking at Scale

Rally, an OpenStack benchmarking tool, shows you how to detect OpenStack bottlenecks and design issues.

Published in: Technology, Business

  1. OpenStack Benchmarking [at scale]
     Boris Pavlovic, Mirantis, 2013
  2. Agenda
     ● Benchmarking OpenStack at scale
       ○ What? Why? How?
     ● Rally
       ○ What is Rally?
       ○ Vision
       ○ Examples and results
  3. Benchmarking OpenStack
     ● How to ensure that OpenStack works at scale?
     ● How to detect performance issues quickly and improve OpenStack scalability?
  4. A straightforward way to benchmark OpenStack
     ● Generate load from concurrent users
     ● Capture key metrics: avg/max time, failure rate
       ○ VM provisioning
       ○ Floating IP allocation
       ○ Snapshot creation
     ● Verify that the cloud works fine ...
     ● PROFIT!!!
  5. A straightforward way to benchmark OpenStack
     ● Generate load from concurrent users
     ● Capture key metrics: avg/max time, failure rate
       ○ VM provisioning
       ○ Floating IP allocation
       ○ Snapshot creation
     ● Verify that the cloud works fine ...
     ● PROFIT!!!
     … but what if it breaks apart?
  6. Incorrect deployment setup?
  7. Non-optimal hardware?
  8. Bug in the code?
  9. RTFM: Did you take enough time to educate yourself? ;)
  10. Really?
  11. Read the docs… (after an hour)
  12. There should be an
  13. Improve OS cloud performance and scalability
      ● 3 common approaches:
        ○ Use better hardware
        ○ Deploy better
        ○ Make the code better
  14. Improve OS cloud performance and scalability
      ● 3 common approaches:
        ○ Use better hardware
        ○ Deploy better
        ○ Make the code better
      ● But we need data points
        ○ Which part of the code is a bottleneck?
        ○ What hardware limits are hit, if any?
        ○ How does deployment topology influence performance?
  15. Shine a light in the darkness: RALLY
  16. What is Rally?
      ● Rally is a community-based project that allows OpenStack developers and operators to get relevant and repeatable benchmarking data of how their cloud operates at scale.
      ● Wiki: https://wiki.openstack.org/wiki/Rally
  17. Relevant to both devs and operators
      ● Different types of user-defined workloads
        ○ For developers: synthetic tests, stress tests
        ○ For operators: real-life cloud usage patterns
      ● Flexible reporting
        ○ For developers: low-level profiling data, bottlenecks
        ○ For operators: high-level data about cloud performance, highlights of bottlenecks within their use case
  18. How Rally works
      [Diagram: RALLY deploys an OpenStack cloud, runs the specified scenarios, and gets results]
      ● Deploy OpenStack cloud
        ○ Deploy engines: DevStack, Fuel, Dummy, …
        ○ Server providers: Virsh, OpenStack, LXC, Amazon, …
      ● Run specified scenarios, with parameters:
        ○ Number of users
        ○ Number of tenants
        ○ Concurrency
        ○ Type of workload
        ○ Duration
      ● Get results
        ○ Execution time breakdown
        ○ Failure rates
        ○ Graphs
        ○ Profiling data
  19. Benchmarking scenarios
      [Diagram: synthetic and real-life workloads (Workload 1, Workload 2, Workload 3) run against the OpenStack cloud and produce results]
      ● Synthetic workloads → Data for Developers
        ○ Low-level profiling
        ○ Tomograph results
        ○ Graphs
      ● Real-life workloads → Data for Stakeholders
        ○ Historical data
        ○ SLAs
        ○ Bottlenecks
  20. Synthetic tests for developers
      ● Stress-test various OpenStack components
        ○ Large number of provisioned VMs per second
        ○ Large number of provisioned volumes per second
        ○ Large number of uploaded images per second
        ○ Large number of active resources (VMs/images/volumes)
      ● Expose bottlenecks and uncover design issues in OpenStack
      ● Create a gold standard for everyone in the community to validate against
  21. How did we deploy OpenStack?
      ● Using Fuel
      ● On real hardware
      ● 3 physical controllers
      ● 500+ physical compute nodes
      ● In HA deployment mode with Galera, HAProxy, Corosync, Pacemaker
  22. Large number of active VMs
      A large number of active VMs shouldn’t affect the provisioning of new VMs
  23. Large number of concurrent users
      Average time to boot and delete VMs with different numbers of concurrent users
  24. Profiling with Tomograph and Zipkin
      Highlights:
      ● Launch 3 VMs
        ○ 336 DB queries
        ○ 74 RPC calls
      ● Delete 3 VMs under high load
        ○ 1 minute global DB lock on quotas table
  25. Why real workloads in addition to synthetic?
      ● Rationale
        ○ In the real world, scenarios are more complicated than an immediate “boot-destroy”
        ○ Workloads rarely change; OpenStack and its topology/configuration change often
        ○ Profiles are specific to each business
      ● Expected outcome
        ○ Let companies specify their existing workload and benchmark their cloud against it
        ○ Let companies share
  26. What to benchmark
      Phases: Provision VMs → Use VMs → Destroy VMs
      For provisioning and for destroying VMs:
        1. How long (on average)?
        2. How long (maximum)?
        3. Success rate?
  27. Detailed benchmark of each step
      [Diagram: per-component timing breakdown of the Provision VMs, Use VMs, and Destroy VMs phases. Components shown: nova-api, nova-db, schedule, glance, network, compute. Timing labels shown: 1s, 2s, 9s, 4s, 8s, 2m, 1s, 2s, 9s, 4s, 8s]
  28. Another workload representation
      What it shows
      ● Areas of biggest concern
      ● A baseline for all future changes (OpenStack version, deployment topology, Neutron plugin)
  29. What we ultimately want to achieve
      ● Provide a mechanism to easily define workloads
      ● Let users benchmark their cloud under a specified workload
      ● Provide historical data on all applied optimizations to see whether they improve performance
  30. Roadmap
      ● Greatly improve profiling capabilities to quickly pinpoint problem location
      ● Extend workload definitions to support richer and more realistic tests, combine workloads
      ● Support historical data and provide means of comparison/analytics
      ● Better correlation between business KPIs and reporting
  31. Join the Rally community
      ● It’s up to you to make Rally better
      ● Join our team:
        ○ Wiki: https://wiki.openstack.org/wiki/Rally
        ○ Project space: https://launchpad.net/rally
        ○ IRC chat: #openstack-rally on irc.freenode.net
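
The "straightforward way to benchmark" from the slides (generate load from concurrent users, then capture average time, maximum time, and failure rate) can be sketched in a few lines of Python. This is only an illustrative sketch and not Rally's own code or API: `run_benchmark`, `timed_call`, `Summary`, and `make_fake_action` are hypothetical names, and the fake action merely sleeps and fails on a fixed schedule where a real test would call the cloud (for example, to boot and delete a VM).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Summary:
    """Key metrics for one benchmark run (hypothetical structure)."""
    runs: int
    failures: int
    avg_seconds: float  # average over successful runs only
    max_seconds: float  # maximum over successful runs only

    @property
    def failure_rate(self) -> float:
        return self.failures / self.runs if self.runs else 0.0


def timed_call(action: Callable[[], None]) -> Tuple[float, bool]:
    """Run one action and return (elapsed seconds, succeeded)."""
    start = time.perf_counter()
    try:
        action()
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_benchmark(action: Callable[[], None], times: int, concurrency: int) -> Summary:
    """Call `action` `times` times using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(action), range(times)))
    durations = [elapsed for elapsed, ok in results if ok]
    failures = sum(1 for _, ok in results if not ok)
    avg = sum(durations) / len(durations) if durations else 0.0
    return Summary(len(results), failures, avg, max(durations, default=0.0))


def make_fake_action(fail_every: int, delay: float = 0.01) -> Callable[[], None]:
    """Stand-in for a real cloud operation: sleeps, and fails every Nth call."""
    lock = threading.Lock()
    calls = {"n": 0}

    def action() -> None:
        with lock:
            calls["n"] += 1
            n = calls["n"]
        time.sleep(delay)
        if n % fail_every == 0:
            raise RuntimeError("simulated failure")

    return action


if __name__ == "__main__":
    # Repeat the run at several concurrency levels, as in the
    # "large number of concurrent users" slide.
    for users in (1, 5, 10):
        s = run_benchmark(make_fake_action(fail_every=5), times=20, concurrency=users)
        print(f"users={users} avg={s.avg_seconds:.3f}s "
              f"max={s.max_seconds:.3f}s failure_rate={s.failure_rate:.0%}")
```

Threads are used here because the simulated action only waits; measuring a real cloud would also need per-user credentials and resource cleanup, which this sketch leaves out.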
