LEG HKG15-204
OpenStack Testing and Performance Benchmarking
3rd-Party CI, Rally and Tempest
Presented by: Andrew McDermott, Clark Laughlin
Date: Tuesday, 10 Feb 2015
Agenda
● Update on 3rd party CI testing
○ why, where and how
● Update on Tempest
○ analysis of results
○ current issues
○ plan going forward
● Explanation of Rally
○ results
○ how we can make use of it
OpenStack 3rd-Party CI
● Goal: Get ARM recognized as an equal,
supported platform for OpenStack
○ Path to recognition requires setting up a 3rd-party CI system
○ Must be able to demonstrate stability before being
allowed to vote on patches
OpenStack 3rd-Party CI
● What it is:
○ Run Tempest against OpenStack triggered by gerrit events
○ Report results back to OpenStack gerrit
○ Functional test of OpenStack components
● What it is not:
○ A general purpose arm64 test environment
○ Testing hypervisor functionality
○ Testing performance, functionality of VMs
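A minimal shell sketch of the "what it is" loop above: listen for gerrit events, run Tempest, report back. The real system is built from the upstream OpenStack CI components (Zuul, Jenkins, nodepool), so the account, host and variables here are purely illustrative:

    # listen for gerrit events (illustrative account/host)
    ssh -p 29418 linaro-ci@review.openstack.org gerrit stream-events | \
    while read event; do
        # ...filter for patchset-created events on openstack/nova,
        # ...deploy devstack and run Tempest on an arm64 node, then
        # report back; a non-voting CI posts a comment only (--verified 0)
        ssh -p 29418 linaro-ci@review.openstack.org gerrit review \
            --message "'arm64 check: tempest passed'" --verified 0 \
            "$CHANGE,$PATCHSET"
    done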
OpenStack 3rd-Party CI
● How?
○ Setting up using OpenStack CI components
○ OpenStack deployment with KVM as hypervisor
○ Run devstack/tempest configured to use QEMU
instances
Image credit: http://thoughtsoncloud.com/2014/09/creating-continuous-integration-environment-openstack/
(Diagram: arm64 nova-compute nodes)
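A hedged sketch of the devstack configuration implied by the bullets above (a local.conf fragment for a compute node; service selection and other settings in the real deployment will differ):

    [[local|localrc]]
    VIRT_DRIVER=libvirt
    LIBVIRT_TYPE=qemu    # instances are plain QEMU because devstack itself runs inside KVM guests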
OpenStack 3rd-Party CI
● Setting up a dedicated testing environment in
the Linaro co-lo facility
○ HP Moonshot
■ Single chassis
■ ~5 HP m300 cartridges (8-core amd64) running CI
infrastructure services
■ ~20 HP m400 cartridges (8-core arm64) running test
instances (KVM)
OpenStack 3rd-Party CI
● Plans:
○ Initially handle gerrit events for nova
○ Over time scale to handle additional projects:
■ cinder
■ glance
■ swift
■ neutron
Questions
● What other projects does Linaro need to work
towards adding test support for?
○ Network/storage plugins?
Questions
● Would anyone like to help?
○ Help debugging / fixing Tempest failures?
○ Experience setting up an OpenStack CI?
OpenStack Rally
● Rally is an OpenStack project that provides a
framework for performance measurement,
benchmarking and validation
○ runs benchmarks that show how the deployment scales
○ provides a historical view of the benchmarks that were run
○ details how fast they ran
○ validates that the workloads ran successfully
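A minimal sketch of the typical Rally CLI workflow (the deployment name and task file are illustrative):

    $ rally deployment create --fromenv --name arm64-juno   # register an existing cloud via OS_* env vars
    $ rally task start boot-and-delete.json                 # run a benchmark scenario
    $ rally task list                                       # historical view of completed runs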
Rally versus Tempest
● Rally is a higher-level tool than Tempest
○ Tempest is typically about running something once
○ Rally is more about testing across your data centres with 1000s
of machines, each with 1000s of users/tenants
● Note: validation could use Tempest as a workload (see the sketch below)
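For that validation note, Rally of that era could drive Tempest directly as a verification workload, roughly as follows (command names as provided by the Rally releases of the time):

    $ rally verify start      # install and run Tempest against the registered deployment
    $ rally verify list       # list past verification runs and their status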
Rally high-level use cases
● Rally for devops
○ uses an existing cloud, simulates real-world load, aggregates results, verifies
SLAs have been met
● Rally for developers and QA
○ deploy, simulate real-world load, iterate on performance issues, aggregate
results, make OpenStack better by upstreaming patches
● Rally for Continuous Integration / Delivery
○ deploy on specific h/w configuration with latest versions from tip, run a
specific set of benchmarks, store performance data for historical trend
analysis, report results - this use case is our initial focus
Rally Benchmark Scenarios
● A scenario is a benchmark specification
○ Typically grouped into OpenStack functional areas
● A scenario performs a small set of atomic operations
○ nova: boot then delete an instance
○ keystone: create a user, then list users (see the sketch after this slide)
● Benchmark scenarios are also customisable
○ which image to use, how much RAM, disk, CPU
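For the keystone example above, a task file might look like the sketch below (scenario and argument names follow Rally's bundled samples and may differ between releases):

    {
        "KeystoneBasic.create_and_list_users": [
            {
                "args": {
                    "name_length": 10
                },
                "runner": {
                    "type": "constant",
                    "times": 10,
                    "concurrency": 2
                }
            }
        ]
    }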
Rally Benchmark Runners
● Control the execution of a benchmark
● Provide different strategies for applying load to the deployment:
○ constant - generate a constant load N times
○ constant-for-duration - constant, but time limited
○ periodic - intervals between consecutive runs
● A key aspect is concurrency
○ Run the same test but with concurrent invocations
○ This is quite different from Tempest testing
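The three strategies above map onto the "runner" section of a task file, roughly as in the fragments below (parameter names as used in the Rally samples of the time; "constant-for-duration" is spelled constant_for_duration in the config):

    "runner": { "type": "constant",              "times": 100,    "concurrency": 10 }
    "runner": { "type": "constant_for_duration", "duration": 300, "concurrency": 10 }
    "runner": { "type": "periodic",              "times": 50,     "period": 2 }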
Rally Benchmark Context
● A Context typically specifies:
○ the number of users/tenants
○ the roles granted to those users/tenants
○ whether they have extended or narrowed quotas
● Running a test on your laptop is different to
running the test at scale
Rally Example Scenario
{ "NovaServers.boot_server": [ {
"args": {
"flavor_id": 42,
"image_id": "73257560-c59b-4275-a1ec-ab140e5b9979"
},
"runner": {
"type": "constant",
"times": 15,
"concurrency": 2
},
"context": {
"users": {
"tenants": 1,
"users_per_tenant": 3
},
"quotas": {
"nova": {
"instances": 20
}
Rally Benchmark Database
● Rally stores results in a database
○ data mining & trend analysis
○ looking at historical results
○ results can be arbitrarily tagged, then used in SQL queries
$ rally task list
+--------------------------+---------------------+-----------+--------+
| uuid | created_at | status | failed |
+--------------------------+---------------------+-----------+--------+
| fbdf6a3e-...fe47d6345d13 | 2014-10-22 15:26:37 | finished | False |
| ab231519-...3a72b7460fad | 2014-10-22 15:29:32 | finished | False |
| 67ff34c4-...a6a651f1c458 | 2014-10-24 13:33:15 | finished | False |
| 495598c5-...98b0e9b005e6 | 2014-11-12 11:02:46 | finished | False |
+--------------------------+---------------------+-----------+--------+
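Tags are attached when a task is started, and the raw results can be pulled back out for further analysis (flag and command names as in the Rally CLI of the time; the tag value and task UUID placeholder are illustrative):

    $ rally task start boot-and-delete.json --tag arm64-juno
    $ rally task results <task-uuid> > results.json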
Nova “boot-and-delete” scenario
● Manual runs of the “boot-and-delete” scenario
○ Results for 1 controller, 2 compute
○ Results for 1 controller, 3 compute
● Disclaimer: results and timings are indicative only;
machines and network were shared
Rally Reports
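The HTML reports this slide refers to are produced from stored task results, along the lines of the following (the task UUID placeholder is illustrative):

    $ rally task report <task-uuid> --out report.html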
How do we use and run Rally?
● Deploy, test and run Rally both through LAVA
and manually
● Start with nova scenarios
○ grow and expand for other OpenStack components
○ Future: benchmark ODP and NFV
● Run scenarios against Icehouse, Juno and tip
OpenStack Tempest Update
● Summary from LEG-SC meeting
● Analysis of results
● What are the current issues
● What we plan to do next cycle
Tempest Result Summary
● Bundle Stream: https://validation.linaro.org/dashboard/streams/private/team/mustang/mwhudson-devstack/bundles/7c4d42405460a199ae694d0affe8d9e3ae96c64e/

          ARMv8    x86 (OpenStack CI)
Pass       1379    2051
Fail         36       0
Skip        322     200
Understanding “skips”
● Components not installed
■ cinder, neutron, trove, sahara, ceilometer, zaqar, etc.
● Config setting not enabled
■ Nova v3 API, suspend, live migration
● Currently disabled (existing bugs)
● Configuration errors
■ ping/ssh access not enabled
■ not enough images in glance
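A hedged sketch of the tempest.conf settings behind these skip categories (section and option names as in the Tempest configuration of that era; the values shown are examples only):

    [service_available]
    cinder = False
    neutron = False

    [compute-feature-enabled]
    live_migration = False
    suspend = False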
Examining Tempest failures
● Some reasons:
○ HTTP timeouts in test setup
○ Invalid configuration creating instances (attempting to use
IDE bus)
● Common ARM and x86 failures
○ Unable to locate instance/image by ID
○ Unable to establish SSH connection to running instance
○ Tempest test suite can hang when running concurrently (e.g., --concurrency=8)
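For context, a sketch of how the suite might be invoked at that concurrency level from a devstack checkout (exact invocation varies with the Tempest/testr setup):

    $ cd /opt/stack/tempest
    $ testr run --parallel --concurrency=8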
Getting more tests passing
● We need to enable subsystems like cinder (needs PCIe)
● Get live migration working
● Live migration is planned for 2015.03
● PCIe (hot plug) is planned for 2015 Q2
● Neutron: get it configured and working on ARMv8
Ongoing LAVA testing plan (1)
● Dedicate 3 (new) machines in LAVA for
OpenStack testing
● Will improve test execution time
○ no reboot
○ no reinstall of base OS for each run
○ not shared
● Machines will also be used for Rally
benchmarking
Ongoing LAVA testing plan (2)
● Establish baseline results for:
○ Icehouse vs Juno vs tip
● CI jobs for both ARM and x86
○ Want a baseline to make comparisons
○ x86 is minimal, best effort only
● Investigate LAVA results
○ some LAB issues
○ some test jobs fail very early
Linaro OpenStack bugzilla
● Bug database setup:
○ https://bugs.linaro.org/enter_bug.cgi?product=OpenStack
● Capturing ARMv8-only bugs
○ Common bugs will be reported
upstream