LEG HKG15-204
OpenStack Testing and Performance Benchmarking
3rd-Party CI, Rally and Tempest
Presented by: Andrew McDermott, Clark Laughlin
Date: Tuesday, 10 Feb 2015
Agenda
● Update on 3rd party CI testing
○ why, where and how
● Update on Tempest
○ analysis of results
○ current issues
○ plan going forward
● Explanation of Rally
○ results
○ how we can make use of it
OpenStack 3rd-Party CI
● Goal: Get ARM recognized as an equal,
supported platform for OpenStack
○ Path to recognition requires setting up a 3rd-party CI system
○ Must be able to demonstrate stability before being
allowed to vote on patches
OpenStack 3rd-Party CI
● What it is:
○ Run Tempest against OpenStack triggered by gerrit events
○ Report results back to OpenStack gerrit
○ Functional test of OpenStack components
● What it is not:
○ A general purpose arm64 test environment
○ Testing hypervisor functionality
○ Testing performance, functionality of VMs
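A minimal shell sketch of the "what it is" loop above: listen for gerrit events, run Tempest, report back. The real system is built from the upstream OpenStack CI components (Zuul, Jenkins, nodepool), so the account, host and variables here are purely illustrative:

    # listen for gerrit events (illustrative account/host)
    ssh -p 29418 linaro-ci@review.openstack.org gerrit stream-events | \
    while read event; do
        # ...filter for patchset-created events on openstack/nova,
        # ...deploy devstack and run Tempest on an arm64 node, then
        # report back; a non-voting CI posts a comment only (--verified 0)
        ssh -p 29418 linaro-ci@review.openstack.org gerrit review \
            --message "'arm64 check: tempest passed'" --verified 0 \
            "$CHANGE,$PATCHSET"
    done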
OpenStack 3rd-Party CI
● How?
○ Setting up using OpenStack CI components
○ OpenStack deployment with KVM as hypervisor
○ Run devstack/tempest configured to use QEMU
instances
Image credit: http://thoughtsoncloud.com/2014/09/creating-continuous-integration-environment-openstack/
(Diagram: arm64 nova-compute nodes)
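A hedged sketch of the devstack configuration implied by the bullets above (a local.conf fragment for a compute node; service selection and other settings in the real deployment will differ):

    [[local|localrc]]
    VIRT_DRIVER=libvirt
    LIBVIRT_TYPE=qemu    # instances are plain QEMU because devstack itself runs inside KVM guests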
OpenStack 3rd-Party CI
● Setting up a dedicated testing environment in
the Linaro co-lo facility
○ HP Moonshot
■ Single chassis
■ ~5 HP m300 cartridges (8-core amd64) running CI
infrastructure services
■ ~20 HP m400 cartridges (8-core arm64) running test
instances (KVM)
OpenStack 3rd-Party CI
● Plans:
○ Initially handle gerrit events for nova
○ Over time scale to handle additional projects:
■ cinder
■ glance
■ swift
■ neutron
Questions
● What other projects does Linaro need to work
towards adding test support for?
○ Network/storage plugins?
Questions
● Would anyone like to help?
○ Help debugging / fixing Tempest failures?
○ Experience setting up an OpenStack CI?
OpenStack Rally
● Rally is an OpenStack project that provides a
framework for performance measurement,
benchmarking and validation
○ runs benchmarks that show how the deployment scales
○ provides a historical view of the benchmarks that were run
○ details how fast they ran
○ validates that the workloads ran successfully
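A minimal sketch of the typical Rally CLI workflow (the deployment name and task file are illustrative):

    $ rally deployment create --fromenv --name arm64-juno   # register an existing cloud via OS_* env vars
    $ rally task start boot-and-delete.json                 # run a benchmark scenario
    $ rally task list                                       # historical view of completed runs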
Rally versus Tempest
● Rally is a higher-level tool than Tempest
○ Tempest is typically about running something once
○ Rally is more about testing across your data centres with 1000s
of machines, each with 1000s of users/tenants
● Note: validation could use Tempest as a workload (see the sketch below)
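For that validation note, Rally of that era could drive Tempest directly as a verification workload, roughly as follows (command names as provided by the Rally releases of the time):

    $ rally verify start      # install and run Tempest against the registered deployment
    $ rally verify list       # list past verification runs and their status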
Rally high-level use cases
● Rally for devops
○ uses an existing cloud, simulates real-world load, aggregates results, verifies
SLAs have been met
● Rally for developers and QA
○ deploy, simulate real-world load, iterate on performance issues, aggregate
results, make OpenStack better by upstreaming patches
● Rally for Continuous Integration / Delivery
○ deploy on specific h/w configuration with latest versions from tip, run a
specific set of benchmarks, store performance data for historical trend
analysis, report results - this use case is our initial focus
Rally Benchmark Scenarios
● A scenario is a benchmark specification
○ Typically grouped into OpenStack functional areas
● A scenario performs a small set of atomic operations
○ nova: boot then delete an instance
○ keystone: create a user, then list users (see the sketch after this slide)
● Benchmark scenarios are also customisable
○ which image to use, how much RAM, disk, CPU
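For the keystone example above, a task file might look like the sketch below (scenario and argument names follow Rally's bundled samples and may differ between releases):

    {
        "KeystoneBasic.create_and_list_users": [
            {
                "args": {
                    "name_length": 10
                },
                "runner": {
                    "type": "constant",
                    "times": 10,
                    "concurrency": 2
                }
            }
        ]
    }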
Rally Benchmark Runners
● Control the execution of a benchmark
● Provide different strategies for applying load to the deployment:
○ constant - generate a constant load N times
○ constant-for-duration - constant, but time limited
○ periodic - intervals between consecutive runs
● A key aspect is concurrency
○ Run the same test but with concurrent invocations
○ This is quite different from Tempest testing
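The three strategies above map onto the "runner" section of a task file, roughly as in the fragments below (parameter names as used in the Rally samples of the time; "constant-for-duration" is spelled constant_for_duration in the config):

    "runner": { "type": "constant",              "times": 100,    "concurrency": 10 }
    "runner": { "type": "constant_for_duration", "duration": 300, "concurrency": 10 }
    "runner": { "type": "periodic",              "times": 50,     "period": 2 }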
Rally Benchmark Context
● A Context typically specifies:
○ the number of users/tenants
○ the roles granted to those users/tenants
○ whether they have extended or narrowed quotas
● Running a test on your laptop is different to
running the test at scale
Rally Example Scenario
{ "NovaServers.boot_server": [ {
"args": {
"flavor_id": 42,
"image_id": "73257560-c59b-4275-a1ec-ab140e5b9979"
},
"runner": {
"type": "constant",
"times": 15,
"concurrency": 2
},
"context": {
"users": {
"tenants": 1,
"users_per_tenant": 3
},
"quotas": {
"nova": {
"instances": 20
}
Rally Benchmark Database
● Rally stores results in a database
○ data mining & trend analysis
○ looking at historical results
○ results can be arbitrarily tagged, then used in SQL queries
$ rally task list
+--------------------------+---------------------+-----------+--------+
| uuid | created_at | status | failed |
+--------------------------+---------------------+-----------+--------+
| fbdf6a3e-...fe47d6345d13 | 2014-10-22 15:26:37 | finished | False |
| ab231519-...3a72b7460fad | 2014-10-22 15:29:32 | finished | False |
| 67ff34c4-...a6a651f1c458 | 2014-10-24 13:33:15 | finished | False |
| 495598c5-...98b0e9b005e6 | 2014-11-12 11:02:46 | finished | False |
+--------------------------+---------------------+-----------+--------+
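Tags are attached when a task is started, and the raw results can be pulled back out for further analysis (flag and command names as in the Rally CLI of the time; the tag value and task UUID placeholder are illustrative):

    $ rally task start boot-and-delete.json --tag arm64-juno
    $ rally task results <task-uuid> > results.json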
Nova “boot-and-delete” scenario
● Manual runs of the “boot-and-delete” scenario
○ Results for 1 controller, 2 compute
○ Results for 1 controller, 3 compute
● Disclaimer: results and timings are indicative only;
machines and network were shared
Rally Reports
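The HTML reports this slide refers to are produced from stored task results, along the lines of the following (the task UUID placeholder is illustrative):

    $ rally task report <task-uuid> --out report.html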
How do we use and run Rally?
● Deploy, test and run Rally both through LAVA
and manually
● Start with nova scenarios
○ grow and expand for other OpenStack components
○ Future: benchmark ODP and NFV
● Run scenarios against Icehouse, Juno and tip
OpenStack Tempest Update
● Summary from LEG-SC meeting
● Analysis of results
● What are the current issues
● What we plan to do next cycle
Tempest Result Summary
● Bundle Stream: https://validation.linaro.org/dashboard/streams/private/team/mustang/mwhudson-devstack/bundles/7c4d42405460a199ae694d0affe8d9e3ae96c64e/

          ARMv8    x86 (OpenStack CI)
Pass       1379    2051
Fail         36       0
Skip        322     200
Understanding “skips”
● Components not installed
■ cinder, neutron, trove, sahara, ceilometer, zaqar, etc.
● Config setting not enabled
■ Nova v3 API, suspend, live migration
● Currently disabled (existing bugs)
● Configuration errors
■ ping/ssh access not enabled
■ not enough images in glance
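A hedged sketch of the tempest.conf settings behind these skip categories (section and option names as in the Tempest configuration of that era; the values shown are examples only):

    [service_available]
    cinder = False
    neutron = False

    [compute-feature-enabled]
    live_migration = False
    suspend = False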
Examining Tempest failures
● Some reasons:
○ HTTP timeouts in test setup
○ Invalid configuration creating instances (attempting to use
IDE bus)
● Common ARM and x86 failures
○ Unable to locate instance/image by ID
○ Unable to establish SSH connection to running instance
○ Tempest test suite can hang when running concurrently (e.g., --concurrency=8)
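For context, a sketch of how the suite might be invoked at that concurrency level from a devstack checkout (exact invocation varies with the Tempest/testr setup):

    $ cd /opt/stack/tempest
    $ testr run --parallel --concurrency=8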
Getting more tests passing
● We need to enable subsystems like cinder (needs PCIe)
● Get live migration working
● Live migration is planned for 2015.03
● PCIe (hot plug) is planned for 2015 Q2
● Neutron: get it configured and working on ARMv8
Ongoing LAVA testing plan (1)
● Dedicate 3 (new) machines in LAVA for
OpenStack testing
● Will improve test execution time
○ no reboot
○ no reinstall of base OS for each run
○ not shared
● Machines will also be used for Rally
benchmarking
Ongoing LAVA testing plan (2)
● Establish baseline results for:
○ Icehouse vs Juno vs tip
● CI jobs for both ARM and x86
○ Want a baseline to make comparisons
○ x86 is minimal, best effort only
● Investigate LAVA results
○ some LAB issues
○ some test jobs fail very early
Linaro OpenStack bugzilla
● Bug database setup:
○ https://bugs.linaro.org/enter_bug.cgi?product=OpenStack
● Capturing ARMv8-only bugs
○ Common bugs will be reported
upstream