2. User story
As a performance engineer, I want to measure
how different faults affect the availability and
performance of OpenStack services.
Examples:
● What is the impact of a Keystone restart
on Nova API operations?
● How does the loss of one RabbitMQ server
affect VM instance creation time?
3. Hypothesis
A particular failure may cause errors and/or
performance degradation
Measurements:
● Service downtime (seconds)
● MTTR (seconds)
● Absolute performance degradation
(seconds)
● Relative performance degradation
(ratio)
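A minimal sketch of how these four measurements could be derived from per-iteration results. The sample format (start time, duration, success flag) and the definitions below are assumptions made for illustration, not the definitions used by Rally or os-faults: recovery is taken to be the first successful iteration after the last failed one, downtime is measured from the first failure to recovery, and MTTR from fault injection to recovery.

```python
from statistics import mean


def fault_metrics(samples, fault_time):
    """Compute illustrative fault-impact metrics.

    samples: list of (start_time, duration, ok) tuples sorted by start_time.
    fault_time: moment the fault was injected, on the same clock.
    """
    before = [d for t, d, ok in samples if ok and t < fault_time]
    after = [d for t, d, ok in samples if ok and t >= fault_time]
    failed = [t for t, d, ok in samples if not ok]

    downtime, mttr = 0.0, 0.0
    if failed:
        first_failure, last_failure = failed[0], failed[-1]
        # Assumed recovery point: first successful sample after the last failure.
        recovered = next(
            (t for t, d, ok in samples if ok and t > last_failure), None
        )
        if recovered is None:
            downtime = mttr = None  # service never recovered within the run
        else:
            downtime = recovered - first_failure
            mttr = recovered - fault_time

    baseline = mean(before)  # mean duration before the fault
    degraded = mean(after)   # mean duration of successful calls after it
    return {
        "downtime": downtime,
        "mttr": mttr,
        "absolute_degradation": degraded - baseline,
        "relative_degradation": degraded / baseline,
    }


samples = [
    (0.0, 1.0, True), (1.0, 1.0, True),
    (2.0, 0.0, False), (3.0, 0.0, False),
    (4.0, 2.0, True), (5.0, 2.0, True),
]
metrics = fault_metrics(samples, fault_time=1.5)
```

Other definitions are equally defensible, for example treating recovery as the point where durations return to the baseline rather than the point where errors stop.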
4. Implementation
1. Rally hooks
an entry-point to call plugins at specified
points of scenario execution
2. OS-Faults lib
fault-injection library
3. Stats processing
results visualization and report
generation
5. Rally hooks
● A hook is a new type of plugin.
● Hooks can be called at specific points of
scenario execution.
● Available in Rally 0.7.0
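The hook idea can be illustrated with a toy runner. This is a sketch of the concept only, not Rally's plugin API: `run_scenario` and the `hooks` mapping are hypothetical names, and the trigger here is simply an iteration number.

```python
def run_scenario(scenario, iterations, hooks):
    """Run `scenario` `iterations` times, firing hooks at chosen iterations.

    hooks: dict mapping a 1-based iteration number to a callable
    (hypothetical structure, for illustration only).
    """
    results = []
    for i in range(1, iterations + 1):
        hook = hooks.get(i)
        if hook is not None:
            hook()  # e.g. inject a fault just before this iteration runs
        results.append(scenario(i))
    return results


events = []


def scenario(i):
    events.append(f"iteration {i}")
    return i


def restart_hook():
    events.append("hook: restart service")


results = run_scenario(scenario, iterations=3, hooks={2: restart_hook})
```

Here the hook fires once, immediately before the second iteration, so the recorded events show where the fault lands relative to the workload.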
6. os-faults
● Generalized fault injection library
● DevStack, Fuel, libvirt and IPMI drivers
are already included
● Rally hook plugin is under review
https://review.openstack.org/384483
Simplified API:
● restart rabbitmq service
● reboot one node with mysql service
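To show how commands of this shape could map onto structured fault actions, here is a toy parser. It is a hypothetical sketch that handles only the two command forms listed above; it is not the os-faults parser, and the returned dictionary layout is an assumption.

```python
import re

# Hypothetical grammar covering only the two example command shapes.
NODE_CMD = re.compile(
    r"^(?P<action>\w+) one node with (?P<service>\w+) service$"
)
SERVICE_CMD = re.compile(r"^(?P<action>\w+) (?P<service>\w+) service$")


def parse_command(command):
    """Turn a human-readable fault command into a structured action."""
    text = command.strip().lower()
    match = NODE_CMD.match(text)
    if match:
        # Node-level fault on one node that runs the named service.
        return {"target": "node", "pick": "one", **match.groupdict()}
    match = SERVICE_CMD.match(text)
    if match:
        # Service-level fault applied to the service itself.
        return {"target": "service", **match.groupdict()}
    raise ValueError(f"unrecognized command: {command!r}")


service_fault = parse_command("restart rabbitmq service")
node_fault = parse_command("reboot one node with mysql service")
```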
7. Stats processing
● Time-based vs iteration-based statistics in Rally:
an accurate view of service state
● Anomaly analysis - highlight areas where
performance differs
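One simple way to highlight such areas is a threshold over a pre-fault baseline. The sketch below is an assumed technique (mean plus k standard deviations), not necessarily the analysis used in the presented tooling; `anomalous_areas`, `baseline_size` and `k` are illustrative names.

```python
from statistics import mean, stdev


def anomalous_areas(durations, baseline_size, k=3.0):
    """Return inclusive (start, end) index ranges of anomalous iterations.

    An iteration is flagged when its duration exceeds the baseline mean
    by more than k standard deviations. The baseline is the first
    `baseline_size` iterations, assumed to run before any fault.
    """
    baseline = durations[:baseline_size]
    threshold = mean(baseline) + k * stdev(baseline)

    areas, start = [], None
    for i, duration in enumerate(durations):
        if duration > threshold:
            if start is None:
                start = i  # open a new anomalous area
        elif start is not None:
            areas.append((start, i - 1))  # close the current area
            start = None
    if start is not None:
        areas.append((start, len(durations) - 1))
    return areas


durations = [1.0, 1.1, 0.9, 1.0, 3.0, 3.2, 1.0, 1.1]
areas = anomalous_areas(durations, baseline_size=4)
```

Grouping flagged iterations into contiguous ranges makes it straightforward to shade them on a duration chart.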