Icse2013 malik

Automatic Detection of Performance Deviations in the Load Testing of Large Scale Systems
Haroon Malik
Software Analysis and Intelligence Lab (SAIL)
Queen’s University, Kingston, Canada
Hadi Hemmati
Software Architecture Group (SWAG)
University of Waterloo, Waterloo, Canada
Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University, Kingston, Canada
Performance
Team
Canada

Large scale systems need to satisfy performance constraints

Current Practice
1. Environment setup
2. Load test execution
3. Load test analysis
4. Report generation

Load Test Execution
[Diagram: Load Generator 1, Load Generator 2, System, Monitoring Tool, Performance Repository]

Challenges with Load Test Analysis
Limited
Knowledge
Large Number of Counters

We Propose 4 Deviation Detection Approaches
3 Unsupervised
1 Supervised

Using Performance Counters to Construct Performance Signature
Example counters: % CPU Idle, % CPU Busy, Byte Commits, Disk writes/sec, % Cache Faults/Sec, Bytes received

Performance Counters Are Highly Correlated
[Diagram: CPU, Disk (IOPS), Network, Memory, Transactions/sec]
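
One way to check how strongly counters move together is the Pearson correlation between each pair. The sketch below is illustrative only: the counter names and sample values are made up, and the slides do not say which correlation measure was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-minute samples of three counters from one load test.
counters = {
    "cpu_busy_pct": [20, 35, 50, 65, 80],
    "disk_iops": [110, 160, 240, 310, 390],
    "transactions_per_sec": [10, 18, 25, 33, 40],
}

# Correlation for every unordered pair of counters.
names = list(counters)
correlations = {
    (a, b): pearson(counters[a], counters[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

for pair, value in correlations.items():
    print(pair, round(value, 3))
```

Pairs with a coefficient close to 1 carry largely the same information, which is why a small signature can stand in for a large set of counters.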

High Level Overview of our Approach
Input Load Tests: Baseline Test, New Test
1. Data Preparation: Sanitization, Standardization
2. Signature Generation
3. Deviation Detection
Output: Performance Report
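
A minimal sketch of the data preparation step, under two assumptions that the slides do not spell out: sanitization drops observations with a missing counter value, and standardization rescales each counter to zero mean and unit standard deviation. Counter names and values are hypothetical.

```python
import statistics

def sanitize(observations):
    """Drop observations that contain a missing (None) counter value."""
    return [row for row in observations
            if all(value is not None for value in row.values())]

def standardize(observations):
    """Rescale every counter to zero mean and unit standard deviation."""
    names = list(observations[0])
    stats = {}
    for name in names:
        values = [row[name] for row in observations]
        stats[name] = (statistics.mean(values), statistics.pstdev(values))
    prepared = []
    for row in observations:
        scaled = {}
        for name in names:
            mean, std = stats[name]
            # A constant counter has no spread; map it to 0.0.
            scaled[name] = (row[name] - mean) / std if std else 0.0
        prepared.append(scaled)
    return prepared

# Hypothetical raw observations; the second one has a missing value.
raw = [
    {"cpu_busy_pct": 40.0, "disk_writes_per_sec": 100.0},
    {"cpu_busy_pct": None, "disk_writes_per_sec": 150.0},
    {"cpu_busy_pct": 60.0, "disk_writes_per_sec": 300.0},
    {"cpu_busy_pct": 50.0, "disk_writes_per_sec": 200.0},
]

clean = sanitize(raw)
prepared = standardize(clean)
```

Standardizing puts counters with very different units (percentages, bytes, operations per second) on a comparable scale before signatures are generated.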

Unsupervised Signature Generation
Random Sampling Approach: Prepared Load Test -> Random Sampling -> Signature
Clustering Approach: Prepared Load Test -> Data Reduction -> Clustering -> Extracting Centroids -> Signature
PCA Approach: Prepared Load Test -> Dimension Reduction (PCA) -> Mapping -> Ranking -> Identifying Top k Performance Counters -> Signature
Analyst tunes weight parameter (PCA Approach)
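
The PCA approach can be illustrated with a simplified sketch. It is not the authors' implementation: it looks only at the first principal component (approximated by power iteration) and ranks counters by the absolute value of their loading, whereas the slides describe a tunable weight parameter. The counter names and data are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the counters (columns) in rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Approximate the first principal component by power iteration."""
    d = len(cov)
    vector = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * vector[j] for j in range(d))
                   for i in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

def top_k_counters(names, rows, k):
    """Rank counters by absolute loading and keep the k highest."""
    loadings = first_component(covariance_matrix(rows))
    ranked = sorted(zip(names, loadings),
                    key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical prepared load test: one row per observation.
names = ["cpu_busy_pct", "disk_writes_per_sec", "cache_faults_per_sec"]
rows = [
    [-1.5, -1.4, 0.1],
    [-0.5, -0.6, -0.1],
    [0.0, 0.1, 0.1],
    [0.5, 0.4, -0.1],
    [1.5, 1.5, 0.0],
]

signature = top_k_counters(names, rows, k=2)
print(signature)
```

Here the two counters that vary together dominate the first component, so they form the signature, while the nearly constant counter is dropped.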

Supervised Signature Generation (WRAPPER Approach)
Prepared Load Test -> Labeling (only for baseline) -> Partitioning the Data -> Attribute Selection (OneR, Genetic Search, …) -> SPC1, SPC2, …, SPC10 -> Identifying Top k Performance Counters (i. Count, ii. % Frequency) -> Signature
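
The last step of the supervised pipeline, picking the top k counters by count and % frequency, might look like the sketch below. The candidate subsets stand in for the output of attribute selection on each data partition; they are invented here, and the attribute selection itself (OneR, genetic search) is not reproduced.

```python
from collections import Counter

def top_k_by_frequency(selected_subsets, k):
    """Return (counter, count, % frequency) for the k most selected counters."""
    counts = Counter(name for subset in selected_subsets
                     for name in set(subset))
    total = len(selected_subsets)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count, 100.0 * count / total) for name, count in ranked[:k]]

# Hypothetical counters chosen by attribute selection on four partitions.
selected_subsets = [
    ["cpu_busy_pct", "disk_writes_per_sec"],
    ["cpu_busy_pct", "bytes_received"],
    ["cpu_busy_pct", "disk_writes_per_sec", "cache_faults_per_sec"],
    ["cpu_busy_pct", "disk_writes_per_sec"],
]

signature = top_k_by_frequency(selected_subsets, k=2)
print(signature)
```

Counters that are selected in most partitions are the ones least dependent on any particular slice of the baseline data.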

Two Deviation Detection Approaches
Using Control Chart: for Clustering and Random Sampling Approaches
Using Approach-Specific Techniques: for PCA and WRAPPER Approaches

Control Chart
[Figure: performance counter value (0 to 16) plotted against time (1 to 11 min) for the Baseline Load Test and the New Load Test]

Control charts use control limits to represent the limits of variation that should be expected from a process.

Center Line (CL): the median of all values of a performance counter in the baseline load test

The Upper/Lower Control Limits (UCL/LCL) are the upper/lower limits of the range of a counter under the normal behavior of the system
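
A control chart check for a single counter can be sketched as follows. The median as center line follows the definition above; taking the 10th and 90th percentiles of the baseline as LCL and UCL, and reporting a violation ratio, are assumptions made for this example. All sample values are hypothetical.

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile of values, with p between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def control_limits(baseline, lower_pct=10, upper_pct=90):
    """Return (LCL, CL, UCL) derived from the baseline load test."""
    return (percentile(baseline, lower_pct),
            statistics.median(baseline),
            percentile(baseline, upper_pct))

def violation_ratio(new_test, limits):
    """Fraction of new-test samples that fall outside [LCL, UCL]."""
    lcl, _, ucl = limits
    outside = sum(1 for value in new_test if value < lcl or value > ucl)
    return outside / len(new_test)

# Hypothetical samples of one performance counter, one value per minute.
baseline = [10, 9, 11, 10, 12, 10, 9, 11, 10, 8]
new_test = [10, 11, 13, 14, 9, 15, 10, 16]

limits = control_limits(baseline)
ratio = violation_ratio(new_test, limits)
print(limits, ratio)
```

An analyst would then compare the violation ratio against a threshold of their choosing to decide whether the new test deviates from the baseline.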

Deviation Detection
Clustering and Random Sampling: Baseline Signature + New Test Signature -> Control Chart -> Performance Report
PCA Approach: Baseline Signature + New Test Signature -> Comparing PCA Counter Weights -> Performance Report
WRAPPER Approach: Baseline Signature (Training Data) + New Test Signature (Evaluation Data) -> Logistic Regression -> Performance Report
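
For the WRAPPER approach, the logistic regression step could be sketched as below: a model is trained on labeled baseline observations and then classifies each observation of the new test. This is a bare-bones gradient descent implementation written for illustration, not the tooling used in the study. The features, labels and hyperparameters are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, learning_rate=0.5, epochs=500):
    """Fit logistic regression weights with batch gradient descent."""
    dims = len(rows[0])
    weights, bias = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dims, 0.0
        for row, label in zip(rows, labels):
            score = bias + sum(w * x for w, x in zip(weights, row))
            error = sigmoid(score) - label
            grad_b += error
            for j in range(dims):
                grad_w[j] += error * row[j]
        bias -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(model, row):
    """Return 1 (Fail, deviation) or 0 (Pass) for one observation."""
    weights, bias = model
    score = bias + sum(w * x for w, x in zip(weights, row))
    return 1 if sigmoid(score) >= 0.5 else 0

# Hypothetical baseline signature: standardized values of two counters,
# labeled 0 (Pass) or 1 (Fail).
baseline_rows = [[-1.0, -0.8], [-0.5, -1.0], [-0.8, -0.3], [0.0, -0.5],
                 [1.0, 0.9], [1.2, 0.5], [0.7, 1.1], [0.9, 0.4]]
baseline_labels = [0, 0, 0, 0, 1, 1, 1, 1]

model = train(baseline_rows, baseline_labels)

# Hypothetical new test observations on the same two counters.
new_test_rows = [[-0.7, -0.6], [1.1, 0.8]]
predictions = [predict(model, row) for row in new_test_rows]
print(predictions)
```

Because each observation is classified on its own, this style of detection can flag a deviation as soon as a single observation arrives.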

Case Study
RQ: How effective are our signature-based approaches in detecting performance deviations in load tests?
An ideal approach should predict a minimal and correct set of performance deviations.
Evaluation using: Precision, Recall and F-measure
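
The three evaluation measures can be computed from the set of observations an approach flags and the set that truly contain a deviation. The sketch below uses the standard definitions; the observation indices are made up.

```python
def evaluate(predicted, actual):
    """Return (precision, recall, F-measure) for two sets of observations."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical observation indices: where faults were injected,
# and where an approach reported a deviation.
actual = {3, 4, 5, 6}
predicted = {4, 5, 6, 9}

precision, recall, f_measure = evaluate(predicted, actual)
print(precision, recall, f_measure)
```

High precision means few false alarms (a minimal set), and high recall means few missed deviations (a correct set); the F-measure balances the two.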

Subjects of Study
System: Open Source
Domain: Ecommerce
Type of data:
1. Data from our experiments with an open source benchmark application (DVD Store)
System: Industrial System
Domain: Telecom
Type of data:
1. Load test repository
2. Data from our experiments on the company’s testing platform

Fault Injection
Category          Failure Trigger      Faults
Software Failure  Resource Exhaustion  CPU Stress
                                       Memory Stress
                  System Overload      Abnormal Workload
Operator Errors   Procedural Errors    Interfering Workload
                                       Unscheduled Replication

Case Study Findings
Effectiveness: Precision/Recall/F-measure
Practical Differences

Case Study Findings (Effectiveness)
[Figure: Precision, Recall and F-Measure (0 to 1) for the WRAPPER, PCA, Clustering and Random approaches]
- Random Sampling has the lowest effectiveness.
- On average, and in all experiments, PCA performs better than the Clustering approach.
- WRAPPER dominates the best unsupervised approach, i.e., PCA.

Overall, both the WRAPPER and PCA approaches achieve an excellent balance of high precision and recall for deviation detection (on average 0.95 and 0.94 for WRAPPER, and 0.82 and 0.84 for PCA).

Case Study Findings (Practical Differences)
Real Time Analysis
Stability
Manual Overhead

Real Time Analysis
WRAPPER: detects deviations on a per-observation basis.
PCA: requires a certain number of observations (wait time).

Stability
We refer to ‘Stability’ as the ability of an approach to remain effective while its signature size is reduced.

[Figure: F-Measure (0.6 to 1) against Signature Size for the Unsupervised (PCA) and Supervised (WRAPPER) approaches, in two charts with signature sizes from 45 down to 1 and from 60 down to 2]
The WRAPPER approach is more stable than the PCA approach.

Manual Overhead
The WRAPPER approach requires all observations of the baseline performance counter data to be labeled as Pass/Fail.
Marking each observation is time consuming.

Limitations of Our Approaches
Hardware Differences
Sensitivity

Hardware Differences
Large systems have many testing labs to run multiple load tests simultaneously.
- Labs may have different hardware.
Our approaches may interpret the same test run in different labs as deviated tests.
Recent work by K. C. Foo et al. proposes several techniques to overcome this problem:
Mining Performance Regression Testing Repositories for Automated Performance Analysis (QSIC 2010)

Sensitivity
We can tune the sensitivity of our approaches.
- PCA: Analyst tunes the ‘weight’ parameter.
- WRAPPER: Analyst decides, in the labelling phase, how big the deviation should be in order to be flagged.
Lowering sensitivity reduces false alarms, but it may overlook some important outliers.
