Using Load Test to Automatically Compare the Subsystems of a Large Enterprise System
Haroon Malik, Bram Adams & Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University, Kingston, Canada
Parminder Flora & Gilbert Hamann
Performance Engineering
Research In Motion, Waterloo, Canada
• Today's large-scale systems (LSS) are composed of many underlying subsystems.
• These LSS grow rapidly in size to handle growing traffic, complex services and business-critical functionality.
• Performance analysts face the challenge of dealing with performance bugs, as processing is spread across thousands of subsystems and millions of hardware nodes.
LOAD TESTING
[Figure: load-test environment. Load Generator-1 and Load Generator-2 drive the System; a Monitoring Tool records a performance counter log into a Performance Repository.]
CURRENT PRACTICE
[Figure: the four phases of current practice: 1. Environment Setup, 2. Load Test Execution, 3. Load Test Analysis, 4. Report Generation.]
CHALLENGES…
LARGE NUMBER OF PERFORMANCE COUNTERS
RISK OF ERROR
Automated Methodology Required
METHODOLOGY
[Figure: raw performance counter logs from PC-1, PC-2 and PC-3 (a lot of noisy data) are distilled by our methodology into a compact performance signature.]
[Figure: the same reduction applied per subsystem (Database, Mail and Web), so each subsystem gets its own performance signature.]
METHODOLOGY
[Figure: example signature comparison. Counters such as Commits/Sec, Writes/Sec, CPU Utilization and Database Cache % Hit form a subsystem's signature; the baseline and Load Test 1 signatures are compared per subsystem, yielding deviation/match scores (e.g., 0.59, 1, 0.99).]
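A minimal sketch of how such a match score could be computed, assuming it is the cosine similarity between the counter-importance vectors of the baseline and the new test. The counter names come from the slide; the numeric values and the similarity measure are illustrative assumptions, not necessarily what the methodology uses.

```python
import numpy as np

# Hypothetical counter-importance values for one subsystem (illustrative only).
baseline = {"Commits/Sec": 0.95, "Writes/Sec": 0.93,
            "CPU Utilization": 0.90, "Database Cache % Hit": 0.88}
load_test_1 = {"Commits/Sec": 0.96, "Writes/Sec": 0.55,
               "CPU Utilization": 0.91, "Database Cache % Hit": 0.87}

def match_score(base, test):
    """Cosine similarity between two signatures over their shared counters."""
    counters = sorted(set(base) & set(test))
    b = np.array([base[c] for c in counters])
    t = np.array([test[c] for c in counters])
    return float(b @ t / (np.linalg.norm(b) * np.linalg.norm(t)))

# Prints the match score for this subsystem; lower scores indicate larger deviations.
print(round(match_score(baseline, load_test_1), 3))
```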
METHODOLOGY STEPS
1. Data Preparation
2. Counter Normalization
3. Dimension Reduction
4. Crafting Performance Signatures
5. Extracting Performance Deviations
6. Report Generation
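The slides do not name the statistical technique behind steps 2-4, so the sketch below assumes a PCA-style dimension reduction over the counter log, with z-score normalization first and a top-k counter-importance ranking as the signature. The var_target and top_k parameters are illustrative assumptions, not values from the study.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def craft_signature(counter_log: pd.DataFrame, var_target: float = 0.95,
                    top_k: int = 10) -> pd.Series:
    """Steps 2-4 for one subsystem's counter log (rows = observations,
    columns = performance counters): normalize, reduce, rank counters."""
    # Step 2 - counter normalization: z-score each counter so counters with
    # large raw magnitudes do not dominate the variance-based reduction.
    normalized = (counter_log - counter_log.mean()) / counter_log.std(ddof=0)
    normalized = normalized.dropna(axis=1)  # constant/incomplete counters carry no signal

    # Step 3 - dimension reduction: keep the principal components that
    # together explain `var_target` of the variance.
    pca = PCA(n_components=var_target, svd_solver="full").fit(normalized)

    # Step 4 - crafting the signature: rank counters by the total magnitude
    # of their loadings on the retained components; keep the top_k counters.
    importance = pd.Series(np.abs(pca.components_).sum(axis=0),
                           index=normalized.columns)
    return importance.nlargest(top_k)
```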
MEASURING THE PERFORMANCE
[Figure: the baseline and Test 1 timelines are divided into intervals t1-t6; P marks the intervals where deviations are predicted, O the intervals where deviations actually occurred.]
PO = P ∩ O
Precision = |P ∩ O| / |P| = 1/4 = 0.25
Recall = |P ∩ O| / |O| = 1/3 ≈ 0.33
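The same worked example in code: 4 predicted deviation intervals, 3 intervals where deviations actually occurred, and 1 interval in common. The interval labels are just illustrative.

```python
predicted = {"t1", "t3", "t5", "t6"}   # P: intervals where a deviation was predicted
occurred  = {"t2", "t3", "t4"}         # O: intervals where a deviation actually occurred

overlap = predicted & occurred              # P ∩ O -> {"t3"}
precision = len(overlap) / len(predicted)   # 1/4 = 0.25
recall    = len(overlap) / len(occurred)    # 1/3 ~ 0.33
print(precision, round(recall, 2))          # 0.25 0.33
```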
RESEARCH QUESTIONS
• Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
• Can we save time by cutting short unnecessary load tests, through early identification of performance deviations across the different subsystems of an LSS?
• How is the performance of our methodology affected by different sampling intervals?
• Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
RQ-1
APPROACH
• 4 load tests, 8 hours each
• 700 performance counters each
• Monitoring interval: 15 sec → 1,922 instances
• Baseline test → 85% data reduction
• Test-1 → reproduction of the baseline test
• Test-2 → synthetic fault injection via mutation
• Test-3 → workload intensity increased (8X)
[Figure: counter importance (y-axis, 0.8-1.0) per performance counter (x-axis) for each subsystem: Database (11 counters), Web Server-A, Application System and Web Server-B (18 counters each), under the baseline test, Test-A, the synthesized test and the 8X-load test.]
FINDINGS
Our methodology helps performance analysts identify subsystems with performance deviations relative to prior tests.

Subsystems       Test-A    Synthesized    8X load
Database         0.997     0.732          0.826
Web Server-A     1.000     0.701          0.795
Web Server-B     1.000     0.700          0.790
Application      1.000     0.623          0.681
Can we save time by cutting short unnecessary load tests, through early identification of performance deviations across the different subsystems of an LSS?
RQ-2
[Figure: % CPU Utilization (roughly 35-80%) plotted against observations, once over the full run (about 950 observations) and once over about 40 observations.]
APPROACH
• Two load tests
  • 2 hours each
  • Monitoring rate: 15 sec
• CPU stress on the database server at the 60th minute, for 15 sec
• Test comparison
  • Removed 12% of the samples (10 min)
[Figure: % CPU Utilization (about 38-88%) over time (0-100 min) for the baseline and the load test; the CPU stress spikes utilization around the 60th minute, and two 6% portions of the samples are marked as removed.]
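A sketch of how the early-identification idea could look in code, reusing the craft_signature() and match_score() sketches from earlier: craft signatures from only the first part of both runs and compare them, so an analyst can decide whether the rest of the test is worth running. The 0.95 threshold is an assumption for illustration; the 15-second sampling rate matches the slide.

```python
def early_deviation_check(baseline_log, test_log, minutes, sample_secs=15,
                          threshold=0.95):
    """Compare signatures crafted from only the first `minutes` of both runs.
    Uses craft_signature() and match_score() from the earlier sketches.
    A low match score suggests the test already deviates from the baseline,
    so the remaining hours of the load test may be unnecessary."""
    n = (minutes * 60) // sample_secs                   # observations in the window
    base_sig = craft_signature(baseline_log.iloc[:n])
    test_sig = craft_signature(test_log.iloc[:n])
    score = match_score(base_sig.to_dict(), test_sig.to_dict())
    return score, score < threshold                     # (match score, deviation flag)

# e.g. 10 minutes = 40 observations at a 15-second monitoring rate:
# score, deviating = early_deviation_check(baseline_log, test_log, minutes=10)
```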
[Figure: counter importance (0.8-1.0) for the 11 database performance counters, baseline test vs. load test, computed from 30-, 15-, 10- and 5-minute windows.]
FINDINGS

Time (Observations)    Database
30 mins (120)          1
15 mins (60)           1
10 mins (40)           0.9893
5 mins (20)            0.8255

Early identification of deviations → within 10 minutes, i.e., 40 observations.
How is the performance of our
methodology affected by different
sampling intervals?
RQ-3
APPROACH
• Two load tests
  • 2 hours each
  • Monitoring rate: 15 sec
• Fault → stopped the load generators 10 times, for 15 sec each
• Measured the performance of the methodology at different time intervals
  • 30 min → 4 samples
  • 15 min → 8 samples
[Figure: the baseline and Load Test 1 runs divided into 30-minute and 15-minute comparison windows.]
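A small sketch of how a 2-hour counter log could be split into comparison windows of different lengths. The window counts follow the slides (4 windows at 30 min, 8 at 15 min, 12 at 10 min, 24 at 5 min, assuming the 15-second monitoring rate); the splitting itself is an illustrative assumption about how the sampling-interval experiment could be implemented.

```python
import pandas as pd

def split_into_windows(counter_log: pd.DataFrame, minutes: int,
                       sample_secs: int = 15) -> list:
    """Split a run into consecutive windows of `minutes` each.
    For a 2-hour run sampled every 15 seconds this gives 4 windows of
    120 observations at 30 min, 8 at 15 min, 12 at 10 min, 24 at 5 min."""
    n = (minutes * 60) // sample_secs
    return [counter_log.iloc[i:i + n] for i in range(0, len(counter_log), n)]

# Each window of the new test is then compared against the matching baseline
# window; flagged windows feed the recall/precision numbers in the findings.
```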
FINDINGS
Small samples yield high RECALL; large samples yield high PRECISION.

Test Run                 Database        Web Server-1    Web Server-2    Application System    Average
Min   Obs   Samples      Recall  Prec    Recall  Prec    Recall  Prec    Recall  Prec          Recall  Prec
30    120   4            0.50    1.00    0.50    1.00    0.30    1.00    0.25    1.00          0.325   1.000
15    60    8            0.62    1.00    0.62    1.00    0.62    1.00    0.50    1.00          0.590   1.000
10    40    12           1.00    0.90    1.00    0.90    1.00    0.90    0.90    0.69          0.975   0.847
5     20    24           1.00    0.70    1.00    0.70    1.00    0.80    1.00    0.66          1.000   0.715
All   -     -            0.78    0.90    0.78    0.90    0.73    0.92    0.66    0.83          0.738   0.890

The methodology performs best at a 10-minute time interval, striking a good balance between recall and precision.

Editor's Notes

  • #3 Today's LSS, such as Google, eBay, Facebook and Amazon, are composed of many underlying components and subsystems. These LSS grow rapidly in size to handle growing traffic, complex services and business-critical functionality. This exponential growth increases the complexity of the individual components and, hence, of the integration between the geographically distributed components. The performance of an LSS is periodically measured to satisfy the high business demands on system quality, availability and responsiveness.
  • #4 Load testing is an important weapon in LSS development to uncover functional and performance problems of a system under load. The performance of the LSS is calibrated using load tests before a problem becomes a field or post-deployment problem. Performance problems include an application not responding fast enough, crashing or hanging under heavy load, or not meeting the desired service level agreements (SLAs).
  • #6 Environment Setup: the first and most important phase of load testing, since the most common load test failures occur due to an improper environment setup. The environment setup includes installing the applications and load testing tools on different machines and possibly on different operating systems. Load generators, which emulate the users' interaction with the system, need to be carefully configured to match the real workload in the field. Load Test Execution: involves starting the components of the system under test, i.e., the required services, hardware resources and tools (load generators and performance monitors). Performance counters are recorded in this step too. Load Test Analysis: involves comparing the results of a load test against the results of other load tests or against predefined thresholds as baselines. Unlike functional and unit testing, which result in a pass or fail classification for each test, load testing requires additional quantitative metrics like response time, throughput and hardware resource utilization to summarize results. The performance analyst selects a few important performance counters among the thousands collected. Based on experience and domain knowledge, the analyst manually compares the selected performance counters with those of past runs to look for evidence of performance deviations, for example using plots and correlation tests. Report Generation: includes filing the performance deviations, if found, based on the personal judgment of an analyst. Mostly the results produced are verified by an experienced analyst and, based on the extent of the performance deviation, routed to the team responsible for the affected subsystem (database, application, web system, etc.).
  • #7 Unfortunately, the current practice of analyzing load tests is costly, time consuming and error prone. This is due to the fact that load test analysis practices have not kept pace with the rapid growth in size and complexity of large enterprise systems. In practice, the dominant tools and techniques to analyze large distributed systems have remained unchanged for over twenty years. Most of the research has focused on the automatic generation of load testing suites rather than on load test analysis. There are many challenges and limitations associated with the current practice of load test analysis that remain unsolved.
  • #8 Load tests last from a couple of hours to several days. They generate performance logs that can be terabytes in size. Even logging all counters on a typical machine at 1 Hz generates about 8.6 million values in a single week; a cluster of 12 machines produces about 13 TB of performance counter data per week, assuming a 64-bit representation for each counter value. Analysis of such large counter logs is still a big challenge in load testing.
  • #9 Performance analysts of LSS have only limited time to run and complete diagnostics on performance counter logs and to make the necessary configuration changes. Load testing is usually the last step in an already tight and usually delayed release schedule; hence, managers are always eager to reduce the time allocated for performance testing.
  • #10 Error prone because of the manual process involved in analyzing performance counter data in current practice. It is impossible for an analyst to skim through such a large volume of log data; instead, analysts use a few key performance counters known to them from past practice, performance experts and domain trends as 'rules of thumb'. With large-scale systems that continuously evolve by adding new functionality, applying the same rules of thumb can be misleading and cause performance issues to be missed.
  • #11 Due to these challenges, we believe the current practice of load test analysis is neither effective nor sufficient to uncover performance deviations accurately and within the limited time available.
  • #15 1) The performance logs obtained from a load test do not suffice for direct analysis by our methodology. These logs need to be prepared to make them suitable for the statistical techniques employed by our methodology. This step takes care of data sanitization (missing and incomplete counter variables) and of pre-treatment of the data, such as standardization and data scaling, to remove the bias of variance-dependent techniques.
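A minimal sketch of the data-preparation step described in this note, assuming a pandas counter log (rows = observations, columns = counters). The exact sanitization rules used in the study are not spelled out here, so the choices below are illustrative.

```python
import pandas as pd

def prepare_counter_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Sanitize missing/incomplete counter values, then standardize so that
    variance-dependent techniques are not biased toward large-magnitude counters."""
    log = raw.apply(pd.to_numeric, errors="coerce")    # non-numeric entries -> NaN
    log = log.dropna(axis=1, how="all")                # drop counters that never report
    log = log.interpolate(limit_direction="both")      # fill isolated gaps in a counter
    scaled = (log - log.mean()) / log.std(ddof=0)      # z-score standardization
    return scaled.dropna(axis=1)                       # constant counters carry no signal
```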