An Automated Approach for Recommending When to Stop Performance Tests
Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan
An Automated Approach for Recommending When to Stop Performance Tests

Abstract: Performance issues are often the cause of failures in today's large-scale software systems. These issues make performance testing essential during software maintenance. However, performance testing faces many challenges. One challenge is determining how long a performance test must run. Although performance tests often run for hours or days to uncover performance issues (e.g., memory leaks), much of the data that is generated during a performance test is repetitive. Performance analysts can stop their performance tests (to reduce the time to market and the costs of performance testing) if they know that continuing the test will not provide any new information about the system's performance. To assist performance analysts in deciding when to stop a performance test, we propose an automated approach that measures how much of the data generated during a performance test is repetitive. Our approach then recommends stopping the test when the data becomes highly repetitive and the repetitiveness has stabilized (i.e., little new information about the system's performance is generated).

  1. An Automated Approach for Recommending When to Stop Performance Tests. Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan
  2. Failures in ultra-large-scale systems are often due to performance issues rather than functional issues
  3. A 25-minute service outage in 2013 cost Amazon approximately $1.7M
  4. Performance testing is essential to prevent these failures: a pre-defined workload sends requests to the system under test in a performance testing environment, while performance counters (e.g., CPU, memory, I/O and response time) are recorded
  5. Determining the length of a performance test is challenging: repetitive data is generated from the test after the optimal stopping time
  6. Determining the length of a performance test is challenging: stopping too early misses performance issues; stopping too late delays the release and wastes testing resources
  7. Our approach for recommending when to stop a performance test: 1) collect the already-generated data; 2) measure the likelihood of repetitiveness; 3) extrapolate the likelihood of repetitiveness; 4) determine whether to stop the test (stop if yes, otherwise keep collecting)
  11. Step 1: Collect the data that the test generates, i.e., performance counters such as CPU, memory, I/O and response time
  12. Step 2: Measure the likelihood of repetitiveness. Select a random time period A (e.g., 30 min) from the collected data
  13. Search for another non-overlapping time period B that is NOT statistically significantly different from A
  14. Run a Wilcoxon test between the distributions of every performance counter across both periods
  15. Example: p-values for response time, CPU, memory and I/O of 0.0258, 0.313, 0.687 and 0.645 mean the two periods are statistically significantly different in response time
  16. Example: p-values of 0.67, 0.313, 0.687 and 0.645 mean period B is NOT statistically significantly different from A in any performance metric
  17. Did we find a period that is NOT statistically significantly different? Yes: repetitive! No: not repetitive!
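The period-comparison step can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: it uses the normal approximation of the Wilcoxon rank-sum test (without a tie correction), and the 0.05 significance level is an assumed default.

```python
import math

def _ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (no tie correction; adequate as a sketch for large samples)."""
    n1, n2 = len(x), len(y)
    ranks = _ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of sample x
    mu = n1 * (n1 + n2 + 1) / 2               # mean of w under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value

ALPHA = 0.05  # assumed significance level

def periods_repetitive(period_a, period_b):
    """period_a/period_b map counter name -> samples; the periods are
    repetitive iff NO counter is statistically significantly different."""
    return all(rank_sum_p(period_a[c], period_b[c]) > ALPHA for c in period_a)
```

For instance, two identical counter distributions yield a p-value of 1.0 (repetitive), while two clearly shifted distributions fall below 0.05 (not repetitive).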
  18. Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness
  19. A new likelihood of repetitiveness is measured periodically (e.g., every 10 min) in order to get more frequent feedback on the repetitiveness
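The sampling loop (pick a random period, search for a non-overlapping match, repeat many times) can be sketched as a small Monte Carlo routine. This is an illustrative sketch, not the paper's code: the statistical test is passed in as a `differs(x, y)` predicate (the paper uses a Wilcoxon test per counter), and the trial count is an assumed default.

```python
import random

def likelihood_of_repetitiveness(series, period_len, differs, trials=200):
    """Estimate the fraction of randomly chosen periods that have a
    non-overlapping, statistically similar counterpart.

    series: dict mapping counter name -> list of samples (one per tick).
    differs(x, y): True if two sample lists are significantly different."""
    n = min(len(v) for v in series.values())
    hits = 0
    for _ in range(trials):
        a = random.randrange(0, n - period_len + 1)   # random period start
        for b in range(0, n - period_len + 1):        # candidate matches
            if abs(b - a) < period_len:               # skip overlapping periods
                continue
            if not any(differs(series[c][a:a + period_len],
                               series[c][b:b + period_len])
                       for c in series):
                hits += 1                             # repetitive match found
                break
    return hits / trials
```

With a crude stand-in test (difference of means), a flat steady-state series yields a likelihood of 1.0, while a strongly trending series yields 0.0.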
  20. The likelihood of repetitiveness eventually starts stabilizing (little new information is generated)
  21. Step 3: Extrapolate the likelihood of repetitiveness. To know when the repetitiveness stabilizes, we calculate the first derivative
  22. Step 4: Determine whether to stop the test. Stop the test if the first derivative is close to 0
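The stopping rule can be sketched with a finite-difference stand-in. Note the simplification: the paper extrapolates the likelihood curve before taking the first derivative, whereas this sketch simply differences the most recent measurements; the window size and the near-zero threshold are assumed values, not from the paper.

```python
def should_stop(likelihoods, window=3, eps=0.01):
    """Recommend stopping once the likelihood-of-repetitiveness curve has
    stabilized, i.e., its first derivative stays close to 0.

    likelihoods: values measured periodically (e.g., every 10 min).
    window: how many successive derivatives must be near zero (assumption).
    eps: threshold for "close to 0" (assumption)."""
    if len(likelihoods) < window + 1:
        return False                      # not enough measurements yet
    recent = likelihoods[-(window + 1):]
    derivs = [b - a for a, b in zip(recent, recent[1:])]  # finite differences
    return all(abs(d) <= eps for d in derivs)
```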
  23. Recap of our approach: 1) collect the already-generated data; 2) measure the likelihood of repetitiveness; 3) extrapolate the likelihood of repetitiveness; 4) determine whether to stop the test
  24. We conduct 24-hour performance tests on three systems: PetClinic, Dell DVD Store and CloudStore
  25. We evaluate whether our approach 1) stops the test too early or 2) stops the test too late
  26. Does our approach stop the test too early? 1) Select a random time period from the post-stopping data; 2) check whether the random time period has a repetitive one in the pre-stopping data; repeat 1,000 times. The test is likely to generate little new data after the stopping times (preserving more than 91.9% of the information)
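The RQ1 check can be sketched the same way: sample periods from the post-stopping data and count how often the pre-stopping data already contains a similar period. Again a sketch under assumptions; `matches(x, y)` stands in for the statistical similarity test and the trial count is illustrative.

```python
import random

def information_preserved(pre, post, period_len, matches, trials=200):
    """Fraction of randomly sampled post-stopping periods that already have
    a similar period in the pre-stopping data (higher = less lost by stopping).

    pre/post: dict mapping counter name -> samples.
    matches(x, y): True if the two sample lists look statistically alike."""
    n_post = min(len(v) for v in post.values())
    n_pre = min(len(v) for v in pre.values())
    hits = 0
    for _ in range(trials):
        a = random.randrange(0, n_post - period_len + 1)  # random post period
        for b in range(0, n_pre - period_len + 1):        # scan pre periods
            if all(matches(post[c][a:a + period_len],
                           pre[c][b:b + period_len]) for c in post):
                hits += 1                                 # already represented
                break
    return hits / trials
```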
  27. Does our approach stop the test too late? We apply our evaluation approach from RQ1 at the end of every hour during the test to find the most cost-effective stopping time
  28. The most cost-effective stopping time has 1) a big difference to the previous hour and 2) a small difference to the next hour
  29. There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay)
  30. Takeaway: determining the length of a performance test is challenging; stopping too early misses performance issues, while stopping too late delays the release and wastes testing resources
