An Automated Approach for Recommending When to Stop Performance Tests

Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan
Failures in ultra-large-scale systems are often due to performance issues rather than functional issues.
A 25-minute service outage in 2013 cost Amazon approximately $1.7M.
Performance testing is essential to prevent these failures. In a performance testing environment, a pre-defined workload sends requests to the system under test, while performance counters (e.g., CPU, memory, I/O, and response time) are recorded.
Determining the length of a performance test is challenging. Over time, the test generates increasingly repetitive data; there is an optimal stopping time after which little new information is produced.
Stopping too early misses performance issues; stopping too late delays the release and wastes testing resources.
Our approach for recommending when to stop a performance test:

1) Collect the already-generated data.
2) Measure the likelihood of repetitiveness.
3) Extrapolate the likelihood of repetitiveness (first derivative).
4) Determine whether to stop the test: if yes, stop; if no, return to step 1.
Step 1: Collect the data that the test has generated up to the current time, i.e., performance counters such as CPU, memory, I/O, and response time.
Step 2: Measure the likelihood of repetitiveness. Select a random time period A (e.g., 30 minutes) from the collected data.
Then search for another, non-overlapping time period B that is NOT statistically significantly different from A.
A Wilcoxon test is applied between the distributions of every performance counter across both periods.
For example:

            Response time   CPU     Memory   I/O
p-values    0.0258          0.313   0.687    0.645

Here the two periods are statistically significantly different in response time.
The search continues until it finds a time period that is NOT statistically significantly different in ALL performance counters:

            Response time   CPU     Memory   I/O
p-values    0.67            0.313   0.687    0.645
If such a period is found, period A is considered repetitive; otherwise, it is not.
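The period comparison in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the counter names, the 5% significance level, and the normal-approximation rank-sum test (standing in for an off-the-shelf Wilcoxon test) are all assumptions.

```python
import math

ALPHA = 0.05  # assumed significance level

def ranksum_p(a, b):
    """Two-sided p-value of the Wilcoxon rank-sum test
    (normal approximation, average ranks for ties)."""
    n1, n2 = len(a), len(b)
    pooled = sorted((v, i) for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1                       # extend over a run of ties
        avg = (i + j) / 2 + 1            # average rank of the tied run
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    w = sum(ranks[:n1])                  # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def periods_repetitive(period_a, period_b):
    """True when NO counter differs significantly across the periods."""
    return all(ranksum_p(period_a[c], period_b[c]) >= ALPHA
               for c in period_a)

# Example: identical CPU/IO samples are not significantly different.
a = {"cpu": [48, 52, 50, 49, 51] * 10, "io": [5, 6, 7, 5, 6] * 10}
print(periods_repetitive(a, a))  # True
```

A production version would use a library implementation of the test (e.g., SciPy's rank-sum test) rather than the hand-rolled approximation above.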
This process is repeated a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness: the fraction of randomly selected periods that turn out to be repetitive.
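The repetition loop can be sketched like this. The mean-based `similar` comparator is only a stand-in for the Wilcoxon comparison above, and the trial count, tolerance, and function names are illustrative assumptions.

```python
import random
import statistics

def similar(p, q, tol=0.1):
    """Stand-in for the Wilcoxon-based comparison: treat two periods as
    indistinguishable when their means differ by < tol * pooled stdev."""
    spread = statistics.pstdev(list(p) + list(q)) or 1.0
    return abs(statistics.fmean(p) - statistics.fmean(q)) < tol * spread

def likelihood_of_repetitiveness(series, period_len, trials=200, seed=0):
    """Fraction of randomly chosen periods that have at least one
    non-overlapping, indistinguishable period elsewhere in the data."""
    rng = random.Random(seed)
    n = len(series)
    hits = 0
    for _ in range(trials):
        a = rng.randrange(0, n - period_len + 1)
        period_a = series[a:a + period_len]
        for b in range(0, n - period_len + 1, period_len):
            if abs(b - a) < period_len:
                continue                      # overlaps period A
            if similar(period_a, series[b:b + period_len]):
                hits += 1                     # found a repetition
                break
    return hits / trials

# A strictly periodic counter is judged fully repetitive.
print(likelihood_of_repetitiveness([i % 50 for i in range(1000)], 50))  # 1.0
```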
A new likelihood of repetitiveness is measured periodically (e.g., every 10 minutes: at 30 min, 40 min, ..., 1 h 10 min) in order to get more frequent feedback on the repetitiveness.
Plotted over the course of the test (from 1% at 00:00 toward 100% at 24:00), the likelihood of repetitiveness eventually starts stabilizing, i.e., the test generates little new information.
Step 3: Extrapolate the likelihood of repetitiveness. To know when the repetitiveness stabilizes, we calculate the first derivative of the likelihood curve.
Step 4: Determine whether to stop the test. Stop the test if the first derivative is close to 0.
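Steps 3 and 4 together amount to a simple stopping rule over the periodic likelihood measurements. A minimal sketch follows; the tolerance `eps` and the `window` of consecutive near-zero derivatives are assumed parameters, not values from the paper.

```python
def first_derivative(likelihoods):
    """Successive differences of the periodic likelihood measurements."""
    return [b - a for a, b in zip(likelihoods, likelihoods[1:])]

def should_stop(likelihoods, eps=0.005, window=3):
    """Recommend stopping once the likelihood curve has stabilized:
    its first derivative stays within eps for `window` measurements."""
    derivs = first_derivative(likelihoods)
    if len(derivs) < window:
        return False
    return all(abs(d) < eps for d in derivs[-window:])

curve = [0.10, 0.30, 0.55, 0.72, 0.81, 0.85, 0.853, 0.855, 0.856]
print(should_stop(curve[:5]))  # False: still climbing steeply
print(should_stop(curve))      # True: derivative is close to 0
```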
Case study: we apply our approach to three systems: PetClinic, Dell DVD Store, and CloudStore.
We conduct 24-hour performance tests on the three systems.
We evaluate our approach against the optimal stopping time, asking two questions:

1) Does it stop the test too early?
2) Does it stop the test too late?
1) Does our approach stop the test too early?

We split the 24-hour test data at the recommended stopping point into pre-stopping and post-stopping data, then:

1. Select a random time period from the post-stopping data.
2. Check whether the random time period has a repetitive counterpart in the pre-stopping data.

We repeat this 1,000 times. Result: the test is likely to generate little new data after the recommended stopping times (preserving more than 91.9% of the information).
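The evaluation above can be sketched as a coverage check of the post-stopping data against the pre-stopping data. This is an illustrative reconstruction: the mean-based `similar` comparator again stands in for the Wilcoxon comparison, and the trial count and names are assumptions.

```python
import random
import statistics

def similar(p, q, tol=0.1):
    """Stand-in for the Wilcoxon-based period comparison."""
    spread = statistics.pstdev(list(p) + list(q)) or 1.0
    return abs(statistics.fmean(p) - statistics.fmean(q)) < tol * spread

def information_preserved(pre, post, period_len, trials=200, seed=1):
    """Fraction of random post-stopping periods that repeat some
    pre-stopping period; high values mean stopping lost little data."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = rng.randrange(0, len(post) - period_len + 1)
        window = post[s:s + period_len]
        if any(similar(window, pre[b:b + period_len])
               for b in range(0, len(pre) - period_len + 1, period_len)):
            hits += 1
    return hits / trials

# Post-stopping data that merely repeats the pre-stopping pattern
# is fully covered by what was already collected.
pre = [i % 50 for i in range(500)]
post = [i % 50 for i in range(200)]
print(information_preserved(pre, post, 50))  # 1.0
```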
2) Does our approach stop the test too late?

We apply our evaluation approach from RQ1 at the end of every hour during the test (1 h, 2 h, ..., 24 h) to find the most cost-effective stopping time, which has:

1. A big difference from the previous hour.
2. A small difference from the next hour.
Result: there is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay).
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Pitangent Analytics & Technology Solutions Pvt. Ltd
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
DianaGray10
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 

Recently uploaded (20)

Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
Crafting Excellence: A Comprehensive Guide to iOS Mobile App Development Serv...
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
What is an RPA CoE? Session 2 – CoE Roles
What is an RPA CoE?  Session 2 – CoE RolesWhat is an RPA CoE?  Session 2 – CoE Roles
What is an RPA CoE? Session 2 – CoE Roles
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 

An Automated Approach for Recommending When to Stop Performance Tests

  • 1. An Automated Approach for Recommending When to Stop Performance Tests Hammam AlGhamdi, Weiyi Shang, Mark D. Syer, Ahmed E. Hassan 1
  • 2. Failures in ultra large-scale systems are often due to performance issues rather than functional issues 2
  • 3. A 25-minute service outage in 2013 cost Amazon approximately $1.7M 3
  • 4. 4 Performance testing is essential to prevent these failures System under test requests requests requests Performance counters, e.g., CPU, memory, I/O and response time Pre-defined workload Performance testing environment
  • 5. 5 Determining the length of a performance test is challenging Time Repetitive data is generated from the test Optimal stopping time
  • 6. 6 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time
  • 7. 7 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 8. 8 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 9. 9 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 10. 10 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 11. 11 Time Current time Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test Step 1: Collect the data that the test generates Performance counters, e.g., CPU, memory, I/O and response time
  • 12. 12 Time Step 2: Measure the likelihood of repetitiveness Select a random time period (e.g., 30 min) Current time Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test A
  • 13. 13 Time Current time Search for another non-overlapping time period that is NOT statistically significantly different. … Step 2: Measure the likelihood of repetitiveness … B…A Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test
  • 14. 14 Time Wilcoxon test between the distributions of every performance counter across both periods … Current time Step 2: Measure the likelihood of repetitiveness B…A Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test
  • 15. 15 Step 2: Measure the likelihood of repetitiveness Response time CPU Memory I/O p-values 0.0258 0.313 0.687 0.645 Statistically significantly different in response time! Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test Time Wilcoxon test between every performance counter from both periods … Current time B…A
  • 16. 16 Step 2: Measure the likelihood of repetitiveness Response time CPU Memory I/O p-values 0.67 0.313 0.687 0.645 Find a time period that is NOT statistically significantly different in ALL performance metrics! Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test Time Wilcoxon test between every performance counter from both periods … Current time B…A
  • 17. 17 Find a period that is NOT statistically significantly different? Yes. Repetitive! No. Not repetitive! Step 2: Measure the likelihood of repetitiveness Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test Time Wilcoxon test between every performance counter from both periods … Current time B…A
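The per-period comparison on slides 12–17 can be sketched in Python. This is a minimal sketch, not the authors' implementation: the rank-sum test below uses the plain normal approximation (no tie correction), and the counter names and the alpha = 0.05 significance level are assumptions.

```python
import math

def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal
    approximation (a simplified stand-in for a stats library)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign 1-based ranks, averaging over ties.
    rank_of = [0.0] * (n1 + n2)
    i = 0
    while i < n1 + n2:
        j = i
        while j < n1 + n2 and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j + 1) / 2.0  # positions i..j-1 share this rank
        for k in range(i, j):
            rank_of[k] = avg_rank
        i = j
    ranksum_x = sum(r for (v, lab), r in zip(pooled, rank_of) if lab == 0)
    mu = n1 * (n1 + n2 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (ranksum_x - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def periods_repetitive(period_a, period_b, alpha=0.05):
    """Periods are dicts mapping a counter name (CPU, memory, I/O,
    response time) to its samples. They count as repetitive only if
    NO counter is statistically significantly different."""
    return all(ranksum_p(period_a[c], period_b[c]) >= alpha
               for c in period_a)
```

With identical samples the p-value is 1.0, so the periods count as repetitive; a large shift in any single counter (as with response time on slide 15) is enough to make the pair non-repetitive.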
  • 18. 18 Step 2: Measure the likelihood of repetitiveness Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness
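Slide 18's repeated sampling can be sketched as a loop. Another minimal sketch: the `is_repetitive` predicate here is a toy mean-comparison stand-in for the slides' Wilcoxon check, and the tolerance, trial count, and seed are assumed values.

```python
import random

def is_repetitive(a, b, tol=0.1):
    """Toy stand-in for the slides' Wilcoxon check: two periods
    'match' if their means are within a relative tolerance."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return abs(ma - mb) <= tol * max(abs(ma), abs(mb), 1e-9)

def repetitiveness_likelihood(series, period_len, trials=200, seed=0):
    """Fraction of randomly drawn periods that have at least one
    non-overlapping period that is not (significantly) different."""
    rng = random.Random(seed)
    n = len(series)
    hits = 0
    for _ in range(trials):
        start = rng.randrange(0, n - period_len + 1)
        a = series[start:start + period_len]
        for other in range(0, n - period_len + 1):
            if abs(other - start) < period_len:
                continue  # overlapping periods don't count
            if is_repetitive(a, series[other:other + period_len]):
                hits += 1
                break
    return hits / trials
```

Re-measured periodically (every 10 min on slide 19), this yields the likelihood-of-repetitiveness curve: a steady workload scores near 100%, while a still-changing one scores low.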
  • 19. 19 Step 2: Measure the likelihood of repetitiveness 30 min 40 min Time … 1h 10 min Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test A new likelihood of repetitiveness is measured periodically, e.g., every 10 min, in order to get more frequent feedback on the repetitiveness
  • 20. 20 Step 2: Measure the likelihood of repetitiveness Time likelihood of repetitiveness 00:00 24:00 1% 100% Stabilization (little new information) Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test The likelihood of repetitiveness eventually starts stabilizing.
  • 21. 21 Step 3: Extrapolate the likelihood of repetitiveness Time likelihood of repetitiveness 00:00 24:00 1% 100% Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test To know when the repetitiveness stabilizes, we calculate the first derivative.
  • 22. 22 Step 4: Determine whether to stop the test Time likelihood of repetitiveness 00:00 24:00 1% 100% Stop the test if the first derivative is close to 0. Collected data Likelihood of repetitiveness First derivative Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test To know when the repetitiveness stabilizes, we calculate the first derivative.
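Steps 3 and 4 then watch the slope of that curve. A hedged sketch of the stopping rule, where the flatness threshold and the 10-minute measurement interval are assumed values rather than the paper's exact parameters:

```python
def should_stop(likelihoods, interval_min=10.0, threshold=0.001):
    """Stop once the first derivative of the likelihood-of-
    repetitiveness curve is close to 0 (the curve has stabilized)."""
    if len(likelihoods) < 2:
        return False  # not enough measurements to estimate a slope
    slope = (likelihoods[-1] - likelihoods[-2]) / interval_min
    return abs(slope) <= threshold
```

In practice one would likely smooth the curve or require several consecutive flat derivatives before stopping, so that a single noisy measurement does not end the test prematurely.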
  • 23. 23 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 24. PetClinic Dell DVD Store CloudStore 24 We conduct 24-hour performance tests on three systems
  • 25. 25 We evaluate whether our approach: Stops the test too early? Stops the test too late? Optimal stopping time 1 2
  • 26. 26 Pre-stopping data Post-stopping data Time STOP Does our approach stop the test too early? 00:00 24:00 1 1) Select a random time period from the post-stopping data 2) Check if the random time period has a repetitive one in the pre-stopping data The test is likely to generate little new data after the stopping time (preserving more than 91.9% of the information). Repeat 1,000 times
  • 27. 27 We apply our evaluation approach from RQ1 at the end of every hour during the test to find the most cost-effective stopping time. Does our approach stop the test too late? 2 1h 2h Time … 10h 20h 24h
  • 28. The most cost-effective stopping time has: 1.  A big difference from the previous hour 2.  A small difference from the next hour 28 1% 100% 00:00 04:00 05:00 06:00 Does our approach stop the test too late? 2 likelihood of repetitiveness
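The "most cost-effective hour" rule on slide 28 can be read as a simple scoring over the hourly likelihoods. This scoring (gain from the previous hour minus gain toward the next) is our illustrative interpretation, not the paper's exact metric:

```python
def most_cost_effective_hour(hourly):
    """Pick the hour whose likelihood jumps most from the previous
    hour while barely changing toward the next one (heuristic
    reading of slide 28's two criteria)."""
    best_h, best_score = None, float("-inf")
    for h in range(1, len(hourly) - 1):
        gain_before = hourly[h] - hourly[h - 1]  # want this big
        gain_after = hourly[h + 1] - hourly[h]   # want this small
        score = gain_before - gain_after
        if score > best_score:
            best_h, best_score = h, score
    return best_h
```

For a curve that jumps sharply and then flattens, the scoring picks the hour right after the jump, which matches the intuition of the slide's 04:00–06:00 illustration.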
  • 29. 29 There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay). Short delay Does our approach stop the test too late? 2
  • 30. 30 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time
  • 31. 31 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time
  • 32. 32 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 33. 33 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 34. 34 Pre-stopping data Post-stopping data Time STOP Does our approach stop the test too early? 00:00 24:00 1 1) Select a random time period from the post-stopping data 2) Check if the random time period has a repetitive one in the pre-stopping data The test is likely to generate little new data after the stopping time (preserving more than 91.9% of the information). Repeat 1,000 times
  • 35. 35 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time Pre-stopping data Post-stopping data Time STOP Does our approach stop the test too early? 00:00 24:00 1 1) Select a random time period from the post-stopping data 2) Check if the random time period has a repetitive one in the pre-stopping data The test is likely to generate little new data after the stopping time (preserving more than 91.9% of the information). Repeat 1,000 times Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes
  • 36. 36 There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay). Short delay Does our approach stop the test too late? 2
  • 37. 37 Determining the length of a performance test is challenging Time Stopping too early, misses performance issues Stopping too late, delays the release and wastes testing resources Optimal stopping time Pre-stopping data Post-stopping data Time STOP Does our approach stop the test too early? 00:00 24:00 1 1) Select a random time period from the post-stopping data 2) Check if the random time period has a repetitive one in the pre-stopping data The test is likely to generate little new data after the stopping time (preserving more than 91.9% of the information). Repeat 1,000 times There is a short delay between the recommended stopping times and the most cost-effective stopping times (the majority are under a 4-hour delay). Short delay Does our approach stop the test too late? 2 Our approach for recommending when to stop a performance test Collected data Likelihood of repetitiveness First derivatives Whether to stop the test 1) Collect the already-generated data 2) Measure the likelihood of repetitiveness 3) Extrapolate the likelihood of repetitiveness 4) Determine whether to stop the test STOP No Yes