Using Load Test to Automatically Compare
the Subsystems of a Large Enterprise
System
Haroon Malik, Bram Adams & Ahmed E. Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University, Kingston, Canada
Parminder Flora & Gilbert Hamann
Performance Engineering
Research In Motion, Waterloo, Canada
 Today's large-scale systems (LSS) are composed of many
underlying subsystems.
 These LSS grow rapidly in size to handle growing traffic, complex
services and business-critical functionality.
 Performance analysts face the challenge of dealing with
performance bugs, as processing is spread across thousands of
subsystems and millions of hardware nodes.
LOAD TESTING
[Figure: load-test setup — two load generators (Load Generator-1, Load Generator-2) drive the system under test; a monitoring tool records performance counter logs into a performance repository.]
CURRENT PRACTICE
1. Environment setup
2. Load test execution
3. Load test analysis
4. Report generation
CHALLENGES…
LARGE NUMBER OF PERFORMANCE COUNTERS
RISK OF ERROR
Automated Methodology Required
METHODOLOGY
[Figure: word clouds of raw data from performance counters PC-1, PC-2 and PC-3 — a lot of data — which our methodology distills into a compact performance signature.]
[Figure: the same word-cloud view with the counters grouped by subsystem (Database, Mail, Web) — our methodology produces one performance signature per subsystem.]
[Figure: example — counters of a database subsystem (Commits/Sec, Writes/Sec, CPU Utilization, Database Cache % Hit) are compared between the baseline and Load Test 1; per-subsystem deviation/match scores (e.g., 0.59, 1, 0.99) quantify how closely the test matches the baseline.]
METHODOLOGY STEPS
1. Data preparation
2. Counter normalization
3. Dimension reduction
4. Crafting performance signatures
5. Extracting performance deviations
6. Report generation
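The slides carry no code, so the following Python is a minimal sketch of steps 1–5, not the authors' implementation. It assumes each counter log is a pandas DataFrame (rows = observations, columns = counters) and uses PCA loadings as one plausible notion of per-counter importance for the signature; the original methodology's exact statistics may differ.

```python
# Minimal sketch of steps 1-5 (assumed data layout; the PCA-based
# signature is one plausible realization, not the authors' exact method).
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def craft_signature(counter_log: pd.DataFrame, n_components: int = 3) -> pd.Series:
    # Step 1, data preparation: drop counters with missing samples.
    clean = counter_log.dropna(axis=1)
    # Step 2, counter normalization: z-score each counter so that
    # large-valued counters do not dominate the variance.
    normalized = (clean - clean.mean()) / clean.std(ddof=0)
    normalized = normalized.dropna(axis=1)  # constant counters become NaN
    # Step 3, dimension reduction: keep the few principal components
    # that explain most of the variance across all counters.
    pca = PCA(n_components=n_components).fit(normalized.values)
    # Step 4, crafting the signature: per-counter importance, taken here
    # as the variance-weighted magnitude of each counter's loadings.
    importance = np.abs(pca.components_).T @ pca.explained_variance_ratio_
    return pd.Series(importance / importance.max(), index=normalized.columns)

def match_score(baseline_sig: pd.Series, test_sig: pd.Series) -> float:
    # Step 5, extracting deviations: correlate the two signatures; a score
    # near 1 means the subsystem behaved as it did in the baseline test.
    shared = baseline_sig.index.intersection(test_sig.index)
    return float(np.corrcoef(baseline_sig[shared], test_sig[shared])[0, 1])
```

The match scores reported in the findings tables later in the deck (e.g., 0.997 for the database under Test-A) are values of exactly this kind of baseline-vs-test comparison, whatever the underlying statistic.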
MEASURING THE PERFORMANCE
Baseline vs. Test-1, compared over intervals t1 t2 t3 t4 t5 t6:
deviations predicted (P), deviations occurred (O), PO = P ∩ O
Precision = |P ∩ O| / |P| = 1/4 = 0.25    Recall = |P ∩ O| / |O| = 1/3 = 0.33
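A quick sketch of this computation, mirroring the slide's worked example; the specific interval labels below are illustrative, only the counts (4 predicted, 3 occurred, 1 in common) come from the slide:

```python
# Precision and recall over deviation intervals, as in the slide's example.
def precision_recall(predicted: set, occurred: set) -> tuple[float, float]:
    hits = predicted & occurred                          # P ∩ O
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(occurred) if occurred else 0.0
    return precision, recall

# Illustrative: 4 predicted intervals, 3 occurred, 1 in common.
print(precision_recall({"t1", "t2", "t4", "t6"}, {"t2", "t3", "t5"}))
# -> (0.25, 0.3333...)
```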
RESEARCH QUESTIONS
 Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
 Can we save time on unnecessary load test completion by identifying performance deviations across the different subsystems of an LSS early?
 How is the performance of our methodology affected by different sampling intervals?
 Can our methodology identify the subsystems of an LSS that have performance deviations relative to prior tests?
RQ-1
APPROACH
4 load tests  8 hours
700 performance counters each
Monitoring interval 15 sec  1,922 instances
Baseline test  85% data reduction
Test-1  reproduction of the baseline test
Test-2  synthetic fault injection via mutation (see the sketch below)
Test-3  increased workload intensity (8X)
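The slides do not detail Test-2's mutation operators, so the following is a purely hypothetical sketch of what mutating a counter log could look like; the helper name and its parameters are assumptions, not the authors' fault-injection mechanism.

```python
# Hypothetical fault injection via mutation: amplify one counter over a
# short window to emulate a misbehaving subsystem.
import numpy as np
import pandas as pd

def inject_fault(log: pd.DataFrame, counter: str, start: int,
                 length: int, factor: float = 8.0) -> pd.DataFrame:
    rng = np.random.default_rng(0)
    mutated = log.copy()
    rows = mutated.index[start:start + length]
    jitter = rng.normal(1.0, 0.05, size=len(rows))  # slight randomness
    mutated.loc[rows, counter] *= factor * jitter   # scale up the window
    return mutated
```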
[Figure: per-counter importance (y-axis 0.8–1.0) across the top performance counters for each subsystem — Web Server-A, Web Server-B, Application System and Database — comparing the baseline test, Test-A, the synthesized test and the 8X-load test.]
FINDINGS
Our methodology helps performance analysts identify subsystems with performance deviations relative to prior tests.

Subsystems     Test-A   Synthesized   8X load
Database       0.997    0.732         0.826
Web Server-A   1.000    0.701         0.795
Web Server-B   1.000    0.700         0.790
Application    1.000    0.623         0.681
Can we save time on unnecessary load test completion by identifying performance deviations across the different subsystems of an LSS early?
RQ-2
[Figure: % CPU utilization (35–80%) plotted against observations — the full run (≈950 observations) and a zoomed-in view of the first ≈40 observations.]
APPROACH
[Figure: % CPU utilization over time (min) for the baseline test and the load test; the injected CPU stress drives utilization from roughly 38% to 88% around the 60th minute.]
 Two load tests
 2 hours each
 Monitoring rate  15 sec
 CPU stress on the database server at the 60th minute, for 15 sec
 Test comparison
 Removed 12% of the samples  10 min (6% + 6%)
[Figure: per-counter importance (0.8–1.0) of the database subsystem, baseline test vs. load test, computed over the first 30, 15, 10 and 5 minutes of data.]
FINDINGS
Time (Observations)   Database
30 mins (120)         1
15 mins (60)          1
10 mins (40)          0.9893
5 mins (20)           0.8255

Early identification of deviations  within 10 minutes, or 40 observations (see the sketch below)
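This finding suggests an early-stopping loop: recompute the comparison on growing prefixes of the running test and abort once the match score drops. A sketch, reusing `craft_signature` and `match_score` from the earlier pipeline sketch; the 0.99 cutoff is an assumed threshold, not one taken from the slides.

```python
# Early-identification sketch: score growing prefixes of the running test
# against the baseline; stop as soon as a deviation is flagged.
def early_deviation(baseline_log, test_log, window=40, threshold=0.99):
    base_sig = craft_signature(baseline_log)
    for end in range(window, len(test_log) + 1, window):
        score = match_score(base_sig, craft_signature(test_log.iloc[:end]))
        if score < threshold:
            return end, score  # deviation detected after `end` observations
    return None  # the test tracked the baseline throughout
```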
How is the performance of our methodology affected by different sampling intervals?
RQ-3
APPROACH
[Figure: the baseline and Load Test-1 timelines sliced into 30-min and 15-min samples.]
 Two load tests
 2 hours each
 Monitoring rate  15 sec
 Fault  stopped the load generators 10 times, for 15 sec each
 Measured the performance of the methodology at different time intervals (see the sketch below):
 30 min  4 samples
 15 min  8 samples
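A sketch of this evaluation, again reusing the earlier helpers: slice both logs into fixed-size chunks, flag any chunk whose match score falls below an assumed threshold, and score the flags against the set of chunks known to contain the injected faults.

```python
# Sampling-interval evaluation sketch (the threshold and the chunk-based
# ground truth `faulty_chunks` are assumptions for illustration).
def evaluate_interval(baseline_log, test_log, obs_per_sample,
                      faulty_chunks: set, threshold=0.99):
    flagged = set()
    n = min(len(baseline_log), len(test_log))
    for i, start in enumerate(range(0, n, obs_per_sample)):
        stop = start + obs_per_sample
        score = match_score(craft_signature(baseline_log.iloc[start:stop]),
                            craft_signature(test_log.iloc[start:stop]))
        if score < threshold:
            flagged.add(i)  # this chunk deviates from the baseline
    return precision_recall(flagged, faulty_chunks)
```

Smaller chunks localize a 15-second fault better (higher recall) but give each signature fewer observations, so noise triggers more false flags (lower precision) — the trade-off the findings table below reports.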
FINDINGS
Small samples yield high RECALL.

Test Run             Database       Web Server-1   Web Server-2   Application    Average
Min / Obs / Samples  Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec   Recall  Prec
30  / 120 / 4        0.50    1.00   0.50    1.00   0.30    1.00   0.25    1.00   0.325   1.000
15  / 60  / 8        0.62    1.00   0.62    1.00   0.62    1.00   0.50    1.00   0.590   1.000
10  / 40  / 12       1.00    0.90   1.00    0.90   1.00    0.90   0.90    0.69   0.975   0.847
5   / 20  / 24       1.00    0.70   1.00    0.70   1.00    0.80   1.00    0.66   1.000   0.715
All                  0.78    0.90   0.78    0.90   0.73    0.92   0.66    0.83   0.738   0.890

Large samples yield high PRECISION.
The methodology performs best at a 10-minute time interval, striking a good balance between recall and precision.
Editor's Notes

1. Today's LSS, such as Google, eBay, Facebook and Amazon, are composed of many underlying components and subsystems. These LSS grow rapidly in size to handle growing traffic, complex services and business-critical functionality. This exponential growth increases each component's complexity and, hence, the integration effort between geographically distributed components. The performance of an LSS is periodically measured to satisfy the high business demands on system quality, availability and responsiveness.
2. Load testing is an important weapon in LSS development for uncovering functional and performance problems of a system under load. The performance of the LSS is calibrated using load tests before a problem becomes a field or post-deployment problem. Performance problems include an application not responding fast enough, crashing or hanging under heavy load, or not meeting the desired service level agreements (SLAs).
3. Environment setup: the first and most important phase of load testing, as the most common load test failures occur due to an improperly set up environment. The environment setup includes installing the applications and load testing tools on different machines, and possibly on different operating systems. Load generators, which emulate the users' interaction with the system, need to be carefully configured to match the real workload in the field. Load test execution: this involves starting the components of the system under test, i.e., starting the required services, hardware resources and tools (load generators and performance monitors). Performance counters are recorded in this step, too. Load test analysis: this step involves comparing the results of a load test against other load test results, or against predefined thresholds, as baselines. Unlike functional and unit testing, which result in a pass or fail classification for each test, load testing requires additional quantitative metrics like response time, throughput and hardware resource utilization to summarize results. The performance analyst selects a few important performance counters among the thousands collected and, based on experience and domain knowledge, manually compares them with those of past runs to look for evidence of performance deviations, for example using plots and correlation tests. Report generation: this includes filing the performance deviations, if found, based on the personal judgment of the analyst. Mostly, the results are verified by an experienced analyst and, based on the extent of a performance deviation, routed to the team responsible for the relevant subsystem (database, application, web system, etc.).
4. Unfortunately, the current practice for analyzing load tests is costly, time consuming and error prone. This is because load test analysis practices have not kept pace with the rapid growth in size and complexity of large enterprise systems. In practice, the dominant tools and techniques for analyzing large distributed systems have remained unchanged for over twenty years. Most research has focused on the automatic generation of load test suites rather than on load test analysis. Many challenges and limitations associated with the current practice of load test analysis remain unsolved.
5. Load tests last from a couple of hours to several days and generate performance logs that can be terabytes in size. Even logging all counters on a typical machine at 1 Hz generates about 8.6 million values in a single week; a cluster of 12 machines over a week yields about 13 TB of performance counter data, assuming a 64-bit representation for each counter value. Analyzing such large counter logs during a load test is still a big challenge.
6. Performance analysts of an LSS have only limited time to run and complete diagnostics on performance counter logs and to make the necessary configuration changes. Load testing is usually the last step in an already tight and usually delayed release schedule; hence, managers are always eager to reduce the time allocated for performance testing.
7. The current practice is error prone because of the manual process involved in analyzing performance counter data. It is impossible for an analyst to skim through such a large volume of log data; instead, analysts rely on the few key performance counters known to them from past practice, performance experts and domain trends as "rules of thumb". With large-scale systems that continuously evolve through new functionality, applying the same rules of thumb can miss performance issues.
8. Due to these challenges, we believe the current practice of load test analysis is neither effective nor sufficient to uncover performance deviations accurately and within the limited time available.
9. 1) The performance logs obtained from a load test do not suffice for direct analysis by our methodology. These logs need to be prepared to make them suitable for the statistical techniques the methodology employs. This step takes care of data sanitization (missing counter variables and incomplete counter variables) and pre-treatment of the data, such as standardization and data scaling, to remove the bias of variance-dependent techniques (a minimal sketch follows).
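A minimal sketch of this preparation step, assuming the same DataFrame layout as the earlier sketches (one row per observation, one column per counter); the exact sanitization rules of the original methodology may differ.

```python
# Data-preparation sketch: sanitize and pre-treat a raw counter log.
import pandas as pd

def prepare(log: pd.DataFrame) -> pd.DataFrame:
    # Sanitization: drop counters that were never collected, then fill
    # isolated gaps by interpolating between neighboring samples.
    log = log.dropna(axis=1, how="all").interpolate(limit_direction="both")
    # Drop constant counters: they carry no signal and break scaling.
    log = log.loc[:, log.std(ddof=0) > 0]
    # Pre-treatment: standardization (zero mean, unit variance) so that
    # variance-dependent techniques are not biased by counter magnitudes.
    return (log - log.mean()) / log.std(ddof=0)
```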