Automated Discovery of
Performance Regressions
in Enterprise Applications
King Chun (Derek) Foo
Supervisors: Dr. Jenny Zou and Dr. Ahmed E. Hassan
Department of Electrical and Computer Engineering
Performance Regression
• Software changes over time
– Bug fixes
– Feature enhancements
– Execution environments
• Performance regressions describe situations
where the performance degrades compared
to previous releases
2
Example of Performance Regression
3
[Diagram: Load Generator → Application Server → Data Store]
SP1 introduces a new default policy to throttle the “# of RPC/min”
• Significant increase of job queue and response
time
• CPU utilization decreases
• Certification of a 3rd-party component
Current Practice of
Performance Verification
4
FOUR CHALLENGES
The Current Practice of Performance Verification
5
1. Too Late in the Development Lifecycle
• Design changes are not evaluated until after
code is written
– Happens at the last stage of a delayed schedule
6
2. Lots of Data
• Industrial case studies
have over 2,000 counters
• Time consuming to
analyze
• Hard to compare more
than 2 tests at once
7
3. No Documented Behavior
8
• Analysts have different perceptions of
performance regressions
• Analysis may be influenced by
– Analyst’s knowledge
– Deadline
4. Heterogeneous Environments
• Multiple labs to parallelize test executions
– Hardware and software may differ
– Tests from one lab may not be used to analyze
tests from another lab
9
Categorize Each Challenge
10
Challenge 1: Design level
Challenges 2–4: Implementation level
AT THE DESIGN LEVEL
Performance Verification
11
Evaluate Design Changes through
Performance Modeling
• Analytical models are often not suitable for all
stakeholders
– Abstract mathematical and statistical concepts
• Simulation models can be implemented with
the support of existing frameworks
– Visualization
– No systematic approach to construct models that
can be used by different stakeholders
12
Layered Simulation Model
13
• World view layer – Can the current infrastructure support the projected growth of users?
• Component layer – Investigate the threading model
• Physical layer – Hardware resource utilization
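The layering can be sketched as nested components; the class and method names below are invented for illustration (they are not the thesis implementation), with each layer delegating to the layer beneath it:

```python
# Minimal sketch of a three-layer simulation model (hypothetical names).
# World view layer: users/blogs talk to an RSS server.
# Component layer: queues feed the application logic.
# Physical layer: a hardware allocator hands out CPU/RAM.

class PhysicalLayer:
    def __init__(self, cpu_units, ram_kb):
        self.cpu_units = cpu_units
        self.ram_kb = ram_kb

    def allocate(self, cpu, ram):
        """Grant resources if available; return True on success."""
        if cpu <= self.cpu_units and ram <= self.ram_kb:
            self.cpu_units -= cpu
            self.ram_kb -= ram
            return True
        return False

    def release(self, cpu, ram):
        self.cpu_units += cpu
        self.ram_kb += ram

class ComponentLayer:
    def __init__(self, physical):
        self.physical = physical
        self.in_queue = []  # jobs waiting for hardware

    def process(self, job):
        # Application logic asks the physical layer for resources.
        if self.physical.allocate(job["cpu"], job["ram"]):
            self.physical.release(job["cpu"], job["ram"])
            return "done"
        self.in_queue.append(job)  # queue up when saturated
        return "queued"

class WorldViewLayer:
    def __init__(self, component):
        self.component = component

    def notify(self, job):
        # A blog update triggers an RSS notification.
        return self.component.process(job)

server = WorldViewLayer(ComponentLayer(PhysicalLayer(cpu_units=4, ram_kb=10)))
print(server.notify({"cpu": 2, "ram": 5}))   # done
print(server.notify({"cpu": 8, "ram": 5}))   # queued (not enough CPU)
```

Each stakeholder inspects only their layer: sales look at `WorldViewLayer`, programmers at `ComponentLayer`, system engineers at `PhysicalLayer`.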
Case Studies
• We conducted two case studies
– RSS Cloud
• Show the process of constructing the model
• Derive the bottleneck of the application
– Performance monitor for ULS systems
• Evaluate whether or not an organization should re-architect the software
• Our model can be used to extract important
information and aid in decision making
14
AT THE IMPLEMENTATION LEVEL
Performance Verification
15
Challenges with Analyzing
Performance Tests
• Lots of data
– Industrial case studies have over 2,000 counters
– Time consuming to analyze
– Hard to compare more than 2 tests at once
• No documented behavior
– Analyst’s subjectivity
16
Performance Signatures
Intuition: Counter correlations are the same across tests
17
[Repository of prior tests. Example signature: Arrival Rate = Medium, CPU Utilization = Medium, Throughput = Medium, RAM Utilization = Medium, Job Queue Size = Low, …]
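The intuition above can be sketched in a few lines: keep the counter/level pairs that hold in every prior test as the signature, and flag counters in a new test that break it. Counter names and levels below are illustrative, not the thesis's mining algorithm:

```python
# Sketch: a performance signature as the set of counter=level pairs that
# co-occur in every prior test of the same scenario.
from collections import Counter

def signature(tests):
    """Keep counter=level pairs that hold in all prior tests."""
    counts = Counter()
    for test in tests:
        counts.update(test.items())
    return {pair for pair, n in counts.items() if n == len(tests)}

prior_tests = [
    {"arrival_rate": "medium", "cpu_util": "medium", "job_queue": "low"},
    {"arrival_rate": "medium", "cpu_util": "medium", "job_queue": "low"},
]
new_test = {"arrival_rate": "medium", "cpu_util": "medium", "job_queue": "high"}

expected = signature(prior_tests)
violations = {k for k, v in expected if new_test.get(k) != v}
print(sorted(violations))  # ['job_queue'] -> flagged as a potential regression
```

The job queue grew while arrival rate and CPU stayed the same, so the queue counter, not the whole test, is what gets flagged.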
Performance Signatures
Approach Overview
18
Case Studies
• 2 Open Source Applications
– Dell DVD Store and JPetStore
– Manually injected bugs to simulate performance
regressions
• Enterprise Application
– Compare counters flagged by our technique
against analyst’s reports
19
Case Studies Result
• Open source applications:
– Precision: 75% - 100%
– Recall: 52% - 67%
• Enterprise application:
– Precision: 93%
– Recall: 100% (relative to the organization’s report)
– Discovered new regressions that were not
included in the analysis reports
20
HETEROGENEOUS ENVIRONMENTS
Analyzing Performance Tests Conducted in Heterogeneous Environments
21
Heterogeneous Environments
• Different hardware and software
configurations
• Performance tests conducted in different labs
exhibit different behaviors
• Must distinguish performance regressions from
performance differences caused by
heterogeneous environments
22
Ensemble-based Approach
• Build a collection of models from the
repository
– Each model specializes in detecting the
performance regressions in a specific environment
• Reduces the risk of relying on a single model,
which may contain conflicting behaviors
23
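The ensemble idea can be sketched as one rule set per lab, combined by majority vote (bagging-style); the rule sets and counter names are made-up examples, and the thesis also evaluates stacking as the combiner:

```python
# Sketch: combine per-environment models by majority vote.
def flags_regression(model_rules, test):
    """A model flags a counter when its expected level is violated."""
    return {c for c, level in model_rules.items() if test.get(c) != level}

def ensemble_vote(models, test, quorum=0.5):
    votes = {}
    for rules in models:
        for counter in flags_regression(rules, test):
            votes[counter] = votes.get(counter, 0) + 1
    # Keep counters flagged by a strict majority of the models.
    return {c for c, v in votes.items() if v / len(models) > quorum}

# Two lab-specific models: they agree on throughput but conflict on memory,
# because the labs had different memory configurations.
lab_models = [
    {"throughput": "medium", "memory_util": "low"},
    {"throughput": "medium", "memory_util": "high"},
]
test = {"throughput": "low", "memory_util": "high"}
print(sorted(ensemble_vote(lab_models, test)))  # ['throughput']
```

The memory counter is flagged by only one of the two models, so the vote suppresses it: a difference caused by the environment rather than a regression.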
Case Studies
• 2 Open Source Applications
– Dell DVD Store and JPetStore
– Manually injected bugs and varied
hardware/software resources
• Enterprise Application
– Used existing tests conducted in different labs
24
Case Studies Result
• Original approach
– Precision: 80%
– Recall: 50% (3-level discretization) - 60% (EW)
• Ensemble-based approach:
– Precision: 80% (Bagging) - 100% (Stacking)
– Recall: 80%
• Ensemble-based approach with stacking
produces the best result in our experiments
25
Major Contributions
• An approach to build layered simulation models
to evaluate design changes early
• An automated approach to detect performance
regressions, allowing analysts to analyze large
amounts of performance data while limiting
subjectivity
• An ensemble-based approach to deal with
performance tests conducted in heterogeneous
environments, which is common in practice
26
Conclusion
27
Publication
K. C. Foo, Z. M. Jiang, B. Adams, A. E. Hassan, Y.
Zou, P. Flora, "Mining Performance Regression
Testing Repositories for Automated
Performance Analysis," Proc. Int’l Conf. on
Quality Softw. (QSIC), 2010
28
Future Work
• Online analysis of performance tests
• Compacting the performance regression
report
• Maintaining the training data for our
automated analysis approach
• Using performance signatures to build
performance models
29
30
Logical view, Development view, Process view, Physical view, Scenarios view
Figure 2-1: The "4+1" view model
31
Execution of performance regression test → Threshold-based analysis of test result → Manual analysis of test result → Report generation
Figure 2-2: The process of performance verification
32
QN model  | Types of application suitable to be modeled
Open QN   | Applications with jobs arriving externally; these jobs eventually depart from the application.
Closed QN | Applications with a fixed number of jobs circulating within the application.
Mixed QN  | Applications with jobs that arrive externally and jobs that circulate within the application.
SQN-HQN   |
SRN       | Distributed applications with synchronous communication.
LQN       | Distributed applications with synchronous or asynchronous communication.
Table 3-1: Summary of approaches based on QN models
33
Figure 3-1: Open queueing network model
Figure 3-2: Closed queueing network model
34
Stakeholder Performance Concerns
End user Overall system performance for various deployment scenarios
Programmer Organization and performance of system modules
System Engineer Hardware resource utilization of the running application
System Integrator Performance of each high-level component in the application
Table 4-1: Performance concerns of stakeholders
35
Stakeholder                                            | Layer in Our Simulation Model | 4+1 View Model
Architects, Managers, End users, Sales Representatives | World view layer              | Logical view
Programmers, System Integrators                        | Component layer               | Development view, Process view
System Engineers                                       | Physical layer                | Physical view
All Stakeholders                                       | Scenario                      | Scenario view
Table 4-2: Mapping of our simulation models to the 4+1 view model
36
World view layer, Component layer, Physical layer
Figure 4-1: Example of layered simulation model for an RSS cloud
37
Layer            | Component                   | Has connection to
World view layer | Users, blogs                | RSS server
World view layer | RSS server                  | Users, blogs
Component layer  | Input queues, output queues | Application logic
Component layer  | Application logic           | Input queues, output queues, hardware
Component layer  | Hardware                    | Application logic
Physical layer   | Hardware allocator          | CPU, RAM, disk
Physical layer   | CPU, RAM, disk              | Hardware allocator
Table 4-3: Components and connections in Figure 4-1
38
Resource        | Requirement
CPU             | 2 units
RAM             | 5 KB
Thread          | 1
Processing time | 2 seconds
Table 4-4: Processing requirements for an RSS notification
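These requirements can drive a toy fixed-step simulation of the RSS server. The sketch below assumes a CPU capacity of 4 units (an illustrative figure, not from the thesis) and, to stay short, drops notifications that do not fit rather than queueing them:

```python
# Toy simulation: each RSS notification needs 2 CPU units for 2 seconds
# (Table 4-4). Notifications that exceed capacity are dropped, not queued.
def simulate(arrival_times, cpu_capacity=4, sim_end=10,
             service_time=2, cpu_per_job=2):
    busy_until = []   # finish times of in-flight notifications
    completed = 0
    for t in range(sim_end):
        busy_until = [f for f in busy_until if f > t]  # free finished jobs
        for _ in range(arrival_times.count(t)):
            in_use = len(busy_until) * cpu_per_job
            if in_use + cpu_per_job <= cpu_capacity:
                busy_until.append(t + service_time)
                completed += 1
    return completed

# 3 notifications arrive at t=0; 4 CPU units only fit two concurrent jobs,
# so the third is lost — the kind of bottleneck the model surfaces.
print(simulate([0, 0, 0]))  # 2
```

Sweeping the arrival rate in such a loop is how throughput/response-time plots like Figures 4-2 to 4-4 can be produced.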
39
Figure 4-2: Plot of the throughput of the RSS server at various request arrival rates
40
Figure 4-3: Plot of the response time of the RSS server at various request arrival rates
41
Figure 4-4: Plot of the hardware utilization of the RSS server at various request arrival rates
42
Figure 4-5: World view layer of the performance monitor for ULS applications
43
Layers Performance Data
World view layer Response time, Transmission Cost
Component layer Thread Utilization
Physical layer CPU and RAM utilization
Table 4-5: Performance data collected per layer
CPU Util.      | Low  | OK    | High  | Very High
Range (%)      | < 30 | 30–60 | 60–75 | > 75
Discretization | 0.25 | 0.5   | 0.75  | 1
Table 4-6: Categorization of CPU utilization
RAM Util.      | Low  | OK    | High  | Very High
Range (%)      | < 25 | 25–50 | 50–60 | > 60
Discretization | 0.25 | 0.5   | 0.75  | 1
Table 4-7: Categorization of RAM utilization
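The two tables above are simple threshold maps and can be encoded directly; this is a sketch of the lookup, not the thesis's monitoring code:

```python
# Encode Tables 4-6/4-7: map a utilization percentage to a
# discretization level (0.25 / 0.5 / 0.75 / 1).
def discretize(util, bounds):
    """bounds = ascending cut-offs for Low/OK/High; above the last = Very High."""
    levels = [0.25, 0.5, 0.75, 1.0]
    for cutoff, level in zip(bounds, levels):
        if util < cutoff:
            return level
    return levels[-1]

CPU_BOUNDS = (30, 60, 75)   # Table 4-6
RAM_BOUNDS = (25, 50, 60)   # Table 4-7

print(discretize(45, CPU_BOUNDS))  # 0.5  (OK: 30-60)
print(discretize(70, RAM_BOUNDS))  # 1.0  (Very High: > 60)
```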
44
Data Collection Frequency (Hz) | Layer      | Data Broadcast Period (s) | Response time (s) | Cost ($) | Central Monitor Thread Util. (%) | Central Monitor CPU Util. (%) | Central Monitor RAM Util. (%)
0.1                            | World View | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.1                            | Component  | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.1                            | Physical   | 1 | 6.8 | 5.0 | 1.6 | 15.6 | 6.1
0.2                            | World View | 1 | 7.7 | 5.0 | 4.0 | 40.3 | 15.7
0.2                            | Component  | 1 | 7.7 | 5.0 | 4.0 | 40.3 | 15.7
0.2                            | Physical   | 7 | 8.9 | 5.3 | 2.3 | 23.4 | 9.2
0.3                            | World View | 1 | 8.9 | 5.0 | 6.4 | 64.4 | 25.3
0.3                            | Component  | 1 | 8.9 | 5.0 | 6.4 | 64.4 | 25.3
0.3                            | Physical   | 3 | 9.2 | 5.0 | 5.6 | 56.0 | 21.9
Table 4-8: Simulation results for the performance monitor case study
45
(a) Overview of problematic regressions
(b) Details of performance regressions
Time series plots show the periods where performance regressions are detected. Box plots give a quick visual comparison between prior tests and the new test. Counters with performance regressions (underlined) are annotated with expected counter correlations.
46
Figure 5-2: Overview of performance regression analysis approach
47
(a) Original counter data
(b) Counter discretization
(Shaded area corresponds to the medium discretization level)
Figure 5-3: Counter normalization and discretization
48
For each counter:
High = all values above the medium level
Medium = median ± 1 standard deviation
Low = all values below the medium level
Figure 5-4: Definition of counter discretization levels
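The definition above maps directly to code; the sample readings are invented for illustration:

```python
# Figure 5-4's scheme: the medium band is median +/- 1 standard deviation;
# values above it are high, values below it are low.
import statistics

def discretize_counter(values):
    med = statistics.median(values)
    sd = statistics.pstdev(values)        # population standard deviation
    lo, hi = med - sd, med + sd
    return ["high" if v > hi else "low" if v < lo else "medium"
            for v in values]

samples = [48, 50, 52, 50, 90]            # one spike among steady readings
print(discretize_counter(samples))
# ['medium', 'medium', 'medium', 'medium', 'high']
```

Using the median rather than the mean keeps the band centered on typical behavior even when a spike inflates the average.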
49
Figure 5-5: Example of an association rule
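In the spirit of that figure, an association rule pairs a premise with an expected consequent over discretized counters; the concrete rule below is a made-up example, not the one in the figure:

```python
# Rule: {arrival_rate=medium, cpu_util=medium} => {throughput=medium}
def rule_violated(observation, antecedent, consequent):
    """Flag when the premise holds but the expected consequent does not."""
    if all(observation.get(k) == v for k, v in antecedent.items()):
        return any(observation.get(k) != v for k, v in consequent.items())
    return False  # the rule does not apply to this observation

obs = {"arrival_rate": "medium", "cpu_util": "medium", "throughput": "low"}
print(rule_violated(obs,
                    {"arrival_rate": "medium", "cpu_util": "medium"},
                    {"throughput": "medium"}))  # True -> potential regression
```

When the premise does not hold (say, the arrival rate itself changed), the rule simply does not fire, which is what keeps environment-driven load differences from being flagged.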
50
Application            | # of test scenarios | Duration per test (hours) | Average precision | Average recall
DS2                    | 4                   | 1                         | 100%              | 52%
JPetStore              | 2                   | 0.5                       | 75%               | 67%
Enterprise Application | 13                  | 8                         | 93%               | 100% (relative to organization's original analysis)
Table 5-1: Average precision and recall
51
Load generator
% Processor Time
# Orders/minute
# Network Bytes Sent/second
# Network Bytes Received/second
Tomcat
% Processor Time
# Threads
# Virtual Bytes
# Private Bytes
MySQL
% Processor Time
# Private Bytes
# Bytes written to disk/second
# Context Switches/second
# Page Reads/second
# Page Writes/second
% Committed Bytes In Use
# Disk Reads/second
# Disk Writes/second
# I/O Reads Bytes/second
# I/O Writes Bytes/second
Table 5-2: Summary of counters collected for DS2
52
53
Figure 5-6: Performance Regression Report for DS2 test D_4 (Increased Load)
54
Test | Summary of the report submitted by the performance analyst | Our findings
E_1  | No performance problem found. | Our approach identified abnormal behaviors in system arrival rate and throughput counters.
E_2  | Arrival rates from two load generators differ significantly. Abnormally high database transaction rate. High spikes in job queue. | Our approach flagged the same counters as the performance analyst's analysis, with one false positive.
E_3  | Slight elevation of # database transactions/second. | No counter flagged.
Table 5-4: Summary of analysis for the enterprise application
55
Model | Counters flagged as violation
R1    | CPU utilization, throughput
R2    | Memory utilization, throughput
R3    | Memory utilization, throughput
R4    | Database transactions/second
Table 6-1: Counters flagged in T5 by multiple rule sets
56
Counter flagged as violation   | # of times flagged
Throughput                     | 3
Memory utilization             | 2
CPU utilization                | 1
# Database transactions/second | 1
Table 6-2: Count of counters flagged as violations by individual rule set
57
Configuration    | T1 (repository) | T2 (repository) | T5 (new test)
CPU              | 2 GHz, 2 cores  | 2 GHz, 2 cores  | 2 GHz, 2 cores
Memory           | 2 GB            | 1 GB            | 2 GB
Database Version | 1               | 2               | 1
OS Architecture  | 32 bit          | 64 bit          | 64 bit
Table 6-3: Test configurations (T1 and T2 are from the performance testing repository; T5 is the new test)
58
Table 6-4: Summary of the performance of our approaches
P represents precision, R represents recall, and F represents F-measure
(values are rounded to 1 significant digit)

 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and ApplicationsWSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
WSO2CON 2024 - Architecting AI in the Enterprise: APIs and Applications
 
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
WSO2Con2024 - Navigating the Digital Landscape: Transforming Healthcare with ...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public AdministrationWSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
WSO2CON 2024 - How CSI Piemonte Is Apifying the Public Administration
 
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdfAzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
AzureNativeQumulo_HPC_Cloud_Native_Benchmarks.pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid EnvironmentsWSO2Con2024 - Software Delivery in Hybrid Environments
WSO2Con2024 - Software Delivery in Hybrid Environments
 
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of TransformationWSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
WSO2CON 2024 - Designing Event-Driven Enterprises: Stories of Transformation
 
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
WSO2CON 2024 - Not Just Microservices: Rightsize Your Services!
 

Automated Discovery of Performance Regressions in Enterprise Applications

  • 1. Automated Discovery of Performance Regressions in Enterprise Applications King Chun (Derek) Foo Supervisors: Dr. Jenny Zou and Dr. Ahmed E. Hassan Department of Electrical and Computer Engineering
  • 2. Performance Regression • Software changes over time – Bug fixes – Feature enhancements – Execution environments • Performance regressions describe situations where the performance degrades compared to previous releases 2
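The definition above can be sketched as a minimal check: compare a performance counter (here, response time) between a baseline release and a new release, and flag a regression when the degradation exceeds a tolerance. This is a hypothetical illustration, not the thesis's actual detection technique; the counter names and the 10% tolerance are assumptions.

```python
from statistics import mean

def has_regression(baseline, new, tolerance=0.10):
    """Flag a regression when the new release's mean response time
    exceeds the baseline mean by more than the given tolerance."""
    base_avg, new_avg = mean(baseline), mean(new)
    return (new_avg - base_avg) / base_avg > tolerance

# Response times (ms) measured under the same load for two releases.
old_release = [102, 98, 105, 101, 99]
new_release = [130, 128, 135, 131, 129]
print(has_regression(old_release, new_release))  # → True
```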
  • 3. Example of Performance Regression • Scenario: certification of a 3rd-party component. A load generator drives an application server backed by a data store; SP1 introduces a new default policy that throttles the “# of RPC/min” • Significant increase of job queue and response time • CPU utilization decreases 3
  • 5. FOUR CHALLENGES The Current Practice of Performance Verification 5
  • 6. 1. Too Late in the Development Lifecycle • Design changes are not evaluated until after code is written – Happens at the last stage of a delayed schedule 6
  • 7. 2. Lots of Data • Industrial case studies have over 2,000 counters • Time consuming to analyze • Hard to compare more than 2 tests at once 7
  • 8. 3. No Documented Behavior 8 • Analysts have different perceptions of performance regressions • Analysis may be influenced by – Analyst’s knowledge – Deadline
  • 9. 4. Heterogeneous Environments • Multiple labs to parallelize test executions – Hardware and software may differ – Tests from one lab may not be used to analyze tests from another lab 9
  • 10. Categorize Each Challenge 10 Design Implementation Implementation Implementation
  • 11. AT THE DESIGN LEVEL Performance Verification 11
  • 12. Evaluate Design Changes through Performance Modeling • Analytical models are often not suitable for all stakeholders – Abstract mathematical and statistical concepts • Simulation models can be implemented with the support of existing frameworks – Visualization – No systematic approach to construct models that can be used by different stakeholders 12
  • 13. Layered Simulation Model (world view layer, component layer, physical layer) • World view layer: can the current infrastructure support the projected growth of users? • Component layer: investigate the threading model • Physical layer: hardware resource utilization 13
  • 14. Case Studies • We conducted two case studies – RSS Cloud • Show the process of constructing the model • Derive the bottleneck of the application – Performance monitor for ULS systems • Evaluate whether or not an organization should re-architect the software • Our model can be used to extract important information and aid in decision making 14
  • 15. AT THE IMPLEMENTATION LEVEL Performance Verification 15
  • 16. Challenges with Analyzing Performance Tests • Lots of data – Industrial case studies have over 2,000 counters – Time consuming to analyze – Hard to compare more than 2 tests at once • No documented behavior – Analyst’s subjectivity 16
  • 17. Performance Signatures • Intuition: counter correlations are the same across tests • A repository of prior tests yields performance signatures, e.g. Arrival Rate: Medium, CPU Utilization: Medium, Throughput: Medium, RAM Utilization: Medium, Job Queue Size: Low, … 17
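The intuition that counter correlations stay stable across tests can be sketched as follows: compute the pairwise correlation of counters in a baseline test, then flag pairs whose correlation shifts sharply in the new test. This is a simplified stand-in for the thesis's signature-based approach; the counter names, sample values, and 0.5 deviation threshold are assumptions.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def signature_violations(baseline, new_test, threshold=0.5):
    """Flag counter pairs whose correlation in the new test deviates
    from the baseline by more than `threshold` (candidate regressions)."""
    names = list(baseline)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(baseline[a], baseline[b])
                   - pearson(new_test[a], new_test[b])) > threshold:
                flagged.append((a, b))
    return flagged

# Baseline: CPU rises with arrival rate; new test: CPU falls as load rises.
baseline = {"arrival_rate": [1, 2, 3, 4, 5], "cpu_util": [10, 20, 30, 40, 50]}
new_test = {"arrival_rate": [1, 2, 3, 4, 5], "cpu_util": [52, 41, 33, 22, 14]}
print(signature_violations(baseline, new_test))  # → [('arrival_rate', 'cpu_util')]
```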
  • 19. Case Studies • 2 Open Source Applications – Dell DVD Store and JPetStore – Manually injected bugs to simulate performance regressions • Enterprise Application – Compare counters flagged by our technique against analyst’s reports 19
  • 20. Case Studies Result • Open source applications: – Precision: 75% - 100% – Recall: 52% - 67% • Enterprise application: – Precision: 93% – Recall: 100% (relative to the organization’s report) – Discovered new regressions that were not included in the analysis reports 20
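The precision and recall figures above follow the standard definitions over flagged versus truly regressed counters; a minimal sketch (the counter names are hypothetical):

```python
def precision_recall(flagged, actual):
    """flagged: counters reported by the approach; actual: counters with
    real (e.g., injected) regressions. Returns (precision, recall)."""
    flagged, actual = set(flagged), set(actual)
    tp = len(flagged & actual)  # true positives
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall

p, r = precision_recall(["cpu", "mem", "queue"], ["cpu", "mem"])
print(p, r)  # one false positive ("queue"), no misses
```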
  • 22. Heterogeneous Environments • Different hardware and software configurations • Performance tests conducted in different labs exhibit different behaviors • Must distinguish performance regressions from performance differences caused by heterogeneous environments 22
  • 23. Ensemble-based Approach • Build a collection of models from the repository – Each model specializes in detecting the performance regressions in a specific environment • Reduces risks of following a single model which may contain conflicting behaviors 23
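A simple voting variant of this idea can be sketched as follows: run each environment-specific model on the new test and keep only counters flagged by several models, so no single model with conflicting behaviors dominates. The rule sets below are hypothetical stand-ins (their outputs mirror Table 6-1), and the vote threshold is an assumption; the thesis also evaluates stacking.

```python
from collections import Counter

def ensemble_flags(rule_sets, test, min_votes=2):
    """Each rule set (one per lab environment) flags counters it
    considers anomalous; keep counters flagged by >= min_votes models."""
    votes = Counter()
    for rules in rule_sets:
        votes.update(set(rules(test)))
    return sorted(c for c, n in votes.items() if n >= min_votes)

# Hypothetical environment-specific rule sets.
r1 = lambda t: ["cpu_util", "throughput"]
r2 = lambda t: ["mem_util", "throughput"]
r3 = lambda t: ["mem_util", "throughput"]
r4 = lambda t: ["db_txn_per_sec"]
print(ensemble_flags([r1, r2, r3, r4], None))  # → ['mem_util', 'throughput']
```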
  • 24. Case Studies • 2 Open Source Applications – Dell DVD Store and JPetStore – Manually injected bugs and varied hardware/software resources • Enterprise Application – Use existing tests conducted in different labs 24
  • 25. Case Studies Result • Original approach – Precision: 80% – Recall: 50% (3-level discretization) - 60% (EW) • Ensemble-based approach: – Precision: 80% (Bagging) - 100% (Stacking) – Recall: 80% • Ensemble-based approach with stacking produces the best result in our experiments 25
  • 26. Major Contributions • An approach to build layered simulation models to evaluate design changes early • An automated approach to detect performance regressions, allowing analysts to analyze large amounts of performance data while limiting subjectivity • An ensemble-based approach to deal with performance tests conducted in heterogeneous environments, which is common in practice 26
  • 28. Publication K. C. Foo, Z. M. Jiang, B. Adams, A. E. Hassan, Y. Zou, P. Flora, "Mining Performance Regression Testing Repositories for Automated Performance Analysis," Proc. Int’l Conf. on Quality Softw. (QSIC), 2010 28
  • 29. Future Work • Online analysis of performance test • Compacting the performance regression report • Maintaining the training data for our automated analysis approach • Using performance signatures to build performance models 29
  • 30. 30 Figure 2-1: The "4+1" view model (logical view, development view, process view, physical view, scenarios view)
  • 31. 31 Figure 2-2: The process of performance verification: execution of performance regression test → threshold-based analysis of test result → manual analysis of test result → report generation
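The threshold-based analysis step of this process can be sketched as a first automated screen: flag counters whose average exceeds a configured limit, then pass the flagged counters to manual analysis. The counter names and limits below are hypothetical, not from the studied systems.

```python
def threshold_check(counters, limits):
    """Flag counters whose average sample value exceeds its limit."""
    flagged = {}
    for name, samples in counters.items():
        avg = sum(samples) / len(samples)
        if name in limits and avg > limits[name]:
            flagged[name] = round(avg, 1)
    return flagged

run = {"cpu_util_pct": [85, 90, 88], "response_time_ms": [120, 110, 115]}
limits = {"cpu_util_pct": 80, "response_time_ms": 200}
print(threshold_check(run, limits))  # → {'cpu_util_pct': 87.7}
```

A fixed-threshold screen like this is cheap but environment-sensitive, which motivates the signature and ensemble approaches later in the talk.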
  • 32. 32 Table 3-1: Summary of approaches based on QN models
Open QN: applications with jobs arriving externally; these jobs will eventually depart from the application
Closed QN: applications with a fixed number of jobs circulating within the application
Mixed QN: applications with jobs that arrive externally and jobs that circulate within the application
SQN-HQN, SRN: distributed applications with synchronous communication
LQN: distributed applications with synchronous or asynchronous communication
  • 33. 33 Figure 3-1: Open queueing network model; Figure 3-2: Closed queueing network model
  • 34. 34 Table 4-1: Performance concerns of stakeholders
End user: overall system performance for various deployment scenarios
Programmer: organization and performance of system modules
System Engineer: hardware resource utilization of the running application
System Integrator: performance of each high-level component in the application
  • 35. 35 Table 4-2: Mapping of our simulation models to the 4+1 view model
Architects, managers, end users, sales representatives: world view layer (logical view)
Programmers, system integrators: component layer (development view, process view)
System engineers: physical layer (physical view)
All stakeholders: scenarios
  • 36. 36 Figure 4-1: Example of a layered simulation model for an RSS cloud (world view layer, component layer, physical layer)
  • 37. 37 Table 4-3: Components and connections in Figure 4-1
World view layer: users and blogs connect to the RSS server; the RSS server connects to users and blogs
Component layer: in queues and out queues connect to the application logic; the application logic connects to the input queues, output queues, and hardware; the hardware connects to the application logic
Physical layer: the hardware allocator connects to the CPU, RAM, and disk; the CPU, RAM, and disk connect to the hardware allocator
  • 38. 38 Table 4-4: Processing requirements for an RSS notification
CPU: 2 units; RAM: 5 KB; Threads: 1; Processing time: 2 seconds
  • 39. 39 Figure 4-2: Plot of the throughput of the RSS server at various request arrival rates
  • 40. 40 Figure 4-3: Plot of the response time of the RSS server at various request arrival rates
  • 41. 41 Figure 4-4: Plot of the hardware utilization of the RSS server at various request arrival rates
  • 42. 42 Figure 4-5: World view layer of the performance monitor for ULS applications
  • 43. 43 Table 4-5: Performance data collected per layer
World view layer: response time, transmission cost
Component layer: thread utilization
Physical layer: CPU and RAM utilization
Table 4-6: Categorization of CPU utilization (%): Low < 30, OK 30–60, High 60–75, Very High > 75; discretized to 0.25, 0.5, 0.75, 1
Table 4-7: Categorization of RAM utilization (%): Low < 25, OK 25–50, High 50–60, Very High > 60; discretized to 0.25, 0.5, 0.75, 1
  • 44. 44 Table 4-8: Simulation result for the performance monitor case study
Columns: data collection frequency (Hz); layer; data broadcast period (s); response time (s); cost ($); central monitor thread, CPU, and RAM utilization (%)
0.1 Hz: World View: 1, 6.8, 5.0, 1.6, 15.6, 6.1 | Component: 1, 6.8, 5.0, 1.6, 15.6, 6.1 | Physical: 1, 6.8, 5.0, 1.6, 15.6, 6.1
0.2 Hz: World View: 1, 7.7, 5.0, 4.0, 40.3, 15.7 | Component: 1, 7.7, 5.0, 4.0, 40.3, 15.7 | Physical: 7, 8.9, 5.3, 2.3, 23.4, 9.2
0.3 Hz: World View: 1, 8.9, 5.0, 6.4, 64.4, 25.3 | Component: 1, 8.9, 5.0, 6.4, 64.4, 25.3 | Physical: 3, 9.2, 5.0, 5.6, 56.0, 21.9
  • 45. 45 (a) Overview of problematic regressions; (b) details of performance regressions. Time series plots show the periods where performance regressions are detected; box plots give a quick visual comparison between prior tests and the new test. Counters with performance regressions (underlined) are annotated with expected counter correlations.
  • 46. 46 Figure 5-2: Overview of performance regression analysis approach
  • 47. 47 Figure 5-3: Counter normalization and discretization. (a) Original counter data; (b) counter discretization (shaded area corresponds to the medium discretization level)
  • 48. 48 Figure 5-4: Definition of counter discretization levels. For each counter: Medium = median ± 1 standard deviation; High = all values above the medium level; Low = all values below the medium level
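The discretization rule in Figure 5-4 translates directly into code: the medium band is the median ± 1 standard deviation, and values above or below that band are high or low. A minimal sketch (the sample values are illustrative only):

```python
from statistics import median, stdev

def discretize(values):
    """Map each counter sample to low/medium/high, where medium is the
    band median ± 1 sample standard deviation (per Figure 5-4)."""
    med, sd = median(values), stdev(values)
    lo, hi = med - sd, med + sd
    return ["high" if v > hi else "low" if v < lo else "medium"
            for v in values]

samples = [10, 12, 11, 13, 30, 9]  # one spike among steady readings
print(discretize(samples))  # → ['medium', 'medium', 'medium', 'medium', 'high', 'medium']
```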
  • 49. 49 Figure 5-5: Example of an association rule
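Checking a new test against an association rule can be sketched as follows: a rule maps antecedent counter levels to an expected consequent level, and an observation where the antecedent holds but the consequent does not is a candidate regression. The rule and counter names below are hypothetical, not the exact rule from Figure 5-5.

```python
def rule_violations(rule, observations):
    """Return indices of observations that match the rule's antecedent
    but contradict its expected consequent level."""
    antecedent, (target, expected) = rule
    return [i for i, obs in enumerate(observations)
            if all(obs.get(c) == lvl for c, lvl in antecedent.items())
            and obs.get(target) != expected]

# Hypothetical rule: medium arrival rate and medium CPU imply medium throughput.
rule = ({"arrival_rate": "medium", "cpu_util": "medium"},
        ("throughput", "medium"))
obs = [
    {"arrival_rate": "medium", "cpu_util": "medium", "throughput": "medium"},
    {"arrival_rate": "medium", "cpu_util": "medium", "throughput": "low"},
    {"arrival_rate": "high", "cpu_util": "high", "throughput": "high"},
]
print(rule_violations(rule, obs))  # → [1]
```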
  • 50. 50 Table 5-1: Average precision and recall
DS2: 4 test scenarios, 1 hour per test, 100% average precision, 52% average recall
JPetStore: 2 test scenarios, 0.5 hours per test, 75% average precision, 67% average recall
Enterprise Application: 13 test scenarios, 8 hours per test, 93% average precision, 100% average recall (relative to the organization’s original analysis)
  • 51. 51 Table 5-2: Summary of counters collected for DS2
Load generator: % Processor Time, # Orders/minute, # Network Bytes Sent/second, # Network Bytes Received/second
Tomcat: % Processor Time, # Threads, # Virtual Bytes, # Private Bytes
MySQL: % Processor Time, # Private Bytes, # Bytes written to disk/second, # Context Switches/second, # Page Reads/second, # Page Writes/second, % Committed Bytes In Use, # Disk Reads/second, # Disk Writes/second, # I/O Read Bytes/second, # I/O Write Bytes/second
  • 52. 52
  • 53. 53 Figure 5 6: Performance Regression Report for DS2 test D_4 (Increased Load)‑
  • 54. 54 Table 5-4: Summary of analysis for the enterprise application
E_1: Analyst’s report: no performance problem found. Our findings: our approach identified abnormal behaviors in system arrival rate and throughput counters.
E_2: Analyst’s report: arrival rates from two load generators differ significantly; abnormally high database transaction rate; high spikes in job queue. Our findings: our approach flagged the same counters as the performance analyst’s analysis, with one false positive.
E_3: Analyst’s report: slight elevation of # database transactions/second. Our findings: no counter flagged.
  • 55. 55 Table 6-1: Counters flagged in T5 by multiple rule sets
R1: CPU utilization, throughput
R2: memory utilization, throughput
R3: memory utilization, throughput
R4: database transactions/second
  • 56. 56 Table 6-2: Count of counters flagged as violations by individual rule sets
Throughput: 3; memory utilization: 2; CPU utilization: 1; # database transactions/second: 1
  • 57. 57 Table 6-3: Test configurations
Repository test T1: CPU 2 GHz, 2 cores; memory 2 GB; database version 1; 32-bit OS
Repository test T2: CPU 2 GHz, 2 cores; memory 1 GB; database version 2; 64-bit OS
New test T5: CPU 2 GHz, 2 cores; memory 2 GB; database version 1; 64-bit OS
  • 58. 58 Table 6-4: Summary of the performance of our approaches. P represents precision, R represents recall, and F represents F-measure (values are rounded up to 1 significant digit)