Automated Detection of Performance Regressions Using Statistical Process Control Techniques


This document proposes using statistical process control techniques like control charts to detect performance regressions. It acknowledges challenges like unstable inputs across test runs and multiple influencing factors. It then presents solutions like scaling counters based on input metrics and isolating influential factors. Finally, it evaluates the approach through two case studies, showing it can accurately detect regressions with a low average violation ratio and achieve high precision and recall.


Thanh Nguyen, Bram Adams, ZhenMing Jiang, Ahmed E. Hassan
Queen’s University, Kingston, Canada
Mohamed Nasser, Parminder Flora
Research in Motion, Waterloo, Canada

What is a performance regression?
[Figure: Version 1 (baseline) compared with Version 1.1 (target)]

How to detect performance regression?
[Diagram: load is applied to Version 1 and Version 1.1; CPU % and memory usage are collected from each version and compared to detect a regression]

Challenge in Performance Regression Testing
[Diagram: eight agents across four layers: Layer 1 (Agents 1 and 2), Layer 2 (Agents 1 to 4), Layer 3 (Agent 1), Layer 4 (Agent 1)]
56 counters x 8 agents = 448 counters
56 counters x 2 agents = 112 counters
Lots of data

Data mining -> Reduce and Relate

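Proposed approach to use control charts to find performance regression
[Diagram: the baseline's performance counters are used to determine the lower control limit (LCL), center line (CL), and upper control limit (UCL); the target's performance counters are the second input]
[Chart: a performance counter (y-axis, 730 to 775) plotted over points 0 to 25 (x-axis)]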
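The slide does not say how the LCL, CL, and UCL are derived from the baseline run. As a minimal sketch, assuming the center line is the baseline median and the limits are the 10th and 90th percentiles (both choices, the function names, and the sample values are assumptions made for illustration), the limits could be computed like this:

```python
from statistics import median


def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def control_limits(baseline, lower_pct=10, upper_pct=90):
    """Derive (LCL, CL, UCL) from baseline counter samples.

    Assumption: CL is the median and the limits are percentiles.
    Other conventions (for example mean +/- 3 standard deviations) exist.
    """
    return (percentile(baseline, lower_pct),
            median(baseline),
            percentile(baseline, upper_pct))


# Hypothetical baseline samples of one performance counter.
baseline = [750, 752, 748, 755, 745, 751, 749, 753, 747, 750, 760]
lcl, cl, ucl = control_limits(baseline)
print(f"LCL={lcl} CL={cl} UCL={ucl}")
```

Percentile limits are less sensitive to a few outlying samples than limits based on the standard deviation, which is why they are used in this sketch.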

Using control charts to verify load test results
[Diagram: the baseline's performance counters determine the LCL, CL, and UCL; the target's performance counters are compared against them to produce a violation ratio]
[Chart: a performance counter (y-axis, 730 to 775) plotted over points 0 to 25 (x-axis)]
Reduce
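One way to read the violation ratio is as the fraction of target samples that fall outside the baseline's control limits. The sketch below takes that reading as an assumption; the limits and sample values are hypothetical and only illustrate the calculation.

```python
def violation_ratio(target, lcl, ucl):
    """Fraction of target samples that fall outside [lcl, ucl]."""
    if not target:
        raise ValueError("target must not be empty")
    violations = sum(1 for value in target if value < lcl or value > ucl)
    return violations / len(target)


# Hypothetical limits and target samples, for illustration only.
lcl, ucl = 740, 765
steady_target = [750, 752, 748, 761, 745, 766, 749, 753]
shifted_target = [768, 772, 770, 764, 775, 769, 771, 766]

print(f"steady:  {violation_ratio(steady_target, lcl, ucl):.1%}")
print(f"shifted: {violation_ratio(shifted_target, lcl, ucl):.1%}")
```

Reducing each counter to a single ratio is what makes hundreds of counters comparable at a glance.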

[Figure: two control charts of a performance counter, each comparing a baseline run with a target run. In one chart the target shows a low violation ratio; in the other the target shows a high violation ratio.]
We can use violation ratio to detect regression
Relate

Obstacle #1: Inputs are unstable
[Chart: CPU% (y-axis, 0 to 45) over Time (x-axis, 1 to 6) for Version 1.0 and Version 1.1]
Is there a performance regression?

It is very difficult to maintain stable input across test runs
[Diagram: the load-test setup (load applied to Version 1 and Version 1.1, CPU % and memory usage compared to detect a regression), annotated with randomization, cache, warm-up, and background tasks]

Solution #1: Scale the counter according to the input
[Plot: CPU% against Request/s]
• Step 1: Determine α and β
  $c = \alpha \cdot l + \beta$
• Step 2:
  $c'_t = c_t \cdot \dfrac{\alpha \cdot l_t + \beta}{\alpha \cdot l_b + \beta}$
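The two steps can be sketched in code under some loud assumptions. Step 1 is taken to be an ordinary least-squares fit of the counter against the load, which the slide does not state. Step 2 is written as a generic rescaling from an observed load to a reference load, because the slide does not define its subscripts. The function names `fit_line` and `scale_counter` and the sample data are hypothetical.

```python
def fit_line(loads, counters):
    """Ordinary least-squares fit of counter = alpha * load + beta."""
    n = len(loads)
    if n != len(counters) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_l = sum(loads) / n
    mean_c = sum(counters) / n
    var_l = sum((l - mean_l) ** 2 for l in loads)
    if var_l == 0:
        raise ValueError("loads must not all be equal")
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(loads, counters))
    alpha = cov / var_l
    beta = mean_c - alpha * mean_l
    return alpha, beta


def scale_counter(counter, observed_load, reference_load, alpha, beta):
    """Rescale a counter observed at one load to a reference load.

    Multiplies by the ratio of the model's prediction at the reference
    load to its prediction at the observed load (assumed interpretation).
    """
    predicted_observed = alpha * observed_load + beta
    predicted_reference = alpha * reference_load + beta
    return counter * predicted_reference / predicted_observed


# Hypothetical samples: CPU% measured at several request rates.
loads = [100, 200, 300, 400]
cpu = [12.0, 22.0, 32.0, 42.0]
alpha, beta = fit_line(loads, cpu)

# A CPU% sample taken at 300 requests/s, rescaled to 200 requests/s.
scaled = scale_counter(32.0, observed_load=300, reference_load=200,
                       alpha=alpha, beta=beta)
print(alpha, beta, scaled)
```

After rescaling, samples from runs that received different loads can be judged against the same control limits.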

Solution #1: Example of the effectiveness of scaling

Obstacle #2: Multiple inputs
[Chart: density plot of two test runs; Density% (y-axis, 0 to 35) against CPU Usage (x-axis, 10 to 100)]
IF … THEN … ELSE …

Solution #2: Isolating the counters
[Chart: density plot of two test runs; Density% (y-axis, 0 to 35) against CPU Usage (x-axis, 10 to 100), with the local minima marked]
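The slide only marks the local minima of the density plot. A minimal sketch of how they could be used to isolate one mode of a counter follows. It assumes a fixed-width histogram as the density estimate and keeps the samples around the tallest peak; both choices are assumptions rather than the authors' stated procedure, and all names and data are hypothetical.

```python
def histogram(values, low, high, bins):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - low) / width), bins - 1)] += 1
    return counts


def local_minima(counts):
    """Indices of interior bins strictly lower than both neighbours.

    Flat valleys spanning several bins are not detected by this simple rule.
    """
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]


def isolate_dominant_mode(values, low=0.0, high=100.0, bins=10):
    """Keep the samples lying between the local minima around the tallest bin."""
    counts = histogram(values, low, high, bins)
    width = (high - low) / bins
    peak = counts.index(max(counts))
    minima = local_minima(counts)
    left = max((m for m in minima if m < peak), default=-1)
    right = min((m for m in minima if m > peak), default=bins)
    kept = []
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        if left < index < right:
            kept.append(v)
    return kept


# Hypothetical CPU usage samples with two modes (around 20% and around 70%).
cpu_usage = [18, 20, 22, 21, 19, 23, 45, 68, 70, 72, 71, 69, 73, 70, 74]
print(isolate_dominant_mode(cpu_usage))
```

Filtering to a single mode keeps the control limits from being stretched across two distinct operating states.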

Scale and filter
[Diagram: load is applied to Version 1 and Version 1.1; the CPU % and memory usage counters of each version pass through a scale step and a filter step before they are compared to detect a regression]

Experiment set up
[Diagram: performance counters of one baseline run and two target runs]
Average violation ratio should be low

Experiment set up
[Diagram: performance counters of one baseline run and two target runs]
Average violation ratio should be high

[Chart: average violation ratio (y-axis, 0.00% to 100.00%) for the Normal and Problem runs]

Experiment set up
V.S.
Precision is high
Recall should be high

[Chart: Precision, Recall, and F (y-axis, %, 0 to 100) plotted against Threshold (x-axis)]
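Precision, recall, and the F-measure follow their standard definitions. The sketch below shows how they could be computed for one threshold, assuming a run is flagged when its violation ratio exceeds that threshold. The flagging rule, the run names, and the ratios are assumptions made for illustration, not results from the case studies.

```python
def flag_runs(ratios, threshold):
    """Names of runs whose violation ratio exceeds the threshold."""
    return {name for name, ratio in ratios.items() if ratio > threshold}


def precision_recall_f(flagged, actual):
    """Standard precision, recall, and F-measure over two sets of run names."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical violation ratios per test run and the runs known to regress.
ratios = {"run1": 0.05, "run2": 0.40, "run3": 0.12, "run4": 0.75, "run5": 0.08}
known_regressions = {"run2", "run4", "run5"}

for threshold in (0.10, 0.30):
    p, r, f = precision_recall_f(flag_runs(ratios, threshold), known_regressions)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Sweeping the threshold in this way is what produces curves of precision, recall, and F against the threshold.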
