Performance issues are often the cause of failures in today's large-scale software systems. These issues make performance testing essential during software maintenance. However, performance testing faces many challenges. One challenge is determining how long a performance test must run. Although performance tests often run for hours or days to uncover performance issues (e.g., memory leaks), much of the data that is generated during a performance test is repetitive. Performance analysts can stop their performance tests (to reduce the time to market and the costs of performance testing) if they know that continuing the test will not provide any new information about the system's performance. To assist performance analysts in deciding when to stop a performance test, we propose an automated approach that measures how much of the data that is generated during a performance test is repetitive. Our approach then provides a recommendation to stop the test when the data becomes highly repetitive and the repetitiveness has stabilized (i.e., little new information about the system's performance is generated).
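The abstract does not spell out how repetitiveness is measured, so the following is only a minimal sketch of the underlying idea, assuming a single performance counter sampled at fixed intervals: discretize the counter's range into bins and treat an observation as repetitive when its binned level was already generated earlier in the test. The bin count and the simulated counter are illustrative assumptions, not the paper's actual metric.

```python
import numpy as np

def repetitiveness(counter_values, bins=20):
    """Fraction of observations whose (binned) level already appeared earlier."""
    values = np.asarray(counter_values, dtype=float)
    # Discretize the counter's observed range into equal-width bins.
    edges = np.linspace(values.min(), values.max(), bins + 1)
    codes = np.clip(np.digitize(values, edges), 1, bins)
    seen, repeats = set(), 0
    for code in codes:
        if code in seen:
            repeats += 1      # this counter level was generated before
        else:
            seen.add(code)
    return repeats / len(codes)

# Example: a memory counter that keeps oscillating over the same levels.
rng = np.random.default_rng(0)
counter = 50 + 5 * np.sin(np.arange(600) / 10) + rng.normal(0, 0.5, 600)
print(f"repetitiveness: {repetitiveness(counter):.2f}")
```

On such highly periodic data the fraction quickly approaches 1, which is exactly the regime in which continuing the test yields little new information.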
Empirical Evaluations in Software Engineering Research: A Personal Perspective (SAIL_QU)
1. The document discusses various pitfalls and issues with empirical studies in software engineering research, such as including correlated metrics, not handling imbalanced data properly, and not using appropriate validation techniques.
2. It notes that many early studies did not address these issues adequately. Proper techniques like resampling imbalanced data and using out-of-sample validation are important to avoid inaccurate results.
3. The presentation argues that researchers should focus more on deeper analysis and partnering with practitioners, rather than trying to generalize results or follow industry trends without proper evaluation. Trailblazing research and transparency should be encouraged.
ICSE2017 - Analytics Driven Load Testing: An Industrial Experience Report on ... (Concordia University)
This document discusses using data analytics to improve various phases of load testing large-scale systems. It describes measuring performance counters and logs from prior tests to design more realistic new tests that include user-centric behaviors. Repetitiveness of performance counters is measured to determine when tests can be stopped early. Test results are stored in repositories with visualizations to help analysts quickly diagnose issues. Open challenges remain in better leveraging load testing for continuous performance improvement.
Properly Understanding Latency is Hard — What We Learned When We Did it Corre... (ScyllaDB)
We applied Gil Tene’s “oh shit” talk on understanding latency to our data pipeline. This talk will describe the measurement challenges we faced, what we learned, and how we improved with those learnings. This was vital for our capacity planning to scale up our data pipeline multiple times. The story that a latency spectrum told us allowed us to identify surprising sources of latency.
Database performance improvement, a six sigma project (mesure) by Nirav Shah
This document discusses improving the performance of the UATIMNG database. It includes measuring the query execution time and the number of chained rows using a sampling plan. Statspack reports show that the top time-consuming activity is the db file sequential read wait event. Measurement system analysis was performed, including a Gauge R&R study to determine the reproducibility and repeatability of the measurement system.
Iasi code camp 12 october 2013 performance testing for web applications with... (Codecamp Romania)
The document discusses performance testing a web application using JMeter. It describes recording a login scenario in JMeter by capturing the HTTP requests issued by the browser. Issues discovered include the login not succeeding and fewer than 100 concurrent users being achieved. Modifications are made, such as adding a transaction controller and a cache manager. Results are analyzed in reports showing that response times increase with load, with a peak throughput of 6 requests/second. Server resources are monitored, and best practices such as validating actions and transaction data are discussed.
A Study on Privacy Level in Publishing Data of Smart Tap Network (Ha Phuong)
This document describes a study on quantifying privacy levels in publishing smart tap network data. It outlines the background, related works, methodology, results and conclusions. The methodology proposes using entropy, specifically approximate entropy and sample entropy, to quantify the amount of human activity information contained in power consumption data. This "privacy level" is defined as the entropy rate of a data set relative to white noise. The study applies this methodology to analyze smart tap data from the IREF building over 5 weeks, setting parameters like time lag, pattern length m, and threshold r for calculating entropy. The results show entropy rates can help determine safe privacy levels for publishing power consumption data.
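Sample entropy is a standard, well-documented statistic, so a compact reference implementation can make the parameters m (pattern length) and r (tolerance) mentioned above concrete. This follows the usual definition (negative log of the ratio of length-m+1 to length-m template matches under a Chebyshev tolerance); it is written from the textbook formula, not taken from the study itself.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r): lower values indicate a more regular (predictable) signal."""
    x = np.asarray(x, dtype=float)
    r = r * x.std()                      # tolerance as a fraction of the std dev

    def count_matches(length):
        # All overlapping templates of the given length.
        templ = np.array([x[i:i + length] for i in range(len(x) - length + 1)])
        count = 0
        for i in range(len(templ) - 1):
            # Chebyshev distance between template i and all later templates.
            dist = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += np.sum(dist < r)
        return count

    b, a = count_matches(m), count_matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")

rng = np.random.default_rng(1)
print(sample_entropy(np.sin(np.arange(300) / 5)))   # regular signal -> low entropy
print(sample_entropy(rng.normal(size=300)))         # white noise    -> higher entropy
```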
The document presents a methodology for automatically comparing the performance of subsystems in a large enterprise system using load testing. The methodology involves collecting performance counter data during load tests, normalizing and reducing the data, crafting performance signatures for subsystems, and measuring deviations from a baseline. It was found to accurately identify subsystems with performance deviations, and it could flag deviations within 10 minutes, allowing tests to be stopped early. Detection performed best at a 10-minute sampling interval, which balanced recall and precision.
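The summary names the pipeline (normalize, reduce, craft signatures, measure deviations) without the formulas, so here is a hedged sketch of that shape. The median-based signature and the 0.2 threshold are illustrative assumptions, not the methodology's actual definitions.

```python
import numpy as np

def deviates(baseline, new, threshold=0.2):
    """baseline, new: 2-D arrays, rows = samples, columns = performance counters."""
    baseline, new = np.asarray(baseline, float), np.asarray(new, float)
    # Min-max normalize both runs using the baseline's range, so counters
    # with different units become comparable.
    lo, hi = baseline.min(axis=0), baseline.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)
    # Signature: one summary value (here the median) per normalized counter.
    sig_base = np.median((baseline - lo) / scale, axis=0)
    sig_new = np.median((new - lo) / scale, axis=0)
    dev = np.abs(sig_new - sig_base)
    return dev.max() > threshold, dev   # flag the subsystem and report per-counter deviations
```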
Automated Parameterization of Performance Models from Measurements (Weikun Wang)
This is a tutorial presented in ICPE 2016 (https://icpe2016.spec.org/). In this tutorial, we present the problem of estimating parameters of performance models from measurements of real systems and discuss algorithms that can support researchers and practitioners in this task. The focus lies on performance models based on queueing systems, where the estimation of request arrival rates and service demands is a required input to the model. In the tutorial, we review existing estimation methods for service demands and present models to characterize time-varying arrival processes. The tutorial also demonstrates the use of relevant tools that automate demand estimation, such as LibRede, FG and M3A.
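A hedged sketch of one classic demand-estimation technique covered by such tutorials: utilization-law regression. The utilization law states that utilization equals the sum over request classes of arrival rate times service demand, so given measured utilizations and per-class arrival rates from several intervals, the demands can be estimated with non-negative least squares. The numbers below are invented, and tools like LibRede may use more sophisticated estimators.

```python
import numpy as np
from scipy.optimize import nnls

# Rows = measurement intervals; columns = request classes (arrivals/sec).
arrival_rates = np.array([[10.0, 2.0],
                          [20.0, 1.0],
                          [15.0, 4.0],
                          [5.0,  8.0]])
true_demands = np.array([0.02, 0.05])            # seconds of CPU per request
utilization = arrival_rates @ true_demands       # pretend these were measured

estimated, _ = nnls(arrival_rates, utilization)  # enforce demands >= 0
print("estimated service demands:", estimated)   # recovers ~[0.02, 0.05]
```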
This document discusses input modeling for simulation and outlines four steps (a minimal code sketch of steps 2-4 follows the list):
1) Collect data from the real system, or use expert opinion if data is unavailable
2) Identify a probability distribution to represent the input process
3) Choose parameters for the distribution family by estimating them from the data
4) Evaluate the chosen distribution through goodness-of-fit tests, or create an empirical distribution if no adequate fit is found
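Steps 2-4 map directly onto standard statistical tooling. A minimal SciPy illustration (the document itself does not prescribe a library) fits an exponential family to observed inter-arrival times and checks the fit with a Kolmogorov-Smirnov test. Note that estimating the parameters from the same data makes the KS p-value optimistic; a stricter analysis would use a Lilliefors-style correction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
observed = rng.exponential(scale=1.5, size=500)   # stand-in for collected data

# Steps 2-3: choose a candidate family and estimate its parameters (MLE).
params = stats.expon.fit(observed)                # returns (loc, scale)

# Step 4: goodness-of-fit test; a small p-value would reject the family.
stat, p = stats.kstest(observed, "expon", args=params)
print(f"KS statistic={stat:.3f}, p={p:.3f}")
# If no family fits, fall back to the empirical distribution of the data.
```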
In this presentation we summarize and share the QA estimation approach that was developed and successfully applied on different projects at the Testing Center of Excellence at Ciklum. We consider the factors and baselines that should be taken into account when starting the estimation process, present the QA estimation approach for main and additional activities, compose an estimation guide for regression testing, and show how to adjust QA estimates using risk/assumption multipliers.
Test Case Prioritization for Acceptance Testing of Cyber Physical Systems (Lionel Briand)
1) The document describes a multi-objective search-based approach for minimizing and prioritizing acceptance test cases for cyber physical systems to address challenges like time overhead, uncertainties in execution time, and risks of hardware damage.
2) The approach models acceptance tests, minimizes test cases by removing redundant operations, and prioritizes test cases using Monte Carlo simulation to estimate execution times while optimizing for criticality and risk (a toy sketch of the Monte Carlo step follows this list).
3) An empirical evaluation on an industrial case study of in-orbit satellite testing shows the approach generates test suites that cover more test cases and have lower hardware risk within time budgets compared to manually created test suites.
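To make the Monte Carlo step in point 2 concrete, here is a toy sketch under assumed details: each test case's execution time is known only as a range, and repeated sampling estimates the probability that a candidate suite fits within a time budget. The uniform ranges and the budget are invented for illustration.

```python
import random

def prob_within_budget(test_cases, budget_s, trials=10_000):
    """test_cases: list of (min_s, max_s) execution-time ranges."""
    hits = 0
    for _ in range(trials):
        # Sample one plausible execution time per test case and sum them.
        total = sum(random.uniform(lo, hi) for lo, hi in test_cases)
        hits += total <= budget_s
    return hits / trials

suite = [(30, 90), (10, 25), (120, 200), (5, 15)]
print(f"P(suite finishes within 5 min) ~ {prob_within_budget(suite, 300):.2f}")
```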
Log-Based Slicing for System-Level Test Cases (ISSTA 2021) (Donghwan Shin)
The document proposes a technique called DS3 (Decomposing System Test Cases) to decompose complex system-level test cases into finer-grained test cases using log-based program slicing. DS3 performs static slicing on test cases initially, then refines the slices by analyzing log files to identify dependencies between statements and global resources. This addresses limitations of traditional static and dynamic slicing for system test cases. An evaluation on two subject systems found that DS3 achieved 100% effective slicing compared to only 24-33% for static slicing alone, and the sliced test cases had significantly lower execution times while maintaining similar code coverage and fault detection capabilities.
Log-Based Slicing for System-Level Test Cases (Lionel Briand)
DS3 is a technique that decomposes complex system test cases into simpler test cases using log-based program slicing. It performs the following steps:
1. Uses static slicing to initially decompose test cases based on assertions.
2. Analyzes test case execution logs to identify dependencies between statements and global resources.
3. Refines the static slices by adding missing dependencies identified in the logs.
4. Merges slices that share all non-assertion statements to minimize duplication.
An evaluation on two large systems found that DS3 produced well-formed test case slices and significantly reduced test case execution times compared to the original complex test cases and standard static slicing. The sliced test cases also maintained similar code coverage and fault detection capabilities.
The objectives of Deadlocks in Operating Systems are:
- To develop a description of deadlocks, which prevent sets of concurrent processes from completing their tasks
- To present a number of different methods for preventing or avoiding deadlocks in a computer system
Deadlock occurs when a set of processes are permanently blocked waiting for resources held by other processes in the set. There are four necessary conditions for deadlock: mutual exclusion, hold and wait, no preemption, and circular wait. Deadlock can be prevented by ensuring that one of the conditions is not met through various techniques like requiring processes request all resources at once or defining a resource ordering. Deadlock can also be avoided by checking if a requested allocation would lead to an unsafe state through methods like the banker's algorithm. Once deadlock occurs, processes can be aborted or have resources preempted to resolve the deadlock.
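The banker's algorithm mentioned above reduces to a simple safety check: a state is safe if there exists some order in which every process can acquire its remaining need from the free resources and run to completion. A compact Python version of that check (the process and resource values are illustrative):

```python
def is_safe(available, allocation, need):
    """Banker's algorithm safety check for a resource-allocation state."""
    work = available[:]                      # resources currently free
    finished = [False] * len(allocation)
    progressed = True
    while progressed:
        progressed = False
        for i, done in enumerate(finished):
            # Process i can run to completion with the free resources.
            if not done and all(n <= w for n, w in zip(need[i], work)):
                # It finishes and releases everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progressed = True
    return all(finished)                     # safe iff every process can finish

# Three processes, two resource types.
print(is_safe(available=[3, 3],
              allocation=[[0, 1], [2, 0], [3, 0]],
              need=[[7, 3], [1, 2], [2, 0]]))   # True: order P1, P2, P0 works
```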
Leveraging data visualization to improve the efficiency of large-scale test automation infrastructure.
Watch the full talk at: http://youtube.com/watch?v=oRIci6n566w
Yan Cui - Applying principles of chaos engineering to Serverless - Codemotion... (Codemotion)
Most of the existing tools and articles around chaos engineering focus on killing servers, but what do you do when you can no longer access the underlying infrastructure? How can we apply the same principles of chaos to a serverless architecture built around AWS Lambda functions? These serverless architectures have more inherent chaos and complexity than their serverful counterparts, and we have less control over their runtime behaviour. Can we adapt existing practices to expose the inherent chaos in these systems? What are the limitations and new challenges that we need to consider?
Applying principles of chaos engineering to Serverless (CodeMotion Berlin) (Yan Cui)
Chaos engineering is a discipline that focuses on improving system resilience through experiments that expose the inherent chaos and failure modes in our system, in a controlled fashion, before these failure modes manifest themselves like a wildfire in production and impact our users.
Netflix is undoubtedly the leader in this field, but much of the publicised tools and articles focus on killing EC2 instances, and the efforts in the serverless community have been largely limited to moving those tools into AWS Lambda functions.
But how can we apply the same principles of chaos to a serverless architecture built around AWS Lambda functions?
These serverless architectures have more inherent chaos and complexity than their serverful counterparts, and we have less control over their runtime behaviour. In short, there are far more unknown unknowns with these systems.
Can we adapt existing practices to expose the inherent chaos in these systems? What are the limitations and new challenges that we need to consider?
Yan Cui - Applying principles of chaos engineering to Serverless - Codemotion... (Codemotion)
The document discusses applying principles of chaos engineering to serverless architectures. It describes challenges with chaos engineering in serverless due to the lack of servers to inject failures into. It then discusses approaches to injecting latency and errors into serverless functions and downstream services as part of controlled experiments to test resiliency. The goal is not to break systems but to uncover weaknesses and improve robustness. Containment of experiments is emphasized.
EuroSTAR Software Testing Conference 2009 presentation on Incremental Scenario Testing by Mattias Ratert. See more at conferences.eurostarsoftwaretesting.com/past-presentations/
The document provides an overview of testing fundamentals including:
- Exhaustive testing is not possible due to the practically unbounded number of inputs, timing variations, and paths a program can take. Even for programs with a few inputs, the number of combinations grows exponentially (see the arithmetic sketch after this list).
- The purpose of testing is to find defects, not to prove correctness; it is impossible to prove that a system has no defects. Testing aims to provide evidence of defects, not to prove a system is perfect.
- Factors like invalid inputs, timing variations, and complex branching logic mean that testing all possibilities is infeasible for all but trivial programs. The goal is high test coverage rather than trying to test everything.
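The exponential growth claimed in the first point is easy to make concrete with back-of-the-envelope arithmetic (the field and value counts below are invented for illustration):

```python
# A function with only five independent input fields, each with 100
# distinct meaningful values, already has an enormous input space.
fields = 5
values_per_field = 100
combinations = values_per_field ** fields
print(f"{combinations:,} combinations")          # 10,000,000,000

# Even at 1,000 test executions per second, exhaustive testing would take
# roughly 116 days of non-stop execution:
print(combinations / 1_000 / 86_400, "days")
```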
Talk for first-year PhD students at the CRG. The goal of the talk was to present scenarios that students will likely face and that can compromise reproducibility and efficiency in the analysis of data in the life sciences. Importantly, asking the right questions is probably more important than the answers given.
Exploring OpenFaaS autoscalability on Kubernetes with the Chaos Toolkit (Sylvain Hellegouarch)
A report generated by the Chaos Toolkit after a Chaos Engineering experiment against OpenFaaS on Kubernetes.
View the run of the experiment at https://asciinema.org/a/dv4cNOMC5k1oWDhWDe97d5eij
Unit ii os process scheduling and synchronization (donny101)
The document discusses process scheduling and synchronization in operating systems. It covers CPU scheduling algorithms like first-come first-served, shortest job first, priority scheduling, and round robin. It also discusses concepts like critical section problem, synchronization methods using semaphores and monitors, and solutions to deadlocks. Process synchronization ensures that processes access shared resources in a mutually exclusive way.
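Of the scheduling algorithms listed, round robin is the easiest to show end to end: each process runs for at most one time quantum before being preempted and sent to the back of the ready queue. A toy sketch (the burst times and quantum are illustrative):

```python
from collections import deque

def round_robin(burst_times, quantum):
    """burst_times: {pid: total CPU time needed}; returns (pid, finish_time) in completion order."""
    queue = deque(burst_times.items())
    order, clock = [], 0
    while queue:
        pid, remaining = queue.popleft()
        run = min(quantum, remaining)    # run for at most one quantum
        clock += run
        if remaining - run > 0:
            queue.append((pid, remaining - run))  # preempt and re-queue
        else:
            order.append((pid, clock))            # pid finishes at `clock`
    return order

print(round_robin({"P1": 5, "P2": 3, "P3": 8}, quantum=2))
# -> [('P2', 9), ('P1', 12), ('P3', 16)]
```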
The document summarizes a project to develop a queue management system called QMRAS to improve the experience of students requesting assistance from teaching assistants in computer labs. The system aims to 1) enhance the student experience when requesting help and 2) improve efficiency in allocating teaching resources. User testing found the system accurately estimated wait times for 85% of users and streamlined the help request process. While students found it helpful, teaching staff found refreshing the page frustrating. Overall, the project was successful in meeting its goals of enhancing the student experience and improving resource allocation efficiency.
The document outlines an accelerated life test plan for a steam iron to estimate its reliability over time. It describes developing a test plan that includes determining stress levels, sample sizes, failure definitions, and data collection methods. The plan involves testing steam iron samples under different heat stress levels by temperature to see how fabrics like cotton, wool, and nylon are affected. The goal is to analyze failure data to better understand failure modes and improve steam iron design reliability before the product is released.
Planning of experiment in industrial research (pbbharate)
This document discusses key concepts in the design of experiments. It begins with definitions of systems and processes, and defines an experiment as a test where input variables are deliberately changed to observe their effects on outputs. The objectives of experiments are identified as understanding factor effects and developing models. Basic principles for experimental design are outlined, including randomization, replication, and blocking. Guidelines are provided for various steps in designing an experiment, from problem definition to statistical analysis and conclusions. Examples are given throughout to illustrate experimental design concepts.
Similar to An Automated Approach for Recommending When to Stop Performance Tests (20)
Studying the Integration Practices and the Evolution of Ad Libraries in the G... (SAIL_QU)
In-app advertisements have become a major source of revenue for app developers in the mobile app economy. Ad libraries play an integral part in this ecosystem, as app developers integrate these libraries into their apps to display ads. However, little is known about how app developers integrate these libraries with their apps and how these libraries have evolved over time.
In this thesis, we study the ad library integration practices and the evolution of such libraries. To understand the integration practices of ad libraries, we manually study apps and derive a set of rules to automatically identify four strategies for integrating
multiple ad libraries. We observe that integrating multiple ad libraries commonly occurs in apps with a large number of downloads and ones in categories with a high percentage of apps that display ads. We also observe that app developers prefer to manage their own integrations instead of using the off-the-shelf features of ad libraries for integrating multiple ad libraries.
To study the evolution of ad libraries, we conduct a longitudinal study of the 8 most popular ad libraries. In particular, we look at their evolution in terms of size, the main drivers for releasing a new ad library version, and their architecture. We observe that ad libraries are continuously evolving with a median release interval of 34 days. Some ad libraries have grown exponentially in size (e.g., Facebook Audience Network ad library), while other libraries have worked to reduce their size. To study the main drivers for releasing an ad library version, we manually study the release notes of the eight studied ad libraries. We observe that ad library developers continuously update their ad libraries to support a wider range of Android versions (i.e., to ensure that more devices can use the libraries without errors). Finally, we derive a reference architecture for ad libraries and study how the studied ad libraries diverged from this architecture during our study period.
Our findings can assist ad library developers to understand the challenges for developing ad libraries and the desired features of these libraries.
Improving the testing efficiency of selenium-based load tests (SAIL_QU)
Slides for a paper published at AST 2019:
Shahnaz M. Shariff, Heng Li, Cor-Paul Bezemer, Ahmed E. Hassan, Thanh H. D. Nguyen, and Parminder Flora. 2019. Improving the testing efficiency of selenium-based load tests. In Proceedings of the 14th International Workshop on Automation of Software Test (AST '19). IEEE Press, Piscataway, NJ, USA, 14-20. DOI: https://doi.org/10.1109/AST.2019.00008
Studying User-Developer Interactions Through the Distribution and Reviewing M... (SAIL_QU)
This document discusses studying user-developer interactions through the distribution and reviewing mechanisms of the Google Play Store. It analyzes emergency updates made by developers to fix issues, the dialogue between users and developers through reviews and responses, and how the reviewing mechanism can help identify good and bad updates. The study found that responding to reviews is six times more likely to increase an app's rating, with 84% of rating increases going to four or five stars. Three common patterns of developer responses were identified: responding to negative or long reviews, only negative reviews, and reviews shortly after an update.
Studying online distribution platforms for games through the mining of data f... (SAIL_QU)
Our studies of Steam platform data provided insights into online game distribution:
1) Urgent game updates were used to fix crashes, balance issues, and functionality; frequent updaters released more 0-day patches.
2) The Early Access model attracted indie developers and increased game participation; reviews were more positive during Early Access.
3) Game reviews were typically short and in English; sales increased review volume more than new updates; negative reviews came after longer play.
Understanding the Factors for Fast Answers in Technical Q&A Websites: An Empi... (SAIL_QU)
This study analyzed factors that impact the speed of questions receiving accepted answers on four popular Stack Exchange websites: Stack Overflow, Mathematics, Ask Ubuntu, and Super User. The researchers examined question, answerer, asker, and answer factors from over 150,000 questions. They built classification models and found that key factors for fast answers included the past speed of answerers, length of the question, and past speed of answers for the question's tags. The models achieved AUCs of 0.85-0.95. Fast answers relied heavily on answerers, especially frequent answerers. The study suggests improving incentives for non-frequent and more difficult questions to attract diverse answerers.
Investigating the Challenges in Selenium Usage and Improving the Testing Effi... (SAIL_QU)
Selenium is a popular tool for browser-based automation testing. The author analyzes challenges in using Selenium by mining Selenium questions on Stack Overflow. Programming language-related questions, especially for Java and Python, are most common and growing fastest. Less than half of questions receive accepted answers, and questions about browsers and components take longest. In the second part, the author develops an approach to improve efficiency of Selenium-based load testing by sharing browsers among user instances. This increases the number of error-free users by 20-22% while reducing memory usage.
Mining Development Knowledge to Understand and Support Software Logging Pract... (SAIL_QU)
This document summarizes Heng Li's PhD thesis on mining development knowledge to understand and support software logging practices. It discusses how logging code is used to record runtime information but can be difficult for developers to maintain. The thesis aims to understand current logging practices and develop tools by mining change history, source code, issue reports, and other development knowledge. It presents research that analyzes logging-related issues to identify developers' logging concerns, uses code topics and structure to predict where logging statements should be added, leverages code changes to suggest when logging code needs updating, and applies machine learning models to recommend appropriate log levels.
Which Log Level Should Developers Choose For a New Logging Statement? (SAIL_QU)
The document discusses choosing an appropriate log level when adding a new logging statement. It finds that an ordinal regression model can effectively model log levels, achieving an AUC of 0.76-0.81 in within-project evaluation and 0.71-0.8 in cross-project evaluation. The most influential factors for determining log levels vary between projects and include metrics related to the logging statement, containing code block, and file as well as code change and historical change metrics.
Towards Just-in-Time Suggestions for Log Changes (SAIL_QU)
The document presents a study on providing just-in-time suggestions for log changes when developers make code changes. The researchers analyzed over 32,000 log changes from 4 systems. They found 20 reasons for log changes that fall into 4 categories: block changes, log improvements, dependence-driven changes, and logging issues. A random forest classifier using 25 software metrics related to code changes, history, and complexity achieved 0.84-0.91 AUC in predicting whether a log change is needed. Change metrics and product metrics were the most influential factors. The study aims to help developers make better logging decisions for failure diagnosis.
The Impact of Task Granularity on Co-evolution Analyses (SAIL_QU)
The document discusses how task granularity at different levels (e.g., commits, pull requests, work items) can impact analyses of co-evolution in software projects. It finds that analyzing at the commit level can overlook relationships between tasks that span multiple commits. Work-item-level analysis is recommended to provide a more complete view of co-evolution: a median of 29% of work items consist of multiple commits, and analyzing at the commit level would miss 24% of co-changed files and fail to group 83% of related commits.
How are Discussions Associated with Bug Reworking? An Empirical Study on Open... (SAIL_QU)
1) Initial bug fix discussions with more comments and more developers participating are more likely to experience later bug reworking through re-opening or re-patching of the bug.
2) Manual analysis found that defective initial fixes and failure to reach consensus in discussions contributed to later reworking.
3) For re-opened bugs, initial discussions focused on addressing a particular problem through a burst of comments, while re-patched bugs lacked thorough code review and testing during the initial fix period.
A Study of the Relation of Mobile Device Attributes with the User-Perceived Q... (SAIL_QU)
This study examined the relationship between mobile device attributes and user-perceived quality of Android apps. The researchers analyzed 150,373 star ratings from Google Play across 30 devices and 280 apps. They found that the perceived quality of apps varies across devices, and having better characteristics of an attribute does not necessarily correlate with higher quality. Device OS version, resolution, and CPU showed significant relationships with ratings, as did some app attributes like lines of code and number of inputs. However, some device attributes had stronger relationships than app attributes.
A Large-Scale Study of the Impact of Feature Selection Techniques on Defect C... (SAIL_QU)
This document presents the results of a large-scale study on the impact of feature selection techniques on defect classification models. The study used expanded scopes including multiple datasets from NASA and PROMISE with different feature types, more classification techniques from different paradigms, and additional feature selection techniques. The results show that correlation-based feature subset selection techniques like FS1 and FS2 consistently appear in the top ranks across most of the datasets, projects within the datasets, and classification techniques. The document concludes that future defect classification studies should consider applying correlation-based feature selection techniques.
Studying the Dialogue Between Users and Developers of Free Apps in the Google... (SAIL_QU)
The study analyzes user-developer interactions through reviews and responses on the Google Play Store. It finds that responding to reviews has a significant positive impact, with 84% of rating increases due to the developer addressing the issue or providing guidance. Three common response patterns were identified: only negative reviews, negative or longer reviews, and reviews shortly after an update. Developers most often thank the user, ask for details, provide guidance, or ask for an endorsement. Guidance responses can address common issues through FAQs. The analysis considered over 2,000 apps, 355,000 review changes, 128,000 responses, and 4 million reviews.
What Do Programmers Know about Software Energy Consumption? (SAIL_QU)
This document summarizes the results of a survey of 122 programmers about their knowledge of software energy consumption. The survey found that programmers have limited awareness of energy consumption and how to reduce it. They were unaware of the main causes of high energy usage. Programmers lacked knowledge about how to properly rank the energy consumption of different hardware components and were unfamiliar with strategies to improve efficiency, such as minimizing I/O and avoiding polling. The study concludes that programmers would benefit from more education on software energy usage and its causes.
Revisiting the Experimental Design Choices for Approaches for the Automated R... (SAIL_QU)
Prior research on automated duplicate issue report retrieval focused on improving performance metrics like recall rate. The author revisits experimental design choices from four perspectives: needed effort, data changes, data filtration, and evaluation process.
The thesis contributions are: 1) Showing the importance of considering needed effort in performance measurement. 2) Proposing a "realistic evaluation" approach and analyzing prior findings with it. 3) Developing a genetic algorithm to filter old issue reports and improve performance. 4) Highlighting the impact of "just-in-time" features on evaluation. The findings help better understand benefits and limitations of prior work in this area.
Measuring Program Comprehension: A Large-Scale Field Study with Professionals (SAIL_QU)
The document summarizes a large-scale field study that tracked the program comprehension activities of 78 professional developers over 3,148 hours. The study found that:
1) Program comprehension accounted for approximately 58% of developers' time on average, with navigation and editing making up the remaining portions.
2) Developers frequently used web browsers and document editors to aid comprehension beyond just IDEs.
3) Interviews and observations revealed that insufficient documentation, unclear code, and complex inheritance hierarchies contributed to long comprehension sessions.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We'll discuss and demo the benefits of UiPath Apps and connectors, including:
- Creating a compelling user experience for any software, without the limitations of APIs
- Accelerating the app creation process, saving time and effort
- Enjoying high-performance CRUD (create, read, update, delete) operations, for seamless data management
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
At this talk we will discuss DDoS protection tools and best practices, discuss network architectures, and see what AWS has to offer. Also, we will look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped to keep the web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
Northern Engraving | Nameplate Manufacturing Process - 2024 (Northern Engraving)
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite optimization efforts that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
This talk will cover ScyllaDB Architecture from the cluster-level view and zoom in on data distribution and internal node architecture. In the process, we will learn the secret sauce used to get ScyllaDB's high availability and superior performance. We will also touch on the upcoming changes to ScyllaDB architecture, moving to strongly consistent metadata and tablets.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
Discover top-tier mobile app development services, offering innovative solutions for iOS and Android. Enhance your business with custom, user-friendly mobile applications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip, presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels (Northern Engraving)
What began over 115 years ago as a supplier of precision gauges to the automotive industry has evolved into being an industry leader in the manufacture of product branding, automotive cockpit trim and decorative appliance trim. Value-added services include in-house Design, Engineering, Program Management, Test Lab and Tool Shops.
"What does it really mean for your system to be available, or how to define w...Fwdays
We will talk about system monitoring from a few different angles. We will start by covering the basics, then discuss SLOs, how to define them, and why understanding the business well is crucial for success in this exercise.
What is an RPA CoE? Session 2 – CoE Roles (DianaGray10)
In this session, we will review the players involved in the CoE and how each role impacts opportunities.
Topics covered:
• What roles are essential?
• What place in the automation journey does each role play?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency (ScyllaDB)
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
Session 1 - Intro to Robotic Process Automation.pdfUiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation, the UiPath Platform, and guide you on how to install and setup UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
4. Performance testing is essential to prevent these failures
[Diagram: a pre-defined workload issues requests to the system under test in a performance testing environment, while performance counters, e.g., CPU, memory, I/O and response time, are collected.]
5. Determining the length of a performance test is challenging
[Diagram: as the test runs, repetitive data is generated; an optimal stopping time lies somewhere along the timeline.]
6. Determining the length of a performance test is challenging
Stopping too early misses performance issues; stopping too late delays the release and wastes testing resources. The goal is to find the optimal stopping time.
7.–10. Our approach for recommending when to stop a performance test
(Slides 7–10 step through the same four-stage pipeline, highlighting one stage at a time.)
1) Collect the already-generated data
2) Measure the likelihood of repetitiveness
3) Extrapolate the likelihood of repetitiveness
4) Determine whether to stop the test: if yes, STOP; if no, continue collecting data
11. Step 1: Collect the data that the test generates
At the current time, collect the performance counters that the test has generated so far, e.g., CPU, memory, I/O and response time.
12. Step 2: Measure the likelihood of repetitiveness
Select a random time period A (e.g., 30 minutes) from the collected data.
13. Step 2: Measure the likelihood of repetitiveness (continued)
Search the collected data for another, non-overlapping time period B that is NOT statistically significantly different from A.
14. Step 2: Measure the likelihood of repetitiveness (continued)
Run a Wilcoxon test between the distributions of every performance counter across the two periods A and B.
15. Step 2: Measure the likelihood of repetitiveness (continued)

          Response time   CPU     Memory   I/O
p-values  0.0258          0.313   0.687    0.645

The two periods are statistically significantly different in response time, so this candidate period B is not a repetitive match.
16. Step 2: Measure the likelihood of repetitiveness (continued)

          Response time   CPU     Memory   I/O
p-values  0.67            0.313   0.687    0.645

A candidate qualifies only when it is NOT statistically significantly different from A in ALL performance counters; this period B qualifies.
17. Step 2: Measure the likelihood of repetitiveness (continued)
Was a period found that is NOT statistically significantly different? If yes, period A is repetitive; if no, it is not repetitive.
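To make the period comparison concrete, here is a minimal sketch in Python (not from the deck; the counter names, data layout and the `is_repetitive_pair` helper are illustrative assumptions). It treats the Wilcoxon comparison as an unpaired rank-sum test, since the two periods are independent samples:

```python
# Minimal sketch (assumed, not from the deck): decide whether two
# non-overlapping periods are "repetitive", i.e., NOT statistically
# significantly different in ALL performance counters.
import pandas as pd
from scipy.stats import mannwhitneyu  # unpaired Wilcoxon rank-sum test

COUNTERS = ["response_time", "cpu", "memory", "io"]  # illustrative names

def is_repetitive_pair(period_a: pd.DataFrame,
                       period_b: pd.DataFrame,
                       alpha: float = 0.05) -> bool:
    """True when no counter differs significantly across the two periods."""
    for counter in COUNTERS:
        _, p_value = mannwhitneyu(period_a[counter], period_b[counter],
                                  alternative="two-sided")
        if p_value < alpha:   # e.g., response time with p = 0.0258 on slide 15
            return False      # significant difference: not repetitive
    return True               # all p-values >= alpha: repetitive
```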
18. Step 2: Measure the likelihood of repetitiveness (continued)
Repeat this process a large number of times (e.g., 1,000) to calculate the likelihood of repetitiveness, i.e., the fraction of randomly selected periods that have a repetitive counterpart.
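A sketch of this Monte Carlo estimate, built on the helper above (again hedged: the window size in samples, the 1 Hz sampling and the search strategy over candidate periods are assumptions; the 30-minute window and 1,000 repetitions follow the deck's examples):

```python
import random

def likelihood_of_repetitiveness(data: pd.DataFrame,
                                 window: int = 30 * 60,    # 30 min at 1 Hz
                                 repetitions: int = 1000) -> float:
    """Fraction of randomly chosen periods that have a repetitive match."""
    hits = 0
    for _ in range(repetitions):
        # 1) pick a random period A from the already-collected data
        start_a = random.randrange(len(data) - window)
        period_a = data.iloc[start_a:start_a + window]
        # 2) search for a non-overlapping period B that is not
        #    significantly different from A in all counters
        for start_b in range(0, len(data) - window, window):
            if abs(start_b - start_a) < window:  # overlapping: skip
                continue
            if is_repetitive_pair(period_a, data.iloc[start_b:start_b + window]):
                hits += 1
                break
    return hits / repetitions
```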
19. Step 2: Measure the likelihood of repetitiveness (continued)
A new likelihood of repetitiveness is measured periodically (e.g., every 10 minutes: at 30 min, 40 min, …, 1 h 10 min, and so on) in order to get more frequent feedback on the repetitiveness.
20. Step 2: Measure the likelihood of repetitiveness (continued)
[Plot: likelihood of repetitiveness (1%–100%) against test time (00:00–24:00); the curve rises and then flattens, marked "Stabilization (little new information)".]
The likelihood of repetitiveness eventually starts stabilizing.
21. Step 3: Extrapolate the likelihood of repetitiveness
To know when the repetitiveness stabilizes, we calculate the first derivative of the likelihood-of-repetitiveness curve.
22. Step 4: Determine whether to stop the test
Stop the test if the first derivative is close to 0, i.e., once the likelihood of repetitiveness has stabilized.
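A sketch of this stopping rule (an assumed formalization; the derivative threshold is illustrative, not a value from the deck): take the series of likelihood measurements collected every 10 minutes, approximate the first derivative with finite differences, and recommend stopping once the latest slope is close to zero.

```python
import numpy as np

def should_stop(likelihoods: list[float],
                interval_min: float = 10.0,
                threshold: float = 1e-3) -> bool:
    """Recommend stopping once the likelihood curve has stabilized."""
    if len(likelihoods) < 2:
        return False                  # too few points to estimate a slope
    # First derivative (change in likelihood per minute) via finite differences
    derivative = np.gradient(likelihoods, interval_min)
    return abs(derivative[-1]) < threshold   # close to 0: stabilized, stop
```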
23. Recap: our approach for recommending when to stop a performance test (the four-step pipeline from slides 7–10).
24. We conduct 24-hour performance tests on three systems: PetClinic, Dell DVD Store and CloudStore.
25. We evaluate whether our approach (1) stops the test too early, or (2) stops the test too late, relative to the optimal stopping time.
26. (1) Does our approach stop the test too early?
Split the 24-hour data at the recommended stopping time into pre-stopping and post-stopping data. Then:
1) select a random time period from the post-stopping data;
2) check whether that period has a repetitive counterpart in the pre-stopping data.
Repeat 1,000 times. The test is likely to generate little new data after the stopping times, preserving more than 91.9% of the information.
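This check can be sketched with the same assumed building blocks: sample periods from the post-stopping data and count how often a repetitive counterpart already exists in the pre-stopping data; the hit rate estimates how much information stopping early preserves.

```python
def information_preserved(pre: pd.DataFrame, post: pd.DataFrame,
                          window: int = 30 * 60,
                          repetitions: int = 1000) -> float:
    """Fraction of post-stopping periods already covered before stopping."""
    hits = 0
    for _ in range(repetitions):
        # 1) random period from the post-stopping data
        start = random.randrange(len(post) - window)
        period = post.iloc[start:start + window]
        # 2) look for a repetitive counterpart in the pre-stopping data
        for start_pre in range(0, len(pre) - window, window):
            if is_repetitive_pair(period, pre.iloc[start_pre:start_pre + window]):
                hits += 1
                break
    return hits / repetitions   # the deck reports > 0.919 across systems
```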
27. (2) Does our approach stop the test too late?
We apply our evaluation approach from RQ1 at the end of every hour during the test (1 h, 2 h, …, 24 h) to find the most cost-effective stopping time.
28. (2) Does our approach stop the test too late? (continued)
[Plot: likelihood of repetitiveness (1%–100%) over time, highlighting the hours around 04:00–06:00.]
The most cost-effective stopping time has:
1. a big difference in likelihood from the previous hour, and
2. a small difference from the next hour.
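One way to read this criterion as code (an assumed formalization, not the deck's definition): score each hour by how much the likelihood gained over the previous hour minus how much it still gains toward the next hour, and pick the best-scoring hour.

```python
def most_cost_effective_hour(hourly_likelihoods: list[float]) -> int:
    """Hour with a big gain over the previous hour, small gain to the next."""
    best_hour, best_score = 1, float("-inf")
    for h in range(1, len(hourly_likelihoods) - 1):
        gain_before = hourly_likelihoods[h] - hourly_likelihoods[h - 1]  # big
        gain_after = hourly_likelihoods[h + 1] - hourly_likelihoods[h]   # small
        score = gain_before - gain_after
        if score > best_score:
            best_hour, best_score = h, score
    return best_hour
```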
29. (2) Does our approach stop the test too late? (continued)
There is a short delay between the recommended stopping times and the most cost-effective stopping times; the majority of delays are under 4 hours.
30.–37. Summary
The closing slides recap the talk: determining the length of a performance test is challenging, since stopping too early misses performance issues while stopping too late delays the release and wastes testing resources; our four-step approach (collect the already-generated data, measure the likelihood of repetitiveness, extrapolate it, and determine whether to stop) recommends a stopping time; the recommendation stops the test neither too early (more than 91.9% of the information is preserved) nor too late (the majority of delays are under 4 hours).