This document discusses best practices for interpreting A/B test results and making decisions based on those results. It cautions against relying solely on significance values and p-values, noting that a p-value only gives the probability of data at least as extreme as what was observed if the null hypothesis were true, not the probability that the null hypothesis is true. It emphasizes considering confidence intervals, choosing the right metrics like profit or revenue, and weighing the risks of false positives versus missed opportunities when deciding whether to roll out a change. The key takeaways are to incorporate risk into test design, focus on the real goal of increasing profit, and recognize that the stakes are generally lower for online experiments than medical trials.
Statistics for CRO - Conversion Conference London (Tom Capper)
The document discusses common mistakes in A/B testing and provides guidance on properly designing A/B tests, interpreting their results, and making decisions from them. It notes that the most common serious errors include not accounting for multiple testing, choosing the wrong metrics, and improper stopping rules. It emphasizes the importance of considering significance and risk during test design, focusing on actual key performance indicators like profit, and recognizing the different risks between medical testing and business experimentation.
The document provides information about SPC (Statistical Process Control) training conducted by Hopez Institute. It includes the course content which covers topics like process definition, defect detection vs prevention, statistics fundamentals, variation and causes of variation, control charts for variables and attributes. It also discusses the history and evolution of SPC, which was pioneered by Walter Shewhart. The document aims to help participants understand why SPC is important and how it can be applied to processes.
This document discusses using the 5 Whys technique for root cause analysis. It begins by explaining why root cause analysis is used, which is to find the root causes of complex problems. It then provides an overview of the 5 Whys process, which involves identifying the problem, asking why it occurred, and repeating until the root cause is uncovered. As an example, it analyzes a problem where gloves were unexpectedly mixed into rubber compound using 5 Whys. It determines through iterative questioning that the root cause was lack of trash bins for glove disposal in the production area. Corrective actions included removing contaminated rubber and remilling, while preventative action was to provide trash bins.
This document presents a decision problem faced by a manufacturer. The manufacturer produces items that have a probability of being defective. These items are formed into batches of 150. The manufacturer can either screen each item in a batch to check for defects at a cost of $10 per item screened, or use the items directly without screening and incur a cost of $100 per defective item that makes it through. Based on the given probabilities of good vs. bad quality batches, the expected costs per batch are calculated for each option. A decision tree is constructed to model the problem, taking into account the option to first test a single randomly selected item from the batch before deciding whether to screen the entire batch or not. The optimal strategy is determined to be
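A minimal sketch of the expected-cost comparison this deck describes. The batch-quality probabilities and per-quality defect rates below are illustrative assumptions, since the deck's actual figures are not reproduced here:

```python
# Screening decision sketch: the quality probabilities are ASSUMED for
# illustration; only the batch size and unit costs come from the summary above.
BATCH_SIZE = 150
SCREEN_COST = 10        # $ per item screened
DEFECT_COST = 100       # $ per defective item that slips through

P_GOOD = 0.8                                  # assumed prior P(good batch)
DEFECT_RATE = {"good": 0.05, "bad": 0.30}     # assumed defect rates

def cost_screen() -> float:
    """Screening inspects every item, so the cost is fixed."""
    return BATCH_SIZE * SCREEN_COST

def cost_no_screen() -> float:
    """Expected cost of using the batch directly: $100 per expected defect."""
    expected_defects = sum(
        p * DEFECT_RATE[quality] * BATCH_SIZE
        for quality, p in (("good", P_GOOD), ("bad", 1 - P_GOOD))
    )
    return expected_defects * DEFECT_COST

print(f"Screen everything: ${cost_screen():,.0f} per batch")
print(f"No screening:      ${cost_no_screen():,.0f} per batch")
# With these assumed numbers both options cost $1,500, which is exactly the
# situation where the deck's extra branch (test one item first, then decide)
# earns its keep.
```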
Why you should stop using the generic term “Technical Debt” and start resolvi... (Alex Fedorov)
Do you feel like you are accruing all this Technical Debt every week and it slows your whole team down? Do you want to deal with it so badly, but you don’t know where even to start? Did you have “refactoring/rewriting sprints” in the past, and they have failed, and you had to revert?
The first piece of advice is to stop calling it “Technical Debt.” Instead, sit down with your team and figure out the list of specific problems that are causing the most trouble in your software right now. Then attack only the first problem on the list. Repeat on a regular basis.
In this talk, you will learn WHY you should do that, and HOW to do that. Moreover, you are going to be able to apply it tomorrow at your workplace!
The document discusses the 5 Why's technique for root cause analysis. It can be used for troubleshooting, quality improvement, and problem solving. The process involves repeatedly asking "Why?" five times to determine the root cause of a problem by drilling down through its symptoms. Tools like Ishikawa charts, design of experiments, and statistical analysis can also aid in root cause analysis.
The document describes the Kepner-Tregoe methodology, a structured approach for problem solving, decision making, and risk analysis. It was developed in the 1960s and has been used by teams like those that solved problems during the Apollo 13 mission. The methodology involves systematically gathering information, prioritizing objectives, generating and evaluating alternatives, and verifying solutions. It provides step-by-step guidance for tasks like defining problems, identifying potential causes, testing solutions, and monitoring outcomes. Examples are given for applying the various steps to hypothetical problems regarding product defects, customer issues, and other scenarios.
SAMPLE SIZE – The indispensable A/B test calculation that you’re not making (Zack Notes)
If you’re a marketer, it’s very likely that you’ve run an A/B test. It’s also likely that you’ve never calculated the sample size for your tests and instead run tests until they reach statistical significance. If so, your strategy is statistically flawed. Honoring the required sample size means marketers must wait longer for test results, but ignoring it will produce false positives and lead to bad decisions.
This deck was created for an email audience, but there are valuable lessons for anyone who runs A/B tests.
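As a rough illustration of the calculation the deck argues for, here is a minimal sample-size sketch using statsmodels; the 10% baseline conversion rate and 2-point minimum detectable lift are assumed values, not the deck's:

```python
# Pre-test sample size for a two-proportion test; the baseline rate
# and minimum detectable effect below are assumptions for illustration.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10   # assumed control conversion rate
mde = 0.02        # assumed minimum detectable absolute lift

effect = proportion_effectsize(baseline + mde, baseline)  # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect,
    alpha=0.05,   # tolerated false positive rate
    power=0.80,   # chance of detecting the lift if it is real
    ratio=1.0,    # equal-size control and treatment arms
)
print(f"~{n_per_arm:,.0f} visitors needed per arm")  # ≈ 1,900 here
```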
This document provides an overview of 5 Why analysis, a root cause analysis tool. It discusses when to use 5 Why analysis, such as for recurring errors or quality issues. The general guidelines for 5 Why analysis include using a cross-functional team, asking "why" until the root cause is uncovered, and ensuring corrective actions address root causes rather than just symptoms. Examples of applying 5 Why analysis to problems like a vehicle not starting and long assembly times are also provided. Potential problems that can occur with 5 Why analysis include stopping at symptoms rather than root causes and different conclusions from different people.
Test reporting is something few testers take time to practice. Nevertheless, it's a fundamental skill—vital for your professional credibility and your own self-management. Many people think management judges testing by bugs found or test cases executed. Actually, testing is judged by the story it tells. If your story sounds good, you win. A test report is the story of your testing. It begins as the story we tell ourselves, each moment we are testing, about what we are doing and why. We use the test story within our own minds, to guide our work. James Bach explores the skill of test reporting and examines some of the many different forms a test report might take. As in other areas of testing, context drives good reporting. Sometimes we make an oral report; occasionally we need to write it down. Join James for an in-depth look at the art of test reporting.
Are you one of those "gifted debuggers" that everyone turns to when they need to solve a difficult problem? Great! This talk isn't for you.
For the rest of us, debugging is often considered a mysterious trait that some engineers were born with, but alas, some simply haven't. This talk is here to bust that myth. It calls "bullshit" on the gifted-debugger myth and claims that with well-structured methodology and a couple of simple tips, we can all master debugging and stop using trial and error (and other witchcraft tactics) to find the cause of our problems. This methodology has served me well over the years to solve difficult problems, and will hopefully serve you as well.
A test strategy is the set of ideas that guides your test design. It's what explains why you test this instead of that, and why you test this way instead of that way. Strategic thinking matters because testers must make quick decisions about what needs testing right now and what can be left alone. You must be able to work through major threads without being overwhelmed by tiny details. James Bach describes how test strategy is organized around risk but is not defined before testing begins. Rather, it evolves alongside testing as we learn more about the product. We start with a vague idea of our strategy, organize it quickly, and document as needed in a concise way. In the end, the strategy can be as formal and detailed as you want it to be. In the beginning, though, we start small. If you want to focus on testing and not paperwork, this approach is for you.
SXSW 2016 - Everything you think about A/B testing is wrong (Dan Chuparkoff)
Everything you've learned about A/B Testing is based on the fundamentally flawed belief that there's one right answer. But the era of mass-market, one-right-answers is over. A/B Testing is our most valuable tool in the battle to create a more engaging web. But our strategy is broken. Don't worry, we can gain a better understanding of our users with a little data science. And we can reinvent A/B Testing... I will show you how.
At Civis Analytics, we specialize in Data Science. From here, we can clearly see that all people are not the same. So why are A/B Tests designed to search for a single solution? In this session I'll show you where A/B Testing is headed next. See you in Austin!
Workshop on Root Cause Analysis tools: Ask Why five times and fishbone (Ishikawa) diagram. I use this to teach basic concepts and give people an experience of using the tools.
This document introduces the A3 problem solving process. The A3 process provides a structured approach to address complex problems involving multiple causes across an organization. It involves planning to understand the current and target conditions, analyzing root causes, developing countermeasures through experiments, checking the results, and acting on lessons learned. An example is provided of using the A3 process to address an increase in serious defects found in code releases. The example walks through planning, root cause analysis identifying potential causes like insufficient testing time and large stories, developing countermeasures like weekly backlog grooming and test automation, and checking for results like reduced defects.
When confronted with a problem, have you ever stopped and asked "why" five times? The Five Whys technique is a simple but powerful way to troubleshoot problems by exploring cause-and-effect relationships.
5 why’s technique and cause and effect analysis (Bhagya Silva)
The document describes the 5 Whys technique and cause and effect analysis for problem solving. The 5 Whys technique was developed in the 1930s by Toyota to repeatedly ask "Why?" to identify the root cause of a problem. Cause and effect analysis uses a diagram to brainstorm potential causes within categories like people, materials, and equipment that may be contributing to a problem. The technique provides a structured approach to analyze problems, uncover relationships between causes, and identify solutions.
A Rapid Introduction to Rapid Software Testing (TechWell)
This document provides a summary of a presentation on Rapid Software Testing. The presentation was given by Michael Bolton of DevelopSense and covered the methodology and mindset of rapid software testing. It emphasizes testing software expertly under uncertainty and time pressure. The presentation defines rapid testing as testing more quickly and less expensively while still achieving excellent results. It compares rapid testing to other approaches like exhaustive, ponderous, and slapdash testing. The presentation also discusses principles of rapid testing, how to recognize problems quickly using heuristics, and testing rapidly to fulfill the mission of testing.
Julian Harty - Alternatives To Testing - EuroSTAR 2010 (TEST Huddle)
EuroSTAR Software Testing Conference 2010 presentation on "Alternatives To Testing" by Julian Harty. See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
The document discusses exploratory testing and concepts such as testing with intent, testing as performance versus testing as artifact creation, and realizing the dynamic nature of testing. It provides examples of exploratory testing activities like finding the happy path in a system, identifying coverage, and reporting bugs using the RIMGEA method. The document also examines learning about features in layers and different roles in testing such as testers, programmers, and test automators. Overall, it presents exploratory testing as a systematic and rigorous approach to discovering risks in a system through analysis and testing heuristics.
My presentation for the Belgium Testing Days 2013 and the TestKit conference 2012. This presentation shows how a structured approach can help you choose and implement a test tool successfully.
Continuous Testing: Preparing for DevOps (STePINForum)
Digital transformation requires continuous testing to quickly adapt to changes and deliver value to customers. Traditional testing relies heavily on manual testing which is inefficient. Continuous testing uses test automation to test early and often through the development process. This shifts testing left and allows for more frequent testing of applications and APIs. Continuous testing leverages techniques like exploratory testing, test automation, and continuous integration to significantly increase testing efficiency and reduce risks. The goal is to close knowledge gaps through continuous and adaptive investigation.
The document provides 10 guidelines for running effective A/B tests:
1. Have one key metric per experiment to clarify decision making.
2. Use your key metric to calculate statistical power and determine required sample size.
3. Run experiments for the planned duration without early stopping.
4. Don't search for differences across many segments to avoid false positives.
5. Ensure experiment groups are balanced to avoid bucketing issues.
6. Don't overcomplicate methods when basics suffice.
7. Be cautious launching changes that didn't hurt without evidence of benefit.
8. Involve data scientists in the entire process for better design and analysis.
9. Only analyze people actually exposed to variations.
Webinar: Experimentation & Product Management by Indeed Product Lead (Product School)
Main Takeaways:
- Why should I run experiments as a Product Manager?
- How long should I run experiments?
- How do I interpret experiment results and make low-risk decisions?
Meta-Analyses in Experimentation: The Whats and Hows (VWO)
Ruben shows a step-by-step method of using research and experimentation to combine all learnings into behavioral insights. With this method, you will not only improve your research and conversion rate optimization practices but also provide a much more pleasant journey for your potential customers! This will undoubtedly help you increase conversion rates and become a lot more successful in your job.
The Flexibility of Business Experimentation | Masters of Conversion by VWO (VWO)
There are times when rules can be broken. The severity of breaking or bending these rules changes as your industry and risk change. This presentation is about the shortcuts that business experimentation programs can take that still lead to business value but bend the rules of the standard scientific method. There are certain liberties that business testing programs can take that lead to a similar or identical outcome.
WEBINAR: How to Set Up and Run Hypothesis Tests (ENCORE!) (GoLeanSixSigma.com)
The first live presentation of this webinar was so popular that we’re doing an encore presentation!
Join us for this 1-hour advanced webinar where we answer the question, “Why do we need hypothesis tests in process improvement?” and then stay with us as we walk you through a real, live hypothesis test direct from the Bahama Bistro!
Odoo Partners - From Seed to Tree; How to scale your Business (Odoo)
The document provides tips for growing a business, including focusing on sales channels, trusting account managers, doing demos, making clear proposals, implementing solutions internally first, starting small, upgrading existing clients, using recurrent service models, and continually learning. It also notes that defining problems clearly helps lead to solutions, and that recurrent services can help build revenue over time.
This document discusses various machine learning model validation techniques and ensemble methods such as bagging and boosting. It defines key concepts like overfitting, underfitting, bias-variance tradeoff, and different validation metrics. Cross validation techniques like k-fold and bootstrap are explained as ways to estimate model performance on unseen data. Bagging creates multiple models on resampled data and averages their predictions to reduce variance. Boosting iteratively adjusts weights of misclassified observations to build strong models, but risks overfitting. Gradient boosting and XGBoost are powerful ensemble methods.
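As a brief illustration of the validation techniques summarized above, here is a minimal k-fold cross-validation sketch with scikit-learn; the toy dataset and the gradient-boosting model are arbitrary stand-ins:

```python
# Estimate out-of-sample accuracy with 5-fold cross validation instead
# of trusting the (overfitting-inflated) training score.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = GradientBoostingClassifier(random_state=0)  # a boosting ensemble

scores = cross_val_score(model, X, y, cv=5)  # accuracy on each held-out fold
print(f"fold accuracies: {scores.round(3)}")
print(f"mean ± sd: {scores.mean():.3f} ± {scores.std():.3f}")
```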
One of the most commonly asked questions is “when is an MVT experiment or AB test finished?”
Is it at 30 days...? 100 conversions...? 10,000 visitors...?
The short answer is... it depends.
Gerlof Hoekstra - OMG What Have We Done - EuroSTAR 2013 (TEST Huddle)
EuroSTAR Software Testing Conference 2013 presentation on "OMG What Have We Done" by Gerlof Hoekstra.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
Being Right Starts By Knowing You're Wrong (Data Con LA)
Data Con LA 2020
Description
The recent proliferation of predictive analytics within companies is of limited benefit unless these companies learn to measure, understand, and embrace a critical concept: error. There is no such thing as a perfect predictive model, and all tools using any sort of predictive model will have error. Despite being relatively easy to implement and understand, consistent error measurement continues to be underutilized or even completely avoided. In this session we will discuss:
*Why embracing error is so valuable to companies.
*We will then review basic ways to measure error in commonly used models and in data source systems such as CRMs and ERPs.
*Most importantly, we will review some ways to approach company leadership with the concept of error.
Speaker
Ryan Johnson, GoGuardian, Director of Science and Analytics
This document discusses the importance of testing in direct marketing and how to extract value from testing. It notes that testing is essential for survival and growth in a changing digital landscape. To extract value, tests should focus on questions that are actionable, relevant, and repeatable and can deliver a return on investment, such as tests of offers, contact strategies, channel mix, and customer preferences. The document emphasizes starting with a clear understanding of what needs to be tested and communicating results clearly to stakeholders.
1) "How to design powerful experiments" provides tips for setting up successful A/B tests and experiments to make data-driven decisions and reduce risks.
2) Key tips include having the proper experimentation infrastructure in place, following an iterative process of developing hypotheses, designing experiments, analyzing results, and executing.
3) Case studies show that A/B tests at Google and REA Group led to increased annual revenue and conversion rates, demonstrating the value of experimentation.
10 Best Practices to Becoming a Feedback Ninja (by @peoplemetrics @smcdade) (PeopleMetrics)
Customer feedback is all the rage, but how do you know you're using it effectively? Take a quick read through these 10 tips to using your customer experience data more effectively and efficiently.
In this presentation we answer the question, "Why do we need hypothesis tests in process improvement?" Then we walk you through a real, live hypothesis test direct from the Bahama Bistro!
You can find the rest of the webinar materials and questions from the webinar here:
https://goleansixsigma.com/webinar-set-run-hypothesis-tests/
Anton Muzhailo - Practical Test Process Improvement using ISTQB (Ievgenii Katsan)
Here are a few potential questions from the document:
- What is the true value of ISTQB certifications beyond just checking a box for management? How can the knowledge be applied practically?
- How can metrics be designed and used effectively to assess quality and test coverage in an agile environment? What are some examples of valid and invalid metrics?
- What artifacts or information are useful to include in a test plan even for agile teams using tools like JIRA? How can a test plan provide value beyond just additional paperwork?
- What techniques can be used to effectively estimate defect severity when multiple testers with different perspectives are involved? How can consistency be achieved?
- How can root cause analysis be applied
What you need to know for trustworthy A/B tests (Minho Lee)
Slides from a guest lecture given on 2021-09-04.
---
Many people say that A/B testing is important.
But what exactly are we trusting when we hand decisions over to an A/B test?
An A/B test is not a magic tool that produces results just by being run.
This talk looks at what further thought is needed to get trustworthy experiment results.
This document summarizes key concepts from a presentation on A/B testing fundamentals. It discusses:
1. The different possible outcomes of A/B tests and how they relate to concepts like true positives, false positives, etc.
2. The difference between false positive rate and false discovery rate. False positive rate considers the probability of a false positive from a single test, while false discovery rate accounts for running multiple tests.
3. How to balance factors like error rates, effect size detection, and test duration by making tradeoffs between them, such as running tests longer to reduce error rates or detect smaller effects.
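To make the false positive rate vs. false discovery rate distinction concrete, here is a small back-of-the-envelope sketch; the 10% base rate of real effects, 5% alpha, and 80% power are assumed numbers:

```python
# Across many experiments, what fraction of *significant* results are
# false discoveries? All three inputs below are assumptions.
alpha = 0.05     # false positive rate per test
power = 0.80     # chance a real effect reaches significance
p_real = 0.10    # assumed fraction of experiments with a real effect

false_pos = alpha * (1 - p_real)   # nulls that cross the threshold
true_pos = power * p_real          # real effects that cross it
fdr = false_pos / (false_pos + true_pos)

print(f"False positive rate: {alpha:.0%} per test")
print(f"False discovery rate: {fdr:.0%} of significant results")
# ≈ 36% here: far above 5%, because most tested ideas are nulls.
```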
15. We hope to convince you:
● Standard A/B testing methods lead you to overestimate the business impact of rolled out features.
● The only way around this is to massively increase sample size, run replications/holdouts, or to model out the bias.
● You should think hard about your goal: Is it just hypothesis testing or is it estimation too?
30. What’s going on?
● Our measurements will be biased when:
○ We expect variability in our measurements, and
○ We condition on only those measurements that pass some evidence threshold.
31. Goal of research
● To generalize from some sample to a population:
○ Inferences about the population are only as good as the sample.
42. Simulated A/B tests
● Control: draw 50k samples from a binomial distribution where probability of success = 50%.
● Treatment: draw 50k samples from a binomial distribution where probability of success = 50% * 1.01 = 50.5%.
● Run statistical test: roll out 2019 if it’s better than 2011 and p < 0.05.
● Repeat 10k times.
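The simulation described on this slide fits in a few lines of Python; below is a hedged sketch (NumPy/SciPy) using the same 50k-sample, 1%-lift setup, with a two-sided z-test standing in for whatever test the deck actually used:

```python
# Winner's-curse simulation: true 1% relative lift (50.0% -> 50.5%),
# roll out only when treatment wins significantly, then look at the
# average *estimated* lift among the roll-outs.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, P_C, P_T, RUNS = 50_000, 0.500, 0.505, 10_000

kept_lifts = []
for _ in range(RUNS):
    c = rng.binomial(N, P_C) / N      # observed control rate
    t = rng.binomial(N, P_T) / N      # observed treatment rate
    se = np.sqrt(c * (1 - c) / N + t * (1 - t) / N)
    p = 2 * stats.norm.sf(abs(t - c) / se)
    if t > c and p < 0.05:            # "roll out if better and p < 0.05"
        kept_lifts.append((t - c) / c)

print(f"rolled out in {len(kept_lifts) / RUNS:.1%} of runs")
print(f"true lift 1.0%; mean estimated lift after roll-out: "
      f"{np.mean(kept_lifts):.2%}")
# Conditioning on significance selects the lucky draws, so the
# surviving estimates overstate the true 1% lift (≈ 1.7% here).
```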
55. 2 goals of experimentation
Hypothesis Testing
● Goal: know if treatment is different from control.
● Output: probability of observed data given null.
Treatment Effect Estimation
● Goal: know how different treatment is from control.
● Output: distribution of treatment effect sizes compatible with observed data.
60. Solutions
Increase Sample Size
● Pro: no inflation when power = 1.
● Cons: time consuming, opportunity costs.
Holdouts/Replications
● Pro: no inflation.
● Cons: resource intensive, time consuming, opportunity costs.
Estimate and Remove Bias
● Pro: easy.
● Cons: relies on assumptions, based on long-run expectation.
65. Simulated A/B tests (1% lift)
● Control: 50k samples from binomial with probability of success = 50%.
● Treatment: 50k samples from binomial with probability of success = 50% * 1.01 = 50.5%.
72. Inflation is a function of power
● Problem: we never know the true power of any test.
● Solution: infer power from the observed difference.
76. Infer inflation from observed difference
● Once we infer the expected inflation, we can shrink the observed difference appropriately.
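One way to implement this is to find the true effect whose conditional mean, given that it cleared the significance cutoff, equals the difference we observed. The sketch below uses a one-sided cutoff and a truncated-normal mean; the deck's exact method may differ:

```python
# Shrink a significant observed difference for selection bias.
from scipy import stats
from scipy.optimize import brentq

def expected_obs_given_sig(mu, se, z_crit=1.96):
    """Mean of a N(mu, se) estimate conditional on exceeding z_crit * se."""
    cut = z_crit * se
    a = (cut - mu) / se
    tail = stats.norm.sf(a)                     # P(clearing the cutoff)
    return mu + se * stats.norm.pdf(a) / tail   # truncated-normal mean

def shrink(observed_diff, se, z_crit=1.96):
    """Find the mu whose post-selection mean matches the observed diff."""
    return brentq(
        lambda mu: expected_obs_given_sig(mu, se, z_crit) - observed_diff,
        -10 * se, observed_diff,                # bracket: true mu <= observed
    )

# E.g. an observed lift of 0.83 points with a 0.32-point standard error
# shrinks back to roughly the 0.5-point true lift from the simulation.
print(f"shrunk estimate: {shrink(0.0083, 0.0032):.4f}")
```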
81. We hope we convinced you:
● Standard A/B testing methods lead you to overestimate the business impact of rolled out features.
● The only way around this is to massively increase sample size, run replications/holdouts, or to model out the bias.
● You should think hard about your goal: Is it just hypothesis testing or is it estimation too?
87. Early stopping/sequential testing
● Efficient A/B testing where you peek at the data more than once.
○ Controls the Type I error rate by “spending” alpha incrementally at each peek.
○ Example: collect data for 1 week, run the test, and stop if the t-statistic is more extreme than the critical t-statistic at that peek.
● Problem: leads to even more bias in treatment estimates.