T-tests are the de facto industry standard for analyzing A/B tests. Regression is a more general approach that also lets you control for covariates, potentially increasing your statistical power. This can lead to shorter run times, the ability to detect smaller changes, and higher testing velocity.
Come learn why Stitch Fix uses regression-based analysis for experiments. We'll also share practical insights on how we're enabling this at scale in an automated platform.
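The variance-reduction idea can be sketched with a simple covariate adjustment in plain Python (a CUPED-style adjustment; the simulated data, effect sizes, and variable names below are illustrative, not Stitch Fix's actual setup):

```python
import math
import random
import statistics

random.seed(7)

# Simulated experiment: a pre-experiment covariate x explains most of the
# variance in the outcome y; the true treatment effect is 0.2.
n = 4000
treat = [i % 2 for i in range(n)]
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [0.2 * t + 2.0 * xi + random.gauss(0.0, 1.0) for t, xi in zip(treat, x)]

def diff_in_means(outcome, treat):
    """Difference-in-means estimate and its standard error (the t-test view)."""
    a = [v for v, t in zip(outcome, treat) if t]
    b = [v for v, t in zip(outcome, treat) if not t]
    est = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return est, se

# Plain t-test analysis of the experiment.
raw_est, raw_se = diff_in_means(y, treat)

# Regression-style adjustment: subtract the part of y explained by x.
mx, my = statistics.mean(x), statistics.mean(y)
theta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
y_adj = [yi - theta * (xi - mx) for xi, yi in zip(x, y)]
adj_est, adj_se = diff_in_means(y_adj, treat)

print(f"unadjusted: {raw_est:.3f} +/- {raw_se:.3f}")
print(f"adjusted:   {adj_est:.3f} +/- {adj_se:.3f}")  # much smaller standard error
```

Both estimators target the same effect, but the adjusted one has a smaller standard error; that is what allows shorter experiments at the same power.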
Defining Success Metrics for Your Product by Google Product Leader (Product School)
Main Takeaways:
-Learn the what, why, how, when, and where of measuring product performance.
-Success metrics are more than just formulas.
-How to get org-wide alignment on performance measurement.
Becoming a Successful PM with Product School's Founder (Product School)
Are you trying to break into Product Management, or looking to stay on top of your Product Management game? Product School's Founder gave insights on how to be a successful PM in Silicon Valley. He also covered how to grow your career once you're a product manager. We had an interactive discussion and answered questions from the audience in real time.
Customer to Product Idea Iteration by Amazon's Product Manager (Product School)
In this talk, Akshay Kerur from Amazon explored working backwards from the concept/customer to an initial product idea.
Main takeaways:
1. Why it's so important to put your product on paper.
2. Questions you need to know and answer about a product prior to any engineering commitment.
3. Ratifying your product idea through proper customer and internal stakeholder identification.
Cracking the Product Manager Interview with Gayle McDowell (Product School)
In this talk, Gayle McDowell, author of the book "Cracking the PM Interview", taught people how to prepare for Product Manager interviews, what top companies like Google, Amazon, and Microsoft really look for, and how to tackle the toughest problems.
She talked about how the role of a Product Manager varies across companies, what experience you need, how to make your existing experience translate, what a great Product Manager resume and cover letter look like, and finally, how to master the Product Manager interview questions.
Instacart has revolutionized grocery shopping by bringing groceries to your door in as little as an hour. Behind the scenes, Instacart uses machine learning for everything from routing shoppers to ranking search results. In this talk, Jeremy will cover their recent tech blog post, Deep Learning with Emojis (not Math) ( https://tech.instacart.com/deep-learning-with-emojis-not-math-660ba1ad6cdc ), which details how Instacart is using Keras and TensorFlow to predict the sequence in which shoppers will pick items in stores. Jeremy will discuss the data collection, mobile technology, and deep learning architectures Instacart is applying to enable on-demand grocery delivery.
How to Crack the PM Interview by Gayle McDowell (Product School)
Product Management Event Held at the Product Conference in San Francisco.
Gayle McDowell taught how to prepare for Product Manager interviews, what top companies like Google, Amazon, and Microsoft really look for, and how to tackle the toughest problems.
She also discussed how the ambiguously-named "PM" (product manager / program manager) role varies across companies, what experience you need, how to make your existing experience translate, what a great PM resume and cover letter look like, and finally, how to master the PM interview questions (estimation questions, behavioral questions, case questions, product questions, technical questions, and the super important "pitch").
Customer Centricity and Product Led Growth by Airbnb Product & Growth (Product School)
Product Management Event at #ProductCon San Francisco about Customer Centricity and Product Led Growth by Product & Growth Manager at Airbnb, Pratik Shah.
Behind every great product is a great team doing work in a way that guarantees results. They are following a roadmap from the starting point to the end product. But a product roadmap can be elusive. This talk addresses why it is important and presents an approach to make one.
Lean Kanban India 2018 | From Upstream to Portfolio Kanban, a Fresh look | P... (LeanKanbanIndia)
Session Title:
From Upstream to Portfolio Kanban, a Fresh look
Session Overview:
Portfolio Kanban plays a crucial role in balancing demand with capability at the highest level of the organization. Most Portfolio Kanban systems are shallow, as is the currently available guidance: it does not go much further than visualization and a cadence of conversations. At best there is a crude notion of limiting the number of initiatives in progress. It is time to take the next step.
In this presentation we build on our experiences with implementing Upstream Kanban and using system dynamics to give insight and guidance on creating enterprise flow with Upstream and Portfolio Kanban. We start from Upstream Kanban, where we address the problem of managing fluctuating demand. We analyse the feedback loops and delays that are the source of oscillation and show how to turn oscillation into a (more) steady end-to-end flow. We discuss capacity constraints, liquidity problems, and ways of organizing the marketplace that emerges when the needs of diverse customers with possibly conflicting priorities must be matched with a heterogeneous capability. Finally, we discuss the role of triage in a portfolio with incommensurable choices.
Summary of the strategy of building low-burn-rate startups, i.e. ones that are capital-efficient and generally frugal. By taking advantage of open source, agile software, and iterative development, lean startups can operate with much less waste.
How to Think Product Analytics in PM Interviews by Amazon Sr PM (Product School)
Main takeaways:
- Knowing what metrics to measure and how to measure them are key skills for a Product Manager. Interviewers are always going to gauge this aspect.
- How should we think about setting Product Metrics for every situation? How should we think about measuring these?
- What are the strengths and limitations of A/B testing? When can you use it, and when should you rely on other methods? What are the different methods for measuring metrics, and when should you employ them?
How to Manage a Platform Product by Yelp Product Manager (Product School)
In this presentation you will learn what a platform product manager does, how to build platforms that delight your customers, and more about the rewards and challenges of platform product management.
Investor Pitch Deck by Olymsearch - A Reference for Startup Fund Raising (Truong Bomi)
A mockup of the pitch deck that my team and I at Olymsearch built for pitching while being incubated in Batch 1 of the Vietnam Silicon Valley accelerator. Now is the right time to disclose part of my startup journey, with its many failures and lessons on the way to success.
How to Build a Product Roadmap by eBay Director of Product (Product School)
Sudha Mahajan talked about how to build great roadmaps! Great roadmaps require the right trade-offs, the right prioritization, strong execution rigor and, above all, success metrics. A strong roadmap is your channel to success. There is no one size that fits all, but there are certain techniques that can help you get there.
A/B Testing for New Product Launches by Booking.com Sr PM (Product School)
Main takeaways:
-There is no one right way of validating a product; A/B testing is just one of them
-Get your product qualitatively validated before validating it quantitatively
-Use holdouts to measure the long-term success of your new products while running A/B tests in parallel
DI&A Slides: Data Insights and Analytics Frameworks (DATAVERSITY)
This webinar will provide an overview of the standard architecture components needed to perform analytics and derive data insights within each of the three common database environments: the sandbox environment for initial data assessment and data science modeling; the big data environment for batch analytics, which includes critical governance components; and the real-time analytics environment for real-time retrieval of data and, lastly, the integration of real-time data sources.
We will also discuss:
- Components for the data scientist sandbox / lab
- Batch analytics with security and metadata
- Data pipelines
- Real-time access and streaming sources
Measure Your Way to Success by Sephora's former Dir. of Product (Product School)
Product management is about creating change. Metrics are the guideposts that help us ensure that the changes we make are leading to the results we want. They let us forecast, provide early warning signals, and create incentives for action.
In this talk Meghan Cochran talked about designing good metrics, gaining alignment among a broad range of stakeholders, and communicating progress effectively. She discussed the trade-offs of looking at rates & ratios vs absolute numbers, and talked about funnels, cohorts, and other fascinating and exciting measures of success.
How to Prepare For a Product Manager Interview by Google PM (Product School)
In this presentation Google Product Manager Neha Bansal will be sharing her secrets on how to position oneself for a Product Manager role without an engineering degree and how to successfully pass a job interview for a PM position.
Talks@Coursera - A/B Testing @ Internet Scale (courseratalks)
This tech talk will describe how to build an experiment platform that can handle large-scale experiments. The talk will also discuss several best practices in designing and analyzing online experiments learned from companies like Coursera, Microsoft and LinkedIn.
About the Speakers
Ya Xu has been working in the domain of online A/B testing for over 4 years. She currently leads a team of engineers and data scientists building a world-class online A/B testing platform at LinkedIn. She also spearheads taking LinkedIn's A/B testing culture to the next level by evangelizing best practices and pushing for broad-based platform adoption. She holds a Ph.D. in Statistics from Stanford University.
Chuong (Tom) Do currently leads a team of data engineers and analysts in the Analytics team at Coursera, which is responsible for data infrastructure and quantitative analysis in support of the product and business. He completed his Ph.D. in Computer Science at Stanford University in 2009 and worked as a scientist in the personal genetics company 23andMe until 2012, where his research has collectively spanned the fields of machine learning, computational biology, and statistical genetics.
DoWhy: An end-to-end library for causal inference (Amit Sharma)
In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying the observed data and testing whether, and to what extent, they are valid. However, most libraries for causal inference focus only on providing powerful statistical estimators. We describe DoWhy, an open-source Python library that treats causal assumptions as first-class citizens, based on the formal framework of causal graphs for specifying and testing them. DoWhy presents an API for the four steps common to any causal analysis: 1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks, including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML for the estimation step.
TEST #1 - Perform the following two-tailed hypothesis test, using a....docx (mattinsonjanel)
TEST #1
Perform the following two-tailed hypothesis test, using a .05 significance level:
· Intrinsic by Gender
· State the null and alternate hypotheses for the test
· Use Microsoft Excel (Data Analysis Tools) to process your data and run the appropriate test. Copy and paste the results of the output to your report in Microsoft Word.
· Identify the significance level, the test statistic, and the critical value.
· State whether you are rejecting or failing to reject the null hypothesis statement.
· Explain how the results could be used by the manager of the company.
TEST #2
Perform the following two-tailed hypothesis test, using a .05 significance level:
· Extrinsic variable by Position Type
· State the null and alternate hypotheses for the test
· Use Microsoft Excel (Data Analysis Tools) to process your data and run the appropriate test.
· Copy and paste the results of the output to your report in Microsoft Word.
· Identify the significance level, the test statistic, and the critical value.
· State whether you are rejecting or failing to reject the null hypothesis statement.
· Explain how the results could be used by the manager of the company.
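If Excel is not at hand, the same two-tailed test can be run with a short Welch's t-test in plain Python (the scores below are made-up illustration data, not the assignment's dataset; the normal approximation stands in for the exact t distribution):

```python
import math
import statistics

def welch_t_test(a, b):
    """Two-sample (unequal-variance) t statistic; two-tailed p-value via a
    normal approximation to the t distribution (fine for moderate samples)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))
    return t, p

# Illustrative data: intrinsic-satisfaction scores split by gender.
group_a = [4.2, 3.9, 5.1, 4.8, 4.0, 4.5, 3.8, 4.7, 4.4, 4.1]
group_b = [5.0, 5.4, 4.9, 5.6, 5.2, 4.8, 5.5, 5.1, 5.3, 4.7]

t, p = welch_t_test(group_a, group_b)
alpha = 0.05
print(f"t = {t:.3f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```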
GENERAL ANALYSIS (Research Required)
Using your textbook or other appropriate college-level resources:
· Explain when to use a t-test and when to use a z-test. Explore the differences.
· Discuss why samples are used instead of populations.
The report should be well written and flow well, with no grammatical errors. It should include proper citations in APA format for both the in-text and reference pages, include a title page, and be double-spaced in Times New Roman, 12-point font. APA formatting is necessary to ensure academic honesty.
Be sure to provide references in APA format for any resource you may use to support your answers.
Making Inferences
When data are collected, various summary statistics and graphs can be used for describing data; however, learning about what the data mean is where the power of statistics starts. For example, is there really a difference between two leading cola products? Hypothesis testing is an example of making these types of inferences on data sets.
Hypothesis Tests
Claims are made all the time, such as a particular light bulb will last a certain number of hours.
Claims like this are tested with hypothesis testing. It is a straightforward procedure that consists of the following steps:
1. A claim is made.
2. A significance level (a probability threshold) is chosen.
3. Data are collected.
4. The test is performed.
5. The results are analyzed.
Hypothesis tests are performed on the population mean, µ.
It is not possible to test the full population. For example, it would be impossible to test every light bulb. Instead, the hypothesis test is performed on a sample of the population.
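The five steps above can be sketched for the light-bulb claim (the claimed lifetime, sample values, and significance level are made up for illustration; a normal approximation stands in for the exact t distribution):

```python
import math
import statistics

# 1. A claim is made: bulbs last 1000 hours on average (H0: mu = 1000).
claimed_mean = 1000.0

# 2. A significance level is chosen.
alpha = 0.05

# 3. Data are collected: lifetimes (hours) of a sample of bulbs.
sample = [985, 1002, 948, 1010, 967, 992, 955, 978, 1005, 960,
          973, 988, 941, 996, 969, 982, 1008, 957, 975, 990]

# 4. The test is performed (one-sample test; normal approximation for p).
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))
t = (mean - claimed_mean) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

# 5. The results are analyzed.
print(f"sample mean = {mean:.1f}, t = {t:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "fail to reject H0")
```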
Setting up a Hypothesis Test
When performing hypothesis testing, the test is set up with a null hypothesis (or claim) and an alternative hypothesis. ...
What You Need to Know for Trustworthy A/B Tests (Minho Lee)
Slides from a guest lecture at 프롬, given on 2021-09-04.
---
Many people say that A/B testing is important.
But what is it about A/B tests that justifies entrusting our decisions to them?
An A/B test is not a magic tool that delivers results just by being run.
This talk looks at what further thinking is needed to produce experiment results you can trust.
One of the most commonly asked questions is: "When is an MVT experiment or A/B test finished?"
Is it at 30 days...? 100 conversions...? 10,000 visitors...?
The short answer is... it depends.
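"It depends" mostly means it depends on the effect you want to detect: the required sample size can be computed up front from the minimum detectable effect, the outcome's variability, and the desired significance and power. A standard formula for a difference in means, sketched in Python (the numbers are illustrative):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(min_detectable_effect, sigma, alpha=0.05, power=0.80):
    """Per-arm sample size for a two-sided test of a difference in means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / min_detectable_effect) ** 2)

# Detect a 0.5-unit lift on an outcome with standard deviation 1.0:
print(sample_size_per_arm(0.5, 1.0))    # 63 per arm

# Halving the detectable effect roughly quadruples the required sample:
print(sample_size_per_arm(0.25, 1.0))   # 252 per arm
```

Stopping the moment a result first looks significant inflates the false-positive rate; fixing n this way up front (or using a proper sequential method) is what makes the stopping rule trustworthy.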
I love the smell of data in the morning (getting started with data science)... (Troy Magennis)
Data Science 101 for software development. I know it misses the purist view of Data Science, but this is intended to get you started! First presented at Agile 2017 in Florida.
Supercharge your A/B testing with automated causal inference - Community Works... (Egor Kraev)
An A/B test consists of splitting customers into a test and a control group and choosing a sample size large enough to observe the average treatment effect (ATE) we are interested in, in spite of all the other factors driving outcome variance. With causal inference models we can do better than that by estimating the effect conditional on customer features (CATE), thus turning customer variability from noise to be averaged over into a valuable source of segmentation, and potentially requiring smaller sample sizes as a result. Unfortunately, there are many different models available for estimating CATE, with many parameters to tune and very different performance. In this talk, we will present our auto-causality library, which combines three marvelous packages from Microsoft (DoWhy, EconML, and FLAML) to do fully automated selection and tuning of causal models based on out-of-sample performance, just like any other AutoML package does. We will describe the projects inside Wise currently starting to apply it, and present results on comparative model performance and out-of-sample segmentation on Wise CRM data.
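The ATE-vs-CATE distinction can be sketched in a few lines of plain Python (toy data with a single made-up segmentation feature; auto-causality itself automates selection over far richer CATE estimators):

```python
import random
import statistics

random.seed(3)

# Toy data: the treatment helps new customers (effect 2.0) but does
# nothing for existing customers (effect 0.0).
rows = []
for _ in range(4000):
    is_new = random.random() < 0.5
    treated = random.random() < 0.5
    effect = 2.0 if is_new else 0.0
    y = effect * treated + random.gauss(0.0, 1.0)
    rows.append((is_new, treated, y))

def diff_in_means(pairs):
    """Average outcome under treatment minus average outcome under control."""
    t = [y for treated, y in pairs if treated]
    c = [y for treated, y in pairs if not treated]
    return statistics.mean(t) - statistics.mean(c)

ate = diff_in_means([(treated, y) for _, treated, y in rows])
cate_new = diff_in_means([(treated, y) for is_new, treated, y in rows if is_new])
cate_existing = diff_in_means([(treated, y) for is_new, treated, y in rows if not is_new])

print(f"ATE:             {ate:.2f}")             # ~1.0 (the two segments averaged)
print(f"CATE (new):      {cate_new:.2f}")        # ~2.0
print(f"CATE (existing): {cate_existing:.2f}")   # ~0.0
```

Averaging over segments hides that the treatment only works for one of them; conditioning on the feature recovers the per-segment effects.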
Optimizely Workshop: Take Action on Results with Statistics (Optimizely)
Optimizely recently released the stats engine, which moves away from the traditional statistics model and into a new framework that is more aligned with modern business operations. In this workshop, we’ll walk you through the core trade-offs in A/B Testing, and how you can use them to decide when to stop running your test.
Building a Testing Playbook by Andrew Richardson (Delphic Digital)
A testing playbook combines the best practices of testing and optimization with communication strategies, education, and gaining buy-in from your client. Andrew Richardson, Senior Director of Analytics at Delphic Digital, provides a peek behind the curtain to reveal how Delphic prioritizes tests; recruits, trains, and staffs up for a testing practice; and moved from A/B to multivariate testing. Come with an open mind, and walk away with a Testing Playbook template you can put to use at once.
Generalized linear models (GLMs) and gradient boosting machines (GBMs) are two of the most widely used supervised learning approaches in all of commercial data science. GLMs have been the go-to predictive and inferential modeling tool for decades, but important mathematical and computational advances have been made in training GLMs in recent years. This talk will contrast H2O’s implementation of penalized GLM techniques with ordinary least squares and give specific hints for building regularized and accurate GLMs for both predictive and inferential purposes. As more organizations begin experimenting with and embracing algorithms from the machine learning tradition, GBMs have come to prominence due to their predictive accuracy, the ability to train on real-world data, and resistance to overfitting training data. This talk will give some background on the GBM approach, some insight into the H2O implementation, and some tips for tuning and interpreting GBMs in H2O.
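The contrast between ordinary least squares and a penalized GLM can be sketched for a single predictor (an illustration of the idea only, not H2O's implementation; the data and lambda value are made up):

```python
import random

random.seed(1)
n = 200
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# True slope 3.0 plus noise; no intercept, to keep the algebra minimal.
y = [3.0 * xi + random.gauss(0.0, 2.0) for xi in x]

sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

beta_ols = sxy / sxx            # ordinary least squares slope
lam = 50.0                      # arbitrary L2 penalty strength
beta_ridge = sxy / (sxx + lam)  # penalized (ridge) slope, shrunk toward 0

print(f"OLS slope:   {beta_ols:.3f}")
print(f"ridge slope: {beta_ridge:.3f}")
```

The penalty term in the denominator is what shrinks the estimate toward zero; the regularized fit trades a little bias for lower variance.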
Patrick's Bio:
Patrick Hall is a senior data scientist and product engineer at H2O.ai. Patrick works with H2O.ai customers to derive substantive business value from machine learning technologies. His product work at H2O.ai focuses on two important aspects of applied machine learning, model interpretability and model deployment. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning.
Prior to joining H2O.ai, Patrick held global customer facing roles and R & D research roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick is the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.
Personalization allows Stitch Fix to style its clients and provide recommendations to help them find what they love. To do this, the company gathers information about a client’s preferences up front when they sign up for the service and learns more about them as they become longer-term customers. This information is important for making recommendations but also must be protected and managed with care.
The data science team at Stitch Fix is the primary owner of the recommendation systems. Backing them up is the data platform team, who maintain the data infrastructure, data warehouse, and supporting tools and services. This data warehouse has several different data sources that read and write into it. This includes a logging pipeline for events, every Spark-based ETL, and daily snapshots of structured data from Stitch Fix applications.
Neelesh Srinivas Salian explains Stitch Fix’s process to better understand the movement and evolution of data within its data warehouse, from the initial ingestion from outside sources through all of its ETLs. Neelesh also details how Stitch Fix built a service that helps the company understand the lineage information that is associated with each table in the data warehouse. This service helps the company understand the source, parentage, and journey of all data in the warehouse. Although Stitch Fix makes sure to anonymize and filter out sensitive information from this data, the company needs a more flexible long-term solution as the business expands.
Stitch Fix aspires to help you find the style that you will love. Data, the backbone of the business, is used to help with styling recommendations, demand modeling, user acquisition, and merchandise planning and also to influence business decisions throughout the organization. These decisions are backed by algorithms and data collected and interpreted based on client preferences. Neelesh Srinivas Salian offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way.
Apache Spark plays an important role in Stitch Fix’s data platform, and the company’s data scientists use Spark for their ETL and Presto for their ad hoc queries. The goal for the team running the compute infrastructure is to understand and make the data scientists’ lives easier, particularly in terms of usability of Spark, by building tools that expedite the process of getting started with Spark and transitioning from an ad hoc to a production workflow. The compute infrastructure is a part of the data platform that is responsible for all the needs of data scientists at Stitch Fix.
Neelesh shares Stitch Fix’s journey, exploring its ad hoc and production infrastructure and detailing its in-house tools and how they work in synergy with open source frameworks in a cloud environment. Neelesh also discusses the additional improvements to the infrastructure that help persist information for future use and optimization and explains how the implementation of Amazon’s EMR FS has helped make it easier to read from the S3 source.
Stitch Fix aspires to help you find the style that you will love. Data, the backbone of the business, is used to help with styling recommendations, demand modeling, user acquisition, and merchandise planning and also to influence business decisions throughout the organization. These decisions are backed by algorithms and data collected and interpreted based on client preferences. This talk offers an overview of the compute infrastructure used by the data science team at Stitch Fix, covering the architecture, tools within the larger ecosystem, and the challenges that the team overcame along the way.
Apache Spark plays an important role in Stitch Fix’s data platform, and the company’s data scientists use Spark for their ETL and Presto for their ad hoc queries. The goal for the team running the compute infrastructure is to understand and make the data scientists’ lives easier, particularly in terms of usability of Spark, by building tools that make it easier to get started with Spark and transition from an ad hoc to a production workflow. The compute infrastructure is a part of the data platform that is responsible for all the needs of data scientists at Stitch Fix.
In this talk, we look at Stitch Fix’s journey, exploring its Spark setup, in-house tools and how they work in synergy with open source frameworks in a cloud environment. There are additional improvements to the infrastructure that help persist information for future use and optimization and we look at how the implementation of Amazon’s EMR FS has helped make it easier for us to read from the S3 source.
Moment-based estimation for hierarchical models in Apache SparkStitch Fix Algorithms
At Stitch Fix, hierarchical models are one of the core machine learning frameworks used in our recommender systems technology. Hierarchical models allow for estimation on clustered data, when classical assumptions of identically distributed random variables break down. Traditional likelihood-based methods for fitting hierarchical models often struggle with the scale of data found in industry, which has prompted recent research into moment-based procedures for parameter estimation. Spark doesn't have a native library for fitting these models, and to our knowledge, no moment-based estimation software has been developed previously utilizing a distributed computational system. This talk will review our development of Spark software utilizing these new estimation methods, detail the theory behind the approach, and compare our software to similar open source packages in Spark and other popular languages.
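The flavor of moment-based estimation can be shown on the simplest hierarchical model, a one-way random intercept (a toy pure-Python sketch, not the Spark software described in the talk; all parameter values are made up):

```python
import random
import statistics

random.seed(2)
mu, sigma_a, sigma_e = 5.0, 2.0, 1.0   # hypothetical true parameters
n_groups, group_size = 500, 50

groups = []
for _ in range(n_groups):
    a_i = random.gauss(0.0, sigma_a)   # group-level random effect
    groups.append([mu + a_i + random.gauss(0.0, sigma_e)
                   for _ in range(group_size)])

# Moment estimators: the average within-group variance estimates sigma_e^2,
# and the variance of group means estimates sigma_a^2 + sigma_e^2 / m.
var_within = statistics.mean(statistics.variance(g) for g in groups)
group_means = [statistics.mean(g) for g in groups]
var_between = statistics.variance(group_means) - var_within / group_size

print(f"sigma_e^2 estimate: {var_within:.2f}")   # true value 1.0
print(f"sigma_a^2 estimate: {var_between:.2f}")  # true value 4.0
```

Because these estimators are just sums and group-wise averages, they map naturally onto a distributed computation, which is the appeal over likelihood-based fitting at scale.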
Have you built a model you’re confident in? Great! Now, you need to put it into production. Generally, this means reliably transmitting the output of a model to another system, but people can mean a lot of things when they say deployment: data, API, or code artifact deployments. First, we will discuss how these deployment types fit into model development workflows and integrate with outside systems. Then we will move on to the fun, gnarly stuff -- challenges in deployment and strategies from across the industry for managing these issues.
I'll provide guidelines for thinking about empirical performance evaluation of parallel programs in general and of Spark jobs in particular. It's easier to be systematic about this if you think in terms of "what's the effective network bandwidth we're getting?" instead of "how fast does this particular job run?" In addition, the figure of merit for parallel performance isn't necessarily obvious. If you want to minimize your AWS bill, you should almost certainly run on a single node (but your job may take six months to finish). You may think you want answers as quickly as possible, but if you could make a job finish in 55 minutes instead of 60 minutes while doubling your AWS bill, would you do it? No? Then what exactly is the metric that you should optimize?
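One way to make that trade-off concrete is to fold the cost of waiting into the figure of merit (the node counts, runtimes, prices, and the dollar value placed on an hour are all hypothetical):

```python
HOURLY_RATE = 2.0        # assumed $/node-hour
VALUE_OF_AN_HOUR = 20.0  # assumed $ you would pay to finish one hour sooner

configs = [
    (1, 10.0),   # (nodes, runtime in hours) -- toy scaling curve
    (4, 3.0),
    (16, 1.0),
    (64, 0.9),   # barely faster than 16 nodes, much more expensive
]

def total_cost(nodes, hours):
    # Figure of merit: cloud bill plus the cost of waiting.
    return nodes * hours * HOURLY_RATE + hours * VALUE_OF_AN_HOUR

for nodes, hours in configs:
    print(f"{nodes:3d} nodes: ${total_cost(nodes, hours):7.2f}")

best_nodes, _ = min(configs, key=lambda c: total_cost(*c))
print(f"best by this metric: {best_nodes} nodes")
```

Neither the cheapest configuration (one node) nor the fastest (64 nodes) wins once both terms are on the same scale, which is exactly the point of asking "what metric should you optimize?"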
When We Spark and When We Don’t: Developing Data and ML PipelinesStitch Fix Algorithms
The data platform at Stitch Fix runs thousands of jobs a day to feed data products that provide algorithmic capabilities to power nearly all aspects of the business, from merchandising to operations to styling recommendations. Many of these jobs are distributed across Spark clusters, while many others are scheduled as isolated single-node tasks in containers running Python, R, or Scala. Pipelines are often comprised of a mix of task types and containers.
This talk will cover thoughts and guidelines on how we develop, schedule, and maintain these pipelines at Stitch Fix. We’ll discuss guidelines on how we think about which portions of the pipelines we develop to run on what platforms (e.g. what is important to run distributed across Spark clusters vs run in stand-alone containers) and how we get them to play well together. We’ll also provide an overview of tools and abstractions that have been developed at Stitch Fix to facilitate the process from development, to deployment, to monitoring them in production.
When marketing teams spend money on a paid acquisitions program it is crucial to understand the effect of that ad spend. In this talk, we will outline incrementality as a way to measure the causal impact that ad spend has on acquiring new customers and its advantages over more traditional metrics. We will walk through several ad measurement products available today and give examples of how to apply them to your business.
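As a sketch of the idea (with made-up numbers, not taken from any particular measurement product), incrementality compares an exposed group against a holdout, and the resulting cost per incremental customer can differ sharply from a naive cost per acquisition:

```python
# Illustrative numbers only: equal-sized exposed and holdout groups.
exposed_users, exposed_conversions = 100_000, 2_300
holdout_users, holdout_conversions = 100_000, 2_000

exposed_rate = exposed_conversions / exposed_users
holdout_rate = holdout_conversions / holdout_users

# Causal lift: conversions that would not have happened without the ads.
incremental_rate = exposed_rate - holdout_rate
incremental_customers = incremental_rate * exposed_users

ad_spend = 30_000.0
naive_cpa = ad_spend / exposed_conversions               # ignores the holdout
cost_per_incremental = ad_spend / incremental_customers  # causal view

print(f"incremental customers: {incremental_customers:.0f}")
print(f"naive CPA:             ${naive_cpa:.2f}")
print(f"cost per incremental:  ${cost_per_incremental:.2f}")
```

Most of the exposed converters here would have converted anyway, so the causal cost per customer is far higher than the naive figure suggests.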
Autonomy and ownership are core to working at Stitch Fix, particularly on the Algorithms team. Data scientists are expected to build their systems end to end and maintain them in the long run. We rely on automation, documentation, and collaboration to enable data scientists to build and maintain production services. In this talk I will discuss what we have built and how we communicate about these tools with our data scientists.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation takes real work: it takes vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring of JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
DevOps and Testing slides at DASA ConnectKari Kakkonen
Rik Marselis' and my slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Progression by Regression: How to increase your A/B Test Velocity
1. Progression by Regression:
How to increase your A/B Test Velocity
August 2018
Aaron Bradley
linkedin.com/in/abradle2
Stefan Krawczyk
@stefkrawczyk
linkedin.com/in/skrawczyk
2. Contents
What is Stitch Fix?
Why A/B Test?
Why is A/B Test velocity important?
Formulating an Opinion
Those t-tests
Regression
Regression @ Stitch Fix
In Conclusion
3. Who: we’re data platform engineers working on Stitch Fix’s Expt. Platform
9. At your own leisure
Algorithms Tour:
- https://algorithms-tour.stitchfix.com/
Multithreaded Blog:
- https://multithreaded.stitchfix.com/algorithms/blog/
13. To attempt to infer causality for the purpose of
having confidence in making decisions
Goal of A/B Testing
14. Goal of A/B Testing: Example
http://blog.twn.ee/sites/default/files/inline-images/02.png.pagespeed.ce_.BmWcShEZAM.png
22. How do we formulate an opinion?
23. “Can we reject the null hypothesis?”
Formal Statistical Phrasing
24. “Given the observed data,
how likely could these differences
have occurred by chance?”
In Plain English
25. To name some:
● Chi-squared
● Binomial proportions
● ANOVA
● Regression
● Wald test
● Welch’s t-test
● One sample t-test
● Two sample t-test
● Paired t-test
● Z-test
● Generalized estimating
equations
There are a bunch of statistical tests
Choosing one depends on things
like:
● Type of data, e.g. binomial or
continuous
● Amount of data
● Independence assumptions of
the data
● Outcome that you’re testing
● Whether you’re a statistician...
28. The t-test is the most common method used in A/B testing.
A t-test is a way to compare two means.
It relates to the T-distribution.
General form:
What is a t-test?
29. The t-test is the most common method used in A/B testing.
A t-test is a way to compare two means.
It relates to the T-distribution.
General form:
What is a t-test?
Difference of means
Standard Error:
Contains standard
deviation and sample
size.
Use this value to get a
measure of probability of
seeing this result by chance
using T-distribution
30. The t-test is the most common method used in A/B testing.
A t-test is a way to compare two means.
It relates to the T-distribution.
General form:
What is a t-test?
Difference of means
Standard Error*:
Use this value to get a
measure of probability of
seeing this result by chance
using T-distribution
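The general form above (difference of means over standard error) can be computed directly; here is a sketch of Welch's two-sample t-statistic on simulated data:

```python
import math
import random
import statistics

random.seed(3)
control = [random.gauss(10.0, 2.0) for _ in range(400)]    # simulated cell A
treatment = [random.gauss(10.3, 2.0) for _ in range(400)]  # simulated cell B

# Numerator: difference of means.
mean_diff = statistics.mean(treatment) - statistics.mean(control)

# Denominator: standard error, built from the variances
# and sample sizes of the two cells (Welch's form).
se = math.sqrt(statistics.variance(control) / len(control)
               + statistics.variance(treatment) / len(treatment))

t_stat = mean_diff / se
print(f"difference of means: {mean_diff:.3f}")
print(f"standard error:      {se:.3f}")
print(f"t-statistic:         {t_stat:.2f}")
```

Comparing t_stat against the T-distribution then gives the probability of seeing a difference this large by chance.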
32. There are a few different variations of the t-test.
People most likely use/refer to the two-sample t-test.
A t-test is assumed to be only used for comparing continuous data:
E.g.:
● Height
● Weight
● Time spent on page
● Lifetime value (LTV)
● etc.
Two Sample t-test
33. There are a few different variations of the t-test.
People most likely use/refer to the two-sample t-test.
A t-test is assumed to be only used for comparing continuous data:
E.g.:
● Height
● Weight
● Time spent on page
● Lifetime value (LTV)
● etc.
Two Sample t-test
But using the Central Limit Theorem
you can also use it for:
● Proportions
● Count data
● ...
35. One reason for its widespread use is that it is easy to calculate:
● Just need to be able to sum, divide, square, and square root!
○ You can even do it in SQL … !
There are some assumptions on:
● Independence
● Normally distributed
● Homogeneity of variances*
Two Sample t-test
38. Slowdowns with the t-test
Type I Errors
(False Positives)
vs
Type II Errors
(False Negatives)
α β
39. We need to balance:
Type I Errors (false positives):
“Rejecting the null hypothesis while it is true”
Type II Errors (false negatives):
“Incorrectly retaining the null hypothesis.”
Reasons that slow us down
40. Controlling for Type I Errors == Significance == α
Typically set at 0.05 or 5%
→ so 1 / 20 False Positives
This is where a p-value of 0.05 being significant comes from.
Typically you don’t change this threshold to go faster.
Reasons that slow us down
42. Controlling for Type II Errors == Power == (1 - 𝛃)
“Probability of correctly rejecting the null hypothesis when it is false.”
Standard is 0.8 or 80%
→ 4 / 5 times if there was an effect you’d be able to
detect it.
Power is affected by:
● Effect size.
● Sample size.
● Variation of the data
Reasons that slow us down
} Standard Error
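A rough normal-approximation power calculation makes this concrete: the required sample size per arm scales with the square of the standard deviation, so shrinking the variation shrinks the test (z-values below correspond to two-sided α = 0.05 and power 0.8):

```python
def n_per_arm(effect, sd, z_alpha=1.96, z_beta=0.84):
    # Normal-approximation sample size for a two-sample test of means:
    # n grows with the square of (sd / effect).
    return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

print(f"{n_per_arm(effect=1.0, sd=10.0):.0f} per arm")  # baseline variation
print(f"{n_per_arm(effect=1.0, sd=8.0):.0f} per arm")   # 20% smaller sd
```

Cutting the standard deviation by 20% cuts the required sample size by roughly a third, which is the lever the rest of this deck is about.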
43. Tangent: What is an underpowered expt.?
http://rpsychologist.com/d3/NHST/
44. Tangent: What is an underpowered expt.?
http://rpsychologist.com/d3/NHST/
46. So how can we move faster?
1. Only make bigger changes
53. So how can we move faster?
1. Only make bigger changes
→ Need bigger ideas / more resources.
2. Increase sample size
→ Run longer tests.
3. Reduce variability
→ Detect smaller changes / run shorter tests!
→ Reduce the standard deviation term!
But you can’t do this with a two sample t-test!
55. How regression does and doesn’t help
Regression enables:
● Increasing power with
covariates
● Increased test velocity
● Bias correction*
● Handling of more complex
correlation structure*
Regression does not:
● Allow you to skip your power
analysis (you are running power
analyses, right? I’m sure you are)
● Allow you to run
underpowered experiments
● Remove the need for good
experimental design
● Solve peeking or multiple
comparisons concerns*
● Automatically enable
sequential testing*
● Adjust for winner’s curse*
*Not covered in this talk
57. People often think of regression for prediction, t-tests for
inference.
But t-tests are a special case of linear regression.
You can use regression in place of t-tests, and doing so opens the door to a new lever: efficiency.
What to get out of this section
58. You can use regression instead of t-tests. But why?
Using regression for hypothesis testing
within condition variability
between condition variability
59. You can use regression instead of t-tests. But why?
Using regression for hypothesis testing
Cell A Cell B
within condition variability
between condition variability
62. Using regression for hypothesis testing
β
^
H0: β = 0
Ha: β ≠ 0
within condition variability
between condition variability
You can use linear regression instead of t-tests. But why?
Regression gives us a lever to
decrease variance without
increasing n by modeling out
some within-condition variability
63. Using regression for hypothesis testing
shrinking within-condition
variability
same between-condition
variability
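One concrete way to shrink within-condition variability is the CUPED-style adjustment from the paper linked later in this deck: regress the outcome on a pre-experiment covariate and subtract the explained part (a minimal sketch with simulated data; variable names are illustrative):

```python
import random
import statistics

random.seed(4)
n = 2000
# Pre-experiment covariate (e.g. last shipment's order value) and an
# outcome that is strongly correlated with it.
pre = [random.gauss(100.0, 20.0) for _ in range(n)]
y = [0.5 * p + random.gauss(0.0, 5.0) for p in pre]

mean_pre = statistics.mean(pre)
mean_y = statistics.mean(y)

# theta = cov(pre, y) / var(pre): the regression slope of y on pre.
cov = sum((p - mean_pre) * (yi - mean_y) for p, yi in zip(pre, y)) / (n - 1)
theta = cov / statistics.variance(pre)

# Adjusted outcome: same mean, much smaller variance.
y_adj = [yi - theta * (p - mean_pre) for yi, p in zip(y, pre)]

print(f"var(y):     {statistics.variance(y):7.1f}")
print(f"var(y_adj): {statistics.variance(y_adj):7.1f}")
```

Because the adjustment uses only pre-experiment data, it leaves the treatment contrast unbiased while shrinking the standard error.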
64. Example - Client Email Campaign
Control Variant
Are users who receive the new variant of a marketing email more likely
have an increased Average Order Value (AOV) on their next shipment?
65. Example - Client Email Campaign
Control Variant
What explains a higher order value for a client?
Between condition variability
● The treatment (hopefully!)
Within condition variability
● How long they’ve been a client
● A client’s order value on their last shipment
● Delay between when they received the
email and when they opened it
67. Example - Client Email Campaign
Control Variant
Between condition variability
● The treatment (hopefully!)
Within condition variability
● How long they’ve been a client
● A client’s order value on their last shipment
● Delay between when they received the
email and when they opened it
aov ~ 1 + cell_id + client_tenure + ov_previous_shipment
https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf
68. Getting increased power by controlling for covariates requires you to find covariates
which decrease within-condition variability
● Make sure they aren’t correlated with the treatment
○ Rule of thumb: only use pre-experiment data
● Best covariates are highly correlated with your outcome variable
○ Often the pre-experiment value of your outcome is the best one
● Visitor / conversion experiments: let us know what you find!
Covariates: what to use
https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf
70. Regression: How we do it
● Model computed on-the-fly in metrics-service
● Simple Python app fetching data from Presto
● statsmodels / patsy for regression
● BYOD for more complex models (bootstrapping, hierarchical mixed models, GEE, etc.)
[Architecture diagram: Metrics Service reading from the Data Warehouse via Presto, populated by nightly ETLs]
71. Regression: Things we’ve tried
● R vs Spark vs Python
● Data size: big vs small
● Nightly ETL vs Online
● Slice & Dice vs Preset Filters
72. Regression: How we do it
● Metrics defined in a YAML file
● Model is specified via type, family, link, label column (response), and covariates
● SQL query provides the necessary columns from the underlying experiments tables
order_value ~ 1 + cell_id + tenure
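A hypothetical sketch of what such a YAML metric definition might look like — the field names and table are illustrative, not the actual Stitch Fix schema:

```yaml
# Illustrative metric definition; field names are hypothetical.
order_value:
  model:
    type: glm
    family: gaussian
    link: identity
    label: order_value            # response column
    covariates: [cell_id, tenure]
  query: |
    SELECT cell_id, order_value, tenure
    FROM experiment_shipments
```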
73. Linear Regression: Example Code (python: statsmodels + patsy)
import pandas as pd
import statsmodels.formula.api as smf

# fetch data somehow; the returned data frame has columns cell_id, order_value
df = get_data()

# make cell_id categorical (the categories kwarg to .astype was removed in
# modern pandas; use CategoricalDtype instead)
df['cell_id'] = df['cell_id'].astype(pd.CategoricalDtype(categories=[1, 2]))

# intercept term is implicit in the following formula
model = smf.ols(formula='order_value ~ cell_id', data=df)
model_fit = model.fit()
print(model_fit.summary())

control_cell_estimate = model_fit.params['Intercept']
treatment_cell_estimate = model_fit.params['Intercept'] + model_fit.params['cell_id[T.2]']
p = model_fit.pvalues['cell_id[T.2]']
Gotchas
● cell_id must be categorical - needs to be dummy encoded
● Continuous covariates: mean-center
● Discrete covariates: think about proper contrast coding
● Be careful about 1- vs 2-sided hypotheses
● Think about correlations between your randomization units
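The mean-centering advice can be sketched in pandas (the data and column names here are illustrative): centering a continuous covariate makes the intercept interpretable as the control-cell mean at the *average* covariate value rather than at zero.

```python
import pandas as pd

# Hypothetical frame; 'tenure' is a continuous pre-experiment covariate.
df = pd.DataFrame({
    "cell_id": pd.Categorical([1, 2, 1, 2], categories=[1, 2]),
    "tenure": [12.0, 30.0, 6.0, 24.0],
})

# Mean-center so the model intercept corresponds to a client of average
# tenure instead of a (nonexistent) client with tenure == 0.
df["tenure_c"] = df["tenure"] - df["tenure"].mean()
print(df["tenure_c"].tolist())  # centered values sum to zero
```

patsy formulas can also do this inline via its built-in `center()` transform, e.g. `order_value ~ cell_id + center(tenure)`.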
79. Conclusion
● You can use regression in place of a t-test today!
● Regression gives you the tools to better control variance.
● Moar Power!
● With increased power you can conclude more tests faster.
● Or, you can measure smaller changes better.