Emily Robinson
@robinson_es
10 Guidelines for A/B
Testing
About Me
➔ R User (😱)
➔ Background in the social sciences
➔ Formerly at Etsy
➔ Data Scientist at DataCamp
What is A/B Testing?
A/B testing is everywhere
My perspective
Millions of
visitors daily
Data
engineering
pipeline set-up
Generating numbers is easy;
generating numbers you
should trust is hard!
Source: Trustworthy online controlled experiments: five puzzling outcomes explained
Guidelines
Disclaimer
This is Bowen Bobo
He is our fictional PM for the day
Situation Problem
Bobo: Well, we’re hoping this test
will increase registrations, search
clicks, and course starts
The test increased registrations
by 5%, but decreased course
starts by 3%
1. Have one key metric per experiment
➔ Clarifies decision-making
➔ Can have additional “guardrail”
metrics that you don’t want to
negatively impact
Situation Problem
Bobo: I have 100 test ideas. How
long is each going to take to run?
And which ones should we choose?
Ideas are cheap; prioritizing
them is difficult
2. Use your key metric to do a power calculation
➔ 80% power = if there’s an effect of
this size, 80% chance you detect it
➔ 10,000 daily visitors, 10% conversion
rate, how many days to detect a 5%
increase?
➔ https://bookingcom.github.io/powercalculator/
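The question on this slide can be worked through with a short calculation. The sketch below is a rough normal-approximation estimate, not the method used by the linked calculator. It assumes the 5% increase is relative (10% to 10.5%), a two-sided test at alpha = 0.05, and that all 10,000 daily visitors are split evenly across two arms. The function name `sample_size_per_arm` is made up for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # assumption: the lift is relative, not absolute
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)  # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n_per_arm = sample_size_per_arm(baseline=0.10, relative_lift=0.05)
# Assumption: every daily visitor enters the experiment, half in each arm.
days_needed = math.ceil(2 * n_per_arm / 10_000)
print(f"users per arm: {n_per_arm}, days to run: {days_needed}")
```

Under these assumptions the test needs somewhat under 60,000 users per arm, which is on the order of two weeks of traffic. Halving the detectable lift roughly quadruples the required sample, which is why the key metric and its expected effect size matter when prioritizing ideas.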
Situation Problem
Bobo: I checked the experiment
today and we significantly
increased conversion rate! Quick,
stop the test!
Source: http://varianceexplained.org/r/bayesian-ab-testing/, David Robinson
3. Run your experiment for the length you’ve planned on
➔ Stick to what you arrived at with
your power analysis
➔ Advanced: Always Valid Inference
and sequential testing
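A small simulation can show why stopping at the first significant peek is a problem. The sketch below simulates A/A tests (no true difference) and compares two rules: declare a win the first time any daily check crosses the 5% threshold, versus look once at the planned end. As a simplifying assumption, each day contributes an independent standard-normal increment to the test statistic; the function name and the 14-day length are illustrative.

```python
import math
import random


def false_positive_rates(n_sims=2000, n_days=14, z_crit=1.96, seed=42):
    """Simulate A/A tests and return (peeking rate, fixed-horizon rate).

    The z-statistic after k days is modeled as
    (sum of k standard-normal increments) / sqrt(k).
    """
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        z = 0.0
        significant_at_any_peek = False
        for day in range(1, n_days + 1):
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / math.sqrt(day)
            if abs(z) > z_crit:
                significant_at_any_peek = True  # a daily peek would have stopped the test here
        peeking_hits += significant_at_any_peek
        fixed_hits += abs(z) > z_crit  # only the final, planned look counts
    return peeking_hits / n_sims, fixed_hits / n_sims


peeking_rate, fixed_rate = false_positive_rates()
print(f"stop at first significant peek: {peeking_rate:.2f}")
print(f"analyze once at planned end:    {fixed_rate:.2f}")
```

The fixed-horizon rate should land near the nominal 5%, while the peeking rate comes out well above it even though nothing changed between the groups.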
Situation Problem
Bobo: I know the test didn’t work
overall, but when I look at Canadian
users on mobile we increased
conversion by 10%!
This is multiple hypothesis
testing and will increase your
false positive rate.
4. Don’t look for differences in every possible segment
➔ Pre-specify hypotheses
➔ Run separate tests
➔ Can use methods to adjust for
multiple hypothesis testing
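To make the last bullet concrete, here is a minimal sketch of one common adjustment, the Holm-Bonferroni procedure, along with the chance of at least one false positive when many segments are checked. The slides do not name a specific method, so this choice is an assumption, and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p-value up
    rejected = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return rejected


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


segment_p_values = [0.001, 0.04, 0.03, 0.20]  # hypothetical per-segment p-values
print(holm_bonferroni(segment_p_values))
print(f"{familywise_error_rate(20):.2f}")
```

With 20 unadjusted segment comparisons and no real effect anywhere, the chance of at least one "significant" segment is well over half, which is how a result like the Canadian mobile users can appear by chance.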
Situation Problem
Bobo: The experiment was a big
success! The split was 50.5/49.5
instead of 50/50 as planned, but
that’s so small it doesn’t matter,
right?
If you have 200k people in your
experiment, a 50.5/49.5 split has p <
.0001. You have bucketing skew
or sample-ratio mismatch.
5. Make sure your experiment groups are balanced
➔ Use a proportion test to check
your split
➔ If unbalanced, do not use the
results
➔ Bad news: difficult to debug.
Check segments
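The proportion test for the split can be sketched as follows, using a normal approximation to the binomial. The function name is illustrative, and the second call is a hypothetical near-balanced split added for contrast.

```python
import math
from statistics import NormalDist


def sample_ratio_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Two-sided p-value for whether the observed split matches the planned one."""
    total = n_control + n_treatment
    expected_control = total * expected_control_share
    std_error = math.sqrt(total * expected_control_share * (1 - expected_control_share))
    z = (n_control - expected_control) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


p_skewed = sample_ratio_p_value(101_000, 99_000)    # the 50.5/49.5 split from the slide
p_balanced = sample_ratio_p_value(100_100, 99_900)  # hypothetical near-even split
print(f"50.5/49.5 split:   p = {p_skewed:.6f}")
print(f"50.05/49.95 split: p = {p_balanced:.3f}")
```

For 200k users, the 50.5/49.5 split gives a p-value below .0001, matching the figure above, while the near-even split is unremarkable.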
Situation Problem
Bobo: I read this article about how
multi-armed bandits are much
better than traditional A/B tests.
Why don’t we use that?
Without a full understanding of the
assumptions behind those methods
6. Don’t overcomplicate your methods
➔ Get the basics right
➔ Designing tests right > super
sophisticated methods
Situation Problem
Bobo: Well, nothing went up, but
nothing went down either, so let’s
just launch it!
May be a negative effect too
small to detect. Adds technical
upkeep.
7. Be careful of launching things because they “don’t hurt”
➔ Decide whether to “launch on
neutral” beforehand
➔ Non-inferiority testing
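A non-inferiority test flips the usual question: instead of asking "is the treatment different?", it asks "can we rule out that the treatment is worse by more than a margin we chose beforehand?". The sketch below is one simple version, a one-sided z-test on the difference in conversion rates. The margin, the counts, and the function name are all hypothetical.

```python
import math
from statistics import NormalDist


def non_inferiority_p_value(conv_control, n_control, conv_treatment, n_treatment, margin):
    """One-sided p-value for H0: treatment rate <= control rate - margin.

    A small p-value supports the claim that the treatment is not worse than
    control by more than `margin` (an absolute difference in conversion rate).
    """
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    std_error = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    z = (p_t - p_c + margin) / std_error
    return 1 - NormalDist().cdf(z)


# Hypothetical counts: the same observed rates (10.0% vs 9.9%) at two sample sizes,
# with a pre-agreed margin of one percentage point.
p_large = non_inferiority_p_value(1_000, 10_000, 990, 10_000, margin=0.01)
p_small = non_inferiority_p_value(100, 1_000, 99, 1_000, margin=0.01)
print(f"10,000 per arm: p = {p_large:.3f}")
print(f" 1,000 per arm: p = {p_small:.3f}")
```

The same "flat" result supports launching only in the larger experiment. In the smaller one, a drop bigger than the margin cannot be ruled out, which is the situation "nothing went down" hides.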
Situation Problem
Bobo: Hey, we just finished this
experiment. Can you analyze it for
us?
“To consult the statistician after an
experiment is finished is often merely to
ask [them] to conduct a post mortem
examination. [They] can perhaps say what
the experiment died of.”
- Ronald Fisher
8. Have a data scientist/analyst involved in the whole process
➔ Helps decide whether it should
be an experiment at all
➔ Make sure you can measure
what you want
➔ Can surface problems along the
way
Situation Problem
Bobo: Hey, we accidentally added
everyone to the experiment. Can we
still use our dashboards to monitor it?
Non-impacted people add noise,
decreasing power
9. Only include people in your analysis who could have been affected
➔ Start tracking people after the
user sees the change
➔ Can be tricky – e.g. changing
threshold for free shipping offer
from $25 to $35
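The cost of including non-impacted people can be sketched with a rough power calculation. All numbers here are hypothetical: 100,000 users per arm are bucketed, only 20% ever see the change, everyone converts at 10% at baseline, and the change lifts exposed users to 11%. The power formula is a normal approximation that ignores the negligible opposite tail.

```python
import math
from statistics import NormalDist


def approx_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (main tail only)."""
    std_error = math.sqrt(
        (p_control * (1 - p_control) + p_treatment * (1 - p_treatment)) / n_per_arm
    )
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_treatment - p_control) / std_error - z_crit)


bucketed_per_arm = 100_000     # hypothetical: everyone was added to the experiment
exposed_share = 0.20           # hypothetical: only 20% could see the change
baseline, lifted = 0.10, 0.11  # hypothetical conversion rates for exposed users

# Analysis restricted to users who could have been affected.
power_exposed_only = approx_power(baseline, lifted, round(bucketed_per_arm * exposed_share))

# Analysis on everyone: the treatment-arm rate is diluted by unaffected users.
diluted_treatment = baseline + exposed_share * (lifted - baseline)
power_everyone = approx_power(baseline, diluted_treatment, bucketed_per_arm)

print(f"exposed users only: power = {power_exposed_only:.2f}")
print(f"all bucketed users: power = {power_everyone:.2f}")
```

Even though the second analysis has five times as many users, its power is far lower, because the extra users only contribute noise.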
Situation Problem
Bobo: We spent 6 months
redesigning this page, made 50
changes to make it awesome, but the
A/B test shows it did worse. Why?
Time was wasted, and with so many
changes it is hard or impossible to tell
what the problem was
10. Focus on smaller, incremental tests
➔ Work in small design-develop-
measure cycles
➔ Test assumptions
Conclusion
Recap
1. Have one key metric per experiment
2. Use your key metric to do a power
calculation
3. Run your experiment for the length you’ve
planned on
4. Don’t look for differences in every possible
segment
5. Make sure your experiment groups are
balanced
6. Don’t overcomplicate your methods
7. Be careful of launching things because
they don’t hurt
8. Have a data scientist/analyst involved in
the whole process
9. Only include people in your analysis who
could have been affected
10. Focus on smaller, incremental tests
Research papers
➔ Controlled experiments on the web: survey and practical guide (2008)
➔ Seven rules of thumb for web site experiments (2014)
➔ A dirty dozen: twelve common metric interpretation pitfalls in online
controlled experiments (2017)
➔ Democratizing online controlled experiments at Booking.com (2017)
Blog posts and presentations
➔ Design for Continuous Experimentation by Dan McKinley
➔ Scaling Airbnb’s Experimentation Platform by Jonathan Parks
➔ Please, please don’t A/B test that by Tal Raviv
➔ How Etsy handles peeking in A/B Testing by Callie McRee and
Kelly Shen
Thank you!
hookedondata.org
@robinson_es
bit.ly/guidelinesab

10 Guidelines for A/B Testing

  • 2. About Me ➔ R User (😱) ➔ Background in the social sciences ➔ Formerly at Etsy ➔ Data Scientist at DataCamp
  • 3. What is A/B Testing?
  • 4. A/B testing is everywhere
  • 5. My perspective Millions of visitors daily Data engineering pipeline set-up
  • 6. Generating numbers is easy; generating numbers you should trust is hard! Source: Trustworthy online controlled experiments: five puzzling outcomes explained
  • 9. This is Bowen This is Bowen Bobo He is our fictional PM for the day
  • 10. Situation Problem Bobo: Well, we’re hoping this test will increase registrations, search clicks, and course starts The test increased registrations by 5%, but decreased course starts by 3%
  • 11. 1. Have one key metric per experiment ➔ Clarifies decision-making ➔ Can have additional “guardrail” metrics that you don’t want to negatively impact
  • 12. Situation Problem Bobo : I have 100 test ideas. How long is each going to take to run? And which ones should we choose? Ideas are cheap; prioritizing them is difficult
  • 13. 2. Use your key metric to do a power calculation ➔ 80% power = if there’s an effect of this size, 80% chance you detect it ➔ 10,000 daily visitors, 10% conversion rate, how many days to detect a 5% increase? ➔ https://bookingcom.github.io/power calculator/
  • 14. Situation Problem Bobo : I checked the experiment today and we significantly increased conversion rate! Quick, stop the test! Source: http://varianceexplained.org/r/bayesian-ab-testing/, David Robinson
  • 15. 3. Run your experiment the length you’ve planned on ➔ Stick to what you arrived at with your power analysis ➔ Advanced: Always Valid Inference and sequential testing
  • 16. Situation Problem Bobo : I know the test didn’t work overall, but when I look at Canadian users on mobile we increased conversion by 10%! This is multiple hypothesis testing and will increase your false positive rate.
  • 17. 4. Don’t look for differences in every possible segment ➔ Pre-specify hypotheses ➔ Run separate tests ➔ Can use methods to adjust for multiple hypothesis testing
  • 18. Situation Problem Bobo : The experiment was a big success! The split was 50.5/49.5 instead of 50/50 as planned, but that’s so small it doesn’t matter, right? If you have 200k people in your experiment, a 50.5/49.5 split has p < .0001. You have bucketing skew, also called sample-ratio mismatch.
  • 19. 5. Make sure your experiment is balanced ➔ Use a proportion test to check your split ➔ If unbalanced, do not use the results ➔ Bad news: difficult to debug. Check segments
  • 20. Situation Problem Bobo : I read this article about how much better multi-armed bandits are than traditional A/B tests. Why don’t we use that? Not a full understanding of the assumptions of those methods
  • 21. 6. Don’t overcomplicate your methods ➔ Get the basics right ➔ Designing tests right > super sophisticated methods
  • 22. Situation Problem Bobo: Well, nothing went up, but nothing went down either, so let’s just launch it! May be a negative effect too small to detect. Adds technical upkeep.
  • 23. 7. Be careful of launching things because they “don’t hurt” ➔ Decide whether to “launch on neutral” beforehand ➔ Non-inferiority testing
  • 24. Situation Problem Bobo: Hey, we just finished this experiment. Can you analyze it for us? “To consult the statistician after an experiment is finished is often merely to ask [them] to conduct a post mortem examination. [They] can perhaps say what the experiment died of.” - Ronald Fisher
  • 25. 8. Have a data scientist/analyst involved in the whole process ➔ Helps decide whether it should be an experiment at all ➔ Make sure you can measure what you want ➔ Can surface problems along the way
  • 26. Situation Problem Bobo: Hey, we accidentally added everyone to the experiment. Can we still use our dashboards to monitor it? Non-impacted people add noise, decreasing power
  • 27. 9. Only include people in your analysis who could have been affected ➔ Start tracking people after the user sees the change ➔ Can be tricky – e.g. changing threshold for free shipping offer from $25 to $35
  • 28. Situation Problem Bobo: We spent 6 months redesigning this page, made 50 changes to make it awesome, but the A/B test shows it did worse. Why? Time was wasted, and with many changes hard or impossible to tell what was the problem
  • 29. 10. Focus on smaller, incremental tests ➔ Work in small design-develop-measure cycles ➔ Test assumptions
  • 31. Recap 1. Have one key metric per experiment 2. Use your key metric to do a power calculation 3. Run your experiment for the length you’ve planned on 4. Don’t look for differences in every possible segment 5. Make sure your experiment groups are balanced 6. Don’t overcomplicate your methods 7. Be careful of launching things because they don’t hurt 8. Have a data scientist/analyst involved in the whole process 9. Only include people in your analysis who could have been affected 10. Focus on smaller, incremental tests
  • 32. Research papers ➔ Controlled experiments on the web: survey and practical guide (2008) ➔ Seven rules of thumb for web site experiments (2014) ➔ A dirty dozen: twelve common metric interpretation pitfalls in online controlled experiments (2017) ➔ Democratizing online controlled experiments at Booking.com (2017)
  • 33. Blog posts and presentations ➔ Design for Continuous Experimentation by Dan McKinley ➔ Scaling Airbnb’s Experimentation Platform by Jonathan Parks ➔ Please, please don’t A/B test that by Tal Raviv ➔ How Etsy handles peeking in A/B Testing by Callie McRee and Kelly Shen
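
The power-calculation question in guideline 2 (10,000 daily visitors, 10% conversion rate, 5% increase) can be sketched in Python. This is a rough illustration, not the method behind the Booking.com calculator: it uses the standard normal-approximation formula for comparing two proportions, and it assumes the 5% increase is relative (10% to 10.5%), a two-sided test at alpha = 0.05, a 50/50 split, and that every visitor enters the experiment. The function names are made up for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect a relative lift.

    Uses the normal-approximation formula for two proportions.
    All defaults (alpha, power) are assumptions, not values from the slides
    beyond the 80% power figure.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(n_per_group, daily_visitors, n_groups=2):
    """Days needed if every daily visitor is enrolled and split evenly."""
    return ceil(n_per_group * n_groups / daily_visitors)


n = sample_size_per_group(baseline=0.10, relative_lift=0.05)
days = days_to_run(n, daily_visitors=10_000)
print(f"Users per group: {n}, days to run: {days}")
```

Under these assumptions the answer comes out at a little under 58,000 users per group, or about 12 days of traffic. Halving the detectable lift roughly quadruples the required sample, which is why the same calculation is useful for prioritizing test ideas.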
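
The balance check in guideline 5 can be sketched as a simple proportion test. This is a minimal version, assuming a two-sided z-test against the planned 50% share; the counts 101,000 and 99,000 are just the 50.5/49.5 split of 200k users from the slide, and `split_p_value` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def split_p_value(n_control, n_treatment, expected_share=0.5):
    """Two-sided p-value for the observed control share vs. the planned share."""
    n = n_control + n_treatment
    observed_share = n_control / n
    # Standard error of a proportion under the null (planned) split
    se = sqrt(expected_share * (1 - expected_share) / n)
    z = (observed_share - expected_share) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# 200k users split 50.5/49.5 instead of the planned 50/50
p = split_p_value(101_000, 99_000)
print(f"p-value for the observed split: {p:.2e}")
```

The p-value here is far below .0001, in line with the slide: at this sample size a one-point imbalance is very unlikely to be chance, so the bucketing should be investigated before the results are trusted.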
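
Guideline 4 mentions methods to adjust for multiple hypothesis testing without naming one. As one possible choice (an assumption, not necessarily what the talk had in mind), the sketch below applies the Holm step-down correction to a set of per-segment p-values. The p-values are invented for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from four pre-specified segments
segment_p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = holm_adjust(segment_p_values)
for raw, adj in zip(segment_p_values, adjusted):
    print(f"raw p = {raw:.2f}, Holm-adjusted p = {adj:.2f}")
```

With these made-up inputs, only the first segment stays below 0.05 after adjustment, while the segments with raw p-values of 0.03 and 0.04 no longer look significant. The more segments are examined, the stronger the correction becomes, which is the cost of looking everywhere.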

Editor's Notes

  1. We’re going to talk about 10 situations you may encounter in your A/B testing journey. I want to make it clear this is not based on any particular person.
  2. And with that, here is Bobo. He’s our fictional PM for the day. Any resemblance in name or picture to a product manager I’ve worked with previously is purely coincidental.
  3. I’d come in, I’d get my team