[PositConf 2023] How Data Scientists Broke A/B Testing (and How We Can Fix It)

Carl Vogel, Data Scientist
How Data Scientists Broke A/B Testing (and how we can fix it)
Questions? pos.it/slido-A
A Completely
True Story
Launch on
Neutral
(But thanks anyway)
Existential Dread
(Get used to it)
A real PM
“If it’s something we really believe
in, I’ll launch on a flat result … if
it’s part of a broader strategy.”
“My features are hard as shit to build,
but easy to tweak, so I’m not always
worried about statistical significance.”
Another real PM
Not just NHST
Features aren’t IID
Path dependencies in
feature roadmaps
We develop experiences by
building up features over
time and it’s helpful to
launch them incrementally
MDE is basically zero
Feature costs are nearly all
sunk before the test
Any lift pays off
Not just NHST
Risk is mismeasured
Decision makers don’t
think about Type I and II
error rates, per se
They just want to make
more money than they lose
Can I make good decisions about small to moderate effects quickly?
You can’t make reliable inferences about small to moderate effects quickly.
Did they misuse the tool?
Or did we hand them the wrong one?
Non-Inferiority
Designs
Non-inferiority designs
Let’s try not to wreck the place
[Figure: Superiority vs. Non-Inferiority]
Non-inferiority designs
Let’s try not to wreck the place
• Inferiority margins (Δ) prompt us to ask:
• How much do we believe in this feature?
• How quickly will we improve on it?
• Stakeholders can give meaningful answers to these questions
• Compare to MDE/minimal lift, which is often made up
• Avoid meaningless minimum effect estimates
• Can power against a “no effect” alternative
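A minimal sketch of what a non-inferiority test with margin Δ could look like for conversion rates. The slides do not say which test statistic to use; the unpooled one-sided z-test below, the function names, and the example counts are all assumptions for illustration. The second function sizes the test against a "no effect" alternative (true difference of zero).

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def non_inferiority_test(conv_a, n_a, conv_b, n_b, margin, alpha=0.05):
    """One-sided z-test of H0: p_b - p_a <= -margin vs. H1: p_b - p_a > -margin.

    Assumption: unpooled (Wald) standard error for a difference in proportions.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = (diff + margin) / se  # distance of the estimate above the margin
    p_value = 1 - STD_NORMAL.cdf(z)
    return {"diff": diff, "z": z, "p_value": p_value, "non_inferior": p_value < alpha}


def sample_size_per_arm(base_rate, margin, alpha=0.05, power=0.80):
    """Users per arm needed to show non-inferiority if the true effect is zero."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = 2 * base_rate * (1 - base_rate)
    return ceil((z_alpha + z_beta) ** 2 * variance / margin ** 2)


# Hypothetical flat result: both variants convert at 10%.
flat_result = non_inferiority_test(1000, 10_000, 1000, 10_000, margin=0.01)
# The same data under a superiority framing (margin of zero).
superiority_result = non_inferiority_test(1000, 10_000, 1000, 10_000, margin=0.0)
n_needed = sample_size_per_arm(base_rate=0.10, margin=0.01)

print(flat_result["non_inferior"], superiority_result["non_inferior"], n_needed)
```

With a margin the stakeholders actually agreed to, a flat result can support a launch decision, while the same data fails a superiority test.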
What’s
the rush?
The costs of long experiments
Time is money, folks
• Opportunity cost of time:
• Experimental features live on a roadmap; waiting for launch decisions
delays development of subsequent features
• Opportunity cost of sampling:
• As long as the experiment runs, many users aren’t getting the best
variant
• Maintenance costs:
• More experiments running means more complexity in the codebase,
more effort, etc.
Value of
Information
Designs
When is data worth it?
Good things are worth waiting for
• Waiting is costly, but data is valuable.
• We should keep going as long as the value
of more data exceeds the cost of more time
• Quantify our impatience as part of test
design
Expected Value vs. Cost of Data
[Figure: Exp. Value, Cost, and Net Exp. Value plotted against Test Length (0 to 60); vertical axis in dollars, $0 to $80,000]
Why is data valuable?
How dumb am I, in dollars?
• Before we have data, our range of potential lifts is wide
• Our best guess could be way off; we could make a big
mistake
• Observing data narrows the range; even if our new guess is
wrong, it won’t be wrong by as much.
• If the value of being less wrong (in expectation) exceeds the
cost of waiting for the data, LFG!
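One way to put a dollar figure on "being less wrong" is sketched below. The slides do not give a model, so everything here is an assumption for illustration: beliefs about the lift are normal, the estimate from the test is normally distributed around the true lift, the payoff of launching is linear in the lift (and zero if we keep the control), and all names and numbers are hypothetical. Under those assumptions the expected value of sample information has a closed form.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def evsi(prior_mean, prior_sd, sampling_sd, value_per_unit_lift):
    """Expected value of sample information for a launch / don't-launch decision.

    Assumptions: lift ~ Normal(prior_mean, prior_sd); the test estimate is
    Normal(lift, sampling_sd); launching pays value_per_unit_lift * lift.
    """
    # Std. dev. of where the posterior mean will land once the data arrives.
    pre_sd = prior_sd ** 2 / sqrt(prior_sd ** 2 + sampling_sd ** 2)
    z = prior_mean / pre_sd
    # E[max(0, posterior mean)]: we launch only when the updated guess is positive.
    value_with_data = prior_mean * STD_NORMAL.cdf(z) + pre_sd * STD_NORMAL.pdf(z)
    # Without data we act on the prior guess alone.
    value_without_data = max(0.0, prior_mean)
    return value_per_unit_lift * (value_with_data - value_without_data)


def net_value_by_day(prior_mean, prior_sd, daily_sd, value_per_unit_lift,
                     cost_per_day, max_days):
    """Rows of (days, expected value of data, cost of waiting, net value)."""
    rows = []
    for days in range(1, max_days + 1):
        sampling_sd = daily_sd / sqrt(days)  # the estimate tightens as days accrue
        gross = evsi(prior_mean, prior_sd, sampling_sd, value_per_unit_lift)
        cost = cost_per_day * days
        rows.append((days, gross, cost, gross - cost))
    return rows


# Hypothetical inputs: no prior lean, 2-point prior spread, $300/day to wait.
rows = net_value_by_day(prior_mean=0.0, prior_sd=0.02, daily_sd=0.03,
                        value_per_unit_lift=1_000_000, cost_per_day=300,
                        max_days=60)
best_days, best_gross, best_cost, best_net = max(rows, key=lambda r: r[3])
print(f"Best test length: {best_days} days, net expected value ${best_net:,.0f}")
```

The value of data rises quickly and then flattens, while the cost of waiting keeps growing, so the net value peaks at a finite test length.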
Expected Value of Sample Information
[Figure: build sequence illustrating the expected value of sample information; dollar labels $0, $10K, $200K]
Sequential testing decisions
Don’t stop ’til you get enough
• We can do this again after collecting some data
• This changes the core decision from: “is B > A?” to “should I stop or
continue testing?”
• Good fit for A/B tests, where we collect data passively just by
waiting
• Once more data isn’t worth it, launch the best observed variant;
the inference problem is irrelevant (Claxton ’96)
• This is our best information, and it’s not worth getting more
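The stop-or-continue loop could be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: normal beliefs about the lift, a normal estimate from each weekly batch, a payoff linear in the lift, and hypothetical names, weekly lifts, and dollar figures. At each look we update our beliefs, price one more week of data, and stop as soon as that week costs more than it is expected to be worth.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def value_of_next_batch(mean, sd, batch_sd, value_per_unit_lift):
    """Expected gain (in dollars) from observing one more batch before deciding."""
    pre_sd = sd ** 2 / sqrt(sd ** 2 + batch_sd ** 2)
    z = mean / pre_sd
    with_data = mean * STD_NORMAL.cdf(z) + pre_sd * STD_NORMAL.pdf(z)
    return value_per_unit_lift * (with_data - max(0.0, mean))


def update(mean, sd, observed_lift, batch_sd):
    """Normal-normal update of beliefs about the lift after one batch."""
    prior_precision = 1 / sd ** 2
    data_precision = 1 / batch_sd ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * mean + data_precision * observed_lift)
    return post_mean, sqrt(post_var)


def stop_or_continue(mean, sd, batch_sd, value_per_unit_lift, batch_cost):
    """Continue only while another batch is worth more than it costs."""
    if value_of_next_batch(mean, sd, batch_sd, value_per_unit_lift) > batch_cost:
        return "continue"
    # Stopping: take the variant that currently looks best, no significance test.
    return "launch B" if mean > 0 else "keep A"


# Hypothetical experiment: weekly looks, $500 per week of waiting.
WEEKLY_SD, VALUE_PER_UNIT_LIFT, WEEKLY_COST = 0.01, 1_000_000, 500
mean, sd = 0.0, 0.02                       # prior beliefs about the lift
weekly_lifts = [0.004, 0.006, 0.005, 0.003]  # made-up observed lifts

weeks_run = 0
decision = stop_or_continue(mean, sd, WEEKLY_SD, VALUE_PER_UNIT_LIFT, WEEKLY_COST)
for lift in weekly_lifts:
    if decision != "continue":
        break
    mean, sd = update(mean, sd, lift, WEEKLY_SD)
    weeks_run += 1
    decision = stop_or_continue(mean, sd, WEEKLY_SD, VALUE_PER_UNIT_LIFT, WEEKLY_COST)

print(f"After {weeks_run} week(s): {decision} (estimated lift {mean:.4f})")
```

The question asked at each look is whether more data is worth buying, not whether B beat A at some significance level.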
Lessons
What’s the Problem?
Going back to basics
There’s no silver bullet
You may have other problems; you’ll need
other solutions
Misuse of tools should prompt us to
rethink the problem
What are we actually trying to solve?
What are the costs, benefits, and risks?
What’s the Problem?
Going back to basics
Are we solving the problem, or treating
symptoms?
Launch-on-neutral, run-til-significant, peeking,
etc. are symptoms, not the root problem
Lots of advanced techniques speed up tests, but
don’t actually address reasons for impatience
Here, there, and everywhere
You’re soaking in it
This isn’t just about A/B testing
But it’s a domain where we have very
familiar tools close at hand
What are we here for?
People who solve problems for people are the luckiest people in the world
This is the fun stuff
This is where we add value as data
scientists
These problems aren’t solved
Try new stuff!
Carl Vogel
Principal Data Scientist
carl.vogel@babylist.com
Thanks!