VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
AB Tests
What’s an AB Test
Version “A” Version “B”
50% 50%
SUMMARY
- Best Practices and State of the Art
- Custom Engine
- Lessons Learned
Best Practices and State of the Art
Distributing Users in Variations
Example
Take first letter of your name
- A-M: variation A (e.g., Anna)
- N-Z: variation B (e.g., Zoey)
Rule
- Equally likely to see each variant (if 50/50): no bias towards a variation
Distributing Users in Variations
Example
Display a “random” variation when a user comes to the website
Rule
- Repeat assignments must be consistent
Distributing Users in Variations
Example
Modulo 2 on memberId
Rule
- No correlation between experiments
            Variation A   Variation B
Test 1      Member 1      Member 2
Test 2      Member 1      Member 2
………..       Member 1      Member 2
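The modulo 2 example can be sketched in a few lines. This is a minimal illustration, assuming integer member ids; the helper name `assign_mod2` and the experiment labels are hypothetical and do not come from the slides. Because the experiment plays no part in the computation, every test splits members identically, which is the correlation the rule warns against.

```python
def assign_mod2(member_id, experiment_id):
    """Naive assignment: parity of the member id decides the variation.

    experiment_id is accepted but deliberately ignored, which is the flaw:
    a member lands in the same variation for every experiment.
    """
    return "A" if member_id % 2 == 1 else "B"


if __name__ == "__main__":
    for experiment in ("test-1", "test-2", "test-3"):
        row = {member: assign_mod2(member, experiment) for member in (1, 2)}
        print(experiment, row)
```

Any effect measured in one test is then confounded with the effect of every other running test, since the same members always share a variation.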
Distributing Users in Variations
Optional
- Support monotonic ramp-up: increase users in an experiment without changing previous users’ assignments
- Support external control: some users can be manually forced into or out of variations
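One possible way to get a monotonic ramp-up is sketched below. It assumes each user already has a stable bucket number from 0 to 99 (how that number is derived is a separate concern), and it fills variation A from the low buckets and variation B from the high buckets. The function name `assign_with_ramp` and the parameter `exposure_percent` are hypothetical; this is an illustration, not a description of any particular engine.

```python
def assign_with_ramp(bucket, exposure_percent):
    """Return "A", "B", or None (user not in the experiment).

    bucket: stable integer in [0, 99] for this user and experiment (assumed given).
    exposure_percent: even integer in [0, 100], total share of users exposed.
    A takes buckets [0, half) and B takes [100 - half, 100), so raising the
    exposure only adds buckets and never moves an already assigned user.
    """
    if not 0 <= bucket <= 99:
        raise ValueError("bucket must be in [0, 99]")
    if not 0 <= exposure_percent <= 100 or exposure_percent % 2:
        raise ValueError("exposure_percent must be an even integer in [0, 100]")
    half = exposure_percent // 2
    if bucket < half:
        return "A"
    if bucket >= 100 - half:
        return "B"
    return None
```

Going from 10% to 50% exposure keeps buckets 0-4 in A and 95-99 in B, and simply adds buckets 5-24 and 75-94.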
KPI
KPI must be defined before the beginning of the AB test
Duration / Sample Size
- Too short: results will not be significant and can lead to wrong conclusions
- Too long: potential loss of revenue
Duration - t-test
- O_A and O_B are the estimated KPI values (e.g., averages)
- σ_d is the estimated standard deviation of the difference between the two KPIs
- t is the test result
- Threshold established based on confidence level, e.g., 1.96 for large samples and 95% confidence level
- If |t| > threshold, we conclude that the variation’s KPI is statistically significantly different than the Control’s KPI.
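The formula on this slide was an image and did not survive in the text. The definitions are consistent with t = (O_B - O_A) / σ_d, which is the reading assumed in the sketch below. The way σ_d is estimated here (square root of the sum of each sample variance divided by its sample size) is also an assumption, as are the function names.

```python
import math
from statistics import mean, variance


def t_statistic(control, variation):
    """t = (O_B - O_A) / sigma_d for two lists of per-user KPI values."""
    o_a = mean(control)
    o_b = mean(variation)
    # Assumed estimator: unpooled standard error of the difference of means.
    sigma_d = math.sqrt(
        variance(control) / len(control) + variance(variation) / len(variation)
    )
    return (o_b - o_a) / sigma_d


def is_significant(control, variation, threshold=1.96):
    """True when |t| exceeds the threshold (1.96: large samples, 95% confidence)."""
    return abs(t_statistic(control, variation)) > threshold


if __name__ == "__main__":
    # Made-up data: 1 = purchase, 0 = no purchase.
    control = [1] * 100 + [0] * 900
    variation = [1] * 150 + [0] * 850
    print(round(t_statistic(control, variation), 2), is_significant(control, variation))
```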
Duration - sample size
- n is the number of users in each variant and the variants are assumed to be of equal size
- σ² is the variance of the KPI
- Δ is the sensitivity, or the amount of change you want to detect
- 16 is for 80% power, 21 for 90%.
If there is a difference in the KPIs, the power is the probability of determining that the difference is statistically significant
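The formula itself was an image and is missing from the text. The definitions and the constants 16 and 21 are consistent with the rule of thumb n = 16σ²/Δ², as given in the Microsoft survey listed in the references; the sketch below assumes that reading. The function and parameter names are hypothetical.

```python
import math


def sample_size_per_variant(variance, delta, power_constant=16):
    """Users needed in each variant, assuming n = power_constant * variance / delta**2.

    variance: variance of the KPI (sigma squared).
    delta: smallest absolute change you want to detect.
    power_constant: 16 for 80% power, 21 for 90% power.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    return math.ceil(power_constant * variance / delta ** 2)


if __name__ == "__main__":
    # Illustrative numbers: 5% conversion rate, detect a 5% relative change.
    p = 0.05
    variance = p * (1 - p)   # variance of a yes/no KPI
    delta = 0.05 * p         # 5% of 5% = 0.25 percentage points
    print(sample_size_per_variant(variance, delta))  # roughly 121,600
```

Halving Δ multiplies the required sample size by four, which is why small expected effects translate into long tests.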
Duration - sample size
Alternative formula: r is the number of variations (assumed to be of approximately the same size)
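The alternative formula was also an image and is not in the text. The sketch below assumes it is the more conservative n = (4rσ/Δ)² from the Microsoft survey listed in the references; that identification is an assumption, and the slide does not say more about how n should be read here. The function name is hypothetical.

```python
import math


def conservative_sample_size(sigma, delta, r):
    """Sample size assuming n = (4 * r * sigma / delta) ** 2.

    sigma: standard deviation of the KPI.
    delta: smallest absolute change you want to detect.
    r: number of variations, assumed to be of roughly equal size.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    return math.ceil((4 * r * sigma / delta) ** 2)


if __name__ == "__main__":
    sigma = math.sqrt(0.05 * 0.95)  # 5% conversion rate, illustrative
    print(conservative_sample_size(sigma, delta=0.0025, r=2))
```

Under this reading the required size grows with the square of the number of variations, so adding variations is costly.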
Multi Variable Tests
+ You can test many factors in a short period of time, accelerating
improvement
+ You can estimate interactions between factors
- Some combinations of factors may give a poor user experience
- Analysis and interpretation are more difficult
- It can take longer to begin the test
Multi Variable Tests
- Simple MVT
  - fractional or full factorial
- MVT by running concurrent tests
  - can estimate every interaction
  - can turn off each factor independently as needed
- Overlapping experiments
  - launch each test when the relevant factor is ready: more agile
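To make "full factorial" concrete: a full factorial design tests every combination of factor levels, while a fractional design tests only a chosen subset. The sketch below enumerates the full set; the factor names and levels are made up for illustration.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of levels as a list of dicts.

    factors: dict mapping a factor name to the list of its levels.
    """
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


if __name__ == "__main__":
    # Hypothetical factors, not taken from the slides.
    factors = {
        "button_color": ["blue", "green"],
        "banner": ["on", "off"],
        "sort_order": ["price", "popularity", "newest"],
    }
    cells = full_factorial(factors)
    print(len(cells))  # 2 * 2 * 3 combinations
    print(cells[0])
```

The number of cells is the product of the level counts, so each extra factor splits the traffic further.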
Parallel tests
Concurrent tests
- Booking.com: ~1000 tests in parallel
- Airbnb: ~500 tests in parallel
- LinkedIn: 400+ tests in parallel
References
- Theory about AB tests (Microsoft)
https://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTheWebSurvey.pdf
- Booking.com
https://taplytics.com/blog/how-booking-comss-tests-like-nobodys-business/
- Airbnb
https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7
https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166
- Netflix
https://medium.com/netflix-techblog/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15
- LinkedIn
https://content.linkedin.com/content/dam/engineering/site-assets/pdfs/ABTestingSocialNetwork_share.pdf
Custom Engine
Specific needs
- Cross platform
- Filters (OS, siteId, etc)
- “Technical” tests: progressive deployment
- Variation traffic vs experiment traffic
Distributing Users in Variations - Our Method
memberId + experimentId ⇒ hash
→ Fixed size number
+ Modulo 100 on that hash
→ Number between 0 and 99
+ Some “forced users” for tests
✓ External control
✓ No bias towards variations
✓ Consistency
✓ No correlations between experiments
✓ Monotonic ramp-up
✓ No storage needed
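The method can be sketched as follows. The slides do not name the hash function, the key format, or how forced users are configured, so SHA-256, the `memberId:experimentId` key, the `forced` mapping, and the function names are all assumptions made for illustration.

```python
import hashlib


def bucket(member_id, experiment_id):
    """Hash memberId + experimentId and reduce it to a number between 0 and 99."""
    key = f"{member_id}:{experiment_id}".encode("utf-8")  # assumed key format
    digest = hashlib.sha256(key).hexdigest()              # assumed hash function
    return int(digest, 16) % 100


def assign(member_id, experiment_id, forced=None, split=50):
    """Return "A" or "B" for this member in this experiment.

    forced: optional dict {member_id: variation} for manually forced users.
    split: buckets below this value go to A, the rest to B.
    """
    if forced and member_id in forced:
        return forced[member_id]
    return "A" if bucket(member_id, experiment_id) < split else "B"


if __name__ == "__main__":
    print(assign(42, "new-checkout"))
    print(assign(42, "new-checkout", forced={42: "B"}))
```

Nothing is stored: the same inputs always give the same bucket, and including the experiment id in the hash gives each experiment its own split of the members.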
Lessons Learned
“The difference between theory and practice is larger in practice than the difference between theory and practice in theory”
AB Testing is only one step
The code you AB test is still code going into production.
Don’t forget to apply the same quality and quantity of testing you would for any other code going to production.
KPI
KPI must be defined before the beginning of the AB test
AB Test duration
In real use cases, the n you’re computing is not necessarily a number of users. It can be a number of:
- Unique visitors
- Baskets
- Orders
- Product views
Assignments: Storing vs Computing
- Storing assignments means big tables holding every member’s assignments, and time to retrieve them
- Computing them avoids that at the moment you need the assignment on your website
- … But what about afterwards?
➢ You probably need the assignments stored somewhere too, for later analysis
Theoretical Practices vs Real Use Cases
- Different distribution of users according to criteria (country, device, etc.)
- Correlated experiments (same assignments for different tests)
- Proportion of variations changing during the test
Going Further
Optimize assignments
If one variation is far more effective than the other, you might not want to keep assigning 50/50
Multi-Armed Bandits
https://towardsdatascience.com/beyond-a-b-testing-multi-armed-bandit-experiments-1493f709f804
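As a rough illustration of the idea, the sketch below uses Thompson sampling, one common multi-armed bandit strategy; the slides do not say which strategy to use, so this choice is an assumption. The conversion rates and function names are made up, and the KPI is assumed to be a yes/no outcome.

```python
import random


def thompson_choose(stats, rng):
    """Pick the variation to show next.

    stats: dict {variation: (successes, failures)} observed so far.
    Each variation's rate is sampled from a Beta posterior (uniform prior)
    and the variation with the highest sampled rate wins.
    """
    samples = {
        name: rng.betavariate(successes + 1, failures + 1)
        for name, (successes, failures) in stats.items()
    }
    return max(samples, key=samples.get)


def simulate(true_rates, rounds, seed=0):
    """Simulate traffic and return how often each variation was shown."""
    rng = random.Random(seed)
    stats = {name: (0, 0) for name in true_rates}
    shown = {name: 0 for name in true_rates}
    for _ in range(rounds):
        name = thompson_choose(stats, rng)
        shown[name] += 1
        successes, failures = stats[name]
        if rng.random() < true_rates[name]:
            stats[name] = (successes + 1, failures)
        else:
            stats[name] = (successes, failures + 1)
    return shown


if __name__ == "__main__":
    # Hypothetical conversion rates: B is clearly better than A.
    print(simulate({"A": 0.05, "B": 0.30}, rounds=3000))
```

Traffic drifts towards the better variation as evidence accumulates, at the cost of a less straightforward statistical analysis than a fixed 50/50 split.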
Thank you
