VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
AB Tests
What’s an AB Test
Version “A”: 50%
Version “B”: 50%
SUMMARY
- Best Practices and State of the Art
- Custom Engine
- Lessons Learned
Best Practices and State of the Art
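Distributing Users in Variations
Example: take the first letter of your name
- A-M: variation A (e.g., Anna)
- N-Z: variation B (e.g., Zoey)
Names are not spread evenly across the alphabet, so this split breaks the rule below.
Rule
- Each user is equally likely to see each variant (for a 50/50 split): no bias towards any variation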
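Distributing Users in Variations
Example: display a “random” variation each time a user comes to the website
A returning user could then see a different variation on each visit, which breaks the rule below.
Rule
- Repeat assignments must be consistent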
Distributing Users in Variations
Example: modulo 2 on memberId
- Test 1: Member 1 in Variation A, Member 2 in Variation B
- Test 2: Member 1 in Variation A, Member 2 in Variation B
- ...: Member 1 in Variation A, Member 2 in Variation B
Rule
- No correlation between experiments
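The modulo 2 example can be sketched in a few lines to show why it correlates experiments. This is an illustrative sketch only: the function name `assign_mod2` and the mapping of odd ids to variation A are assumptions chosen to match the Member 1 / Member 2 layout on the slide.

```python
def assign_mod2(member_id: int) -> str:
    """Naive assignment: odd ids go to A, even ids go to B (assumed mapping)."""
    return "A" if member_id % 2 == 1 else "B"


members = range(1, 1001)

# The experiment id plays no role, so every test yields the same split.
test_1 = [assign_mod2(m) for m in members]
test_2 = [assign_mod2(m) for m in members]

# Fraction of members who land in the same variation in both tests.
same = sum(a == b for a, b in zip(test_1, test_2)) / len(test_1)
print(f"same variation in both tests: {same:.0%}")
```

With independent experiments you would expect roughly half of the members to share a variation across two tests; here every member does, so the effect of one test cannot be separated from the effect of the other.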
Distributing Users in Variations
Optional
- Support monotonic ramp-up: increase the number of users in an experiment without changing previous users’ assignments
- Support external control: some users can be manually forced into or out of variations
KPI
KPI must be defined before the beginning of the AB test
Duration / Sample Size
- Too short: results will not be significant and can lead to wrong conclusions
- Too long: potential loss of revenue
Duration - t-test
$t = \dfrac{O_B - O_A}{\sigma_d}$
- $O_A$ and $O_B$ are the estimated KPI values (e.g., averages)
- $\sigma_d$ is the estimated standard deviation of the difference between the two KPIs
- t is the test result
- The threshold is established based on the confidence level, e.g., 1.96 for large samples and a 95% confidence level
- If |t| > threshold, we conclude that the variation’s KPI is statistically significantly different from the Control’s KPI.
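As a rough illustration of the test described on this slide, the sketch below computes t from two lists of per-user KPI values. How the standard deviation of the difference is estimated is an assumption here (unpooled sample variances divided by sample sizes); the function names and the sample data are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistic(control, variation):
    """t = (O_B - O_A) / sigma_d, with sigma_d estimated from the samples."""
    o_a = mean(control)
    o_b = mean(variation)
    # Assumption: unpooled estimate of the std. dev. of the difference of means.
    sigma_d = math.sqrt(
        variance(control) / len(control) + variance(variation) / len(variation)
    )
    return (o_b - o_a) / sigma_d


def is_significant(control, variation, threshold=1.96):
    """True when |t| exceeds the threshold (1.96: large samples, 95% confidence)."""
    return abs(t_statistic(control, variation)) > threshold


# Hypothetical per-user KPI values for Control (A) and the variation (B).
control = [1.0, 2.0, 3.0] * 50
variation = [4.0, 5.0, 6.0] * 50
print(is_significant(control, variation))
```

The 1.96 threshold only holds for large samples; for small samples the threshold should come from the t distribution with the appropriate degrees of freedom.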
Duration - sample size
$n = \dfrac{16\sigma^2}{\Delta^2}$
- n is the number of users in each variant; the variants are assumed to be of equal size
- σ² is the variance of the KPI
- Δ is the sensitivity, or the amount of change you want to detect
- 16 is for 80% power, 21 for 90%.
If there is a difference in the KPIs, the power is the probability of determining that the difference is statistically significant
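The rule of thumb on this slide (a factor of 16 for 80% power, 21 for 90%, multiplied by the KPI variance and divided by the squared sensitivity) can be turned into a small helper. This is a sketch under those stated factors; the function name and the example values are hypothetical.

```python
import math

# Factors quoted on the slide: 16 for 80% power, 21 for 90% power.
POWER_FACTOR = {0.80: 16, 0.90: 21}


def sample_size_per_variant(sigma: float, delta: float, power: float = 0.80) -> int:
    """Users needed in each variant: n = factor * sigma^2 / delta^2."""
    if power not in POWER_FACTOR:
        raise ValueError("only 80% and 90% power are covered by this rule of thumb")
    return math.ceil(POWER_FACTOR[power] * sigma**2 / delta**2)


# Hypothetical KPI: standard deviation of 10, smallest change worth detecting is 1.
print(sample_size_per_variant(sigma=10.0, delta=1.0))              # 1600
print(sample_size_per_variant(sigma=10.0, delta=1.0, power=0.90))  # 2100
```

Halving the change you want to detect quadruples the required sample size, which is why small expected effects lead to long tests.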
Duration - sample size
Alternative formula: $n = \left(\dfrac{4 r \sigma}{\Delta}\right)^2$
- r is the number of variations (of approximately the same size)
Multi Variable Tests
+ You can test many factors in a short period of time, accelerating improvement
+ You can estimate interactions between factors
- Some combinations of factors may give a poor user experience
- Analysis and interpretation are more difficult
- It can take longer to begin the test
Multi Variable Tests
- Simple MVT
  - fractional or full factorial
- MVT by running concurrent tests
  - can estimate every interaction
  - can turn off each factor independently as needed
- Overlapping experiments
  - launch each test when the relevant feature is ready: more agile
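To make the “full factorial” option concrete, the sketch below enumerates every combination of factor levels; each combination is one cell of the test. The factor names and levels are hypothetical and only serve as an illustration.

```python
from itertools import product

# Hypothetical factors, each with two levels.
factors = {
    "button_color": ["blue", "green"],
    "banner": ["off", "on"],
    "sort_order": ["price", "popularity"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


cells = full_factorial(factors)
print(len(cells))  # 2 * 2 * 2 = 8 combinations
```

The number of cells grows multiplicatively with each added factor, which is one reason a fractional design, testing only a subset of the combinations, can be attractive.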
Parallel tests
Concurrent tests
- Booking.com: ~1000 tests in parallel
- Airbnb: ~500 tests in parallel
- LinkedIn: 400+ tests in parallel
References
- Theory about AB tests (Microsoft)
https://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTheWebSurvey.pdf
- Booking.com
https://taplytics.com/blog/how-booking-comss-tests-like-nobodys-business/
- Airbnb
https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7
https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166
- Netflix
https://medium.com/netflix-techblog/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15
- LinkedIn
https://content.linkedin.com/content/dam/engineering/site-assets/pdfs/ABTestingSocialNetwork_share.pdf
Custom Engine
Specific needs
- Cross platform
- Filters (OS, siteId, etc)
- “Technical” tests: progressive deployment
- Variation traffic vs experiment traffic
Distributing Users in Variations - Our Method
memberId + experimentId ⇒ hash
→ Fixed-size number
+ Modulo 100 on that hash
→ Number between 0 and 99
+ Some “forced users” for tests
✓ External control
✓ No bias towards variations
✓ Consistency
✓ No correlation between experiments
✓ Monotonic ramp-up
✓ No storage needed
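A minimal sketch of this kind of hash-based assignment might look as follows. Only the overall scheme (hash of memberId and experimentId, modulo 100, plus forced users) comes from the slide; the choice of SHA-256, the key format, the way buckets map to variations, and the names `bucket`, `assign`, `traffic_percent` and `forced` are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, Optional


def bucket(member_id: int, experiment_id: str) -> int:
    """Hash memberId + experimentId to a number between 0 and 99."""
    key = f"{member_id}:{experiment_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()  # assumption: any stable hash would do
    return int.from_bytes(digest[:8], "big") % 100


def assign(
    member_id: int,
    experiment_id: str,
    traffic_percent: int = 100,
    forced: Optional[Dict[int, str]] = None,
) -> Optional[str]:
    """Return 'A', 'B', or None when the member is not in the experiment."""
    # External control: forced users bypass the hash entirely.
    if forced and member_id in forced:
        return forced[member_id]
    b = bucket(member_id, experiment_id)
    half = traffic_percent / 2  # assumes an even traffic_percent and a 50/50 split
    # A fills buckets from 0 upwards, B from 50 upwards, so raising
    # traffic_percent only adds members and never moves existing ones.
    if b < half:
        return "A"
    if 50 <= b < 50 + half:
        return "B"
    return None


print(assign(42, "new-checkout", traffic_percent=10))
print(assign(42, "new-checkout", traffic_percent=10, forced={42: "B"}))
```

Because the result is a pure function of the two ids, the same member always gets the same answer for a given experiment without any lookup table, and including the experiment id in the hash is what decorrelates assignments across experiments.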
Lessons Learned
“The difference between theory and practice is larger in practice than the difference between theory and practice in theory”
AB Testing is only one step
The code you AB test is still code going to production.
Don’t forget to apply the same quality and quantity of testing as you would for any other code going to production.
KPI
KPI must be defined before the beginning of the AB test
AB Test duration
In real use cases, the n you’re computing is not necessarily a number of users. It may count:
- Unique visitors
- Baskets
- Orders
- Product views
Assignments: Storing vs Computing
- Storing assignments means big tables holding every member’s assignments, and time to retrieve them
- Computing them on the fly solves that at the moment you need the assignment on your website
- ... But what about afterwards?
➢ You probably still need the assignments stored somewhere for later analysis
Theoretical Practices vs Real Use Cases
- Uneven distribution of users according to criteria (country, device, etc.)
- Correlated experiments (same assignments for different tests)
- Proportion of variations changing during the test
Going Further
Optimize assignments
If one variation is far more effective than the other, you might not want to keep assigning 50/50
Multi-Armed Bandits
https://towardsdatascience.com/beyond-a-b-testing-multi-armed-bandit-experiments-1493f709f804
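As one possible illustration of the bandit idea, the sketch below simulates Thompson sampling, a common multi-armed bandit strategy, on two variations with a binary KPI (converted or not). It is not necessarily the approach in the linked article; the conversion rates, the number of rounds and the function name are hypothetical.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Thompson sampling and return how often each arm was shown."""
    rng = random.Random(seed)
    wins = {arm: 0 for arm in true_rates}
    losses = {arm: 0 for arm in true_rates}
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        # Draw a plausible conversion rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then show the arm with the highest draw.
        sampled = {
            arm: rng.betavariate(wins[arm] + 1, losses[arm] + 1)
            for arm in true_rates
        }
        arm = max(sampled, key=sampled.get)
        pulls[arm] += 1
        # Simulated user response; in production this is the observed KPI.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls


# Hypothetical true conversion rates, unknown to the algorithm.
pulls = thompson_sampling({"A": 0.05, "B": 0.15}, rounds=5000)
print(pulls)
```

Traffic drifts towards the better variation as evidence accumulates, so the split is no longer fixed at 50/50; the trade-off is that the analysis is less straightforward than for a fixed-split AB test.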
Thank you
