Like a lot of major websites, at Veepee we use A/B tests to improve the design of the site. A few years ago we decided to switch to a custom, homemade A/B test engine. On this journey we learned a lot of valuable lessons about A/B testing, and we’ll share them with you in this talk.
AB Tests - Theory and Rex
1. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
AB Tests
2. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
What’s an AB Test
Version “A”: 50% of users
Version “B”: 50% of users
3. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
SUMMARY
- Best Practices and State of the Art
- Custom Engine
- Lessons Learned
4. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Best Practices and State of the Art
5. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Distributing Users in Variations
Rule
- Equally likely to see each variant (if 50/50): no bias towards any variation
Example
Take the first letter of your name
A-M: variation A
N-Z: variation B
Anna: Variation A
Zoey: Variation B
6. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Distributing Users in Variations
Rule
- Repeat assignments must be consistent
Example
Display a “random” variation when a user comes on the website
7. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Distributing Users in Variations
Rule
- No correlation between experiments
Example
Modulo 2 on memberId
          Variation A   Variation B
Test 1    Member 1      Member 2
Test 2    Member 1      Member 2
...       Member 1      Member 2
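A minimal sketch of why the modulo example breaks this rule. The function names are hypothetical; the parity mapping (odd memberId to A, even to B) is an assumption chosen to match Member 1 and Member 2 above. Because the assignment ignores the experiment, any two tests split members identically.

```python
def modulo_assignment(member_id: int) -> str:
    """Naive scheme from the slide: the parity of memberId picks the variation."""
    return "A" if member_id % 2 == 1 else "B"


def agreement(members, assign_test_1, assign_test_2) -> float:
    """Fraction of members who land in the same variation in both tests."""
    same = sum(1 for m in members if assign_test_1(m) == assign_test_2(m))
    return same / len(members)


members = range(1, 10_001)
# Test 1 and Test 2 both use the same rule, since the rule never looks at the test.
print(agreement(members, modulo_assignment, modulo_assignment))  # 1.0
```

An agreement of 1.0 means the two experiments are perfectly correlated: any effect measured in Test 2 cannot be separated from the effect of Test 1.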
8. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Distributing Users in Variations
Optional
- Support monotonic ramp-up
Increase users in an experiment without changing previous users’ assignments
- Support external control
Some users can be manually forced into or out of variations
9. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
KPI
KPI must be defined before the beginning of the AB test
10. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Duration / Sample Size
- Too short:
Will not be significant, can lead to wrong conclusions
- Too long:
Potential loss of revenue
11. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Duration - t-test
- O_A and O_B are the estimated KPI values (e.g., averages)
- σ_d is the estimated standard deviation of the difference between the two KPIs
- t is the test result
- Threshold established based on confidence level
e.g., 1.96 for large samples and 95% confidence level
- If |t| > threshold, we conclude that the variation’s KPI is statistically significantly different from the Control’s KPI.
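The definitions above fit the usual two-sample statistic t = (O_B - O_A) / σ_d. The sketch below computes it under stated assumptions: the KPI is a per-user average, σ_d is estimated from the two sample variances, and the function names and toy data are hypothetical.

```python
import math
import statistics


def t_statistic(control, variation) -> float:
    """t = (O_B - O_A) / sigma_d for two independent samples."""
    o_a = statistics.fmean(control)
    o_b = statistics.fmean(variation)
    # Assumed estimate of sigma_d: square root of the summed variances of the two means.
    sigma_d = math.sqrt(
        statistics.variance(control) / len(control)
        + statistics.variance(variation) / len(variation)
    )
    return (o_b - o_a) / sigma_d


def is_significant(t: float, threshold: float = 1.96) -> bool:
    """1.96 is the large-sample threshold for a 95% confidence level."""
    return abs(t) > threshold


control = [1, 2, 3, 4, 5]    # toy KPI values for variation A
variation = [2, 3, 4, 5, 6]  # toy KPI values for variation B
t = t_statistic(control, variation)
print(t, is_significant(t))  # 1.0 False
```

With five users per variation the 1.96 threshold is not really applicable; the toy data only keeps the arithmetic easy to follow.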
12. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Duration - sample size
- n is the number of users in each variant; the variants are assumed to be of equal size
- σ² is the variance of the KPI
- Δ is the sensitivity, or the amount of change you want to detect
- 16 is for 80% power, 21 for 90%.
If there is a difference in the KPIs, the power is the probability of determining that the difference is statistically significant
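These definitions correspond to the common rule of thumb n = 16σ²/Δ² (with 21 in place of 16 for 90% power). A minimal sketch, with a hypothetical function name and toy inputs:

```python
import math

# Multipliers quoted on the slide, keyed by power in percent.
POWER_FACTOR = {80: 16, 90: 21}


def sample_size_per_variant(variance: float, delta: float, power_percent: int = 80) -> int:
    """Rule-of-thumb users needed in each variant: n = factor * variance / delta^2."""
    factor = POWER_FACTOR[power_percent]
    return math.ceil(factor * variance / delta ** 2)


# Toy inputs: KPI variance of 4.0, and we want to detect a change of 0.5.
print(sample_size_per_variant(4.0, 0.5))                    # 256
print(sample_size_per_variant(4.0, 0.5, power_percent=90))  # 336
```

Halving Δ multiplies n by four, which is why small expected effects translate into long tests.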
13. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Duration - sample size
Alternative formula: r is the number of variations (of approximately equal size)
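Assuming this slide follows the Microsoft survey listed in the references, the alternative is the more conservative sample-size formula given there, with σ and Δ defined as before:

```latex
n = \left( \frac{4 r \sigma}{\Delta} \right)^{2}
\qquad \text{so for } r = 2: \quad
n = \frac{64\,\sigma^{2}}{\Delta^{2}}
```

For two variations this asks for four times as many users per variant as the 16σ²/Δ² rule.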
14. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Multi Variable Tests
+ You can test many factors in a short period of time, accelerating
improvement
+ You can estimate interactions between factors
- Some combinations of factors may give a poor user experience
- Analysis and interpretation are more difficult
- It can take longer to begin the test
15. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Multi Variable Tests
- Simple MVT
  - fractional or full factorial
- MVT by running concurrent tests
  - can estimate every interaction
  - can turn off each factor independently as needed
- Overlapping experiments
  - launch each test when the relevant factor is ready: more agile
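To make "full factorial" concrete: every combination of factor levels becomes one variation. A minimal sketch, where the factor names and levels are hypothetical:

```python
from itertools import product

# Hypothetical factors, each with two levels.
factors = {
    "button_color": ["blue", "green"],
    "banner": ["off", "on"],
    "sort_order": ["price", "popularity"],
}


def full_factorial(factors: dict) -> list:
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = full_factorial(factors)
print(len(cells))  # 8
```

The number of cells is the product of the level counts, so traffic per cell shrinks quickly as factors are added. A fractional design runs only a chosen subset of these cells.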
16. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Parallel tests
Concurrent tests
- Booking.com: ~1000 tests in parallel
- Airbnb: ~500 tests in parallel
- LinkedIn: 400+ tests in parallel
17. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
References
- Theory about AB tests (Microsoft)
https://ai.stanford.edu/~ronnyk/2009controlledExperimentsOnTheWebSurvey.pdf
- Booking.com
https://taplytics.com/blog/how-booking-comss-tests-like-nobodys-business/
- Airbnb
https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7
https://medium.com/airbnb-engineering/https-medium-com-jonathan-parks-scaling-erf-23fd17c91166
- Netflix
https://medium.com/netflix-techblog/its-all-a-bout-testing-the-netflix-experimentation-platform-4e1ca458c15
- LinkedIn
https://content.linkedin.com/content/dam/engineering/site-assets/pdfs/ABTestingSocialNetwork_share.pdf
18. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Custom Engine
19. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Specific needs
- Cross platform
- Filters (OS, siteId, etc)
- “Technical” tests: progressive deployment
- Variation traffic vs experiment traffic
20. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Distributing Users in Variations - Our Method
memberId + experimentId ⇒ hash
→ Fixed size number
+ Modulo 100 on that hash
→ Number between 0 and 99
+ Some “forced users” for tests
✓ External control
✓ No bias towards variations
✓ Consistency
✓ No correlations between experiments
✓ Monotonic ramp-up
✓ No storage needed
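A minimal sketch of this kind of scheme, not the actual Veepee engine. Everything beyond "hash memberId + experimentId, then modulo 100" is an assumption: the SHA-256 hash, the `FORCED` table, the two-variation layout, and the `percent_per_variation` parameter are all hypothetical.

```python
import hashlib
from typing import Optional

# Hypothetical external control: (experimentId, memberId) -> forced variation.
FORCED = {("exp-42", 1001): "B"}


def bucket(member_id: int, experiment_id: str) -> int:
    """Hash memberId + experimentId, then reduce to a number between 0 and 99."""
    digest = hashlib.sha256(f"{experiment_id}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def assign(member_id: int, experiment_id: str, percent_per_variation: int) -> Optional[str]:
    """Return "A", "B", or None when the member is outside the experiment."""
    forced = FORCED.get((experiment_id, member_id))
    if forced is not None:
        return forced
    b = bucket(member_id, experiment_id)
    # A fills buckets from the bottom and B from the top, so raising
    # percent_per_variation only adds members and never moves existing ones.
    if b < percent_per_variation:
        return "A"
    if b >= 100 - percent_per_variation:
        return "B"
    return None


print(assign(1001, "exp-42", 10))  # B (forced)
```

Because the experiment id is part of the hashed value, the same member gets unrelated buckets in different experiments, and because nothing is random at call time, repeat visits always return the same answer without any stored state.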
21. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Lessons Learned
“The difference between theory and practice is larger in practice than the difference between theory and practice in theory”
22. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
AB Testing is only one step
The code you AB test is still code going into production.
Don’t forget to apply the same quality and quantity of
testing you would for any other code going to production.
23. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
KPI
KPI must be defined before the beginning of the AB test
24. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
AB Test duration
In real use cases, the n you compute is not necessarily a number of users. It can also count:
- Unique visitors
- Baskets
- Orders
- Product views
25. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Assignments: Storing vs Computing
- Storing assignments means big tables holding every member’s assignments, and time to retrieve them
- Computing them solves that at the time you need the assignment on your website
- … But what about afterwards?
➢ You probably need the assignments stored somewhere too, for later analysis
26. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Theoretical Practices vs Real Use Cases
- Different distribution of users depending on criteria
(country, device, etc.)
- Correlated experiments
(same assignments for different tests)
- Proportion of variations changing during the test
27. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Going Further
Optimize assignments
If one variation performs much better than the other, you might not want to keep assigning 50/50
Multi Armed Bandits
https://towardsdatascience.com/beyond-a-b-testing-multi-armed-bandit-experiments-1493f709f804
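As an illustration of the idea, here is a minimal Thompson sampling sketch, one common multi-armed bandit strategy. The slide does not say which strategy is meant, so the algorithm choice, the simulated conversion rates, and the function name are all assumptions.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate a Bernoulli bandit and return how often each variation was shown."""
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(rounds):
        # Draw a plausible conversion rate per variation from its Beta posterior.
        samples = [
            rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)
        ]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        # Simulated visitor: converts with the variation's (hidden) true rate.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls


pulls = thompson_sampling([0.05, 0.5], rounds=2000)
print(pulls)  # most traffic ends up on the second variation
```

Unlike a fixed 50/50 split, the share of traffic changes during the test, so assignments are no longer a pure function of memberId and experimentId.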
28. VEEPEE - DATA SCIENCE APPLIED IN E-COMMERCE - 13/05/2019
Thank you