PyData London 2014 Martin Goodson - Most A/B Testing Results are Illusory

Published in:
Technology

- 1. Most A/B testing results are Illusory Martin Goodson, Skimlinks
- 2. These are my opinions not those of my employer!
- 3. What’s an A/B test? Example: Free delivery A: Control B: Variant
- 4. ‘How can you talk for 40 minutes about A/B testing?’
- 5. A/B tests are very easy to get wrong
- 6. What my experience is based on
- 7. What this talk is about: 3 statistical concepts; errors and consequences. These errors are exactly how A/B testing software works.
- 8. What this talk is about: Statistical Power, Multiple Testing, Regression to the Mean
- 9. What is Statistical Power? The probability that you will detect a true difference between two samples
- 10. What is Statistical Power? Example: are men taller than women, on average?
- 11. What is Statistical Power? Example: free delivery on a website
- 12. Why is Statistical Power important? 1. False negatives 2. False positives
- 13. Precision: the proportion of true positives among the positive results. It's a function of power, significance level and prevalence.
- 14. If you have good power? Out of 100 tests, 10 really drive uplift. You detect 8. 5 false positives. 8/13 of positive tests are real.
- 15. If you have bad power? Out of 100 tests, 10 really drive uplift. You detect 3. 5 false positives. 3/8 of winning tests are real!
- 16. Marketer: ‘We need results in 2 weeks' time.’ Me: ‘We can’t run this test for only two weeks; we won’t get robust results.’
- 17. Marketer: ‘We need results in 2 weeks' time.’ Me: ‘We can’t run this test for only two weeks; we won’t get robust results.’ Marketer: ‘Why are you being so negative?’
- 18. Calculating Power. Alpha: probability of a positive result when the null hypothesis is true (5%). Beta: probability of not seeing a positive result when the null hypothesis is false, i.e. when there is a true difference. Power = 1 - Beta (80-90%)
- 19. Calculating Power. Use a power calculator: online, R (power.prop.test), Python (statsmodels.stats.power)
- 20. Approximate sample sizes. Using a power calculator and asking for 80% power and a significance level of 5%: 6000 conversions to detect 5% uplift; 1600 conversions to detect 10% uplift
- 21. Multiple testing
- 22. Effect of multiple testing: if you run 20 tests at a significance level of 5%, you will obtain 1 win on average, just by chance.
- 23. Giving targets for successful tests.
- 24. Stopping tests early
- 25. Stopping tests early. Simulations show that stopping an A/A test when you see a positive result will result in a successful test 41% of the time.
- 26. Stopping tests early That works out to a precision of 20%
- 27. Negative uplift. Stopping an A/B test early when the true effect is negative results in a win 9% of the time!
- 28. A True Story
- 29. Regression to the mean. Give 100 students a true/false test. They all answer randomly. Take only the top-scoring 10% of the class. Test them again. What will the results be?
- 30. Estimates of uplift are generally wrong.
- 31. What you need to do to get it right ● Do a power calculation first to estimate sample size ● Use a valid hypothesis - don’t use a scattergun approach ● Do not stop the test early ● Perform a second ‘validation’ test
- 32. My details martingoodson@gmail.com @martingoodson http://goo.gl/jvhwmB Download my whitepaper on A/B testing here
- 33. Skimlinks After Party! Levante Bar 5 minutes away Come hungry! Invites + Map at the booth http://skimlinks.com/jobs
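
The precision arithmetic on slides 13 to 15 can be sketched in a few lines of Python. This is only an illustration: the function name `precision` is hypothetical, and the power values of 80% and 30% are inferred from the slides' "you detect 8" and "you detect 3" out of 10 real effects. The slides round the expected 4.5 false positives (5% of the 90 tests with no real effect) up to 5, so the exact figures come out slightly higher than 8/13 and 3/8.

```python
def precision(power, alpha, prevalence):
    """Share of positive (significant) tests that reflect a real effect."""
    true_positives = power * prevalence          # real effects that are detected
    false_positives = alpha * (1 - prevalence)   # null tests that look significant
    return true_positives / (true_positives + false_positives)


# 100 tests, 10 of which really drive uplift (prevalence 10%), alpha = 5%.
good = precision(power=0.8, alpha=0.05, prevalence=0.10)
bad = precision(power=0.3, alpha=0.05, prevalence=0.10)

print(f"good power: {good:.2f} of wins are real")
print(f"bad power:  {bad:.2f} of wins are real")
```

With low power, most of the "winning" tests are false positives, even though the significance level has not changed.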

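The approximate sample sizes on slide 20 can be roughly reproduced with the textbook normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions, not the calculator used in the talk: the function name `conversions_needed` is hypothetical, the 2% baseline conversion rate is an assumed value, the test is taken to be two-sided, and the result is read as conversions in the control arm, which the slide does not specify.

```python
from statistics import NormalDist


def conversions_needed(relative_uplift, base_rate, alpha=0.05, power=0.80):
    """Approximate control-arm conversions for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    # Visitors required in each arm under the normal approximation.
    visitors = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    # Convert visitors into expected conversions at the baseline rate.
    return visitors * base_rate


# Assumed 2% baseline conversion rate (not stated in the slides).
c5 = conversions_needed(relative_uplift=0.05, base_rate=0.02)
c10 = conversions_needed(relative_uplift=0.10, base_rate=0.02)

print(f"5% uplift:  about {c5:.0f} conversions")
print(f"10% uplift: about {c10:.0f} conversions")
```

Under these assumptions the results land in the same range as the slide's figures of 6000 and 1600 conversions. Halving the detectable uplift roughly quadruples the required sample, which is why short tests tend to be underpowered.
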