
# A/B Testing and the Infinite Monkey Theory

Surveys show that on average only 1 out of 7 A/B tests run by e-commerce companies ends up being successful. Lukasz Twardowski, CEO of UseItBetter, explains how some of the most successful online businesses master this process, turning it into an iterative, evidence-led experimentation-at-scale programme.

Published in: Data & Analytics

### A/B Testing and the Infinite Monkey Theory

1. 1. A/B Testing and the Infinite Monkey Theorem. Lukasz Twardowski, www.useitbetter.com http://en.wikipedia.org/wiki/Portraits_of_Shakespeare
2. 2. a monkey hitting keys at random for an infinite amount of time will almost surely type the complete works of William Shakespeare.
3. 3. ~~a monkey hitting keys at random~~ A/B testing for an infinite amount of time will almost surely reach the conversion rate of Amazon.
4. 4. THEORY: A/B testing helps find out which of two versions performs better while both run simultaneously.
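Slide 4's definition ("which of two versions performs better while both run simultaneously") can be sketched as traffic splitting. This is a generic illustration, not any specific tool's implementation; the hash-based 50/50 bucketing and the experiment name are assumptions:

```python
# One common way to run two versions simultaneously (an illustrative
# sketch, not any specific tool's implementation): deterministic
# hash-based bucketing, so each visitor always sees the same version.
import hashlib

def variant(user_id: str, experiment: str = "homepage-test") -> str:
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

counts = {"A": 0, "B": 0}
for uid in range(10_000):
    counts[variant(str(uid))] += 1
print(counts)  # roughly a 50/50 split
```

Hashing on experiment name plus user id keeps assignment sticky per visitor while letting different experiments split the same traffic independently.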
5. 5. We do this because every day is different, unlike in the Groundhog Day movie. Groundhog Day (1993, Dir. Harold Ramis)
6. 6. http://nerds.airbnb.com/experiments-at-airbnb/ A single change, bad or good, will not change a trend. Unless a change is A/B tested, you won’t know its impact.
7. 7. Why the monkey metaphor?
8. 8. EXERCISE 1. Provide the benchmark: the industry average hit rate for A/B testing = ___
9. 9. EXERCISE 1. Provide the benchmark: the industry average hit rate for A/B testing = 14%. Just 1 out of 7 A/B tests is successful! http://conversionxl.com/ab-tests-fail/
10. 10. How to be the greatest monkey in the biz if infinity is not an option? King Kong (1933, Dir. Merian Cooper, Ernest Schoedsack)
11. 11. How to be the best monkey in the biz? Be a quick monkey.
12. 12. EXERCISE 2. Do the math: 1 out of 7 tests wins x 2 weeks per test = slow growth. Unless you experiment at scale.
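To make Exercise 2 concrete, a back-of-envelope sketch. The 14% (1-in-7) hit rate is the benchmark quoted above; the +2% conversion lift per winning test and the two cadences compared are made-up illustrative numbers:

```python
# Back-of-envelope math for Exercise 2: how test velocity compounds.
# The 14% (1-in-7) hit rate is the benchmark from slide 9; the +2%
# conversion lift per winning test is a made-up illustrative number.

def yearly_growth(tests_per_month, hit_rate=1/7, lift_per_win=0.02):
    """Relative conversion-rate growth after a year of testing."""
    wins_per_year = tests_per_month * 12 * hit_rate
    return (1 + lift_per_win) ** wins_per_year - 1

# One test every two weeks vs. Shop Direct's 101 tests a month:
print(f"2 tests/month:   +{yearly_growth(2):.0%} in a year")
print(f"101 tests/month: +{yearly_growth(101):.0%} in a year")
```

Because wins compound multiplicatively, the gap between the two cadences is far larger than the 50x difference in test volume alone.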
13. 13. The currency in which you pay for A/B tests is traffic.
14. 14. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run.
15. 15. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
16. 16. Shop Direct (a 100+ year old company) scaled to 101 experiments a month in two years. Etsy (a startup launched in 2005) does 25 releases a day, most of them A/B tests. http://www.slideshare.net/danmckinley/design-for-continuous-experimentation
17. 17. Zero Tests Per Month. Here’s the test idea, numbers and execution. Can we proceed? Let’s meet to discuss. Maybe next week? Looks good. Will check with Z and get back to you. So here’s the test idea, numbers… Sorry, had other priorities. Can we meet next week? Sure! (D***!) Have you checked with Z? Have you…? Have you…?
18. 18. Ground rules: 1. Test ideas are subject to prioritization, not approval.
19. 19. EXERCISE 3. Magic formula: evidence x opportunity size x strategy = priority. Even the worst idea gets tested if resources are available.
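The magic formula can be sketched as a tiny scoring function. The 1-5 scale and the example backlog ideas are hypothetical, not UseItBetter's actual model:

```python
# A minimal sketch of slide 19's "magic formula":
# evidence x opportunity size x strategy = priority.
# The 1-5 scoring scale and the example ideas are hypothetical.

def priority(evidence, opportunity_size, strategy_fit):
    """Each factor scored 1-5; the product decides queue order."""
    return evidence * opportunity_size * strategy_fit

backlog = {
    "simplify checkout form": priority(5, 4, 5),  # strong evidence, on-strategy
    "change button color":    priority(1, 2, 2),  # a hunch on a minor page
}

# Ideas are queued by score, never approved or rejected outright:
for idea, score in sorted(backlog.items(), key=lambda kv: -kv[1]):
    print(f"{score:3d}  {idea}")
```

A multiplicative score means an idea with no evidence or no strategic fit sinks to the bottom of the queue on its own, which is exactly the "prioritization, not approval" ground rule.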
20. 20. 101 Tests Per Month. Ok then, we’ll do this, this and that test. Others will wait. Guys, our strategy shifted to checkout optimization. Guys, we need to increase basket value. Now this and that one… And this… These two would work… Xmas is coming! DO NOTHING! …this, this and that…
21. 21. Ground rules:   2. Accept the fact that things will go wrong.
22. 22. How to be the best monkey in the biz? Cheat like a monkey.
23. 23. If 1 out of 7 tests wins, what about the other 6?
24. 24. EXERCISE 1. What was the result of the Button Colors Test by Groove? https://www.groovehq.com/blog/failed-ab-tests
25. 25. If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
26. 26. Most tests are inconclusive because: a) too few users were using the changed feature for it to reach statistical significance; b) the changed feature had little to do with the metrics used to evaluate the test; c) there were multiple changes in the same test and their effects levelled out.
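Reason (a) can be quantified with the standard two-proportion sample-size formula. The 3.0% baseline conversion rate and the +10% relative lift are invented numbers for illustration:

```python
# How much traffic reason (a) implies: the standard two-proportion
# sample-size formula (alpha=0.05 two-sided, 80% power). The 3.0%
# baseline and the +10% relative lift are invented example numbers.
from math import ceil, sqrt

def sample_size_per_arm(p1, p2):
    z_alpha, z_beta = 1.96, 0.84      # alpha=0.05 two-sided, power=0.8
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    return ceil(n)

# Detecting a 3.0% -> 3.3% move takes tens of thousands of users per arm;
# a rarely-used feature may never get there within a sane test window.
print(sample_size_per_arm(0.030, 0.033))
```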
27. 27. EXERCISE 4. Complete the sentence: A/B testing is NOT about __________ (making money). You do it to find out what works and how well.
28. 28. You can successfully run tests that have no chance of success.
29. 29. Cheat: experiment to test significance. Test results show that… removing a feature… slowing down the website… didn't reduce conversion.
30. 30. Cheat: test significance. Test results show that… we shouldn't waste time on that.
32. 32. Cheat: measure against your hypothesis. Test results show that… adding videos had no impact on conversion (INCONCLUSIVE)… people don't click "watch video" links (CONCLUSIVE).
33. 33. A great presentation by Etsy: goo.gl/WQpY65
34. 34. The benefit you get from A/B testing is knowledge, not revenue.
35. 35. The benefit you get from A/B testing is knowledge, not revenue. Revenue will come as a result of applied knowledge.
36. 36. How to be the best monkey in the biz? Don't be a monkey. Don't be a gnome either.
38. 38. 3 out of 4 companies that are A/B testing make changes based on intuition or best practices. (50% of companies are A/B testing, 50% are not.) http://conversionxl.com/ab-tests-fail/
39. 39. EXERCISE 5. Solve the equation: collect underpants + ? = profit.
40. 40. A/B Testing Flow, the Fail Fast approach: an A/B test is launched; test results come back negative; the idea gets killed; the next test is launched.
41. 41. One failed test doesn’t make collecting underpants a bad idea.
42. 42. Example of A/B Testing Flow at Spotify: prepare for failure. Pre-test research is done. Users' behaviors are logged. The A/B test is launched and users are surveyed alongside it. Test results come back negative. Survey responses give a clue why; respondents' logs give another. Respondents are emailed to clarify the issue. The issue is solved, the test relaunched. Courtesy of @bendressler, researcher at Spotify.
43. 43. The real price you pay for not researching why tests fail is the death of great ideas.
44. 44. Evidence-Led Flow: Hypothesis-Based A/B Testing. Insight and Evidence (User Testing, Voice of Customer, Qual/Quant Analytics) -> hypothesis: "I predict that doing B will change X by Y% because of Z." -> Metrics-Based Evaluation (are metrics good?): Accepted or Rejected -> Hypothesis check: what really happened?
45. 45. Evidence-Led Flow: Hypothesis-Based A/B Testing. Insight and Evidence (User Testing, Voice of Customer) -> hypothesis: "I predict that doing B will change X by Y% because of Z." -> Metrics-Based Evaluation (are metrics good?): Accepted or Rejected -> Hypothesis check: what really happened?
46. 46. UseItBetter, the Platform for Evidence-Led Experimentation at Scale: collect behavioral data (1 TB behavioural raw data, 40M unique interactions); build segmentation rules (41 sets of rules created); explore, analyze, visualize; quantify an opportunity; translate an insight into a test. (Average stats per website from the last month.)
47. 47. An analyst researching for an infinite amount of time will almost surely get you to a 100% hit ratio. Which isn't good either.
48. 48. If you are going to A/B test:
49. 49. 1. Never waste your traffic.
50. 50. 1. Never waste your traffic. 2. Many small changes are better than one big change.
51. 51. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight.
52. 52. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure.
53. 53. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It's OK to fail if you know why you failed.
54. 54. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It's OK to fail if you know why you failed. 6. Iterate.
55. 55. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It's OK to fail if you know why you failed. 6. Iterate. 7. Be honest.
56. 56. Disclaimer: for the sake of this presentation, I assumed that the results of the 7 tests I referred to had been correctly read out by people familiar with terms like statistical significance, confidence intervals, p-values, etc. Otherwise, it's likely that the one winning test was just a phantom.
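The disclaimer's terms can be made concrete with a minimal two-sided two-proportion z-test, one common way to read out an A/B test; the visit and conversion counts below are invented:

```python
# A minimal read-out of an A/B test: a two-sided two-proportion
# z-test with a pooled rate. The visit/conversion counts are invented.
from math import erf, sqrt

def ab_p_value(conv_a, n_a, conv_b, n_b):
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value

p = ab_p_value(300, 10_000, 360, 10_000)  # 3.0% vs 3.6% conversion
print(f"p = {p:.3f}")  # p < 0.05 here, so unlikely to be a phantom
```

A p-value above the chosen threshold means the "win" is consistent with noise, which is exactly the phantom-winner risk the disclaimer warns about.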
57. 57. THE FINAL EXERCISE. Get in touch: Łukasz Twardowski, https://linkedin.com/in/twardowski