A/B Testing and the Infinite Monkey Theory



Surveys show that, on average, only 1 out of 7 A/B tests run by e-commerce businesses turns out to be successful. Lukasz Twardowski, the CEO of UseItBetter, explains how some of the most successful online businesses master this process, turning it into an iterative, evidence-led programme of experimentation at scale.

Published in: Data & Analytics


  1. A/B Testing and the Infinite Monkey Theorem. Lukasz Twardowski, www.useitbetter.com (image: http://en.wikipedia.org/wiki/Portraits_of_Shakespeare)
  2. A monkey hitting keys at random for an infinite amount of time will almost surely type the complete works of William Shakespeare.
  3. A monkey A/B testing at random for an infinite amount of time will almost surely reach the conversion rate of Amazon.
  4. THEORY: A/B testing helps find out which of two versions performs better while both run simultaneously.
  5. We do this because every day is different, unlike in the movie Groundhog Day (1993, Dir. Harold Ramis).
  6. A single change, bad or good, will not change a trend. Unless a change is A/B tested, you won’t know its impact. http://nerds.airbnb.com/experiments-at-airbnb/
  7. Why the monkey metaphor?
  8. EXERCISE 1. Provide the benchmark: the industry average hit rate for A/B testing = ?
  9. EXERCISE 1. Provide the benchmark: the industry average hit rate for A/B testing = 14%. Just 1 out of 7 A/B tests is successful! http://conversionxl.com/ab-tests-fail/
  10. How to be the greatest monkey in the biz if infinity is not an option? (Image: King Kong, 1933, Dir. Merian Cooper, Ernest Schoedsack)
  11. How to be the best monkey in the biz? Be a quick monkey.
  12. EXERCISE 2. Do the math: 1 out of 7 tests wins × 2 weeks per test = slow growth. Unless you experiment at scale.
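Exercise 2 can be worked through in a few lines. A minimal sketch, assuming every test runs for exactly two weeks, only one test runs at a time in the sequential case, and the 1-in-7 hit rate holds at any volume; the 101 experiments a month is the Shop Direct figure quoted on slide 16, used here only for comparison.

```python
HIT_RATE = 1 / 7        # 1 out of 7 tests wins (slide 9)
WEEKS_PER_YEAR = 52


def expected_wins_per_year(tests_per_year: float, hit_rate: float = HIT_RATE) -> float:
    """Expected number of winning tests in a year at a constant hit rate."""
    return tests_per_year * hit_rate


# One test at a time, each running two weeks.
sequential_tests = WEEKS_PER_YEAR / 2          # 26 tests per year
sequential_wins = expected_wins_per_year(sequential_tests)

# Experimenting at scale: 101 tests per month (Shop Direct, slide 16).
scaled_tests = 101 * 12                        # 1,212 tests per year
scaled_wins = expected_wins_per_year(scaled_tests)

print(f"Sequential: {sequential_tests:.0f} tests/year -> {sequential_wins:.1f} wins")
print(f"At scale:   {scaled_tests} tests/year -> {scaled_wins:.1f} wins")
```

Under these assumptions the sequential approach yields fewer than four winners a year, while the same hit rate at 101 tests a month yields about 173.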
  15. The currency in which you pay for A/B tests is traffic. The more you have, the more tests you can run. Never waste what you have.
  16. Shop Direct, a 100+ year old company: scaled to 101 experiments a month in two years. Etsy, a startup launched in 2005: 25 releases a day, most of them A/B tests. http://www.slideshare.net/danmckinley/design-for-continuous-experimentation
  17. Zero Tests Per Month. Here’s the test idea, numbers and execution. Can we proceed? Let’s meet to discuss. Maybe next week? Looks good. Will check with Z and get back to you. So here’s the test idea, numbers… Sorry, had other priorities. Can we meet next week? Sure! (D***!) Have you checked with Z? Have you…? Have you…?
  18. Ground rules: 1. Test ideas are subject to prioritization, not approval.
  19. EXERCISE 3. Magic formula: evidence × opportunity size × strategy = priority. The worst idea gets tested if resources are available.
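The magic formula from slide 19 can be sketched as a scoring function that fills whatever test slots are free, so ideas are ranked rather than approved. The 1 to 5 scales, the `TestIdea` fields and the backlog entries are hypothetical illustrations; the deck only gives the product evidence × opportunity size × strategy.

```python
from dataclasses import dataclass


@dataclass
class TestIdea:
    name: str
    evidence: int          # hypothetical 1-5 score: strength of the research behind the idea
    opportunity_size: int  # hypothetical 1-5 score: how much traffic or revenue the change touches
    strategy_fit: int      # hypothetical 1-5 score: alignment with the current business focus

    @property
    def priority(self) -> int:
        # evidence x opportunity size x strategy = priority
        return self.evidence * self.opportunity_size * self.strategy_fit


def schedule(backlog: list[TestIdea], slots: int) -> list[TestIdea]:
    """Fill the available test slots in priority order; no approval step."""
    return sorted(backlog, key=lambda idea: idea.priority, reverse=True)[:slots]


backlog = [
    TestIdea("Checkout: show delivery date", evidence=4, opportunity_size=5, strategy_fit=5),
    TestIdea("Home page: new hero banner", evidence=2, opportunity_size=3, strategy_fit=2),
    TestIdea("Basket: cross-sell widget", evidence=3, opportunity_size=4, strategy_fit=4),
]
for idea in schedule(backlog, slots=2):
    print(idea.priority, idea.name)
```

With `slots=3` the lowest-scoring idea is scheduled too, which is the slide’s point: the worst idea gets tested if resources are available. When the strategy shifts (for example to checkout optimization or basket value), only the `strategy_fit` scores need updating.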
  20. 101 Tests Per Month. Ok then, we’ll do this, this and that test. Others will wait. Guys, our strategy shifted to checkout optimization. Guys, we need to increase basket value. Now this and that one… And this… These two would work… Xmas is coming! DO NOTHING! …this, this and that…
  21. Ground rules: 2. Accept the fact that things will go wrong.
  22. How to be the best monkey in the biz? Cheat like a monkey.
  23. If 1 out of 7 tests wins, what about the other 6?
  24. EXERCISE 1. What was the result of the Button Colors Test by Groove? https://www.groovehq.com/blog/failed-ab-tests
  25. If 1 out of 7 tests wins, what about the other 6? 5 of them will be inconclusive.
  26. Most tests are inconclusive because: a) too few users used the changed feature for the result to reach statistical significance; b) the changed feature had little to do with the metrics used to evaluate the test; c) there were multiple changes in the same test and their effects cancelled each other out.
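Reason a) is mostly arithmetic: if only a fraction of visitors ever touch the changed feature, the lift they experience is diluted across everyone in the test, and the required sample size grows roughly with the inverse square of that fraction. A minimal sketch, assuming exposed visitors convert at the same baseline as everyone else and using the normal-approximation sample size formula for a two-proportion z-test; all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_variant(p1: float, p2: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per variant for a two-sided
    two-proportion z-test comparing conversion rates p1 and p2."""
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


BASELINE = 0.03         # hypothetical site-wide conversion rate
LIFT_IF_EXPOSED = 0.10  # hypothetical relative lift for visitors who use the feature

for share_exposed in (1.0, 0.5, 0.1):
    # Only exposed visitors get the lift, so the test-wide lift shrinks with the share.
    diluted_rate = BASELINE * (1 + LIFT_IF_EXPOSED * share_exposed)
    n = n_per_variant(BASELINE, diluted_rate)
    print(f"{share_exposed:.0%} of visitors use the feature -> {n:,} visitors per variant")
```

In this sketch a change seen by 10% of visitors needs close to 100 times the traffic of one seen by everyone, which is why such tests so often end up inconclusive.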
  27. EXERCISE 4. Complete the sentence: A/B testing is NOT about __________. Answer: making money. You do it to find out what works and how well.
  28. You can successfully run tests that have no chance of success.
  29. Cheat: Experiment to test significance. Test results show that… removing a feature… slowing down the website… didn’t reduce conversion.
  30. Cheat: test significance. Test results show that… we shouldn’t waste time on that.
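“Didn’t reduce conversion” is a claim that needs its own check: a non-significant drop is not the same as no drop. A minimal sketch of a non-inferiority check, assuming a Wald (normal-approximation) 95% confidence interval for the difference in conversion rates and a tolerance margin chosen before the test; the counts and the 0.2 percentage point margin are hypothetical.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(conv_a: int, n_a: int, conv_b: int, n_b: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Wald confidence interval for p_B - p_A (normal approximation)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


def did_not_reduce(conv_a: int, n_a: int, conv_b: int, n_b: int, margin: float) -> bool:
    """Non-inferiority check: True if the whole interval for p_B - p_A lies above
    -margin, i.e. B (e.g. the page without the feature) is not worse than A by
    more than `margin` (absolute, in conversion-rate units)."""
    low, _ = diff_confidence_interval(conv_a, n_a, conv_b, n_b)
    return low > -margin


# Hypothetical results: A keeps the feature, B removes it; margin = 0.2 points.
print(did_not_reduce(3000, 100_000, 2990, 100_000, margin=0.002))  # large test
print(did_not_reduce(30, 1_000, 29, 1_000, margin=0.002))          # small test
```

With 100,000 visitors per variant the whole interval sits above -0.2 points, so the removed feature can be called unimportant within that margin; with 1,000 visitors per variant the same check fails, which means inconclusive rather than harmless.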
  31. Cheat: One change per test. Order matters. Bundled into one test (select products, produce videos, upload, add links, launch test), the result is INCONCLUSIVE. Step by step instead: add links; select products; produce videos; …
  32. Cheat: Measure against your hypothesis. Test results show that… adding videos had no impact on conversion (INCONCLUSIVE), whereas… people don’t click “watch video” links (CONCLUSIVE).
  33. A great presentation by Etsy: goo.gl/WQpY65
  35. The benefit you get from A/B testing is knowledge, not revenue. Revenue will come as a result of applied knowledge.
  36. How to be the best monkey in the biz? Don’t be a monkey. Don’t be a gnome either.
  37. What about the 1 test out of 7 that fails?
  38. 3 out of 4 companies (that are A/B testing) make changes based on intuition or best practices. Chart: 50% NOT A/B testing, 50% A/B testing. http://conversionxl.com/ab-tests-fail/
  39. EXERCISE 5. Solve the equation: collect underpants + ? = profit
  40. A/B Testing Flow, Fail Fast Approach: A/B test is launched. Test results come back negative. The idea gets killed, next test is launched.
  41. One failed test doesn’t make collecting underpants a bad idea.
  42. Prepare for failure. Example of A/B Testing Flow at Spotify (courtesy of @bendressler, researcher at Spotify): Pre-test research is done. Users’ behaviours are logged. A/B test is launched. Users are surveyed alongside the test. Test results come back negative. Survey responses give a clue why. Respondents’ logs give another clue. Respondents are emailed to clarify the issue. The issue is solved, the test relaunched.
  43. The real price you pay for not researching why tests fail is the death of great ideas.
  44. Evidence-Led Flow: Qual/Quant Analytics, User Testing and Voice of Customer → Insight and Evidence → Hypothesis: “I predict that doing B will change X by Y% because of Z.” → Hypothesis-Based A/B Testing → Metrics-Based Evaluation: Are metrics good? → Hypothesis check: Accepted / Rejected → What really happened?
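The hypothesis template on slide 44 maps naturally onto a small record that forces each test to name the metric it expects to move, in line with slide 32’s advice to measure against your hypothesis. A sketch, assuming the hypothesis is accepted only when a confidence interval for the lift on that metric excludes zero in the predicted direction; the `Hypothesis` class, the `check` function and the example change, metric and rationale are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """Hypothetical record for 'I predict that doing B will change X by Y% because of Z.'"""
    change: str            # B: the change being tested
    metric: str            # X: the metric closest to the change
    predicted_lift: float  # Y: predicted relative change, e.g. 0.05 for +5%
    because: str           # Z: the insight or evidence behind the prediction

    def statement(self) -> str:
        return (f"I predict that {self.change} will change {self.metric} "
                f"by {self.predicted_lift:+.0%} because {self.because}.")


def check(h: Hypothesis, ci_low: float, ci_high: float) -> str:
    """Hypothesis check: accept only if the confidence interval for the observed
    lift on h.metric excludes zero in the predicted direction."""
    if h.predicted_lift > 0:
        confirmed = ci_low > 0
    else:
        confirmed = ci_high < 0
    # A rejected hypothesis is the cue to research what really happened.
    return "accepted" if confirmed else "rejected"


hypothesis = Hypothesis(
    change="showing the delivery date at checkout",
    metric="checkout completion rate",
    predicted_lift=0.05,
    because="survey respondents asked when orders would arrive",  # hypothetical insight
)
print(hypothesis.statement())
print(check(hypothesis, ci_low=0.01, ci_high=0.03))
```

Keeping Z in the record means a rejected test still points at the insight that needs re-examining, rather than just killing the idea.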
  46. UseItBetter: The Platform for Evidence-Led Experimentation at Scale. Collect behavioural data. Build segmentation rules. Explore, analyze, visualize. Quantify an opportunity. Translate an insight into a test. Average stats per website from the last month: 1TB of behavioural raw data, 40M unique interactions, 41 sets of rules created.
  47. An analyst researching for an infinite amount of time will almost surely get you to a 100% hit ratio. Which isn’t good either.
  48. If you are going to A/B test:
  55. 1. Never waste your traffic. 2. Many small changes are better than one big change. 3. Even the smallest change needs an insight. 4. Prepare for failure. 5. It’s OK to fail if you know why you failed. 6. Iterate. 7. Be honest.
  56. Disclaimer: For the sake of this presentation, I assumed that the results of the 7 tests I referred to had been read correctly by people familiar with terms like statistical significance, confidence intervals and p-values. Otherwise, it’s likely that the one winning test was just a phantom.
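The disclaimer’s point about phantom winners can be made concrete with a two-sided, two-proportion z-test. A minimal sketch, assuming a pooled standard error and the normal approximation; the visitor and conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a two-proportion z-test with pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical test: 3.0% vs 3.45% conversion, i.e. a 15% relative "lift".
print(two_proportion_p_value(300, 10_000, 345, 10_000))      # 10k visitors per variant
print(two_proportion_p_value(3000, 100_000, 3450, 100_000))  # 100k visitors per variant
```

Here a 15% relative lift on 10,000 visitors per variant still gives a p-value above 0.05, so calling it a winner would risk a phantom, while the same rates on 100,000 visitors per variant are highly significant.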
  57. THE FINAL EXERCISE: Get in touch. Łukasz Twardowski, https://linkedin.com/in/twardowski
