A walkthrough of the most common statistical pitfalls that marketers encounter in CRO testing.

Published in:
Marketing


- 1. YOUR RESULTS ARE INVALID: STATISTICS FOR CRO (@THCapper)
- 2. A Good A/B Test Result: “10% Uplift, With 95% Significance”
- 3. “10% Uplift, With 95% Significance” ● What does this mean? ● Is this correct?
- 4. Easy Questions?
- 5. Most Tools Encourage Mistakes
- 6. Risk
- 7. Marketer: “Roll it out!” Statistician (me): *sobs*
- 8. “That’s Risky”
- 9. “Advanced”
- 10. You will learn today: ● The most common serious errors in A/B testing ● How to avoid them ● How to interpret your result ● Whether to roll it out
- 11. How to Run an A/B Test
- 12. 1. Test design 2. Results interpretation 3. Decision
- 13. Jargon: Null Hypothesis
- 14. Jargon: Null Hypothesis ● The hypothesis that your variant and original are functionally equivalent
- 15. e.g. an A/A Test: A vs. A
- 16. Jargon: P-Value
- 17. Jargon: P-Value ● The chance of a result this extreme if the null hypothesis is true ● E.g. 0.05 for 95% significance
- 18. Jargon: Critical Value
- 19. Jargon: Critical Value ● What you compare your p-value with when deciding whether to reject the null hypothesis
- 20. 1. Test Design
- 21. How Many Tests?
- 22. Multivariate Testing (variants A, B, C, D, E, F; labels: Landing Page, Product Pages)
- 23. Multivariate Testing. Landing Page: A, B. Product Page: C, D
- 24. Multivariate Testing A C D B
- 25. Multivariate Testing A C D B | A C D B
- 26. Multivariate Testing A C D B | A C D B | A C D B
- 27. Multivariate Testing A C D B | A C D B | A C D B | A C D B
- 28. Multivariate Testing. Landing Page: A: 5%, B: 7.5%
- 29. Multivariate Testing. Product Page: C: 5%, D: 7.5%
- 30. Multivariate Testing A C D B | A C D B | A C D B | A C D B
- 31. Multivariate Testing A C D B | A C D B | A C D B | A C D B. AC: 0%, BD: 5%, BC: 10%, AD: 10%
- 32. A B
- 33. “Constantly Iterate”
- 34. Multiple Testing A B C D E F
- 35. False Positives (diagram labels: Test: Healthy, Test: Ill)
- 36. False Positives, False Negatives (diagram labels: Test: Healthy, Test: Ill)
- 37. False Positives, False Negatives (diagram labels: Test: Healthy, Test: Ill)
- 38. Multiple Testing. 1 A/A Test: 5% chance of achieving 95% significance.
- 39. Multiple Testing. 1 A/A Test: 5% chance
- 40. Multiple Testing. 1 A/A Test: 5% chance. 2 A/A Tests: 9.75% chance
- 41. Multiple Testing. 1 A/A Test: 5% chance. 2 A/A Tests: 9.75% chance. 3 A/A Tests: 14.26% chance
- 42. Multiple Testing. 1 A/A Test: 5% chance. 2 A/A Tests: 9.75% chance. 3 A/A Tests: 14.26% chance. 4 A/A Tests: 18.55% chance
- 43. Multiple Testing. 1 A/A Test: 5% chance. 2 A/A Tests: 9.75% chance. 3 A/A Tests: 14.26% chance. 4 A/A Tests: 18.55% chance. n A/A Tests: 1-0.95^n chance
- 44. Multiple Testing Solutions: 1. Accept risk of false positives
- 45. Multiple Testing Solutions: 1. Accept risk of false positives 2. Bonferroni correction
- 46. Bonferroni Approximation. Standard: P-value vs. 0.05
- 47. Bonferroni Approximation. Standard: P-value vs. 0.05. Approximation: P-value vs. 0.05/N
- 48. Bonferroni Correction. Standard: P-value vs. 0.05. Bonferroni: P-value vs. 1-(1-0.05)^(1/N)
- 49. Multiple Testing Solutions: 1. Accept risk of false positives 2. Bonferroni correction 3. Holm-Bonferroni correction
- 50. Choosing the Right Metric
- 51. Choosing the Right Metric Conversion Rate vs. Average Session Value
- 52. Choosing the Right Metric Conversion Rate vs. Average Session Value Profit?
- 53. Stopping Rules
- 54. Stopping Rules Common: When my test reaches significance.
- 55. “Significance so far” varies over time.
- 56. Stopping Rules: Y Y Y Y Y N N N N N
- 57. Stopping Rules: Y Y Y Y Y Y Y N N N
- 58. Stopping Rules 20000
- 59. 20000
- 60. Exceptions https://en.wikipedia.org/wiki/Sequential_probability_ratio_test
- 61. Stopping Rules Solutions: 1. Sequential testing - e.g. Optimizely 2. Bayesian testing - e.g. VWO 3. Predetermined sample size
- 62. evanmiller.org/ab-testing/sample-size.html
- 63. Sample Size for Average Session Value Testing
- 64. Standard Deviation: =stdev(B:B), =stdev.s(B:B)
- 65. powerandsamplesize.com/Calculators/
- 66. Cutting Your Losses
- 67. Test Design Recap: Contamination, Multiple Testing, Metric Choice, Stopping Rules
- 68. 1. Test design 2. Results interpretation 3. Decision
- 69. 2. Results Interpretation
- 70. Interpreting the P-Value
- 71. Interpreting the P-value 1 test reaches 95% significance: 5% chance of data this extreme if variants functionally equivalent.
- 72. 0
- 73. Analogy
- 74. Analogy Question: How likely is it that my analytics or site are broken?
- 75. Analogy Question: How likely is it that my analytics or site are broken? Non-Answer: We only go a whole day with no conversions once every 2 months.
- 76. Analytics is broken with probability 1 or 0.
- 77. Interpreting the P-value Question: How likely is it that this variation actually does nothing? Non-Answer: We’d only see a difference this big 5% of the time.
- 78. Meanwhile in Industry Tools: ● “Chance to beat baseline” ● “We are 95% certain that the changes in test “B” will improve your conversion rate”
- 79. Unanswered Questions
- 80. Unanswered Questions Question: How likely is it that the increase will be less than predicted?
- 81. Unanswered Questions Question: How likely is it that the increase will be negative?
- 82. One Mistake: Probability of Outcome given Data vs. Probability of Data given Null
- 83. Unanswered Questions Question: How likely is it that these results are a fluke?
- 84. Confidence Intervals
- 85. Confidence Interval of Conversion Rate
- 86. Overlapping Confidence Intervals
- 87. Everything Else Still Applies
- 88. Choosing the Right Metric
- 89. evanmiller.org/ab-testing/t-test.html
- 90. Results Interpretation Recap: Check Revenue, P-Value, Confidence Intervals
- 91. 1. Test design 2. Results interpretation 3. Decision
- 92. 3. Decision
- 93. A Good A/B Test Result: “10% Uplift, With 95% Significance”
- 94. But what about this? “10% Uplift, With 60% Significance”
- 95. Jargon: P-Value ● The chance of a result this extreme if the null hypothesis is true ● E.g. 0.05 for 95% significance
- 96. “10% Uplift, With 60% Significance” ● 40% chance of data at least this extreme if variation functionally identical
- 97. “10% Uplift, With 60% Significance” ● 40% chance of data at least this extreme if variation functionally identical ● The variation is probably better than the baseline
- 98. Drug Trials vs. Investment Banking
- 99. Are You OK With False Positives?
- 100. Data is Expensive
- 101. Data is Expensive: ● Opportunity Cost ● Exploration vs. Exploitation
- 102. Historical Comparisons are Invalid
- 103. Hang on… Why Should I Care About Significance?
- 104. 1. Ignoring Significance Doesn’t Allow You to Ignore Statistics
- 105. 2. Risk Aversion
- 106. Risk Factors: ● Agility ● Business attitudes ● What’s the worst that could happen?
- 107. Decision Recap: Significant vs. Winning, Risk, Exploration vs. Exploitation
- 108. Conclusion: 3 Takeaways
- 109. 1. Think about significance and risk during test design
- 110. 2. Remember your real KPI: Profit
- 111. 3. You’re not testing medicines
- 112. @THCapper Takeaways: 1. Think about significance and risk during test design 2. Remember your real KPI: Profit 3. You’re not testing medicines
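
The multivariate example on slides 28 to 31 can be checked with a few lines of arithmetic. The sketch below assumes that each page-level figure is the unweighted average of the two combinations that contain that variant (in other words, an even traffic split), which is consistent with the numbers on the slides. The names `combo_result` and `page_level_result` are illustrative only.

```python
# Combination-level figures from slide 31, in percent.
combo_result = {
    ("A", "C"): 0.0,
    ("A", "D"): 10.0,
    ("B", "C"): 10.0,
    ("B", "D"): 5.0,
}

def page_level_result(variant: str) -> float:
    """Average the combinations containing `variant` (assumes an even traffic split)."""
    values = [result for combo, result in combo_result.items() if variant in combo]
    return sum(values) / len(values)

# Page-level view: what you would see if each page were analysed on its own.
page_level = {variant: page_level_result(variant) for variant in "ABCD"}
print(page_level)

# Picking the page-level winners (B on the landing page, D on the product page)
# gives the combination BD, which is not the best combination overall.
naive_pick = ("B", "D")
best_value = max(combo_result.values())
best_combos = sorted(combo for combo, result in combo_result.items() if result == best_value)
print("Page-level winners combined:", naive_pick, combo_result[naive_pick])
print("Best combinations:", best_combos, best_value)
```

Under that assumption the page-level figures reproduce slides 28 and 29 (A: 5%, B: 7.5%, C: 5%, D: 7.5%), while the combination of the two page-level winners, BD, sits at 5% against 10% for BC and AD.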
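
The percentages on slides 38 to 43 come from the formula 1-0.95^n given on slide 43. A minimal sketch, assuming the A/A tests are independent of each other; the function name is illustrative.

```python
def chance_of_any_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent A/A tests reaches significance."""
    return 1 - (1 - alpha) ** num_tests

for n in (1, 2, 3, 4):
    print(f"{n} A/A test(s): {chance_of_any_false_positive(n):.2%} chance")
```

For n = 1 to 4 this prints 5.00%, 9.75%, 14.26% and 18.55%, matching the slides.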
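
The corrections on slides 45 to 49 can be sketched as per-test thresholds. One naming note: outside this deck, 0.05/N is usually the formula called the Bonferroni correction, and the exact form 1-(1-0.05)^(1/N) is usually called the Šidák correction. The Holm-Bonferroni function below is the standard step-down procedure, not something spelled out on the slides, and all function names are illustrative.

```python
def approximate_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Slide 47: alpha / N (commonly called the Bonferroni correction)."""
    return alpha / n_tests

def exact_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Slide 48: 1 - (1 - alpha)^(1/N) (commonly called the Sidak correction)."""
    return 1 - (1 - alpha) ** (1 / n_tests)

def holm_bonferroni(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Step-down procedure: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, index in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[index] <= alpha / (m - rank):
            rejected[index] = True
        else:
            break  # Once one test fails, all larger p-values fail too.
    return rejected

p_values = [0.01, 0.02, 0.04]  # made-up p-values, for illustration only
print(approximate_threshold(3), exact_threshold(3))
print([p <= approximate_threshold(3) for p in p_values])
print(holm_bonferroni(p_values))
```

With these made-up p-values, comparing every test against 0.05/3 rejects only the first, while Holm-Bonferroni rejects all three. That is why it is offered as a less conservative option.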
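
Slide 61 lists a predetermined sample size as one fix for stopping-rule problems, and slide 62 links to a calculator. As a rough sketch, the textbook normal-approximation formula for comparing two conversion rates looks like this. It is not necessarily the formula the linked calculator uses. The two-sided 0.05 threshold, the 80% power, and the example inputs are all assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(
    baseline_rate: float,
    relative_uplift: float,
    alpha: float = 0.05,   # assumed two-sided significance threshold
    power: float = 0.80,   # assumed chance of detecting a real uplift of this size
) -> int:
    """Visitors needed in EACH variant to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical inputs: 5% baseline conversion rate, 10% relative uplift.
print(sample_size_per_variant(0.05, 0.10))
```

The required sample grows quickly as the uplift you want to detect shrinks, so the number is best fixed during test design, before any data is collected.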
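
For the p-value on slides 17 and 71 and the confidence interval on slide 85, a minimal sketch using a pooled two-proportion z-test and a normal-approximation (Wald) interval. Both are common choices but are assumptions here, since the slides do not say which test or interval they use. The visitor and conversion counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value: chance of a gap at least this large if A and B are equivalent."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def conversion_rate_interval(conversions: int, visitors: int, confidence: float = 0.95):
    """Normal-approximation confidence interval for a single conversion rate."""
    rate = conversions / visitors
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return rate - margin, rate + margin

# Made-up data: 500/10,000 conversions for A, 550/10,000 for B (a 10% relative uplift).
p_value = two_proportion_p_value(500, 10_000, 550, 10_000)
print(f"p-value: {p_value:.3f}  ('significance': {1 - p_value:.1%})")
print("A:", conversion_rate_interval(500, 10_000))
print("B:", conversion_rate_interval(550, 10_000))
```

As slides 77 and 82 stress, the p-value is the probability of data this extreme given the null hypothesis. It is not the probability that the variation does nothing, and `1 - p_value` is not a "chance to beat baseline".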
