
# Statistics for CRO - Conversion Conference London

A walkthrough of the most common statistical pitfalls that marketers encounter in CRO testing.

Published in: Marketing
• By the way: I see you also refer to Evan Miller's sample size calculator. Based on that, I created a spreadsheet to calculate sample sizes for multiple websites/segments all at once: http://www.gxjansen.com/compute-required-sample-size-for-your-ab-tests/, would love to know what you think of that one :)

• Nice! Do you also have the audio (or maybe even video) for this presentation?


### Statistics for CRO - Conversion Conference London

1. YOUR RESULTS ARE INVALID: STATISTICS FOR CRO (@THCapper)
2. A Good A/B Test Result: “10% Uplift, With 95% Significance”
3. “10% Uplift, With 95% Significance” ● What does this mean? ● Is this correct?
4. Easy Questions?
5. Most Tools Encourage Mistakes
6. Risk
7. Marketer: “Roll it out!” Statistician (me): *sobs*
8. “That’s Risky”
9. “Advanced”
10. What you will learn today: ● The most common serious errors in A/B testing ● How to avoid them ● How to interpret your result ● Whether to roll it out
11. How to Run an A/B Test
12. (1) Test design (2) Results interpretation (3) Decision
13. Jargon: Null Hypothesis
14. Jargon: Null Hypothesis ● The hypothesis that your variant and original are functionally equivalent
15. E.g. an A/A test: A vs. A
16. Jargon: P-Value
17. Jargon: P-Value ● The chance of a result at least this extreme if the null hypothesis is true ● E.g. 0.05 for 95% significance
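18. Jargon: Critical Value
19. Jargon: Critical Value ● What you compare your p-value with when deciding whether to reject the null hypothesis (strictly, this threshold is the significance level α, e.g. 0.05; the critical value is the matching cutoff for the test statistic)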
20. (1) Test Design
21. How Many Tests?
22. Multivariate Testing [diagram: variants A, B, C, D, E and F across a landing page and product pages]
23. Multivariate Testing [diagram: landing page variants A and B; product page variants C and D]
24. Multivariate Testing [slides 24 to 27: diagram build combining landing page variants A/B with product page variants C/D]
28. Multivariate Testing, Landing Page: A: 5%, B: 7.5%
29. Multivariate Testing, Product Page: C: 5%, D: 7.5%
30. Multivariate Testing [diagram: all four landing page/product page combinations]
31. Multivariate Testing, by combination: AC: 0%, BD: 5%, BC: 10%, AD: 10%
32. A vs. B
33. “Constantly Iterate”
34. Multiple Testing [diagram: variants A, B, C, D, E and F]
35. False Positives [diagram: people labelled “Test: Healthy” and “Test: Ill”]
36. False Positives [diagram: people labelled “Test: Healthy” and “Test: Ill”; adds False Negatives]
37. False Positives and False Negatives [diagram: people labelled “Test: Healthy” and “Test: Ill”]
38. Multiple Testing: 1 A/A test: 5% chance of achieving 95% significance.
39. Multiple Testing: 1 A/A test: 5% chance
40. Multiple Testing: 1 A/A test: 5% chance; 2 A/A tests: 9.75% chance
41. Multiple Testing: 1 A/A test: 5% chance; 2 A/A tests: 9.75% chance; 3 A/A tests: 14.26% chance
42. Multiple Testing: 1 A/A test: 5% chance; 2 A/A tests: 9.75% chance; 3 A/A tests: 14.26% chance; 4 A/A tests: 18.55% chance
43. Multiple Testing: 1 A/A test: 5% chance; 2 A/A tests: 9.75% chance; 3 A/A tests: 14.26% chance; 4 A/A tests: 18.55% chance; n A/A tests: 1 − 0.95^n chance
44. Multiple Testing Solutions: (1) Accept risk of false positives
45. Multiple Testing Solutions: (1) Accept risk of false positives (2) Bonferroni correction
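46. Bonferroni Approximation: Standard: compare p-value with 0.05
47. Bonferroni Approximation: Standard: compare p-value with 0.05; Approximation: compare p-value with 0.05/N
48. Bonferroni Correction: Standard: compare p-value with 0.05; Bonferroni: compare p-value with 1 − (1 − 0.05)^(1/N) (this exact form, which assumes independent tests, is more commonly called the Šidák correction; 0.05/N is the Bonferroni correction proper)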
49. Multiple Testing Solutions: (1) Accept risk of false positives (2) Bonferroni correction (3) Holm-Bonferroni correction
50. Choosing the Right Metric
51. Choosing the Right Metric: Conversion Rate vs. Average Session Value
52. Choosing the Right Metric: Conversion Rate vs. Average Session Value. Profit?
53. Stopping Rules
54. Stopping Rules. Common: “When my test reaches significance.”
55. “Significance so far” varies over time.
56. Stopping Rules [diagram: Y Y Y Y Y N N N N N]
57. Stopping Rules [diagram: Y Y Y Y Y Y Y N N N]
58. Stopping Rules [chart; only the axis value 20000 survives]
59. [chart; only the axis value 20000 survives]
60. Exceptions: the sequential probability ratio test (https://en.wikipedia.org/wiki/Sequential_probability_ratio_test)
61. Stopping Rules Solutions: (1) Sequential testing, e.g. Optimizely (2) Bayesian testing, e.g. VWO (3) Predetermined sample size
62. Sample size calculator: evanmiller.org/ab-testing/sample-size.html
63. Sample Size for Average Session Value Testing
64. Standard Deviation: =stdev(B:B) or =stdev.s(B:B)
65. Power and sample size calculators: powerandsamplesize.com/Calculators/
66. Cutting Your Losses
67. Test Design Recap: Contamination, Multiple Testing, Metric Choice, Stopping Rules
68. (1) Test design (2) Results interpretation (3) Decision
69. (2) Results Interpretation
70. Interpreting the P-Value
71. Interpreting the P-value: 1 test reaches 95% significance: if the variants are functionally equivalent, there was only a 5% chance of data at least this extreme.
72. [chart; only the value 0 survives]
73. Analogy
74. Analogy. Question: How likely is it that my analytics or site are broken?
75. Analogy. Question: How likely is it that my analytics or site are broken? Non-Answer: We only go a whole day with no conversions once every 2 months.
76. Either analytics is broken or it isn't: the probability is 1 or 0.
77. Interpreting the P-value. Question: How likely is it that this variation actually does nothing? Non-Answer: We’d only see a difference this big 5% of the time.
78. Meanwhile, in industry tools: ● “Chance to beat baseline” ● “We are 95% certain that the changes in test ‘B’ will improve your conversion rate”
79. Unanswered Questions
80. Unanswered Questions. Question: How likely is it that the increase will be less than predicted?
81. Unanswered Questions. Question: How likely is it that the increase will be negative?
82. One Mistake: confusing the probability of an outcome given the data with the probability of the data given the null hypothesis
83. Unanswered Questions. Question: How likely is it that these results are a fluke?
84. Confidence Intervals
85. Confidence Interval of Conversion Rate
86. Overlapping Confidence Intervals
87. Everything Else Still Applies
88. Choosing the Right Metric
89. Two-sample t-test calculator: evanmiller.org/ab-testing/t-test.html
90. Results Interpretation Recap: Check Revenue, P-Value, Confidence Intervals
91. (1) Test design (2) Results interpretation (3) Decision
92. (3) Decision
93. A Good A/B Test Result: “10% Uplift, With 95% Significance”
94. But what about this? “10% Uplift, With 60% Significance”
95. Jargon: P-Value ● The chance of a result at least this extreme if the null hypothesis is true ● E.g. 0.05 for 95% significance
96. “10% Uplift, With 60% Significance” ● 40% chance of data at least this extreme if the variation is functionally identical
97. “10% Uplift, With 60% Significance” ● 40% chance of data at least this extreme if the variation is functionally identical ● The variation is probably better than the baseline
98. Drug Trials vs. Investment Banking
99. Are You OK With False Positives?
100. Please Don’t Tweet This:
101. Tweet This Instead:
102. Data is Expensive
103. Data is Expensive: ● Opportunity Cost ● Exploration vs. Exploitation
104. Historical Comparisons are Invalid
105. Hang on… Why Should I Care About Significance?
106. (1) Ignoring Significance Doesn’t Allow You to Ignore Statistics
107. (2) Risk Aversion
108. Risk Factors: ● Agility ● Business attitudes ● What’s the worst that could happen?
109. Decision Recap: Significant vs. Winning, Risk, Exploration vs. Exploitation
110. Conclusion: 3 Takeaways
111. (1) Think about significance and risk during test design
112. (2) Remember your real KPI: Profit
113. (3) You’re not testing medicines
114. @THCapper Takeaways: (1) Think about significance and risk during test design (2) Remember your real KPI: Profit (3) You’re not testing medicines
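Slides 28 to 31 show why testing page elements in separate A/B tests can mislead. This minimal sketch assumes equal traffic to each of the four landing page/product page combinations (an assumption; the slides do not say how traffic was split). It reproduces the per-page rates on slides 28 and 29 from the combination rates on slide 31. Separate tests would pick B and D, yet BD converts at only 5%, half the rate of BC or AD.

```python
# Conversion rate per landing-page/product-page combination (slide 31).
# Assumption: traffic is split equally across the four combinations.
COMBO_RATES = {
    ("A", "C"): 0.00,
    ("B", "D"): 0.05,
    ("B", "C"): 0.10,
    ("A", "D"): 0.10,
}


def marginal_rate(variant):
    """Conversion rate a separate A/B test on one page would report."""
    rates = [rate for combo, rate in COMBO_RATES.items() if variant in combo]
    return sum(rates) / len(rates)


landing_winner = max(["A", "B"], key=marginal_rate)  # B: 7.5% vs. 5%
product_winner = max(["C", "D"], key=marginal_rate)  # D: 7.5% vs. 5%
combined = COMBO_RATES[(landing_winner, product_winner)]
best_combo = max(COMBO_RATES, key=COMBO_RATES.get)

print(f"Separate tests pick {landing_winner}{product_winner}: {combined:.0%}")
print(f"Best combination is {''.join(best_combo)}: {COMBO_RATES[best_combo]:.0%}")
```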
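Slides 38 to 43 give the chance that at least one of n A/A tests reaches 95% significance. The formula 1 − 0.95^n assumes the tests are independent. This sketch simply evaluates it for a few values of n.

```python
def familywise_false_positive_rate(n_tests, alpha=0.05):
    """Chance that at least one of n independent A/A tests looks 'significant'.

    Each test has probability alpha of a false positive, so the chance that
    none of them fires is (1 - alpha) ** n_tests.
    """
    return 1 - (1 - alpha) ** n_tests


for n in (1, 2, 3, 4, 10, 20):
    print(f"{n:>2} A/A tests: {familywise_false_positive_rate(n):.2%}")
```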
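Slides 46 to 48 compare two per-test thresholds for N simultaneous tests. In standard statistical usage, α/N is the Bonferroni correction and 1 − (1 − α)^(1/N) is the Šidák correction. When the tests are independent, Šidák holds the family-wise false positive rate at exactly α. This sketch computes both thresholds and checks that property.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold alpha / N (slide 47)."""
    return alpha / n_tests


def sidak_threshold(alpha, n_tests):
    """Per-test threshold 1 - (1 - alpha) ** (1 / N) (slide 48)."""
    return 1 - (1 - alpha) ** (1 / n_tests)


for n in (1, 2, 5, 10):
    b, s = bonferroni_threshold(0.05, n), sidak_threshold(0.05, n)
    # With the exact threshold, N independent tests have a 5% family-wise rate.
    fwer = 1 - (1 - s) ** n
    print(f"N={n:>2}: alpha/N={b:.5f}  exact={s:.5f}  family-wise rate={fwer:.3f}")
```

The α/N version is always slightly stricter, which is why it works as a simple, conservative approximation.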
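Slide 49 lists the Holm-Bonferroni correction. Here is a minimal sketch of Holm's step-down procedure. The p-values are made up for illustration. Holm never rejects fewer hypotheses than plain Bonferroni and often rejects more.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list aligned with p_values: True where the null hypothesis is
    rejected. The smallest p-value is compared with alpha/m, the next with
    alpha/(m-1), and so on; testing stops at the first p-value that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # every remaining (larger) p-value is also retained
        rejected[i] = True
    return rejected


# Hypothetical p-values from four variations tested against one baseline.
p_values = [0.01, 0.02, 0.03, 0.005]
print("Holm:      ", holm_bonferroni(p_values))
print("Bonferroni:", [p <= 0.05 / len(p_values) for p in p_values])
```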
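Slides 53 to 59 warn against stopping a test as soon as it reaches significance. This simulation is a sketch under hypothetical settings: a 5% conversion rate and ten checks of 500 visitors per arm. It runs A/A tests, where every "significant" result is a false positive. It then compares checking after every batch with checking once at a predetermined sample size, using a pooled two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions at all (or all converted): no evidence
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_peeking(n_experiments=200, n_checks=10, batch=500,
                     rate=0.05, alpha=0.05, seed=1):
    """Return (false positive rate when peeking, rate at a fixed horizon).

    Both arms share the same true conversion rate, so every win is false.
    """
    rng = random.Random(seed)
    peeking_wins = fixed_wins = 0
    for _ in range(n_experiments):
        conv_a = conv_b = visitors = 0
        significant = []
        for _ in range(n_checks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            visitors += batch
            p = two_proportion_p_value(conv_a, visitors, conv_b, visitors)
            significant.append(p < alpha)
        peeking_wins += any(significant)  # would have stopped with a "winner"
        fixed_wins += significant[-1]     # only the final check counts
    return peeking_wins / n_experiments, fixed_wins / n_experiments


peeking_rate, fixed_rate = simulate_peeking()
print(f"False positives, stop when significant: {peeking_rate:.1%}")
print(f"False positives, predetermined sample size: {fixed_rate:.1%}")
```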
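Slide 60 points to the sequential probability ratio test as an exception to the stopping rule. Here is a minimal sketch of Wald's SPRT for a single stream of visitors. It tests whether the conversion rate is p0 or p1, two values the tester chooses in advance. This is not how Optimizely or any other tool implements sequential testing, and comparing two variants needs a different likelihood ratio.

```python
from math import log


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.10):
    """Wald's sequential probability ratio test for a conversion rate.

    H0: rate = p0, H1: rate = p1 (p1 > p0). Returns (decision, n_used), where
    decision is "reject H0", "accept H0" or "continue" if the data ran out.
    """
    upper = log((1 - beta) / alpha)  # crossing it rejects H0
    lower = log(beta / (1 - alpha))  # crossing it accepts H0
    llr = 0.0
    n = 0
    for n, converted in enumerate(observations, start=1):
        # Add this visitor's contribution to the log-likelihood ratio.
        llr += log(p1 / p0) if converted else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject H0", n
        if llr <= lower:
            return "accept H0", n
    return "continue", n


print(sprt_bernoulli([False] * 100, p0=0.05, p1=0.10))
```

Unlike "stop when significant", the boundaries are fixed before the data arrive, so checking after every visitor is part of the design.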
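Slide 62 points to a sample size calculator. This sketch uses a common normal-approximation formula for a two-sided two-proportion test. That formula may not be the one behind the linked calculator, so its numbers can differ slightly. The baseline rate, uplift, significance level and power are inputs you choose.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variation(baseline, relative_uplift, alpha=0.05, power=0.80):
    """Visitors needed in EACH variation for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 5% baseline conversion rate, 10% relative uplift.
n_needed = sample_size_per_variation(0.05, 0.10)
print(f"Visitors needed per variation: {n_needed}")
```

Fixing this number before the test starts is the "predetermined sample size" option on slide 61.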
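For average session value (slides 63 to 65), the sample size depends on the standard deviation of session value. Slide 64 gets this from `=stdev(B:B)` or `=stdev.s(B:B)`, and Python's `statistics.stdev` is likewise the sample standard deviation. The session values and minimum detectable difference below are hypothetical. The formula is the usual normal approximation for comparing two means with equal group sizes.

```python
from math import ceil
from statistics import NormalDist, stdev


def sample_size_for_mean(sd, min_difference, alpha=0.05, power=0.80):
    """Sessions per variation to detect a difference in mean session value.

    Normal approximation for a two-sided test of two means, equal group
    sizes, and a common standard deviation `sd`.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / min_difference ** 2)


# Hypothetical session values (the role of column B in the slide 64 formulas):
# most sessions are worth nothing, a few are worth a lot.
session_values = [0, 0, 0, 50, 100]
sd = stdev(session_values)  # sample standard deviation, like =stdev.s(B:B)
n_sessions = sample_size_for_mean(sd, min_difference=5)
print(f"standard deviation {sd:.2f}; sessions needed per variation: {n_sessions}")
```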
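Slides 94 to 97 treat "60% significance" as one minus the p-value. This sketch computes both the uplift and that figure using a pooled two-proportion z-test, one common choice (tools may use other tests). The visitor counts are hypothetical: a 10% relative uplift on 2,000 visitors per arm lands well short of 95% significance.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Relative uplift of B over A and two-sided p-value (pooled z-test)."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b / rate_a - 1, p_value


# Hypothetical data: 5.0% vs. 5.5% conversion with 2,000 visitors per arm.
uplift, p = two_proportion_test(100, 2000, 110, 2000)
print(f"{uplift:.0%} uplift, {1 - p:.0%} 'significance' (p = {p:.2f})")
```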
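Slide 78 quotes tools that report a "chance to beat baseline". A Bayesian posterior probability is one way to produce such a number. This sketch assumes uniform Beta(1, 1) priors and estimates P(rate_B > rate_A) by Monte Carlo. It illustrates the idea and does not describe how VWO or any other tool computes its figure. The counts are hypothetical.

```python
import random


def chance_to_beat_baseline(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 5.0% vs. 5.5% conversion with 10,000 visitors per arm.
chance_b = chance_to_beat_baseline(500, 10_000, 550, 10_000)
print(f"Chance that B beats A: {chance_b:.1%}")
```

This answers a different question from the p-value on slide 77, and it depends on the prior.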
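Slide 85 covers the confidence interval of a conversion rate. This sketch uses the Wilson score interval, a standard choice that behaves better than the simple normal approximation at low conversion rates. The slides do not say which interval they use, and the counts here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(conversions, visitors, confidence=0.95):
    """Wilson score confidence interval for a single conversion rate."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = conversions / visitors
    denom = 1 + z ** 2 / visitors
    centre = (p + z ** 2 / (2 * visitors)) / denom
    margin = z * sqrt(p * (1 - p) / visitors + z ** 2 / (4 * visitors ** 2)) / denom
    return centre - margin, centre + margin


# Hypothetical data: 50 conversions from 1,000 visitors (5.0%).
low, high = wilson_interval(50, 1000)
print(f"95% CI for the conversion rate: {low:.2%} to {high:.2%}")
```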
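Slide 86 concerns overlapping confidence intervals. Overlap between the two individual intervals does not by itself mean the difference is non-significant, because the interval for the difference can still exclude zero. The hypothetical counts below show this using simple normal-approximation intervals.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)


def wald_interval(conversions, visitors):
    """Normal-approximation 95% interval for one conversion rate."""
    p = conversions / visitors
    half = Z95 * sqrt(p * (1 - p) / visitors)
    return p - half, p + half


def difference_interval(conv_a, n_a, conv_b, n_b):
    """Normal-approximation 95% interval for rate_B - rate_A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - Z95 * se, diff + Z95 * se


# Hypothetical data: 5.0% vs. 5.8% conversion with 10,000 visitors per arm.
ci_a, ci_b = wald_interval(500, 10_000), wald_interval(580, 10_000)
diff_low, diff_high = difference_interval(500, 10_000, 580, 10_000)
print("individual intervals overlap:", ci_a[1] > ci_b[0])
print(f"difference CI: {diff_low:.4f} to {diff_high:.4f}")
```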
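Slide 89 links a two-sample t-test calculator for metrics such as average session value. Here is a sketch of Welch's t-test, which does not assume equal variances. Python's standard library has no t distribution, so the p-value uses a normal approximation. That approximation is only reasonable for large samples, not for the toy data shown.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's t statistic, degrees of freedom and an approximate p-value.

    The p-value uses a normal approximation, reasonable only when both
    samples are large (typical for A/B tests).
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Toy data: hypothetical session values for two variations.
control = [1, 2, 3, 4, 5]
variation = [2, 3, 4, 5, 6]
t, df, p_approx = welch_t_test(control, variation)
print(f"t = {t:.2f}, df = {df:.1f}, approximate p = {p_approx:.2f}")
```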