A/B testing and problems with statistics
Web Analytics Wednesday Singapore
Nikolay Novozhilov, Wego.com
www.novozhilov.co
Is there a problem with A/B testing?
Imaginary uplifts
100 tests done, 10 successful, 10% uplift each…
…expect 159% growth!
Expectation vs. reality
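The 159% figure comes from compounding: ten wins of 10% each multiply together rather than add up. A minimal sketch of that arithmetic (the function name is just for illustration):

```python
def compounded_growth(uplift: float, wins: int) -> float:
    """Total relative growth if `wins` uplifts of equal size multiply together."""
    return (1 + uplift) ** wins - 1


growth = compounded_growth(0.10, 10)
print(f"{growth:.0%}")  # 159%
```

This is the expectation only if every one of the ten "successful" tests is a real effect of the measured size.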
Why?
… and what to do about it
Lies, damned lies, and statistics
All different! All based on assumptions!!!
Tool               Test used
Optimizely         Two-tailed sequential likelihood ratio test with false discovery rate controls
Google Analytics   Bayes estimate with uniform beta prior
VWO                Intersection of confidence intervals for binomial distribution
Leanplum           Confidence intervals at p=5%, unknown statistic
Usereffect         Chi-square statistics
Commerce Sciences  Welch's t-test
What is a p-value, and why is it 5%?
All tests are based on assumptions!
Assumption #1:
You don’t look at the data before the test is over
What happens if you look?
I played with Monte Carlo in Excel
And here is the result:
• p-value threshold of 5%
• 1000 “users” in each sample
• CR of 2%
• A wins over A 29% of the time!
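The same kind of experiment can be sketched in Python instead of Excel. This is not the original spreadsheet: the test statistic (a pooled two-proportion z-test), the peeking schedule (one look every 50 users) and the number of runs are all assumptions, so the result is not expected to match 29% exactly. The point is only that stopping at the first significant peek inflates the A/A "win" rate well above the nominal 5%.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided 5% threshold, about 1.96


def significant(conv_a: int, conv_b: int, n: int) -> bool:
    """Pooled two-proportion z-test with n users in each sample (assumed test)."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled == 0.0 or pooled == 1.0:
        return False  # no variance yet, nothing to test
    std_err = (2 * pooled * (1 - pooled) / n) ** 0.5
    return abs(conv_a - conv_b) / n / std_err > Z_CRIT


def simulate(runs=1000, n_users=1000, cr=0.02, peek_every=50, seed=1):
    """Return (share of A/A tests significant at any peek, share significant at the end)."""
    rng = random.Random(seed)
    won_at_any_peek = won_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        flagged = False
        for i in range(1, n_users + 1):
            conv_a += rng.random() < cr  # both arms share the same true CR
            conv_b += rng.random() < cr
            # peek_every is a hypothetical schedule, not taken from the slides
            if i % peek_every == 0 and not flagged:
                flagged = significant(conv_a, conv_b, i)
        won_at_any_peek += flagged
        won_at_end += significant(conv_a, conv_b, n_users)
    return won_at_any_peek / runs, won_at_end / runs


peeking_rate, final_rate = simulate()
print(f"significant at some peek:   {peeking_rate:.1%}")
print(f"significant at the end only: {final_rate:.1%}")
```

Peeking more often (a smaller `peek_every`) should push the first number higher, while the second stays near the nominal level.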
What do you do about it?
Don’t look! (just kidding)
Google “O'Brien & Fleming interim analysis” (no, still kidding)
Keep calm, more stuff coming!
“My test on the Buy button showed interesting results…”
[Grid of 12 “Buy Now!” buttons, 3 rows × 4 columns, with the measured uplift of each:]
-3% -23% +6% -9%
-2% +22% -11% -14%
-1% +9% -12% -1%
10000 users in each variant, base CR=1%
But in reality all colors were the same…
[Grid of 12 “Buy Now!” buttons, 3 rows × 4 columns, with the measured uplift of each:]
-3% -23% +6% -9%
-2% +22% -11% -14%
-1% +9% -12% -1%
10000 users in each variant, base CR=1%
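Uplifts of this size are what pure noise looks like at 10000 users and a 1% base CR. A rough sketch, assuming each variant is compared against an equally noisy control with the same true conversion rate (how the slide's numbers were generated is not stated, and the seed and function name here are arbitrary):

```python
import random


def observed_uplifts(variants=12, users=10_000, cr=0.01, seed=7):
    """Relative uplift of each identical variant against an identical control."""
    rng = random.Random(seed)

    def conversions():
        # every visitor converts with the same probability, whatever the variant
        return sum(rng.random() < cr for _ in range(users))

    control = conversions()  # about 100 conversions expected, so never zero in practice
    return [conversions() / control - 1 for _ in range(variants)]


uplifts = observed_uplifts()
for row in range(0, len(uplifts), 4):  # print as a 3 x 4 grid, like the slide
    print("  ".join(f"{u:+.0%}" for u in uplifts[row:row + 4]))
```

With about 100 conversions per variant, the standard deviation of each count is about 10, so double-digit "uplifts" between identical buttons are routine.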
The real problem!
Multivariate testing
Multiple comparisons
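Why multiple comparisons hurt can be shown with one line of arithmetic. The sketch below assumes the 12 comparisons are independent and each is tested at 5%. The Bonferroni correction is included as one common remedy, not as something the talk prescribes.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive among independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison threshold that keeps the family-wise rate at or below alpha."""
    return alpha / comparisons


fwer = family_wise_error_rate(0.05, 12)
print(f"{fwer:.0%}")                       # 46%
print(f"{bonferroni_alpha(0.05, 12):.4f}")  # 0.0042
```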
Be smart or be Google
Sample size
Significance
Effect size
Power
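These four quantities are tied together: fix any three and the fourth follows. A minimal sketch using the standard normal-approximation formula for comparing two proportions (the 80% power default and the function name are assumptions, not values from the slides):

```python
from math import ceil
from statistics import NormalDist


def users_per_variant(base_cr, relative_uplift, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-sided two-proportion test."""
    p1 = base_cr
    p2 = base_cr * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance
    z_power = NormalDist().inv_cdf(power)          # power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


n = users_per_variant(0.01, 0.10)
print(n)
```

For a 1% base CR and a 10% relative uplift this lands at roughly 160,000 users per variant, far more than the 10000 used in the button example.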
Start with a good hypothesis!
But people are good at finding plausible explanations for data!
Replication
Do your dirty business
Register
Replicate
This might work!
Stop math, I’m a web designer!
Visual way of doing it
Has some stat meaning!
Replications
Variance observation

Editor's Notes

  • #3 Ask audience: Do you do A/B testing? What do you think is the problem?