Keep Your Gains from A/B Tests Without Killing Them Later
Lars Lofgren and Will Kurt
May 2014
@larslofgren (hit me up)
We’ll cover…
1 The limits of A/B tests
2 The standard solutions
3 Simulations! Woohoo!
4 The 3 strategies of A/B testing that work
5 How we A/B test at KISSmetrics
#KISSwebinar
Limits of A/B Tests
A/B tests don’t give you perfect decisions.
No matter what you do, you’re never 100% certain.
If we’re not careful, winners aren’t really winners
Your conversions go up… and then they come back down
The Standard Solution
Run your test until you hit 95% statistical significance.
Go to getdatadriven.com if you need a significance calculator.
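If you’d rather compute significance yourself than use a calculator, here’s a rough sketch using a standard two-proportion z-test. The function name and example numbers are just for illustration, not from the webinar.

```python
# Rough sketch of an A/B significance check (two-proportion z-test).
# Not the calculator's actual code; just the standard textbook approach.
from math import sqrt
from scipy.stats import norm

def significance(conv_a, visitors_a, conv_b, visitors_b):
    """Return the two-sided confidence that A and B really differ (0 to 1)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Pooled rate under the "no difference" null hypothesis.
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    return 1 - 2 * norm.sf(abs(z))  # 0.95 means 95% significance

# Example: 200/10,000 control conversions vs. 260/10,000 for the variation.
print(round(significance(200, 10_000, 260, 10_000), 3))
```

Only call a winner when this crosses whatever threshold your strategy uses (95%, 99%, and so on).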
Scientific A/B testing:
1 Pick the minimal improvement
2 Determine your sample size
3 Determine degree of certainty (95%)
4 Start test but don’t check it early
5 If results aren’t significant, keep control
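Step 2 is the part people usually reach for a calculator for. Below is a minimal sketch of the standard sample-size formula; the 95% confidence matches step 3, while the 80% power figure is a common convention we’re assuming, not something from the deck.

```python
# Sketch of a per-variation sample size calculation for a two-sided test.
# confidence=0.95 matches step 3 above; power=0.80 is an assumed convention.
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_variation(baseline_rate, minimal_relative_improvement,
                              confidence=0.95, power=0.80):
    p1 = baseline_rate
    p2 = baseline_rate * (1 + minimal_relative_improvement)
    z_alpha = norm.ppf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
    z_beta = norm.ppf(power)                      # 0.84 for 80% power
    pooled = (p1 + p2) / 2
    n = (z_alpha * sqrt(2 * pooled * (1 - pooled))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return ceil(n)

# Example: 2% baseline conversion rate, minimal improvement of a 20% relative lift.
print(sample_size_per_variation(0.02, 0.20))
```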
Martin Goodson’s PDF on poor testing methods: kiss.ly/bad-testing
This gives us the best data but not necessarily the best ROI.
So how far do we take this?
Simulation Time!
We modeled several A/B testing strategies. Using Monte Carlo simulations, we tested different strategies over 1 million observations (people).
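We don’t have Will’s original simulation code here, but a stripped-down sketch of the idea looks like this: feed a fixed budget of visitors through back-to-back tests, declare winners according to the strategy’s significance threshold and patience, and count total conversions. The 2% base rate, the lift distribution, and checking significance only once per test are simplifying assumptions.

```python
# Stripped-down Monte Carlo sketch of one A/B testing strategy.
# Assumptions: 2% base conversion rate, true lifts drawn from a normal
# distribution, and significance checked once per test (strategies that
# "peek" early would need a check inside the loop instead).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_strategy(threshold=0.99, max_people=2_000,
                      total_visitors=1_000_000, base_rate=0.02):
    control = base_rate
    conversions = 0
    visitors_left = total_visitors
    while visitors_left > 1:
        # Each new variation has a random true lift relative to the control.
        variant = max(0.001, control * (1 + rng.normal(0.0, 0.10)))
        n = min(max_people, visitors_left // 2)  # visitors per arm
        conv_a = rng.binomial(n, control)
        conv_b = rng.binomial(n, variant)
        conversions += conv_a + conv_b
        visitors_left -= 2 * n
        # Two-proportion z-test on the observed counts.
        pooled = (conv_a + conv_b) / (2 * n)
        se = max(np.sqrt(pooled * (1 - pooled) * (2 / n)), 1e-9)
        significance = 1 - 2 * norm.sf(abs(conv_b / n - conv_a / n) / se)
        if significance >= threshold and conv_b > conv_a:
            control = variant  # ship the winner as the new control
    return conversions

# "Realist"-style settings: 99% threshold, move on after 2,000 people.
print(simulate_strategy(threshold=0.99, max_people=2_000))
```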
Will Kurt gets full credit for all this. @willkurt
The Scientist:
1 Pick the minimal improvement
2 Determine your sample size
3 Determine degree of certainty (95%)
4 Start test but don’t check it early
5 If results aren’t significant, keep control
Results for the Scientist:
The Reckless Marketer:
1 Waits until 80% significance
2 Calls a winner as soon as 80% gets hit
Results for the Reckless Marketer:
The Impatient Marketer:
1 Waits for 95% significance
2 Moves on to the next test after 500 people
Results for the Impatient Marketer:
The Realist:
1 Waits for 99% significance
2 Moves on to the next test after 2,000 people
Results for the Realist:
The Persistent Realist:
1 Waits for 99% significance
2 Moves on to the next test after 20,000 people
Results for the Persistent Realist:
The Blitz Realist:
1 Waits for 99% significance
2 Moves on to the next test after 200 people
Results for the Blitz Realist:
Let’s compare them using the area under the curve.
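The score in the table below is just the area under each strategy’s cumulative-conversions curve. A quick sketch of that comparison (with placeholder curves, not the real simulation output) might look like this:

```python
# Sketch: score strategies by the area under their cumulative-conversion
# curves. The curves below are placeholders, not the webinar's real output.
import numpy as np

curves = {
    "Realist":  np.cumsum(np.full(100, 20)),  # 20 conversions per checkpoint
    "Reckless": np.cumsum(np.full(100, 17)),
}
# With unit spacing between checkpoints, the area is roughly the sum of points.
scores = {name: curve.sum() for name, curve in curves.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {int(score):,}")
```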
A/B Strategy Scores

Strategy            Conditions              Score
Scientist           Stats like a pro        67,759
Reckless Marketer   80%                     57,649
Impatient Marketer  95% and 500 people      60,532
Realist             99% and 2,000 people    67,896
Persistent Realist  99% and 20,000 people   68,346
Blitz Realist       99% and 200 people      62,836
No Testing          Testing? NOPE!          50,000

Each score is the area under the curve from the simulation. The higher the score, the more conversions you received.

[Bar chart: A/B Strategy Scores, ranging from No Testing at 50,000 up to Persistent Realist at 68,346]
3 Strategies
Don’t make decisions at less than 95% significance.
You’ll waste all the time you spend testing.
We have 3 viable strategies for making this work:
1 Be a scientist at 95%
2 Only make changes at 99%
3 Sloppy 95% but make it up in volume
Be a scientist when you have lots of data and resources:
1 Pick the minimal improvement
2 Determine your sample size
3 Determine degree of certainty (95%)
4 Start test but don’t check it early
5 If results aren’t significant, keep control
If you don’t have the data or resources to be a scientist, go fast at 99%.
And if you still want to play at 95% without being a scientist, never stop testing.
How We A/B Test
First, get volume to 4,000+ people/month.
Only make changes at 99% significance.
Let the test run at least 1 week before checking results.
If not at 99% after two weeks, launch the next test.
If the next test isn’t ready, let it keep running while you build the next one.
The KISSmetrics A/B Testing Strategy:
1 Get to 4,000 people/month for test
2 Only change the control if you reach 99%
3 Check results after 1 week
4 Launch the next test at 2 weeks
5 Let old tests run if you’re still building
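Just to make the cadence concrete, here’s a rough sketch of that weekly decision logic. The significance() helper is the one sketched earlier, and the test object’s fields are assumed for illustration, not a real API.

```python
# Sketch of the weekly check described above. The `test` fields and the
# significance() helper are illustrative assumptions, not a real API.
def weekly_review(test, next_test_ready):
    sig = significance(test.control_conversions, test.control_visitors,
                       test.variant_conversions, test.variant_visitors)
    if sig >= 0.99:
        return "99% reached: change the control and launch the next test"
    if test.weeks_running >= 2:
        if next_test_ready:
            return "no winner after 2 weeks: launch the next test"
        return "keep this test running while you build the next one"
    return "not at 99% yet: check again next week"
```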
This strategy isn’t perfect. It’s a balance between good data and speed.
Remember the 3 strategies:
1 Be a scientist at 95%
2 Only make changes at 99%
3 Sloppy 95% but make it up in volume
Q&A Time!
Lars Lofgren
@larslofgren
llofgren@kissmetrics.com