Oh Boy! These A/B tests look like total bullshit!
@OptimiseOrDie
@OptimiseOrDie 
• UX, Analytics, Split Testing and Growth Rate Optimisation 
• Started doing testing & CRO 2004 
• Split tested over 40M visitors in 19 languages 
• 60+ mistakes I MADE with AB testing 
• Like riding a bike… 
• Want to optimise your optimisation? Get in touch!
Top Testing F***ups for 2014
1. Testing in the wrong place
2. Your hypothesis inputs are crap
3. No analytics integration
4. Your test will finish after you die
5. You don’t test for long enough
6. You peek before it’s ready
7. No QA for your split test
8. Opportunities are not prioritised
9. Testing cycles are too slow
10. You don’t know when tests are ready
11. Your test fails
12. The test is ‘about the same’
13. Test flips behaviour
14. Test keeps moving around
15. You run an A/A test and waste time
16. Nobody ‘feels’ the test
17. You forgot you were responsive
18. You forgot you had no traffic
19. You ran the wrong test type
20. You didn’t try all the flavours of testing
@OptimiseOrDie
slidesha.re/1wBbZ9c
#fail 
@OptimiseOrDie
@OptimiseOrDie 
26.6M
@OptimiseOrDie 28.4M
Oppan Gangnam Style! 
@OptimiseOrDie 6.9M
The 95% Stopping Problem 
• Many people stop a test as soon as it hits 95% or 99% ‘confidence’
• On its own, this value is an unreliable stopping signal
• Read this Nature article : bit.ly/1dwk0if 
• You can hit 95% early in a test 
• If you stop, it could be a false result 
• Testing Tools need to be smarter about what they imply! 
• This 95% thingy – it’s the last signal you should use to stop a test 
• Let me explain 
@OptimiseOrDie
False Positives and Negatives
@OptimiseOrDie

Every test run to the end of the experiment:

                          Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations    Insignificant  Insignificant  Significant!   Significant!
After 500 observations    Insignificant  Significant!   Insignificant  Significant!
End of experiment         Insignificant  Significant!   Insignificant  Significant!

Trials stopped after an early significant result:

                          Scenario 1     Scenario 2     Scenario 3     Scenario 4
After 200 observations    Insignificant  Insignificant  Significant!   Significant!
After 500 observations    Insignificant  Significant!   trial stopped  trial stopped
End of experiment         Insignificant  Significant!   Significant!   Significant!
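
To make the scenarios above concrete, here is a minimal Python sketch that simulates A/A tests (both arms share the same true conversion rate) and compares peeking at regular intervals with a single check at the end. The 9% conversion rate, the peek interval, the run counts and the pooled two-proportion z-test are all assumptions picked for illustration; none of them come from the slides or from any particular testing tool.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so nothing to call
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(runs=500, visitors_per_arm=2000, rate=0.09,
                      peek_every=100, alpha=0.05, seed=1):
    """Return (share of runs significant at ANY peek, share significant at the end).

    Both arms use the same true rate, so every 'significant' result is a
    false positive.
    """
    rng = random.Random(seed)
    hit_at_some_peek = 0
    hit_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        peeked_significant = False
        for n in range(1, visitors_per_arm + 1):
            conv_a += int(rng.random() < rate)
            conv_b += int(rng.random() < rate)
            # A "peek": check significance part-way through the test.
            if n % peek_every == 0:
                if z_test_p_value(conv_a, n, conv_b, n) < alpha:
                    peeked_significant = True
        hit_at_some_peek += int(peeked_significant)
        final_p = z_test_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm)
        hit_at_end += int(final_p < alpha)
    return hit_at_some_peek / runs, hit_at_end / runs


if __name__ == "__main__":
    peek_rate, end_rate = simulate_aa_tests()
    print(f"False positives if you stop at the first significant peek: {peek_rate:.1%}")
    print(f"False positives if you only check at the end:              {end_rate:.1%}")
```

The first number counts every run that would have been stopped early and declared a winner, which is the behaviour shown in Scenarios 3 and 4. Try a smaller peek_every to see how more frequent peeking changes it.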
A vs B
62.5cm +/- 1cm
@OptimiseOrDie
A: 9.1% ± 0.5    B: 9.3% ± 0.5
A: 9.1% ± 0.2    B: 9.3% ± 0.2
A: 9.1% ± 0.1    B: 9.3% ± 0.1
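
The shrinking ± figures above can be reproduced with a rough Python sketch. It assumes the ± values are percentage points at 95% confidence and uses the simple normal-approximation interval for a conversion rate; the slides do not say which method produced the numbers, so treat both as assumptions. The function names are hypothetical.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% interval (assumed confidence level)


def margin_of_error(rate, visitors, z=Z_95):
    """Half-width of a normal-approximation interval for a conversion rate."""
    return z * math.sqrt(rate * (1 - rate) / visitors)


def visitors_for_margin(rate, margin, z=Z_95):
    """Visitors per variant needed before the margin shrinks to `margin`."""
    return math.ceil(z ** 2 * rate * (1 - rate) / margin ** 2)


def intervals_overlap(rate_a, margin_a, rate_b, margin_b):
    """True when the two ± ranges still share some values."""
    return abs(rate_a - rate_b) <= margin_a + margin_b


if __name__ == "__main__":
    rate_a, rate_b = 0.091, 0.093  # the 9.1% and 9.3% from the slide
    for margin_points in (0.5, 0.2, 0.1):
        margin = margin_points / 100  # percentage points -> proportion
        needed = visitors_for_margin(rate_a, margin)
        print(f"± {margin_points} points needs roughly {needed} visitors per variant")
    wide = intervals_overlap(rate_a, 0.005, rate_b, 0.005)
    print(f"At ± 0.5 points the A and B ranges overlap: {wide}")
```

The point of the sketch is the scaling: halving the margin takes roughly four times the traffic, which is why two close results need a lot of visitors before the ranges separate.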
AB Testing Visualisation Tool 
@OptimiseOrDie abtestguide.com/calc/
The 95% Stopping Problem 
“You should know that stopping a test once it’s significant is 
deadly sin number 1 in A/B testing land. 
77% of A/A tests (testing the same thing as A and B) will reach 
significance at a certain point.” 
Ton Wesseling, Online Dialogue 
“I always tell people that you need a representative sample if 
your data needs to be valid. What does ‘representative’ mean? 
First of all you need to include all the weekdays and weekends. 
You need different weather, because it impacts buyer behaviour. 
But most important: Your traffic needs to have all traffic 
sources, especially newsletter, special campaigns, TV,… 
everything!” 
Andre Morys, Web Arts
Three Articles you MUST read 
“Statistical Significance does not equal Validity” 
http://bit.ly/1wMfmY2 
“Why every Internet Marketer should be a Statistician” 
http://bit.ly/1wMfs1G 
“Understanding the Cycles in your site” 
http://mklnd.com/1pGSOUP
Business & Purchase Cycles 
@OptimiseOrDie 
Start Test Finish Avg Cycle 
• Customers change 
• Your traffic mix changes 
• Markets, competitors 
• Be aware of all the waves 
• Always test whole cycles 
• Minimum 2 cycles (wk/mo) 
• Don’t exclude slower buyers
How Long? Simple Rules to follow
• TWO BUSINESS CYCLES minimum (week/mo)
• 1 PURCHASE CYCLE minimum
• 250 CONVERSIONS minimum per creative (e.g. checkouts)
• 350 & MORE! if response is very similar
• FULL WEEKS/CYCLES never part of one
• KNOW what marketing, competitors and cycles are doing
• RUN a test length calculator - bit.ly/XqCxuu
• SET your test run time, RUN IT, STOP IT, ANALYSE IT
• ONLY RUN LONGER if you need more data
• DON’T RUN LONGER just because the test isn’t giving the result you want!
@OptimiseOrDie
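
The rules above can be turned into a small planning helper. This is only a sketch under stated assumptions: the function name and parameters are hypothetical, a business cycle is taken to be 7 days by default, and daily conversions per creative are assumed to be roughly steady. It is not the linked test length calculator.

```python
import math


def planned_run_days(daily_conversions_per_creative,
                     min_conversions=250,
                     cycle_days=7,
                     min_cycles=2,
                     purchase_cycle_days=0):
    """Suggest a run length in days, fixed BEFORE the test starts.

    Takes the longest of three floors (conversions, business cycles,
    purchase cycle) and rounds up to whole cycles, never part of one.
    """
    if daily_conversions_per_creative <= 0:
        raise ValueError("need a positive daily conversion estimate")
    days_for_conversions = math.ceil(min_conversions / daily_conversions_per_creative)
    floor_days = max(days_for_conversions,
                     min_cycles * cycle_days,
                     purchase_cycle_days)
    whole_cycles = math.ceil(floor_days / cycle_days)
    return whole_cycles * cycle_days


if __name__ == "__main__":
    # Hypothetical site: about 10 checkouts per creative per day.
    print(planned_run_days(10))
    # Responses look very similar, so use the higher 350 conversion floor.
    print(planned_run_days(10, min_conversions=350))
```

Set the result as the run time, run the test, stop it and analyse it. Only rerun the calculation with a longer duration if the data volume falls short, not because the result is unwelcome.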
Oops! No QA testing 
for the AB test!
QA Test or lose loads of MONEY!!!
• Over 40% of AB tests I’ve worked on were broken (some seriously)
• I’ve also found over £20M p.a. of browser bugs in the last 18 months
• It’s very easy to break or bias your testing
Browser testing 
www.crossbrowsertesting.com 
www.browserstack.com 
www.spoon.net 
www.saucelabs.com 
www.multibrowserviewer.com 
Mobile devices 
www.appthwack.com 
www.deviceanywhere.com 
www.opendevicelab.com 
Read this article bit.ly/1wBccsJ 
@OptimiseOrDie
Gamble the Company AWAY! 
• I get 60-65% right 
• UX and Copywriters good at picking! 
• C level execs are easy marks 
• Ironically, many decide ‘designs’ 
• You need collaborative test design 
• It’s a team game, with customers 
• Flip a coin, anyone?
WE’RE ALL WINGING IT
2004 Headspace 
What I thought 
I knew in 2004 
Reality
2014 Headspace 
What I 
KNOW 
I know 
Me, on a 
good day
Guessaholics Anonymous
Rumsfeldian Space
The Blind Octopus 
@OptimiseOrDie
Business Future Testing? 
Congratulations! 
Today you’re the lucky 
winner of our random 
awards programme. 
You get all these extra 
features for free, on us. 
Enjoy. 
Mr D. Vader
#1 : CULTURE 
• Smart Talented Polymath People 
• Flexible and Agile ‘One Team’ approach 
• Smash the Silos 
• Proper Agile, Rapid, Iterative 
The 5 Legged Optimisation Barstool
Fittest? Agile! 
@OptimiseOrDie
@OptimiseOrDie 
#2 : Analytics Investment (TOOLS, PEOPLE, DEV TIME)
@OptimiseOrDie 
#3 : Expensive and tedious UX research?
@OptimiseOrDie 
#3 : Low Cost, Remote, Rapid UX Research
Cross Channel, Multi Device Diary Studies
#4 : PERSUASIVE COPYWRITING 
“On the average, five times as many people 
read the headline as read the body copy. 
When you have written your headline, you 
have spent eighty cents out of your dollar.” 
David Ogilvy 
“In 9 years and 40M split tests with visitors, 
the majority of my testing success came 
from playing with the words.” 
@OptimiseOrDie
• Google Content Experiments 
bit.ly/Ljg7Ds 
• Optimizely 
www.optimizely.com 
• Visual Website Optimizer 
www.visualwebsiteoptimizer.com 
• Multi Armed Bandit Explanation 
bit.ly/Xa80O8 
• New Machine Learning Tools 
www.conductrics.com 
@OptimiseOrDie 
#5 : Split Testing Tools
The 5 Legged Optimisation Barstool 
@OptimiseOrDie 
#1 Culture & Team 
#2 Toolkit & Analytics investment 
#3 UX, CX, Service Design, Insight 
#4 Persuasive Copywriting 
#5 Experimentation (testing) tools
READ STUFF
#5 : FIND STUFF 
@OptimiseOrDie 
@danbarker Analytics 
@fastbloke Analytics 
@timlb Analytics 
@jamesgurd Analytics 
@therustybear Analytics 
@carmenmardiros Analytics 
@davechaffey Analytics 
@priteshpatel9 Analytics 
@cutroni Analytics 
@avinash Analytics 
@Aschottmuller Analytics, CRO 
@cartmetrix Analytics, CRO 
@Kissmetrics CRO / UX 
@Unbounce CRO / UX 
@Morys CRO / Neuro 
@UXFeeds UX / Neuro 
@Psyblog Neuro 
@Gfiorelli1 SEO / Analytics 
@PeepLaja CRO 
@TheGrok CRO 
@UIE UX 
@LukeW UX / Forms 
@cjforms UX / Forms 
@axbom UX 
@iatv UX 
@Chudders Photo UX 
@JeffreyGroks Innovation 
@StephanieRieger Innovation 
@BrianSolis Innovation 
@DrEscotet Neuro 
@TheBrainLady Neuro 
@RogerDooley Neuro 
@Cugelman Neuro 
@Smashingmag Dev / UX 
@uxmag UX 
@Webtrends UX / CRO
#5 : LEARN STUFF 
@OptimiseOrDie 
Baymard.com 
Lukew.com 
Smashingmagazine.com 
ConversionXL.com 
Medium.com 
Whichtestwon.com 
Unbounce.com 
Measuringusability.com 
RogerDooley.com 
Kissmetrics.com 
Uxmatters.com 
Smartinsights.com 
Econsultancy.com 
Cutroni.com 
www.GetMentalNotes.com
#12 : The Best Companies…
• Invest continually in analytics instrumentation, tools, people
• Use an Agile, iterative, cross-silo, one team project culture
• Prefer collaborative tools to having lots of meetings
• Prioritise development based on numbers and insight
• Practice real continuous product improvement, not SLEDD*
• Are fixing bugs, cruft, bad stuff as well as optimising
• Source photos and content that support persuasion and utility
• Have cross channel, cross device design, testing and QA
• Segment their data for valuable insights, every test or change
• Continually reduce cycle (iteration) time in their process
• Blend ‘long’ design, continuous improvement AND split tests
• Make optimisation the engine of change, not the slave of ego
* Single Large Expensive Doomed Developments
THE FUTURE OF TESTING
Thank You! 
Mail : sullivac@gmail.com 
Deck : slideshare.com/sullivac 
Linkedin : linkd.in/pvrg14

Craig Sullivan - Oh Boy! These A/B tests look like total bullshit! MKTFEST 2014
