
Craig Sullivan - Oh Boy! These A/B tests look like total bullshit! MKTFEST 2014



Marketing Festival - World-Class Digital Marketing Event #mktfest Czech Republic

Published in: Business


  1. 1. Oh Boy! These A/B tests look like total bullshit! @OptimiseOrDie
  2. 2. @OptimiseOrDie • UX, Analytics, Split Testing and Growth Rate Optimisation • Started doing testing & CRO 2004 • Split tested over 40M visitors in 19 languages • 60+ mistakes I MADE with AB testing • Like riding a bike… • Want to optimise your optimisation? Get in touch!
  3. 3. Top Testing F***ups for 2014 1. Testing in the wrong place 2. Your hypothesis inputs are crap 3. No analytics integration 4. Your test will finish after you die 5. You don’t test for long enough 6. You peek before it’s ready 7. No QA for your split test 8. Opportunities are not prioritised 9. Testing cycles are too slow 10. You don’t know when tests are ready 11. Your test fails 12. The test is ‘about the same’ 13. Test flips behaviour 14. Test keeps moving around 15. You run an A/A test and waste time 16. Nobody ‘feels’ the test 17. You forgot you were responsive 18. You forgot you had no traffic 19. You ran the wrong test type 20. You didn’t try all the flavours of testing @OptimiseOrDie
  4. 4. #fail @OptimiseOrDie
  5. 5. @OptimiseOrDie 26.6M
  6. 6. @OptimiseOrDie 28.4M
  7. 7. Oppan Gangnam Style! @OptimiseOrDie 6.9M
  8. 8. @OptimiseOrDie
  9. 9. @OptimiseOrDie
  10. 10. @OptimiseOrDie
  11. 11. The 95% Stopping Problem • Many people stop a test once it hits 95% or 99% ‘confidence’ • This value is unreliable as a stopping signal • Read this Nature article : • You can hit 95% early in a test • If you stop then, it could be a false result • Testing tools need to be smarter about what they imply! • This 95% thingy – it’s the last signal you should use to stop a test • Let me explain @OptimiseOrDie
  12. 12. False Positives and Negatives @OptimiseOrDie

                                Scenario 1     Scenario 2     Scenario 3     Scenario 4
      After 200 observations    Insignificant  Insignificant  Significant!   Significant!
      After 500 observations    Insignificant  Significant!   Insignificant  Significant!
      End of experiment         Insignificant  Significant!   Insignificant  Significant!

                                Scenario 1     Scenario 2     Scenario 3     Scenario 4
      After 200 observations    Insignificant  Insignificant  Significant!   Significant!
      After 500 observations    Insignificant  Significant!   trial stopped  trial stopped
      End of experiment         Insignificant  Significant!   Significant!   Significant!
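
The scenarios on slide 12 can be reproduced with a small simulation. The sketch below is not from the talk: it runs A/A tests (both arms share one true conversion rate) and compares how often a test looks 'significant' at any peek with how often it does so at the planned end. The function names, the 10% conversion rate, the peek interval and the pooled two-proportion z-test are all assumptions chosen for illustration.

```python
import math
import random


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or all conversions): no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_tests=500, visitors_per_arm=1000, rate=0.10,
                      peek_every=100, alpha=0.05, seed=42):
    """Return (share significant at ANY peek, share significant at the end)."""
    rng = random.Random(seed)
    significant_at_any_peek = 0
    significant_at_end = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        peeked_significant = False
        for n in range(1, visitors_per_arm + 1):
            # Both arms are identical: any 'winner' is a false positive.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n % peek_every == 0 and z_test_p_value(conv_a, n, conv_b, n) < alpha:
                peeked_significant = True
        if peeked_significant:
            significant_at_any_peek += 1
        if z_test_p_value(conv_a, visitors_per_arm, conv_b, visitors_per_arm) < alpha:
            significant_at_end += 1
    return significant_at_any_peek / n_tests, significant_at_end / n_tests


peek_rate, end_rate = simulate_aa_tests()
print(f"false positives if you stop at the first significant peek: {peek_rate:.1%}")
print(f"false positives if you only look at the planned end:       {end_rate:.1%}")
```

The exact percentages depend on the seed and on how often you peek; the point is only that the first figure is larger than the second, because every extra look is another chance to catch a random swing.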
  13. 13. A B 62.5cm +/- 1cm @OptimiseOrDie • 9.1% ± 0.5 vs 9.3% ± 0.5 • 9.1% ± 0.2 vs 9.3% ± 0.2 • 9.1% ± 0.1 vs 9.3% ± 0.1
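
The three pairs of figures on slide 13 show the error margin shrinking until the two ranges stop overlapping. As a rough sketch of the arithmetic (not taken from the deck), the code below assumes a 95% level and the normal approximation for a proportion, and estimates how many visitors per variant each margin would need. The helper names and the 1.96 multiplier are assumptions.

```python
import math

Z_95 = 1.96  # assumed: two-sided 95% level, normal approximation


def margin_of_error(rate, visitors, z=Z_95):
    """Half-width of the interval around a conversion rate."""
    return z * math.sqrt(rate * (1 - rate) / visitors)


def visitors_for_margin(rate, margin, z=Z_95):
    """Visitors per variant needed to shrink the interval to +/- margin."""
    return math.ceil(rate * (1 - rate) * (z / margin) ** 2)


def interval_gap(rate_a, rate_b, margin):
    """Negative: the ranges overlap. Zero: they just touch. Positive: clear gap."""
    return round(abs(rate_b - rate_a) - 2 * margin, 9)


# Figures from the slide, written as fractions (9.1% -> 0.091).
for margin in (0.005, 0.002, 0.001):
    n = visitors_for_margin(0.091, margin)
    gap = interval_gap(0.091, 0.093, margin)
    status = "overlap" if gap < 0 else "no overlap"
    print(f"+/- {margin:.1%}: about {n} visitors per variant, ranges {status}")
```

Halving the margin needs roughly four times the traffic, which is why a small difference such as 9.1% against 9.3% takes so long to separate.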
  14. 14. AB Testing Visualisation Tool @OptimiseOrDie
  15. 15. The 95% Stopping Problem “You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land. 77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.” Ton Wesseling, Online Dialogue “I always tell people that you need a representative sample if your data needs to be valid. What does ‘representative’ mean? First of all you need to include all the weekdays and weekends. You need different weather, because it impacts buyer behaviour. But most important: Your traffic needs to have all traffic sources, especially newsletter, special campaigns, TV,… everything!” Andre Morys, Web Arts
  16. 16. Three Articles you MUST read “Statistical Significance does not equal Validity” “Why every Internet Marketer should be a Statistician” “Understanding the Cycles in your site”
  17. 17. Business & Purchase Cycles @OptimiseOrDie Start Test Finish Avg Cycle • Customers change • Your traffic mix changes • Markets, competitors • Be aware of all the waves • Always test whole cycles • Minimum 2 cycles (wk/mo) • Don’t exclude slower buyers
  19. 19. How Long? Simple Rules to follow • TWO BUSINESS CYCLES minimum (week/mo) • 1 PURCHASE CYCLE minimum • 250 CONVERSIONS minimum per creative (e.g. checkouts) • 350 & MORE! if response is very similar • FULL WEEKS/CYCLES never part of one • KNOW what marketing, competitors and cycles are doing • RUN a test length calculator - • SET your test run time, RUN IT, STOP IT, ANALYSE IT • ONLY RUN LONGER if you need more data • DON’T RUN LONGER just because the test isn’t giving the result you want! @OptimiseOrDie
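
The rules on slide 19 combine into a simple lower bound on run time. The following is a minimal sketch of that combination, not the test length calculator the slide refers to: the function name, parameter names and the 7-day defaults are assumptions, and it expects you to supply your own daily conversion figure per creative.

```python
import math


def minimum_test_days(daily_conversions_per_creative,
                      business_cycle_days=7,
                      purchase_cycle_days=7,
                      min_cycles=2,
                      min_conversions=250):
    """Shortest run time that satisfies every rule, rounded up to whole cycles."""
    if daily_conversions_per_creative <= 0:
        raise ValueError("daily_conversions_per_creative must be positive")
    # Days needed to collect the minimum conversions on each creative.
    days_for_conversions = math.ceil(min_conversions / daily_conversions_per_creative)
    days_needed = max(min_cycles * business_cycle_days,   # two business cycles minimum
                      purchase_cycle_days,                # one purchase cycle minimum
                      days_for_conversions)               # 250 conversions minimum
    # Full cycles only, never part of one.
    whole_cycles = math.ceil(days_needed / business_cycle_days)
    return whole_cycles * business_cycle_days


print(minimum_test_days(100))                      # 14: two weekly cycles dominate
print(minimum_test_days(10))                       # 28: 25 days for 250 conversions, rounded up to 4 weeks
print(minimum_test_days(10, min_conversions=350))  # 35: stricter target when responses are very similar
```

Set the run time from this before the test starts, then stop and analyse at that point rather than extending until the result looks the way you want.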
  20. 20. Oops! No QA testing for the AB test!
  21. 21. QA Test or lose loads of MONEY!!! • Over 40% of AB tests I’ve worked on were broken (some seriously) • I’ve also found over £20M p.a. of browser bugs in the last 18 months • It’s very easy to break or bias your testing Browser testing Mobile devices Read this article @OptimiseOrDie
  22. 22. Gamble the Company AWAY! • I get 60-65% right • UX and Copywriters good at picking! • C level execs are easy marks • Ironically, many decide ‘designs’ • You need collaborative test design • It’s a team game, with customers • Flip a coin, anyone?
  24. 24. 2004 Headspace What I thought I knew in 2004 Reality
  25. 25. 2014 Headspace What I KNOW I know Me, on a good day
  26. 26. Guessaholics Anonymous
  27. 27. Rumsfeldian Space
  28. 28. The Blind Octopus @OptimiseOrDie
  29. 29. Business Future Testing? Congratulations! Today you’re the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy. Mr D. Vader
  30. 30. #1 : CULTURE • Smart Talented Polymath People • Flexible and Agile ‘One Team’ approach • Smash the Silos • Proper Agile, Rapid, Iterative The 5 Legged Optimisation Barstool
  31. 31. Fittest? Agile! @OptimiseOrDie
  32. 32. @OptimiseOrDie #2 : Analytics Investment (TOOLS, PEOPLE, DEV TIME)
  33. 33. @OptimiseOrDie #3 : Expensive and tedious UX research?
  34. 34. @OptimiseOrDie #3 : Low Cost, Remote, Rapid UX Research • Cross Channel, Multi Device Diary Studies
  35. 35. #4 : PERSUASIVE COPYWRITING “On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.” David Ogilvy “In 9 years and 40M split tests with visitors, the majority of my testing success came from playing with the words.” @OptimiseOrDie
  36. 36. • Google Content Experiments • Optimizely • Visual Website Optimizer • Multi Armed Bandit Explanation • New Machine Learning Tools @OptimiseOrDie #5 : Split Testing Tools
  37. 37. The 5 Legged Optimisation Barstool @OptimiseOrDie #1 Culture & Team #2 Toolkit & Analytics investment #3 UX, CX, Service Design, Insight #4 Persuasive Copywriting #5 Experimentation (testing) tools
  38. 38. READ STUFF
  39. 39. READ STUFF
  40. 40. READ STUFF
  41. 41. #5 : FIND STUFF @OptimiseOrDie @danbarker Analytics @fastbloke Analytics @timlb Analytics @jamesgurd Analytics @therustybear Analytics @carmenmardiros Analytics @davechaffey Analytics @priteshpatel9 Analytics @cutroni Analytics @avinash Analytics @Aschottmuller Analytics, CRO @cartmetrix Analytics, CRO @Kissmetrics CRO / UX @Unbounce CRO / UX @Morys CRO / Neuro @UXFeeds UX / Neuro @Psyblog Neuro @Gfiorelli1 SEO / Analytics @PeepLaja CRO @TheGrok CRO @UIE UX @LukeW UX / Forms @cjforms UX / Forms @axbom UX @iatv UX @Chudders Photo UX @JeffreyGroks Innovation @StephanieRieger Innovation @BrianSolis Innovation @DrEscotet Neuro @TheBrainLady Neuro @RogerDooley Neuro @Cugelman Neuro @Smashingmag Dev / UX @uxmag UX @Webtrends UX / CRO
  42. 42. #5 : LEARN STUFF @OptimiseOrDie
  43. 43. #12 : The Best Companies… • Invest continually in analytics instrumentation, tools, people • Use an Agile, iterative, cross-silo, one team project culture • Prefer collaborative tools to having lots of meetings • Prioritise development based on numbers and insight • Practice real continuous product improvement, not SLEDD* • Are fixing bugs, cruft, bad stuff as well as optimising • Source photos and content that support persuasion and utility • Have cross channel, cross device design, testing and QA • Segment their data for valuable insights, every test or change • Continually reduce cycle (iteration) time in their process • Blend ‘long’ design, continuous improvement AND split tests • Make optimisation the engine of change, not the slave of ego * Single Large Expensive Doomed Developments
  45. 45. Thank You! Mail : Deck : Linkedin :