20 Ways to Shaft your Split Testing : Conversion Conference


This deck covers common problems that will easily break or skew your A/B and multivariate testing results. Avoid these problems by following the simple advice in this deck!

Published in: Internet


  1. 1. 20 ways to Shaft your Split testing @OptimiseOrDie
  2. 2. @OptimiseOrDie • UX, Analytics, Testing and Innovation • Started doing testing & CRO 2004 • Split tested over 40M visitors in 19 languages • 60+ mistakes with AB testing • I’ve made every one of them • Like riding a bike… • Get in touch for workshops, skill transfer, CRO methodology design, training and programme mentoring…
  3. 3. @OptimiseOrDie Hands on!
  4. 4. AB Test Hype Cycle @OptimiseOrDie Chart labels (axis: Timeline): Zen Plumbing; Tested stupid ideas, lots; Most AB or MVT tests are bullshit; Discovered AB testing; Triage, Triangulation, Prioritisation, Maths
  5. 5. @OptimiseOrDie
  6. 6. @OptimiseOrDie
  7. 7. Oppan Gangnam Style! @OptimiseOrDie
  8. 8. #1 : You’re doing it in the wrong place @OptimiseOrDie
  9. 9. #1 : You’re doing it in the wrong place There are 4 areas a CRO expert always looks at: 1. Inbound attrition (medium, source, landing page, keyword, intent and many more…) 2. Key conversion points (product, basket, registration) 3. Processes, lifecycles and steps (forms, logins, registration, checkout, onboarding, emails, push) 4. Layers of engagement (search, category, product, add). How to examine each: 1. Inbound attrition – use visitor flow reports, which are very useful. 2. Key conversion points – look at loss rates & interactions. 3. Processes and steps – look at funnels or make your own. 4. Layers and engagement – make a ring model. @OptimiseOrDie
  10. 10. Examples – Concept: Bounce; Engage; Outcome @OptimiseOrDie
  11. 11. Examples: Bounce; Login to Account; Content; Engage; Start Application; Type and Details; Eligibility; Photo; Complete @OptimiseOrDie
  12. 12. Examples – Guide Dogs: Bounce; Content; Engage; Donation Pathway; Donation Page; Starts process; Funnel steps; Complete @OptimiseOrDie
  13. 13. Within a layer: Page 1; Page 2; Page 3; Page 4; Page 5; Exit; Deeper Layer. Micro Conversions: Email; Wishlist; Contact; Like @OptimiseOrDie
  14. 14. #1 : Make a Money Model • Get to know the flow and loss (leaks) inbound, inside and through key processes or conversion points. • Once you know the key steps you’re losing people at and how much traffic you have – make a money model. • 20,000 see the basket page – what’s the basket page to checkout page ratio? • Estimate how much you think you can shift the key metric (e.g. basket adds, basket -> checkout) • What downstream revenue or profit would that generate? • Sort by the money column • Congratulations – you’ve now built the world’s first IT plan for growth with a return on investment estimate attached! • I’ll talk more about prioritising later – but a good real world analogy for you to use: @OptimiseOrDie
  15. 15. Think like a store owner! If you can’t refurbish the entire store, which floors or departments will you invest in optimising? Wherever there is: • Footfall • Low return @OptimiseOrDie
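
The money model from slide 14 can be sketched in a few lines of Python. This is only an illustration: the `Opportunity` class, its field names and every figure except the 20,000 basket views are assumptions made up for the example, not numbers from the talk.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    monthly_visitors: int   # traffic reaching this step
    current_rate: float     # current step-to-step ratio
    target_rate: float      # what you think you can shift it to
    order_rate: float       # share of the extra step completions that end in an order
    value_per_order: float  # downstream revenue or profit per order

    def monthly_value(self) -> float:
        extra = self.monthly_visitors * (self.target_rate - self.current_rate)
        return extra * self.order_rate * self.value_per_order


def money_model(opportunities):
    """Sort opportunities by the money column, largest first."""
    return sorted(opportunities, key=lambda o: o.monthly_value(), reverse=True)


# Every figure except the 20,000 basket views is invented for illustration.
candidates = [
    Opportunity("product -> basket add", 80_000, 0.100, 0.105, 0.20, 30.0),
    Opportunity("basket -> checkout", 20_000, 0.40, 0.44, 0.50, 30.0),
    Opportunity("registration form", 5_000, 0.60, 0.66, 0.30, 30.0),
]

for opp in money_model(candidates):
    print(f"{opp.name:<24} {opp.monthly_value():>10,.0f}")
```

The step with the most footfall is not automatically at the top: the ranking depends on traffic, the size of the shift you believe in and what each extra conversion is worth downstream.
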
  16. 16. #2 : Your hypothesis is crap! Insight - Inputs #FAIL: Competitor copying, Guessing, Dice rolling, Panic, Competitor change, An article the CEO read, Ego, Opinion, Cherished notions, Marketing whims, Cosmic rays, Not ‘on brand’ enough, IT inflexibility, Internal company needs, Some dumbass consultant, Shiny feature blindness, Knee jerk reactions @OptimiseOrDie
  17. 17. #2 : These are the inputs you need… Insight - Inputs: Eye tracking, Segmentation, Surveys, Sales and Call Centre, Customer contact, Social analytics, Session Replay, Usability testing, Forms analytics, Search analytics, Voice of Customer, Market research, A/B and MVT testing, Big & unstructured data, Web analytics, Competitor evals, Customer services @OptimiseOrDie
  18. 18. #2 : Brainstorming the test • Check your inputs • Assemble the widest possible team • Share your data and research • Design Emotive Writing guidelines @OptimiseOrDie
  19. 19. #2 : Emotive Writing - example @OptimiseOrDie Customers do not know what to do and need support and advice: • Emphasize the fact that you understand that their situation is stressful • Emphasize your expertise and leadership in vehicle glazing and that you will help them get the best solution for their situation • Explain what they will need to do online and during the call-back so that they know what the next steps will be • Explain that they will be able to ask any other questions they might have during the call-back. Customers do not feel confident in assessing the damage: • Emphasize the fact that you will help them assess the damage correctly online. Customers need to understand the benefits of booking online: • Emphasize that the online booking system is quick, easy and provides all the information they need regarding their appointment and general cost information. Customers mistrust insurers and find dealing with their insurance situation very frustrating: • Where possible communicate the fact that the job is most likely to be free for insured customers, or good value for money for cash customers • Show that you understand the hassle of dealing with insurance companies – emphasise that you will handle their insurance paperwork for them, freeing them of this burden. Some customers cannot be bothered to take action to fix their car glass: • Emphasize the consequences of not doing anything, e.g. ‘It’s going to cost you more if the chip develops into a crack’
  20. 20. #2 : THE DARK SIDE “Keep your family safe and get back on the road fast with Autoglass.” @OptimiseOrDie
  21. 21. #2 : NOW YOU CAN BEGIN @OptimiseOrDie • You should have inputs, research, data, guidelines • Sit down with the team and prompt with 12 questions: – Who is this page (or process) for? – What problem does this solve for the user? – How do we know they need it? – What is the primary action we want people to take? – What might prompt the user to take this action? – How will we know if this is doing what we want it to do? – How do people get to this page? – How long are people here on this page? – What can we remove from this page? – How can we test this solution with people? – How are we solving the users’ needs in different and better ways than other places on our site? – If this is a homepage, ask these too (
  22. 22. #2 : PROMPT YOURSELF @OptimiseOrDie • Check your UX or Copywriting guidelines. • Use Get Mental Notes • What levers can we apply now? • Create a hypothesis: “WE BELIEVE THAT DOING [A] FOR PEOPLE [B] WILL MAKE OUTCOME [C] HAPPEN. WE'LL KNOW THIS WHEN WE SEE DATA [D] AND FEEDBACK [E]”
  23. 23. #2 : THE FUN BIT! @OptimiseOrDie • Collaborative Sketching • Brainwriting • Refine and Test!
  24. 24. We believe that doing [A] for People [B] will make outcome [C] happen. We’ll know this when we observe data [D] and obtain feedback [E]. (reverse) @OptimiseOrDie
  25. 25. #2 : Solutions • You need multiple tool inputs – Tool decks are here : • Collaborative, Customer connected team – If you’re not doing this, you’re hosed • Session replay tools provide vital input – Get vital additional customer evidence • Simple page Analytics don’t cut it – Invest in your analytics, especially event tracking • Ego, Opinion, Cherished notions – fill gaps – Fill these vacuums with insights and data • Champion the user – Give them a chair at every meeting @OptimiseOrDie
  26. 26. #2 : HYPOTHESIS DESIGN SUMMARY @OptimiseOrDie • Inputs – get the right stuff • Research, Guidelines, Data • Framing the problem(s) • Questions to get you going • Use card prompts for Psychology • Create a hypothesis • Collaborative Sketching • Brainwriting • Refine and Check Hypothesis • Instrument and Test
  27. 27. #3 : No analytics integration • Investigating problems with tests • Segmentation of results • Tests that fail, flip or move around • Tests that don’t make sense • Broken test setups • What drives the averages you see? @OptimiseOrDie
  28. 28. A B B A
  29. 29. #4 : The test will finish after you die “These Danish porn sites are so hardcore! We’re still waiting for our AB tests to finish!” • Use a test length calculator like this one:
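
The calculator the slide links to is not preserved in this transcript. As a rough stand-in, here is a minimal sketch of what such a calculator typically does, assuming the standard two-proportion sample size formula with 95% confidence and 80% power. The function names and defaults are illustrative assumptions, not the tool from the talk.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Visitors needed in each variant to detect the given relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_run(baseline_rate, relative_lift, daily_visitors, variants=2):
    """Test length rounded up to whole weeks so full weekly cycles are covered."""
    needed = sample_size_per_variant(baseline_rate, relative_lift) * variants
    days = math.ceil(needed / daily_visitors)
    return math.ceil(days / 7)


# Example: 5% baseline conversion, hoping for a 10% relative lift, 2,000 visitors a day.
print(sample_size_per_variant(0.05, 0.10), "visitors per variant")
print(weeks_to_run(0.05, 0.10, daily_visitors=2_000), "weeks")
```

Halving the lift you want to detect roughly quadruples the traffic required, which is how a low-traffic page ends up with a test that outlives its owner.
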
  30. 30. #5 : You get false results @OptimiseOrDie
  31. 31. The 95% Stopping Problem • Many people use 95% or 99% ‘confidence’ as the signal to stop • This value is unreliable as a stopping rule • Read this Nature article : • You can hit 95% early in a test • If you stop, it could be a false positive • Tools need to be smarter about inference • This 95% thingy – it’s last on your list of reasons to stop testing • Let me explain @OptimiseOrDie
  32. 32. #5 : When to stop • Self stopping is a huge problem: – “I stopped the test when it looked good” – “It hit 20% on Thursday, so I figured – time to cut and run” – “We need test time for something else. Looks good to us” – “We’ve got a big sample now so why not finish it today?” • False Positives and Negatives – If you cut part of a business cycle, you bias the segments you have in the test. – So if you ignore weekend shoppers by stopping your test on Friday, that will affect results – The other problem is FALSE POSITIVES and FALSE NEGATIVES @OptimiseOrDie
  33. 33. #5 : When to stop @OptimiseOrDie
      Without early stopping:

      | | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
      |---|---|---|---|---|
      | After 200 observations | Insignificant | Insignificant | Significant! | Significant! |
      | After 500 observations | Insignificant | Significant! | Insignificant | Significant! |
      | End of experiment | Insignificant | Significant! | Insignificant | Significant! |

      With early stopping:

      | | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
      |---|---|---|---|---|
      | After 200 observations | Insignificant | Insignificant | Significant! | Significant! |
      | After 500 observations | Insignificant | Significant! | trial stopped | trial stopped |
      | End of experiment | Insignificant | Significant! | Significant! | Significant! |
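
The effect of stopping as soon as a test looks significant can be demonstrated with a small simulation. The sketch below runs A/A tests (both arms share the same true conversion rate, so every "winner" is a false positive) and compares checking once at the end with checking every 50 visitors. The pooled two-proportion z-test, the 10% rate, the sample sizes and the seed are all assumptions chosen for illustration.

```python
import math
import random


def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se


def false_positive_rate(peek_every, n_per_arm=1000, rate=0.10, runs=500,
                        z_crit=1.96, seed=1):
    """Share of A/A tests declared 'significant' when checked every peek_every visitors."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        for i in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Stop the moment a peek crosses the 95% threshold.
            if i % peek_every == 0 and abs(z_score(conv_a, i, conv_b, i)) > z_crit:
                false_positives += 1
                break
    return false_positives / runs


fixed_rate = false_positive_rate(peek_every=1000)   # one look, at the end
peeking_rate = false_positive_rate(peek_every=50)   # twenty looks
print(f"one look at the end: {fixed_rate:.1%} false positives")
print(f"peek every 50:       {peeking_rate:.1%} false positives")
```

With a single look the false positive rate should sit near the nominal 5%; with repeated peeks it climbs well above that, which is the point the scenarios on this slide are making.
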
  34. 34. The 95% Stopping Problem @OptimiseOrDie
  35. 35. The 95% Stopping Problem @OptimiseOrDie
  36. 36. The 95% Stopping Problem @OptimiseOrDie
  37. 37. 62.5cm +/- 1cm. Conversion rates as ranges: 9.1% ± 0.5 vs 9.3% ± 0.5; 9.1% ± 0.2 vs 9.3% ± 0.2; 9.1% ± 0.1 vs 9.3% ± 0.1 @OptimiseOrDie
  38. 38. Graph is a range, not a line: 9.1 ± 1.9%, 9.1 ± 0.9%, 9.1 ± 0.3%
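
To make "a range, not a line" concrete, the sketch below computes the ± figure for a conversion rate at different sample sizes, assuming the usual normal-approximation interval at 95% confidence. The visitor counts are invented, and checking whether two ranges overlap is a quick visual heuristic rather than a formal significance test.

```python
import math
from statistics import NormalDist


def margin_of_error(rate, visitors, confidence=0.95):
    """Half-width of the normal-approximation interval for a conversion rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(rate * (1 - rate) / visitors)


def ranges_overlap(rate_a, rate_b, visitors_per_variant):
    """True if the two intervals still share some ground."""
    moe_a = margin_of_error(rate_a, visitors_per_variant)
    moe_b = margin_of_error(rate_b, visitors_per_variant)
    return rate_a + moe_a >= rate_b - moe_b and rate_b + moe_b >= rate_a - moe_a


for visitors in (1_000, 10_000, 100_000, 1_000_000):
    moe = margin_of_error(0.091, visitors)
    overlap = ranges_overlap(0.091, 0.093, visitors)
    print(f"{visitors:>9,} visitors: 9.1% ± {moe * 100:.2f}  overlaps 9.3%? {overlap}")
```

The range only shrinks with the square root of the sample size, so telling 9.1% from 9.3% takes far more traffic than most people expect.
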
  39. 39. The 95% Stopping Problem “You should know that stopping a test once it’s significant is deadly sin number 1 in A/B testing land. 77% of A/A tests (testing the same thing as A and B) will reach significance at a certain point.” – Ton Wesseling, Online Dialogue. “I always tell people that you need a representative sample if your data needs to be valid. What does ‘representative’ mean? First of all you need to include all the weekdays and weekends. You need different weather, because it impacts buyer behaviour. But most important: Your traffic needs to have all traffic sources, especially newsletter, special campaigns, TV,… everything! The longer the test runs, the more insights you get.” – Andre Morys, Web Arts
  40. 40. Three Articles you MUST read “Statistical Significance does not equal Validity” “Why every Internet Marketer should be a Statistician” “Understanding the Cycles in your site”
  41. 41. Business & Purchase Cycles @OptimiseOrDie (diagram labels: Start Test, Finish, Avg Cycle) • Customers change • Your traffic mix changes • Markets, competitors • Be aware of all the waves • Always test whole cycles • Minimum 2 cycles (wk/mo) • Don’t exclude slower buyers
  42. 42. When to stop? • MINIMUM two business cycles (week/mo.) • MINIMUM of 1 purchase cycle • MINIMUM 250 outcomes/conversions per creative • MORE if relative difference is low • ALWAYS test full weeks • KNOW what marketing and cycles are doing • RUN a test length calculator - • SET your test run time • Run it • Stop it • Analyse the data • When do I run over? Not enough data… @OptimiseOrDie
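
The minimums on this slide can be turned into a simple pre-flight check. The sketch below is one possible reading: the function name, the default 7-day cycles and the wording of the messages are assumptions, and the "MORE if relative difference is low" rule is left to a sample size calculation rather than encoded here.

```python
def stopping_blockers(days_run, conversions, business_cycle_days=7,
                      purchase_cycle_days=7, min_conversions=250):
    """Return the reasons a test should keep running; an empty list means none were found."""
    blockers = []
    if days_run < 2 * business_cycle_days:
        blockers.append("fewer than two business cycles")
    if days_run < purchase_cycle_days:
        blockers.append("shorter than one purchase cycle")
    if days_run % 7 != 0:
        blockers.append("not a whole number of weeks")
    for variant, count in conversions.items():
        if count < min_conversions:
            blockers.append(f"{variant} has only {count} conversions")
    return blockers


print(stopping_blockers(10, {"A": 300, "B": 120}))
print(stopping_blockers(14, {"A": 300, "B": 310}))
```

Note that "significance" does not appear in the check at all, in line with the advice to set the run time up front, run it, stop it and only then analyse.
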
  43. 43. #6 : You peek too early!
  44. 44. #6 : The early stages of a test… • Ignore the graphs. Don’t draw conclusions. Don’t dance. Calm down. • Get a feel for the test but don’t do anything yet! • Remember – in A/B - 50% of returning visitors will see a new shiny website! • Until your test has had at least 2 business cycles and 250+ outcomes, don’t bother even getting remotely excited! • Watching regularly is good though. You’re looking for anything that looks really odd – if everyone is looking (but not concluding) then oddities will get spotted. • All tests move around or show big swings early in the testing cycle. Here is a very high traffic site – it still takes 10 days to start settling. Lower traffic sites will stretch this period further.
  45. 45. #7 : No QA testing for the AB test?
  46. 46. #7 – BIG SECRET! • Over 40% of tests have had QA issues. • Over £20M in browser conversion issues! • Browser testing • Tablets & Mobiles • FREE Device lab! @OptimiseOrDie
  47. 47. #7 : What other QA testing should I do? • Testing from several locations (office, home, elsewhere) • Testing the IP filtering is set up • Test tags are firing correctly (analytics and the test tool) • Test as a repeat visitor and check session timeouts • Cross check figures from 2+ sources • Monitor closely from launch, recheck, watch • WATCH FOR BIAS! @OptimiseOrDie
  48. 48. #8 : Tests are random and not prioritised Once you have a list of potential test areas, rank them by opportunity vs. effort. The common ranking metrics that I use include: • Opportunity (revenue, impact) • Dev resource • Time to market • Risk / Complexity. Make yourself a quadrant.
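
One way to turn the four ranking metrics into a quadrant is sketched below. The 1 to 5 scoring scale, the averaging of the three effort metrics, the midpoint of 3 and the quadrant labels are all assumptions made for this example; the talk only names the metrics.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    opportunity: int     # 1 (small) to 5 (large): revenue or impact
    dev_resource: int    # 1 (little) to 5 (lots)
    time_to_market: int  # 1 (fast) to 5 (slow)
    risk: int            # 1 (safe, simple) to 5 (risky, complex)

    @property
    def effort(self) -> float:
        return (self.dev_resource + self.time_to_market + self.risk) / 3

    @property
    def quadrant(self) -> str:
        high_opportunity = self.opportunity >= 3
        low_effort = self.effort < 3
        if high_opportunity and low_effort:
            return "do first"
        if high_opportunity:
            return "plan in"
        if low_effort:
            return "quick filler"
        return "park"


def prioritise(ideas):
    """Rank by opportunity per unit of effort, best first."""
    return sorted(ideas, key=lambda i: i.opportunity / i.effort, reverse=True)


ideas = [
    Idea("New checkout flow", 5, 5, 4, 4),
    Idea("Basket page copy", 4, 1, 1, 2),
    Idea("Footer link colours", 1, 1, 1, 1),
    Idea("Rebuild search", 2, 5, 5, 4),
]

for idea in prioritise(ideas):
    print(f"{idea.name:<22} {idea.quadrant}")
```
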
  49. 49. #9 : Velocity or Scope problems (chart: Conversion over 0, 6, 12, 18 Months) @OptimiseOrDie
  50. 50. #9 : Widen the optimisation scope @OptimiseOrDie
  51. 51. #9 : Solutions • Give Priority Boarding for opportunities – The best seats reserved for metric shifters • Release more often to close the gap – More testing resource helps, analytics ‘hawk eye’ • Kaizen – continuous improvement – Others call it JFDI (just f***ing do it) • Make changes AS WELL as tests, basically! – These small things add up as well as compounding effort • Run simultaneous tests – With analytics integration, decoding this becomes easy • Online Hair Booking – over 100 tiny tweaks – No functional changes at all – 37% improvement • Completed in-between product releases – The added lift for 10 days work, worth 360k @OptimiseOrDie
  52. 52. #11 : Your test fails @OptimiseOrDie
  53. 53. #11: Your test fails • Learn from the failure! If you can’t learn from the failure, you’ve designed a crap test. • Next time you design, imagine all your stuff failing. What would you do? If you don’t know or you’re not sure, get it changed so that a negative becomes insightful. • So : failure itself at a creative or variable level should tell you something. • On a failed test, always analyse the segmentation and analytics • One or more segments will be over and under • Check for varied performance • Now add the failure info to your Knowledge Base: • Look at it carefully – what does the failure tell you? Which element do you think drove the failure? • If you know what failed (e.g. making the price bigger) then you have very useful information • You turned the handle the wrong way • Now brainstorm a new test @OptimiseOrDie
  54. 54. #12 : The test is ‘about the same’ • Analyse the segmentation • Check the analytics and instrumentation • One or more segments may be over and under • They may be cancelling out – the average is a lie • The segment level performance will help you (beware of small sample sizes) • If you genuinely have a test which failed to move any segments, it’s a crap test – be bolder • This usually happens when it isn’t bold or brave enough in shifting away from the original design, particularly on lower traffic sites • Get testing again! @OptimiseOrDie
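
Here is a small worked example of segments cancelling out so that "the average is a lie". The visitor and conversion counts are invented so that the arithmetic is easy to follow; they are not data from the talk.

```python
results = {
    # segment: {variant: (visitors, conversions)}
    "desktop": {"A": (5_000, 500), "B": (5_000, 600)},
    "mobile": {"A": (5_000, 400), "B": (5_000, 300)},
}


def relative_lift(a, b):
    """Relative change in conversion rate from variant A to variant B."""
    visitors_a, conversions_a = a
    visitors_b, conversions_b = b
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    return rate_b / rate_a - 1


def overall(results, variant):
    """Sum visitors and conversions for one variant across all segments."""
    visitors = sum(seg[variant][0] for seg in results.values())
    conversions = sum(seg[variant][1] for seg in results.values())
    return visitors, conversions


overall_lift = relative_lift(overall(results, "A"), overall(results, "B"))
segment_lifts = {name: relative_lift(seg["A"], seg["B"]) for name, seg in results.items()}

print(f"overall: {overall_lift:+.1%}")
for name, lift in segment_lifts.items():
    print(f"{name}: {lift:+.1%}")
```

Both variants convert at 9% overall, yet desktop is up and mobile is down. As the slide warns, check the sample size in each segment before acting on a split like this.
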
  55. 55. #13 : The test keeps moving around • There are three reasons it is moving around – Your sample size (outcomes) is still too small – The external traffic mix, customers or reaction has suddenly changed or – Your inbound marketing driven traffic mix is completely volatile (very rare) • Check the sample size • Check all your marketing activity • Check the instrumentation • If no reason, check segmentation @OptimiseOrDie
  56. 56. #14 : The test has flipped on me • Something like this can happen: • Check your sample size. If it’s still small, then expect this until the test settles. • If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously! • To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system. • The segmented data will help you to identify the source of the shift in response to your test. I rarely get a flipped one and it’s always something
  57. 57. #15 : Should I run an A/A test first? • No – and this is why: – It’s a waste of time – It’s easier to test and monitor instead – You are eating into test time – Also applies to A/A/B/B testing – A/B/A running at 25%/50%/25% is the best • Read my post here :
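
A 25%/50%/25% A/B/A split could be implemented with deterministic hashing, as in the sketch below. Testing tools normally handle assignment for you; the function name, the experiment key and the use of SHA-256 are assumptions for illustration only.

```python
import hashlib
from collections import Counter


def assign_arm(visitor_id: str, experiment: str = "exp-1") -> str:
    """Map a visitor to A1 (25%), B (50%) or A2 (25%), the same way on every visit."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99
    if bucket < 25:
        return "A1"
    if bucket < 75:
        return "B"
    return "A2"


counts = Counter(assign_arm(f"visitor-{i}") for i in range(20_000))
for arm in ("A1", "B", "A2"):
    print(arm, f"{counts[arm] / 20_000:.1%}")
```

Both A arms serve the identical original, so comparing A1 with A2 gives a running sanity check on the setup while the B arm collects data, without spending test time on a separate A/A run.
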
  58. 58. #16 : Nobody feels the test • You promised a 25% rise in checkouts - you only see 2% • Traffic, Advertising, Marketing may have changed • Check they’re using the same precise metrics • Run a calibration exercise • I often leave a 5 or 10% stub running in a test • This tracks old creative once new one goes live • If conversion is also down for that one, BINGO! • Remember – the AB test is an estimate – it doesn’t precisely record future performance • This is why infrequent testing is bad • Always be trying a new test instead of basking in the glory of one you ran 6 months ago. You’re only as good as your next test. @OptimiseOrDie
  59. 59. #17 : You forgot about Mobile & Tablet • If you’re AB testing a responsive site, pay attention • Content will break differently on many screens • Know thy users and their devices • Use Bango or Google Analytics to define a test list • Make sure you test mobile devices & viewports • What looks good on your desk may not be for the user • Harder to design cross device tests • You’ll need to segment mobile, tablet & desktop response in the analytics or AB testing package • Your personal phone is not a device mix • Ask me about making your device list • Buy core devices, rent the rest from @OptimiseOrDie
  60. 60. #18 : Oh shit – no traffic • If small volumes, contact customers – reach out. • If data volumes aren’t there, there are still customers! • Drive design from levers you can apply – game the system • Pick clean and simple clusters of change (hypothesis driven) • Use a goal at an earlier ring stage or funnel step • Beware of using clickthroughs when attrition is high on the other side • Try before and after testing on identical time periods (measure in analytics model) • Be careful about small sample sizes (<100 outcomes) • Are you working automated emails? • Fix JFDI, performance and UX issues too!
  61. 61. #18 : Oh shit – no traffic • Forget MVT or A/B/N tests – run your numbers • Test things with high impact – don’t be a wuss! • Use UX, Session Replay to aid insight • Run a task gap survey (4Q style) • Run a dropped basket survey (LF style) • Run a general survey + check social + other sites • Run sitewide tests that appear on all pages or large clusters of pages – • UVPs (“We are a cool brand”), USPs (“Free returns!”), UCPs (“10% off today”). • Headers, Footers, Nudge Bars, USP bars, footer changes, Navigation, Product pages, Delivery info etc.
  62. 62. #19 : I chose the wrong test type • A/B testing – good for: – A single change of content or design layout – A group of related changes (e.g. payment security) – Finding a new and radical shift for a template design – Lower traffic pages or shorter test times • Multivariate testing – good for: – Higher traffic pages – Groups of unrelated changes (e.g. delivery & security) – Multiple content or design style changes – Finding specific drivers of test lifts – Testing multiple versions (e.g. click here, book now, go) – Where you need to understand strong and weak cross variable interactions – Don’t use to settle arguments or sloppy thinking!
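
One reason multivariate testing suits higher traffic pages is that a full-factorial design multiplies the number of combinations, and each one needs its own sample. The sketch below counts them for a made-up page; the element names and the 60,000 visitor figure are assumptions, with the button texts borrowed from the slide's example.

```python
import math
from itertools import product

elements = {
    "button text": ["click here", "book now", "go"],
    "delivery message": ["none", "free delivery"],
    "security badge": ["none", "shown"],
}


def full_factorial(elements):
    """Every combination of one option per element."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


def visitors_per_combination(total_visitors, elements):
    """Traffic each combination receives under an even split."""
    return total_visitors // math.prod(len(options) for options in elements.values())


combos = full_factorial(elements)
print(len(combos), "combinations")
print(visitors_per_combination(60_000, elements), "visitors each from 60,000")
```

An A/B test on the same page would split that traffic two ways instead of twelve, which is why it fits lower traffic pages or shorter test times.
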
  63. 63. Netherlands A/B Shift Example: previous winner +7.25%; +8.19% additional lift
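
Successive lifts compound rather than add. The sketch below assumes the +8.19% was measured against the previous winner (the slide does not say so explicitly) and works out the combined lift over the original.

```python
def compound_lift(*lifts):
    """Combine successive relative lifts, each measured against the previous winner."""
    total = 1.0
    for lift in lifts:
        total *= 1 + lift
    return total - 1


combined = compound_lift(0.0725, 0.0819)
print(f"combined lift over the original: {combined:.2%}")
```

Under that assumption the two tests together are worth a little more than the 15.44% you would get by simply adding the percentages.
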
  64. 64. #20 – Other flavours of testing • Micro testing (tiny change) – good for: – Proving to the boss that testing works – Demonstrating to IT that it works without impact – Showing the impact of a seemingly tiny change – Proof of concept before larger test • Funnel testing – good for: – Checkouts – Lead gen – Forms processes – Quotations – Any multi-step process with data entry • Fake it and Build it – good for: – Testing new business ideas – Trying out promotions on a test sample – Estimating impact before you build – Helps you calculate ROI – You can even split test entire server farms Vs.
  65. 65. #20 – Other flavours of testing “Congratulations! Today you’re the lucky winner of our random awards programme. You get all these extra features for free, on us. Enjoy.”
  66. 66. Top F***ups for 2014 1. Testing in the wrong place 2. Your hypothesis inputs are crap 3. No analytics integration 4. Your test will finish after you die 5. You don’t test for long enough 6. You peek before it’s ready 7. No QA for your split test 8. Opportunities are not prioritised 9. Testing cycles are too slow 10. You don’t know when tests are ready 11. Your test fails 12. The test is ‘about the same’ 13. Test flips behaviour 14. Test keeps moving around 15. You run an A/A test and waste time 16. Nobody ‘feels’ the test 17. You forgot you were responsive 18. You forgot you had no traffic 19. You ran the wrong test type 20. You didn’t try all the flavours of testing @OptimiseOrDie
  68. 68. 2004 Headspace What I thought I knew in 2004 Reality
  69. 69. 2014 Headspace What I know I know On a good day
  70. 70. Guessaholics Anonymous
  71. 71. Rumsfeldian Space @OptimiseOrDie
  72. 72. Rumsfeldian Space @OptimiseOrDie
  73. 73. The 5 Legged Optimisation Barstool #1 : Smart Talented Polymath People, Flexible and Agile teams @OptimiseOrDie
  74. 74. Fittest? Agile! @OptimiseOrDie
  75. 75. #2 : Analytics Investment (tools, people, dev time) @OptimiseOrDie
  76. 76. #3 : User research and insight @OptimiseOrDie
  77. 77. #3 : THE BEST IDEAS COME FROM? @OptimiseOrDie
  78. 78. #4 : GREAT COPYWRITING “On the average, five times as many people read the headline as read the body copy. When you have written your headline, you have spent eighty cents out of your dollar.” David Ogilvy “In 9 years and 40M split tests with visitors, the majority of my testing success came from playing with the words.” @OptimiseOrDie
  79. 79. #5 : Split Testing Tools • Google Content Experiments • Optimizely • Visual Website Optimizer • Multi Armed Bandit Explanation • New Machine Learning Tools @OptimiseOrDie
  80. 80. The 5 Legged Optimisation Barstool: #1 Culture & Team; #2 Toolkit & Analytics investment; #3 UX, CX, Service Design, Insight; #4 Persuasive Copywriting; #5 Experimentation (testing) tools @OptimiseOrDie
  81. 81. READ STUFF
  82. 82. READ STUFF
  83. 83. READ STUFF
  84. 84. #5 : FIND STUFF @OptimiseOrDie @danbarker Analytics @fastbloke Analytics @timlb Analytics @jamesgurd Analytics @therustybear Analytics @carmenmardiros Analytics @davechaffey Analytics @priteshpatel9 Analytics @cutroni Analytics @avinash Analytics @Aschottmuller Analytics, CRO @cartmetrix Analytics, CRO @Kissmetrics CRO / UX @Unbounce CRO / UX @Morys CRO / Neuro @UXFeeds UX / Neuro @Psyblog Neuro @Gfiorelli1 SEO / Analytics @PeepLaja CRO @TheGrok CRO @UIE UX @LukeW UX / Forms @cjforms UX / Forms @axbom UX @iatv UX @Chudders Photo UX @JeffreyGroks Innovation @StephanieRieger Innovation @BrianSolis Innovation @DrEscotet Neuro @TheBrainLady Neuro @RogerDooley Neuro @Cugelman Neuro @Smashingmag Dev / UX @uxmag UX @Webtrends UX / CRO
  85. 85. #5 : LEARN STUFF @OptimiseOrDie
  86. 86. #12 : The Best Companies… • Invest continually in analytics instrumentation, tools, people • Use an Agile, iterative, cross-silo, one team project culture • Prefer collaborative tools to having lots of meetings • Prioritise development based on numbers and insight • Practice real continuous product improvement, not SLEDD* • Are fixing bugs, cruft, bad stuff as well as optimising • Source photos and content that support persuasion and utility • Have cross channel, cross device design, testing and QA • Segment their data for valuable insights, every test or change • Continually reduce cycle (iteration) time in their process • Blend ‘long’ design, continuous improvement AND split tests • Make optimisation the engine of change, not the slave of ego * Single Large Expensive Doomed Developments
  88. 88. Projects? Questions? Mail me! Mail : Deck : Linkedin : @OptimiseOrDie