Why do my AB tests suck? measurecamp


The top reasons for not getting value out of your AB tests, and their solutions - some practical tips for designing insightful and correctly instrumented tests

  Why does my AB testing suck? 5th Sep 2013
  2. 2. #11 Top Split Test & CRO questions • #1 How to choose test type! • #2 Top CRO questions
  3. 3. 11.1 – How to choose the test type • Test complexity – this drains time! • Analytics – will this allow you to test before/after? • Money now or Precise data later • What stage are you at with the client? • How volatile is the data and traffic? • Are there huge non weekly patterns – seasonal? • A/B tests – Design shift, new baseline, local maxima • MVT – Need to know variables, client can’t settle • Small MVT – A/B + MVT benefits (2x2x2 or 2x2x4) • Traffic is your biggest factor here • Use the VWO test run calculator • Do a rough calculation yourself • Let’s look at some recent pages of yours
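To make "traffic is your biggest factor" concrete, here is a minimal sketch of the rough calculation: it counts the recipes in a full-factorial MVT and the daily visitors each recipe would see under an even split. The function names and the 1,000 visitors a day figure are illustrative assumptions, and this is not how the VWO calculator works internally.

```python
from math import prod


def recipes_in_mvt(levels_per_variable):
    """Number of recipes (cells) in a full-factorial test."""
    return prod(levels_per_variable)


def daily_visitors_per_recipe(daily_visitors, levels_per_variable):
    """Visitors each recipe sees per day, assuming an even traffic split."""
    return daily_visitors / recipes_in_mvt(levels_per_variable)


# (2,) is a plain A/B test; the other two are the small MVT shapes on the slide.
for design in [(2,), (2, 2, 2), (2, 2, 4)]:
    cells = recipes_in_mvt(design)
    share = daily_visitors_per_recipe(1000, design)  # 1000/day is hypothetical
    print(design, cells, share)
```

Every extra variable or level divides the same traffic across more recipes, which is why the small MVT shapes are the practical limit on most sites.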
  4. 4. 11.2 – Top Conversion Questions • 32 questions, picked by Practitioners • I plan to record them all as a course! • What top stuff did I hear? “How long will my test take?” “When should I check the results?” “How do I know if it’s ready?”
  5. 5. #1 The tennis court – Let’s say we want to estimate the average height at which Roger Federer and Nadal hit the ball over the net. So, let’s start the match:
  6. 6. First Set Federer 6-4 – We start to collect values: 62cm +/- 2cm and 63.5cm +/- 2cm
  7. 7. Second Set – Nadal 7-6 – Nadal starts sending them low over the net: 62cm +/- 1cm and 62.5cm +/- 1cm
  8. 8. Final Set Nadal 7-6 – We start to collect values: 61.8cm +/- 0.3cm and 62cm +/- 0.3cm
  9. 9. Let’s look at this a different way • 62.5cm +/- 1cm • 9.1 ± 0.3%
  10. 10. Graph is a range, not a line: 9.1 ± 0.3%
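One common way to put a range around a measured conversion rate is the normal-approximation interval for a proportion. The sketch below assumes that method and a 95% level (z = 1.96). The slides do not say how their ± figure was computed, and the visitor counts are made up for illustration.

```python
from math import sqrt


def conversion_interval(conversions, visitors, z=1.96):
    """Return (rate, margin) so the range is rate - margin to rate + margin.

    Uses the normal approximation for a proportion, which is an
    assumption of this sketch rather than the method behind the slides.
    """
    rate = conversions / visitors
    margin = z * sqrt(rate * (1 - rate) / visitors)
    return rate, margin


# Same underlying rate, growing sample: the range narrows as data arrives.
for visitors in (1_000, 10_000, 35_000):
    conversions = round(visitors * 0.091)  # hypothetical counts
    rate, margin = conversion_interval(conversions, visitors)
    print(f"{visitors:>6} visitors: {rate:.1%} ± {margin:.1%}")
```

As in the tennis example, the estimate barely moves while the range around it shrinks, so compare ranges rather than single lines on the graph.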
  11. 11. #1 Summary • The minimum length – 2 business cycles OR – 250, pref 350 outcomes in each – Calculate time to reach minimum – Work on footfall, not sitewide – So, if you get 1,000 visitors a day to a new test page – You convert at around 5% (50 checkouts) – You have two creatives – At current volumes, you’ll get ~25 checkouts a day for each creative – That means you need 14 days minimum (350 outcomes ÷ 25 a day) – If they separate, it might take less (but business cycle rule kicks in) – If they don’t separate, it could take longer – Remember it’s a fuzzy region – not a precise point
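The arithmetic on this slide can be sketched as a small estimator. It assumes a 7-day business cycle and takes the longer of the two minimums, then rounds up to a whole cycle; that combination is my reading of the rules here, and the function and parameter names are hypothetical.

```python
from math import ceil


def minimum_test_days(daily_visitors, conversion_rate, creatives,
                      outcomes_needed=350, cycle_days=7, min_cycles=2):
    """Rough minimum run time in days; an estimate, not a prediction."""
    outcomes_per_creative_per_day = daily_visitors * conversion_rate / creatives
    days_for_outcomes = ceil(outcomes_needed / outcomes_per_creative_per_day)
    days_for_cycles = cycle_days * min_cycles
    days = max(days_for_outcomes, days_for_cycles)
    # Round up so the test ends on a business cycle boundary.
    return ceil(days / cycle_days) * cycle_days


# The slide's example: 1,000 visitors a day, ~5% conversion, two creatives.
print(minimum_test_days(1000, 0.05, 2))
```

Feed it footfall for the test page, not sitewide traffic, and rerun it whenever marketing activity changes the daily volume.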
  12. 12. #1 Summary • The minimum length – Depends on performance – If you test two shades of blue? – Traffic may change – PPC budget might run out – TV advertising may start – Segment level performance may drive – You can estimate a test length – you cannot predict it – Be aware of your marketing activity, always – Watch your test like a hyper-engaged chef
  13. 13. 11.3 – Are we there yet? Early test stages… • Ignore the graphs. Don’t draw conclusions. Don’t dance. Calm down. • Get a feel for the test but don’t do anything yet! • Remember – in A/B - 50% of returning visitors will see a new shiny website! • Until your test has had at least 1 business cycle and 250-350 outcomes, don’t bother drawing conclusions or getting excited! • You’re looking for anything that looks really odd – your analytics person should be checking all the figures until you’re satisfied • All tests move around or show big swings early in the testing cycle. Here is a very high traffic site – it still takes 10 days to start settling. Lower traffic sites will stretch this period further.
  14. 14. 11.4 – What happens when a test flips on me? • Something like this can happen: • Check your sample size. If it’s still small, then expect this until the test settles. • If the test does genuinely flip – and quite severely – then something has changed with the traffic mix, the customer base or your advertising. Maybe the PPC budget ran out? Seriously! • To analyse a flipped test, you’ll need to check your segmented data. This is why you have a split testing package AND an analytics system. • The segmented data will help you to identify the source of the shift in response to your test. I rarely get a flipped one and it’s always something changing on me, without being told. The heartless bastards.
  15. 15. 11.5 – What happens if a test is still moving around? • There are three reasons it is moving around – Your sample size (outcomes) is still too small – The external traffic mix, customers or reaction has suddenly changed or – Your inbound marketing driven traffic mix is completely volatile (very rare) • Check the sample size • Check all your marketing activity • Check the instrumentation • If no reason, check segmentation
  16. 16. 11.6 – How do I know when it’s ready? • The hallmarks of a cooked test are: – It’s done at least 1 or 2 (preferred) cycles – You have at least 250-350 outcomes for each recipe – It’s not moving around hugely at creative or segment level performance – The test results are clear – even if the precise values are not – The intervals are not overlapping (much) – If a test is still moving around, you need to investigate – Always declare on a business cycle boundary – not the middle of a period (this introduces bias) – Don’t declare in the middle of a limited time period advertising campaign (e.g. TV, print, online) – Always test before and after large marketing campaigns (one week on, one week off)
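The measurable hallmarks above can be sketched as a checklist for a two-recipe test. This is a simplification under stated assumptions: the function names are hypothetical, the intervals are supplied by you, and the overlap test is strict where the slide allows "(much)". It does not cover stability over time or marketing campaign timing, which still need a human eye.

```python
def intervals_overlap(first, second):
    """True when two (low, high) ranges share any value."""
    return first[0] <= second[1] and second[0] <= first[1]


def is_test_cooked(cycles_completed, outcomes_per_recipe,
                   control_interval, variant_interval,
                   min_cycles=1, min_outcomes=250):
    """Return (ready, reasons) for a two-recipe test."""
    reasons = []
    if cycles_completed < min_cycles:
        reasons.append("fewer than the minimum number of business cycles")
    if min(outcomes_per_recipe) < min_outcomes:
        reasons.append("a recipe has too few outcomes")
    if intervals_overlap(control_interval, variant_interval):
        reasons.append("intervals still overlap")
    return (not reasons, reasons)


ready, reasons = is_test_cooked(
    cycles_completed=2,
    outcomes_per_recipe=[360, 355],
    control_interval=(0.088, 0.094),  # hypothetical figures
    variant_interval=(0.097, 0.103),
)
print(ready, reasons)
```

Returning the reasons rather than a bare yes or no makes it easier to explain to a client why a test that "looks great" is not being declared yet.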
  17. 17. 11.7 – What happens if it’s inconclusive? • Analyse the segmentation • One or more segments may be over and under • They may be cancelling out – the average is a lie • The segment level performance will help you (beware of small sample sizes) • If you genuinely have a test which failed to move any segments, it’s a crap test • This usually happens when it isn’t bold or brave enough in shifting away from the original design, particularly on lower traffic sites • Get testing again!
  18. 18. 11.8 – What QA testing should I do? • Cross Browser Testing • Testing from several locations (office, home, elsewhere) • Testing the IP filtering is set up • Test tags are firing correctly (analytics and the test tool) • Test as a repeat visitor and check session timeouts • Cross check figures from 2+ sources • Monitor closely from launch, recheck
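Cross checking figures from two sources can be as simple as comparing per-recipe counts from the test tool and the analytics package and flagging any that disagree by more than a tolerance. The sketch below assumes you can export both sets of counts as dictionaries; the 5% tolerance and all figures are hypothetical.

```python
def relative_gap(count_a, count_b):
    """Difference between two counts as a fraction of the larger one."""
    larger = max(count_a, count_b)
    return 0.0 if larger == 0 else abs(count_a - count_b) / larger


def cross_check(test_tool_counts, analytics_counts, tolerance=0.05):
    """Return recipes whose counts disagree by more than the tolerance."""
    mismatches = {}
    for recipe in sorted(set(test_tool_counts) | set(analytics_counts)):
        tool = test_tool_counts.get(recipe, 0)
        analytics = analytics_counts.get(recipe, 0)
        gap = relative_gap(tool, analytics)
        if gap > tolerance:
            mismatches[recipe] = (tool, analytics, gap)
    return mismatches


tool = {"A": 5000, "B": 5040}        # hypothetical visitor counts
analytics = {"A": 4900, "B": 4100}
print(cross_check(tool, analytics))
```

Two tools rarely agree exactly, so the point is to spot a recipe where the gap is much larger than the others, which usually means a tag is not firing.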
  19. 19. 11.9 – What happens if it fails? • Learn from the failure • If you can’t learn from the failure, you’ve designed a crap test. Next time you design, imagine all your stuff failing. What would you do? If you don’t know or you’re not sure, get it changed so that a negative becomes useful. • So: failure itself at a creative or variable level should tell you something. • On a failed test, always analyse the segmentation • One or more segments will be over and under • Check for varied performance • Now add the failure info to your Knowledge Base: • Look at it carefully – what does the failure tell you? Which element do you think drove the failure? • If you know what failed (e.g. making the price bigger) then you have very useful information • You turned the handle the wrong way • Now brainstorm a new test
  20. 20. 11.10 – Should I run an A/A test first? • No – and this is why: – It’s a waste of time – It’s easier to test and monitor instead – You are eating into test time – Also applies to A/A/B/B testing – A/B/A running at 25%/50%/25% is the best • Read my post here : http://bit.ly/WcI9EZ
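A 25%/50%/25% A/B/A split can be sketched with deterministic bucketing, so a returning visitor always lands in the same recipe. Hashing a visitor ID is my illustrative assumption here, not something the slides or any particular testing tool prescribe, and the recipe labels are hypothetical.

```python
import hashlib

# Two identical A recipes either side of B, as percentages of traffic.
SPLIT = [("A1", 25), ("B", 50), ("A2", 25)]


def assign_recipe(visitor_id, split=SPLIT):
    """Map a visitor ID to a recipe, stable across visits."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99
    upper = 0
    for recipe, percent in split:
        upper += percent
        if bucket < upper:
            return recipe
    raise ValueError("split percentages must sum to 100")


counts = {"A1": 0, "B": 0, "A2": 0}
for n in range(10_000):
    counts[assign_recipe(f"visitor-{n}")] += 1
print(counts)
```

Presumably the appeal of this layout is that the two A groups should behave alike while the test is running, so a persistent gap between them points at instrumentation or sampling rather than the design.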
  21. 21. 11.11 – What is a good conversion rate? Higher than the one you had last month!
  22. 22. #12 – Top reasons summary • You weren’t bold enough • You made the test too complex • Your test didn’t tell you anything (failures too!) • You didn’t do browser QA • The session model is broken • Your redirects are flawed • Your office is part of the bias • The test isn’t truly random / The samples aren’t representative • Your sample size is too small • You didn’t test for long enough • You didn’t look at the error rates • You didn’t cross instrument • You’ve missed one or more underlying cycles • You don’t factor in before/after cycles • One test has an inherent performance bias (load time, for example) • You didn’t watch segment performance • You’re measuring too shallowly in the funnel • Your traffic mix has changed • You’re not measuring channel switchers (phone/email/chat etc.) • The analytics setup is broken!
  23. 23. #13 – Summary - tests • This isn’t about tools – it’s about your thinking and approach to problems. Bravery and curiosity more important than wizardry! • Keep it simple and aim for actionable truths and insights • Invest in staff, training, analytics (yours and your clients) • More wired in clients means happier agency! • Fixing problems impresses clients even before you start (health check) • Prioritise issues into opportunity & effort • Showing models around money is a winner • Do something every week to make the client configuration better • Let me use a till analogy! • What about a Formula 1 racing car? • Get clients to pay you to invest in their future • Give staff time to train themselves, go on courses, get qualified • On that note – experience with core skills + topups = GA experts • Tap into the community out there • Hopefully this has given you a great springboard to MORE!
  24. 24. Is there a way to fix this then? Conversion Heroes! @OptimiseOrDie
  25. 25. END & QUESTIONS
  26. 26. Email: sullivac@gmail.com • Twitter: @OptimiseOrDie • LinkedIn: linkd.in/pvrg14 • More reading.
  92. 92. Why does my CRO suck? 5th Sep 2013 @OptimiseOrDie
  93. 93. Timeline • – 1998 • 1999 – 2004 • 2004 – 2008 • 2008 – 2012
  94. 94. Belron Brands
  95. 95. SEO • PPC • UX • Analytics • A/B and Multivariate testing • Customer Satisfaction • Design • QA • Development • Performance • 40+ websites, 34 countries, 19 languages, €1bn+ revenue • 8 people
  96. 96. Ahh, how it hurt
  97. 97. If you’re not a part of the solution, there’s good money to be made in prolonging the problem
  98. 98. Out of my comfort zone…
  99. 99. Behind enemy lines…
  100. 100. Nice day at the office, dear?
  101. 101. Competition…
  102. 102. Traffic is harder! SEO/PPC
  103. 103. Panguin tool…
  104. 104. Casino Psychology
  105. 105. If it isn’t working, you’re not doing it right
  106. 106. #1 : Your analytics are cattle trucked
  108. 108. #1 : Common problems (GA) • Dual purpose goal page – One page used by two outcomes – and not split • Cross domain tracking – Where you jump between sites, this borks the data • Filters not correctly set up – Your office, agencies, developers are skewing data • Code missing or double code – Causes visit splitting, double pageviews, skews bounce rate • Campaign, Social, Email tracking etc. – External links you generate are not set up to record properly • Errors not tracked (404, 5xx, Other) – You are unaware of error volumes, locations and impact • Dual flow funnels – Flows join in the middle of a funnel or loop internally • Event tracking skews bounce rate – If an event is set to be ‘interactive’ – it can skew bounce rate (example)
  109. 109. #1 : Common problems (GA) – EXAMPLE
    Landing | 1st interaction | Loss  | 2nd interaction | Loss  | 3rd interaction | Loss  | 4th interaction | Loss
    55900   | 527             | 99.1% | 66              | 87.5% | 55              | 16.7% | 33              | 40.0%
    30900   | 4120            | 86.7% | 2470            | 40.0% | 1680            | 32.0% | 1240            | 26.2%
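The Loss columns in this example are the fraction of visitors dropped between one step and the next. A minimal sketch of that calculation, using the counts from the slide (the function name and row labels are mine, since the slide does not label the two rows):

```python
def step_losses(counts):
    """Fraction lost between each consecutive pair of funnel steps."""
    return [1 - after / before for before, after in zip(counts, counts[1:])]


funnels = {
    "first row": [55900, 527, 66, 55, 33],
    "second row": [30900, 4120, 2470, 1680, 1240],
}

for name, counts in funnels.items():
    print(name, [f"{loss:.1%}" for loss in step_losses(counts)])
```

A first-step loss near 99% is the kind of figure that should prompt a check of the event tracking setup before anyone blames the page.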
  110. 110. #1 : Solutions • Get a Health Check for your Analytics – Try @prwd, @danbarker, @peter_oneill or ask me! • Invest continually in instrumentation – Aim for at least 5% of dev time to fix + improve • Stop shrugging : plug your insight gaps – Change ‘I don’t know’ to ‘I’ll find out’ • Look at event tracking (Google Analytics) – If set up correctly, you get wonderful insights • Would you use paper instead of a till? – You wouldn’t do it in retail so stop doing it online! • How do you win F1 races? – With the wrong performance data, you won’t
  111. 111. #2 : Your inputs are all wrong • Insight - Inputs #FAIL • Competitor copying • Guessing • Dice rolling • An article the CEO read • Competitor change • Panic • Ego • Opinion • Cherished notions • Marketing whims • Cosmic rays • Not ‘on brand’ enough • IT inflexibility • Internal company needs • Some dumbass consultant • Shiny feature blindness • Knee jerk reactions
  112. 112. #2 : Your inputs are all wrong • Insight - Inputs • Segmentation • Surveys • Sales and Call Centre • Session Replay • Social analytics • Customer contact • Eye tracking • Usability testing • Forms analytics • Search analytics • Voice of Customer • Market research • A/B and MVT testing • Big & unstructured data • Web analytics • Competitor evals • Customer services
  113. 113. #2 : Solutions • Usability testing and User Centred design – If you’re not doing this properly, you’re hosed • Champion UX+ - with added numbers – (Re)designing without inputs + numbers is guessing • You need one team on this, not silos – Stop handing round the baby (I’ll come back to this) • Ego, Opinion, Cherished notions – fill gaps – Fill these vacuums with insights and data • Champion the users – Someone needs to take their side! • You need multiple tool inputs – Let me show you my core list
  114. 114. #2 : Core tools • Properly set up analytics – Without this foundation, you’re toast • Session replay tools – Clicktale, Tealeaf, Sessioncam and more… • Cheap / Crowdsourced usability testing – See the resource pack for more details • Voice of Customer / Feedback tools – 4Q, Kampyle, Qualaroo, Usabilla and more… • A/B and Multivariate testing – Optimizely, Google Content Experiments, VWO • Email, Browser and Mobile testing – You don’t know if it works unless you check
  115. 115. #3 : You’re not testing (enough)
  116. 116. #3 : Common problems • Let’s take a quick poll – How many tests do you complete a month? • Not enough resource – You MUST hire, invest and ringfence time and staff for CRO • Testing has gone to sleep – Some vendors have a ‘rescue’ team for these accounts • Vanity testing takes hold – Getting one test done a quarter? Still showing it a year later? • You keep testing without buy-in at C-Level – If nobody sees the flower, was it there? • You haven’t got a process – just a plugin – Insight, Brainstorm, Wireframe, Design, Build, QA test, Monitor, Analyse. Tools, Process, People, Time -> INVEST • IT or release barriers slow down work – Circumvent with tagging tools – Develop ways around the innovation barrier
  117. 117. #4 : Not executing fast enough
  118. 118. #4 : Not executing fast enough • Silo Mentality means pass the product – No ‘one team’ approach means no ‘one product’ • The process is badly designed – See the resource pack or ask me later! • People mistake hypotheses for finals – Endless argument, tweaking means NO TESTING – let the test decide, please! • No clarity : authority or decision making – You need a strong leader to get things decided • Signoff takes far too long – Signoff by committee is a velocity killer – the CUSTOMER and the NUMBERS are the signoff • You set your target too low – Aim for a high target and keep increasing it
  119. 119. CRO
  120. 120. #4 : Execution solutions • Agile, One Team approach – Everyone works on the lifecycle, together • Hire Polymaths – T-shaped or just multi-skilled, I hire them a lot • Use Collaborative Tools, not meetings – See the resource pack • Market the results – Market this stuff internally like a PR agency – Encourage betting in the office • Smash down silos – a special mission – Involve the worst offenders in the hypothesis team – “Hold your friends close, and your enemies closer” – Work WITH the developers to find solutions – Ask Developers and IT for solutions, not apologies
  121. 121. #5 : Product cycles are too long [Chart: Conversion over Months, x-axis 0, 6, 12, 18]
  122. 122. #5 : Solutions • Give Priority Boarding for opportunities – The best seats reserved for metric shifters • Release more often to close the gap – More testing resource helps, analytics ‘hawk eye’ • Kaizen – continuous improvement – Others call it JFDI (just f***ing do it) • Make changes AS WELL as tests, basically! – These small things add up • RUSH Hair booking – Over 100 changes – No functional changes at all – 37% improvement • In between product lifecycles? – The added lift for 10 days’ work, worth 360k
  123. 123. #5 : Make your own cycles
  124. 124. #6 – No Photo UX (24 Jan 2012) • Persuasion / Influence / Direction / Explanation • Helps people process information and stories • Vital to sell an ‘experience’ • Helps people recognise and discriminate between things • Supports Scanning Visitors • Drives emotional response • short.cx/YrBczl
  125. 125. Photo UX • Very powerful and under-estimated area • I’ve done over 20M visitor tests with people images for a service industry – some tips: • The person, pose, eye gaze, facial expressions and body language – cause visceral emotional reactions and big changes in behaviour • Eye gaze crucial – to engage you or to ‘point’
  126. 126. Photo UX • Negative body language is a turnoff • Uniforms and branding a positive (ball cap) • Hands are hard to handle – use a prop to help • For Ecommerce – tip! test bigger images! • Autoglass and Belron always use real people • In most countries (out of 33) with strong female and male images in test, female wins • Smile and authenticity in these examples is absolutely vital • So, I have a question for you
  128. 128. Terrible Stock Photos : headsethotties.com & awkwardstockphotos.com • Laughing at Salads : womenlaughingwithsalad.tumblr.com • BBC Fake Smile Test : bbc.in/5rtnv
  129. 129. SPAIN: +22% over control, 99% confidence
  131. 131. #7 : Your tests are cattle trucked • Many tests fail due to QA or browser bugs – Always do cross browser QA testing – see resources • Don’t rely on developers saying ‘yes’ – Use your analytics to define the list to test • Cross instrument your analytics – You need this to check the test software works • Store the variant(s) seen in analytics – Compare people who saw A/B/A vs. A/B/B • Segment your data to find variances – Failed tests usually show differences for segments • Watch the test and analytics CLOSELY – After you go live, religiously check both – Read this article : stanford.io/15UYov0
  132. 132. #8 : Stats are confusing • Many testers & marketing people struggle – How long will it take to run the test? – Is the test ready? – How long should I keep it running for? – It says it’s ready after 3 days – is it? – Can we close it now – the numbers look great! • A/B testing maths for dummies: – http://bit.ly/15UXLS4 • For more advanced testers: – Read this : http://bit.ly/1a4iJ1H • I’m going to build a stats course – To explain all the common questions – To save me having to explain this crap all the time
  133. 133. #9 : You’re not segmenting • Averages lie – What about new vs. returning visitors? – What about different keyword groups? – Landing pages? Routes? Attributes • Failed tests are just ‘averaged out’ – You must look at segment level data – You must integrate the analytics + a/b test software • The downside? – You’ll need more test data – to segment • The upside? – Helps figure out why test didn’t perform – Finds value in failed or ‘no difference’ tests – Drives further testing focus
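A small worked example shows how a test can be "averaged out". All the numbers below are hypothetical and chosen so that two segments move in opposite directions; the function names are mine.

```python
def conversion_rate(conversions, visitors):
    return conversions / visitors


# Hypothetical (conversions, visitors) per segment and recipe.
results = {
    "new":       {"A": (200, 5000), "B": (260, 5000)},
    "returning": {"A": (400, 5000), "B": (340, 5000)},
}


def lift_by_segment(results):
    """Relative lift of B over A per segment, plus the blended figure."""
    lifts = {}
    totals = {"A": [0, 0], "B": [0, 0]}
    for segment, recipes in results.items():
        rate_a = conversion_rate(*recipes["A"])
        rate_b = conversion_rate(*recipes["B"])
        lifts[segment] = rate_b / rate_a - 1
        for recipe, (conversions, visitors) in recipes.items():
            totals[recipe][0] += conversions
            totals[recipe][1] += visitors
    blended_a = conversion_rate(*totals["A"])
    blended_b = conversion_rate(*totals["B"])
    lifts["all visitors"] = blended_b / blended_a - 1
    return lifts


for segment, lift in lift_by_segment(results).items():
    print(f"{segment}: {lift:+.1%}")
```

Here new visitors respond well to B and returning visitors respond badly, yet the blended result shows no difference at all. Each segment has only half the data, which is the downside noted above: you need more test data before segment figures can be trusted.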
  134. 134. #10 : You’re unichannel optimising • Not using call tracking – Look at Infinity Tracking (UK) – Get Google keyword level call volumes! • You don’t measure channel switchers – People who bail a funnel and call – People who use chat or other contact/sales • You ‘forget’ mobile & tablet journeys – Walk the path from search -> ppc/seo -> site – Optimise for all your device mix & journeys • You’re responsive – Testing may now bleed across device platforms – Changing in one place may impact many others – QA, Device and Browser testing even more vital
  135. 135. SUMMARY : The best Companies…. • Invest continually in Analytics instrumentation, tools & people • Use an Agile, iterative, Cross-silo, One team project culture • Prefer collaborative tools to having lots of meetings • Prioritise development based on numbers and insight • Practice real continuous product improvement, not SLED • Source photos and copy that support persuasion and utility • Have cross channel, cross device design, testing and QA • Segment their data for valuable insights, every test or change • Continually try to reduce cycle (iteration) time in their process • Blend ‘long’ design, continuous improvement AND split tests • Make optimisation the engine of change, not the slave of ego • See the Maturity Model in the resource pack