Bad Data:The 3 validity threats that make your tests look conclusive(when they are deeply flawed)Educational Funding Provi...
Join the conversation on Twitter                       #webclinic                                      2Funding provided b...
Today’s team                       Dr. Flint McGlaughlin    Phillip Porter                       Managing Director        ...
Background and test design                Experiment ID: HubSpot Lead Gen Test                 Location: MarketingExperim...
Experiment: Control                                    5Funding provided by:                       #webclinic
How do we get 200+ marketers to agree?                                            6Funding provided by:                   ...
Experiment: Variable Options    Variable: Headline                       Discover the Key Components of a Successful Landi...
Experiment: Variable Options    Variable: Call-to-Action               Control     Download Now               Option #1   ...
Experiment: Variable Options    Variable: Image          Control                   Option #1        Option #2      $447   ...
Experiment: Variable Options    Variable: Copy             Control (long copy)                Option #1 (short copy)      ...
Experiment: Variable Options    Variable: Copy               Option #2 (bullets only)            Option #3 (chapter excerp...
Experiment: Variable Options    Variable: Copy                         Option #1 (short copy)                             ...
Experiment: Variable Options    Variable: Layout                       Control                Option #1                   ...
Experiment: Variable Options    Variable: Layout                       Option #2                Option #3                 ...
Experiment: Variable Options    Variable: Layout        Option #3                                        15Funding provide...
Experiment: Treatment                                    Top                                          16Funding provided b...
Experiment: Treatment                                Bottom                                         17Funding provided by:...
Experiment: Side-by-side                                              Treatment                       Control             ...
Experiment: Results                       Versions                          CR      Rel. diff   Stat. Conf          Contro...
Experiment: Data     Conversion Rate (page views-to-downloads)                                                 20Funding p...
Experiment: Data     Conversion Rate (page views-to-downloads)                                                 21Funding p...
Experiment: Data     Conversion Rate (page views-to-downloads)                                                 22Funding p...
Experiment: Results                        Versions                    CR       Rel. diff   Stat. Conf          Control   ...
• 10 years of B2B research & handbooks     • 416 B2B articles    • 523 B2B case studies                     • 245 industry...
What we discovered  F        Key Principles         1.    Just because a test looks conclusive doesn’t mean it’s conclusiv...
Validity Threat #1: History Effect                                            26Funding provided by:                      ...
History Effect: Definition     History Effect: The effect on a dependent variable by an extraneous variable     associated...
History Effect: Example                Experiment ID: Protected                 Location: MarketingExperiments Research L...
History Effect: Example   We prepared a headline test using Google AdWords as the split-testing platform. The   headlines ...
History Effect: Example                                    • During the test, Dateline                                    ...
History Effect: Example           Headline                               Impressions   Clicks        CTR           Predato...
•                                                                                                                 •       ...
Seasonality. Does a valid test in the "off-             ?         season” translate to our busy season?                   ...
Validity Threat #2: Instrumentation Effect                                                    34Funding provided by:      ...
Instrumentation Effect: Definition     Instrumentation Effect: The effect on the dependent variable, caused by a variable ...
Instrumentation Effect: Example                Experiment ID: (Protected)                 Location: MarketingExperiments ...
Instrumentation Effect: Example                 Control Values                          Treatment                         ...
Instrumentation Effect: Example    Experiment Notes:       •   We discovered that in the testing software, a “fail safe” f...
Instrumentation Effect: Example                                           Page Load Time Reference Chart (in seconds)     ...
Instrumentation Effect: Precautions    •   Set up second, backup metrics        tool                                      ...
What is the general quality of the A/B &                       multivariate tools on the market - do they             ?   ...
Validity Threat #3: Selection Effect                                              42Funding provided by:                  ...
Selection Effect: Definition     Selection Effect: The effect on a dependent variable, by an extraneous variable     assoc...
Selection Effect: Example                            Experiment ID: Protected                       Location: MarketingEx...
Selection Effect: Example      • In a series of tests lasting 5 weeks, we tested 7 different email templates        design...
Selection Effect: Example                          Week 1         Week 2   Week 3       Control Template  Treatment Templa...
Selection Effect: Example                        74% Increase in Conversion                        Simple side-by-side lay...
Selection Effect: Example                                           Week 2 Results    •   In the subsequent weeks         ...
Selection Effect: Example      •   For the remaining test duration, the results never get above 7% indicating the         ...
Selection Effect: Example     •   During the first week, the                                               Treatment 1    ...
Selection Effect: Example           What recipients to understand: The difference in the distribution of            email...
With a low-volume site, it could take weeks             ?         to get to validity. How do we make timely               ...
Is there a “simple" calculation to use, to know                       when a test is statistically significant?           ...
Summary: Putting it all together     F        Key Principles            1.    Just because a test looks conclusive doesn’t...
Learn How to Tweak Your Email Messaging to   Generate More Leads    This webinar includes:    • The best (and worst) times...
View the original presentation free  You can view this  presentation in its entirety  for free at  MarketingExperiments.co...
Upcoming SlideShare
Loading in...5
×

Bad Data: The 3 validity threats that make your tests look conclusive (when they are deeply flawed)

1,426

Published on

View the full replay here: http://www.marketingexperiments.com/site-optimization/bad-data.html

Published in: Business, Economy & Finance
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,426
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
43
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide
  • Chart Explanation:Weeks 1-2: test validatesWeek 3: Unusual deviation. Is there a reason to exclude the outlier?Week 4: Long-term results not conclusive
  • Phillip to answer
  • Phillipto answer
  • Phillip to answer
  • Flint to answer
  • Give them a second to write this down.
  • Bad Data: The 3 validity threats that make your tests look conclusive (when they are deeply flawed)

    1. 1. Bad Data:The 3 validity threats that make your tests look conclusive(when they are deeply flawed)Educational Funding Provided by:
    2. 2. Join the conversation on Twitter #webclinic 2Funding provided by: #webclinic
    3. 3. Today’s team Dr. Flint McGlaughlin Phillip Porter Managing Director Data Analyst Austin McCraw Senior Editorial Analyst 3Funding provided by: #webclinic
    4. 4. Background and test design  Experiment ID: HubSpot Lead Gen Test Location: MarketingExperiments Research Library Test Protocol Number: TP3055 Research Notes: Background: HubSpot teamed up with MarketingExperiments and the Optimization Summit audience to create a treatment email offer landing page for a free chapter of the MarketingSherpa LPO Benchmark Report Goal: To increase HubSpot’s knowledge of their current email list as well as to grow it Primary research question: Which landing page will result in more chapter downloads? Approach: A/B multifactor split test 4Funding provided by: #webclinic
    5. 5. Experiment: Control 5Funding provided by: #webclinic
    6. 6. How do we get 200+ marketers to agree? 6Funding provided by: #webclinic
    7. 7. Experiment: Variable Options Variable: Headline Discover the Key Components of a Successful Landing Page Control Optimization Strategy Free chapter from MarketingSherpa’s first-ever Landing Page Optimization Benchmark Report. Maximize your marketing ROI with the MarketingSherpa Landing Page Option #1 Optimization Benchmark Report Free chapter from MarketingSherpa’s first-ever Landing Page Optimization Benchmark Report. Get a FREE chapter from the 2011 MarketingSherpa Landing Page Option #2 Optimization Benchmark Report (list price $447) Discover the Key Components of a Successful Landing Page Optimization Strategy Get a FREE 40-page Chapter Containing the Latest Landing Page Option #3 Optimization Research from MarketingSherpa Discover the Key Components of a Successful Landing Page Optimization Strategy 7Funding provided by: #webclinic
    8. 8. Experiment: Variable Options Variable: Call-to-Action Control Download Now Option #1 Get Your Free Chapter Now Option #2 Get the Latest LPO Research ** CTA Option #3 to go with Copy Option #3 only Option #3 Read More 8Funding provided by: #webclinic
    9. 9. Experiment: Variable Options Variable: Image Control Option #1 Option #2 $447 Option #3 9Funding provided by: #webclinic
    10. 10. Experiment: Variable Options Variable: Copy Control (long copy) Option #1 (short copy) 10Funding provided by: #webclinic
    11. 11. Experiment: Variable Options Variable: Copy Option #2 (bullets only) Option #3 (chapter excerpt) 11Funding provided by: #webclinic
    12. 12. Experiment: Variable Options Variable: Copy Option #1 (short copy) 12Funding provided by: #webclinic
    13. 13. Experiment: Variable Options Variable: Layout Control Option #1 13Funding provided by: #webclinic
    14. 14. Experiment: Variable Options Variable: Layout Option #2 Option #3 14Funding provided by: #webclinic
    15. 15. Experiment: Variable Options Variable: Layout Option #3 15Funding provided by: #webclinic
    16. 16. Experiment: Treatment Top 16Funding provided by: #webclinic
    17. 17. Experiment: Treatment Bottom 17Funding provided by: #webclinic
    18. 18. Experiment: Side-by-side Treatment Control 18Funding provided by: #webclinic
    19. 19. Experiment: Results Versions CR Rel. diff Stat. Conf Control 47.91% - - Treatment 48.24% 0.7% 90% Level of Confidence 95% 0% 25% 50% 75% 100% 19Funding provided by: #webclinic
    20. 20. Experiment: Data Conversion Rate (page views-to-downloads) 20Funding provided by: #webclinic
    21. 21. Experiment: Data Conversion Rate (page views-to-downloads) 21Funding provided by: #webclinic
    22. 22. Experiment: Data Conversion Rate (page views-to-downloads) 22Funding provided by: #webclinic
    23. 23. Experiment: Results Versions CR Rel. diff Stat. Conf Control 47.91% - - Treatment 48.24% 0.7% 90%  What you need to understand: Manywe considered this test conclusive and confidence conclusive. However, had marketers consider a 90% level of pushed the winner live, it would have been a poor decision. 23Funding provided by: #webclinic
    24. 24. • 10 years of B2B research & handbooks • 416 B2B articles • 523 B2B case studies • 245 industry experts training • 1,480 research findings • 9,000+ B2B marketers survey MarketingSherpa.com/B2BSummit 24Funding provided by: #webclinic
    25. 25. What we discovered F Key Principles 1. Just because a test looks conclusive doesn’t mean it’s conclusive. 2. There are at least 3 validity threats beyond sample size that you need to consider when testing: • History effect • Instrumentation effect • Selection effect 25Funding provided by: #webclinic
    26. 26. Validity Threat #1: History Effect 26Funding provided by: #webclinic
    27. 27. History Effect: Definition History Effect: The effect on a dependent variable by an extraneous variable associated with the passing of time. Plain English Definition: Something happens in the outside world that causes flawed data in the test. 27Funding provided by: #webclinic
    28. 28. History Effect: Example  Experiment ID: Protected Location: MarketingExperiments Research Library Research Notes: Background: Online sex offender registry service for parents concerned about their surrounding areas Goal: To increase the click-through rate of a PPC advertisement Primary research question: Which ad headline will produce the most clickthrough? Test Design: A/B/C/D split test focusing on the headlines of a PPC advertisement 28Funding provided by: #webclinic
    29. 29. History Effect: Example We prepared a headline test using Google AdWords as the split-testing platform. The headlines were chosen by the participants of the certification course from a pool which they created. The test was conducted for seven days and received 55,000 impressions. Child Predator Registry Is Your Child Safe? Identify sex offenders living in Identify sex offenders living in your area. Protect your kids today. your area. Protect your kids today. www.XXXXXXXXXXXXXXXX.com www.XXXXXXXXXXXXXXXX.com Predators in Your Area Find Child Predators Identify sex offenders living in Identify sex offenders living in your area. Protect your kids today. your area. Protect your kids today. www.XXXXXXXXXXXXXXXX.com www.XXXXXXXXXXXXXXXX.com 29Funding provided by: #webclinic
    30. 30. History Effect: Example • During the test, Dateline aired a special called “To Catch a Predator,” which was viewed by approximately 10 million individuals • Throughout this program, sex offenders are referred to as “predators” 30Funding provided by: #webclinic
    31. 31. History Effect: Example Headline Impressions Clicks CTR Predators in Your Area 21,096 1,423 6.74% Child Predator Registry 14,712 652 4.43% Find Child Predators 18,459 817 4.42% Is Your Child Safe? 15,128 437 2.89%  What was a considerable spike in overall click-through, but a relative special, there you need to understand: In the two days following the Dateline difference between those ads with "predator" in the headline and those ads without "predator" of up to 133%. So, in effect, an extraneous variable (the Dateline special) associated with the passage of time jeopardized the validity of our experiment. 31Funding provided by: #webclinic
    32. 32. • • • • • data tests Funding provided by: impacted by the media Use media tracking tools Make sure everyone in the Monitor for test anomalies If possible, track day-to-day company knows you’re testing (Google alerts, etc.) if you plan If possible, always run a/b split on testing around search terms#webclinic 3.000 0.000 1.000 2.000 4.000 (3.000) (2.000) (1.000) Saturday, October 11, 2008 Sunday, October 12, 2008 Monday, October 13, 2008 History Effect: Precautions Tuesday, October 14, 2008 Wednesday, October 15, 2008 Thursday, October 16, 2008 Friday, October 17, 2008 YES Saturday, October 18, 2008 Sunday, October 19, 2008 Monday, October 20, 2008 Tuesday, October 21, 2008 Wednesday, October 22, 2008 Thursday, October 23, 2008 Friday, October 24, 2008 Saturday, October 25, 2008 Sunday, October 26, 2008 Monday, October 27, 2008 Tuesday, October 28, 2008 NO Wednesday, October 29, 2008 Thursday, October 30, 2008 Friday, October 31, 2008 Saturday, November 01, 2008 Sunday, November 02, 2008 Monday, November 03, 2008 Tuesday, November 04, 2008 Wednesday, November 05, 2008 Standardized Conversion Rate Thursday, November 06, 2008 NO Friday, November 07, 2008 Saturday, November 08, 2008 Sunday, November 09, 2008 Monday, November 10, 2008 Tuesday, November 11, 2008 Wednesday, November 12, 2008 Thursday, November 13, 2008 Graphed results of a 4-week email test with an ecommerce retailer: Normalized Normalized B Normalized Traffic Normalized Traffic B 32
    33. 33. Seasonality. Does a valid test in the "off- ? season” translate to our busy season? -Greg 33Funding provided by: #webclinic
    34. 34. Validity Threat #2: Instrumentation Effect 34Funding provided by: #webclinic
    35. 35. Instrumentation Effect: Definition Instrumentation Effect: The effect on the dependent variable, caused by a variable external to an experiment, which is associated with a change in the measurement instrument. Plain English Definition: Something happens with the testing tools (or instruments) that causes flawed data in the test. 35Funding provided by: #webclinic
    36. 36. Instrumentation Effect: Example  Experiment ID: (Protected) Location: MarketingExperiments Research Library Research Notes: Background: Online sex offender registry service for parents concerned about their surrounding areas Goal: To increase the conversion Primary research question: Which landing page will produce the most clickthrough? Test Design: A/B spit test (variable cluster) 36Funding provided by: #webclinic
    37. 37. Instrumentation Effect: Example Control Values Treatment Values • In this test, we were comparing five variables with two differing values each (highlighted above). 37Funding provided by: #webclinic
    38. 38. Instrumentation Effect: Example Experiment Notes: • We discovered that in the testing software, a “fail safe” feature was enabled specifically that, if for any reason the treatment page was not running correctly, the page would default back to the control page. • This was enabled by loading hidden Control values of experimental variables whenever the Treatment was loaded. • This caused the Treatment pages to have substantially longer load times than the Control. 38Funding provided by: #webclinic
    39. 39. Instrumentation Effect: Example Page Load Time Reference Chart (in seconds) Page Size(kb) Connection Rate 14.4 28.8 33.6 56 128 1440 (ISDN) (T1) Control Page 50 35.69 17.85 15.30 9.18 4.02 0.36 75 53.54 26.77 22.95 13.77 6.02 0.54 84.9 60.61 30.30 25.98 15.59 6.82 0.61 Treatment Page 100 71.39 35.69 30.60 18.36 8.03 0.71 125 89.24 44.62 38.24 22.95 10.04 0.89 137 97.80 48.90 41.92 25.15 11.00 0.98 150 107.08 53.54 45.89 27.54 12.05 1.07 175 124.93 62.47 53.54 32.13 14.05 1.25 Add. Load Time (s) 37.19 18.60 15.94 9.56 4.18 0.37  What experience” to be asymmetricartificially long load times caused the “user you need to understand: The among the page versions, thereby threatening test validity. 39Funding provided by: #webclinic
    40. 40. Instrumentation Effect: Precautions • Set up second, backup metrics tool Double Control • Match results to transactional data Control • Test with a double control Channel Treatment #1 Shopping Cart • Monitor anomalies Treatment #2 40Funding provided by: #webclinic
    41. 41. What is the general quality of the A/B & multivariate tools on the market - do they ? really deliver valid results even at small sample sizes? -Steve 41Funding provided by: #webclinic
    42. 42. Validity Threat #3: Selection Effect 42Funding provided by: #webclinic
    43. 43. Selection Effect: Definition Selection Effect: The effect on a dependent variable, by an extraneous variable associated with different types of subjects not being evenly distributed between experimental treatments. Plain English Definition: Selection effect occurs when we wrongly assume some portion of the traffic represents the totality of the traffic. 43Funding provided by: #webclinic
    44. 44. Selection Effect: Example  Experiment ID: Protected Location: MarketingExperiments Research Library Test Protocol Number: TP2047 Research Notes: Background: An ecommerce site focusing on special occasion gifts Goal: To increase clickthrough and conversion Primary research question: Which email design will yield the highest conversion rate? Approach: Series of sequential A/B variable cluster split tests 44Funding provided by: #webclinic
    45. 45. Selection Effect: Example • In a series of tests lasting 5 weeks, we tested 7 different email templates designed for their most loyal customer segment. Below are examples of three of those email templates tested. Control Template Treatment Template #1 Treatment Template #2 MAIN PRODUCT COPY MAIN NEW MAIN MAIN MAIN OFFER PRODUCT PRODUCT MAIN PRODUCT PRODUCT IMAGE COPY PRODUCT COPY IMAGE IMAGE CTA NEW CTA OFFER CTA NEW OFFER 45Funding provided by: #webclinic
    46. 46. Selection Effect: Example Week 1 Week 2 Week 3 Control Template Treatment Template #1 Treatment Template #2 46Funding provided by: #webclinic
    47. 47. Selection Effect: Example 74% Increase in Conversion Simple side-by-side layout outperformed the Control Week 1 Results Template Version CR Rel. Diff. Control Template 14.01% - Treatment Template #1 17.06% 25.68% Treatment Template #2 24.38% 74.05%  What you needrateunderstand: After a weekcontrol. However, as the converted at a to 74.05% higher than the of testing, Treatment 2 subsequent tests were conducted, there was a noticeable shift in results. 47Funding provided by: #webclinic
    48. 48. Selection Effect: Example Week 2 Results • In the subsequent weeks Template Version CR Rel. Diff. of testing these email Control Template 24.08% - templates, the overall results of the treatment Treatment Template #1 22.59% -6.17% templates declined to as Treatment Template #2 24.89% 3.38% low as -6% for Treatment 1, and as low as 6% for Treatment 2. Week 3 Results Template Version CR Rel. Diff. Control Template 19.04% - Treatment Template #1 19.09% -1.60% Treatment Template #2 20.74% 6.89% 48Funding provided by: #webclinic
    49. 49. Selection Effect: Example • For the remaining test duration, the results never get above 7% indicating the need examine the results and verify whether the test was actually valid. Week 1 Week 2 Week 3 Week 4 Week 5 Control 14.01% 24.08% 19.04% 19.77% 20.05% Treatment #1 17.06% 22.59% 19.09% 19.42% 19.52% Treatment #2 24.38% 24.89% 20.74% 17.93% 20.50% Rel. Diff. (T2) 74% 3% 7% -9% 2% • As we drilled down into the numbers, we learned that it was the selection effect concerning the distributed traffic directed to the control of week 1. 49Funding provided by: #webclinic
    50. 50. Selection Effect: Example • During the first week, the Treatment 1 treatments received evenly All Email Addresses Treatment 2 distributed traffic coming from a specific segment of Treatment 3 frequent buyers Frequent Treatment 4 Buyers • However, the control Segment Treatment 5 received traffic from a Treatment 6 mixture of their frequent buyers and the rest of their Control (mixed email recipients) email list 50Funding provided by: #webclinic
    51. 51. Selection Effect: Example  What recipients to understand: The difference in the distribution of email you need between the control and treatments caused enough of a validity threat within the first week that the data has to be excluded from analysis. 51Funding provided by: #webclinic
    52. 52. With a low-volume site, it could take weeks ? to get to validity. How do we make timely decisions? -Beth 52Funding provided by: #webclinic
    53. 53. Is there a “simple" calculation to use, to know when a test is statistically significant? ? - Several Clinic Registrants Basic Statistical Significance Calculator Download MarketinExperiments.com/ValidityTool 53Funding provided by: #webclinic
    54. 54. Summary: Putting it all together F Key Principles 1. Just because a test looks conclusive doesn’t mean it’s conclusive. 2. There are at least 3 validity threats beyond sample size that you need to consider when testing: • History effect • Instrumentation effect • Selection effect 54Funding provided by: #webclinic
    55. 55. Learn How to Tweak Your Email Messaging to Generate More Leads This webinar includes: • The best (and worst) times to send email • The best (and worst) words to use in subject lines • Real data about sending frequency • Actual proof about the importance of subscriber freshness View “The Science of Email Marketing” goo.gl/pEIJt 55Funding provided by: #webclinic
    56. 56. View the original presentation free You can view this presentation in its entirety for free at MarketingExperiments.com View the original presentation >> 56Funding provided by: #webclinic
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×