Bad Data: The 3 validity threats that make your tests look conclusive (when they are deeply flawed)
 

View the full replay here: http://www.marketingexperiments.com/site-optimization/bad-data.html


  • Chart Explanation: Weeks 1-2: test validates. Week 3: unusual deviation; is there a reason to exclude the outlier? Week 4: long-term results not conclusive.
  • Phillip to answer
  • Phillip to answer
  • Phillip to answer
  • Flint to answer
  • Give them a second to write this down.

Presentation Transcript

  • Bad Data: The 3 validity threats that make your tests look conclusive (when they are deeply flawed). Educational Funding Provided by:
  • Join the conversation on Twitter: #webclinic
  • Today’s team: Dr. Flint McGlaughlin, Managing Director; Phillip Porter, Data Analyst; Austin McCraw, Senior Editorial Analyst
  • Background and test design. Experiment ID: HubSpot Lead Gen Test. Location: MarketingExperiments Research Library. Test Protocol Number: TP3055. Research Notes: Background: HubSpot teamed up with MarketingExperiments and the Optimization Summit audience to create a treatment email offer landing page for a free chapter of the MarketingSherpa LPO Benchmark Report. Goal: To increase HubSpot’s knowledge of their current email list as well as to grow it. Primary research question: Which landing page will result in more chapter downloads? Approach: A/B multifactor split test.
  • Experiment: Control
  • How do we get 200+ marketers to agree?
  • Experiment: Variable Options. Variable: Headline
    Control: Discover the Key Components of a Successful Landing Page Optimization Strategy (subhead: Free chapter from MarketingSherpa’s first-ever Landing Page Optimization Benchmark Report.)
    Option #1: Maximize your marketing ROI with the MarketingSherpa Landing Page Optimization Benchmark Report (subhead: Free chapter from MarketingSherpa’s first-ever Landing Page Optimization Benchmark Report.)
    Option #2: Get a FREE chapter from the 2011 MarketingSherpa Landing Page Optimization Benchmark Report (list price $447) (subhead: Discover the Key Components of a Successful Landing Page Optimization Strategy)
    Option #3: Get a FREE 40-page Chapter Containing the Latest Landing Page Optimization Research from MarketingSherpa (subhead: Discover the Key Components of a Successful Landing Page Optimization Strategy)
  • Experiment: Variable Options. Variable: Call-to-Action
    Control: Download Now
    Option #1: Get Your Free Chapter Now
    Option #2: Get the Latest LPO Research
    Option #3: Read More (** CTA Option #3 to go with Copy Option #3 only)
  • Experiment: Variable Options. Variable: Image. Control, Option #1, Option #2 (with $447 price callout), Option #3
  • Experiment: Variable Options. Variable: Copy. Control (long copy), Option #1 (short copy)
  • Experiment: Variable Options. Variable: Copy. Option #2 (bullets only), Option #3 (chapter excerpt)
  • Experiment: Variable Options. Variable: Copy. Option #1 (short copy)
  • Experiment: Variable Options. Variable: Layout. Control, Option #1
  • Experiment: Variable Options. Variable: Layout. Option #2, Option #3
  • Experiment: Variable Options. Variable: Layout. Option #3
  • Experiment: Treatment (top)
  • Experiment: Treatment (bottom)
  • Experiment: Side-by-side: Treatment vs. Control
  • Experiment: Results
    Version | CR | Rel. diff | Stat. Conf.
    Control | 47.91% | - | -
    Treatment | 48.24% | 0.7% | 90%
    (Chart: level-of-confidence gauge from 0% to 100%, with the 95% threshold marked)
  • Experiment: Data. (Three chart slides: conversion rate, page views-to-downloads)
  • Experiment: Results
    Version | CR | Rel. diff | Stat. Conf.
    Control | 47.91% | - | -
    Treatment | 48.24% | 0.7% | 90%
    What you need to understand: Many marketers consider a 90% level of confidence conclusive, and we considered this test conclusive. However, had we pushed the winner live, it would have been a poor decision.
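A quick way to see why this result was weaker than it looked is a standard two-proportion sample-size calculation: how much traffic would be needed to reliably detect a lift this small? The sketch below uses the conversion rates from the results slide; the 95%-confidence and 80%-power z-values are conventional choices, not figures from the clinic.

```python
import math

def visitors_needed(p_control, p_treatment):
    """Visitors per arm to detect the given difference in conversion
    rates (two-proportion normal approximation, 95% conf., 80% power)."""
    z_alpha, z_beta = 1.96, 0.84  # two-sided 95% confidence, 80% power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# CRs from the results slide: a 0.33-point absolute difference
print(visitors_needed(0.4791, 0.4824))  # ~360,000 visitors per arm
```

With conversion rates this close, roughly 360,000 visitors per arm would be needed for a properly powered test, which is why a 90%-confidence reading on a 0.7% lift should not be treated as conclusive.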
  • 10 years of B2B research & handbooks • 416 B2B articles • 523 B2B case studies • training from 245 industry experts • 1,480 research findings • a survey of 9,000+ B2B marketers • MarketingSherpa.com/B2BSummit
  • What we discovered: Key Principles
    1. Just because a test looks conclusive doesn’t mean it’s conclusive.
    2. There are at least 3 validity threats beyond sample size that you need to consider when testing:
    • History effect
    • Instrumentation effect
    • Selection effect
  • Validity Threat #1: History Effect
  • History Effect: Definition. History Effect: The effect on a dependent variable by an extraneous variable associated with the passing of time. Plain English Definition: Something happens in the outside world that causes flawed data in the test.
  • History Effect: Example. Experiment ID: Protected. Location: MarketingExperiments Research Library. Research Notes: Background: Online sex offender registry service for parents concerned about their surrounding areas. Goal: To increase the click-through rate of a PPC advertisement. Primary research question: Which ad headline will produce the most clickthrough? Test Design: A/B/C/D split test focusing on the headlines of a PPC advertisement.
  • History Effect: Example. We prepared a headline test using Google AdWords as the split-testing platform. The headlines were chosen by the participants of the certification course from a pool which they created. The test was conducted for seven days and received 55,000 impressions. All four ads shared the same body copy ("Identify sex offenders living in your area. Protect your kids today. www.XXXXXXXXXXXXXXXX.com") and differed only in headline: "Child Predator Registry," "Is Your Child Safe?," "Predators in Your Area," and "Find Child Predators."
  • History Effect: Example
    • During the test, Dateline aired a special called "To Catch a Predator," which was viewed by approximately 10 million individuals
    • Throughout this program, sex offenders are referred to as "predators"
  • History Effect: Example
    Headline | Impressions | Clicks | CTR
    Predators in Your Area | 21,096 | 1,423 | 6.74%
    Child Predator Registry | 14,712 | 652 | 4.43%
    Find Child Predators | 18,459 | 817 | 4.42%
    Is Your Child Safe? | 15,128 | 437 | 2.89%
    What you need to understand: In the two days following the Dateline special, there was a considerable spike in overall click-through, but a relative difference between those ads with "predator" in the headline and those ads without "predator" of up to 133%. So, in effect, an extraneous variable (the Dateline special) associated with the passage of time jeopardized the validity of our experiment.
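The 133% figure follows directly from the CTR column above: it is the relative difference between the best "predator" headline and the "Is Your Child Safe?" headline. A one-line check:

```python
# CTRs from the results table above
ctr_predator = 0.0674      # "Predators in Your Area"
ctr_no_predator = 0.0289   # "Is Your Child Safe?"

rel_diff = (ctr_predator - ctr_no_predator) / ctr_no_predator
print(f"{rel_diff:.0%}")   # -> 133%
```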
  • History Effect: Precautions
    • Make sure everyone in the company knows you’re testing
    • Use media tracking tools (Google alerts, etc.) if you plan on testing around search terms
    • Monitor for test anomalies impacted by the media
    • If possible, track day-to-day data (see the sketch below)
    • If possible, always run a/b split tests
    (Chart: graphed results of a 4-week email test with an ecommerce retailer, October 11 to November 13, 2008; standardized conversion rates for normalized A/B and traffic series, with anomalous days annotated YES/NO)
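One of these precautions, monitoring day-to-day data for anomalies, is easy to automate. A minimal sketch, using hypothetical daily conversion rates: standardize each day against the run's mean and flag days whose z-score is extreme, the same idea behind the standardized-conversion-rate chart on this slide.

```python
from statistics import mean, stdev

def flag_anomalies(daily_rates, threshold=2.0):
    """Return (day, z-score) pairs for days whose standardized
    conversion rate deviates beyond the threshold."""
    mu, sigma = mean(daily_rates), stdev(daily_rates)
    return [(day, round((r - mu) / sigma, 2))
            for day, r in enumerate(daily_rates, start=1)
            if abs(r - mu) > threshold * sigma]

# hypothetical daily rates; day 5 spikes (e.g., a news event airs)
rates = [0.048, 0.051, 0.049, 0.050, 0.072, 0.047, 0.052]
print(flag_anomalies(rates))  # -> [(5, 2.22)]
```

A flagged day is not automatically invalid; it is a prompt to ask whether something in the outside world (a Dateline special, a holiday, a competitor's sale) touched the test.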
  • Audience question: "Seasonality. Does a valid test in the 'off-season' translate to our busy season?" -Greg
  • Validity Threat #2: Instrumentation Effect
  • Instrumentation Effect: Definition. Instrumentation Effect: The effect on the dependent variable, caused by a variable external to an experiment, which is associated with a change in the measurement instrument. Plain English Definition: Something happens with the testing tools (or instruments) that causes flawed data in the test.
  • Instrumentation Effect: Example. Experiment ID: (Protected). Location: MarketingExperiments Research Library. Research Notes: Background: Online sex offender registry service for parents concerned about their surrounding areas. Goal: To increase the conversion rate. Primary research question: Which landing page will produce the most clickthrough? Test Design: A/B split test (variable cluster).
  • Instrumentation Effect: Example. (Slide: Control values vs. Treatment values, highlighted.) In this test, we were comparing five variables with two differing values each.
  • Instrumentation Effect: Example. Experiment Notes:
    • We discovered that a "fail-safe" feature was enabled in the testing software: if for any reason the treatment page was not running correctly, the page would default back to the control page.
    • This was implemented by loading hidden Control values of the experimental variables whenever the Treatment was loaded.
    • This caused the Treatment pages to have substantially longer load times than the Control.
  • Instrumentation Effect: Example. Page Load Time Reference Chart (seconds, by page size and connection rate):
    Page size (kb) | 14.4 | 28.8 | 33.6 | 56 | 128 (ISDN) | 1440 (T1)
    50 | 35.69 | 17.85 | 15.30 | 9.18 | 4.02 | 0.36
    75 | 53.54 | 26.77 | 22.95 | 13.77 | 6.02 | 0.54
    84.9 (Control Page) | 60.61 | 30.30 | 25.98 | 15.59 | 6.82 | 0.61
    100 | 71.39 | 35.69 | 30.60 | 18.36 | 8.03 | 0.71
    125 | 89.24 | 44.62 | 38.24 | 22.95 | 10.04 | 0.89
    137 (Treatment Page) | 97.80 | 48.90 | 41.92 | 25.15 | 11.00 | 0.98
    150 | 107.08 | 53.54 | 45.89 | 27.54 | 12.05 | 1.07
    175 | 124.93 | 62.47 | 53.54 | 32.13 | 14.05 | 1.25
    Add. Load Time (s) | 37.19 | 18.60 | 15.94 | 9.56 | 4.18 | 0.37
    What you need to understand: The artificially long load times caused the "user experience" to be asymmetric among the page versions, thereby threatening test validity.
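The chart's numbers are reproducible from first principles: load time is page size in bits divided by effective throughput. Matching the table implies roughly 78% protocol efficiency; that factor is an inference from the numbers, not a figure stated in the deck.

```python
EFFICIENCY = 0.778  # inferred from the chart, not stated in the deck

def load_time_s(page_kb, rate_kbps):
    """Seconds to load a page: bits transferred / effective bit rate."""
    return page_kb * 8 / (rate_kbps * EFFICIENCY)

control_kb, treatment_kb = 84.9, 137  # page sizes from the chart
for rate in (14.4, 56, 128, 1440):
    extra = load_time_s(treatment_kb, rate) - load_time_s(control_kb, rate)
    print(f"{rate:>6} kbps: treatment loads {extra:.2f}s slower")
# at 14.4 kbps this matches the chart's 37.19s additional load time
# (within rounding) -- a 37-second penalty the Control never paid
```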
  • Instrumentation Effect: Precautions
    • Set up a second, backup metrics tool
    • Match results to transactional data
    • Test with a double control (see the sketch below)
    • Monitor anomalies
    (Diagram: Double Control setup; a Control, a second Control, Treatment #1, and Treatment #2 all feed through the same channel into the shopping cart)
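The double-control precaution can be checked numerically: two identical control arms should perform identically, so a significant gap between them means the instrument, not the page, is producing the difference. A minimal sketch with hypothetical counts; this is a standard two-proportion z-test, not MarketingExperiments' internal tooling.

```python
import math

def double_control_alarm(conv_a, n_a, conv_b, n_b):
    """True if two *identical* control arms differ significantly,
    i.e. the measurement instrument itself is suspect."""
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    return abs(z) > 1.96  # 95% confidence

# hypothetical: two control arms that should behave identically
print(double_control_alarm(480, 1000, 412, 1000))  # True -> investigate the tool
```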
  • Audience question: "What is the general quality of the A/B & multivariate tools on the market? Do they really deliver valid results even at small sample sizes?" -Steve
  • Validity Threat #3: Selection Effect
  • Selection Effect: Definition. Selection Effect: The effect on a dependent variable by an extraneous variable associated with different types of subjects not being evenly distributed between experimental treatments. Plain English Definition: Selection effect occurs when we wrongly assume some portion of the traffic represents the totality of the traffic.
  • Selection Effect: Example. Experiment ID: Protected. Location: MarketingExperiments Research Library. Test Protocol Number: TP2047. Research Notes: Background: An ecommerce site focusing on special occasion gifts. Goal: To increase clickthrough and conversion. Primary research question: Which email design will yield the highest conversion rate? Approach: Series of sequential A/B variable cluster split tests.
  • Selection Effect: Example. In a series of tests lasting 5 weeks, we tested 7 different email templates designed for their most loyal customer segment. (Slide: wireframes of three of the templates tested, the Control Template, Treatment Template #1, and Treatment Template #2, each arranging the main product image, product copy, new offer, and CTA differently.)
  • Selection Effect: Example. (Timeline: Weeks 1-3, showing the Control Template, Treatment Template #1, and Treatment Template #2.)
  • Selection Effect: Example. 74% increase in conversion: the simple side-by-side layout outperformed the Control.
    Week 1 Results
    Template Version | CR | Rel. Diff.
    Control Template | 14.01% | -
    Treatment Template #1 | 17.06% | 25.68%
    Treatment Template #2 | 24.38% | 74.05%
    What you need to understand: After a week of testing, Treatment 2 converted at a rate 74.05% higher than the control. However, as the subsequent tests were conducted, there was a noticeable shift in results.
  • Selection Effect: Example. In the subsequent weeks of testing these email templates, the overall results of the treatment templates declined, to as low as -6% for Treatment 1, and as low as 6% for Treatment 2.
    Week 2 Results
    Template Version | CR | Rel. Diff.
    Control Template | 24.08% | -
    Treatment Template #1 | 22.59% | -6.17%
    Treatment Template #2 | 24.89% | 3.38%
    Week 3 Results
    Template Version | CR | Rel. Diff.
    Control Template | 19.04% | -
    Treatment Template #1 | 19.09% | -1.60%
    Treatment Template #2 | 20.74% | 6.89%
  • Selection Effect: Example. For the remaining test duration, the results never got above 7%, indicating the need to examine the results and verify whether the test was actually valid.
    Version | Week 1 | Week 2 | Week 3 | Week 4 | Week 5
    Control | 14.01% | 24.08% | 19.04% | 19.77% | 20.05%
    Treatment #1 | 17.06% | 22.59% | 19.09% | 19.42% | 19.52%
    Treatment #2 | 24.38% | 24.89% | 20.74% | 17.93% | 20.50%
    Rel. Diff. (T2) | 74% | 3% | 7% | -9% | 2%
    As we drilled down into the numbers, we learned that a selection effect had skewed the traffic directed to the control during week 1.
  • Selection Effect: Example
    • During the first week, the treatments received evenly distributed traffic coming from a specific segment of frequent buyers
    • However, the control received traffic from a mixture of their frequent buyers and the rest of their email list
    (Diagram: All Email Addresses; the Frequent Buyers Segment feeds Treatments 1-6, while the Control receives mixed email recipients)
  • Selection Effect: Example  What recipients to understand: The difference in the distribution of email you need between the control and treatments caused enough of a validity threat within the first week that the data has to be excluded from analysis. 51Funding provided by: #webclinic
  • Audience question: "With a low-volume site, it could take weeks to get to validity. How do we make timely decisions?" -Beth
  • Audience question: "Is there a 'simple' calculation to use, to know when a test is statistically significant?" -Several Clinic Registrants
    Basic Statistical Significance Calculator: download at MarketingExperiments.com/ValidityTool
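The arithmetic behind a calculator like the one the slide points to is a pooled two-proportion z-test. A minimal sketch of that generic construction (the inputs below are hypothetical, and this is not the downloadable tool itself):

```python
import math

def confidence_level(conv_c, n_c, conv_t, n_t):
    """Two-sided level of confidence that control and treatment
    conversion rates differ (pooled two-proportion z-test)."""
    pooled = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    z = abs(conv_t / n_t - conv_c / n_c) / se
    return math.erf(z / math.sqrt(2))  # = 2*Phi(|z|) - 1

# hypothetical: 240/2000 control vs. 275/2000 treatment conversions
print(f"{confidence_level(240, 2000, 275, 2000):.1%}")  # ~90.1%
```

As the clinic's first example showed, a result at this level can still be misleading; 95% is the usual bar, and even then the three validity threats above still apply.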
  • Summary: Putting it all together. Key Principles
    1. Just because a test looks conclusive doesn’t mean it’s conclusive.
    2. There are at least 3 validity threats beyond sample size that you need to consider when testing:
    • History effect
    • Instrumentation effect
    • Selection effect
  • Learn How to Tweak Your Email Messaging to Generate More Leads. This webinar includes:
    • The best (and worst) times to send email
    • The best (and worst) words to use in subject lines
    • Real data about sending frequency
    • Actual proof about the importance of subscriber freshness
    View "The Science of Email Marketing": goo.gl/pEIJt
  • View the original presentation free. You can view this presentation in its entirety for free at MarketingExperiments.com. View the original presentation >>