Data Quality Doesn’t Just Happen: And Here’s What Some of the Industry’s Most Influential Players are Doing About It by Mark Menig of TrueSample - Presented at Insight Innovation eXchange North America 2013


Data quality isn’t always the sexiest topic, but it’s critical, and one that buyers and suppliers often neglect. The ramifications of ignoring it can cost millions of dollars. Some of the industry’s largest buyers and suppliers have found a simple solution, though, and it’s one that is available to everyone else too. Come hear about how data quality concerns haven’t gone away, and what others are doing to make sure they and their insights are protected.

Published in: Business, Technology

  • The load test was conducted without stopping external traffic. Therefore, for each load level above there was an additional background load generated by the current production traffic.
  • Support for more …
    Question Types: For example: currently, no “open end” question types are analyzed by TrueSample. However, there is reason to believe that the prevalence of vulgarities in “open ends” strongly correlates with lack of engagement.
    Sample Types: For example: currently, TrueSample’s Global product can help improve non-consumer research. RealCheck data validation partners are focused only on consumers. However, databases for other types of sample (B2B, doctors, etc.) may be able to expand RealCheck capabilities to additional market segments.
    Countries: For example: currently, TrueSample’s Global product can help improve research in all countries. RealCheck data validation partners are focused on only twelve of them. However, databases for other countries (China, Japan, South Korea, etc.) can expand the list of supported geographies.
    Platforms: TrueSample can support more survey, recruitment, router/sample aggregator, and mobile platforms.
    RealCheck Types: For example: currently, RealCheck supports Postal, Social, and Local, but phone numbers and other identifiers may be possible to include for additional flexibility of the product.
    Decision-Making Criteria: Quality Improvement; Yield; Price; Technology/Automation; Engagement (clustering vs. parametric).
  • Transcript

    • 1. Data Quality Doesn’t Just Happen: And Here’s What Some of the Industry’s Most Influential Players are Doing About It (June 2013)
    • 2. AGENDA: 1. Why is data quality an issue? 2. What are industry players doing about it? 3. Why was TrueSample created?
    • 3. Back to Basics: Why does data quality matter?
    • 4. The Market Research Industry Has Been Struggling to Address Online Data Quality for Years. “Online panels have stormed the market research industry, offering access to inexpensive samples quickly, but at the same time, firms report anxiety about the quality of the sample…” (Brad Bortner, Forrester). “Industry associations launch major initiatives to investigate and restore online research quality.” (industry associations: CASRO, AMA, ESOMAR, ARF). “P&G speaks out about online data quality issues at the Client Summit, sparking industry-wide discourse.” (Kim Dedeker, P&G & Kantar)
    • 5. Independent Research from The ARF Identifies 41% Email Address Overlap Across Panels
    • 6. Panelist Duplication / Multi-Panel Membership

      # of Panels | Total # Panelists | % of Total Panelists Validated | Total Responses | % of Responses
      1           | 15,747,937        | 78.23%                         | 4,580,489       | 11.21%
      2           | 2,668,338         | 13.25%                         | 7,313,709       | 17.90%
      3           | 897,742           | 4.46%                          | 4,962,607       | 12.15%
      4           | 384,027           | 1.91%                          | 3,842,079       | 9.40%
      5           | 186,938           | 0.93%                          | 3,088,652       | 7.56%
      6           | 98,951            | 0.49%                          | 2,658,842       | 6.51%
      7           | 55,989            | 0.28%                          | 2,314,391       | 5.66%
      8           | 32,760            | 0.16%                          | 2,014,186       | 4.93%
      9           | 20,324            | 0.10%                          | 1,764,155       | 4.32%
      10          | 13,369            | 0.07%                          | 1,564,083       | 3.83%
      11          | 9,278             | 0.05%                          | 1,415,787       | 3.47%
      12          | 6,231             | 0.03%                          | 1,174,433       | 2.87%
      13          | 4,162             | 0.02%                          | 2,278,468       | 5.58%
      14          | 2,474             | 0.01%                          | 663,378         | 1.62%
      15          | 1,475             | 0.01%                          | 692,292         | 1.69%
      16          | 763               | 0.00%                          | 253,483         | 0.62%
      17          | 366               | 0.00%                          | 139,181         | 0.34%
      18          | 159               | 0.00%                          | 73,002          | 0.18%
      19          | 69                | 0.00%                          | 30,923          | 0.08%
      20          | 37                | 0.00%                          | 19,850          | 0.05%
      21          | 24                | 0.00%                          | 9,373           | 0.02%
      22          | 9                 | 0.00%                          | 4,373           | 0.01%
      24          | 1                 | 0.00%                          | 0               | 0.00%
      25          | 1                 | 0.00%                          | 0               | 0.00%
      26          | 2                 | 0.00%                          | 0               | 0.00%
      29          | 1                 | 0.00%                          | 0               | 0.00%
      34          | 1                 | 0.00%                          | 0               | 0.00%
      36          | 1                 | 0.00%                          | 0               | 0.00%
      TOTAL       | 20,131,429        | 100.00%                        | 40,857,736      | 100.00%

      78% of submitted and validated panelists belong to only a single panel. HOWEVER, 50% of survey responses come from panelists that are members of 5+ panels! Less than 1% of total panelists accounts for more than 15% of survey responses, and those panelists are members of an average of 13 panels.
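The headline claims on this slide can be checked directly against the table: summing the responses column for panelists who belong to five or more panels reproduces the roughly 50% figure. A minimal sketch, with the response counts transcribed from the table above (zero-response rows omitted):

```python
# Responses by number of panel memberships, transcribed from the
# ARF panelist-duplication table above (panel count -> total responses).
responses = {
    1: 4_580_489, 2: 7_313_709, 3: 4_962_607, 4: 3_842_079,
    5: 3_088_652, 6: 2_658_842, 7: 2_314_391, 8: 2_014_186,
    9: 1_764_155, 10: 1_564_083, 11: 1_415_787, 12: 1_174_433,
    13: 2_278_468, 14: 663_378, 15: 692_292, 16: 253_483,
    17: 139_181, 18: 73_002, 19: 30_923, 20: 19_850,
    21: 9_373, 22: 4_373,
}

total = sum(responses.values())  # matches the table's 40,857,736 total
from_5_plus = sum(r for n, r in responses.items() if n >= 5)
share = from_5_plus / total

print(f"Responses from members of 5+ panels: {share:.1%}")  # ~49.3%
```

The exact figure comes out at about 49.3%, consistent with the slide's rounded "50%" claim.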
    • 7. Clients Are Able to Identify Analytical Issues with Data Quality in Online Research Projects (Top Tech Firm). First-Hand Evidence. Project: Technology A&U study. Goal: Compare clean/unclean sample. Results of unclean sample: unrealistic segmentation solutions; higher mean scores and SDs; degradation of sensitivity of significance tests. (From Steve Schwartz’s presentation at the ’09 IIR Market Research Event.) Takeaway: Data from the unclean sample would have led to different business decisions.
    • 8. Clients Experience Operational Issues with Data Quality in Online Research Projects (Top CPG Firm). First-Hand Evidence. Project: Product launch in-home usage study (IHUT). Goal: Test product against 3 discrete sample populations ready for commercial product launch. Results: lack of quality controls resulted in 50% of respondents receiving more than one product during the usage period. Research impact: all three studies had to be reviewed; key measures were undeterminable. Business impact: estimated loss in revenue of $15 million due to delays, not to mention a tarnished reputation with retailers. Takeaway: Lack of quality controls/measures can cause significant rework and expense.
    • 9. Quantifying the Risk of Bad Respondents. Risk Ratio is defined as the ratio of the probability of getting a wrong answer to the baseline probability of 5%, based on sampling theory. Clients on average see 20% or more of respondents in their surveys failing at least one TrueSample quality check, meaning that their risk of not applying TrueSample is doubled!
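The Risk Ratio definition above is simple enough to state in code. This is a minimal sketch of the stated definition only; the 5% baseline comes from the slide, and the mapping from a 20% check-failure rate to a doubled risk is the slide's claim, not derived here:

```python
BASELINE = 0.05  # baseline probability of a wrong answer, per the slide

def risk_ratio(p_wrong: float) -> float:
    """Ratio of the probability of getting a wrong answer to the 5% baseline."""
    return p_wrong / BASELINE

# A doubled risk corresponds to a 10% wrong-answer probability:
print(risk_ratio(0.10))  # 2.0
```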
    • 10. What Are Industry Players Doing About It?
    • 11. Data from the Confirmit 2012 Annual MR Software Survey. Penny for your thoughts: most online surveys today are incentivised. Nearly six out of 10 (57%) research companies are using incentivised panels for between two-thirds and 100% of their samples; only a few (7%) are not using rewards at all. Independent panel verification is the exception, not the norm: around three-quarters (76%) of panel operators do not subscribe to independent panel verification services, and even among large companies 58% do not. Most MR companies run simple fraud prevention checks on online responses: most check for speeding by respondents (73%) and nearly two-thirds (63%) look for ‘straightlining’, two quality control methods that many data collection tools make easy to apply. More thorough respondent fraud checks are largely shunned: just over half of those surveyed (52%) use challenge questions, and fewer still use some of the more high-tech methods.
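The two quality-control methods named above, speeding and straight-lining, are easy to automate, which is why most data collection tools support them. A hypothetical sketch: the threshold of one-third of the median completion time and the exact-repetition rule are illustrative assumptions, not Confirmit's or any vendor's actual criteria:

```python
from statistics import median

def is_speeder(duration_sec: float, all_durations: list[float],
               fraction: float = 1 / 3) -> bool:
    """Flag a respondent who finished far faster than the median respondent."""
    return duration_sec < fraction * median(all_durations)

def is_straightliner(grid_answers: list[int]) -> bool:
    """Flag a respondent who gave the identical answer to every grid item."""
    return len(grid_answers) > 1 and len(set(grid_answers)) == 1

durations = [480, 510, 390, 620, 95, 455]   # completion times in seconds
print(is_speeder(95, durations))            # True: well under a third of the median
print(is_straightliner([3, 3, 3, 3, 3]))    # True: same rating on every row
print(is_straightliner([3, 4, 2, 5, 3]))    # False
```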
    • 12. Clients Implement Standardized Quality Requirements; Suppliers Seek an Automated, Systematic Solution to Comply
    • 13. As More Clients Apply Standard Online Research Quality Requirements, TrueSample Will Help Clients Meet Them
    • 14. FoQ2 is Counting on the TrueSample Quality Council. From the FoQ2 analyses and insights: The ARF and FoQ2 participants will produce important findings and deliver new guidelines with strong recommendations over the next few months. The ARF is counting on TrueSample and the TrueSample Quality Council to help translate FoQ2 learning into advanced online research practice applications and Research-on-Research.
    • 15. Companies Coming Together to Create an Industry Standard: clients, sample suppliers, research companies, survey platforms, technology platforms, Federated.
    • 16. Why was TrueSample created?
    • 17. TrueSample: Provide a consistent and scalable data quality platform for online research, across the SAMPLE lifecycle: Survey Design & Creation, Panel Management & Selection, Data Collection, Analyze & Improve.
    • 18. TrueSample: Help people seeking insights make better decisions. Through applying the best available, independent, and comprehensive data quality solution in every country where they conduct online quantitative research. Through reducing the risk of making poor decisions, as a result of applying TrueSample technology and algorithms to respondents and survey instruments to systematically and comprehensively eliminate “bad” data wherever possible.
    • 19. Research-on-Research (RoR) has Been the Foundation of the TrueSample Quality Council. Past RoR examples: Impact of Identity Verification on Hard-to-Reach Groups; Impact of Chronically Unengaged Respondents on Data Quality; Impact of Survey Design on Data Quality; Impact of ‘Bad’ Respondents on Business Decisions. Associated capabilities: RealCheck Postal, SurveyScore, EngagementCheck, RealCheck Social. “The goal of the TrueSample Research-on-Research Sub-Committee is to drive a research agenda that identifies and provides empirical evidence related to techniques that can be incorporated into the TrueSample product to maintain or enhance research data quality, an important component in minimizing the risk of incorrect business decisions.”
    • 20. From RoR to Product Roadmap. The TrueSample Quality Council RoR Sub-Committee prioritization process runs DESIGN, FIELD, ANALYZE; results will inform the TrueSample product roadmap. Focus areas: Social Media & River Sample; Mobile Device Data Collection; More Robust Analytics & Question Types; Alternative Identity Validation Variables. Phase 1 = RoR efforts to ascertain WHAT challenges need to be solved for; Phase 2 = RoR efforts to ascertain HOW to solve for the challenges.
    • 21. TrueSample is a Technology that Provides Consistent, Objective, and Automated Quality.
      Real: panelists are who and where they say they are; identity validation with reputable, third-party databases.
      Unique: panelists are unique within and across all Certified Panels; machine fingerprinting ensures no duplicate survey takers.
      Engaged: no straight-lining respondents; no speeding respondents.
      Qualified: respondents meet exclusion criteria for the survey; respondents’ survey-taking behavior is tracked over time.
      SurveyScore: predictive models improve survey design before launch; actual survey engagement is scored and benchmarked.
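The "Unique" check above relies on machine fingerprinting. TrueSample's actual fingerprinting is proprietary; the sketch below only illustrates the general idea of hashing device attributes into a stable key and rejecting repeat survey takers. The attribute list and hashing scheme are assumptions for illustration:

```python
import hashlib

def device_fingerprint(attrs: dict[str, str]) -> str:
    """Hash a set of device attributes into a stable fingerprint key."""
    # Sort keys so the same attributes always hash to the same value.
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

seen: set[str] = set()

def admit(attrs: dict[str, str]) -> bool:
    """Admit a respondent only if this device has not already taken the survey."""
    fp = device_fingerprint(attrs)
    if fp in seen:
        return False  # duplicate survey taker rejected
    seen.add(fp)
    return True

device = {"user_agent": "Mozilla/5.0", "screen": "1920x1080", "tz": "UTC-5"}
print(admit(device))  # True: first visit from this device
print(admit(device))  # False: same fingerprint seen again
```

A real system would also have to cope with attributes that change between sessions, which is where the harder parts of fingerprinting live.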
    • 22. Thank you!
    • 23. TrueSample 2013 Initiatives.
      CONSISTENT: Consistency scoring at the panel level, to help aid in the selection of the most appropriate panel for a particular study, as well as proactively identify any significant changes to a panel over time that may affect results in a study. Consistency scoring at the individual panelist level, to aid in the removal of panelists that habitually provide inconsistent responses.
      MOBILE: Brings the full benefits of Real, Unique, Qualified, Engaged, Consistent, and SurveyScore to mobile devices. Specifically designed for app-based research conducted on smartphone and/or tablet devices (iOS, Android, etc.).
      RIVER SAMPLE: Extends panelist validation functionality around Real and Unique to sample sources utilizing a river sampling methodology. Optimizes the record submission process and functionality for suppliers utilizing a river sampling methodology.
    • 24. 2013 RoR Roadmap

      PROJECT                       | DESCRIPTION                                                                                                                            | EXPECTED COMPLETION                        | PROJECT LEADER
      TrueSample Consistent         | Identify consistency of an individual panelist; identify consistency of a panel                                                        | Phase 1: June 2013; Phase 2: October 2013  | TrueSample
      Engagement Algorithm Redesign | Replace the current parametric approach with a non-parametric clustering-based algorithm                                               | September 2013                             | TrueSample
      Mobile Surveys                | Compare and contrast users’ browsing patterns for mobile vs. desktop surveys; understand the impact of a shorter survey, grid effect, straight-lining, speeding, etc. | October 2013 | TrueSample
      Dynamic/River Sample          | Optimize the real-time panelist validation process; understand the impact of including this sample type in surveys                     | Phase 1: April 2013; Phase 2: August 2013  | TrueSample
      Global Validation             | Identify different cultures: are data issues the same? Should validations/algorithms differ based on culture? Risk analysis in non-US countries. Is the engaged check different? | Fall | TSQC, Mktg Inc/Research Now
      Operationalize SurveyScore    | What does a change in SurveyScore really mean?                                                                                         | Fall                                       | TSQC, Kantar
    • 25. Survey Validation Evaluates Respondents in Real-Time as They Complete Surveys. The respondent arrives via http://SURVEYURL?source-id=22345&respondent-id=772822. Page 0: name/address form* determines whether the respondent is recognized as Real (Yes/No). Page 1: create a digital fingerprint; check whether the respondent is Real, Unique, Engaged, and Qualified; store validation status for reporting. Pages 3+: collect page and question data. Last page/end page: re-check Real, Unique, Engaged, and Qualified; store validation status and SurveyScore for reporting. (*The form can be enabled on a per-survey basis.)
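The entry URL on the last slide carries the routing identifiers as query parameters. A minimal sketch of extracting them, assuming the URL format shown on the slide (the SURVEYURL host is the slide's placeholder):

```python
from urllib.parse import urlparse, parse_qs

url = "http://SURVEYURL?source-id=22345&respondent-id=772822"

# parse_qs maps each parameter name to a list of values.
params = parse_qs(urlparse(url).query)
source_id = params["source-id"][0]
respondent_id = params["respondent-id"][0]

print(source_id, respondent_id)  # 22345 772822
```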