Data Quality Doesn’t Just Happen: And Here’s What Some of the Industry’s Most Influential Players are Doing About It by Mark Menig of TrueSample - Presented at Insight Innovation eXchange North America 2013
Data quality isn’t always the sexiest topic, but it’s a critical one, and a conversation that buyers and suppliers often neglect to have. The ramifications of ignoring it can cost millions of dollars. Some of the industry’s largest buyers and suppliers have found a simple solution, though, and it’s one that is available to everyone else too. Come hear how data quality concerns haven’t gone away, and what others are doing to make sure they and their insights are protected.
1. Data Quality Doesn’t Just Happen
And Here’s What Some of the Industry’s Most Influential Players are Doing About It
June 2013
2. AGENDA
1. Why is data quality an issue?
2. What are industry players doing about it?
3. Why was TrueSample created?
4. The Market Research Industry Has Been Struggling to Address Online Data Quality for Years
“Online panels have stormed the market research industry, offering access to inexpensive samples quickly, but at the same time, firms report anxiety about the quality of the sample…”
Brad Bortner, Forrester
“Industry associations launch major initiatives to investigate and restore online research quality.”
Industry Associations: CASRO, AMA, ESOMAR, ARF
“P&G speaks out about online data quality issues at the Client Summit, sparking industry-wide discourse”
Kim Dedeker, P&G & Kantar
6. Panelist Duplication/Multi-Panel Membership
# of Panels | Total # Panelists | % of Total Panelists Validated | Total Responses Taken by Panelists in This Section | % of Responses
1 | 15,747,937 | 78.23% | 4,580,489 | 11.21%
2 | 2,668,338 | 13.25% | 7,313,709 | 17.90%
3 | 897,742 | 4.46% | 4,962,607 | 12.15%
4 | 384,027 | 1.91% | 3,842,079 | 9.40%
5 | 186,938 | 0.93% | 3,088,652 | 7.56%
6 | 98,951 | 0.49% | 2,658,842 | 6.51%
7 | 55,989 | 0.28% | 2,314,391 | 5.66%
8 | 32,760 | 0.16% | 2,014,186 | 4.93%
9 | 20,324 | 0.10% | 1,764,155 | 4.32%
10 | 13,369 | 0.07% | 1,564,083 | 3.83%
11 | 9,278 | 0.05% | 1,415,787 | 3.47%
12 | 6,231 | 0.03% | 1,174,433 | 2.87%
13 | 4,162 | 0.02% | 2,278,468 | 5.58%
14 | 2,474 | 0.01% | 663,378 | 1.62%
15 | 1,475 | 0.01% | 692,292 | 1.69%
16 | 763 | 0.00% | 253,483 | 0.62%
17 | 366 | 0.00% | 139,181 | 0.34%
18 | 159 | 0.00% | 73,002 | 0.18%
19 | 69 | 0.00% | 30,923 | 0.08%
20 | 37 | 0.00% | 19,850 | 0.05%
21 | 24 | 0.00% | 9,373 | 0.02%
22 | 9 | 0.00% | 4,373 | 0.01%
24 | 1 | 0.00% | 0 | 0.00%
25 | 1 | 0.00% | 0 | 0.00%
26 | 2 | 0.00% | 0 | 0.00%
29 | 1 | 0.00% | 0 | 0.00%
34 | 1 | 0.00% | 0 | 0.00%
36 | 1 | 0.00% | 0 | 0.00%
TOTAL | 20,131,429 | 100.00% | 40,857,736 | 100.00%
• 78% of submitted and validated panelists belong to only a single panel
• HOWEVER... 50% of survey responses come from panelists who are members of 5+ panels!
• Less than 1% of total panelists account for more than 15% of survey responses, and they are members of an average of 13 panels
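The first two bullets can be checked directly against the table. The sketch below is only an illustration: the row values are copied from the slide, while the names `ROWS` and `share_at_or_above` are made up for this example and are not part of any TrueSample tooling.

```python
# Each row: (number of panels, panelists, survey responses), copied from the slide's table.
ROWS = [
    (1, 15_747_937, 4_580_489), (2, 2_668_338, 7_313_709),
    (3, 897_742, 4_962_607), (4, 384_027, 3_842_079),
    (5, 186_938, 3_088_652), (6, 98_951, 2_658_842),
    (7, 55_989, 2_314_391), (8, 32_760, 2_014_186),
    (9, 20_324, 1_764_155), (10, 13_369, 1_564_083),
    (11, 9_278, 1_415_787), (12, 6_231, 1_174_433),
    (13, 4_162, 2_278_468), (14, 2_474, 663_378),
    (15, 1_475, 692_292), (16, 763, 253_483),
    (17, 366, 139_181), (18, 159, 73_002),
    (19, 69, 30_923), (20, 37, 19_850),
    (21, 24, 9_373), (22, 9, 4_373),
    (24, 1, 0), (25, 1, 0), (26, 2, 0),
    (29, 1, 0), (34, 1, 0), (36, 1, 0),
]

PANELISTS, RESPONSES = 1, 2  # column indexes within each row


def share_at_or_above(min_panels: int, column: int) -> float:
    """Fraction of a column's total held by panelists in at least `min_panels` panels."""
    total = sum(row[column] for row in ROWS)
    subset = sum(row[column] for row in ROWS if row[0] >= min_panels)
    return subset / total


# Panelists who belong to exactly one panel.
single_panel_share = 1 - share_at_or_above(2, PANELISTS)
# Responses contributed by members of five or more panels.
heavy_response_share = share_at_or_above(5, RESPONSES)

print(f"{single_panel_share:.1%} of panelists belong to a single panel")
print(f"{heavy_response_share:.1%} of responses come from members of 5+ panels")
```

Run against these rows, the single-panel share comes out at about 78% and the 5+ panel response share at about 49%, which the slide rounds to 50%.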
7. First-Hand Evidence: Clients Are Able to Identify Analytical Issues with Data Quality in Online Research Projects
Client: Top Tech Firm
Project: Technology A&U study
Goal: Compare clean/unclean sample
Results of unclean sample:
• Unrealistic segmentation solutions
• Higher mean scores and SDs
• Degraded sensitivity of significance tests
- From Steve Schwartz’s presentation at the ’09 IIR Market Research Event
Takeaway: Data from the unclean sample would have led to different business decisions
8. First-Hand Evidence: Clients Experience Operational Issues with Data Quality in Online Research Projects
Client: Top CPG Firm
Project: Product launch in-home usage study (IHUT)
Goal: Test product against 3 discrete sample populations and ready it for commercial product launch
Results:
• Lack of quality controls resulted in 50% of respondents receiving more than one product during the usage period
• Research Impact: All three studies had to be reviewed; key measures were indeterminable
• Business Impact: Estimated loss in revenue of $15 million due to delays, not to mention a tarnished reputation with retailers
Takeaway: Lack of quality controls/measures can cause significant rework and expense
9. Quantifying the Risk of Bad Respondents
• Risk Ratio is defined as the ratio of the probability of getting a wrong answer to the baseline
probability of 5%, based on sampling theory.
• On average, clients see 20% or more of the respondents in their surveys fail at least one TrueSample quality check, meaning that their risk doubles if they do not apply TrueSample!
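The Risk Ratio definition above reduces to one division. The sketch below only illustrates that definition. The function name `risk_ratio` is hypothetical, and the slide does not say how a 20% failure rate translates into a probability of a wrong answer, so the 10% input is simply the value that corresponds to a doubled risk against the 5% baseline.

```python
BASELINE_ERROR = 0.05  # baseline probability of a wrong answer, as stated on the slide


def risk_ratio(p_wrong: float, baseline: float = BASELINE_ERROR) -> float:
    """Ratio of the probability of a wrong answer to the baseline probability."""
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability between 0 and 1")
    return p_wrong / baseline


# A "doubled" risk means the wrong-answer probability has moved from 5% to 10%.
print(risk_ratio(0.10))
```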
11. Data from Confirmit 2012 Annual MR Software Survey
• Penny for your thoughts – most online surveys today are incentivised
– Nearly six out of 10 research companies (57%) are using incentivised panels for between two-thirds and 100% of their samples. Only a few (7%) are not using rewards at all.
• Independent panel verification is the exception, not the norm
– Around three-quarters (76%) of panel operators do not subscribe to independent panel verification services. Even among large companies, 58% do not do this.
• Most MR companies run simple fraud prevention checks on online responses
– Most companies check for speeding by respondents (73%), and nearly two-thirds (63%) look for ‘straightlining’: two quality control methods that many data collection tools make easy to apply.
• More thorough respondent fraud checks are largely shunned
– Just over half of those surveyed (52%) use challenge questions, and fewer still use some of the more high-tech methods.
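The two common checks named above, speeding and straightlining, are simple to express in code. The following is a minimal sketch under stated assumptions: the one-third-of-median speeding threshold and the five-item minimum grid size are illustrative choices, not figures from the Confirmit survey or from TrueSample, and both function names are hypothetical.

```python
from statistics import median


def is_speeder(duration_s: float, all_durations_s: list, fraction: float = 0.33) -> bool:
    """Flag a complete that took less than `fraction` of the median duration.

    The 0.33 default is an assumed cut-off for illustration only.
    """
    return duration_s < fraction * median(all_durations_s)


def is_straightliner(grid_answers: list, min_items: int = 5) -> bool:
    """Flag a grid question where every item received the same answer."""
    return len(grid_answers) >= min_items and len(set(grid_answers)) == 1


# Completion times in seconds for five respondents; the median is 580.
durations = [600, 540, 660, 150, 580]
speeders = [d for d in durations if is_speeder(d, durations)]

print(speeders)
print(is_straightliner([3, 3, 3, 3, 3, 3]))
print(is_straightliner([3, 4, 3, 3, 2, 3]))
```

Only the 150-second complete falls below the threshold here, and only the first grid counts as straightlined. A median-based cut-off is used so that one very slow respondent does not shift the threshold.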
13. As More Clients Apply Standard Online Research Quality Requirements, TrueSample Will Help Clients Meet Them
14. FoQ2 is Counting on the TrueSample Quality Council
From the FoQ2 analyses and insights:
• The ARF and FoQ2 participants will produce important findings and deliver
new guidelines with strong recommendations over the next few months.
• The ARF is counting on TrueSample and the TrueSample Quality Council to
help translate FoQ2 learning into advanced online research practice
applications and Research-on-Research.
15. Companies Coming Together to Create an
Industry Standard
Clients:
Sample Suppliers:
Research Companies:
Survey Platforms / Technology Platforms:
Federated
17. TrueSample: Provide a consistent and scalable data quality platform for online research
Survey Design & Creation
Panel Management & Selection
Data Collection
Analyze & Improve
SAMPLE
18. TrueSample: Help people seeking insights make
better decisions
• By applying the best available, independent, and comprehensive data quality solution in every country where they conduct online quantitative research.
• By reducing the risk of making poor decisions: TrueSample technology and algorithms are applied to respondents and survey instruments to systematically and comprehensively eliminate "bad" data wherever possible.
19. Research-on-Research (RoR) has Been the Foundation of the TrueSample Quality Council
Past RoR Examples
• Impact of Identity Verification on Hard-to-Reach Groups
• Impact of Chronically Unengaged Respondents on Data Quality
• Impact of Survey Design on Data Quality
• Impact of “Bad” Respondents on Business Decisions
RealCheck Postal, SurveyScore, Engagement Check, RealCheck Social
“The goal of the TrueSample Research-on-Research Sub-Committee is to drive a research agenda that identifies and provides empirical evidence related to techniques that can be incorporated into the TrueSample product to maintain or enhance research data quality, an important component in minimizing the risk of incorrect business decisions.”
20. From RoR to Product Roadmap
TrueSample Quality Council RoR Sub-Committee Prioritization Process
DESIGN / FIELD / ANALYZE
Social Media & River Sample
Mobile Device Data Collection
More Robust Analytics & Question Types
Alternative Identity Validation Variables
• Phase 1 = RoR efforts to ascertain WHAT challenges need to be solved for
• Phase 2 = RoR efforts to ascertain HOW to solve for the challenges
Results will inform TrueSample product roadmap
21. TrueSample is a Technology that Provides Consistent, Objective, and Automated Quality
Real
› Panelists are who and where they say they are
› Identity validation with reputable, third-party databases
Unique
› Panelists unique within and across all Certified Panels
› Machine fingerprinting ensures no duplicate survey takers
Engaged
› No straight-lining respondents
› No speeding respondents
Qualified
› Respondents meet exclusion criteria for survey
› Respondents’ survey-taking behavior tracked over time
SurveyScore
› Predictive models improve survey design before launch
› Actual survey engagement scored and benchmarked
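The "Unique" check relies on machine fingerprinting to spot the same survey taker arriving under different panel accounts. The slides do not describe how TrueSample builds its fingerprint, so the sketch below is a generic illustration only: the device attributes, the SHA-256 hashing, and the names `fingerprint` and `find_duplicates` are all assumptions.

```python
import hashlib


def fingerprint(attrs: dict) -> str:
    """Hash a device's attributes into a stable identifier (illustrative only)."""
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_duplicates(respondents: list) -> list:
    """Return (duplicate_id, first_seen_id) pairs for respondents sharing a fingerprint."""
    seen = {}
    duplicates = []
    for respondent_id, attrs in respondents:
        fp = fingerprint(attrs)
        if fp in seen:
            duplicates.append((respondent_id, seen[fp]))
        else:
            seen[fp] = respondent_id
    return duplicates


# Hypothetical respondents: two panel accounts on the same machine, one on another.
laptop = {"user_agent": "Browser/1.0", "screen": "1440x900", "timezone": "UTC-5"}
phone = {"user_agent": "Mobile/2.0", "screen": "390x844", "timezone": "UTC-8"}
respondents = [("panelA-001", laptop), ("panelB-417", dict(laptop)), ("panelA-002", phone)]

print(find_duplicates(respondents))
```

A production fingerprint would need many more signals and some tolerance for small changes between sessions; exact-match hashing is used here only to keep the idea visible.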
23. TrueSample 2013 Initiatives
CONSISTENT
• Consistency scoring at the panel level to aid in selecting the most appropriate panel for a particular study, as well as to proactively identify any significant changes to a panel over time that may affect results in a study
• Consistency scoring at the individual panelist level to aid in the removal of panelists who habitually provide inconsistent responses
MOBILE
• Brings the full benefits of Real, Unique, Qualified, Engaged, Consistent, and SurveyScore to mobile devices
• Specifically designed for app-based research conducted on smartphones and/or tablets (iOS, Android, etc.)
RIVER SAMPLE
• Extends Panelist Validation functionality around Real and Unique to sample sources utilizing a river sampling methodology
• Optimizes record submission process and functionality for suppliers utilizing a river sampling methodology
24. 2013 RoR Roadmap
PROJECT | PROJECT DESCRIPTION | EXPECTED COMPLETION | PROJECT LEADER
TrueSample Consistent | Identify consistency of an individual panelist; identify consistency of a panel | Phase 1-June 2013, Phase 2-October 2013 | TrueSample
Engagement Algorithm Redesign | Replace the current parametric approach with non-parametric clustering-based algorithm | September 2013 | TrueSample
Mobile Surveys | Compare and contrast user’s browsing patterns for mobile vs. desktop based surveys; understand the impact of a shorter survey, grid effect, straight lining, speeding, etc. | October 2013 | TrueSample
Dynamic/River Sample | Optimize real-time panelist validation process; understand impact of including this sample type in surveys | Phase 1-April 2013, Phase 2-August 2013 | TrueSample
Global Validation | Identifying different cultures: are data issues the same? Should validations/algorithms be different based on culture? Risk analysis in non-US country. Is engaged check different? | Fall TSQC | Mktg Inc/Research Now
Operationalize SurveyScore | What does a change in SurveyScore really mean? | Fall TSQC | Kantar
25. Survey Validation Evaluates Respondents in Real-Time as They Complete Surveys
[Flow diagram]
Entry URL: http://SURVEYURL?source-id=22345&respondent-id=772822
Pages: Page 0, Page 1, Page 3+, Last Page, End Page
Steps shown:
• Name/Address Form*
• Respondent recognized as Real? (Yes / No)
• Create Digital Fingerprint
• Respondent Real? Unique? Engaged? Qualified?
• Collect Page & Question Data
• Store validation status for reporting
• End Page: Store validation status and SurveyScore for reporting
* Form can be enabled on a per-survey basis.
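The entry URL on this slide carries a `source-id` and a `respondent-id`, and the flow ends by storing a validation status for reporting. As a rough sketch of how a survey platform might hold that state, the example below parses the slide's URL and records the four checks. The `ValidationStatus` class, its fields, and `start_validation` are hypothetical names for illustration and do not describe TrueSample's actual interface.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlsplit


@dataclass
class ValidationStatus:
    """Per-respondent record of check results, kept for reporting (hypothetical)."""
    source_id: str
    respondent_id: str
    checks: dict = field(default_factory=dict)  # e.g. {"real": True, "unique": False}

    def passed(self) -> bool:
        """True only when at least one check ran and none of them failed."""
        return bool(self.checks) and all(self.checks.values())


def start_validation(entry_url: str) -> ValidationStatus:
    """Read the source and respondent identifiers from the survey entry URL."""
    query = parse_qs(urlsplit(entry_url).query)
    return ValidationStatus(
        source_id=query["source-id"][0],
        respondent_id=query["respondent-id"][0],
    )


status = start_validation("http://SURVEYURL?source-id=22345&respondent-id=772822")
# Check outcomes below are invented to show a respondent failing one check.
status.checks.update(real=True, unique=True, engaged=False, qualified=True)

print(status.source_id, status.respondent_id, status.passed())
```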
Editor's Notes
The load test was conducted without stopping external traffic. Therefore, for each load level above there was an additional background load generated by the current production traffic.
Support for more …
• Question Types: For example: Currently, no “open end” question types are analyzed by TrueSample. However, there is reason to believe that the prevalence of vulgarities in “open ends” strongly correlates with lack of engagement.
• Sample Types: For example: Currently, TrueSample’s Global product can help improve non-consumer research. RealCheck data validation partners are focused only on consumers. However, databases for other types of sample (B2B, Doctors, etc.) may be able to expand RealCheck capabilities to additional market segments.
• Countries: For example: Currently, TrueSample’s Global product can help improve research in all countries. RealCheck data validation partners are focused only on twelve of them. However, databases for other countries (China, Japan, South Korea, etc.) can expand the list of supported geographies.
• Platforms: TrueSample can support more survey, recruitment, router/sample aggregator, and mobile platforms.
• RealCheck Types: For example: Currently, RealCheck supports Postal, Social and Local, but Phone Numbers and other identifiers may be possible to include for additional flexibility of the product.
Decision Making Criteria:
• Quality Improvement
• Yield
• Price
• Technology/Automation
Engagement: Clustering vs. Parametric