Online Research Quality: The Next Frontier


A white paper sharing the TrueSample team's perspective on the next wave of issues impacting research quality and how the online market research industry must evolve to address them.

Published in: Business, Technology

A TrueSample Perspective

Technology has already improved online research quality dramatically. But data quality issues persist, and new challenges are emerging. The advent of real-time sampling techniques, the proliferation of mobile devices, and the soaring popularity of social media have raised new, more nuanced questions about data quality and increased demand for new approaches and solutions. How will your company, and our industry, respond? Here are key questions every buyer of online sample should be asking today, along with the TrueSample approach to delivering next-generation data quality solutions.

A TrueSample White Paper

Is Online Data Quality Still a Problem?

Since the beginning of the new millennium there have been nagging suspicions that the data generated from online panels can’t be completely trusted. Fake or duplicate respondents, habitually unengaged panelists, professional survey takers, “gamers,” “straightliners,” and “satisficers” (those who provide unusually positive responses) have all raised concerns. While some of these problems have been largely tamed by technology or controlled by panel management best practices, some issues persist and others are becoming more pressing. For example:

  • Under- or over-representation of certain groups in online panels introduces bias that can impact data quality.
  • Declining use of email creates new challenges for reaching viable survey respondents and delivering engaging survey design.
  • Biases due to panel tenure or membership in multiple panels cast doubt on the ability to achieve consistent survey results over time.

The key question today is whether, and to what extent, these issues impact research results and business decisions. This paper addresses the question in two ways. First, to provide context, it briefly summarizes how the industry has responded to data quality issues over the past few years. Then it examines emerging issues and describes the approach the TrueSample team is taking to maximize data quality in the years ahead.

How Has the Industry Addressed Data Quality Concerns?

Since the advent of the online era, market research firms and sample providers have worked, though not always in a coordinated way, to stamp out the threat of unreliable data. The first step was to quantify how much bad data and “bad actors” were influencing research results.

In a detailed analysis of a typical online panel, the Advertising Research Foundation (ARF) found that 20% of the sample could be identified as corrupt, meaning respondents who exhibit “bad behaviors.” [1] The TrueSample team’s own research showed that relying on data from “bad” respondents who are found to be fake, duplicate or unengaged increased the risk of making the wrong decision by as much as 50%. These and other findings prompted action by the market research industry, and a number of efforts emerged:

  • Research vendors developed their own manual data cleaning and data weighting techniques.
  • Machine fingerprinting and identity validation technologies used within other industries were applied to online panels.
  • TrueSample was introduced to eliminate fake, duplicate and unengaged survey respondents (see Figure 1).
  • The Advertising Research Foundation developed the Quality Enhancement Process to help clients and research vendors engage in structured conversations about online data quality.
  • The TrueSample Quality Council issued Online Research Quality Guidelines for all research buyers to follow when choosing vendors (for details see the TrueSample Quality Council’s Online Consumer Research Quality Guidelines). [2]

[1] Source: “The Online Panel Quality Wars,” by Brad Bortner, Forrester Research, November 20, 2009, footnote 7.
[2] URL:

Clearly, the market research industry has raised the bar on what is considered acceptable quality for online research sample. In the process it has made sample buyers more confident in the business decisions that derive from market research.

Unfortunately, neither time nor technology stands still. While the largest issues have been tamed, more nuanced questions about data quality continue to emerge, and they must be addressed.

How Does TrueSample Solve or Control Data Quality Issues?

Introduced in 2008, TrueSample is now used by more than 100 research groups and panel companies to ensure data quality across multiple sample sources and survey platforms. It uses a combination of real-time technologies to provide:

  • Elimination of fakes. TrueSample uses third-party databases to validate all prospective panelists and survey respondents to guarantee that they are who they say they are.
  • Prevention of duplicates. Sophisticated digital fingerprinting eliminates duplicate respondents from panels and surveys, ensuring that no individual can take the same survey twice.
  • Assurance of true engagement. Survey engagement technology eliminates speeders and straightliners in real time, and SurveyScore quantifies the panelist experience by providing benchmarks of perception and engagement behavior (for details see ...).

Figure 1: An average of 28.8% of panelists are rejected by TrueSample.
  Not real: 24.3%
  Not unique: 2.8%
  Not engaged: 1.65%
  TrueSample (accepted): 71.25%

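The engagement checks mentioned above (removing speeders and straightliners) can be illustrated with a minimal Python sketch. This is not TrueSample's method, which the paper does not describe. The `Response` record, the `speed_ratio` cutoff of one third of the median completion time, and the five-item minimum for a rating grid are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Response:
    """Hypothetical survey record; field names are illustrative only."""
    respondent_id: str
    seconds_taken: float
    grid_answers: list[int]  # answers to one rating grid, e.g. a 1-5 scale


def is_straightliner(answers, min_items=5):
    """True when a grid of at least min_items questions got one identical answer."""
    return len(answers) >= min_items and len(set(answers)) == 1


def flag_unengaged(responses, speed_ratio=0.33):
    """Map respondent_id to the reasons it was flagged (assumed thresholds)."""
    typical = median(r.seconds_taken for r in responses)
    flags = {}
    for r in responses:
        reasons = []
        # Speeder: finished far faster than the typical respondent.
        if r.seconds_taken < speed_ratio * typical:
            reasons.append("speeder")
        # Straightliner: gave the same answer to every grid item.
        if is_straightliner(r.grid_answers):
            reasons.append("straightliner")
        if reasons:
            flags[r.respondent_id] = reasons
    return flags


sample = [
    Response("a", 600, [1, 3, 2, 4, 5]),
    Response("b", 100, [3, 3, 3, 3, 3]),
    Response("c", 580, [4, 4, 4, 4, 4]),
    Response("d", 620, [2, 3, 2, 5, 1]),
]
flags = flag_unengaged(sample)
```

In practice the cutoffs would need to be calibrated per survey, since a fixed ratio can misclassify fast but attentive respondents.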
Where Should the Research Industry Focus Now?

To continue to improve the quality of online research, and to fully exploit the new opportunities the online era holds for market research, the industry should turn its attention to three specific areas:

1. Real-time and social media sampling methods

In response to declining online panel membership and email usage, and as a means to solicit feedback from hard-to-reach groups such as 18-24-year-olds, many researchers are turning to websites and social networks as an active recruitment source for surveys. Soliciting potential survey takers while they are visiting websites (real-time sampling, also known as river sampling) and sampling from web-based social networking sites may provide fast access to very specific groups, users, and demographics, but it also introduces new questions about how to ensure data quality. For example:

  • Do real-time survey respondents answer surveys differently than respondents sourced from online panels?
  • Will survey takers sourced from real-time sample or social media sample provide their names and addresses for address verification, or are alternative means of identity verification required?
  • Will real-time respondents take the 20- to 30-minute surveys market researchers typically design, or should surveys be redesigned in shorter formats for real-time participants?

In answer to the first question, preliminary research by the TrueSample team shows that real-time survey takers exhibit the same satisficing behavior as newer panelists, meaning that they provide unusually positive responses, even if the real-time survey takers are also on many other panels and have long panel tenures. This satisficing behavior can introduce bias into research results unless the correct data quality measures and tenure balances are put in place.

Equally important, researchers suspect that survey takers on the web or social media networks may be less tolerant of long, complicated surveys that interrupt their online experience. Respondent engagement for this segment therefore needs to be measured and benchmarked to determine whether survey design must be recalibrated.

There are also concerns that real-time survey takers may not be willing to provide name and address information before taking surveys because it feels like a privacy violation, which would eliminate the ability to validate the respondents’ identities.

So what does all of this mean in terms of data quality solution requirements? The TrueSample team anticipates that sample buyers will demand solutions that deliver the following:

  • Consistent quality assurance when blending sampling methods. To address some of the tenure- and panel-membership-related biases present in real-time samples, researchers will likely need to blend online or offline panelists with real-time respondents to reach a more balanced and representative sample. This sample will need to be “cleansed” using a data quality solution that can be applied consistently across all sampling methods and that ensures all respondents are real, unique and engaged.
  • Mechanisms for measuring and improving respondent engagement. Surveys will need to be optimized to effectively engage specific types of survey takers. Research has already led to the development of TrueSample SurveyScore® and SurveyScore® Predictor, which help to optimize online survey design to achieve the highest engagement levels among respondents, but these tools must be applied to real-time and social media sampling techniques for measurement and benchmarking that is specific to the respondent audiences of these sampling methods.
  • Creative use of profile data for identity validation. A great deal of identity verification data already exists online (see Figure 2). The industry will need to get creative about using “social sign-on” and other existing profile information to validate respondents’ identities, rather than asking for name and address information during a survey.

Figure 2: A variety of identity verification data exists in online profiles.
[Table not reproduced: Profile Data Available on Social Networks, May 2010. Networks compared: Facebook, Twitter, Yahoo!, Google, MySpace, LinkedIn, AOL. Attributes compared: name, email, nickname, photo, profile URL, birthday, gender, location, social graph, additional profile information. Source: Gigya, Multiple Identities, July 7, 2010.]

2. Mobile survey modalities

The gizmos people use to access the Internet and communicate with each other, and with market researchers, are evolving at a jaw-dropping rate. iPads, Android phones, Nook readers, Kindles, netbooks, and whatever’s next on the horizon all point to the development of a new set of survey modalities that will impact the quality of market research data. Early adopters of these devices tend to be younger and more affluent [...] pre-teens (the emerging generation of survey-takers), online chats and text messaging have supplanted email as the preferred communication vehicle. Additionally, adoption of mobile communications is accelerated in hard-to-reach European and Asian markets. For these reasons, market researchers are starting to pay attention to mobile devices as a mechanism for collecting quantitative survey feedback.

The key questions that need to be addressed:

  • How do we get people to take surveys on mobile phones and other instant access platforms when it’s inevitable that their attention will be fragmented by the other activities they pursue on these devices?
  • How is representivity affected when they do respond to our surveys?
  • How do we optimize survey design to maximize engagement on these devices?

What capabilities should next-generation data quality solutions provide to achieve reliable quality in mobile surveys?

  • Mode-based sample balancing. If respondents using mobile devices and tablets differ from those respondents using computers, we will need to account for those demographic differences to prevent biased results. Next-generation data quality solutions will need to help researchers blend and balance sample using different modalities to achieve representativeness.
  • Mechanisms for measuring and improving respondent engagement. Surveys will need to be optimized for engagement on specific survey modalities. SurveyScore and SurveyScore Predictor, two features of TrueSample, help to optimize online survey design to maximize engagement levels among online respondents, but new norms and predictive models must be built using mobile survey data to bring these same measurement and benchmarking capabilities to mobile survey-taking.

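One common way to account for demographic differences in a blended sample is cell weighting (post-stratification), sketched below in Python. The paper does not say which balancing technique TrueSample uses, so this is a swapped-in illustration. The function name, the age cells, and the target shares are hypothetical.

```python
from collections import Counter


def post_stratification_weights(cells, target_shares):
    """Return one weight per respondent so weighted cell shares match targets.

    cells: the demographic cell of each respondent, e.g. an age band.
    target_shares: assumed population share of each cell (sums to 1).
    """
    counts = Counter(cells)
    missing = set(counts) - set(target_shares)
    if missing:
        raise ValueError(f"no target share for cells: {sorted(missing)}")
    n = len(cells)
    # weight = target share / observed share in the blended sample
    return [target_shares[c] / (counts[c] / n) for c in cells]


# Hypothetical blended sample that skews young because of mobile respondents.
cells = ["18-24"] * 6 + ["25+"] * 4
targets = {"18-24": 0.2, "25+": 0.8}
weights = post_stratification_weights(cells, targets)

young_weight = sum(w for c, w in zip(cells, weights) if c == "18-24")
weighted_young_share = young_weight / sum(weights)
```

Here the over-represented younger cell is weighted down and the older cell is weighted up. Large weights increase variance, so in practice extreme weights are usually capped or the sample is rebalanced at recruitment.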
3. Ongoing concerns over representivity

According to Forrester Research, online panel-based research is now the dominant mode for quantitative research. But questions linger about how representative online panelists really are and whether we’ve exacerbated the problem with data quality solutions that verify identities using consumer databases. For example:

  • Is there something inherently different about the types of people who join certain online panels?
  • Do these differences impact their survey responses?
  • Do data quality solutions increase these biases by rejecting particular types of respondents in greater numbers?

The TrueSample team has identified three key issues related to the representivity of online panelists that can be addressed by a next-generation data quality solution.

First, the length of time panelists have belonged to an online panel, or their “panel tenure,” may impact those panelists’ survey responses. Specifically, the newer panelists are to a panel, the more likely they are to “satisfice,” or provide unusually positive responses.

Second, the number of online panels that panelists belong to, or their “panel membership,” can impact their responses and can increase their likelihood of survey-taking hyperactivity. TrueSample research has shown that multi-panel members show a higher score bias, meaning that they provide more positive responses than single-panel members and may thereby impact the reliability of research results.

Third, there is clear evidence of underrepresentation of certain demographic groups within online panels. For example, 18-24-year-olds and Hispanics are historically hard to find in online panels. This underrepresentation is aggravated by traditional identity validation techniques because these are “high-velocity” segments; in other words, both groups tend to move and change their address more frequently than other segments. So using name and mailing address to validate identity may not be a good test for panel inclusion in these groups, because it causes them to fail the “real” test in disproportionate numbers.

Figure 3: TrueSample pass-through rates by age (left) and race (right) indicate that 18-24-year-olds and Hispanics fail identity validation more frequently than other segments.
[Chart not reproduced. Both panels plot Real, Not Real, Duplicate, and Overall Real % on an axis from 0% to 120%. Left panel age groups: 18-24, 25-34, 35-44, 45-54, 55-64, 65+. Right panel race codes: 0 (White), 1 (Black), 2 (Native American), 3 (Asian & Pacific Islander), 4 (Other), 6 (Decline to answer), 5 (Hispanic).]

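A comparison like the one in Figure 3 can be computed from validation outcomes. The Python sketch below is an illustration only: the records are invented, and the 10-point gap used to call a group disproportionately rejected is an assumed threshold, not one taken from the paper.

```python
from collections import defaultdict


def pass_rates(records):
    """records: (group, passed) pairs; returns the pass rate per group."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        if ok:
            passed[group] += 1
    return {g: passed[g] / totals[g] for g in totals}


def under_passing_groups(records, gap=0.10):
    """Groups whose pass rate trails the overall rate by more than `gap`."""
    records = list(records)
    overall = sum(1 for _, ok in records if ok) / len(records)
    rates = pass_rates(records)
    return sorted(g for g, rate in rates.items() if overall - rate > gap)


# Hypothetical validation outcomes, not data from the paper.
records = (
    [("18-24", True)] * 5 + [("18-24", False)] * 5
    + [("35-44", True)] * 8 + [("35-44", False)] * 2
    + [("65+", True)] * 9 + [("65+", False)] * 1
)
rates = pass_rates(records)
flagged = under_passing_groups(records)
```

A group that appears in `flagged` is a candidate for the alternative validation sources discussed in this paper, rather than evidence that its members are fake.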
The interrelationships between these three issues are complex, and quantification of the impact on market research quality, individually or collectively, remains incomplete. However, it is clear that next-generation data solutions will need to evolve to address inconsistencies with sample representivity. Specifically, next-generation data quality solutions will require the following attributes:

  • New data sources for identity validation. To reduce the likelihood of falsely rejecting survey respondents who may be “real” but can’t be validated due to frequently changing addresses or a lack of offline identity information, data quality solutions must employ additional data sources for identity validation using attributes such as email addresses and social networking profile data. TrueSample has begun to use additional data sources to validate offline and online identities and to reduce over-rejection, particularly in high-velocity demographics such as 18-24-year-olds and Hispanics.
  • Balancing on panelist tenure and behavior. Data quality solutions such as TrueSample already allow users to evaluate the blend of panelists by their tenure and panel membership, and to filter by sample source to break out validation results for respondents from each individual sample source; however, additional advancements are needed. The next step toward mitigating the potential impact of “high-velocity” segments will be to provide sophisticated panelist behavior modeling so that sample can be proactively balanced on tenure, memberships, and survey-taking frequency for consistent research results.

The Questions Will Keep Coming. So Will the Answers.

The latest wave of technological innovation presents exciting new opportunities for quantitative market research, but the industry needs better quality control mechanisms to fully exploit those opportunities. Questions and concerns about data quality will continue to evolve. No one can claim to have all the answers, but our goal is to ask the right questions and explore the right avenues as we continue to guide the industry in assuring the highest possible data quality.