SurveyMonkey Audience - Data Quality Whitepaper

SurveyMonkey Audience: an on-demand group of millions of survey respondents to take your surveys. Audience allows customers to target respondents on a wide variety of demographic and behavioral traits to get fast, high-quality, cost-effective data.

Data Quality: Measuring the Quality of Online Data Sources

A SurveyMonkey Audience White Paper
October 2012
Overview

In today's technology-driven society and marketplace, research has become accessible to a wide variety of participants: from research experts and specialists, to marketing, product and business teams, all of whom need to gather feedback and expect quality results. Online surveys are a popular method of collecting data for research experts and research novices alike due to their speed, efficiency and reach. As a replacement for other data collection methods (e.g., phone, in-person interview, focus group), online research is highly advantageous in terms of ease, speed, efficiency and cost. But to ensure that quality is preserved in the online medium, researchers using online data sources must understand the key topics that impact quality.

Among the many ways to evaluate the quality of a data source, the following key factors have been shown to have a major impact on the quality and reliability of data produced for online samples:

 • Recruitment Channels
 • Attribute Accuracy
 • Scale & Diversity
 • Incentives

In this paper, we will analyze how these factors can impact data quality and how SurveyMonkey Audience and its respondent groups, SurveyMonkey Contribute and SurveyMonkey ZoomPanel, handle these topics. We will also answer two key questions about the SurveyMonkey Audience respondent groups:

 • Are SurveyMonkey Contribute and SurveyMonkey ZoomPanel members representative of the United States population?
 • How closely do the responses from the Contribute and ZoomPanel members match reliable benchmarks?
Recruitment Channels

Key Quality Questions

 • How are online respondents recruited?
 • Which sources were used to find the people taking your survey?

Overview & Alternatives

We utilize a wide range of methods to recruit respondents. The Internet has the capability to reach a vast portion of the population, making online sources an easily accessible channel for recruitment. Focusing on a single recruitment channel (e.g., people recruited from a niche website) may yield a large number of respondents, but creates the risk of acquiring individuals with an inherent bias, thus not thoroughly representing the general (or target) population. To prevent bias, most data sources use multiple channels infused with a variety of techniques to recruit new survey takers, or use large sources that target a more diverse set of the overall population. By incorporating multiple channels to reach a larger array of respondents, data sources are able to more effectively eliminate the biases that may arise from using a single channel.

Some of the many channels for recruitment include:

 • Online advertisements (banner or search ads)
 • Co-registration (offers to sign up for additional services or offers when registering for other services or offers)
 • In-game advertising (recruitment ads displayed during a game-playing experience)
 • Website recruitment (offers to sign up to take surveys displayed on a website)
 • TV advertisements
 • Billboards and offline advertising to join an online service

SurveyMonkey Audience Approach

SurveyMonkey Audience, one of the world's largest providers of access to online survey respondents, utilizes a variety of methods to recruit new survey takers and maintain a high-quality flow of new respondents to its member registration sites. The key methods used by SurveyMonkey Audience to recruit survey respondents include:

Website recruitment
 • SurveyMonkey is the world's leading provider of survey solutions. A byproduct of this scale is the significant volume of traffic to its web properties and customer surveys. Over 30 million unique people visit SurveyMonkey websites or take SurveyMonkey customer surveys each month. Not all of these visitors are survey creators or SurveyMonkey subscribers; many are completing surveys sent to them by colleagues, friends, or service providers. SurveyMonkey Audience recruits from this large group of people, providing them with the opportunity to regularly complete surveys.
 • Of the millions of people who take surveys every day on the SurveyMonkey platform, thousands register to take surveys.
 • Given the scale and reach of SurveyMonkey and its audience (over 30 million unique people per month), website recruitment is the primary method of acquiring new respondents for the SurveyMonkey Contribute business.

Online advertisements & co-registration partners
 • SurveyMonkey Audience uses online advertising and co-registration partnerships to recruit members in the US, Canada, UK, France and Australia for its ZoomPanel member recruitment sites. Instead of using a single channel or partner, SurveyMonkey Audience works with over one dozen partners to attract a diverse set of new ZoomPanel members from various sites that appeal to different demographics.

Attribute Accuracy

Key Quality Questions

 • How was the information about each respondent collected?
 • How is attribute information validated?

Overview & Alternatives

To effectively navigate today's data-rich Internet environment, data sources must utilize a variety of methods to learn about potential survey respondents. Since many research projects have specific targeting criteria (seeking respondents with particular demographic, behavioral or attitudinal characteristics), the abundance of available data must be filtered efficiently to determine the feasibility of projects as well as to confirm the level of insight the gleaned data will yield.

Data providers use a vast assortment of techniques to target respondents and allow survey creators to cross-analyze results across different demographic, behavioral, and attitudinal traits of respondents. Some of the methods used by providers to understand these attributes include:

Profiling
 Asking respondents questions about themselves that can be used for targeting or in survey data analysis

Screening
 Using screener questions at various points in a survey to filter in or out people that fit certain criteria

Data appending
 Leveraging an outside data validation source or social networking data based on certain respondent identifiers (e.g., email address, physical address, and/or name)
Inference
 Inferring information about a respondent based on cookie information from websites, or making assumptions about an attribute based on other attributes or responses to questions

Attribute information on respondents is only useful if it is accurate. Therefore, accuracy is a key opportunity and challenge for data providers to distinguish themselves. Since services will often provide incentives and rewards (as discussed later in this paper) for participation in surveys, many pitfalls in data accuracy may arise if attribute information is not validated. Some of the key pitfalls include:

Inaccurate attribute information
 Respondents providing inaccurate attribute or response information in order to qualify for more surveys and earn more rewards

Duplicate respondents
 Respondents registering for multiple accounts to receive more survey opportunities and earn more rewards (often using different, invalid attributes in each profile)

Invalid survey response data
 Respondents speeding through longer surveys, without considering each response option, so they can earn rewards by simply finishing a survey

So how do data providers protect against this? Different providers use a wide variety of mechanisms, some manual and some technological, to extract outliers or validate individual members and their attributes.

SurveyMonkey Audience Approach

SurveyMonkey Audience utilizes both profiling and screening as its key methods to learn as much as possible about respondents. When new respondents sign up to join member sites, they are asked to provide information about themselves across a wide variety of attribute categories. This allows us to target members and provide surveys that will be relevant to respondents, allowing customers to narrow down their target audience to only those members who are relevant for their studies. Since SurveyMonkey Audience maintains groups of respondents that have registered to take surveys, additional profile information can be collected and updated frequently.

For certain attributes that may not have been previously profiled, customers can also use SurveyMonkey's survey tools to add questions with skip logic to screen respondents in or out of surveys.
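The duplicate-respondent pitfall described above is commonly mitigated by normalizing and comparing registration identifiers. The sketch below is a minimal, hypothetical illustration of that idea (field names are invented; commercial validation services use far richer signals than email and address matching):

```python
# Hypothetical sketch: flag duplicate registrations by normalized
# email and postal address. Field names are illustrative only.

def normalize_email(email: str) -> str:
    """Lowercase the email and strip a '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def normalize_address(address: str) -> str:
    """Collapse whitespace and case so trivial variants match."""
    return " ".join(address.lower().split())

def find_duplicates(registrations: list[dict]) -> list[dict]:
    """Return registrations whose email or address was seen before."""
    seen_emails, seen_addresses, dupes = set(), set(), []
    for reg in registrations:
        email = normalize_email(reg["email"])
        addr = normalize_address(reg["address"])
        if email in seen_emails or addr in seen_addresses:
            dupes.append(reg)
        else:
            seen_emails.add(email)
            seen_addresses.add(addr)
    return dupes
```

In practice, a registration flagged this way would be held back for review rather than silently accepted into the member group.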
SurveyMonkey Audience has taken a very proactive stance on validation and quality to ensure its customers get highly reliable data. To combat inaccurate profile information and to ensure that respondents are unique, real people, we use TrueSample Validation for every new respondent who registers for our ZoomPanel member site. We require that new respondents provide their physical address and email address, which is validated using TrueSample. Only respondents with valid addresses are permitted to register, and any duplicates are removed from our member group.

For our SurveyMonkey Contribute respondents, we have attempted to eliminate the incentive for respondents to provide false information by removing the reward at the completion of a survey project. When SurveyMonkey Contribute members complete a survey, they are given the opportunity to make a small donation (made by SurveyMonkey on their behalf) to a charity of their choice and to play an instant-win sweepstakes game.

Scale & Diversity

Key Questions

 • How big is the group you are sampling from?
 • How diverse and representative are your survey respondents?

Overview & Alternatives

The size of a potential data or sample source is a very important aspect of its ability to address customers' needs and ensure that a diverse group of people can be reached. With niche data sources, a smaller group can satisfy specific customer demands. However, in instances where a source targets a larger consumer group, a significant subset of the overall population is required in order to provide enough reach to perform effective sampling. Recruiting from a large traffic source or leveraging recruitment sources that reach massive audiences (like social networks or highly trafficked websites) is very helpful in creating a large, diverse group of potential respondents.

SurveyMonkey Audience Approach

SurveyMonkey Audience, per the recruitment sources described above, has access to a very large portion of overall Internet traffic through its website recruitment and partner channels. By leveraging these two sources (an online website recruitment channel that reaches 30M+ unique people each month, and multiple partner co-registration channels spanning diverse interest groups and offer types), our member sites, SurveyMonkey Contribute and SurveyMonkey ZoomPanel, have very broad reach and allow us to recruit a large (3M+ potential respondents) and diverse group of people.

Incentives

Key Questions

 • How are survey respondents rewarded for responding to your surveys?
Overview & Alternatives

Most people enjoy providing their opinions and being heard, particularly when they can help shape the future development of products or services that are relevant or interesting to them. Surveys have become a very popular instrument for collecting this type of feedback and organizing it in a way that allows for quantitative analysis. However, time is valuable, especially in a world where people are constantly on the go. Therefore, survey respondents often expect some sort of reward or benefit in exchange for their time and opinions. Rewarding people for taking the time to provide honest feedback has become a useful tool to increase the likelihood that people will respond to and finish surveys. Rewards are also a way for data providers to form a relationship with their respondents, encourage them to continue to take surveys, and build scalable solutions to address customer needs.

While rewards and incentives are a helpful mechanism for encouraging survey participation, they also have the potential to introduce quality concerns when not monitored and administered carefully. Indeed, literature in psychology suggests that motivation to complete tasks carefully diminishes when people are paid. Data providers have come up with a wide variety of reward types and programs to help increase survey participation and maintain active member groups. Some of these types and programs include:

Cash rewards

Point or credit programs
 Programs that enable members to accrue points or credits over time with participation, redeemable in the future for rewards that "cost" a certain amount

Currency programs
 Similar to point or credit programs, but leveraging a currency provided in a partner ecosystem; for example, rewarding Facebook credits for participating in in-game surveys, which can then be used to purchase gaming or other credits on the partner platform

Charitable donations
 Donating to charity on behalf of survey participants in exchange for survey participation

Information or content access
 Allowing participants to view premium content in exchange for survey participation

Sweepstakes entries
 Entering survey participants into a sweepstakes to win a larger prize in exchange for survey participation

Gift cards
 Often part of point or credit programs, but using a non-cash gift card as a reward for survey participation
Frequent flyer miles or rewards program points
 Similar to currency programs, awarding miles or points in other popular reward programs in exchange for survey participation

Rewards programs are most often crafted to appeal to the specific type of respondents taking surveys. These programs strive to increase the participation rates of people invited or exposed to a survey by providing an appropriate reward. Different types of people respond differently to different types and sizes of rewards, which can influence the accuracy and reliability of the data provided by respondents. A negative outcome of an incentive program is that not all rewards will be equally appealing to the many types of people that may respond to a survey. Thus, rewards programs aspire to provide rewards that encourage participation while avoiding data quality or reliability issues. Data providers must therefore monitor the impact of their rewards programs to ensure that data quality pitfalls are avoided.

Some of the common quality issues that may arise due to reward and incentive programs include:

Speeding
 Respondents rushing through a survey to finish it and qualify for their reward

Satisficing
 Tied to speeding: respondents do not thoroughly consider all answer options and pick whatever option helps them move quickly through a survey

Straight lining
 Picking, for example, the first option for all answer choices as a way to move quickly through a survey without providing honest opinions

Respondent bias
 When reward options only appeal to a certain demographic or type of person, the resulting data set may be biased toward people interested in the incentive provided

Response manipulation
 When variable reward amounts are used (typically when screener questions screen people out and provide lesser reward amounts for respondents who are screened out versus higher amounts for respondents who are screened in), respondents may intentionally provide invalid response data, fitting themselves into the response category they feel will increase their potential reward

While many of the pitfalls listed above are difficult to avoid, data providers have created tools and systems to remove incentives for behavior that produces invalid response data. Data providers have also begun to increasingly use tools to remove outliers from data sets, as well as removing respondents who provide inaccurate data from their member groups.
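Speeding and straight-lining are the most mechanically detectable of these issues. The following sketch, which is purely illustrative and not any provider's actual system, shows the basic idea: flag responses completed implausibly fast or with an identical answer to every question (thresholds are assumptions):

```python
# Illustrative sketch: flag likely speeders and straight-liners.
# The 60-second threshold is an invented example, not a standard.

def flag_low_quality(responses, min_seconds=60):
    """Return ids of respondents flagged as speeders or straight-liners.

    Each response is a dict with an 'id', a completion time in
    'seconds', and 'answers' (a list of chosen option indices).
    """
    flagged = []
    for r in responses:
        speeding = r["seconds"] < min_seconds
        # Straight-lining: every question answered with the same option.
        straight = len(r["answers"]) > 1 and len(set(r["answers"])) == 1
        if speeding or straight:
            flagged.append(r["id"])
    return flagged
```

Flagged responses would typically be reviewed or excluded before analysis, and repeat offenders removed from the member group, as described above.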
SurveyMonkey Audience Approach

SurveyMonkey Audience created its reward programs to encourage honest participation and to avoid the common pitfalls associated with many reward types.

SurveyMonkey Contribute respondents are rewarded with charitable donations, made on their behalf by SurveyMonkey. Upon completing surveys, respondents may also play a flash game that enters them into an instant-win sweepstakes. The combination of charitable donations and sweepstakes entries means that there is no direct material incentive waiting for respondents at the end of any survey opportunity, eliminating the instant gratification that respondents may expect upon finishing surveys.

SurveyMonkey ZoomPanel respondents are rewarded with points redeemable for merchandise, gift cards and sweepstakes entries. While respondents receive points upon the completion of any survey, the amount is nominal.

Measuring Representativeness of Data Sources

Key Quality Questions

 • How closely do SurveyMonkey Audience respondents match the US population?
 • How closely do SurveyMonkey Audience responses match reliable benchmarks?

Overview

Data sources all take their own unique approaches to proving quality and representativeness. Since not all data sources are the same, and they are often used for different purposes, this varied approach to proving and measuring quality is appropriate. In any quality discussion on data, the topics of consistency and accuracy will often surface: namely, how consistent is the data produced by a given data source, and how accurate is that data. While consistency is important, accuracy is the foundation of data quality. Being consistently wrong is typically not a desirable attribute of any data source.

SurveyMonkey Audience Approach

The goal of our methods in recruitment, policies, processes and incentive structures is to create a group of respondents who are able to produce accurate data. SurveyMonkey Audience was designed as a tool that helps people find the right answer, not just any answer. So, while we constantly measure our respondent groups to see how closely they resemble the US population and the populations of respondents our customers are seeking, we also test the end results. We want to see the data those respondents produce, and see how closely it benchmarks against known, reliable data that may cost significantly more to purchase or field. In the following sections, we detail how SurveyMonkey Audience measures the representativeness of its member groups and the data they produce.
How closely do the Contribute and ZoomPanel member groups match the US population?

If we expect our respondents to provide reliable data that will help customers make better decisions, we want to start with a group of people that is representative of the United States population. This does not guarantee that the responses from these members will mirror what the United States population thinks and does, but it does help ensure that our respondents are diverse across various dimensions.

While we benchmark various demographic attributes against the United States population, one of the most important factors, which can be visually represented using census data, is the location of our members. When members sign up to take surveys on SurveyMonkey Contribute or SurveyMonkey ZoomPanel, we capture each member's zip code in the registration process. Below are six charts that show how the SurveyMonkey Contribute and SurveyMonkey ZoomPanel member groups compare to the US population. The differences between the three maps showing the US Census information, SurveyMonkey Contribute members and SurveyMonkey ZoomPanel members, at both the county level and the metro area level, are difficult to see. This is a good thing: it shows a representative location makeup for both member groups when compared to the US Census.
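The visual comparison in the maps can also be approximated numerically: aggregate member zip codes into regions, convert counts to shares, and compare against census shares. This is a hedged sketch of that idea with invented region codes; it is not the method used to produce the maps:

```python
from collections import Counter

def region_shares(region_codes):
    """Convert a list of region codes (e.g., zip prefixes) to shares."""
    counts = Counter(region_codes)
    total = sum(counts.values())
    return {region: n / total for region, n in counts.items()}

def max_share_gap(member_codes, census_shares):
    """Largest absolute difference between member and census shares.

    A small maximum gap suggests the member group's geographic
    distribution tracks the census distribution closely.
    """
    member = region_shares(member_codes)
    regions = set(member) | set(census_shares)
    return max(abs(member.get(r, 0.0) - census_shares.get(r, 0.0))
               for r in regions)
```

A near-zero gap across all regions would correspond to the "differences are difficult to see" outcome shown in the maps.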
Population Density Maps

[Figure: Population Density by County – 2010 United States Census]

[Figure: Population Density by County – SurveyMonkey Contribute and ZoomPanel members (left panel: SurveyMonkey Contribute; right panel: ZoomPanel)]
[Figure: Population Density by Metro Area – 2010 United States Census]

[Figure: Population Density by Major Metro Area – SurveyMonkey Contribute and ZoomPanel members (left panel: SurveyMonkey Contribute; right panel: ZoomPanel)]
How closely do the responses from the Contribute and ZoomPanel members match reliable benchmarks?

In addition to making sure that our respondents are representative of the United States population, we want to make sure that their responses are also indicative of the general population. This helps us ensure that we aren't making decisions based on survey data fielded from a group that merely looks like the United States population but may not answer questions the way the United States population would.

SurveyMonkey Audience Quality Benchmarking

SurveyMonkey Audience has many systemic controls in place to mitigate quality issues that may arise, but we also examine the data provided by our respondents to ensure that it is accurate and comparable to other reputable sources. A key benchmarking exercise we perform periodically compares the responses from our member groups to those of other data sources that have established themselves as proven, reliable sources of data with solid methodological underpinnings. One such source is Gallup, a leading full-service research business.

The testing concept is simple: Gallup conducts phone interviews with more than 1,500 people every day on a variety of topics. They publish the results of their surveys daily for certain measures and weekly for others. We take one topic they ask thousands of Americans about and ask SurveyMonkey Audience respondents the exact same question, using the exact same language and answer options.
Compared alongside Gallup's metrics, we found that the data and responses from our member groups are equivalent to the data from Gallup's phone survey. The charts below explain how SurveyMonkey Audience respondents (from the SurveyMonkey Contribute member group) compare to Gallup's phone survey respondents. Conducted over a 7-day period, our study's results were consistently within a 5% margin of error of Gallup's.

Methodology & Results Explanation

Methodology

Over a seven-day period from 7/19/12 to 7/25/12, we surveyed SurveyMonkey Audience respondents and asked them one question: "We'd like you to think about your spending yesterday, not counting the purchase of a home, motor vehicle, or your normal household bills. How much money did you spend or charge yesterday on all other types of purchases you may have made, such as at a store, restaurant, gas station, online, or elsewhere?"

Respondents were asked to input the dollar amount (a whole number) that they spent the prior day. The wording was identical to the question asked by Gallup in its daily questionnaire of over 1,500 phone-based respondents. We used our SurveyMonkey Contribute member group for this analysis. Surveys were launched every morning at 9am PT, and were left open for a period of 3 days.

We analyzed our data to see how it stood up to Gallup, which publishes its data in two data points every day: a 14-day trailing average and a 3-day trailing average, whereby the corresponding (3- or 14-day) trailing daily results are averaged to produce each data point. These trailing averages smooth the spending trends, since daily data can be more volatile due to the day of the week, macro events or other factors.

We included two data points for SurveyMonkey Audience responses to compare against Gallup's 14-day and 3-day averages. First, we included our own raw trailing 3-day average of responses. Second, we performed a common manipulation, an exponential transformation, to account for a skew in the average household income of SurveyMonkey Audience respondents, which is higher, on average, than that of the US population per the US Census. The data points that use this manipulation are labeled "3-day Adj. Avg."

Results Explanation

The Audience 3-day Adj. Avg. for 5 consecutive days is within a 5% error margin of Gallup's 14-day trailing average. What does this mean? When applying a manipulation to correct for the higher average income of Audience respondents relative to the US Census data, Audience is able to produce effectively the same results as Gallup, with only 3 days of data instead of Gallup's 14.
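The trailing-average comparison described in the methodology can be sketched in a few lines. The daily values below are invented for illustration (they are not the study's data), and the 5% check mirrors the margin used in the comparison:

```python
def trailing_average(daily_values, window):
    """Average of the most recent `window` daily values."""
    if len(daily_values) < window:
        raise ValueError("not enough days of data")
    return sum(daily_values[-window:]) / window

def within_margin(value, benchmark, margin=0.05):
    """True if `value` is within `margin` (as a fraction) of `benchmark`."""
    return abs(value - benchmark) <= margin * benchmark
```

For example, with invented daily spending averages `[62, 70, 68, 75, 71, 69, 73]`, the 3-day trailing average is 71.0, and `within_margin(71.0, 68.0)` holds because the gap of 3.0 is inside the 3.4 allowed by a 5% margin. The income-skew adjustment is deliberately omitted here, since the paper does not specify the exact form of the exponential transformation.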
The 3-day trends for Audience data also follow the same trend as the Gallup data, using both the Audience 3-day Raw Avg. and the Audience 3-day Adj. Avg. Gallup's own data, when comparing the 14-day Avg. and the 3-day Avg., shows more volatility in the 3-day Avg. versus the 14-day Avg., which leads us to believe that the Audience sample can be a healthy predictor of trends even with a smaller data set in terms of the days needed to produce a stable benchmark.

Since the Audience 3-day Adj. Avg. is within a 5% error margin of the Gallup 14-day Avg. for the 5 consecutive days where the test was run (and 3-day Avg. metrics could be gathered), we believe this indicates that SurveyMonkey Audience is able to produce results comparable to those of Gallup. The Audience 3-day Raw Avg. is consistently higher than the Gallup 14-day Avg., which our manipulation shows may be highly correlated with the higher average income of members of the Audience respondent group.

Conclusion

While data providers and methods of gathering data have grown rapidly over the past decade, it is critical for experienced researchers and those new to research to understand and evaluate the quality of the data produced by the providers they work with and purchase data from.

While there are a variety of key factors that can impact the quality of any given data source, we believe that most of these factors can be understood and evaluated by both experts and more novice researchers.

SurveyMonkey Audience uses various methods to recruit, maintain and incentivize its various member groups of respondents. We believe that our practices encourage responsible data collection that will benefit our customers. We encourage any customers who purchase data in the form of survey respondents to hold their suppliers to a high standard and to make sure they are aware of the various questions and topics addressed in this paper.