
Web Survey Methodology: Interface Design, Sampling and Statistical Inference



XXIV International Statistical Seminar, 2011
Web Survey Methodology: Interface Design, Sampling and Statistical Inference
Mick Couper
EUSTAT - Euskal Estatistika Erakundea / Instituto Vasco de Estadística (Basque Statistics Institute)
Web Survey Methodology: Interface Design, Sampling and Statistical Inference

Mick Couper
University of Michigan, Institute for Social Research
e-mail: MCouper@umich.edu

11/11/2011
Produced and published by: Euskal Estatistika Erakundea / Instituto Vasco de Estadística (EUSTAT)
Donostia - San Sebastián, 1 - 01010 Vitoria-Gasteiz
Administration of the Autonomous Community of the Basque Country
Print run: 100 copies
November 2011
Printing and binding: Composiciones RALI, S.A., Costa, 10-12 - 7ª - 48010 Bilbao
I.S.B.N.: 978-84-7749-468-3
Legal deposit: BI 2993-2011
PRESENTATION

In promoting the International Statistical Seminars, EUSTAT-The Basque Statistics Institute wishes to achieve several aims:

– Encourage collaboration with the universities, especially with their statistics departments.
– Facilitate the continuing professional development of civil servants, university teachers, students and whoever else may be interested in the statistical field.
– Bring to the Basque Country distinguished professors and researchers at the forefront of statistics worldwide, with the resulting positive effect of encouraging direct relationships and the sharing of experience.

As a complementary activity, and in order to reach as many interested people and institutions as possible, it has been decided to publish the papers of these courses, always respecting the original language of the author, to contribute in this way towards the growth of knowledge concerning this subject in our country.

Vitoria-Gasteiz, November 2011

JAVIER FORCADA SAINZ
General Director of EUSTAT
BIOGRAPHICAL SKETCH

Mick P. Couper is a Research Professor in the Survey Research Center at the Institute for Social Research and in the Joint Program in Survey Methodology at the University of Maryland. He has a Ph.D. in sociology from Rhodes University, an M.A. in applied social research from the University of Michigan and an M.Soc.Sc. from the University of Cape Town. He is co-author of Nonresponse in Household Interview Surveys, chief editor of Computer Assisted Survey Information Collection, co-author of Survey Methodology (all published by Wiley), and author of Designing Effective Web Surveys (Cambridge). His current research interests focus on aspects of technology use in surveys, whether by interviewers or respondents. He has conducted extensive research on Web survey design and implementation over the past 10 years.
Index

Introduction

Part 1. Inference in Web Surveys
  1.1 Sampling
  1.2 Coverage
  1.3 Nonresponse
  1.4 Correcting for Selection Biases
  1.5 Online Access Panels
  1.6 Web Surveys as Part of Mixed-Mode Data Collection
  1.7 Summary on Inferential Issues

Part 2. Interface Design
  2.1 Measurement Error
  2.2 Measurement Features of Web Surveys
  2.3 Paging versus Scrolling Design
  2.4 Choice of Response or Input Formats
  2.5 The Design of Input Fields
  2.6 The Use and Design of Grid or Matrix Questions
  2.7 Images in Web Surveys
  2.8 Running Tallies
  2.9 Progress Indicators
  2.10 Summary on Design Issues

Tables & Figures
  Table 1. Types of Web Survey Samples
  Figure 1. World Internet Use over Time
  Figure 2. Participation Rates for Comparable Samples from Same Vendor
  Figure 3. Scrolling Survey Example
  Figure 4. Paging Survey Example
  Figure 5. Example of Question with Template
  Figure 6. Alternative Versions of Date of Birth Question
  Figure 7. Extract of Grid from Couper et al. (2011)
  Figure 8. Low and High Frequency Examples of Eating Out Behavior
  Figure 9. Images from Self-Reported Health Experiments
  Figure 10. Example of a Running Tally
  Figure 11. Example of Complex Running Tally
  Figure 12. Examples of Progress Indicators

References
Introduction

In the last two decades, Web or Internet surveys have had a profound impact on the survey world. The change has been felt most strongly in the market research sector, with many companies switching from telephone surveys or other modes of data collection to online surveys. The academic and public policy/social attitude sectors were a little slower to adopt, being more careful about evaluating the effect of the change on key surveys and trends, and conducting research on how best to design and implement Web surveys. The public sector (i.e., government statistical offices) has been the slowest to embrace Web surveys, in part because the stakes are much higher, both in terms of the precision requirements of the estimates and in terms of the public scrutiny of such data. However, National Statistical Offices (NSOs) are heavily engaged in research and development with regard to Web surveys, most notably as part of a mixed-mode data collection strategy, or in the establishment survey world, where repeated measurement and quick turnaround are the norm. Along with the uneven progress in the adoption of Web surveys have come a number of concerns about the method, particularly with regard to the representational or inferential aspects of Web surveys. At the same time, a great deal of research has been conducted on the measurement side of Web surveys, developing ways to improve the quality of data collected using this medium.

This seminar focuses on these two key elements of Web surveys — inferential issues and measurement issues. Each of these broad areas will be covered in turn in the following sections. The inferential section is largely concerned with methods of sampling for Web surveys, and the associated coverage and nonresponse issues. Different ways in which samples are drawn, using both non-probability and probability-based approaches, are discussed. The assumptions behind the different approaches to inference in Web surveys, the benefits and risks inherent in the different approaches, and the appropriate use of particular approaches to sample selection in Web surveys, are reviewed. The following section then addresses a variety of issues related to the design of Web survey instruments, with a review of the empirical literature and practical recommendations for design to minimize measurement error.
A total survey error framework (see Deming, 1944; Kish, 1965; Groves, 1989) is useful for evaluating the quality or value of a method of data collection such as Web or Internet surveys. In this framework, there are several different sources of error in surveys, and these can be divided into two main groups: errors of non-observation and errors of observation. Errors of non-observation refer to failures to observe or measure eligible members of the population of interest, and include coverage errors, sampling errors, and nonresponse errors. Errors of non-observation are primarily concerned with issues of selection bias. Errors of observation are also called measurement errors (see Biemer et al., 1991; Lessler and Kalsbeek, 1992). Sources of measurement error include the respondent, the instrument, the mode of data collection and (in interviewer-administered surveys) the interviewer. In addition, processing errors can affect all types of surveys. Errors can also be classified according to whether they affect the variance or the bias of survey estimates, both contributing to the overall mean square error (MSE) of a survey statistic. A total survey error perspective aims to minimize mean square error for a set of survey statistics, given a set of resources. Thus, cost and time are also important elements in evaluating the quality of a survey. While Web surveys generally are significantly less expensive than other modes of data collection, and are quicker to conduct, serious concerns have been raised about errors of non-observation or selection bias. On the other hand, there is growing evidence that using Web surveys can improve the quality of the data collected (i.e., reduce measurement errors) relative to other modes, depending on how the instruments are designed.

Given this framework, we first discuss errors of non-observation or selection bias that may raise concerns about the inferential value of Web surveys, particularly those targeted at the general population. Then in the second part we discuss ways that the design of the Web survey instrument can affect measurement errors.
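For reference, the mean square error mentioned above follows the standard textbook decomposition into a variance and a squared bias component; this is a general identity rather than something specific to this seminar, written here for a generic survey estimator \hat{\theta} of a quantity \theta:

    \mathrm{MSE}(\hat{\theta}) \;=\; \mathrm{E}\big[(\hat{\theta}-\theta)^2\big] \;=\; \mathrm{Var}(\hat{\theta}) \;+\; \big[\mathrm{Bias}(\hat{\theta})\big]^2

Minimizing MSE for a fixed budget therefore involves trading off error sources that mainly add variance against those that mainly add bias.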
Part 1: Inference in Web Surveys

Inference in Web surveys involves three key aspects: sampling, coverage, and nonresponse. Sampling methods are not unique to the Web, although identifying a suitable sampling frame presents challenges for Web surveys. I'll address each of these sources of survey error in turn.

1.1 Sampling

The key challenge for sampling for Web surveys is that the mode does not have an associated sampling method. For example, telephone surveys are often based on random-digit dial (RDD) sampling, which generates a sample of telephone numbers without the necessity of a complete frame. But similar strategies are not possible with the Web. While e-mail addresses are relatively fixed (like telephone numbers or street addresses), Internet use is a behavior (rather than a status) that does not require an e-mail address. Thus, the population of "Internet users" is dynamic and difficult to define. Furthermore, the goal is often to make inference to the full population, not just Internet users.

Given this, there are many different ways in which samples can be drawn for Web surveys. These vary in the quality of the inferential claims that they can support. Dismissing all Web surveys as bad, or praising all types of Web surveys as equally good, is too simple a characterization. Web surveys should be evaluated in terms of their "fitness for the [intended] use" of the data they produce (Juran and Gryna, 1980; see also O'Muircheartaigh, 1997). The comparison to other methods should also be explicit. For example, compared to mall intercept surveys, Web surveys may have broader reach and be cheaper and faster. Compared to laboratory experiments among college students, opt-in panels offer larger samples with more diversity. However, compared to face-to-face surveys of the general population, Web surveys may have serious coverage and nonresponse concerns. Further, accuracy or reliability needs to be traded off against cost, speed of implementation, practical feasibility, and so on. Understanding the inferential limits of different approaches to Web survey sample selection can help guide producers on when a Web survey may be appropriate and when not, and guide users in the extent to which they give credibility to the results of such surveys.
In an early paper (Couper, 2000), I identified a number of different ways to recruit respondents for Internet surveys. I broadly classified these into probability-based and non-probability approaches. This dichotomy may be too strong, and one could better think about the methods arrayed along a continuum, with one end represented by surveys based on volunteers with no attempt to correct for any biases associated with self-selection. At the other end of the continuum are surveys based on probability samples of the general population, where those without Internet access (the non-covered population) are provided with access, and high response rates are achieved (reducing the risk of nonresponse bias). In practice, most Web surveys lie somewhere between these two end-points.

Table 1. Types of Web Survey Samples

Non-Probability Samples
0) Polls for entertainment: Polls that make no claims regarding representativeness; respondents are typically volunteers to the Web site hosting the survey.
1) Unrestricted self-selected surveys: Respondents are recruited via open invitations on portals or frequently visited Web sites; these are similar to the entertainment polls, but often make claims of representativeness.
2) Volunteer opt-in or access panels: Respondents take part in many surveys as members of a Web panel; panel members are usually recruited via invitations on popular Web sites.

Probability Samples
3) Intercept surveys: Sample members are randomly or systematically selected visitors to a specific Web site, often recruited via pop-up invitations or other means.
4) List-based samples: Sample members are selected from a list of some well-defined population (e.g., students or staff at a university), with recruitment via e-mail.
5) Web option in mixed-mode surveys: A Web option is offered to the members of a sample selected through traditional methods; initial contact is often through some other medium (e.g., mail).
6) Pre-recruited panels of Internet users: A probability sample, selected and screened to identify Internet users, is recruited to participate in an online panel.
7) Pre-recruited panels of the full population: A probability sample is recruited to take part in a Web panel; those without Internet access are provided with access.
Without going into the details of each type of Web survey, I'll offer a few observations about selected types, and discuss two more recent approaches to selecting or recruiting respondents for Web surveys. First, entertainment polls are not really surveys at all, but are just ways to engage an (online) audience and get feedback. However, they often look like surveys, and have been used by policy-makers as if the data are real. So, they can be viewed by lay persons (as opposed to survey professionals) as real surveys with inferential value. Second, although data to support this contention are scarce, the vast majority of surveys that people are invited to or participate in online are non-probability surveys. This is important because the target population might not be able to distinguish between the different types of surveys, and may treat all such surveys as of equal quality and importance.

A third observation about this typology is that intercept surveys (Type 3 in Table 1) have become increasingly popular in recent years. Almost every online transaction these days (from a purchase to a hotel stay or flight) is followed up by a satisfaction questionnaire asking about the experience. While technically a probability-based approach, the low response rates that are likely in this type of survey (few organizations report such response rates) raise questions about their inferential value. Another increasingly popular version of intercept surveys is so-called "river sampling". This approach has been offered as an alternative to opt-in or access panels, which are suffering from high nonresponse and attrition rates (see Section 1.3). The idea of river sampling is to "intercept" visitors to selected Web sites, ask a few screener questions, and then direct them to appropriate Web surveys, without having them join a panel. In practice, river samples suffer from the same low recruitment rates and self-selection biases as do opt-in panels (see Baker-Prewitt, 2010). In other words, while the approach is technically a probability sample of visitors to a web site, the nonresponse problem may lead to inferential error.

Another approach that is gaining attention recently is respondent-driven sampling (RDS). While RDS was developed as a method to recruit members of rare populations (e.g., drug users, sex workers, the homeless), efforts are being made to apply these methods to Web surveys, given the recruitment difficulties of the medium. If the assumptions of RDS are met (see Heckathorn 1997, 2002; Wejnert and Heckathorn, 2008), it could be viewed as a probability sample. In practice, however, the assumptions are rarely met, and the recruitment process can produce serious biases (see Lee, 2009; Mavletova, 2011; Schonlau and Kapteyn, 2011; Toepoel, 2011). The method relies on initial recruits ("seeds") identifying and recruiting other members of the population of interest to participate in the survey. If the chains are of sufficient length (i.e., each recruit identifies and recruits the same number of additional recruits, and this continues until equilibrium is reached), the method could yield a representative sample of that population. In practice, recruitment is rarely that successful, and the depth and breadth of social ties are not as large as expected, raising questions about the method.

More recently, a lot of attention — especially in market research — has turned to social media as a way to recruit participants for Web surveys (see Poynter, 2010). The argument is made that the use of these media is so widespread that they make ideal recruiting platforms. For example, as of September 2011, Facebook reported having over 750 million subscribers worldwide — it is popular to point out that if it were a country, it would be the third largest country in the world. In practice, however, researchers cannot get access to the frame of registered Facebook users from which to draw a sample. They are therefore forced to use snowball or respondent-driven sampling methods or to post advertisements on Facebook to recruit subjects. Thus far, these
efforts have not proved very successful (see e.g., Toepoel, 2011; Bhutta, 2010). Further, although a very large group, the set of registered Facebook users represents only that population — while of interest in their own right, registered Facebook users do not represent any other known population of interest.

What difference does it make if a sample consists of self-selected volunteers rather than a probability sample from the target population? The key statistical consequence is bias — unadjusted means or proportions from non-probability samples are likely to be biased estimates of the corresponding population means or proportions. The size and direction of the bias depend on two factors — one reflecting the proportion of the population with no chance of inclusion in the sample (for example, people without Web access or people who would never join a Web panel) and one reflecting differences in the inclusion probabilities among the different members of the sample who could in principle complete the survey:

\mathrm{Bias} = E(\bar{y} - \bar{Y}) = P_0 (\bar{Y}_1 - \bar{Y}_0) + \frac{\mathrm{Cov}(p, y)}{\bar{p}}   (1)

where \bar{y} represents a sample statistic (e.g., mean or proportion) based on those who complete the web survey; \bar{Y} represents the corresponding population statistic; P_0, the proportion of the population of interest with no chance at all of participating in the survey (e.g., those without Web access); \bar{Y}_1, the mean among those with a non-zero chance of taking part; \bar{Y}_0, the mean among those with zero probability of taking part; \mathrm{Cov}(p, y), the covariance between the probabilities of inclusion (p) and the survey variable of interest (y) among those with some chance of taking part; and \bar{p}, the mean probability of inclusion among those with a non-zero probability of taking part.

According to the equation, the bias due to the use of samples of volunteers rather than probability samples has two components. The first term on the right-hand side of Equation 1 reflects the impact of the complete omission of some portion of the population of interest; it is the product of the proportion of the target population that is excluded from the sample entirely and the difference between the mean for this group and the mean for the remainder of the population. The second term reflects the impact of differences in the inclusion probabilities (among those with non-zero probabilities); to the extent that these probabilities covary with the survey variable of interest (y), the second bias component will be nonzero. Although Equation 1 applies to the unweighted sample mean, \bar{y}, it provides some useful distinctions for understanding how more complex estimators affect the bias. In non-probability samples, p and \bar{p} are generally unknown or cannot be estimated. Furthermore, in both probability and non-probability samples, \bar{Y} is not known — if it were, there would be little or no need to do the survey. Thus, selection bias cannot be estimated in practice for most survey variables of interest.
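As a purely illustrative sketch of the arithmetic in Equation 1 (not part of the original seminar), the two bias components can be computed directly; every numeric value below (P_0, the group means, the covariance, and the mean inclusion probability) is a hypothetical assumption chosen only to show how the pieces combine:

    # Illustrative calculation of the selection bias in Equation 1.
    # All inputs are hypothetical assumptions, not values from the seminar.

    P0 = 0.30        # proportion of the target population with no chance of inclusion
    Y1 = 0.55        # mean of y among those with a non-zero chance of taking part
    Y0 = 0.40        # mean of y among those with zero chance of taking part
    cov_py = 0.001   # covariance between inclusion probability p and y (covered part)
    p_bar = 0.02     # mean inclusion probability among those with a non-zero chance

    coverage_component = P0 * (Y1 - Y0)     # omission of part of the population
    selection_component = cov_py / p_bar    # unequal inclusion probabilities
    bias = coverage_component + selection_component

    print(f"coverage component:  {coverage_component:.3f}")   # 0.045
    print(f"selection component: {selection_component:.3f}")  # 0.050
    print(f"total bias:          {bias:.3f}")                 # 0.095

The point of the sketch is the structure of the calculation, not the numbers: in practice, as the text notes, the inclusion probabilities and the population mean are unknown, so neither component can actually be estimated from the survey alone.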
1.2 Coverage

If one has access to a sampling frame, the sampling process is itself quite straightforward, using standard sampling methods (e.g., systematic sampling with or without stratification). The big issue is with regard to the frame, particularly the exclusion of certain groups. This is a problem of coverage. There are two factors contributing to coverage bias: the proportion without Internet access and the difference between those with and without access on the variable or statistic of interest. The proportion without Internet access corresponds to P_0 in Equation 1 above; the differences between those with Internet access and those without it correspond to the (\bar{Y}_1 - \bar{Y}_0) term in that equation.

We should first clarify what we mean by access. Unlike early telephone surveys, where a landline telephone was a fixed attribute of a household, Internet access or use can be thought of in many different ways. Some surveys ask if the household has Internet access. Others capture whether the person has access to the Internet, whether at work, home or somewhere else. Still others define Internet access in terms of frequency of use. There are parallels to defining the mobile phone user. Having a mobile phone does not mean one can always be reached on it. Similarly, not all users equipped with smart phones (Web-enabled devices) use that capability. So, Internet access and use are dynamic terms, with implications not only for estimating coverage error, but also for sampling and nonresponse.

Regardless of how it is defined, Internet access appears to be increasing in most countries, although it appears that the rate of increase might be slowing (consistent with the standard S-curve of adoption). The key question is whether the level of Internet penetration will reach 100% and, if not, at what level it will stop. While Internet penetration is higher in some countries than others, it is still not universal. Further, the nature of Internet access and use is rapidly changing, with many new users skipping the standard browser-based approach and instead using Internet-enabled smart phones or other mobile devices (like tablet computers) to communicate. Indeed, the use of e-mail as a communication method (and potentially as a sampling and recruitment method) is rapidly being overtaken by text-messaging (whether SMS, Twitter, or other means), and social media such as Facebook are dominating the use of the Internet. So, the very nature of the Internet is changing as survey researchers try to figure out how best to use the medium for conducting surveys. In similar fashion to cell-phone-only users versus traditional landline telephone users, we cannot assume that the methods we have developed for standard browser-based Web surveys will apply in the same way to identifying and recruiting the new types of Internet users to our surveys.

Rates of Internet use (defined as accessing the Internet from any place at least once a week in the past 3 months) across the 27 countries of the European Union have increased from an average of 36% in 2004 to an average of 65% in 2010 (Eurostat, 2011). There is considerable variation across European countries, with reported usage rates of 90% or over for Iceland and Norway, followed by several countries with rates over 85% (e.g., Denmark, Luxembourg, Netherlands, and Sweden) in 2010. At the other end of the distribution, several countries (e.g., Bulgaria, Greece, Italy, Portugal, and Romania) had Internet usage rates below 50% in 2010. With 58% of adults reporting regular Internet use in 2010, Spain is slightly below the European average.
Figure 1. World Internet Use over Time

Of somewhat more importance than the rate of penetration or use is whether and how those with Internet access differ from those without. This is referred to as the "digital divide" and initially referred to the clear demographic differences between users and non-users that were found in the early days of the Internet (e.g., NTIA, 1998, 1999). While some demographic differences (e.g., gender and race) appear to be disappearing, at least in the US, other differences (especially with regard to age and education) appear to be persisting. For example, data from a March 2011 survey by the Pew Internet and American Life Project (see www.pewinternet.org) shows that while 95% of those 18-29 use the Internet, only 42% of those 65 and older do so; similarly, 94% of those with at least a college degree use the Internet, compared to 42% of those who have not completed high school; 96% of those making $75,000 or more a year are online, while only 63% of those making less than $30,000 a year are.¹

Furthermore, it is not just the demographic differences that are important — it is the differences on all of the key variables of interest in our surveys, controlling for these demographic differences. While the demographic differences can potentially be adjusted for (given unbiased population estimates on these characteristics), it is proving to be much harder to reduce biases on key attitudinal, behavioral, and lifestyle variables.

¹ Note that these estimates may themselves be subject to error, as they come from a telephone survey which is itself subject to coverage, nonresponse, and measurement errors.
The research evidence suggests that the digital divide is not restricted to demographic characteristics, but extends to a wide range of health variables and attitude measures, for example (see Couper et al., 2007; Dever, Rafferty, and Valliant, 2008; Lee, 2006; Schonlau et al., 2009, for further evidence on this point). Those with Internet access seem to differ on a variety of characteristics from those who have not yet gotten online. Adjusting for demographic differences between those online and those not online does not make these other differences disappear. Coverage thus remains a serious concern for inference to the general population. Without alternative modes of data collection or other ways to include the non-Internet population, serious biases are likely.

1.3 Nonresponse

Another source of potential inferential error in Web surveys relates to nonresponse. Even if we could create a sampling frame of the population of interest and invite a sample from the frame to participate in our survey, not everyone will be reached (contacted) and agree to participate. Again, as with coverage, nonresponse bias is a function of the rate of nonresponse and the differences between the respondents and the nonrespondents on the variables of interest. Nonresponse has different implications for probability and non-probability Web surveys. Nonresponse error can be expressed as follows:

\bar{y}_r = \bar{y}_n + \left(\frac{m}{n}\right)(\bar{y}_r - \bar{y}_m) \quad \text{or} \quad \bar{y}_r = \bar{y}_n + \frac{\sigma_{yp}}{\bar{p}}   (2)

where \bar{y}_r is the respondent mean for the statistic of interest, \bar{y}_m is the nonrespondent mean, \bar{y}_n is the mean for the full sample, and m/n is the proportion of the sample that is nonrespondent. Nonresponse error (\bar{y}_r - \bar{y}_n) increases as a function of the nonresponse rate (m/n) and the difference between respondents and nonrespondents (\bar{y}_r - \bar{y}_m). The second expression in Equation 2 is equivalent to the first, where \sigma_{yp} is the covariance between y (the variable of interest) and p (the propensity to respond), and \bar{p} is the average response propensity in the sample, equivalent to the response rate. This expression focuses attention on the association between the propensity to respond and the variable of interest, rather than on the nonresponse rate (see Groves, 2006). In order to estimate nonresponse bias, one needs the value of the survey statistic for both respondents (\bar{y}_r) and nonrespondents (\bar{y}_m), or the covariance between the variable of interest and the response propensity (\sigma_{yp}).

There is relatively little research on nonresponse bias in Web surveys, in part because the population parameters for the variables of interest are rarely known. What little there is has focused primarily on demographic variables or examined relatively homogeneous populations (e.g., college students). Instead, most of the research has focused on response rates in Web surveys. Further, in non-probability surveys, nonresponse error reflects the differences between the survey respondents and the pool of volunteers from which the respondents came (e.g.,
members of an access panel), but the inference of interest is not to the access panel but to the population at large. In that sense, calculating response rates as indicators of nonresponse error makes little sense, and the term is misleading. Callegaro and DiSogra (2008) suggest using "completion rate" for the response to a specific survey sent to members of an opt-in or access panel, while the AAPOR Task Force (2010) recommends using the term "participation rate". I shall use the latter term here.

Two recent meta-analyses have examined response rates to Web surveys relative to comparable modes of data collection. Lozar Manfreda and colleagues (2008) conducted a meta-analysis of 45 experimental mode comparisons between Web and other survey modes (mostly mail), with random assignment to mode. They found that, on average, response rates to the Web surveys were 11 percentage points lower than those in the alternative mode. When the analysis was restricted to the 27 studies where the other mode was mail, the average difference in response rates was 12 percentage points in favor of mail.

Shih and Fan (2008) restricted their meta-analysis to 39 studies directly comparing Web to mail. They found an average unweighted response rate of 34% for Web surveys and 45% for mail surveys, which yielded a weighted difference of 11 percentage points, very close to that obtained by Lozar Manfreda and colleagues. Shih and Fan further examined five different study features in an attempt to account for these differences. The type of population surveyed has a significant effect, accounting for about a quarter of the effect size. The smallest difference between Web and mail response rates (about 5 percentage points) was for college populations, while the largest (about 23 percentage points) was for surveys of professionals.

Both studies found considerable variation in the response rate differences, with response rates for some Web surveys exceeding those of the other mode. But the number of studies is not sufficiently large to tease out the source of these differences, or to identify under what circumstances Web surveys may yield higher response rates than other modes.

Turning to probability-based panels, three examples can be provided. The FFRISP (or "Face-to-Face Recruited Internet Survey Platform"; see Krosnick et al., 2009) panel used an area probability sample and face-to-face recruitment, obtaining a response rate of 51% for the household screener (among eligible households), 90% for the recruitment interview (among screened households), and 40% for enrollment in the panel (among those who completed the recruitment interview), yielding a cumulative recruitment rate of 18% (Sakshaug et al., 2009). Participation rates for the individual surveys sent to panel members will further lower response rates (see below).

The Dutch LISS (Longitudinal Internet Studies for the Social Sciences) panel used an address frame and telephone and face-to-face recruitment. Scherpenzeel and Das (2011) report that in 75% of eligible households a contact person completed the short recruitment interview or answered a subset of central questions. Among these, 84% expressed willingness to participate in the panel, and 76% of those registered for panel membership, yielding a cumulative recruitment rate of 48%.
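As a purely arithmetical aside (not part of the original seminar), the cumulative recruitment rates just cited are simply the product of the stage-level rates reported above. The short Python sketch below reproduces that calculation for the FFRISP and LISS examples; the helper name cumulative_rate is my own:

    # Cumulative recruitment rate as the product of the conditional rates
    # of the successive recruitment stages reported in the text.

    def cumulative_rate(stage_rates):
        """Multiply the conditional completion rates of successive stages."""
        result = 1.0
        for rate in stage_rates:
            result *= rate
        return result

    # FFRISP: household screener, recruitment interview, panel enrollment.
    ffrisp = cumulative_rate([0.51, 0.90, 0.40])
    # LISS: recruitment interview, stated willingness, registration.
    liss = cumulative_rate([0.75, 0.84, 0.76])

    print(f"FFRISP cumulative recruitment rate: {ffrisp:.1%}")  # about 18%
    print(f"LISS cumulative recruitment rate:   {liss:.1%}")    # about 48%

Each additional stage multiplies the loss, which is why nonresponse to the individual surveys sent to panelists compounds the problem, as discussed below.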
The Knowledge Networks (KN) Panel used RDD telephone methods for recruitment until 2009, when it switched to a mix of RDD and address-based sampling (ABS). Using a specific example from 2006, Callegaro and DiSogra (2008) report a mean household recruitment rate of 33%, and a household profile rate (panelists who completed the set of profile questionnaires after joining the panel) of 57%, yielding a cumulative recruitment rate of 18.5%.

In all these cases, the panels also suffer from attrition over the life of the panel, along with nonresponse to specific surveys sent to panelists. For example, Callegaro and DiSogra (2008) report an 85% response rate to one survey in the KN panel. Scherpenzeel and Das (2011) report response rates in the 60-70% range for individual surveys sent to LISS panel members. These examples show the challenges of recruiting and retaining panel members using probability-based methods. The response rates and nonresponse bias at the recruitment stage may be similar to those for other modes of data collection. But, given that this is followed by additional sample loss following recruitment, the nonresponse problem is compounded. However, once panelists have completed the screening interview or profile survey, additional information is available to assess (and potentially reduce) nonresponse bias at the individual survey level and attrition across the life of the panel.

In summary, response rates across all types of Web surveys appear to be lower than for other modes and — as is true of all modes of data collection — appear to be declining. One hypothesis for the lower response rates to online surveys relative to other modes of data collection may be that Web surveys are still relatively new, and methods for optimizing response rates are still under development. I turn next to a discussion of strategies to increase response and participation rates in Web surveys. There is a growing body of research on ways to increase response rates in Web surveys. Again, I offer a brief review of some key findings here. For more research on nonresponse in Web surveys, the interested reader is directed to www.websm.org, which has an extensive bibliography on Web survey methods.

One factor affecting response rates is the number and type of contact attempts. Both the Lozar Manfreda et al. (2008) and Shih and Fan (2008) meta-analyses find significant effects of the number of contacts on the differences in response rates between Web surveys and other modes. When the number of contacts was small in each mode, the response rate differences were smaller than when a large number of contacts were used, suggesting that additional contact attempts may be of greater benefit in other modes than they are in Web surveys. There is evidence that, while e-mail reminders are virtually costless and additional e-mail reminders continue to bring in more respondents (see, e.g., Muñoz-Leiva et al., 2010), there are diminishing returns, with each additional contact yielding fewer additional respondents. It also suggests that the value of an e-mail contact to a respondent may not be as great as, say, a mail contact.

A related factor that has received research attention is that of prenotification. A prenotice is a contact prior to the actual survey invitation, informing sample members of the upcoming request. Prenotification may be thought of as another contact, much like reminders. The research evidence suggests that the mode of prenotification may be more important than the additional contact it represents. Several types of prenotification have been studied in addition to e-mail, including letters (Crawford et al., 2004; Harmon, Westin, and Levin, 2005), postcards (Kaplowitz, Hadlock, and Levine, 2004; Kaplowitz et al., in press), and SMS (Bosnjak et al., 2008). The findings
suggest that an e-mail prenotice may not offer many advantages over no prenotice, but a prenotice in another mode (letter, postcard, or SMS) may be effective in increasing Web survey response rates.

Another well-studied topic in Web surveys relates to incentives. Much of this work is summarized in a meta-analysis by Göritz (2006a; see also Göritz, 2010). Across 32 experimental studies, she found that incentives significantly increased the proportion of invitees starting the survey (odds ratio = 1.19; 95% confidence interval: 1.13-1.25). The general finding in the survey literature is that prepaid incentives are more effective than promised or conditional ones, and cash incentives are more effective than alternatives such as in-kind incentives, prize draws or lotteries, loyalty points, and the like.

Despite this research evidence, lotteries or loyalty-point incentives, conditional on completion, are popular in Web surveys, especially among opt-in or access panels. A key reason for this is that it is not possible to deliver prepaid cash incentives electronically. To do so by mail is expensive, and requires a mailing address. If the response rate is likely to be very low (as we have seen above), the increase in response may not justify the investment in a prepaid incentive (but see Alexander et al., 2008). Further, the cost of lotteries is usually capped, i.e., a fixed amount of money is allocated for the prizes regardless of the number of participants. This makes it easier for panel vendors to manage costs.

Given the popularity of lotteries or loyalty points among vendors, are they effective in encouraging response from sample persons? Göritz (2006a) found that lottery incentives produce higher response rates than no incentives in her meta-analysis of 27 experimental studies involving lotteries, most based on commercial panels. In her meta-analysis of 6 incentive experiments in a non-profit (academic) panel, she found no significant benefit of a cash lottery (OR = 1.03) over offering no incentive (Göritz, 2006b). Thus, lotteries may be better than no incentive for some types of samples, but it is not clear whether they are more effective than alternative incentive strategies.

Bosnjak and Tuten (2003) tested four incentive types in a survey among 1,332 real estate agents and brokers. A $2 prepaid incentive via PayPal achieved a 14.3% response rate, while a $2 promised incentive via PayPal obtained 15.9%, a prize draw after completion obtained 23.4%, and a control group with no incentive obtained 12.9%. One explanation for the relative success of the prize draw is that cash was not used for the prepaid or promised incentives — for the PayPal incentive to be of value, one must have a PayPal account and have an expectation of additional money added to that account.

Birnholtz and colleagues (2004) conducted an experiment among earthquake engineering faculty and students. A mailed invitation with a $5 prepaid cash incentive obtained a response rate of 56.9%, followed by a mailed invitation with a $5 Amazon.com gift certificate (40.5% response rate) and an e-mailed invitation with a $5 Amazon.com e-certificate (32.4% response rate). This study suggests that cash outperforms a gift certificate (consistent with the general incentives literature), and also points to the potential advantage of mail over e-mail invitations.

Alexander and colleagues (2008) conducted an incentive experiment for recruitment to an online health intervention. They tested a variety of different incentives in mailed invitations to potential
participants. Further, they found that a small prepaid incentive ($2) was cost-effective relative to larger promised incentives, even with enrollment rates in the single digits.

This brief review suggests that incentives seem to work for Web surveys in similar fashion to other modes of data collection, and for the same reasons. While it is impractical for access panels to send mail invitations with prepaid incentives when they are sending tens of thousands of invitations a day, the combination of an advance letter containing a small prepaid cash incentive, along with an e-mail invitation, may be most effective for list-based samples.

Again, there isn't (as yet) as much research on nonresponse in Web surveys as there has been in other modes of data collection. It may be that, because non-probability surveys dominate the Web survey world, nonresponse is of less concern. The market research world is focused on respondent engagement, which is more concerned with keeping respondents engaged in the survey once started (i.e., preventing breakoffs) than with getting them to start in the first place.

1.4 Correcting for Selection Biases

There are a number of different ways researchers attempt to correct for selection biases, both for probability-based and non-probability online surveys. In probability-based surveys, separate corrections can sometimes be made for coverage and nonresponse error, using different auxiliary variables. In non-probability surveys, this is often done in a single step, attempting also to correct for selection error, i.e., differences between the survey population (Internet users) and those who are recruited into the panel and selected to take part in the specific survey.

There are four key approaches to correcting for selection biases (see Kalton and Flores-Cervantes, 2003). These include:

1) Poststratification or weighting class adjustments
2) Raking or rim weighting
3) Generalized regression (GREG) modeling
4) Propensity score adjustment (PSA)

Several of these methods are closely related to each other. Both GREG and raking are special cases of calibration weighting. Post-stratification, in turn, is a special case of GREG weighting. All of the methods involve adjusting the weights assigned to the survey participants to make the sample line up more closely with population figures. I will not review these methods in detail, but rather provide brief commentary on the underlying assumptions and the challenges faced by non-probability surveys.

The first method that has been used to adjust for the sampling and coverage problems in Web surveys is known variously as ratio adjustment, post-stratification, or cell weighting. The procedure is quite simple — the weight for each respondent (typically, the inverse of the case's selection probability) in a weighting cell (or post-stratum) is multiplied by an adjustment factor:
w_{2ij} = \frac{N_i}{\sum_j w_{1ij}} \, w_{1ij}   (3)

in which w_{2ij} is the adjusted or post-stratified weight, w_{1ij} is the unadjusted weight, and the adjustment factor is the ratio between the population total for cell i (N_i) and the sum of the unadjusted weights for the respondents in that cell. For many Web surveys, the initial weights are all one, reflecting equal probabilities of selection. After adjustment, the weighted sample totals for each cell exactly match the population totals.

Post-stratification will eliminate the bias due to selection or coverage problems, provided that, within each adjustment cell, the probability that each case completes the survey is unrelated to that case's value on the survey variable of interest. This condition is sometimes referred to as the missing at random (MAR) assumption (Little and Rubin, 2002). In terms of Equation 1, a post-stratification adjustment will eliminate the bias if the within-cell covariance between the participation probabilities (p) and the survey variables (y) goes to zero:

\mathrm{Cov}(p, y \mid X) = 0

where X is the vector of categorical variables that are cross-classified to form the adjustment cells. This condition of zero covariance can be met in several ways: the participation probabilities can be identical within each cell; the values of the survey variable can be identical within each cell; or values for the two can vary independently within the cells. As a practical matter, post-stratification will reduce the magnitude of the bias whenever the absolute value of the within-cell covariance term is less than that of the overall covariance term:

\left| \mathrm{Cov}(p, y \mid X) \right| < \left| \mathrm{Cov}(p, y) \right|   (4)

Most survey statisticians use post-stratification in the belief that the inequality in Equation 4 holds, not that the bias disappears entirely.

Raking (or rim weighting) also adjusts the sample weights so that sample totals line up with external population figures, but the adjustment aligns the sample to the marginal totals for the auxiliary variables, not to the cell totals. Raking is preferred when population figures may not be available for every adjustment cell formed by crossing the auxiliary variables; when there may be very few participants in a given cell, so that the adjustment factors become extreme and highly variable across cells; or when the researchers want to incorporate a large number of variables in the weighting scheme, too many for a cell-by-cell adjustment to be practical. Raking is carried out using iterative proportional fitting. Raking reduces or eliminates bias under the same conditions as post-stratification — that is, when the covariance between the probability of participation and the survey variable is reduced after the auxiliary variables are taken into account — but assumes a more stringent model, in which the interactions between the auxiliary variables can be ignored or bring only small additional reductions in bias.

Generalized regression (GREG) weighting is an alternative method of benchmarking sample estimates to the corresponding population figures. This approach assumes a "linear relationship between an analysis variable y and a set of covariates" (Dever, Rafferty, and Valliant, 2008). As with post-stratification and raking, GREG weighting eliminates the bias when the covariates remove any relationship between the likelihood of a respondent completing the survey and the survey variables of interest.
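To make the mechanics of Equation 3 concrete, here is a minimal Python sketch of cell weighting using two hypothetical adjustment cells with made-up population totals; it only illustrates the arithmetic and is not code or data from the seminar:

    # Post-stratification (Equation 3): scale the weights in each cell so that
    # their weighted sum matches the known population total for that cell.
    # Cell names and totals are hypothetical.

    respondent_weights = {
        "age_18_34": [1.0, 1.0, 1.0, 1.0],   # unadjusted weights, sum = 4
        "age_35_plus": [1.0, 1.0],           # unadjusted weights, sum = 2
    }
    population_totals = {"age_18_34": 400, "age_35_plus": 600}

    adjusted_weights = {}
    for cell, weights in respondent_weights.items():
        factor = population_totals[cell] / sum(weights)   # N_i / sum of w_1 in cell i
        adjusted_weights[cell] = [w * factor for w in weights]

    # The weighted sample total in each cell now equals the population total:
    for cell, weights in adjusted_weights.items():
        print(cell, sum(weights))   # age_18_34 -> 400.0, age_35_plus -> 600.0

Note how thinly populated cells receive very large adjustment factors, which is exactly the situation in which raking over marginal totals is preferred to cell-by-cell adjustment.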
Another popular adjustment method — especially in non-probability settings — is propensity score adjustment (PSA) or propensity weighting. A number of papers have examined the use of propensity score adjustment to improve web survey estimates by reducing biases due to non-coverage or selection or both (Berrens et al., 2003; Dever, Rafferty, and Valliant, 2008; Lee, 2006; Lee and Valliant, 2009; Schonlau, van Soest, and Kapteyn, 2007; Schonlau et al., 2004; and Schonlau et al., 2009). A propensity score is the predicted probability that a case will end up in one group rather than another — for example, the probability that someone will be among those that have Internet access (versus not having access). The technique was originally introduced as a way of coping with confounds in observational studies between cases who got a given treatment and similar cases who did not (Rosenbaum and Rubin, 1984). Such confounds are likely to arise whenever there is non-random assignment of cases to groups, as in non-experimental studies. Propensity score adjustment simultaneously corrects for the effects of multiple confounding variables on which the members of the two groups differ.

With Web surveys, the two groups are typically defined as the respondents to a Web survey (for example, the Web panel members who completed a specific Web questionnaire) and the respondents to a reference survey (for example, the respondents to an RDD survey conducted in parallel with the Web survey). The reference survey is assumed to have little or no coverage or selection bias, so that it provides a useful benchmark to which the Web survey results can be adjusted (see Lee and Valliant, 2008, for a useful discussion of propensity weighting).

The first step in propensity weighting is to fit a model predicting the probability of membership in one of the groups. The usual procedure is to fit a logistic regression model:

\log\left(\frac{p(x)}{1 - p(x)}\right) = \alpha + \sum_j \beta_j x_j   (5)

in which p(x) is the probability that the case will be in the group of interest (e.g., will complete the web survey), the x's are the covariates, \alpha is an intercept term, and the \beta's are logistic regression coefficients. Next, cases are grouped (typically into quintiles) based on their predicted propensities, that is, their value for \hat{p}(x). Finally, the existing weight (if any) for the case is adjusted by dividing by the predicted propensity of the case:

w_{2i} = \frac{w_{1i}}{\hat{p}_i(x)}   (6)

If the cases have been grouped into propensity strata, then the mean (or harmonic mean) of the propensities in the stratum would be used in place of \hat{p}_i(x) in the denominator of Equation 6. As Lee and Valliant (2008) point out, propensity adjustments work best when the logistic regression model includes predictors that are related both to the propensities and to the substantive variables (Little and Vartivarian, 2004, make the same point about post-stratification adjustments). Simulations by Lee and Valliant (2009) show that even when the reference sample completely covers the target population, propensity adjustments alone do not completely remove the coverage bias (see also Bethlehem, 2010).

While there are some variations in the variables used and in how the models are fit, all adjustment methods rely on some key assumptions. Key among these is the MAR assumption, that is, within the cells formed by cross-classifying the covariates (in the case of poststratification) or
While there are some variations in the variables used and in how the models are fit, all adjustment methods rely on some key assumptions. Key among these is the MAR assumption: that is, within the cells formed by cross-classifying the covariates (in the case of post-stratification) or conditional on the auxiliary variables included in the model (in the case of GREG and PSA), there is no relationship between the probability that a given case will be in the respondent pool (i.e., is covered, selected, and responds) and that case's value on the survey variable y. Clearly, the same adjustment may eliminate this bias for estimates based on some survey variables but not for those based on others.

Propensity scoring goes further in that it assumes that all the information in the covariates is captured by the propensity score. This condition is often referred to as strong ignorability. For the bias to be eliminated by a propensity weighting model, then, conditional on the fitted propensities, a) the distribution of values of the survey variable must be unrelated to which group the case came from (for example, the pool of Web respondents versus the pool of respondents to the calibration survey) and b) the survey outcomes must be unrelated to the covariates. These conditions imply that

Cov(p, y | p̂(x)) = 0.

A further drawback of PSA is that the variables used in the model must be measured in both the Web survey sample and the calibration sample. I return to this issue below in a discussion of opt-in panels.

How effective are the adjustments at removing the bias? Regardless of which method of adjustment is used, the following general conclusions can be reached:

1) The adjustments remove only part of the bias.
2) The adjustments sometimes increase the biases relative to unadjusted estimates.
3) The relative biases that are left after adjustment are often substantial.
4) There are large differences across variables, with the adjustments sometimes removing the biases and other times making them much worse.

Overall, then, the adjustments seem to be useful but fallible corrections for the coverage and selection biases inherent in Web samples, offering only a partial remedy for these problems.

Most of the focus on adjustment methods has been on the reduction of bias. When a relatively small reference survey (for example, a parallel RDD survey) is used to adjust the estimates from a large Web survey, the variance of the estimates is likely to be sharply increased (Bethlehem, 2010; Lee, 2006). This variance inflation is not just the byproduct of the increased variability of the weights, but reflects the inherent instability of the estimates from the reference survey.

The general conclusion is that when the Web survey is based on a probability sample, nonresponse bias and, to a lesser extent, coverage bias, can be reduced through judicious use of postsurvey adjustment using appropriate auxiliary variables. However, when the estimates are based on a set of self-selected respondents, where the selection mechanism is unknown and unlikely to be captured by a set of key demographic variables, the adjustments are likely to be more risky.
1.5 Online Access Panels

With the above discussion on issues of representation in mind, let's focus a little more attention on opt-in or volunteer access panels. These have been enormously popular in North America and Europe over the last decade, with scores of different panels competing for market share and for panelists in each country. The promise that these panels offer is a ready pool of potential respondents, many of whom have been pre-screened on key characteristics. For researchers who need a large number of respondents quickly and cheaply, but are less concerned about inference, these panels have provided a very valuable service. However, in recent years there have been increasing concerns about the quality of these panels. These concerns have been manifested in several different ways.

First, there is growing evidence of over-saturation of these panels, with the demand (both the number of surveys and the number of respondents per survey) outstripping supply (the number of panelists who complete surveys). This can be seen in the declining participation rates of panelists, and in the increasing number of invitations panelists receive. Data on this issue are not made available by the panel vendors, so the problem is hard to assess. But we have been using the same vendor for our experiments on Web survey design (see Part 2) for several years. The participation rates (the proportion of invited panelists who complete a particular survey) have been steadily declining, as seen in Figure 2.

Figure 2. Participation Rates for Comparable Samples from the Same Vendor

To provide one concrete example, for our survey experiment conducted in July 2010, over 138,000 panelists were invited to obtain a set of 1,200 respondents. This represents a significant fraction of the panel. These participation rates vary widely across different vendors, in part
because of different practices in maintaining the panel, especially with regard to inactive members. In a 2005 report, ComScore Networks claimed that 30% of all online surveys were completed by less than 0.25% of the population, and that these panelists completed an average of 80 surveys in 90 days. This estimate is likely to be high, as the source of the data was itself a volunteer panel in which members had agreed to have their Internet activities monitored. However, a 2006 study among 19 online panels in the Netherlands (Vonk, Willems, and van Ossenbruggen, 2006) found that 62% of respondents reported belonging to more than one panel, with the average being 2.73 panels per respondent. A small group (3% of respondents) reported belonging to 10 or more panels.

Another piece of evidence related to over-saturation comes from a US panel I've been a member of for several years. In response to ESOMAR's (2008) 26 questions, the vendor claimed an average response (participation) rate of 35%-40% (contrast this with the rates in Figure 2). The vendor also stated that panelists are contacted 3-4 times a month to participate in surveys. However, over the past 5 years I have received an average of 43.3 unique invitations (excluding reminders) per month, ranging from an average of 30 per month in 2007 to an average of 63 per month in 2010.

Related to the issue of over-saturation is the rising concern among panel vendors about "professional respondents" — those who do so many surveys that they may not be paying attention to the questions, instead speeding through the survey to get the rewards (incentives or points) for participation. One estimate is that about 7% of respondents are "deliberately doing a poor job" (Giacobbe and Pettit, 2010). This is manifested in problems such as over-qualifying (e.g., saying "yes" to all screener questions to qualify for a survey), duplication or "hyperactives" (e.g., belonging to the same panel under different guises, or belonging to multiple panels), speeding (answering too fast to have read the question), or inattention (e.g., straightlining in grids, inconsistency across repeated questions, failing a specific instruction to select a response). In one study reported by Downes-Le Guin (2005), 13% of respondents claimed to own a Segway and 34% failed a request to check a specific response (second from the left) in a grid. Given this, panel vendors are developing increasingly sophisticated methods of identifying and dealing with such professional respondents (e.g., Cardador, 2010). However, recent research done by the industry itself (see the ARF's Foundations of Quality Study; http://www.thearf.org/assets/orqc-initiative) suggests that the problem might not be as bad as claimed. Nonetheless, the issue continues to raise concern for users of online panels as well as for vendors.
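As a rough illustration of the kinds of checks vendors and researchers apply, the sketch below flags speeders and straightliners in a small rating grid. The thresholds used here (under 30 seconds for the whole questionnaire, identical answers across every grid item) are arbitrary examples, not industry standards, and real screening rules are usually more elaborate.

```python
import pandas as pd

# Toy response file: completion time in seconds and answers to a five-item rating grid.
resp = pd.DataFrame({
    "resp_id": [1, 2, 3, 4],
    "seconds": [412, 25, 388, 301],
    "q1": [4, 3, 5, 2], "q2": [4, 3, 1, 2], "q3": [3, 3, 2, 2],
    "q4": [5, 3, 4, 2], "q5": [4, 3, 2, 2],
})
grid = resp[["q1", "q2", "q3", "q4", "q5"]]

# Speeding: completed faster than a (hypothetical) minimum plausible time.
resp["speeder"] = resp["seconds"] < 30

# Straightlining: no variation at all across the grid items.
resp["straightliner"] = grid.nunique(axis=1) == 1

resp["flagged"] = resp["speeder"] | resp["straightliner"]
print(resp[["resp_id", "speeder", "straightliner", "flagged"]])
```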
In the last few years, several large buyers of online market research have raised questions about the replicability or reliability of the results from opt-in panels. For example, in 2005, Jeff Hunter, Consumer Insights Director at General Mills, delivered a keynote address at the ESOMAR Worldwide Panel Research Conference in which he described a concept test where the same survey was administered to different samples from the same panel and produced substantially different results on whether to launch the product. Similarly, Kim Dedeker, VP of Global Consumer and Market Knowledge at Procter & Gamble (one of the largest buyers of market research in the world), gave a talk in which she described situations where online concept tests identified a strong concept. A large investment was then made to launch the products, but later concept tests got disappointing results. She noted that "Online research … is the primary driver
behind the lack of representation in online testing. Two of the biggest issues are the samples do not accurately represent the market, and professional respondents."

In response to these rising concerns, the Advertising Research Foundation (ARF) launched its own study in cooperation with panel vendors. The Foundations of Quality (FOQ) project was designed to address the following key questions: 1) Why do online results vary? 2) How much panel overlap is there, and how much does this affect results? 3) What are the effects of "bad" survey-taking behavior? Part of the design involved a 15-minute online survey administered to members of 17 different US panels. The full report is available at a high price, but ARF press releases claim that the problems of panel overlap are not as bad as others have argued.

The American Association for Public Opinion Research (AAPOR) created a task force to review online panels (see AAPOR, 2010). The task force made several recommendations regarding such panels, some of which are as follows:

1) Researchers should avoid nonprobability online panels when a key research objective is to accurately estimate population values … claims of "representativeness" should be avoided when using these sample sources.
2) There are times when a nonprobability online panel is an appropriate choice.
3) There are significant differences in the composition and practices of individual panels that can affect survey results.
4) Panel vendors must disclose their methods.

Despite their inferential limitations, opt-in panels have a number of uses. They can provide data quickly and cheaply. They are useful for identifying and surveying a set of subjects with known characteristics or behaviors, based on extensive screening data that are often available. Some other examples of the uses of such panels include: 1) pretesting of survey instruments, 2) testing new concepts, theories, or measurement, 3) methodological or substantive experiments (where volunteer bias is not a concern), 4) trend analysis (assuming a stable panel population), and possibly 5) correlational analysis (although selection bias is still a concern). It is recommended that opt-in or access panels not be used as the sole source of data, but that they be used in combination with other methods.

While many online panel vendors make claims of comparability to national estimates, there are only a few independent studies examining this issue. For example, Yeager et al. (2009) compared an RDD telephone survey with a probability-based Internet survey and 7 non-probability Internet surveys (6 access panels and 1 river sample) in 2004, with an average sample size of around 1,200 respondents. They compared survey estimates to known benchmarks from large federal surveys in the US. They found that the probability sample surveys done by telephone or Web were consistently highly accurate across a set of demographic and non-demographic variables, especially after post-stratification with primary demographics. Further, non-probability sample surveys done via the Internet were always less accurate, on average, than the probability sample surveys, and were less consistent in their level of accuracy. Finally, post-stratification using demographic variables sometimes improved the accuracy of non-probability sample surveys and sometimes reduced their accuracy.
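A common way to summarize accuracy in benchmarking studies of this kind is the average absolute error of the survey estimates against the benchmark values, computed before and after weighting. The figures below are invented for illustration only and are not the Yeager et al. results.

```python
import numpy as np

# Hypothetical benchmark values (e.g., from large federal surveys) and survey estimates (%).
benchmark      = np.array([12.0, 48.0, 27.0, 66.0])
unweighted_est = np.array([15.5, 41.0, 30.0, 70.5])
weighted_est   = np.array([13.0, 44.5, 28.5, 68.0])

print("mean absolute error, unweighted:", np.mean(np.abs(unweighted_est - benchmark)))
print("mean absolute error, weighted:  ", np.mean(np.abs(weighted_est - benchmark)))
```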
In a later analysis using other items from the 7 non-probability panels in the same study, Yeager et al. (2010) reported large differences in levels of reported consumption of a variety of consumer products (e.g., 43% to 74% reporting consumption of Coke) across the non-probability panels, but the rank orders of reported consumption were relatively stable. Generally, the associations between variables were consistent across panels, suggesting less volatility than others have suggested. However, some key conclusions did not replicate across panels, and sometimes the differences were very large (e.g., up to 45 percentage point differences between pairs of surveys on selected consumer categories). Furthermore, it is difficult to predict when these large differences will occur — that is, it is not always the same pair of panels that produces differences of the same magnitude or direction.

In their analysis of the aggregated results from 19 different online panels in the Netherlands, Vonk, Willems, and van Ossenbruggen (2006) found several large differences compared to official data from Statistics Netherlands (CBS). For example, 77% of panelists reported not belonging to a church (compared to 64% from CBS), 29% reported supporting the CDA party (compared to 16% from CBS), and 2% of panelists were identified as foreigners living in large cities (compared to the official estimate of 30%).

Self-selection may not only affect point estimates, but also correlations (and, by extension, coefficients from regression and other models). For example, Faas and Schoen (2006) compared three different surveys in Germany prior to the 2002 Federal election: a face-to-face survey, an online access panel, and an open-access Web survey. They concluded that "…open online surveys do not yield results representative for online users (either in terms of marginal distributions or in terms of associations)" (Faas and Schoen, 2006, p. 187). They further noted that weighting adjustments did not help to reduce the bias in the online polls.

Loosveldt and Sonck (2008) compared data from a Belgian access panel to data from the European Social Survey (ESS). They compared unweighted, demographically weighted, and propensity-weighted estimates from the panel. They found significant differences in responses on several different themes, including political attitudes, work satisfaction, and attitudes towards immigrants. They also found that post-stratification adjustment based on demographics had no substantial impact on the bias in the estimates. Further, propensity score adjustment had only a minimal effect, with some differences becoming larger rather than smaller.

How do opt-in or access panels deal with the inferential issues? Many panel vendors provide the data "as is," without any attempt at adjustment, leaving the user to draw inferences about the representativeness of the data. A few use some form of post-stratification or raking adjustment to match the set of panel respondents to the broader population on key demographic variables. The use of propensity score adjustment or similar strategies (e.g., matching) is rare. One panel provider in the US, Harris Interactive, has promoted the use of PSA for general population inference.

The Harris Interactive approach to PSA is as follows (see Terhanian et al., 2000, 2001):

• Ask a small set of "Webographic" questions in all online panel surveys.
• Ask the same set of questions in occasional RDD telephone surveys.
• Use these common variables to predict the likelihood of being in the Web or RDD group, using a logistic regression model.
• Use the predicted values (propensities) to adjust the Web responses either directly (using propensity scores) or indirectly (by creating weighting classes based on the scores). Typically, respondents to both surveys are sorted into 5 bins (quintiles) based on propensity scores.
• Assign weights such that the Web survey's (weighted) proportion of respondents in each bin matches the reference (telephone) survey's proportion.

Several key assumptions need to be met for this approach to be successful at eliminating selection bias. The first is that the questions asked in both surveys capture the full range of differences in selection into the two samples (i.e., the selection mechanism is MAR or ignorable conditional on these variables). While Harris Interactive does not disclose the items used, examples of "Webographic" questions have included the frequency of watching the news on TV, frequency of vigorous physical activity, ownership of a non-retirement investment account, and whether a variety of items are considered invasions of privacy. A second assumption is that there is no measurement error, i.e., that the same answers would be obtained to these questions regardless of the mode (telephone or Web) in which they are asked. Third, in using the telephone survey as a population benchmark, the PSA ignores selection bias in the telephone survey. With response rates to RDD surveys as low as 20%, and concerns about coverage of cell-phone-only households, this is a serious concern.

In addition, as Bethlehem (2010) has noted, the variance of the resultant estimator should take into account the fact that the RDD benchmark survey is itself subject to a high level of variation, depending on its sample size. Some users of PSA treat the benchmark estimates as population values (in terms of both bias and variance), ignoring this uncertainty. According to Bethlehem (2010), the variance of the post-stratification estimator for an Internet (I) sample weighted using a reference sample (RS) is:

V(ȳ_I) = (1/m) ∑_{h=1}^{L} W_h (Ȳ_{I(h)} − Ȳ)² + (1/m) ∑_{h=1}^{L} W_h (1 − W_h) V(ȳ_{I(h)}) + ∑_{h=1}^{L} W_h² V(ȳ_{I(h)})     (7)

where ȳ_{I(h)} is the Web survey estimate for the mean of stratum h, and m_h/m is the relative sample size in stratum h for the reference sample. Thus, the first term in Equation 7 will be of the order 1/m, the second term of order 1/mn, and the third of order 1/n, where n is the Web sample size and m the reference sample size. As Bethlehem (2010) notes, n will generally be much larger than m in most situations, so the first term in the variance will dominate; that is, the small size of the reference survey will have a big influence on the reliability of the estimates. While Duffy et al. (2005) and Börsch-Supan et al. (2004) both acknowledge that the design effects of propensity score weighting will significantly reduce the effective sample size, this issue appears to have been largely ignored by those using PSA in practice.
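A quick numerical check of Equation 7 illustrates Bethlehem's point. The stratum weights, means, and element variances below are invented purely to show the orders of magnitude, assuming a large Web sample (n), a much smaller reference sample (m), and roughly proportional allocation across strata.

```python
import numpy as np

# Hypothetical inputs: 5 strata, Web sample n = 50,000, reference sample m = 500.
n, m = 50_000, 500
W = np.array([0.25, 0.25, 0.20, 0.15, 0.15])       # stratum weights (sum to 1)
ybar_h = np.array([0.30, 0.45, 0.55, 0.60, 0.70])  # stratum means of a 0/1 survey variable
ybar = np.sum(W * ybar_h)                          # overall mean
var_h = ybar_h * (1 - ybar_h) / (n * W)            # approx. variance of each stratum mean

term1 = np.sum(W * (ybar_h - ybar) ** 2) / m       # order 1/m: reference-sample noise
term2 = np.sum(W * (1 - W) * var_h) / m            # order 1/(mn)
term3 = np.sum(W ** 2 * var_h)                     # order 1/n: Web-sample noise

print(f"term1 (1/m):  {term1:.2e}")
print(f"term2 (1/mn): {term2:.2e}")
print(f"term3 (1/n):  {term3:.2e}")   # term1 dominates when n >> m
```

With these invented numbers the reference-sample term is several times larger than the Web-sample term, even though the Web sample is a hundred times bigger, which is exactly the effective-sample-size penalty noted above.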
Another approach to the inferential challenges of volunteer online panels has been to use sample matching techniques, the approach advocated by YouGov Polimetrix (see Rivers, 2006, 2007; Rivers and Bailey, 2009). Here, a target sample is selected from the sampling frame representing the population to which one wants to make inference. However, instead of attempting to interview that sample, a matched sample is selected from a pool of available respondents (e.g., from an online panel) and those are interviewed.
As with post-survey adjustment approaches, the success of this method in eliminating or reducing bias relies on the matching variables used (i.e., it rests on an MAR assumption, conditional on the matching variables). Given that the data available for the target population often consist only of demographic variables, the model assumes that controlling for such demographic differences will eliminate bias on other variables measured in the survey. As shown earlier, such assumptions do not eliminate bias in all circumstances. While model-assisted sampling (see Särndal, Swensson, and Wretman, 1992) is gaining in popularity, and all adjustment procedures rely on model assumptions (see Section 1.4), fully model-based approaches require a greater reliance on the model to be effective in reducing bias. To the extent that the model does not accurately reflect the effect of the selection process on the variables or statistics of interest, procedures like sample matching and propensity score adjustment are likely to have varying success in minimizing or eliminating bias. Design-based approaches (like those based on traditional sampling theory) can protect against failures in the model assumptions.

In general, most opt-in or access panels appear to be focusing more on the measurement concerns than on the inferential concerns. Attention is focused on reducing duplicate or fraudulent respondents, identifying and removing professional or inattentive respondents, and designing surveys to increase respondent engagement. It seems clear that the inferential issues are being largely ignored. This may be a result of the opt-in panels turning away from early attempts to make population projections, such as with regard to pre-election polls (e.g., Taylor et al., 2001), and focusing more on market research applications, where the pressure to produce accurate estimates may be less strong, and where prediction failures receive less media attention.

The general recommendation is that when using opt-in or access panels, one should avoid making inferential claims beyond what can be supported by the data. While there may be some situations where the estimates from such panels appear to be reliable (e.g., in pre-election polls), this cannot be generalized to all situations. In other words, while these panels have a wide variety of uses, broad population representation on a wide range of topics is likely not one of them.

1.6 Web Surveys as Part of Mixed-Mode Data Collection

Given the inferential challenges facing Web surveys discussed above, National Statistical Offices (NSOs) and researchers concerned with broad population representation are increasingly turning to mixed-mode surveys involving Web data collection in combination with other modes. The hope is that by combining modes, the weaknesses of one mode (e.g., the coverage concerns and lack of a sampling frame for Web surveys) can be compensated for by using other modes.

The combination of Web surveys with mail surveys has received the most attention in recent years. These two modes share similar measurement error properties, and mail is a logical method for inviting people to Web surveys. There are two main approaches to this mode combination: concurrent mixed-mode designs and sequential mixed-mode designs. Concurrent designs send a paper questionnaire to sample persons or households, but provide them with the opportunity to complete the survey online. However, several early studies have found that providing
respondents such a choice does not increase response rates, and may in fact result in lower response rates than the mail-only approach. For example, Griffin, Fischer, and Morgan (2001) reported a 37.8% response rate for the American Community Survey (ACS) with a Web option, compared to 43.6% for mail only. This early result contributed to the US Census Bureau's caution about offering an online response option, and no such option was available in the 2010 decennial census. In studies of students in Sweden, Werner (2005) reported lower response rates (62%-64%) for the mail+Web versions than for the mail-only control version (66%). Brennan (2005) used a sample from the New Zealand electoral register, and obtained lower response rates for the mail+Web option (25.4%) than for the mail-only design (40.0%), and Brøgger et al. (2007) obtained similar results (44.8% for mail+Web versus 46.7% for mail-only) in a survey among adults aged 20-40 in Norway. Gentry and Good (2008) reported response rates of 56.4% for those offered an eDiary option for radio listening, compared to 60.6% for those offered only the paper diary. Other studies (e.g., Tourkin et al., 2005; Hoffer et al., 2006; Israel, 2009; Cantor et al., 2010; Smyth et al., 2010; Millar and Dillman, 2011) have also found disappointing results for the concurrent mixed-mode approach. A number of hypotheses are being advanced for these results, and research is continuing on ways to optimize designs to encourage Web response while not negatively affecting overall response rates.

More recent studies have focused on sequential mixed-mode designs, where sample members are directed to one mode initially, rather than being given a choice, and nonrespondents are followed up in another mode. One example is the study of adults in Stockholm by Holmberg, Lorenc, and Werner (2010). They compared several different sequential strategies involving mail and Web. While overall response rates did not differ significantly across the five experimental conditions, they found that the proportion of respondents completing the survey online increased as that option was pushed more heavily in a sequential design. For example, when the first two mail contacts (following the prenotification or advance letter) mentioned only the Web option, and the mail questionnaire was provided only at the third contact, the overall response rate was 73.3%, with 47.4% of the sample using the Web. In contrast, in the condition where the mail questionnaire was provided in the first contact, the possibility of a Web option was mentioned in the second (reminder) contact, and the login for the Web survey was not provided until the third contact (along with a replacement questionnaire), the overall response rate was 74.8%, but only 1.9% of the sample completed the Web version. Millar and Dillman (2011) report similar findings for a mail "push" versus Web "push" approach. While none of the sequential mixed-mode designs show substantial increases in overall response rates, the increased proportion of responses obtained via the Web represents a potential cost saving that could be directed at additional follow-up in other modes.

Despite these somewhat disappointing results, a growing number of NSOs are providing an Internet option for census returns with apparent success. For example, Singapore reported that about 15% of census forms were completed online in the 2000 population census.
In the Norwegian census of 2001, about 9.9% of responses were reportedly obtained via the Web. Statistics Canada reported that 18.3% of Canadian households completed their census form online in 2006, and this increased to 54.4% in the recently completed 2011 census. Preliminary estimates from the Australian census in August 2011 suggest a 27% uptake of the Internet option, while South Korea anticipates that 30% of forms will be completed online for their
census in November 2011. The United Kingdom also recently completed its census (in March 2011) and heavily promoted the online option, but the success of this effort is not yet known. The success of these census efforts suggests that the length of the form or questionnaire may be a factor in whether it is completed online or not. In addition, censuses tend to be heavily promoted public events, and this may play a role in the successful outcome. Much more research is needed into the conditions under which mixed-mode designs involving mail and Web will yield improvements in response rates — and reductions in nonresponse bias.

In addition, a key assumption underlying this strategy is that the measurement error differences between the modes are not large — or at least not large enough to negate the benefits of mixing modes. The primary focus thus far has been on response rates, with much less attention paid to measurement differences between the modes. This suggests that the mail-with-Web-option strategy may be most effective when the survey is very short and measures demographic variables that are less likely to be affected by mode.

1.7 Summary on Inferential Issues

As has been seen in this section, inferential issues remain a challenge for Web surveys aimed at broad population representation. Sampling frames of e-mail addresses or lists of Internet users in the general population do not exist. While the proportion of the population without Internet access has been declining, there remain substantial differences between those with access and those without on a variety of topics. Nonresponse also remains a challenge for Web surveys relative to other (more expensive) modes of data collection. Statistical adjustments may reduce the bias of self-selection in some cases, but substantial biases may remain.

Nonetheless, there remain a number of areas where Web surveys are appropriate. For example, surveys of college students and members of professional associations are ideally suited to Web data collection. Establishment or business surveys may also benefit from online data collection, especially as part of a mixed-mode strategy. There are a number of creative ways to address these challenges (such as the development of probability-based access panels), but for now at least, Web surveys are likely to supplement rather than replace other modes of data collection for large-scale surveys of the general public where high levels of accuracy and reliability are required.
Part 2: Interface Design

This second part of the seminar focuses on the design of Web survey instruments and data collection procedures, with a view to minimizing measurement error or maximizing data quality. The particular focus is on those aspects unique to Web surveys. For example, question wording is an issue relevant for all modes of data collection, and is not a focus of this seminar. Further, I will not address technical issues of Web survey implementation, such as hardware, software, or programming. Given the unique features of Web surveys, there are many challenges and opportunities for survey designers. The seminar is not intended to be an exhaustive review of the topic, but rather to provide empirical evidence on illustrative examples to emphasize the importance of careful design in developing and implementing Web surveys. Couper (2008a) goes into these — and other — design issues in more depth.

2.1 Measurement Error

Measurement error involves a different type of inference from that discussed in Part 1 above: the inference from a particular observation or measurement for the i-th respondent (y_i) to the "true value" of that measure for that respondent (μ_i), sometimes measured across several trials (t). The simplest expression of measurement error is as follows:

y_{it} = μ_i + ε_{it}     (8)

where ε_{it} is the error term for respondent i and trial t. In order to estimate measurement error using this expression, we need to know the true value. In practice, the true value is rarely known. Researchers therefore tend to rely on alternative approaches to examine the measurement error properties of a mode or a design. One common approach is to examine differences in responses to alternative presentations of the same questions. The measurement error model applicable to this approach is as follows:

y_{ij} = μ_i + M_{ij} + ε_{ij}     (9)

where y_{ij} is the response for the i-th person using the j-th form of the question or instrument, and M_{ij} is the effect on the response of the i-th person of using the j-th method. The classic split-ballot experiments to examine question wording effects (e.g., Schuman and Presser, 1981) are examples of this approach.
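The logic of Equation 9 can be made concrete with a small simulation: two randomly assigned forms of a question share the same true scores but differ by a form-specific effect M, and comparing the two groups' means recovers that effect. The effect size, error variance, and sample size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# True scores mu_i, shared by everyone regardless of question form.
mu = rng.normal(50, 10, n)

# Random split-ballot assignment to form A or form B.
form_b = rng.random(n) < 0.5

# Equation 9: y_ij = mu_i + M_ij + eps_ij, with a method effect only for form B.
M = np.where(form_b, 2.0, 0.0)          # hypothetical +2-point effect of form B
eps = rng.normal(0, 5, n)               # random measurement error
y = mu + M + eps

# Because assignment is random, the difference in means estimates the form effect.
print("estimated form effect:", y[form_b].mean() - y[~form_b].mean())
```

The true values μ_i never need to be observed; randomization ensures that any systematic difference between the two groups' responses can be attributed to the form of the question, which is exactly what the design-comparison studies discussed next exploit.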
One of the advantages of Web surveys lies in the ease with which randomization can be implemented, giving researchers a powerful tool to explore measurement effects. This has led to a large number of experimental comparisons of different design options. In such Web design studies, indirect measures of data quality or measurement error are often used, involving not only an examination of response distributions across versions, but also other indicators such as missing data rates, breakoff rates (potentially leading to increased nonresponse error), speed of completion, and subjective reactions by respondents. Together these all point to potential comparative advantages of one particular design approach relative to another, without directly assessing the measurement error. So, in this part, the focus is more on the measurement process than on measurement error.

2.2 Measurement Features of Web Surveys

Web surveys have several features or characteristics that have implications for the design of survey instruments, and hence for measurement error. None of these characteristics is unique to Web surveys by itself, but in combination they present both opportunities and challenges for the survey designer.

First, Web surveys are self-administered. In this they share attributes of paper self-administered questionnaires (e.g., mail surveys) and computerized self-administered questionnaires (e.g., computer-assisted self-interviewing [CASI] or interactive voice response [IVR]). While this attribute also has implications for sampling, coverage, and nonresponse error, our focus here is on measurement error. Self-administration has long been shown to be advantageous in terms of reducing effects related to the presence of the interviewer, such as social desirability biases. At the same time, the benefits of interviewer presence — such as in motivating respondents, probing, or clarifying — are also absent. From an instrument design perspective, this means that the instrument itself must serve these functions. It must also be easy enough for untrained or inexperienced survey-takers to complete.

Second, Web surveys are computerized. Like computer-assisted personal interviewing (CAPI) and computer-assisted telephone interviewing (CATI), but unlike paper surveys, computerization brings a full range of advanced features to bear on the design of the instrument. Randomization (of question order, response order, question wording or format, etc.) is relatively easy to implement in Web surveys. Other aspects of computer-assisted interviewing (CAI) that are easy to include in Web surveys but relatively hard in paper surveys include automated routing (conditional questions), edit checks, fills (inserting information from prior answers in the current question), and so on. Web surveys can be highly customized to each individual respondent, based on information available on the sampling frame, information collected in a prior wave
(known as dependent interviewing), or information collected in the course of the survey. This permits the use of complex instruments and approaches such as computerized adaptive testing and conjoint methods, among others. However, adding such complexity increases the chances of errors and makes careful specification and testing of the instrument all the more important.

A third feature of Web surveys, related to their computerized nature, is that they can be designed with varying degrees of interactivity. Conditional routing is one form of interactivity, but beyond this, Web surveys can be designed to behave more like interviewer-administered surveys — for example, prompting for missing data, seeking clarification of unclear responses, providing feedback, and the like.

A fourth characteristic of Web surveys is that they are distributed. In CAI surveys, the technology is in the hands of the interviewer, using hardware and software controlled by the survey organization. This means that the designer has control over the look and feel of the survey instruments. In contrast, the Web survey designer has little control over the browser used by the respondent to access and complete the survey, or over the hardware used. Increasingly, people are accessing the Web using a variety of mobile devices (such as smart phones or tablets), and these present new challenges for the designer. But Web surveys can vary in many other ways too — from the user's control over the size of the browser, to the security settings that may affect whether and how JavaScript, Flash, or other enhancements work, to the connection type that determines the speed with which respondents can download or upload information online. Hypertext markup language (HTML) is standard across browsers and platforms, but JavaScript (for example) does not always behave in an identical manner across different operating systems. While these variations give respondents great flexibility in terms of how, when, and where they access the survey instrument, they present design challenges in terms of ensuring a consistent look and feel for all respondents in the survey.

Finally, a feature of Web surveys that has already been widely exploited by Web survey designers is that the Web is a visually rich medium. It is true that other modes are visual too — for example, pictures or images have been used in paper surveys. But it is the ease with which visual elements can be introduced in Web surveys that makes them distinctive as a data collection mode. The visual nature of Web surveys means much more than just pictures. Visual elements include colors, shapes, symbols, drawings, images, photographs, and videos. The cost of adding a full-color image to a Web survey is trivial. Visual in this sense extends beyond the words appearing on the Web page, and can encompass full multimedia presentation, using both sound and video. The visual richness of the medium brings many opportunities to enhance and extend survey measurement, and is one of the most exciting features of Web surveys. On the other hand, the broad array of visual enhancements also brings the risk of affecting measurement in ways not yet fully understood.

Together these characteristics make careful design more important for Web surveys than for many other modes of data collection. As already noted, a great deal of research has been conducted on alternative designs for Web surveys, and such research is continuing. It is not possible to summarize this vast literature here.
Rather, I will present a few selected examples of key design issues to illustrate the importance of design for optimizing data quality and
