Jelke Bethlehem
Web panels for Official Statistics
Official Statistics
 The mission of national statistical institutes
‐ Publishing reliable and accurate statistical
information that meets the needs of society.
‐ Commitment to quality: the quality of the statistical
information must be guaranteed.
 Challenges
‐ ICT developments.
‐ Decreasing response rates.
‐ Decreasing budgets.
Data collection for population surveys
 Traditional data collection
‐ Face-to-face and telephone, paper.
‐ Interviewer-assisted.
‐ Good quality, slow, expensive.
 Computer-assisted interviewing
‐ CAPI, CATI.
‐ Interviewer-assisted.
‐ Better quality, fast, easier, expensive.
 Web surveys
‐ CAWI, cheaper, self-administered, quality issues.
Online data collection
 Single mode web surveys
‐ Must be based on probability sampling.
‐ Self-administered: quality issues.
‐ Low response rates (30%).
 Mixed-mode web surveys
‐ Sequential mixed-mode, start with web.
‐ Less expensive than CAPI or CATI.
‐ Normal response rates.
‐ Mode effects.
Web panels
 Why a web panel?
‐ Instrument for longitudinal research.
‐ Sampling frame for cross-sectional research.
‐ Quick surveys.
 Challenges
‐ Under-coverage (lack of internet access).
‐ How to recruit a representative web panel?
‐ Nonresponse (in recruitment and surveys).
‐ Measurement errors (self-administered).
‐ Maintenance (attrition, panel conditioning).
Under-coverage
 The under-coverage problem
‐ People without internet cannot become panel members.
‐ Those with internet differ from those without it.
‐ Therefore, estimates may be biased.
 The bias:
$$B(\bar{y}_I) = E(\bar{y}_I) - \bar{Y} = \bar{Y}_I - \bar{Y} = \frac{N_{NI}}{N}\left(\bar{Y}_I - \bar{Y}_{NI}\right)$$
‐ The bias depends on internet coverage.
‐ The bias depends on the difference between those
with and without internet.
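As a rough numerical illustration of the under-coverage bias: it equals the fraction of the population without internet multiplied by the difference between the means of those with and without internet. The population size, coverage and means below are hypothetical values chosen only for the example; the function name is also an assumption.

```python
def undercoverage_bias(n_total, n_no_internet, mean_internet, mean_no_internet):
    """Bias of the internet-population mean as an estimate of the full population mean."""
    if n_total <= 0 or not 0 <= n_no_internet <= n_total:
        raise ValueError("need 0 <= n_no_internet <= n_total and n_total > 0")
    return (n_no_internet / n_total) * (mean_internet - mean_no_internet)


# Hypothetical population: 94% internet coverage, target variable is a proportion.
n_total = 1000
n_no_internet = 60
mean_internet = 0.50
mean_no_internet = 0.30

bias = undercoverage_bias(n_total, n_no_internet, mean_internet, mean_no_internet)

# Cross-check against the direct definition: mean with internet minus overall mean.
population_mean = (
    (n_total - n_no_internet) * mean_internet + n_no_internet * mean_no_internet
) / n_total
direct_bias = mean_internet - population_mean

print(f"bias via formula: {bias:.3f}, direct: {direct_bias:.3f}")
```

With the same difference in means, lower coverage gives a proportionally larger bias, which is why both factors matter.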
Under-coverage
 Internet coverage in Europe (2011)
‐ Internet coverage varies between 45% (Bulgaria) and 94% (the Netherlands).
‐ Source: Eurostat.
Under-coverage
 Under-represented groups
‐ Low-educated, ethnic minorities, elderly.
‐ Only 34% of people aged 75+ use the internet (NL).
 Reducing under-coverage
‐ Provide free internet access to those without it.
‐ Make a mixed-mode panel with CAPI, CATI or mail
for those without internet.
‐ Maybe the problem will solve itself in time.
Recruitment
 Recruitment by means of self-selection (opt-in)
‐ People decide for themselves whether or not to become
a member of the panel. No sample selection.
‐ Participation probabilities πk are unknown.
‐ Bias:
$$B(\bar{y}_S) = E(\bar{y}_S) - \bar{Y} \approx \frac{R_{\pi,Y}\, S_\pi\, S_Y}{\bar{\pi}}$$
‐ Bias depends on average participation probability.
‐ Bias depends on variation of the probabilities.
‐ Bias depends on relationship between target
variable and participation behaviour.
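A minimal sketch of how the three factors in the self-selection bias combine: the correlation between participation probability and target variable, times the two standard deviations, divided by the mean participation probability. The five-person population and its probabilities are hypothetical, and the expected value of the panel mean is approximated by the probability-weighted mean, which is an assumption of this sketch.

```python
from statistics import fmean, pstdev


def correlation(xs, ys):
    """Population (divide-by-N) correlation between two equally long lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))


def self_selection_bias(probs, values):
    """Approximate bias: correlation * S_prob * S_value / mean participation probability."""
    return correlation(probs, values) * pstdev(probs) * pstdev(values) / fmean(probs)


# Hypothetical population: participation probabilities and a 0/1 target variable.
probs = [0.01, 0.02, 0.05, 0.10, 0.02]
values = [0, 0, 1, 1, 0]

bias = self_selection_bias(probs, values)

# Cross-check: probability-weighted mean minus the true population mean.
expected_panel_mean = sum(p * y for p, y in zip(probs, values)) / sum(probs)
direct_bias = expected_panel_mean - fmean(values)

print(f"approximate bias: {bias:.3f}, direct: {direct_bias:.3f}")
```

In this toy population people with the characteristic are far more likely to join, so the panel would overestimate a true proportion of 0.40 by about 0.35.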
Recruitment
 Other self-selection problems
‐ People from outside the target population can also
become members of the panel.
‐ Sometimes multiple membership is possible.
‐ Groups of people may attempt to manipulate the
outcomes of the polls.
 Conclusion
‐ A self-selection panel is out of the question for general
population surveys.
Recruitment
 Recruitment by means of probability sampling
‐ Allows for unbiased estimation.
‐ Allows for computation of margins of error.
‐ Required: a sampling frame with email addresses.
‐ Such a sampling frame is not available.
‐ Solution: Different mode(s) for recruitment:
mail, CATI or CAPI (or a combination).
‐ Traditional sampling frames can be used.
‐ Disadvantage: makes a web panel expensive.
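To illustrate what "computation of margins of error" means under probability sampling, here is a minimal sketch for an estimated proportion. It assumes simple random sampling, a normal approximation at the 95% level and no finite population correction; the sample size of 1000 is hypothetical and none of these choices come from the slides.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    if n <= 0 or not 0 <= p <= 1:
        raise ValueError("need n > 0 and 0 <= p <= 1")
    return z * sqrt(p * (1 - p) / n)


# Hypothetical example: an estimated proportion of 50% from 1000 respondents.
moe = margin_of_error(0.5, 1000)
print(f"margin of error: +/- {100 * moe:.1f} percentage points")
```

No comparable calculation is available for a self-selection panel, because the participation probabilities are unknown.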
Recruitment
 Recruitment from other surveys
‐ Build panel from respondents of previous CAPI or
CATI surveys.
‐ Respondents may have agreed to participate in
future surveys.
‐ Recruitment may be less expensive.
‐ But these respondents may be a selective group,
and therefore the resulting panel may lack
representativity.
Nonresponse
 The nonresponse problem
‐ Nonresponse leads to biased estimates.
‐ Bias:
$$B(\bar{y}_R) = E(\bar{y}_R) - \bar{Y} \approx \frac{R_{\rho,Y}\, S_\rho\, S_Y}{\bar{\rho}}$$
‐ Bias depends on response rate.
‐ Bias depends on variation of response probabilities.
 Indicators
‐ Response rate
‐ Representativity indicator: $R = 1 - 2 S_\rho$
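The representativity indicator, one minus twice the standard deviation of the response probabilities, can be sketched as follows. The propensities are hypothetical; in practice they are unknown and have to be estimated from auxiliary variables, and the use of the population (divide-by-N) standard deviation is an assumption of this sketch.

```python
from statistics import pstdev


def r_indicator(propensities):
    """R = 1 - 2 * S_rho, with S_rho the standard deviation of the response probabilities."""
    return 1 - 2 * pstdev(propensities)


# Hypothetical response probabilities for three small populations.
equal = [0.6, 0.6, 0.6, 0.6]          # everyone equally likely to respond
polarised = [0.0, 1.0, 0.0, 1.0]      # half never respond, half always respond
mixed = [0.4, 0.5, 0.6, 0.7]

for name, rho in [("equal", equal), ("polarised", polarised), ("mixed", mixed)]:
    print(f"{name:<10} R = {r_indicator(rho):.3f}")
```

Equal probabilities give R = 1 (fully representative response) and the most extreme variation gives R = 0, so values closer to 1 indicate less variation in response behaviour.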
Nonresponse
 Recruitment nonresponse
‐ High, as participation requires substantial
commitment.
‐ Bias reduction (adjustment weighting) difficult due
to lack of relevant auxiliary variables.
 Survey/wave nonresponse (attrition)
‐ May be low, as people agreed to participate.
‐ Plenty of auxiliary variables for bias reduction, e.g.
from profile survey.
Nonresponse
 Treatment of nonresponse
‐ Different treatment of recruitment and survey
nonresponse, as they are different phenomena.
‐ Treatment is only effective if response behaviour
can be explained by auxiliary variables.
‐ Treatment is only effective if target variable can be
explained by auxiliary variables.
‐ Consider reference survey for obtaining more
auxiliary variables.
Measurement errors
 What about the quality of the answers?
‐ CAPI and CATI are interviewer-assisted surveys, but
web surveys are self-administered.
‐ How strong are the effects of satisficing (not the best
answer, but a reasonable answer)?
‐ How to handle “don’t know”?
‐ Are there device effects (desktop, laptop, tablet,
smartphone, etc.)?
‐ Include consistency checks?
‐ Do results in the literature apply to official statistics?
Maintenance
 Panel must be kept stable over time
‐ Detected changes must be caused by real changes.
‐ How to handle attrition?
‐ How to handle panel conditioning?
 Refreshment
‐ Refreshment is costly.
‐ Add a random sample from the population, or focus
on under-represented groups?
‐ Estimation more complex due to varying selection
probabilities.
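One way to see why varying selection probabilities complicate estimation: each panel member has to be weighted by the inverse of his or her own inclusion probability. The sketch below uses the Horvitz-Thompson total and the corresponding ratio (Hájek) mean as an illustration; the slides do not name an estimator, and the values and probabilities are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total by weighting each value with 1 / inclusion probability."""
    if any(p <= 0 or p > 1 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(y / p for y, p in zip(values, inclusion_probs))


def hajek_mean(values, inclusion_probs):
    """Estimate a population mean as estimated total divided by estimated population size."""
    estimated_total = horvitz_thompson_total(values, inclusion_probs)
    estimated_size = sum(1 / p for p in inclusion_probs)
    return estimated_total / estimated_size


# Hypothetical panel: an original member (probability 0.5) and a member added
# in a refreshment sample that targeted an under-represented group (probability 0.25).
values = [10.0, 20.0]
inclusion_probs = [0.5, 0.25]

total = horvitz_thompson_total(values, inclusion_probs)
mean = hajek_mean(values, inclusion_probs)
print(f"estimated total: {total:.1f}, estimated mean: {mean:.2f}")
```

The unweighted mean of the two values would be 15; the weighted mean is higher because the second member represents more people.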
A web panel pilot
 Objectives
‐ Getting experience with setting up a web panel.
‐ Getting more information about the costs.
‐ Using a simple tool (NetQ), not yet Blaise.
 Recruitment
‐ Invite respondents from Mobility Survey (OViN).
‐ This was a mixed-mode survey (web-CATI-CAPI).
‐ Recruitment by mail.
‐ Inference with respect to OViN respondents.
A web panel pilot
 Recruitment process
‐ Response rates:

Step            n       % of sample   % of previous step
Sample          12046
Response OViN    6928   57.5          57.5
Willingness      4251   35.3          61.4
Selected         4227   35.1          99.4
Registered       1231   10.2          29.1
Participates     1134    9.4          92.1

‐ Ultimate response rate is very low.
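The percentages in the recruitment table follow directly from the counts at each step. The short sketch below recomputes them; the counts are taken from the table, while the function and variable names are illustrative.

```python
# Counts per recruitment step, as reported in the table.
steps = [
    ("Sample", 12046),
    ("Response OViN", 6928),
    ("Willingness", 4251),
    ("Selected", 4227),
    ("Registered", 1231),
    ("Participates", 1134),
]


def response_rates(steps):
    """Return (step, % of initial sample, % of previous step) for every step after the first."""
    sample_size = steps[0][1]
    rows = []
    for (_, previous_n), (name, n) in zip(steps, steps[1:]):
        rows.append((name, round(100 * n / sample_size, 1), round(100 * n / previous_n, 1)))
    return rows


rates = response_rates(steps)
for name, pct_sample, pct_previous in rates:
    print(f"{name:<14} {pct_sample:5.1f} {pct_previous:5.1f}")
```

The largest single loss is at registration: only 29.1% of the selected respondents actually register.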
A web panel pilot
 Recruitment process
‐ Representativity:

Step            n       R-indicator
Sample          12046
Response OViN    6928   0.784
Willingness      4251   0.843
Selected         4227   0.842
Registered       1231   0.883

‐ Representativity improves.
‐ There is a risk of a large bias.
A web panel pilot
 Recruitment process
‐ Relation with OViN recruitment

Mode   Response OViN (n)   Willing (% of response)   In panel (% of willing)
Web    2370                55.4                      55.4
CATI   2946                59.9                      16.9
CAPI   1612                72.8                      17.5

‐ Higher participation rates for web respondents.
‐ Socially desirable answers in recruitment?
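Combining the two percentage columns gives the share of OViN respondents in each mode who end up in the panel. The sketch below does this with the figures from the table; because the percentages are rounded, the reconstructed counts are approximate, and the function name is illustrative.

```python
# Per mode: OViN respondents, % willing (of response), % in panel (of willing).
modes = {
    "Web": (2370, 55.4, 55.4),
    "CATI": (2946, 59.9, 16.9),
    "CAPI": (1612, 72.8, 17.5),
}


def panel_yield(respondents, pct_willing, pct_in_panel):
    """Return (approx. willing count, approx. panel count, panel members as % of respondents)."""
    willing = respondents * pct_willing / 100
    in_panel = willing * pct_in_panel / 100
    return willing, in_panel, 100 * in_panel / respondents


results = {mode: panel_yield(*row) for mode, row in modes.items()}
total_willing = sum(r[0] for r in results.values())
total_in_panel = sum(r[1] for r in results.values())

for mode, (willing, in_panel, pct) in results.items():
    print(f"{mode:<5} willing ~{willing:6.0f}  in panel ~{in_panel:5.0f}  ({pct:.1f}% of respondents)")
print(f"total willing ~{total_willing:.0f}, total in panel ~{total_in_panel:.0f}")
```

The reconstructed totals agree with the 4251 willing and 1231 registered respondents in the recruitment table, and roughly 31% of web respondents join the panel against 10 to 13% for CATI and CAPI.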
A web panel pilot
 Estimation
‐ Two target variables whose values are known
for all OViN respondents: level of education and
employment status.
‐ They are related to many other target variables.
 Questions
‐ How close are panel estimates to OViN response?
‐ Does weighting adjustment help?
‐ Weight model: age × income × soc-eco-class.
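As a sketch of what weighting adjustment does, the example below applies post-stratification with a single stratification variable: each respondent gets a weight equal to the population share of the stratum divided by its sample share. This is a simplified stand-in for the crossed weight model age × income × soc-eco-class; the strata, shares and data are hypothetical, and the slides do not state which weighting technique was used.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """Weight per respondent: population share of the stratum / sample share of the stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical panel: young people are over-represented (75% of panel, 50% of population).
sample_strata = ["young", "young", "young", "old"]
values = [1, 1, 0, 0]  # 0/1 target variable
population_shares = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(sample_strata, population_shares)
unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Weighting changes the estimate only to the extent that the target variable differs between strata, which is why relevant auxiliary variables are needed.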
A web panel pilot
 Estimation for level of education

Level of education   Panel (%)   Weighted (%)   OViN (%)
Primary               2.6         4.3            5.5
Lower secondary      15.2        16.5           21.0
Higher secondary     34.4        35.8           37.6
Bachelor/master      45.5        40.6           33.6

‐ The bias is somewhat smaller, but remains
substantial.
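The size of the remaining bias can be read off by subtracting the OViN figure from the panel and weighted estimates. The sketch below does this for the education table; the figures are from the table and the names are illustrative.

```python
# Level of education: (panel, weighted panel, OViN response), in percent.
education = {
    "Primary": (2.6, 4.3, 5.5),
    "Lower secondary": (15.2, 16.5, 21.0),
    "Higher secondary": (34.4, 35.8, 37.6),
    "Bachelor/master": (45.5, 40.6, 33.6),
}


def deviations(table):
    """Return {category: (panel - OViN, weighted - OViN)} in percentage points."""
    return {
        category: (round(panel - ovin, 1), round(weighted - ovin, 1))
        for category, (panel, weighted, ovin) in table.items()
    }


dev = deviations(education)
for category, (before, after) in dev.items():
    print(f"{category:<17} before weighting {before:+5.1f}   after weighting {after:+5.1f}")
```

Weighting reduces the deviation in every category, but the share with a bachelor or master degree is still overestimated by 7.0 percentage points.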
A web panel pilot
 Estimation for employment status

Employment       Panel (%)   Weighted (%)   OViN (%)
Housewife/man    11.9        12.2           12.5
Pension          16.8        17.8           14.7
School/student    6.1        10.6            9.8
Disabled          2.4         2.8            2.8
Unemployed        1.9         2.1            2.3
Employed         59.2        52.4           56.1

‐ Correction too strong, too weak, or in wrong
direction.
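To make "too strong, too weak, or in wrong direction" concrete, the sketch below labels each employment category by comparing the deviation from OViN before and after weighting. The figures are from the table; the classification rule and its labels are an illustrative assumption, not the author's definition.

```python
# Employment status: (panel, weighted panel, OViN response), in percent.
employment = {
    "Housewife/man": (11.9, 12.2, 12.5),
    "Pension": (16.8, 17.8, 14.7),
    "School/student": (6.1, 10.6, 9.8),
    "Disabled": (2.4, 2.8, 2.8),
    "Unemployed": (1.9, 2.1, 2.3),
    "Employed": (59.2, 52.4, 56.1),
}


def classify(panel, weighted, ovin, tol=1e-9):
    """Label the effect of weighting relative to the OViN benchmark."""
    before = panel - ovin
    after = weighted - ovin
    if abs(after) <= tol:
        return "on target"
    if before * after < 0:
        return "too strong"        # the correction overshoots the benchmark
    if abs(after) > abs(before) + tol:
        return "wrong direction"   # the estimate moves away from the benchmark
    return "too weak"              # right direction, but not far enough


labels = {category: classify(*row) for category, row in employment.items()}
for category, label in labels.items():
    print(f"{category:<15} {label}")
```

Only one of the six categories lands on the benchmark, which suggests the weight model does not explain employment status well.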
Web panels for Official Statistics
 Conclusions
‐ Under-coverage is a problem that can be solved.
‐ Recruitment should be by means of probability sampling.
‐ Recruitment of a representative panel is expensive.
‐ Recruitment nonresponse is high.
‐ Relevant auxiliary variables are required to reduce
nonresponse bias.
‐ More research is required with respect to
measurement errors.
‐ A panel maintenance strategy must be implemented.
