2. Survey Data
• Vast amounts of survey data are collected for many
purposes, including governmental information, public
opinion and election surveys, advertising and market
research as well as scientific research
• Survey data underlie many public policy and
business decisions
• Good quality data reduces the risk of poor policies
and decisions and is of crucial importance
3. Survey challenges
• Budgets are severely constrained (survey costs)
• Pressures on providing timely data are greater in the
digital age
• Public interest in participating in surveys is declining
and is now at an all-time low (response rates)
• When cooperation is obtained from reluctant
respondents, responses may be less accurate
• New modes of data collection introduce new
concerns for data quality (errors)
4. Definition
• Quality can be defined simply as “fitness for
use”
• Quality is a requirement for survey data to be as
accurate as necessary to achieve their intended
purposes, be available at the time they are needed
(timely), and be accessible to those for whom the
survey was conducted.
Biemer and Lyberg (2003)
5. Total Survey Quality (TSQ)
TSQ – survey quality is more than its accuracy or statistical
dimension. It also includes, among other factors, producing
results that fit the needs of the survey users and providing
results that users will have confidence in. Usability of results
is of crucial importance.
[Diagram: TSQ comprises a statistical dimension and a
non-statistical dimension]
6. TSQ: Quality Dimensions – Statistical
• Accuracy of an estimate is the difference between
the estimate and the true parameter value
• Accuracy is the cornerstone of the larger TSQ concept
X = T + e
where X is the observed item, T is the true value, and e is
the error. The error has two components: variance (random
error) and bias (systematic error).
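The decomposition above can be illustrated with a small simulation. This is only a sketch: the true value, the bias, and the error spread below are made-up illustrative numbers.

```python
import random

random.seed(42)

T = 50.0      # true value (illustrative)
bias = 2.0    # systematic error: mean of the errors is not zero
sigma = 5.0   # spread of the random error component

# Observed values X = T + e, where e = bias + random noise
observations = [T + bias + random.gauss(0, sigma) for _ in range(100_000)]

mean_x = sum(observations) / len(observations)
estimated_bias = mean_x - T  # systematic part: does not cancel out
variance = sum((x - mean_x) ** 2 for x in observations) / len(observations)

print(round(estimated_bias, 1))  # close to 2.0
print(round(variance, 1))        # close to sigma**2 = 25.0
```

Averaging over many observations drives the random error towards zero, but the bias of 2.0 remains, which is why systematic error is the more dangerous component.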
7. TSQ: Quality Dimensions – Non-statistical
• Relevance - product is relevant and meets user needs
• Timeliness and punctuality – in disseminating results
(most important user needs)
• Accessibility and clarity – of the information
• Comparability - reliable comparisons across space and
time are often crucial; cross-national comparisons
• Coherence - single source – elementary concepts can
be combined in more complex ways; different sources –
based on common definitions, classifications and
methodological standards
• Completeness - data are rich enough to satisfy the
analysis objectives
8. TSQ: Quality Dimensions – Non-statistical
• Credibility – credible methodology
• Interpretability – documentation is clear
• Richness of detail - data are rich enough to
satisfy the analysis objectives
• Level of confidentiality protection
• Cost – data give good value for money
9. Total Survey Error (TSE)
• The TSE concept was developed by Robert Groves
(1989) in his book Survey Errors and Survey Costs
• Survey estimates are derived from complex survey
data; published estimates may differ from their true
parameter values due to survey errors
• Total Survey Error is the difference between a
population mean, total, or other population
parameter and the estimate of the parameter based
on the sample survey (or census) (Biemer and
Lyberg, 2003)
11. Sources of Sampling Error
Sampling errors – can be computed for probability
samples and are due to selecting a sample instead of the
entire population
Sources:
• Sampling scheme
• Sample size
• Estimator choice
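The role of sample size can be sketched with the standard error of a mean under simple random sampling without replacement, including the finite population correction. The population size and standard deviation below are illustrative assumptions.

```python
import math

N = 100_000  # population size (illustrative)
S = 12.0     # population standard deviation (illustrative)

def se_mean_srs(n: int) -> float:
    """Standard error of the sample mean under simple random
    sampling without replacement: sqrt((1 - n/N) * S**2 / n)."""
    fpc = 1 - n / N  # finite population correction
    return math.sqrt(fpc * S**2 / n)

for n in (100, 400, 1600):
    print(n, round(se_mean_srs(n), 3))
# Quadrupling n roughly halves the standard error:
# 100 -> 1.199, 400 -> 0.599, 1600 -> 0.298
```

Because the error shrinks only with the square root of n, halving sampling error costs roughly four times the sample, one reason survey budgets constrain accuracy.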
12. Components of Non-sampling Error
Non-sampling errors are errors due to mistakes or system
deficiencies, including incomplete responses to the survey or
its questions. They cannot be formally estimated, but
components such as measurement error can be reduced through
better interviewing procedures, question wording, etc.
1. Specification error
2. Frame error
3. Nonresponse error
4. Measurement error
5. Data processing error
6. Modelling/Estimation error
13. Other Important Factors
A number of additional factors can impact survey
data quality. Four of the more important are:
• the length of time the survey was fielded,
• the use of incentives,
• the reputation of the organisation conducting the
survey, and
• the mode of data collection
14. Actors affecting data quality
• Respondents (respondents’ effects on data quality):
e.g., response styles, satisficing (expending less effort
to provide optimal responses)
• Interviewers (interviewers’ effects on data quality):
e.g., fabrication, the ability to elicit respondents’
interest in and commitment to the survey, interview
duration, duplication of responses apart from, say,
demographics
• Supervisors and survey research organisations
(supervisors’ effects on data quality), e.g. sampling
design, training of field workers
15. Data quality monitoring strategies
• Continuous quality improvement (CQI)
– Special cause variation – errors made by individual coders
– Common cause variation – errors due to the process itself
• Responsive design (Groves and Heeringa, 2006)
• Adaptive design – real time control of costs and
errors – (Schouten et al. 2013)
• Adaptive Total Design (Biemer, 2010) – an adaptive
design strategy that combines ideas of CQI and the
TSE paradigm to reduce costs and errors across
multiple survey processes
• Six Sigma – set of principles and strategies for
improving any process
16. Data Quality in Practice
• There is no known instance where a total survey
quality (TSQ) measure has been calculated, i.e., a
combined single measure of quality taking all
dimensions into account
• Cost-benefit trade-offs to minimise different errors
depending on survey aims
• Quality reports or quality declarations have been
used where information on each dimension is
provided
• Data quality guides are meant to alert the data user
to potential sources of bias that might be present
17. Conclusions (1)
• Data quality is a multi-dimensional concept
• A single score or measure of data quality is not available
• Cost-benefit trade-offs to minimize different errors depending
on survey aims
• Quality frameworks have been developed and adopted
• A broad range of relevant data quality indicators and
information is available with the data
• The chances of users misusing the data or misinterpreting
published statistics are reduced if they better understand
the strengths and limitations of the data.
• New technologies require fresh considerations of data quality
issues in new types of surveys
18. Conclusions (2)
High-quality survey data bring
– improvement in the quality of surveys themselves
– improvement in the quality of research and of the policy
and financial decisions that are based on the survey
data
19. References
• Biemer (2010) Total survey error: Design, implementation, and evaluation. Public
Opinion Quarterly, 74(5): 817-848.
• Biemer (2016) Total Survey Error Paradigm: Theory and Practice. In The Sage
handbook of survey methodology by Wolf, Joye, Smith and Fu. London: SAGE
publications.
• Biemer and Lyberg (2003) Introduction to survey quality. New York: John Wiley & Sons.
• Groves and Heeringa (2006) Responsive design for household surveys: Tools for
actively controlling survey errors and costs. Journal of the Royal Statistical Society
Series A, 169 (3): 439-457.
• Lynn (2004) Editorial: Measuring and communicating survey quality. Journal of the Royal
Statistical Society Series A, 167 (4): 575-578.
• Lyberg and Weisberg (2016) The SAGE handbook of survey methodology. London:
SAGE publications.
• Schouten et al. (2013) Optimizing quality of response through adaptive survey designs.
Survey Methodology, 39 (1): 29-39.
• Weisberg (2005) The total survey error approach. Chicago: University of Chicago Press.
Editor's Notes
So data quality is crucial
All these require reconsideration of existing data quality frameworks. New aspects should be taken into account
Opt-in panels
Big Data
Difficult to convince clients that traditional probability surveys are the best and worth extra costs
Bias – mean of errors is not equal to 0, does not cancel out; variance – mean of error is equal to 0, does cancel out
Accuracy is the larger concept of Total Survey Quality (TSQ)
A broader-than-accuracy definition is needed, as users are not interested only in the accuracy of the estimates provided.
Accuracy is the cornerstone of quality, since without it, survey data are of little use. If the data are erroneous, it does not help much if relevance, timeliness, accessibility, comparability, coherence and completeness are sufficient.
Relevance of statistical concept (product is relevant and meets user needs)
Timeliness and punctuality in disseminating results (one of the most important user needs)
Accessibility and clarity of the information
Comparability (reliable comparisons across space and time are often crucial; cross-national comparisons)
Coherence (single source – elementary concepts can be combined in more complex ways; different sources – based on common definitions, classifications and methodological standards)
Completeness (data are rich enough to satisfy the analysis objectives)
Credibility (credible methodology)
Interpretability – documentation is clear
Richness of detail
Level of confidentiality protection
Cost – data give good value for money
Simple random sampling is often neither possible nor cost-effective. Stratifying the sample can reduce the sampling error, clustering the sample can reduce costs but would increase the sampling error.
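The stratification point above can be illustrated with a toy simulation comparing simple random sampling with proportionately stratified sampling on an artificial two-stratum population. All numbers (stratum sizes, means, spreads) are illustrative assumptions.

```python
import random
import statistics

random.seed(1)

# Artificial population: two strata with different means (illustrative)
stratum_a = [random.gauss(20, 3) for _ in range(6000)]
stratum_b = [random.gauss(40, 3) for _ in range(4000)]
population = stratum_a + stratum_b

def srs_mean(n: int) -> float:
    """Sample mean from a simple random sample of size n."""
    return statistics.fmean(random.sample(population, n))

def stratified_mean(n: int) -> float:
    """Sample mean under proportionate stratified sampling (60% / 40%)."""
    sample = (random.sample(stratum_a, int(n * 0.6)) +
              random.sample(stratum_b, int(n * 0.4)))
    return statistics.fmean(sample)

# Empirical standard error of each estimator over repeated samples
reps = 2000
srs_sd = statistics.stdev(srs_mean(100) for _ in range(reps))
strat_sd = statistics.stdev(stratified_mean(100) for _ in range(reps))
print(round(srs_sd, 2), round(strat_sd, 2))
```

Because stratification removes the between-stratum component of the variance, the stratified estimator's empirical standard error comes out markedly smaller than the SRS one at the same sample size.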
In many cases non-sampling error can be much more damaging than sampling error to estimates from surveys
Errors can be systematic or random and correlated or uncorrelated.
Uncorrelated (e.g., an interviewer mistakenly records a “yes” answer as a “no”)
Correlated (when interviewers conduct multiple interviews and when cluster sampling is used – correlated errors increase the variance of estimates due to an effective sample size that is smaller than the intended one, and thereby make it more difficult to achieve statistically significant results)
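The effective-sample-size point can be made concrete with the standard design-effect approximation deff = 1 + (m − 1)ρ, where m is the cluster (or interviewer workload) size and ρ the intra-cluster correlation. The cluster sizes and correlations below are illustrative.

```python
def design_effect(m: int, rho: float) -> float:
    """Approximate design effect for clusters of size m with
    intra-cluster correlation rho: deff = 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho

def effective_sample_size(n: int, m: int, rho: float) -> float:
    """Effective sample size n_eff = n / deff."""
    return n / design_effect(m, rho)

# Even a small intra-cluster (or per-interviewer) correlation
# shrinks the effective sample size substantially when clusters
# are large.
n = 2000
for m, rho in [(10, 0.02), (30, 0.02), (30, 0.10)]:
    print(m, rho, round(effective_sample_size(n, m, rho)))
```

For example, with 30 interviews per cluster and ρ = 0.10 the design effect is 3.9, so a nominal sample of 2000 behaves like one of roughly 513.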
Measurement errors pose a serious limitation to the validity and usefulness of the information collected via surveys. Having excellent samples representative of the target population, having high response rates, having complete data, etc. does us little good if our measurement instruments evoke responses that are fraught with error.
Measurement error is distinct from other survey errors and it is error that occurs when the recorded or observed value is different from the true value of the variable.
Reliability and validity are important in measurement error. Reliability is “agreement between two efforts to measure the same thing, using maximally similar methods”
How was the survey administered (e.g. in person, by telephone, online, multiple modes, etc.)? (sensitive questions)
Were the questions well constructed, clear, and not leading or otherwise biasing? (satisficing)
What steps, if any, were taken to ensure that respondents were providing truthful  answers to the questions, and were any respondents removed from the final dataset (e.g., identifying speeders, satisficers, multiple completions)? (in-survey behaviour)
How long was the survey in the field and how much effort was put to ensuring a good response?
What incentives, if any, were respondents offered to encourage participation? (monetary incentives could bias the survey responses towards lower income groups; some respondents could rush through)
What is the track record of the organisation that conducted the survey? (organisations with long and successful records can inspire confidence)
For face to face and telephone interviews
Potentially modify the survey design based on the analysis of paradata collected from relevant processes
CQI methods emphasize improvement in the underlying process rather than screening the product. Initial quality improvement efforts should focus on eliminating the special cause errors, since they are usually responsible for most of the errors in a process. Reducing the common cause errors will require changing the process in such a way that the error rate can be lowered. The approach has been adopted to control costs and errors in surveys.
Continuous quality improvement (CQI) uses a number of standard quality management tools: workflow diagrams, cause-and-effect (or fishbone) diagrams, Pareto histograms, statistical process control methods and various production efficiency metrics
Responsive design (Groves and Heeringa, 2006) – strategy that includes some of the ideas and concepts and approaches of CQI while providing several innovative strategies that use paradata as well as survey data to monitor nonresponse bias and follow up efficiency and effectiveness
Adaptive design – real time control of costs and errors (Schouten et al. 2013) – a strategy for tailoring key features of the survey design to different types of sample members, to maximize response rates and reduce nonresponse selectivity – appropriate when substantial prior information about sample units is available
Adaptive Total Design (Biemer, 2010) – an adaptive design strategy that combines ideas of CQI and the TSE paradigm to reduce costs and errors across multiple survey processes including frame construction, sampling, data collection and data processing
Six Sigma – set of principles and strategies for improving any process
Report that provides comprehensive picture of the quality of a survey, addressing each potential source of error
Supplemental to regular survey documentation and should be based on information that is available in many different forms such as survey methodology reports, user manuals on how to use microdata files, and technical reports providing details about specifics.
Cost-benefit trade-offs are needed to decide which errors to minimize
Quality frameworks were developed and adopted and provided statistics producers with clear description of how certain dimensions of quality can be measured and why it might be important to do so.
The survey community needs to find ways of ensuring that as broad a range as possible of relevant indicators and information is made available routinely (Lynn 2004)
The chances of users misusing the data or misinterpreting published statistics will be reduced if they understand better the strengths and limitations of the data.
The publication of data quality measures itself represents an improvement in the quality of a survey