3. Overview of the Survey Process
Data Quality and Total Survey Error
Decomposing Nonsampling Error into Its
Component Parts
Gauging the Magnitude of Total Survey Error
Mean Squared Error
Illustration of the Concepts
Topics to be Discussed
4. There is an insatiable need today for timely and
accurate information in government, business,
education, science, and in our personal lives.
The word survey is often used to describe a method of
gathering information from a sample of units, a
fraction of the persons, households, agencies, and so
on, in the population that is to be studied.
5. Figure 1. Survey process.
The planning stages of the
survey process are largely
iterative. At each stage of
planning, new information may
be revealed regarding the
feasibility of the design. The
research objectives,
questionnaire, target population,
sampling design, and
implementation strategy may be
revised several times prior to
implementing the survey.
6. Determining the Research Objectives.
Defining the Target Population.
Determining the Mode of Administration.
Developing the Questionnaire.
8. Designing the Sampling Approach.
Developing Data Collection and Data Processing
Plans.
Collecting and Processing the Data.
Estimation and Data Analysis.
9. To many users of survey data, data quality is purely a
function of the amount of error in the data. If the data are
perfectly accurate, the data are of high quality. If the data
contain a large amount of error, the data are poor quality.
For estimates of population parameters (such as means,
totals, proportions, correlations coefficients, etc.),
essentially the same criteria for data quality can be applied.
10. Assuming that a proper estimator of the population
parameter is used, an estimate of a population
parameter is of high quality if the data on which the
estimate is based are high quality. Conversely, if the
data themselves are of poor quality, the estimates will
also be poor quality.
11. However, in the case of estimates, the sample size on
which the estimates are based is also important
determinant of quality. Even if the data are high quality,
an estimate based on too few observations will be
unreliable and potentially unusable.
12. Thus, the quality of an estimator of a population
parameter is a function of the total survey error, which
includes components of error that arise solely as result
of drawing a sample rather than conducting a complete
census called sampling error components, as well as
other components that are related to the data collection
and processing procedures, called nonsampling error
components.
13. Figure 2. Total survey error. Total survey error can be partitioned
into two types of components: sampling error and nonsampling error.
14. Consider, a very simple survey aimed at estimating the
average annual income of all workers in a small
community of 5,000 workers.
Thus, the population parameter in this case is the
average income over all 5,000 workers.
Suppose the survey designer determines that a sample of
400 employees drawn at random from the community
population should be sufficient to provide an adequate
estimate of the population average income.
15. Suppose that average annual income for the persons in
the sample is $32,981.
Finally, suppose that the actual population average
annual income for this community is $35,181.
In this case, the total survey error in the estimate is:
$32,981 - $35,181 = - $2,200.
16. One major reason that a survey estimate will not be
identical to the population parameter is sampling error,
the difference between the estimate and the parameter
as a result only taking a sample of 400 workers in the
community instead of the entire population of 5,000
workers (i.e. a complete census).
17. The respondent.
The interviewer.
Refusals to participate.
Data entry.
The cumulative effect of these errors constitutes the
nonsampling error component of the total survey error.
In our example, nonsampling errors could arise from the
following sources:
18. The objective of any survey design is to minimize the
total survey error in the estimates subject to the
constraints imposed by the budget and other resources
available for the survey.
19. Table 2. Five Major Sources of Nonsampling Error and Their Potential Causes
20. Figure 3. Range of estimates produced by a sample survey subject to sampling
error, variable error, and systematic error. Shown is the range of possible
estimates of average income for a sample of size 400. The range is much
smaller with sampling error only, and when systematic nonsampling error is
introduced, the range of possible estimates may not even cover the true value.
21. The development of a survey design involves many
decisions that can affect the total survey estimate.
Making the correct design decision requires the
simultaneous consideration of many quality, cost, and
timeliness factors and choosing the combination of
design elements that minimizes the total survey error
while meeting the budget and schedule constraints.
22. An important aid in the design process is a means of
quantifying the total error in a survey process.
In this way, alternative survey designs can be compared
not only on the basis of cost and timeliness, but also in
terms of their total survey error.
23. There are many ways of quantifying the total survey
error for a survey estimate.
However, one measure that is used most often in the
survey literature is the total mean squared error (MSE).
Each estimate that will be computed from the survey
data has a corresponding MSE which reflects the effects
on the estimate of all sources of error.
24. The MSE gauges the magnitude of the total survey error,
or more precisely, the magnitude of the effect of total
survey error on the particular estimate of interest. A
small MSE indicates that total survey error is also small
and under control. A large MSE indicates that one or
more sources of error are adversely affecting the
accuracy of the estimate.
25. The primary objective of survey design is to
minimize the Mean Squared Errors (MSEs) of the
key survey estimates while staying within budget
and on schedule.
This information is important since it can influence the way the data
are used as well as the way the data are collected in the future should
the survey ever be repeated.
26. Variable Errors
Systematic Errors
Effects of Systematic and Variable Errors on
Estimates
Error Sources Can Produce Both Systematic and
Variable Errors
27. Figure 4. Two types of nonsampling error. All nonsampling error
sources produce variable error, systematic error, or both. Systematic error leads
to biased estimates; variable error affects the variance of an estimator.
28. Figure 5. Systematic and variable error expressed as targets. If these targets
represented the error in two survey designs, which survey design would you choose?
The survey design in part (a) produces estimates having a large variance & a small bias,
while the one in part (b) produces estimates having a small variance & a large bias.
29. The mean squared error (MSE) of the marksman’s aim is
the average squared distance between the hits on the
target and the bull’s-eye. The MSE of a survey estimate
is the average squared difference between the
estimates produced by many hypothetical repetitions of
the survey process (corresponding to the hits on the
target) and the population parameter value
(corresponding to the bull’s-eye).
30.
31.
32. Figure 6. Truth is unknown. Determining the better survey process when the
true value of the population parameter is unknown is like trying to judge the accuracy
of two marksmen without having a bull’s-eye. All that is known about the sets of survey
estimates is that the survey design in part (a) produces a large variance and the survey
design in part (b) produces a small variance.
36. Figure 7. Comparison of the total mean squared error for three survey
designs. Design A is preferred since it has the smallest total error, even though
sampling error is largest for this design.
37. The mean squared error is the sum of the total bias squared plus the
variance components for all the various sources of error in the
survey design.
Costs, timeliness, and other quality dimensions being equal for
competing designs, the optimal design is the one achieving the
smallest mean squared error.
The contributions of nonsampling error components to total survey
error can be many times larger than the sampling error contribution.
Choosing a survey design on the basis of sampling error or variance
alone may lead to a suboptimal design with respect to total data
quality.
Editor's Notes
As an example, consider two survey designs, design A and design B, and suppose that both designs meet the budget and schedule constraints for the survey. However, for the key characteristic to be measured in the study (e.g., the income question), the total error in the estimate for design A is + or - $3780, while the total error in estimate for design B is only + or - $1200. Since design B has a much smaller error, this is the design of choice, all other things being equal. In this way, having a way of summarizing and quantifying the total error in a survey process can provide a method for choosing between competing designs.