SlideShare a Scribd company logo
1 of 38
Introduction to
Survey Quality
The Survey Process and
Data Quality
 Overview of the Survey Process
 Data Quality and Total Survey Error
 Decomposing Nonsampling Error into Its
Component Parts
 Gauging the Magnitude of Total Survey Error
 Mean Squared Error
 Illustration of the Concepts
Topics to be Discussed
There is an insatiable need today for timely and
accurate information in government, business,
education, science, and in our personal lives.
The word survey is often used to describe a method of
gathering information from a sample of units, a
fraction of the persons, households, agencies, and so
on, in the population that is to be studied.
Figure 1. Survey process.
The planning stages of the
survey process are largely
iterative. At each stage of
planning, new information may
be revealed regarding the
feasibility of the design. The
research objectives,
questionnaire, target population,
sampling design, and
implementation strategy may be
revised several times prior to
implementing the survey.
 Determining the Research Objectives.
 Defining the Target Population.
 Determining the Mode of Administration.
 Developing the Questionnaire.
Table 1. Correspondence Between Research Questions and Survey Questionsa
 Designing the Sampling Approach.
 Developing Data Collection and Data Processing
Plans.
 Collecting and Processing the Data.
 Estimation and Data Analysis.
To many users of survey data, data quality is purely a
function of the amount of error in the data. If the data are
perfectly accurate, the data are of high quality. If the data
contain a large amount of error, the data are poor quality.
For estimates of population parameters (such as means,
totals, proportions, correlations coefficients, etc.),
essentially the same criteria for data quality can be applied.
Assuming that a proper estimator of the population
parameter is used, an estimate of a population
parameter is of high quality if the data on which the
estimate is based are high quality. Conversely, if the
data themselves are of poor quality, the estimates will
also be poor quality.
However, in the case of estimates, the sample size on
which the estimates are based is also important
determinant of quality. Even if the data are high quality,
an estimate based on too few observations will be
unreliable and potentially unusable.
Thus, the quality of an estimator of a population
parameter is a function of the total survey error, which
includes components of error that arise solely as result
of drawing a sample rather than conducting a complete
census called sampling error components, as well as
other components that are related to the data collection
and processing procedures, called nonsampling error
components.
Figure 2. Total survey error. Total survey error can be partitioned
into two types of components: sampling error and nonsampling error.
Consider, a very simple survey aimed at estimating the
average annual income of all workers in a small
community of 5,000 workers.
Thus, the population parameter in this case is the
average income over all 5,000 workers.
Suppose the survey designer determines that a sample of
400 employees drawn at random from the community
population should be sufficient to provide an adequate
estimate of the population average income.
Suppose that average annual income for the persons in
the sample is $32,981.
Finally, suppose that the actual population average
annual income for this community is $35,181.
In this case, the total survey error in the estimate is:
$32,981 - $35,181 = - $2,200.
One major reason that a survey estimate will not be
identical to the population parameter is sampling error,
the difference between the estimate and the parameter
as a result only taking a sample of 400 workers in the
community instead of the entire population of 5,000
workers (i.e. a complete census).
 The respondent.
 The interviewer.
 Refusals to participate.
 Data entry.
The cumulative effect of these errors constitutes the
nonsampling error component of the total survey error.
In our example, nonsampling errors could arise from the
following sources:
The objective of any survey design is to minimize the
total survey error in the estimates subject to the
constraints imposed by the budget and other resources
available for the survey.
Table 2. Five Major Sources of Nonsampling Error and Their Potential Causes
Figure 3. Range of estimates produced by a sample survey subject to sampling
error, variable error, and systematic error. Shown is the range of possible
estimates of average income for a sample of size 400. The range is much
smaller with sampling error only, and when systematic nonsampling error is
introduced, the range of possible estimates may not even cover the true value.
The development of a survey design involves many
decisions that can affect the total survey estimate.
Making the correct design decision requires the
simultaneous consideration of many quality, cost, and
timeliness factors and choosing the combination of
design elements that minimizes the total survey error
while meeting the budget and schedule constraints.
An important aid in the design process is a means of
quantifying the total error in a survey process.
In this way, alternative survey designs can be compared
not only on the basis of cost and timeliness, but also in
terms of their total survey error.
There are many ways of quantifying the total survey
error for a survey estimate.
However, one measure that is used most often in the
survey literature is the total mean squared error (MSE).
Each estimate that will be computed from the survey
data has a corresponding MSE which reflects the effects
on the estimate of all sources of error.
The MSE gauges the magnitude of the total survey error,
or more precisely, the magnitude of the effect of total
survey error on the particular estimate of interest. A
small MSE indicates that total survey error is also small
and under control. A large MSE indicates that one or
more sources of error are adversely affecting the
accuracy of the estimate.
The primary objective of survey design is to
minimize the Mean Squared Errors (MSEs) of the
key survey estimates while staying within budget
and on schedule.
This information is important since it can influence the way the data
are used as well as the way the data are collected in the future should
the survey ever be repeated.
 Variable Errors
 Systematic Errors
 Effects of Systematic and Variable Errors on
Estimates
 Error Sources Can Produce Both Systematic and
Variable Errors
Figure 4. Two types of nonsampling error. All nonsampling error
sources produce variable error, systematic error, or both. Systematic error leads
to biased estimates; variable error affects the variance of an estimator.
Figure 5. Systematic and variable error expressed as targets. If these targets
represented the error in two survey designs, which survey design would you choose?
The survey design in part (a) produces estimates having a large variance & a small bias,
while the one in part (b) produces estimates having a small variance & a large bias.
The mean squared error (MSE) of the marksman’s aim is
the average squared distance between the hits on the
target and the bull’s-eye. The MSE of a survey estimate
is the average squared difference between the
estimates produced by many hypothetical repetitions of
the survey process (corresponding to the hits on the
target) and the population parameter value
(corresponding to the bull’s-eye).
Figure 6. Truth is unknown. Determining the better survey process when the
true value of the population parameter is unknown is like trying to judge the accuracy
of two marksmen without having a bull’s-eye. All that is known about the sets of survey
estimates is that the survey design in part (a) produces a large variance and the survey
design in part (b) produces a small variance.
Table 3. Computation of the Mean Squared Error Using Method 1a
Table 4. Computation of the Mean Squared Error Using Method 2a
Figure 7. Comparison of the total mean squared error for three survey
designs. Design A is preferred since it has the smallest total error, even though
sampling error is largest for this design.
 The mean squared error is the sum of the total bias squared plus the
variance components for all the various sources of error in the
survey design.
 Costs, timeliness, and other quality dimensions being equal for
competing designs, the optimal design is the one achieving the
smallest mean squared error.
 The contributions of nonsampling error components to total survey
error can be many times larger than the sampling error contribution.
 Choosing a survey design on the basis of sampling error or variance
alone may lead to a suboptimal design with respect to total data
quality.
Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx

More Related Content

Similar to Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx

Chaplowe - M&E Planning 2008 - shortcuts
Chaplowe - M&E Planning 2008 - shortcutsChaplowe - M&E Planning 2008 - shortcuts
Chaplowe - M&E Planning 2008 - shortcutssgchaplowe
 
A Novel Performance Measure for Machine Learning Classification
A Novel Performance Measure for Machine Learning ClassificationA Novel Performance Measure for Machine Learning Classification
A Novel Performance Measure for Machine Learning ClassificationIJMIT JOURNAL
 
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATIONA NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATIONIJMIT JOURNAL
 
A Novel Performance Measure For Machine Learning Classification
A Novel Performance Measure For Machine Learning ClassificationA Novel Performance Measure For Machine Learning Classification
A Novel Performance Measure For Machine Learning ClassificationKarin Faust
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxVishalLabde
 
Executive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinExecutive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinBetseyCalderon89
 
Using Investigative Analytics to Speed New Drugs to Market
Using Investigative Analytics to Speed New Drugs to MarketUsing Investigative Analytics to Speed New Drugs to Market
Using Investigative Analytics to Speed New Drugs to MarketCognizant
 
Software Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeSoftware Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeEditor IJMTER
 
Sample size determination.pptx
Sample size determination.pptxSample size determination.pptx
Sample size determination.pptxMohammedAbdela7
 
Introduction to Epi Info.pdf
Introduction to Epi Info.pdfIntroduction to Epi Info.pdf
Introduction to Epi Info.pdfYomif3
 
Certified Specialist Business Intelligence (.docx
Certified     Specialist     Business  Intelligence     (.docxCertified     Specialist     Business  Intelligence     (.docx
Certified Specialist Business Intelligence (.docxdurantheseldine
 
Sample size
Sample sizeSample size
Sample sizezubis
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Praveen Chand
 
Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14Mazhar Poohlah
 

Similar to Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx (20)

Chaplowe - M&E Planning 2008 - shortcuts
Chaplowe - M&E Planning 2008 - shortcutsChaplowe - M&E Planning 2008 - shortcuts
Chaplowe - M&E Planning 2008 - shortcuts
 
A Novel Performance Measure for Machine Learning Classification
A Novel Performance Measure for Machine Learning ClassificationA Novel Performance Measure for Machine Learning Classification
A Novel Performance Measure for Machine Learning Classification
 
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATIONA NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
A NOVEL PERFORMANCE MEASURE FOR MACHINE LEARNING CLASSIFICATION
 
A Novel Performance Measure For Machine Learning Classification
A Novel Performance Measure For Machine Learning ClassificationA Novel Performance Measure For Machine Learning Classification
A Novel Performance Measure For Machine Learning Classification
 
1.pdf
1.pdf1.pdf
1.pdf
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Executive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poinExecutive Program Practical Connection Assignment - 100 poin
Executive Program Practical Connection Assignment - 100 poin
 
Data quality: total survey error
Data quality: total survey errorData quality: total survey error
Data quality: total survey error
 
Sampling method
Sampling methodSampling method
Sampling method
 
Using Investigative Analytics to Speed New Drugs to Market
Using Investigative Analytics to Speed New Drugs to MarketUsing Investigative Analytics to Speed New Drugs to Market
Using Investigative Analytics to Speed New Drugs to Market
 
Question 1
Question 1Question 1
Question 1
 
Sampling methods in medical research
Sampling methods in medical researchSampling methods in medical research
Sampling methods in medical research
 
Software Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking SchemeSoftware Cost Estimation Using Clustering and Ranking Scheme
Software Cost Estimation Using Clustering and Ranking Scheme
 
Sample size determination.pptx
Sample size determination.pptxSample size determination.pptx
Sample size determination.pptx
 
Introduction to Epi Info.pdf
Introduction to Epi Info.pdfIntroduction to Epi Info.pdf
Introduction to Epi Info.pdf
 
Certified Specialist Business Intelligence (.docx
Certified     Specialist     Business  Intelligence     (.docxCertified     Specialist     Business  Intelligence     (.docx
Certified Specialist Business Intelligence (.docx
 
Chapter 18
Chapter 18Chapter 18
Chapter 18
 
Sample size
Sample sizeSample size
Sample size
 
Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016Cenduit_Whitepaper_Forecasting_Present_14June2016
Cenduit_Whitepaper_Forecasting_Present_14June2016
 
Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14Research Method for Business chapter 11-12-14
Research Method for Business chapter 11-12-14
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Pertemuan 3 & 4 - Pengendalian Mutu Statistik.pptx

  • 2. The Survey Process and Data Quality
  • 3.  Overview of the Survey Process  Data Quality and Total Survey Error  Decomposing Nonsampling Error into Its Component Parts  Gauging the Magnitude of Total Survey Error  Mean Squared Error  Illustration of the Concepts Topics to be Discussed
  • 4. There is an insatiable need today for timely and accurate information in government, business, education, science, and in our personal lives. The word survey is often used to describe a method of gathering information from a sample of units, a fraction of the persons, households, agencies, and so on, in the population that is to be studied.
  • 5. Figure 1. Survey process. The planning stages of the survey process are largely iterative. At each stage of planning, new information may be revealed regarding the feasibility of the design. The research objectives, questionnaire, target population, sampling design, and implementation strategy may be revised several times prior to implementing the survey.
  • 6.  Determining the Research Objectives.  Defining the Target Population.  Determining the Mode of Administration.  Developing the Questionnaire.
  • 7. Table 1. Correspondence Between Research Questions and Survey Questionsa
  • 8.  Designing the Sampling Approach.  Developing Data Collection and Data Processing Plans.  Collecting and Processing the Data.  Estimation and Data Analysis.
  • 9. To many users of survey data, data quality is purely a function of the amount of error in the data. If the data are perfectly accurate, the data are of high quality. If the data contain a large amount of error, the data are poor quality. For estimates of population parameters (such as means, totals, proportions, correlations coefficients, etc.), essentially the same criteria for data quality can be applied.
  • 10. Assuming that a proper estimator of the population parameter is used, an estimate of a population parameter is of high quality if the data on which the estimate is based are high quality. Conversely, if the data themselves are of poor quality, the estimates will also be poor quality.
  • 11. However, in the case of estimates, the sample size on which the estimates are based is also important determinant of quality. Even if the data are high quality, an estimate based on too few observations will be unreliable and potentially unusable.
  • 12. Thus, the quality of an estimator of a population parameter is a function of the total survey error, which includes components of error that arise solely as result of drawing a sample rather than conducting a complete census called sampling error components, as well as other components that are related to the data collection and processing procedures, called nonsampling error components.
  • 13. Figure 2. Total survey error. Total survey error can be partitioned into two types of components: sampling error and nonsampling error.
  • 14. Consider, a very simple survey aimed at estimating the average annual income of all workers in a small community of 5,000 workers. Thus, the population parameter in this case is the average income over all 5,000 workers. Suppose the survey designer determines that a sample of 400 employees drawn at random from the community population should be sufficient to provide an adequate estimate of the population average income.
  • 15. Suppose that average annual income for the persons in the sample is $32,981. Finally, suppose that the actual population average annual income for this community is $35,181. In this case, the total survey error in the estimate is: $32,981 - $35,181 = - $2,200.
  • 16. One major reason that a survey estimate will not be identical to the population parameter is sampling error, the difference between the estimate and the parameter as a result only taking a sample of 400 workers in the community instead of the entire population of 5,000 workers (i.e. a complete census).
  • 17.  The respondent.  The interviewer.  Refusals to participate.  Data entry. The cumulative effect of these errors constitutes the nonsampling error component of the total survey error. In our example, nonsampling errors could arise from the following sources:
  • 18. The objective of any survey design is to minimize the total survey error in the estimates subject to the constraints imposed by the budget and other resources available for the survey.
  • 19. Table 2. Five Major Sources of Nonsampling Error and Their Potential Causes
  • 20. Figure 3. Range of estimates produced by a sample survey subject to sampling error, variable error, and systematic error. Shown is the range of possible estimates of average income for a sample of size 400. The range is much smaller with sampling error only, and when systematic nonsampling error is introduced, the range of possible estimates may not even cover the true value.
  • 21. The development of a survey design involves many decisions that can affect the total survey estimate. Making the correct design decision requires the simultaneous consideration of many quality, cost, and timeliness factors and choosing the combination of design elements that minimizes the total survey error while meeting the budget and schedule constraints.
  • 22. An important aid in the design process is a means of quantifying the total error in a survey process. In this way, alternative survey designs can be compared not only on the basis of cost and timeliness, but also in terms of their total survey error.
  • 23. There are many ways of quantifying the total survey error for a survey estimate. However, one measure that is used most often in the survey literature is the total mean squared error (MSE). Each estimate that will be computed from the survey data has a corresponding MSE which reflects the effects on the estimate of all sources of error.
  • 24. The MSE gauges the magnitude of the total survey error, or more precisely, the magnitude of the effect of total survey error on the particular estimate of interest. A small MSE indicates that total survey error is also small and under control. A large MSE indicates that one or more sources of error are adversely affecting the accuracy of the estimate.
  • 25. The primary objective of survey design is to minimize the Mean Squared Errors (MSEs) of the key survey estimates while staying within budget and on schedule. This information is important since it can influence the way the data are used as well as the way the data are collected in the future should the survey ever be repeated.
  • 26.  Variable Errors  Systematic Errors  Effects of Systematic and Variable Errors on Estimates  Error Sources Can Produce Both Systematic and Variable Errors
  • 27. Figure 4. Two types of nonsampling error. All nonsampling error sources produce variable error, systematic error, or both. Systematic error leads to biased estimates; variable error affects the variance of an estimator.
  • 28. Figure 5. Systematic and variable error expressed as targets. If these targets represented the error in two survey designs, which survey design would you choose? The survey design in part (a) produces estimates having a large variance & a small bias, while the one in part (b) produces estimates having a small variance & a large bias.
  • 29. The mean squared error (MSE) of the marksman’s aim is the average squared distance between the hits on the target and the bull’s-eye. The MSE of a survey estimate is the average squared difference between the estimates produced by many hypothetical repetitions of the survey process (corresponding to the hits on the target) and the population parameter value (corresponding to the bull’s-eye).
  • 30.
  • 31.
  • 32. Figure 6. Truth is unknown. Determining the better survey process when the true value of the population parameter is unknown is like trying to judge the accuracy of two marksmen without having a bull’s-eye. All that is known about the sets of survey estimates is that the survey design in part (a) produces a large variance and the survey design in part (b) produces a small variance.
  • 33. Table 3. Computation of the Mean Squared Error Using Method 1a
  • 34. Table 4. Computation of the Mean Squared Error Using Method 2a
  • 35.
  • 36. Figure 7. Comparison of the total mean squared error for three survey designs. Design A is preferred since it has the smallest total error, even though sampling error is largest for this design.
  • 37.  The mean squared error is the sum of the total bias squared plus the variance components for all the various sources of error in the survey design.  Costs, timeliness, and other quality dimensions being equal for competing designs, the optimal design is the one achieving the smallest mean squared error.  The contributions of nonsampling error components to total survey error can be many times larger than the sampling error contribution.  Choosing a survey design on the basis of sampling error or variance alone may lead to a suboptimal design with respect to total data quality.

Editor's Notes

  1. As an example, consider two survey designs, design A and design B, and suppose that both designs meet the budget and schedule constraints for the survey. However, for the key characteristic to be measured in the study (e.g., the income question), the total error in the estimate for design A is + or - $3780, while the total error in estimate for design B is only + or - $1200. Since design B has a much smaller error, this is the design of choice, all other things being equal. In this way, having a way of summarizing and quantifying the total error in a survey process can provide a method for choosing between competing designs.