1. COMPLEX SAMPLING
Siti Haslinda Mohd Din
Statistician
Institute for Public Health
2. JUST A MINUTE
One day some papers catch fire in a wastebasket
in the Dean’s office. Luckily, a physicist, a chemist
and a statistician happen to be nearby.
Naturally, they rush in to help.
The physicist whips out a notebook and starts
to work on how much energy would have to be
removed from the fire in order to stop the
combustion.
The chemist works on determining which reagent
would have to be added to the fire to prevent
oxidation.
While they are doing this, the statistician is
setting fires to all the other wastebaskets
in the adjacent offices.
“What are you doing????” the Dean demands.
To which the statistician replies,
“To solve a problem of this magnitude, you need a
LARGE SAMPLE SIZE.”
http://www.amstat.org/publications/ise/v10n3/friedman.html
3. Survey Sampling
• The subject of survey sampling is
concerned with the process of
selecting members of the
population to be included in the
survey and with estimation from the sample.
• A sample design needs to be
developed to meet the survey
objectives.
4. Properties of complex sampling
A given complex sample can have some or all of the following features:
STRATIFICATION
+
CLUSTER
+
MULTISTAGE
5. Properties of complex sampling
Stratification
- Selecting samples independently within non-
overlapping subgroups of the population, or
strata.
For example,
strata may be socioeconomic groups, job
categories, age groups, or ethnic groups.
- With stratification, you can
• ensure adequate sample sizes for subgroups of
interest,
• improve the precision of overall
estimates, and
• use different sampling methods from
stratum to stratum.
6. Properties of complex sampling
Clustering.
• Involves the selection of groups of
sampling units, or clusters.
For example, clusters may be schools,
hospitals, or geographical areas, and
sampling units may be students, patients,
or citizens.
• Clustering is common in multistage
designs and area (geographic) samples.
7. Properties of complex sampling
Multiple stages.
• In multistage sampling,
– a first-stage sample is selected based on clusters;
– a second-stage sample is created by drawing subsamples from
the selected clusters;
– if the second-stage sample is based on
subclusters, a third stage can be added to the sample.
For example:
• In the first stage of a survey, a sample of cities
could be drawn.
• Next, from the selected cities, households could
be sampled.
• Finally, from the selected households,
individuals could be polled.
8. Example : South Zone
[Figure: the South Zone is stratified by state (Johor, Negeri Sembilan, Melaka).
Each state is further stratified into Urban and Rural.
Within each stratum, enumeration blocks (EB) form the clusters.
Legend: selected enumeration block; not selected enumeration block.]
9. Sampling Weight
• Uniform in simple random sampling (SRS) but varies in
unequal-probability sampling
• Sampling weights are automatically
computed while drawing a complex sample
and ideally correspond to the “frequency”
that each sampling unit represents in the
target population. Therefore, the sum of
the weights over the sample should
estimate the population size.
10. Sampling Weight
• Weights are used for
– compensating for unequal probabilities of selection
– nonresponse adjustment (for units that fail to
respond)
– post-stratification, which adjusts the weighted
sample distribution for certain variables (e.g.
age and sex) to make it conform to the
known population distribution,
in order to improve the precision of sample
estimates and to compensate for
noncoverage and nonresponse
11. Basic weighting approach
• Suppose sample element i was
selected with probability π_i.
• Then sample element i
represents 1/π_i elements in the
population.
W_i = 1/π_i
• Example : a sample element selected
with probability 1/10 represents 10
elements in the population
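The inverse-probability rule can be sketched in a few lines of Python. The selection probabilities below are hypothetical values chosen for illustration and do not come from any survey in these slides; the function name `base_weights` is likewise an assumption.

```python
# Hypothetical selection probabilities for five sampled elements
# (illustrative values only, not taken from the slides).
selection_probs = [0.1, 0.1, 0.05, 0.2, 0.05]


def base_weights(probs):
    """Return the base weight 1/pi_i for each selection probability pi_i."""
    if any(p <= 0 or p > 1 for p in probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    return [1.0 / p for p in probs]


weights = base_weights(selection_probs)

# The sum of the weights over the sample estimates the population size.
estimated_population_size = sum(weights)
print(weights, estimated_population_size)
```

An element selected with probability 1/10 receives weight 10, which matches the example on this slide.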
12. Weighting for Unequal
Probabilities of Selection
• Consider an EPSEM (Equal Probability of
Selection Method) sample of 6 households
selected from 240 households. One adult is
selected at random in each selected household.
• The probability of selection of the βth adult in household α is
– P(αβ) = P(α) × P(β|α) = f × (1/Bα) = 1/wα
– where f = sampling fraction for households and Bα = number of adults in household α
If f = 6/240 = 1/40 and Bα = 3,
then P(αβ) = (1/40) × (1/3) = 1/120.
Therefore each such adult represents 120 adults
in the population; W = 120
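The arithmetic on this slide can be checked with exact fractions. This is a minimal sketch; the function name `adult_selection_weight` is an assumption introduced for illustration.

```python
from fractions import Fraction


def adult_selection_weight(households_sampled, households_total, adults_in_household):
    """Return (selection probability, weight) for one adult chosen at random
    in a household that was itself sampled with equal probability."""
    f = Fraction(households_sampled, households_total)    # P(alpha)
    p_within = Fraction(1, adults_in_household)           # P(beta | alpha)
    p = f * p_within                                      # P(alpha beta)
    return p, 1 / p


# Slide values: 6 of 240 households, 3 adults in the household.
p, w = adult_selection_weight(6, 240, 3)
print(p, w)

# The weight grows with household size even though households are EPSEM.
weights_by_size = {b: adult_selection_weight(6, 240, b)[1] for b in (1, 2, 3)}
print(weights_by_size)
```

Adults in larger households have a smaller chance of selection, so they carry larger weights.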
13. Non response
Sources of failure to obtain observations
(responses, measurements) on some elements
selected and designated for the sample;
•Not at home
•Refusals
•Incapacitated or inability
•Not found
•Lost schedules
Nonresponse (NR) refers to eligible respondents and should exclude
ineligibles, which include vacant dwellings and households without
the specified kinds of population elements.
The NR rate is computed from responses and
nonresponses among the eligible only.
14. Disposition of the sample
with components of response and nonresponse
(1) Total Units (Initial Sample)
  (2) Resolved
    (4) Units in Scope
      (6) Respondents
        (11) Refusal Conversions
        (12) Other Respondents
      (7) Nonrespondents
        (13) Refusals
        (14) Noncontacts
        (15) Other Nonrespondents
    (5) Units Out of Scope
      (8) Nonexistent Units
      (9) Units Temporarily Out of Scope
      (10) Units Permanently Out of Scope
  (3) Unresolved
    (3A) Estimated Units in Scope
    (3B) Estimated Units Out of Scope
Response rate = [6]/([4]+[3A])
Non response rate = ([7]+[3A])/([4]+[3A])
Estimated units in scope: [3A] = ([4]/[2]) × [3]
Adapted from Hidiroglou et al (1993)
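The three formulas on this slide can be sketched as a small function. The counts passed in below are hypothetical and chosen only to make the arithmetic easy to follow; the function name `disposition_rates` is an assumption. The bracketed numbers in the comments refer to the boxes of the diagram.

```python
def disposition_rates(resolved, unresolved, in_scope, respondents):
    """Return (estimated unresolved units in scope, response rate,
    nonresponse rate) using the formulas on the slide."""
    nonrespondents = in_scope - respondents               # [7] = [4] - [6]
    est_in_scope = in_scope / resolved * unresolved       # [3A] = [4]/[2] x [3]
    denominator = in_scope + est_in_scope                 # [4] + [3A]
    response_rate = respondents / denominator             # [6] / ([4] + [3A])
    nonresponse_rate = (nonrespondents + est_in_scope) / denominator
    return est_in_scope, response_rate, nonresponse_rate


# Hypothetical counts (not from any survey in the slides).
est_3a, rr, nrr = disposition_rates(
    resolved=900, unresolved=100, in_scope=810, respondents=648
)
print(est_3a, rr, nrr)
```

Because box [4] is the sum of boxes [6] and [7], the two rates add up to 1.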
15. Weighting for Non response
• Compute weighted response rates in
subgroups of the sample.
• Use the inverse of the subgroup response
rates for non-response adjustment.
• The weighted response rate =
(weighted # completed interviews with eligible elements) /
(weighted # eligible elements in sample)
• Exclude empty dwellings, destroyed dwellings, addresses
that are not dwellings and ineligible elements.
W2 = 1 / response rate
= nh / nh’
where nh = # of eligible sampled units in subgroup h
nh’ = # of units that actually responded in subgroup h
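A minimal sketch of the nonresponse adjustment for one subgroup follows. The base weights and response flags are hypothetical, and the function name `nonresponse_adjustment` is an assumption; ineligible units are assumed to have been removed before the call.

```python
def nonresponse_adjustment(eligible_weights, responded):
    """Return (weighted response rate, W2) for one subgroup.

    eligible_weights: base weights of the eligible sampled units.
    responded: matching list of booleans, True if the unit responded.
    """
    weighted_eligible = sum(eligible_weights)
    weighted_respondents = sum(
        w for w, r in zip(eligible_weights, responded) if r
    )
    if weighted_respondents == 0:
        raise ValueError("no respondents in subgroup; collapse subgroups first")
    rate = weighted_respondents / weighted_eligible
    return rate, 1.0 / rate


# Hypothetical subgroup: four eligible units, one nonrespondent.
rate, w2 = nonresponse_adjustment([40, 40, 80, 120], [True, True, False, True])
print(rate, w2)
```

Respondents in the subgroup would then have their base weight multiplied by `w2`, so that they also represent the nonrespondents.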
16. Total weight
• W = W1 × W2
W1 = weight for unequal selection probabilities
W2 = weight for non-response
17. Weighting for Post Stratification
• The weighted sample distribution is made to conform
to a known population distribution.
• Suppose the known population of females aged 25-64
living in the North area is 12,800,100,
whereas the weighted sample total for that category is
11,325,553.
• Therefore, the post stratification weight is
W3 = 12,800,100/11,325,553
= 1.13
In general,
W3 = (# in the population in the specific category) /
(weighted # sampled in the specific category)
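The ratio on this slide can be reproduced directly. This is a small sketch using the two totals quoted above; the function name `poststratification_weight` is an assumption.

```python
def poststratification_weight(population_total, weighted_sample_total):
    """Return W3, the ratio of the known population total to the
    weighted sample total for one post-stratum."""
    if weighted_sample_total <= 0:
        raise ValueError("weighted sample total must be positive")
    return population_total / weighted_sample_total


# Totals quoted on the slide for females aged 25-64 in the North area.
w3 = poststratification_weight(12_800_100, 11_325_553)
print(round(w3, 2))
```

A value above 1 means the weighted sample under-represents the category, so its weights are scaled up.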
18. Total weight
• W = W1 × W2 × W3
W1 = weight for unequal selection probabilities
W2 = weight for non-response
W3 = weight for post stratification
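As an illustration only, the three components can be multiplied for a single respondent. W1 = 120 and W3 = 1.13 are borrowed from the earlier slides, W2 = 1.25 is a hypothetical nonresponse adjustment (an assumed 80% response rate), and combining numbers from different examples is purely for demonstration.

```python
def total_weight(w1, w2, w3=1.0):
    """Return the final weight as the product of the selection weight,
    the nonresponse adjustment and the post-stratification adjustment."""
    for name, value in (("w1", w1), ("w2", w2), ("w3", w3)):
        if value <= 0:
            raise ValueError(f"{name} must be positive")
    return w1 * w2 * w3


w1 = 120.0        # selection weight from the household example
w2 = 1 / 0.80     # hypothetical: assumed 80% response rate in the subgroup
w3 = 1.13         # post-stratification weight from the previous slide

final_weight = total_weight(w1, w2, w3)
print(final_weight)
```

Leaving `w3` at its default of 1.0 gives the two-component weight W = W1 × W2 shown earlier.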
19. Variance Estimation
• Linearization
– Taylor Series approximation
(Wolter 1985)
• Best for simple statistics, e.g. the weighted mean
(Frankel, 1971)
• Replication (Resampling method)
– Balanced Repeated Replication (BRR)
– Jackknife estimation
(Kish & Frankel 1974; Krewski and Rao 1981; Kovar, Rao
and Wu 1988; Rao, Wu, and Yue 1992; Shao 1996)
• Maximum-likelihood estimates (Brillinger, 1964)
• Best for complex statistics like regression
coefficients (Frankel,1971)
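To make the replication idea concrete, here is a minimal sketch of a delete-one-cluster jackknife for a weighted mean. It assumes an unstratified design with clusters as primary sampling units and measures replicate deviations from the full-sample estimate; survey packages offer stratified and other variants, so this is an illustration and not the exact method of any package listed in these slides.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def jackknife_variance(values, weights, clusters):
    """Delete-one-cluster jackknife variance of the weighted mean.

    Assumes a single stratum. Each replicate drops one whole cluster
    and recomputes the estimate from the remaining units.
    """
    labels = sorted(set(clusters))
    k = len(labels)
    if k < 2:
        raise ValueError("need at least two clusters")
    full_estimate = weighted_mean(values, weights)
    squared_deviations = 0.0
    for label in labels:
        keep = [i for i, c in enumerate(clusters) if c != label]
        replicate = weighted_mean(
            [values[i] for i in keep], [weights[i] for i in keep]
        )
        squared_deviations += (replicate - full_estimate) ** 2
    return (k - 1) / k * squared_deviations


# Hypothetical data: four single-unit clusters with equal weights.
var_hat = jackknife_variance([1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 2, 3])
print(var_hat)
```

With equal weights and one unit per cluster, the result equals the familiar s²/n for a simple random sample, which is a useful sanity check.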
20. Available Software
• Stata
• SAS
• SUDAAN (Research Triangle Institute)
• WesVar
• SPSS
• NASSTIM&NASSVAR
• etc
21. Comparison of the proportion of smoking
pregnant mothers, by years of schooling
Years of schooling   Weighted proportion   Unweighted proportion
< 12 years           0.315 ± 0.010         0.328 ± 0.007
12 years             0.373 ± 0.012         0.332 ± 0.008
> 12 years           0.202 ± 0.011         0.217 ± 0.008
Data source : National Maternal and Child Survey 1988, US
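The gap between weighted and unweighted estimates can be seen with a toy calculation. The five records below are hypothetical and are not taken from the survey in the table; they only show how the two proportions are computed.

```python
def unweighted_proportion(flags):
    """Share of sampled units with the attribute, ignoring weights."""
    return sum(flags) / len(flags)


def weighted_proportion(flags, weights):
    """Weighted share: sum of weights with the attribute / sum of all weights."""
    return sum(w for f, w in zip(flags, weights) if f) / sum(weights)


# Hypothetical sample: 1 = smoker, 0 = non-smoker, with unequal weights.
smoker = [1, 0, 0, 1, 0]
weight = [120, 40, 40, 80, 120]

p_unweighted = unweighted_proportion(smoker)
p_weighted = weighted_proportion(smoker, weight)
print(p_unweighted, p_weighted)
```

The two estimates differ whenever the attribute is related to the weights, which is why ignoring the design can bias the result.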
22. Comparison of the highest prevalence
by states and gender
Prevalence (%)
States         SPSS              STATA
               Male    Female    Male    Female
Johor          26.97   29.62     25.39   28.75
Kedah          20.36   28.39     19.63   27.12
Kelantan       19.00   27.09     16.08   24.39
Melaka         29.06   34.84     29.67   33.99
N. Sembilan    30.99   34.56     28.40   34.18
Pahang         26.27   37.48     24.06   39.02
P. Pinang      24.81   28.80     24.40   27.09
Perak          27.21   31.96     26.58   31.02
Perlis         24.91   35.98     22.49   35.29
Selangor       25.65   28.66     25.26   26.73
Sarawak        21.46   26.73     17.41   28.18
Sabah          22.84   26.51     18.28   25.80
Terengganu     26.58   35.17     33.75   32.17
WPKL           30.80   29.94     30.29   29.39
Source : National Health Morbidity Survey 1996
23. The difference in the group with the highest
prevalence of obesity among adults in Kedah,
by gender and ethnicity
                 Gender   Ethnicity   Prevalence   S.E    (95% CI)
Without weight   Female   Indian      32.35        5.68   (21.22, 43.48)
With weights     Female   Chinese     29.87        4.54   (20.98, 38.76)
Source : National Health Morbidity Survey 1996
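The confidence limits in the table can be approximated from the estimate and its standard error. This sketch assumes a normal approximation with multiplier 1.96, which is an assumption about how the intervals were produced. It reproduces the unweighted row exactly to two decimals and the weighted row to within about 0.01, so the original software may have used a slightly different multiplier or rounding.

```python
def normal_ci(estimate, standard_error, z=1.96):
    """Return (lower, upper) limits of a normal-approximation interval."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Unweighted row of the table: prevalence 32.35, S.E. 5.68.
low, high = normal_ci(32.35, 5.68)
print(round(low, 2), round(high, 2))

# Weighted row of the table: prevalence 29.87, S.E. 4.54.
low_w, high_w = normal_ci(29.87, 4.54)
print(round(low_w, 2), round(high_w, 2))
```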
24. Things to be considered if a design-
based inference approach is chosen
• What is the nature of the sample design? Was a
stratified multistage sample design used? Was a
cluster sample design used? Were unequal probabilities of
selection applied?
• Were there adjustments for nonresponse or coverage
errors? Is there a weight or several weights that must
be applied when different parts of the sample are
analyzed?
• Are there important measurement issues that
could affect survey analyses? Is item
nonresponse an important problem for some
variables?
• How can the results be interpreted, and
what kinds of inference are appropriate in
view of the complex survey design?
25. Steps required for performing
a design-based analysis
Paul S. Levy and Stanley Lemeshow (1999)
• Identify the following elements of the sample
design:
• Stratification
• Clustering
• Population sizes required for determination of finite
population correction
• Determine the sampling weight
• Determine a final sampling weight, adjusted for nonresponse and
post stratification
• Ensure that the data required for an appropriate
design-based analysis are available
• Determine the procedure and the set of commands
for performing the required analysis
• Run the analysis and carefully interpret findings
26. Further reading
• Skinner, C. J., Holt, D., and Smith, T. M. F. 1989. Analysis of Complex
Surveys. New York: John Wiley and Sons.
• Levy, P. S., and Lemeshow, S. 1999. Sampling of Populations: Methods
and Applications, 3rd ed. John Wiley and Sons.
• Cochran, W. G. 1977. Sampling Techniques. New York: John Wiley
and Sons.
• Kish, L. 1965. Survey Sampling. New York: John Wiley and Sons.
• Kish, L. 1987. Statistical Design for Research. New York: John
Wiley and Sons.
• Murthy, M. N. 1967. Sampling Theory and Methods. Calcutta,
India: Statistical Publishing Society.
• Korn, E. L., and Graubard, B. I. 1995. Examples of Differing Weighted and
Unweighted Estimates From a Sample Survey. The American
Statistician 49(3): 291-295.
• Lee, E. S., Forthofer, R. N., and Lorimor, R. J. 1986. Analysis of
Complex Sample Survey Data: Problems and Strategies.
Sociological Methods & Research 15: 69-100.