5. Cluster
• A cluster is an aggregate or group, consisting
of several (nonhomogeneuos) population
elements
6. Intuition…
• Study variable: Income/ Awarness/ health status
etc
• Ghatbhogh, Rupsa,
Naihati
• PSU: Primary sampling
Unit
• Single stage sampling
Sample
Collect Information
from all individual
14. Definition…
• Cluster sampling is a method of sampling,
which consists of first selecting, at random
groups, called clusters of elements from the
population, and then choosing all of the
elements within each cluster to make up the
sample. (M. Nurul Islam)
21. Single-stage cluster sampling (equal)
• Theorem 8.1: defined mean is unbiased and
estimate the variance of mean.
(Need intra-cluster correlation discussed next
slide)
• 𝑉 𝑦𝑛 =
(1−𝑓)(𝑁𝑀−1)
n𝑀2(𝑁−1)
𝑆2
[1 + (𝑀 − 1)𝜌]
Or 𝑉 𝑦𝑛 ≈
1−𝑓
nM
𝑆2
[1 + (𝑀 − 1)𝜌]
22. Intra-cluster correlation
• The similarity of observations within a cluster
can be quantified by means of the Intracluster
Correlation Coefficient (ICC), sometimes also
referred to as intraclass correlation coefficient.
• This is very similar to the well known
Pearson’s correlation coefficient; only that we
do not simultaneously look at observations of
two variables on the same object but we look
simultaneously on two values of the same
variable, but taken at two different objects.
• Calculation like Auto-correlation (discussed)
24. Variance in terms of 𝜌
• 𝑉 𝑦𝑛 =
𝑁−𝑛
𝑁
1
n
𝑖=1
𝑁
𝑦 𝑖− 𝑌 2
𝑁−1
• Expand the squared term and relate with 𝜌
• 𝑉 𝑦𝑛 =
(1−𝑓)(𝑁𝑀−1)
n𝑀2(𝑁−1)
𝑆2
[1 + (𝑀 − 1)𝜌]
• If N large 𝑁𝑀 − 1 ≈ 𝑁𝑀 and 𝑁 − 1 ≈ 𝑁
• 𝑉 𝑦𝑛 ≈
1−𝑓
nM
𝑆2
[1 + (𝑀 − 1)𝜌]
• 𝑉 𝑦𝑛 =
1−𝑓
nM
𝑆2
[1 + (𝑀 − 1)𝜌] [simplicity ]
25. Design effect
• Variance of 𝐶𝑙𝑢𝑠𝑡𝑒𝑟 𝑠𝑎𝑚𝑝𝑙𝑖𝑛𝑔
• 𝑉 𝑦𝑛 =
1−𝑓
nM
𝑆2
[1 + (𝑀 − 1)𝜌]
• Variance of 𝑆𝑖𝑚𝑝𝑙𝑒 𝑟𝑎𝑛𝑑𝑜𝑚 𝑠𝑎𝑚𝑝𝑙𝑖𝑛𝑔
• 𝑉 𝑦 𝑛𝑀 =
𝑁𝑀−𝑛𝑀
𝑁𝑀
𝑆2
𝑛𝑀
=
1−𝑓
nM
𝑆2
• Dividing
𝑉 𝑦 𝑛
𝑉 𝑦 𝑛𝑀
= 1 + 𝑀 − 1 𝜌 = Deff
• What is the inter pretation of Design effect?
– It’s simple, can you find it. Try your best.
26. Relationship between 𝜌, Deff and M
• 𝐷𝑒𝑓𝑓 = 1 + 𝑀 − 1 𝜌
– See its property when
– 𝜌 = 1 [Deff=M all the M values in a cluster are
equal]
– 𝑀 = 1 [SRS= cluster sampling]
– 𝜌 = 0 [cluster void
– 𝐷𝑒𝑓𝑓 = 0 or +1 find range of intra-cluster
correlation
27. Efficiency of cluster sampling
•
𝑉 𝑦 𝑛𝑀
𝑉 𝑦 𝑛
=
1
1+ 𝑀−1 𝜌
=
1
𝐷𝑒𝑓𝑓
• Observe its characteristics when
– 𝜌 > 0 Cluster sampling less efficient compared to SRS
– 𝜌 < 0 Cluster sampling more efficient compared to SRS
32. • Total number of elements
𝑀0 = 𝑖=1
𝑁
𝑀𝑖
• Total number of elements in each cluster
𝑦𝑖 = 𝑗=1
𝑀 𝑖
𝑦𝑖𝑗
• Average number of elements per cluster
𝑀 =
𝑖=1
𝑁
𝑀𝑖
N
=
𝑀0
𝑁
Single-stage cluster sampling (Unequal)
33. Single-stage cluster sampling (Unequal)
• Population mean (1)
𝑌 =
𝑖=1
𝑁
𝑗=1
𝑀 𝑖
𝑦𝑖𝑗
𝑖=1
𝑁
𝑀𝑖
=
𝑖=1
𝑁
𝑀𝑖 𝑦𝑖
𝑖=1
𝑁
𝑀𝑖
=
𝑖=1
𝑁
𝑀𝑖 𝑦𝑖
𝑀0
• Population mean (2)
𝑌𝑁 =
𝑖=1
𝑁
𝑦𝑖
𝑁
• Are they same?
34. Single-stage cluster sampling (Unequal)
• Sample mean (1)
𝑦𝑛 = 𝑖=1
𝑛
𝑦 𝑖
𝑛
Biased for 𝑌 but unbiased for 𝑌𝑁
• Sample mean (2)
• 𝑦𝑛 =
𝑁
𝑛𝑀0
𝑖=1
𝑛
𝑀𝑖 𝑦𝑖 =
1
𝑛 𝑖=1
𝑛
(
𝑀 𝑖 𝑦 𝑖
𝑀
)
This is unbiased for 𝑌
36. • Do an example
Single-stage cluster sampling (Unequal)
37. • Further study
– Cluster sampling with PPS sampling (No need right
now )
Single-stage cluster sampling (Unequal)
38. Background...
• A unit may contain too many elements to
obtain a measurement on each
• A unit may contain elements that are nearly
alike.
Multi-stage cluster sampling (Two-stage)
39. Background...
•
𝑉 𝑦 𝑛𝑀
𝑉 𝑦 𝑛
=
1
1+ 𝑀−1 𝜌
or
𝑉 𝐶𝑙𝑢𝑠𝑡𝑒𝑟
𝑉 𝑆𝑅𝑆
=
1
1+ 𝑀−1 𝜌
– What will be happen when M increase??????
• Less efficient cluster sampling
• Large cluster draw small sample
Multi-stage cluster sampling (Two-stage)
40. • Sub-sampling (two stage sampling)
• A two stage cluster is one, which is obtained
by first selecting a sample of cluster and then
selecting again a sample of elements from
each sampled cluster.
• Village → Household (subsample)
Multi-stage cluster sampling (Two-stage)
44. • 𝑦 = 𝑖=1
𝑛
𝑦𝑖 = 𝑖=1
𝑛
𝑗=𝑗
𝑚 𝑖
𝑦𝑖𝑗
• 𝑚0 = 𝑖=1
𝑛
𝑚𝑖 , 𝑚 =
𝑚0
𝑛
• Average value per second stage unit
• 𝑦𝑖 =
𝑗=𝑗
𝑚 𝑖 𝑦 𝑖𝑗
𝑚 𝑖
=
𝑦 𝑖
𝑚 𝑖
, 𝑦 =
y
𝑚0
• Average value per first-stage unit
𝑦𝑛 =
𝑦
𝑛
=
𝑖=1
𝑛
𝑗=𝑗
𝑚 𝑖 𝑦 𝑖𝑗
𝑛
Multi-stage cluster sampling (Two-stage)
45. • Number of estimator is defined (You can define
more with good properties as a researcher )
• 𝑦𝑡𝑠(1) = 𝑖=1
𝑛
𝑦 𝑖
𝑛
ordinary mean based on first
stage unit mean.
• 𝑦𝑡𝑠(2) = 𝑖=1
𝑛
𝑀 𝑖 𝑦 𝑖
𝑛 𝑀
=
𝑁
𝑀0
𝑖=1
𝑛
𝑀 𝑖 𝑦 𝑖
𝑛
based on 𝑀0
= 𝑖=1
𝑛
𝑀 𝑖 𝑦 𝑖
𝑖=1
𝑛
𝑀 𝑖
= 𝑦𝑡𝑠 Known as ratio estimator
• 𝑌𝑅 = 𝑀0
𝑖=1
𝑛
𝑀 𝑖 𝑦 𝑖
𝑖=1
𝑛
𝑀 𝑖
estimator of total
Multi-stage cluster sampling (Two-stage)
replace 𝑀0by 𝑀0 = 𝑁 𝑖=1
𝑛
𝑀 𝑖
𝑛
46. Why such Scribble functions?
• 𝑖 th cluster total= 𝑀𝑖 𝑦𝑖
• Estimator of total Y over selected n clusters
𝑖=1
𝑛
𝑀𝑖 𝑦𝑖
• Average value of Y per cluster is 𝑖=1
𝑛
𝑀 𝑖 𝑦 𝑖
𝑛
• Estimator of total Y over N clusters
N
n 𝑖=1
𝑛
𝑀𝑖 𝑦𝑖
• Total= Total frequency × mean