Surya Prakash Tripathi
M. Sc. (Agricultural Statistics)
Roll No.- 21601
ICAR - Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi – 110012
COURSE SEMINAR (STAT 591)
OPTIMAL STRATIFICATION USING
WEIBULL-DISTRIBUTED AUXILIARY
INFORMATION
Introduction

Sampling is the process of selecting a fraction of the population, called a sample, that truly represents the population, and using it to draw inferences about the characteristics of the population.

Sampling theory has developed into a widely used method for understanding and analyzing large bodies of information to identify meaningful patterns and trends.

A sample of optimum size can adequately reveal the true features of a much larger population.

When the population is homogeneous with respect to the characteristic under study, simple random sampling is used. When it is heterogeneous, stratified sampling is used.
In stratified sampling, the heterogeneous population is divided into smaller groups called strata, each of which is homogeneous with respect to the characteristic under study. The strata are formed so that heterogeneity is minimal within a stratum and maximal between strata.

Stratified sampling has several advantages, such as high precision of the estimates, administrative convenience, a full cross-section of the population and a substantial gain in efficiency. Moreover, it provides estimates not only for the whole population but also for each subpopulation (stratum).

Stratified sampling plays an important role in health surveys that estimate the prevalence of diseases, in business and the sciences, and in many other estimation problems.

Objective
In most cases, surveyors stratify the population for convenience. They use geographical or administrative units such as provinces and districts, or natural criteria such as gender and age.

However, this is not always a reasonable criterion, because the strata formed in this way may not be internally homogeneous with respect to the variable of interest.

Hence, there is a need to find optimum stratum boundaries (OSB) that maximize the precision of the estimate by making the stratum variances $\sigma_h^2$ as small as possible. Various methods are available for obtaining OSB when the frequency distribution of the variable is known.

Overview of solution approach
Here, an efficient method of constructing optimum stratum boundaries (OSB) is discussed. The same method also determines the optimum stratum widths and the optimum sample sizes.

In practice, it is difficult to obtain information about the variable of interest before the survey is conducted. Hence, an auxiliary variable that is available from past surveys is used for that purpose.

In this approach, the auxiliary variable follows the Weibull distribution and is linearly related to the main variable.
The stratification problem is formulated as a mathematical programming problem (MPP). The MPP minimizes the variance of the estimated population parameter under Neyman allocation.

The dynamic programming procedure used to solve this problem results in remarkable gains in the precision of the estimates of the population characteristics.

The dynamic programming technique applies Bellman's principle of optimality to solve the formulated MPP, which is a multistage decision problem.

Methodology
Let the population be stratified into $L$ strata based on an auxiliary variable $x$, where the mean of a study variable $y$ is to be estimated. Suppose a simple random sample of size $n_h$ is drawn from the $h$th stratum, with sample mean $\bar{y}_h$ ($h = 1, 2, \ldots, L$). The stratified sample mean $\bar{y}_{st}$ is then given by

$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$

Under Neyman allocation, the variance of $\bar{y}_{st}$ is given by

$$V_N(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n} - \frac{1}{N}\sum_{h=1}^{L} W_h \sigma_{hy}^2$$
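The two formulas above can be checked with a minimal Python sketch. It computes the stratified mean and the Neyman variance from the stratum weights $W_h$ and standard deviations $\sigma_{hy}$, and it also includes the standard Neyman allocation $n_h = n W_h \sigma_{hy} / \sum_k W_k \sigma_{ky}$, which the slides use but do not write out. Passing `N=None` drops the finite population correction term. The function names are hypothetical and the numbers are made up.

```python
from typing import List, Optional, Sequence

def stratified_mean(weights: Sequence[float], means: Sequence[float]) -> float:
    """Stratified sample mean: sum over h of W_h * ybar_h."""
    return sum(w * m for w, m in zip(weights, means))

def neyman_allocation(n: float, weights: Sequence[float], sds: Sequence[float]) -> List[float]:
    """Neyman allocation n_h = n * W_h * sigma_h / sum_k W_k * sigma_k (not rounded)."""
    total = sum(w * s for w, s in zip(weights, sds))
    return [n * w * s / total for w, s in zip(weights, sds)]

def neyman_variance(n: float, weights: Sequence[float], sds: Sequence[float],
                    N: Optional[float] = None) -> float:
    """V_N(ybar_st) = (sum W_h sigma_h)^2 / n - (1/N) sum W_h sigma_h^2.

    If N is None, the finite population correction term is dropped.
    """
    first = sum(w * s for w, s in zip(weights, sds)) ** 2 / n
    if N is None:
        return first
    return first - sum(w * s * s for w, s in zip(weights, sds)) / N

# Made-up example: two equal-weight strata with sigma_1 = 1 and sigma_2 = 3.
W, S = [0.5, 0.5], [1.0, 3.0]
print(stratified_mean(W, [10.0, 14.0]))       # 12.0
print(neyman_allocation(100, W, S))           # [25.0, 75.0]
print(neyman_variance(100, W, S, N=1000))     # about 0.035
```

The more variable stratum receives three times the sample, which is the behaviour Neyman allocation is designed to produce.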
When the finite population correction is ignored, the variance becomes

$$V(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n}$$

Here $W_h$ and $\sigma_{hy}^2$ are, respectively, the stratum weight and the stratum variance of $y$ in the $h$th stratum ($h = 1, 2, \ldots, L$). The total sample size $n$ is fixed in advance.

It is assumed that the study variable follows a regression model of the form

$$y = \lambda(x) + \epsilon$$

where $\lambda(x)$ is a linear or non-linear function of $x$ and $\epsilon$ is an error term such that $E(\epsilon \mid x) = 0$ and $V(\epsilon \mid x) = \phi(x) > 0$ for all $x$.
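Under this model, the stratum mean $\mu_{hy}$ and the stratum variance $\sigma_{hy}^2$ of $y$ can be expressed in terms of $\lambda$ and $\epsilon$. The stratum mean is

$$\mu_{hy} = \mu_{h\lambda}$$

If $\lambda$ and $\epsilon$ are uncorrelated, the stratum variance is

$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \sigma_{h\epsilon}^2$$

Here $\sigma_{h\epsilon}^2$ is the variance of $\epsilon$ in the $h$th stratum and $\sigma_{h\lambda}^2$ is the variance of $\lambda(x)$ in the $h$th stratum. Because $E(\epsilon \mid x) = 0$, the variance of $\epsilon$ within a stratum equals the stratum mean of $\phi(x)$. Therefore

$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \mu_{h\phi}$$

where $\mu_{h\lambda}$ and $\mu_{h\phi}$ are the expected values of the functions $\lambda(x)$ and $\phi(x)$ in the $h$th stratum.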
Let $f(x)$, $a \le x \le b$, be the frequency density function of the auxiliary variable $x$ used for stratification. To determine the strata boundaries, the range $d = b - a$ is divided at $L - 1$ intermediate points:

$$a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$$

The sample size $n$ is fixed. If the finite population correction is ignored, minimizing the variance of $\bar{y}_{st}$ under Neyman allocation is therefore equivalent to minimizing the numerator

$$\sum_{h=1}^{L} W_h \sigma_{hy}$$

By the expression for $\sigma_{hy}^2$ above, this is the same as minimizing

$$\sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$$
If $f(x)$, $\lambda(x)$ and $\phi(x)$ are known and integrable functions, then the quantities $W_h$, $\sigma_{h\lambda}^2$ and $\mu_{h\phi}$ can be expressed as functions of the boundary points $x_h$ and $x_{h-1}$:

$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$$

$$\sigma_{h\lambda}^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \lambda^2(x) f(x)\,dx - \mu_{h\lambda}^2$$
$$\mu_{h\phi} = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \phi(x) f(x)\,dx$$

$$\mu_{h\lambda} = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} \lambda(x) f(x)\,dx$$

Here $x_{h-1}$ and $x_h$ are the boundary points of the $h$th stratum. The objective function is therefore a function of the boundary points $x_h$ and $x_{h-1}$ only. The term to be minimized for the $h$th stratum is

$$\phi_h(x_h, x_{h-1}) = W_h \sigma_{hy} = W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$$
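These integrals can be evaluated numerically for any given $f$, $\lambda$ and $\phi$. The minimal sketch below uses composite Simpson's rule, a numerical method chosen here for illustration and not one specified in the source. It evaluates $W_h$, $\mu_{h\lambda}$, $\sigma_{h\lambda}^2$, $\mu_{h\phi}$ and the stratum term $\phi_h$ for one stratum $[x_{h-1}, x_h]$. The function names are hypothetical.

```python
import math
from typing import Callable

Func = Callable[[float], float]

def simpson(g: Func, lo: float, hi: float, n: int = 200) -> float:
    """Composite Simpson's rule for the integral of g over [lo, hi] (n forced even)."""
    if hi <= lo:
        return 0.0
    n += n % 2
    step = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3

def stratum_objective(f: Func, lam: Func, phi: Func, lo: float, hi: float, n: int = 200) -> float:
    """phi_h = W_h * sqrt(sigma_{h lambda}^2 + mu_{h phi}) for the stratum [lo, hi]."""
    W = simpson(f, lo, hi, n)                                   # stratum weight W_h
    if W <= 0.0:
        return 0.0                                              # an empty stratum contributes nothing
    mu_lam = simpson(lambda x: lam(x) * f(x), lo, hi, n) / W    # mu_{h lambda}
    var_lam = simpson(lambda x: lam(x) ** 2 * f(x), lo, hi, n) / W - mu_lam ** 2
    mu_phi = simpson(lambda x: phi(x) * f(x), lo, hi, n) / W    # mu_{h phi}
    return W * math.sqrt(max(var_lam, 0.0) + mu_phi)

# Toy check: uniform density on [0, 1], lambda(x) = x, phi(x) = 0, stratum [0, 0.5].
print(stratum_objective(lambda x: 1.0, lambda x: x, lambda x: 0.0, 0.0, 0.5))
```

In the toy check, $W_h = 0.5$ and $\sigma_{h\lambda}^2 = 0.5^2/12$, so the result is $0.5 \times 0.5/\sqrt{12} \approx 0.0722$.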
To obtain the stratum boundaries, the optimization problem is to find $x_1, x_2, \ldots, x_{L-1}$ such that

$$\sum_{h=1}^{L} \phi_h(x_h, x_{h-1})$$

is minimum, subject to $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$.

Further, the width of each stratum is defined as

$$l_h = x_h - x_{h-1}, \quad h = 1, 2, \ldots, L$$

where $l_h \ge 0$ denotes the range, or width, of the $h$th stratum.
With this definition of $l_h$, the range of the distribution, $d = b - a$, is expressed as a function of the stratum widths:

$$\sum_{h=1}^{L} l_h = \sum_{h=1}^{L} (x_h - x_{h-1}) = x_L - x_0 = b - a = d$$

The $h$th stratification point $x_h$ ($h = 1, 2, \ldots, L$) is then expressed as

$$x_h = x_0 + \sum_{i=1}^{h} l_i \quad \text{or} \quad x_h = x_{h-1} + l_h$$
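The two relations above convert stratum widths to boundaries and back. The minimal sketch below uses hypothetical function names and is checked against the $L = 5$ optimum widths reported in the results, with $x_0 = 1.5$.

```python
from itertools import accumulate
from typing import List, Sequence

def boundaries_from_widths(x0: float, widths: Sequence[float]) -> List[float]:
    """x_h = x_0 + sum_{i<=h} l_i, returned as [x_0, x_1, ..., x_L]."""
    return list(accumulate(widths, initial=x0))

def widths_from_boundaries(bounds: Sequence[float]) -> List[float]:
    """l_h = x_h - x_{h-1} for h = 1, ..., L."""
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

# Optimum widths for L = 5 from the results, starting at x_0 = 1.5.
osb = boundaries_from_widths(1.5, [5.20, 3.78, 3.75, 4.30, 6.57])
print([round(x, 2) for x in osb])   # ends at x_L = x_0 + d
```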
The range of the distribution, written as a function of the stratum widths, can be treated as a constraint. The optimization problem then becomes the equivalent problem of determining the optimum strata widths (OSW) $l_1, l_2, \ldots, l_L$, expressed as the following mathematical programming problem (MPP):

$$\begin{aligned} \text{Minimize} \quad & \sum_{h=1}^{L} \phi_h(l_h, x_{h-1}) \\ \text{subject to} \quad & \sum_{h=1}^{L} l_h = d, \\ \text{and} \quad & l_h \ge 0, \quad h = 1, 2, \ldots, L. \end{aligned}$$
Initially, $x_0$ is known. Therefore, the first term of the objective function of the MPP, $\phi_1(l_1, x_0)$, is a function of $l_1$ alone. Once $l_1$ is known, the second term, $\phi_2(l_2, x_1)$, becomes a function of $l_2$ alone, and so on. Owing to this special structure, the MPP may be treated as a function of the $l_h$ alone and expressed as

$$\begin{aligned} \text{Minimize} \quad & \sum_{h=1}^{L} \phi_h(l_h) \\ \text{subject to} \quad & \sum_{h=1}^{L} l_h = d, \\ \text{and} \quad & l_h \ge 0, \quad h = 1, 2, \ldots, L. \end{aligned}$$

Dynamic programming approach
Dynamic programming determines the optimum solution of a multi-variable problem by decomposing it into stages, each stage comprising a single-variable sub-problem.

A dynamic programming model is basically a recursive equation based on Bellman's principle of optimality.

This recursive equation links the different stages of the problem in a way that guarantees that each stage's optimal feasible solution is also optimal and feasible for the entire problem.
Determining the OSB for an auxiliary variable that follows the Weibull distribution is a multistage problem.

The problem is formulated as an MPP and solved using the dynamic programming approach.

The formulated MPP minimizes the variance of the estimated population parameter under the chosen allocation. It is subject to the restriction that the sum of the widths of all strata equals the total range of the distribution of the variable.
Since Neyman allocation is considered, the subproblem for the first $k$ strata ($1 \le k \le L$) becomes

$$\begin{aligned} \text{Minimize} \quad & \sum_{h=1}^{k} \phi_h(l_h) \\ \text{subject to} \quad & \sum_{h=1}^{k} l_h = d_k, \\ \text{and} \quad & l_h \ge 0, \quad h = 1, 2, \ldots, k \end{aligned}$$

where $d_k \le d$ is the total width available for division into $k$ strata, that is, the state value at stage $k$. Note that $d_k = d$ for $k = L$.
The state values are

$$d_k = l_1 + l_2 + \cdots + l_k$$

and the transformation functions are given by

$$\begin{aligned} d_{k-1} &= l_1 + l_2 + \cdots + l_{k-1} = d_k - l_k \\ &\;\;\vdots \\ d_1 &= l_1 = d_2 - l_2 \end{aligned}$$

Let $\Phi_k(d_k)$ denote the minimum value of the objective function of the subproblem, that is,

$$\Phi_k(d_k) = \min\left\{ \sum_{h=1}^{k} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k} l_h = d_k,\; l_h \ge 0,\; h = 1, 2, \ldots, k \right\}, \quad 1 \le k \le L$$
With this definition of $\Phi_k(d_k)$, solving the MPP is equivalent to finding $\Phi_L(d)$ recursively, by computing $\Phi_k(d_k)$ for $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$. It can be written as

$$\Phi_k(d_k) = \min\left\{ \phi_k(l_k) + \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k,\; l_h \ge 0,\; h = 1, 2, \ldots, k \right\}$$

For a fixed value of $l_k$, $0 \le l_k \le d_k$,

$$\Phi_k(d_k) = \phi_k(l_k) + \min\left\{ \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k,\; l_h \ge 0,\; h = 1, 2, \ldots, k-1 \right\}$$
Using Bellman's principle of optimality, a forward recursive equation of the dynamic programming technique is written as

$$\Phi_k(d_k) = \min_{0 \le l_k \le d_k} \left[ \phi_k(l_k) + \Phi_{k-1}(d_k - l_k) \right], \quad k = 2, 3, \ldots, L$$

For the first stage, that is, for $k = 1$,

$$\Phi_1(d_1) = \phi_1(d_1), \quad l_1^* = d_1$$

where $l_1^* = d_1$ is the optimum width of the first stratum. The recursive equations are solved for each $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$, and $\Phi_L(d)$ is obtained.

From $\Phi_L(d)$, the optimum width of the $L$th stratum, $l_L^*$, is obtained.

From $\Phi_{L-1}(d - l_L^*)$, the optimum width of the $(L-1)$th stratum, $l_{L-1}^*$, is obtained. The process continues until $l_1^*$, the optimum width of the first stratum, is obtained.
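The forward recursion and the backtracking step above can be sketched in Python on a discretized range. This is a minimal sketch under stated assumptions:

- The range $d$ is split into `steps` equal grid increments, so the continuous widths are approximated on that grid.
- `stage_cost(lo, hi)` is a hypothetical callback that returns $\phi_h$ for a stratum spanning $[lo, hi]$, that is, with $x_{h-1} = lo$ and $l_h = hi - lo$.

The names `dp_optimal_widths` and `uniform_linear_cost` are illustrative and not taken from the source. The toy cost uses a uniform density on $[0, 1]$ with $\lambda(x) = x$ and constant $\phi = 1$, not the Weibull setting. Its optimum is known in advance: by symmetry and convexity, all widths are equal.

```python
import math
from typing import Callable, List, Tuple

def dp_optimal_widths(stage_cost: Callable[[float, float], float], x0: float, d: float,
                      L: int, steps: int = 200) -> Tuple[float, List[float]]:
    """Forward DP for optimum strata widths on a grid of `steps` increments of d.

    F[k][j] approximates Phi_k(d_k) with d_k = j * delta.
    """
    delta = d / steps
    F = [[math.inf] * (steps + 1) for _ in range(L + 1)]
    arg = [[0] * (steps + 1) for _ in range(L + 1)]
    F[0][0] = 0.0  # zero strata can only cover zero width
    for k in range(1, L + 1):
        for j in range(steps + 1):            # state: d_k = j * delta
            for i in range(j + 1):            # decision: l_k = i * delta
                prev = F[k - 1][j - i]        # Phi_{k-1}(d_k - l_k)
                if prev == math.inf:
                    continue
                lo, hi = x0 + (j - i) * delta, x0 + j * delta
                value = prev + stage_cost(lo, hi)
                if value < F[k][j]:
                    F[k][j], arg[k][j] = value, i
    # Backtrack: l_L* from Phi_L(d), then l_{L-1}* from Phi_{L-1}(d - l_L*), and so on.
    widths: List[float] = []
    j = steps
    for k in range(L, 0, -1):
        i = arg[k][j]
        widths.append(i * delta)
        j -= i
    widths.reverse()
    return F[L][steps], widths

def uniform_linear_cost(lo: float, hi: float, beta: float = 1.0, c: float = 1.0) -> float:
    """Toy phi_h: uniform density on [0, 1], lambda(x) = beta * x, phi(x) = c."""
    width = hi - lo                                   # equals W_h for this density
    var_lam = beta ** 2 * width ** 2 / 12.0           # variance of beta * x on [lo, hi]
    return width * math.sqrt(var_lam + c)

if __name__ == "__main__":
    ofv, widths = dp_optimal_widths(uniform_linear_cost, x0=0.0, d=1.0, L=4, steps=100)
    print(ofv, widths)  # expect four widths of about 0.25
```

The sketch needs about $L \cdot \text{steps}^2 / 2$ evaluations of `stage_cost`. A finer grid gives more accurate widths but takes longer, especially when each cost evaluation involves numerical integration.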
Weibull distribution
The Weibull distribution is a two-parameter family of continuous probability distributions. Because it is versatile enough to fit a wide variety of data, it is one of the most widely used distributions in applied statistics.

If an auxiliary variable $x$ follows the Weibull distribution on the interval $[x_0, x_L]$, its two-parameter probability density function, with state space $x \ge 0$, is given by

$$f(x; \theta, r) = \begin{cases} \dfrac{r}{\theta}\left(\dfrac{x}{\theta}\right)^{r-1} e^{-(x/\theta)^r}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
where $r > 0$ is the shape parameter and $\theta > 0$ is the scale parameter of the distribution.

The Weibull distribution is related to a number of other probability distributions. In particular, when $r = 1$ it reduces to the exponential distribution with parameter $1/\theta$:

$$f\left(x; \tfrac{1}{\theta}\right) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
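A minimal sketch of the density, using hypothetical function names, also confirms numerically that $r = 1$ gives the exponential density with parameter $1/\theta$. The scale value 13.40 is borrowed from the fit reported later and is only illustrative here.

```python
import math

def weibull_pdf(x: float, r: float, theta: float) -> float:
    """Two-parameter Weibull density f(x; theta, r), shape r > 0, scale theta > 0."""
    if x < 0:
        return 0.0
    if x == 0 and r < 1:
        return math.inf  # the density is unbounded at 0 when r < 1
    z = x / theta
    return (r / theta) * z ** (r - 1) * math.exp(-(z ** r))

def exponential_pdf(x: float, theta: float) -> float:
    """Exponential density with parameter 1/theta: (1/theta) * exp(-x/theta)."""
    return 0.0 if x < 0 else math.exp(-x / theta) / theta

# With r = 1 the Weibull density coincides with the exponential density.
for x in (0.0, 1.0, 5.0, 20.0):
    print(x, weibull_pdf(x, 1.0, 13.40), exponential_pdf(x, 13.40))
```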
If the quantity $X$ is a time to failure, the Weibull distribution gives a failure rate proportional to a power of time, $x^{r-1}$.

A value of $r < 1$ indicates that the failure rate decreases over time.

A value of $r = 1$ indicates that the failure rate is constant over time.

A value of $r > 1$ indicates that the failure rate increases with time.
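The failure-rate statements follow from the Weibull hazard $h(x) = f(x)/S(x) = (r/\theta)(x/\theta)^{r-1}$, where $S(x) = e^{-(x/\theta)^r}$ is the survival function. These are standard results, written out here for illustration. The minimal sketch below uses hypothetical function names.

```python
import math

def weibull_survival(x: float, r: float, theta: float) -> float:
    """S(x) = P(X > x) = exp(-(x/theta)^r) for x >= 0."""
    return math.exp(-((x / theta) ** r))

def weibull_hazard(x: float, r: float, theta: float) -> float:
    """Failure rate h(x) = f(x) / S(x) = (r/theta) * (x/theta)^(r-1), for x > 0."""
    return (r / theta) * (x / theta) ** (r - 1)

# The shape r decides whether the failure rate falls, stays flat or rises over time.
for r in (0.5, 1.0, 2.342):
    rates = [weibull_hazard(t, r, 13.40) for t in (2.0, 8.0, 20.0)]
    print(r, [round(h, 4) for h in rates])
```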
Derivation of weight

The stratum weight is given by

$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$$

Substituting the probability density function of the Weibull distribution and integrating over this interval gives the stratum weight

$$W_h = e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r}$$

The stratified sample mean $\bar{y}_{st}$ then becomes

$$\bar{y}_{st} = \sum_{h=1}^{L}\left[e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r}\right]\bar{y}_h$$
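The closed-form weight is a difference of Weibull survival probabilities, so it is cheap to evaluate for any set of boundaries. The minimal sketch below uses a hypothetical function name, the fitted parameters $r = 2.342$ and $\theta = 13.40$, and the DP boundaries for $L = 3$ reported in the results. As in the formula above, the weights are not rescaled, so they sum to $F(x_L) - F(x_0)$ rather than exactly 1.

```python
import math
from typing import List, Sequence

def weibull_stratum_weights(bounds: Sequence[float], r: float, theta: float) -> List[float]:
    """W_h = exp(-(x_{h-1}/theta)^r) - exp(-(x_h/theta)^r) for consecutive boundaries."""
    surv = [math.exp(-((x / theta) ** r)) for x in bounds]  # Weibull survival S(x_h)
    return [surv[h - 1] - surv[h] for h in range(1, len(bounds))]

# Boundaries [x_0, x_1, x_2, x_L] for L = 3 with the fitted shape and scale.
weights = weibull_stratum_weights([1.5, 9.29, 15.44, 25.1], r=2.342, theta=13.40)
print([round(w, 4) for w in weights], round(sum(weights), 4))
```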
Estimating the linear regression model

The study uses health data of size N = 724 from the 2004 Fiji National Nutrition Survey on "Micronutrient Status of Women in Fiji".

The data record two characteristics for each woman: the level of iron and the level of haemoglobin.

A survey focusing on iron deficiency anaemia is to be conducted in the country.

Stratified random sampling is therefore used to collect the sample. Haemoglobin (y) is the variable of interest, and the iron level (x), collected in a previous study, serves as the auxiliary variable.

For this purpose, a linear regression model is fitted. Its analysis of variance is as follows.
Source          Sum of squares   Degrees of freedom   Mean sum of squares   F        P-value
Regression      461.92           1                    461.92                299.95   0.000
Residual        1050.61          682                  1.54
  Lack of fit   236.40           204                  1.16                  0.68     0.890
  Pure error    814.21           478                  1.70
Total           1512.54          683
The data fit the linear regression model on iron level (x) significantly (F = 299.95, p < 0.001).

The coefficient of determination, $R^2 = \dfrac{461.92}{1512.54} = 0.3054$, indicates a moderate linear relationship between the two variables.

The table also shows no significant lack of fit in the linear regression (p-value = 0.890). Thus, the model fits the data well and gives no reason to consider an alternative model.
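The summary statistics quoted above follow directly from the analysis of variance table. The small sketch below, using a hypothetical function name, recomputes them from the sums of squares and degrees of freedom.

```python
def regression_anova(ss_reg, df_reg, ss_res, df_res, ss_lof, df_lof, ss_pe, df_pe):
    """Recompute R^2, the regression F and the lack-of-fit F from ANOVA entries."""
    ss_total = ss_reg + ss_res                  # total SS = regression SS + residual SS
    ms_res = ss_res / df_res                    # residual mean square (MS_Res)
    return {
        "ss_total": ss_total,
        "r_squared": ss_reg / ss_total,
        "f_regression": (ss_reg / df_reg) / ms_res,
        "ms_residual": ms_res,
        "f_lack_of_fit": (ss_lof / df_lof) / (ss_pe / df_pe),  # MS_LOF / MS_PE
    }

stats = regression_anova(461.92, 1, 1050.61, 682, 236.40, 204, 814.21, 478)
print({k: round(v, 4) for k, v in stats.items()})
```

With the table's values, this gives $R^2 \approx 0.3054$, a lack-of-fit F of about 0.68 and $MS_{Res} \approx 1.5405$. The regression F comes out at about 299.85, because the table's 299.95 equals 461.92/1.54, the ratio taken with the rounded mean square.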
Predictor   Coefficient   SE coefficient   t       p-value
α           10.9449       0.1245           87.89   0.000
β           0.114115      0.009548         11.95   0.000

The p-values for the parameters α and β also show that both parameters of the model are highly significant.

[Figure: scatter plot of haemoglobin (y) against iron (x)]

The scatter plot of iron against haemoglobin shows a moderate positive linear association between the two variables. Therefore, haemoglobin content (y) and iron level (x) can reasonably be assumed to follow the linear regression model

$$\lambda(x) = \alpha + \beta x$$

with least-squares estimates $\hat{\alpha} = 10.9449$ and $\hat{\beta} = 0.1141$.
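The least-squares estimates come from the usual closed-form formulas $\hat{\beta} = S_{xy}/S_{xx}$ and $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$. The survey data are not reproduced here, so the sketch below, which uses a hypothetical function name, is checked on a small synthetic line instead.

```python
from typing import Sequence, Tuple

def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = alpha + beta * x; returns (alpha_hat, beta_hat)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = s_xy / s_xx
    return y_bar - beta * x_bar, beta

# Synthetic check: points lying exactly on y = 2 + 3x recover alpha = 2, beta = 3.
alpha_hat, beta_hat = least_squares_line([1, 2, 3, 4], [5, 8, 11, 14])
print(alpha_hat, beta_hat)
```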
Estimating the distribution

To determine the distribution of the auxiliary variable, a relative frequency histogram of iron level (x) is constructed. It shows that the distribution of x is right-skewed, which is consistent with the Weibull distribution.

[Figure: relative frequency histogram (density) of iron level, x]

[Figure: Weibull Q-Q plot of iron, x (observed values against expected Weibull values)]

In the Weibull probability (Q-Q) plot of x, the points cluster around the straight line, so the auxiliary variable is assumed to follow the Weibull distribution.

The maximum likelihood estimates (MLE) of the Weibull parameters are shape $\hat{r} = 2.342$ and scale $\hat{\theta} = 13.40$.
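The source does not say how the MLE was computed. One standard approach, shown here as a sketch under that assumption with a hypothetical function name, profiles out $\theta$:

1. The shape $\hat{r}$ solves $1/r + \overline{\ln x} - \sum x^r \ln x / \sum x^r = 0$. The left side decreases in $r$, so the equation can be solved by bisection.
2. The scale is then $\hat{\theta} = (\overline{x^r})^{1/r}$.

The iron data are not reproduced here, so the usage example fits a simulated sample drawn with Python's `random.weibullvariate(alpha=scale, beta=shape)`.

```python
import math
import random
from typing import Sequence, Tuple

def weibull_mle(xs: Sequence[float], lo: float = 1e-3, hi: float = 50.0,
                iters: int = 60) -> Tuple[float, float]:
    """Two-parameter Weibull MLE, returning (shape r, scale theta).

    The shape solves 1/r + mean(ln x) - sum(x^r ln x) / sum(x^r) = 0, whose
    left side decreases in r, so plain bisection on [lo, hi] is used.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("observations must be strictly positive")
    logs = [math.log(x) for x in xs]
    n, mean_log, log_max = len(logs), sum(logs) / len(logs), max(logs)

    def scaled_powers(r: float):
        # (x / max x)^r avoids overflow; the factor (max x)^r cancels in ratios.
        return [math.exp(r * (lx - log_max)) for lx in logs]

    def score(r: float) -> float:
        w = scaled_powers(r)
        return 1.0 / r + mean_log - sum(wi * lx for wi, lx in zip(w, logs)) / sum(w)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid   # the root lies above mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    # theta = (mean of x^r)^(1/r), with the (max x)^r factor restored.
    theta = math.exp(log_max) * (sum(scaled_powers(r)) / n) ** (1.0 / r)
    return r, theta

# Simulated data only: random.weibullvariate(alpha=scale, beta=shape).
rng = random.Random(2024)
sample = [rng.weibullvariate(13.40, 2.342) for _ in range(5000)]
r_hat, theta_hat = weibull_mle(sample)
print(round(r_hat, 3), round(theta_hat, 2))
```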
Estimating the variance of the error term

It is assumed that the variance of the error term is $V(\epsilon \mid x) = \phi(x) > 0$ for all $x$ in the range $(a, b)$. The expected value of $\phi(x)$, $\mu_{h\phi}$, is estimated as

$$\mu_{h\phi} = \frac{SS_{Res}}{N - p} = MS_{Res}$$

where $SS_{Res}$ and $MS_{Res}$ are, respectively, the residual sum of squares and the residual mean square, and $p$ is the number of parameters in the regression model. The fitted regression model is $\lambda(x) = \alpha + \beta x$, so $p = 2$.
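For the linear model fitted here, the stratum objective can be written out explicitly. The short derivation below assumes, as above, that the single estimate $MS_{Res} = 1.54$ from the analysis of variance table is used for $\mu_{h\phi}$ in every stratum. It also introduces two symbols for illustration: $\mu_{hx}$ and $\sigma_{hx}^2$, the mean and variance of $x$ within the $h$th stratum.

$$\sigma_{h\lambda}^2 = \operatorname{Var}_h(\alpha + \beta x) = \beta^2\sigma_{hx}^2, \qquad \sigma_{hx}^2 = \frac{1}{W_h}\int_{x_{h-1}}^{x_h} x^2 f(x)\,dx - \mu_{hx}^2$$

$$\phi_h(x_h, x_{h-1}) = W_h\sqrt{\beta^2\sigma_{hx}^2 + MS_{Res}} \approx W_h\sqrt{0.0130\,\sigma_{hx}^2 + 1.54}$$

The numerical form uses $\hat{\beta}^2 = 0.1141^2 \approx 0.0130$. Only $W_h$ and $\sigma_{hx}^2$ depend on the boundaries, so the intercept $\alpha$ plays no role in the stratification.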
Results

Haemoglobin level (y) is the main variable of interest. The minimum and maximum values of x (iron) are 1.5 and 25.1, so the range of the distribution of iron level is $d = 25.1 - 1.5 = 23.6$.

The dynamic programming solution procedure gives the optimum strata widths. It uses two forms of the recurrence equations: one for the first stage ($k = 1$) and one for the later stages ($k \ge 2$).

To assess its effectiveness, the dynamic programming (DP) procedure is compared with the following methods available in the literature:

1. Cum √f method of Dalenius and Hodges (1959).
2. Geometric method of Gunning and Horgan (2004).
3. Lavallée-Hidiroglou method (Lavallée and Hidiroglou 1988) with Kozak's algorithm (Kozak 2004).
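The geometric method of Gunning and Horgan places the boundaries in a geometric progression between the end points, $x_h = a(b/a)^{h/L}$ for $h = 1, \ldots, L - 1$, which requires $a > 0$. The minimal sketch below uses a hypothetical function name with $a = 1.5$ and $b = 25.1$.

```python
from typing import List

def geometric_boundaries(a: float, b: float, L: int) -> List[float]:
    """Geometric-method boundaries x_h = a * (b/a)**(h/L), h = 1, ..., L-1 (needs a > 0)."""
    if a <= 0 or b <= a or L < 1:
        raise ValueError("need 0 < a < b and L >= 1")
    ratio = (b / a) ** (1.0 / L)          # common ratio of the progression
    return [a * ratio ** h for h in range(1, L)]

for L in (2, 3, 4):
    print(L, [round(x, 2) for x in geometric_boundaries(1.5, 25.1, L)])
```

Rounded to two decimals, this gives 6.14 for $L = 2$, (3.84, 9.81) for $L = 3$ and (3.03, 6.14, 12.41) for $L = 4$. These match the GEO boundaries listed for those values of $L$ in the comparison table.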
L   OSW ($l_h^*$)                    OSB ($x_h^*$)                  OFV ($\sum_{h=1}^{L} W_h\sigma_h$)
2   10.72, 12.88                     12.22                          1.3658
3   7.79, 6.15, 9.66                 9.29, 15.44                    1.3462
4   6.22, 4.60, 4.98, 7.81           7.72, 12.31, 17.29             1.3384
5   5.20, 3.78, 3.75, 4.30, 6.57     6.70, 10.48, 14.23, 18.53      1.3346

The table shows the optimum stratum widths obtained by the dynamic programming procedure for each number of strata L. It also shows the corresponding stratum boundaries, computed as $x_h^* = x_{h-1}^* + l_h^*$ with $x_0 = 1.5$.

The last column gives the objective function value (OFV) $\sum W_h\sigma_h$ for each L. The OFV decreases as the number of strata increases.
L   Method      OSB                           OFV
2   CSRF        12.12                         1.366
    GEO         6.14                          1.404
    L-H Kozak   8.1                           1.384
    DP          12.22                         1.366
3   CSRF        9.76, 15.66                   1.346
    GEO         3.84, 9.81                    1.369
    L-H Kozak   5.55, 9.15                    1.372
    DP          9.29, 15.44                   1.346
4   CSRF        7.40, 12.12, 16.84            1.339
    GEO         3.03, 6.14, 12.41             1.353
    L-H Kozak   5.55, 9.15, 15.55             1.342
    DP          7.71, 12.31, 17.29            1.338
5   CSRF        6.22, 9.76, 13.30, 18.02      1.335
    GEO         2.64, 4.63, 8.13, 14.21       1.345
    L-H Kozak   5.55, 9.15, 12.65, 17.00      1.335
    DP          6.70, 10.48, 14.23, 18.53     1.335

CSRF: cum √f method; GEO: geometric method; L-H Kozak: Lavallée-Hidiroglou method with Kozak's algorithm; DP: dynamic programming.

The table compares the OSB obtained by the existing methods with those obtained by the dynamic programming procedure. DP attains the smallest OFV for every L. It is tied with the cum √f method for L = 2, 3 and 5, and with the L-H Kozak method for L = 5. The cum √f method gives the results closest to those of the DP method.
L   h     CSRF $n_h$   GEO $n_h$   L-H Kozak $n_h$   DP $n_h$
2   1     274          69          128               278
    2     226          431         372               222
    OFV   1.366        1.403       1.384             1.366
3   1     190          23          56                173
    2     195          165         107               206
    3     115          312         337               121
    OFV   1.346        1.369       1.372             1.346
4   1     109          12          57                119
    2     166          59          110               163
    3     139          211         215               141
    4     86           218         118               77
    OFV   1.339        1.353       1.342             1.338
5   1     75           8           58                88
    2     115          29          110               128
    3     125          95          125               129
    4     122          211         124               101
    5     63           157         83                54
    OFV   1.335        1.345       1.335             1.335

The table also compares the stratum sample sizes $n_h$ obtained by the different methods. Each column sums to a total sample size of n = 500. Again, the DP method attains the minimum variance.
L   OSB for x ($x_h^*$)           OSB for y ($y_h^* = \hat{\alpha} + \hat{\beta} x_h^*$)   OFV of y ($\sum_{h=1}^{L} W_h\sigma_h$)
2   12.22                         12.34                                                     1.366
3   9.29, 15.44                   12.01, 12.71                                              1.346
4   7.72, 12.31, 17.29            11.82, 12.35, 12.92                                       1.338
5   6.70, 10.48, 14.23, 18.53     11.71, 12.14, 12.57, 13.06                                1.335

The table shows the optimum stratum boundaries for the main variable of interest, the haemoglobin content of women (y). They are obtained with the help of iron level (x), which serves as the auxiliary variable because the two variables are linearly related.

The OSB for y are obtained from $y_h^* = \hat{\alpha} + \hat{\beta} x_h^*$.
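Because the OSB for y are a linear transformation of the OSB for x, they can be computed directly. The minimal sketch below uses hypothetical names and the reported estimates $\hat{\alpha} = 10.9449$ and $\hat{\beta} = 0.114115$.

```python
from typing import List, Sequence

ALPHA_HAT = 10.9449   # least-squares intercept reported above
BETA_HAT = 0.114115   # least-squares slope reported above (0.1141 when rounded)

def y_boundaries(x_bounds: Sequence[float], alpha: float = ALPHA_HAT,
                 beta: float = BETA_HAT) -> List[float]:
    """Map OSB for x to OSB for y via y_h* = alpha + beta * x_h*."""
    return [alpha + beta * x for x in x_bounds]

# DP boundaries for x with L = 5, taken from the table.
print([round(y, 2) for y in y_boundaries([6.70, 10.48, 14.23, 18.53])])
```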
All these procedures aim to minimize the objective function $\sum_{h=1}^{L}\phi_h(l_h) = \sum_{h=1}^{L} W_h\sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$ for $L = 2, 3, 4, 5$.

The sample sizes produced by the cum √f method are closest to those produced by the dynamic programming procedure. The sample sizes from the other two methods differ considerably from those of the proposed method.
The sample-size table shows that the geometric method allocates larger sample sizes to the last (tail) strata. Its sample sizes therefore differ markedly from those of the proposed method.

Further, for all L = 2, 3, 4, 5, the DP method gives the minimum variance among these methods. Its objective function values are also very close to those of the cum √f method.

Conclusion
The results show that constructing strata with an auxiliary variable that follows the Weibull distribution leads to a remarkable gain in the precision of the estimates of the main study variable. The stratum boundaries are constructed so that the variance is minimized.

The dynamic programming technique does not require any initial approximate solution. It uses an auxiliary variable and parametric assumptions to capture the characteristics of the main variable.

Thus, it can be concluded that this technique determines the optimum sample sizes and optimum stratum boundaries efficiently compared with the other methods considered.

Further, this solution procedure is not restricted to a Weibull-distributed auxiliary variable. It can also be used with other statistical distributions.

References
Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
Bühler, W., and T. Deutler. (1975). Optimal stratification and grouping by dynamic programming. Metrika, 22(1), 161–75.
Dalenius, T., and J. L. Hodges. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54(285), 88–101.
De Gruijter, J. J., B. Minasny, and A. B. McBratney. (2015). Optimizing stratification and allocation for design-based estimation of spatial means using predictions with error. Journal of Survey Statistics and Methodology, 3(1), 19–42.
Khan, M. G., N. Sehar, and M. J. Ahsan. (2005). Optimum stratification for exponential study variable under Neyman allocation. Journal of the Indian Society of Agricultural Statistics, 59(2), 146–50.
Khan, M. G. M., N. Nand, and N. Ahmad. (2008). Determining the optimum strata boundary points using dynamic programming. Survey Methodology, 34(2), 205–14.
Reddy, K. G., and M. G. Khan. (2019). Optimal stratification in stratified designs using Weibull-distributed auxiliary information. Communications in Statistics - Theory and Methods, 48(12), 3136–52.