Surya Prakash Tripathi
M. Sc. (Agricultural Statistics)
Roll No.- 21601
ICAR - Indian Agricultural Statistics Research Institute
Library Avenue, Pusa, New Delhi – 110012
COURSE SEMINAR (STAT 591)
OPTIMAL STRATIFICATION USING WEIBULL-DISTRIBUTED AUXILIARY INFORMATION
Outline of Seminar
• Introduction
• Objective
• Overview of solution approach
• Methodology
• Dynamic programming approach
• Weibull distribution
• Estimation
• Results
• Conclusion
• References
10-02-2023 Course Seminar
INTRODUCTION
• Sampling is the process of selecting a fraction of the total population, called a sample, that is representative of the population and is used to draw inferences about the population's characteristics.
• Sampling theory has developed into a widely used method for analyzing large bodies of information and identifying meaningful patterns and trends.
• A sample of optimum size can adequately capture the features of a much larger population.
• When the population is homogeneous with respect to the characteristic under study, simple random sampling is used; when it is heterogeneous, stratified sampling is used.
• In stratified sampling, the heterogeneous population is divided into smaller groups called strata, which are homogeneous with respect to the characteristic under study. Strata are formed so that heterogeneity is minimum within each stratum and maximum between strata.
• Stratified sampling has several advantages: high precision of the estimates, administrative convenience, a full cross-section of the population, and a substantial gain in efficiency. Moreover, it provides estimates not only for the population but also for each subpopulation (stratum).
• Stratified sampling plays an important role in health surveys for estimating the prevalence of diseases, in business and the sciences, and in many other parameter-estimation problems.
OBJECTIVE
• In most cases surveyors stratify the population according to convenience, using geographical or administrative units such as provinces and districts, or natural criteria such as gender and age.
• However, such stratification is not always reasonable, because the strata so formed may not be internally homogeneous with respect to the variable of interest.
• Hence there is a need to find the optimal stratum boundaries (OSB) that maximize the precision of the estimate. Various methods are available for obtaining the OSB when the frequency distribution of the variable is known, but the stratum variances $\sigma_h^2$ still need to be made as small as possible.
OVERVIEW OF SOLUTION APPROACH
• Here, an efficient method of constructing optimum stratum boundaries (OSB), and hence of determining the optimum stratum widths and optimum sample sizes, is discussed.
• In practice, it is difficult to obtain information about the variable of interest before the survey is conducted, so an auxiliary variable that is available from past surveys is used for stratification instead.
• The auxiliary variable in this approach follows the Weibull distribution and is linearly related to the main variable.
• The problem of stratification is framed as a mathematical programming problem (MPP) that minimizes the variance of the estimated population parameter under Neyman allocation.
• The dynamic programming procedure used for this problem resulted in gains in the precision of the estimates of the population characteristics.
• The dynamic programming technique uses Bellman's principle of optimality to solve the formulated MPP, which is a multistage decision problem.
METHODOLOGY
• Let the population be stratified into $L$ strata on the basis of an auxiliary variable $x$, when the mean of a study variable $y$ is to be estimated. If a simple random sample of size $n_h$ is drawn from the $h$th stratum with sample mean $\bar{y}_h$ $(h = 1, 2, \ldots, L)$, then the stratified sample mean $\bar{y}_{st}$ is given by
$$\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$$
• Under Neyman allocation, the variance of $\bar{y}_{st}$ is given by
$$V(\bar{y}_{st})_N = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n} - \frac{1}{N} \sum_{h=1}^{L} W_h \sigma_{hy}^2$$
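The Neyman-allocation variance can be illustrated with a short Python sketch. The function names and the two-stratum numbers below are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from the seminar. The sketch also uses the standard Neyman rule $n_h = n W_h \sigma_{hy} / \sum_h W_h \sigma_{hy}$ for the stratum sample sizes.

```python
from typing import List, Optional


def neyman_allocation(weights: List[float], sigmas: List[float], n: int) -> List[float]:
    """Stratum sample sizes n_h = n * W_h * sigma_h / sum_h(W_h * sigma_h)."""
    total = sum(w * s for w, s in zip(weights, sigmas))
    return [n * w * s / total for w, s in zip(weights, sigmas)]


def neyman_variance(weights: List[float], sigmas: List[float], n: int,
                    N: Optional[int] = None) -> float:
    """Variance of the stratified mean under Neyman allocation.

    If the population size N is omitted, the finite population
    correction term (1/N) * sum_h(W_h * sigma_h^2) is ignored.
    """
    total = sum(w * s for w, s in zip(weights, sigmas))
    variance = total ** 2 / n
    if N is not None:
        variance -= sum(w * s ** 2 for w, s in zip(weights, sigmas)) / N
    return variance


# Hypothetical two-stratum example (not data from the seminar).
weights, sigmas = [0.5, 0.5], [1.0, 3.0]
allocation = neyman_allocation(weights, sigmas, n=100)
var_no_fpc = neyman_variance(weights, sigmas, n=100)
var_with_fpc = neyman_variance(weights, sigmas, n=100, N=1000)
```

In this toy case the more variable stratum receives three times the sample of the other, which is the behaviour Neyman allocation is designed to produce.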
• When the finite population correction is ignored, the variance becomes
$$V(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n}$$
• Here $W_h$ and $\sigma_{hy}^2$ are the stratum weight and the stratum variance of the $h$th stratum $(h = 1, 2, \ldots, L)$, and $n$ is the total sample size, which is fixed in advance.
• The study variable is assumed to follow a regression model of the form
$$y = \lambda(x) + \epsilon$$
• where $\lambda(x)$ is a linear or non-linear function of $x$ and $\epsilon$ is an error term such that $E(\epsilon \mid x) = 0$ and $V(\epsilon \mid x) = \phi(x) > 0$ for all $x$.
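• Under this model, the stratum mean $\mu_{hy}$ and the stratum variance $\sigma_{hy}^2$ of $y$ can be expressed in terms of $\lambda(x)$ and $\phi(x)$. The stratum mean is
$$\mu_{hy} = \mu_{h\lambda}$$
• If $\lambda$ and $\epsilon$ are uncorrelated, the stratum variance can be expressed as
$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \sigma_{h\epsilon}^2$$
• where $\sigma_{h\epsilon}^2$ is the variance of $\epsilon$ in the $h$th stratum and $\sigma_{h\lambda}^2$ is the variance of $\lambda(x)$ in the $h$th stratum.
• Since $E(\epsilon \mid x) = 0$ and $V(\epsilon \mid x) = \phi(x)$, the variance of $\epsilon$ in the stratum equals the stratum mean of $\phi(x)$, so that
$$\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \mu_{h\phi}$$
• where $\mu_{h\lambda}$ and $\mu_{h\phi}$ are the expected values of $\lambda(x)$ and $\phi(x)$ in the $h$th stratum.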
• Let $f(x)$, $a \le x \le b$, be the frequency (density) function of the auxiliary variable $x$ used for stratification. To determine the strata boundaries, the range $d = b - a$ is cut at $(L - 1)$ intermediate points:
$$a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$$
• With $n$ fixed, minimizing the variance of $\bar{y}_{st}$ under Neyman allocation (ignoring the finite population correction) amounts to minimizing the sum in its numerator, i.e.
$$\sum_{h=1}^{L} W_h \sigma_{hy}$$
• which is the same as minimizing
$$\sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$$
• If $f(x)$, $\lambda(x)$ and $\phi(x)$ are known and integrable functions, then the quantities $W_h$, $\sigma_{h\lambda}^2$ and $\mu_{h\phi}$ can be expressed as functions of the boundary points $x_{h-1}$ and $x_h$:
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$$
$$\sigma_{h\lambda}^2 = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \lambda^2(x) f(x)\,dx - \mu_{h\lambda}^2$$
$$\mu_{h\phi} = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \phi(x) f(x)\,dx$$
$$\mu_{h\lambda} = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \lambda(x) f(x)\,dx$$
• Here $x_{h-1}$ and $x_h$ are the lower and upper boundary points of the $h$th stratum.
• Hence the objective function is obtained as a function of the boundary points $x_h$, $x_{h-1}$ only.
• The problem is therefore to minimize the sum over strata of the function
$$\phi_h(x_h, x_{h-1}) = W_h \sigma_{hy} = W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$$
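As a rough illustration of how one stratum's contribution $\phi_h(x_h, x_{h-1})$ could be evaluated, the following Python sketch integrates the four quantities numerically. It rests on assumptions that are not part of the derivation above: a Weibull density for $f$, a linear $\lambda(x) = \alpha + \beta x$, a constant $\phi(x)$, and parameter values borrowed from the estimates reported later in the seminar. The helper names and the Simpson integrator are hypothetical choices made for the example.

```python
import math

# Assumed illustrative values (borrowed from the estimates reported later).
R, THETA = 2.342, 13.40          # Weibull shape and scale
ALPHA, BETA = 10.9449, 0.1141    # lambda(x) = ALPHA + BETA * x
PHI = 1.54                       # phi(x) assumed constant


def weibull_pdf(x: float) -> float:
    """Two-parameter Weibull density for x >= 0."""
    return (R / THETA) * (x / THETA) ** (R - 1) * math.exp(-((x / THETA) ** R))


def simpson(g, lo: float, hi: float, m: int = 400) -> float:
    """Composite Simpson rule with an even number m of sub-intervals."""
    step = (hi - lo) / m
    total = g(lo) + g(hi)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(lo + i * step)
    return total * step / 3


def stratum_terms(lo: float, hi: float):
    """Return (W_h, sigma^2_h_lambda, mu_h_phi) for the stratum [lo, hi]."""
    lam = lambda x: ALPHA + BETA * x
    w = simpson(weibull_pdf, lo, hi)
    mu_lam = simpson(lambda x: lam(x) * weibull_pdf(x), lo, hi) / w
    var_lam = simpson(lambda x: lam(x) ** 2 * weibull_pdf(x), lo, hi) / w - mu_lam ** 2
    mu_phi = simpson(lambda x: PHI * weibull_pdf(x), lo, hi) / w
    return w, var_lam, mu_phi


def stratum_cost(lo: float, hi: float) -> float:
    """phi_h = W_h * sqrt(sigma^2_h_lambda + mu_h_phi)."""
    w, var_lam, mu_phi = stratum_terms(lo, hi)
    return w * math.sqrt(var_lam + mu_phi)


w1, var_lam1, mu_phi1 = stratum_terms(1.5, 12.22)
cost1 = stratum_cost(1.5, 12.22)
```

The sketch integrates the untruncated density, so it is not expected to reproduce the objective function values tabulated in the results; it only shows how the terms fit together.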
• To obtain the stratum boundaries, the optimization problem is to find $x_1, x_2, \ldots, x_{L-1}$ such that
$$\sum_{h=1}^{L} \phi_h(x_h, x_{h-1})$$
is minimum, subject to $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$.
• Further, the width of each stratum is defined as
$$l_h = x_h - x_{h-1}; \quad h = 1, 2, \ldots, L$$
• where $l_h \ge 0$ denotes the range or width of the $h$th stratum.
• With this definition of $l_h$, the range of the distribution, $d = b - a$, is expressed as a function of the stratum widths as
$$\sum_{h=1}^{L} l_h = \sum_{h=1}^{L} (x_h - x_{h-1}) = x_L - x_0 = b - a = d$$
• The $h$th stratification point $x_h$ $(h = 1, 2, \ldots, L)$ is then expressed as
$$x_h = x_0 + \sum_{i=1}^{h} l_i \quad \text{or} \quad x_h = x_{h-1} + l_h$$
• Taking the relation between the range of the distribution and the stratum widths as a constraint, the optimization problem can be treated as the equivalent problem of determining the optimum strata widths (OSW) $l_1, l_2, \ldots, l_L$, and is expressed as the following mathematical programming problem (MPP):
$$\text{Minimize } \sum_{h=1}^{L} \phi_h(l_h, x_{h-1})$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0; \quad h = 1, 2, \ldots, L$$
• Initially, $x_0$ is known. Therefore the first term $\phi_1(l_1, x_0)$ in the objective function of the MPP is a function of $l_1$ alone. Once $l_1$ is known, $x_1 = x_0 + l_1$ is known, so the second term $\phi_2(l_2, x_1)$ becomes a function of $l_2$ alone, and so on. Because of this special structure, the objective function of the MPP may be treated as a function of the $l_h$ alone and can be expressed as
$$\text{Minimize } \sum_{h=1}^{L} \phi_h(l_h)$$
$$\text{subject to } \sum_{h=1}^{L} l_h = d$$
$$\text{and } l_h \ge 0; \quad h = 1, 2, \ldots, L$$
DYNAMIC PROGRAMMING APPROACH
• Dynamic programming determines the optimum solution of a multi-variable problem by decomposing it into stages, each stage comprising a single-variable sub-problem.
• A dynamic programming model is basically a recursive equation based on Bellman's principle of optimality.
• This recursive equation links the stages of the problem in a manner that guarantees that each stage's optimal feasible solution is also optimal and feasible for the entire problem.
• Determining the optimal stratum boundaries for an auxiliary variable that follows the Weibull distribution is a multistage problem.
• The problem is formulated as an MPP and solved using the dynamic programming approach.
• The formulated MPP minimizes the variance of the estimated population parameter under the chosen allocation (here, Neyman allocation), subject to the restriction that the sum of the widths of all the strata equals the total range of the distribution of the variable.
• Since Neyman allocation is considered, the subproblem for the first $k$ $(k < L)$ strata becomes
$$\text{Minimize } \sum_{h=1}^{k} \phi_h(l_h)$$
$$\text{subject to } \sum_{h=1}^{k} l_h = d_k$$
$$\text{and } l_h \ge 0; \quad h = 1, 2, \ldots, k$$
• where $d_k < d$ is the total width available for division into $k$ strata, i.e. the state value at stage $k$.
• Note that $d_k = d$ for $k = L$.
• The transformation functions are given by
$$d_k = l_1 + l_2 + \cdots + l_k$$
$$d_{k-1} = l_1 + l_2 + \cdots + l_{k-1} = d_k - l_k$$
$$d_1 = l_1 = d_2 - l_2$$
• Let $\Phi_k(d_k)$ denote the minimum value of the objective function of the subproblem, that is
$$\Phi_k(d_k) = \min\left[\sum_{h=1}^{k} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k} l_h = d_k,\; l_h \ge 0;\; h = 1, 2, \ldots, k\right], \quad 1 \le k \le L$$
• With the above definition of $\Phi_k(d_k)$, the MPP is equivalent to finding $\Phi_L(d)$ recursively by computing $\Phi_k(d_k)$ for $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$. It can be written as
$$\Phi_k(d_k) = \min\left[\phi_k(l_k) + \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k;\; l_h \ge 0;\; h = 1, 2, \ldots, k\right]$$
• For a fixed value of $l_k$, $0 \le l_k \le d_k$,
$$\Phi_k(d_k) = \phi_k(l_k) + \min\left[\sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k;\; l_h \ge 0;\; h = 1, 2, \ldots, k-1\right]$$
• Using Bellman's principle of optimality, a forward recursive equation of the dynamic programming technique is written as
$$\Phi_k(d_k) = \min_{0 \le l_k \le d_k} \left[\phi_k(l_k) + \Phi_{k-1}(d_k - l_k)\right], \quad k \ge 2$$
• For the first stage, that is, for $k = 1$,
$$\Phi_1(d_1) = \phi_1(d_1) \;\Rightarrow\; l_1^* = d_1$$
• where $l_1^* = d_1$ is the optimum width of the first stratum. The recurrence relations are solved for each $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$, and $\Phi_L(d)$ is obtained.
• From $\Phi_L(d)$ the optimum width of the $L$th stratum, $l_L^*$, is obtained.
• From $\Phi_{L-1}(d - l_L^*)$ the optimum width of the $(L-1)$th stratum, $l_{L-1}^*$, is obtained, and so on until $l_1^*$, the optimum width of the first stratum, is obtained.
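The forward recursion and the backward recovery of the optimum widths can be sketched in Python on a discretized range. This is a minimal illustration, not the seminar's implementation: the function name `optimal_boundaries`, the uniform grid of `m` steps, and the toy cost function are all assumptions made for the example. The stratum cost is passed in as a function of the lower and upper boundary, which covers the dependence of $\phi_h$ on $x_{h-1}$.

```python
from typing import Callable, List, Tuple


def optimal_boundaries(cost: Callable[[float, float], float],
                       a: float, b: float, L: int, m: int) -> Tuple[List[float], float]:
    """Grid-based dynamic programme for the stratification MPP.

    best[k][j] plays the role of Phi_k(d_k) with d_k = grid[j] - a, and
    the inner loop over i scans the admissible widths l_k = grid[j] - grid[i].
    Returns the L - 1 interior boundaries and the minimum objective value.
    """
    grid = [a + (b - a) * j / m for j in range(m + 1)]
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(L + 1)]
    arg = [[0] * (m + 1) for _ in range(L + 1)]

    # First stage: a single stratum covers the whole available width.
    for j in range(m + 1):
        best[1][j] = cost(grid[0], grid[j])

    # Stages k >= 2: Phi_k(d_k) = min over l_k of phi_k(l_k) + Phi_{k-1}(d_k - l_k).
    for k in range(2, L + 1):
        for j in range(m + 1):
            for i in range(j + 1):
                value = best[k - 1][i] + cost(grid[i], grid[j])
                if value < best[k][j]:
                    best[k][j] = value
                    arg[k][j] = i

    # Backtrack from Phi_L(d) to recover the boundaries, last stratum first.
    cuts: List[float] = []
    j = m
    for k in range(L, 1, -1):
        j = arg[k][j]
        cuts.append(grid[j])
    cuts.reverse()
    return cuts, best[L][m]


# Toy cost (squared width), used only to check the mechanics.
toy_cuts, toy_value = optimal_boundaries(lambda lo, hi: (hi - lo) ** 2, 0.0, 12.0, L=3, m=12)
```

With the toy cost the optimum is three equal strata. A realistic cost function would need to handle zero-width strata, for example by returning zero when the stratum weight vanishes, and a finer grid gives boundaries closer to the continuous optimum at the price of roughly $L m^2$ cost evaluations.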
WEIBULL DISTRIBUTION
• The Weibull distribution is a two-parameter family of continuous probability distributions. Because it can take a wide variety of shapes, it is one of the most widely used distributions in applied statistics.
• If an auxiliary variable $x$ follows the Weibull distribution on the interval $[x_0, x_L]$, its two-parameter probability density function, defined for $x \ge 0$, is given by
$$f(x; \theta, r) = \begin{cases} \dfrac{r}{\theta} \left(\dfrac{x}{\theta}\right)^{r-1} e^{-(x/\theta)^r}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
• where $r > 0$ is the shape parameter and $\theta > 0$ is the scale parameter of the distribution.
• The Weibull distribution is related to a number of other probability distributions.
• In particular, the Weibull distribution reduces to the exponential distribution with parameter $1/\theta$ when $r = 1$:
$$f\left(x; \frac{1}{\theta}\right) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta}, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
• If the quantity $X$ is a "time-to-failure", the Weibull distribution gives a distribution for which the failure rate is proportional to a power of time.
• A value of $r < 1$ indicates that the failure rate decreases over time.
• A value of $r = 1$ indicates that the failure rate is constant over time.
• A value of $r > 1$ indicates that the failure rate increases with time.
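The three failure-rate regimes can be checked with the Weibull hazard function $h(x) = \frac{r}{\theta}\left(\frac{x}{\theta}\right)^{r-1}$, which is the density divided by the survival function $e^{-(x/\theta)^r}$. The Python sketch below is illustrative only; the function name and the parameter values are arbitrary choices and do not come from the seminar.

```python
def weibull_hazard(x: float, theta: float, r: float) -> float:
    """Failure rate h(x) = (r / theta) * (x / theta)^(r - 1) for x > 0."""
    return (r / theta) * (x / theta) ** (r - 1)


# Arbitrary scale theta = 2 and three shape values, evaluated at two time points.
decreasing = (weibull_hazard(1.0, 2.0, 0.5), weibull_hazard(4.0, 2.0, 0.5))
constant = (weibull_hazard(1.0, 2.0, 1.0), weibull_hazard(4.0, 2.0, 1.0))
increasing = (weibull_hazard(1.0, 2.0, 2.0), weibull_hazard(4.0, 2.0, 2.0))
```

For $r = 1$ the hazard equals $1/\theta$ at every time point, which matches the reduction to the exponential distribution.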
Derivation of weight
• Given
$$W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$$
• Substituting the probability density function of the Weibull distribution and integrating over the given interval, the weight of the stratum is obtained as
$$W_h = e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r}$$
• Substituting this weight into the stratified sample mean gives the expression for $\bar{y}_{st}$:
$$\bar{y}_{st} = \sum_{h=1}^{L} \left[e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r}\right] \bar{y}_h$$
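The closed-form weight is straightforward to evaluate. The Python sketch below uses the shape and scale estimates and the three-stratum boundaries reported later in the seminar purely as illustrative inputs; the function name is hypothetical. Because the formula uses the untruncated Weibull distribution, the weights over $[x_0, x_L]$ add up to $F(x_L) - F(x_0)$ rather than exactly 1.

```python
import math
from typing import List


def weibull_stratum_weights(boundaries: List[float], theta: float, r: float) -> List[float]:
    """W_h = exp(-(x_{h-1}/theta)^r) - exp(-(x_h/theta)^r) for consecutive boundaries."""
    survival = [math.exp(-((x / theta) ** r)) for x in boundaries]
    return [survival[h - 1] - survival[h] for h in range(1, len(boundaries))]


# Illustrative inputs: x_0 = 1.5, interior boundaries 9.29 and 15.44, x_L = 25.1.
bounds = [1.5, 9.29, 15.44, 25.1]
weights = weibull_stratum_weights(bounds, theta=13.40, r=2.342)
covered = math.exp(-((1.5 / 13.40) ** 2.342)) - math.exp(-((25.1 / 13.40) ** 2.342))
```

Dividing each weight by `covered` would renormalize the weights to sum to 1 on the observed range; whether such a truncation adjustment was applied in the seminar is not stated.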
ESTIMATION
Estimating the linear regression model
• Health data of size N = 724, obtained from the 2004 Fiji National Nutrition Survey on "Micronutrient Status of Women in Fiji", are used.
• The data record two characteristics for each woman: the level of iron and the level of haemoglobin.
• A survey focusing on iron-deficiency anaemia is to be conducted in the country.
• Stratified random sampling is therefore used for collecting the sample, taking haemoglobin (y) as the variable of interest and the level of iron (x), collected in a previous study, as the auxiliary variable.
• For this purpose, a linear regression model is fitted, and the following analysis of variance is obtained:
Source          Sum of Squares   Degrees of Freedom   Mean Sum of Squares   F        p-value
Regression      461.92           1                    461.92                299.95   0.000
Residual        1050.61          682                  1.54
  Lack of fit   236.40           204                  1.16                  0.68     0.890
  Pure error    814.21           478                  1.70
Total           1512.54          683
• The data fit the linear regression model on iron level (x) significantly.
• The coefficient of determination, $R^2 = 461.92 / 1512.54 = 0.3054$, indicates a moderate strength of the linear relationship between the two variables.
• The table also reveals that there is no significant lack of fit in the linear regression (p-value = 0.890). Thus, the model fits the data well and gives no reason to consider an alternative model.
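The summary statistics quoted above follow from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below redoes that arithmetic; the variable names are illustrative, and the total sum of squares is taken as the sum of the regression and residual sums of squares.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
ss_regression, df_regression = 461.92, 1
ss_residual, df_residual = 1050.61, 682
ss_lack_of_fit, df_lack_of_fit = 236.40, 204
ss_pure_error, df_pure_error = 814.21, 478

# Mean squares, coefficient of determination and the two F ratios.
ms_residual = ss_residual / df_residual
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total
f_regression = (ss_regression / df_regression) / ms_residual
f_lack_of_fit = (ss_lack_of_fit / df_lack_of_fit) / (ss_pure_error / df_pure_error)
```

The regression F ratio computed from unrounded mean squares differs slightly from the tabulated 299.95, which corresponds to dividing by the rounded residual mean square 1.54.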
Predictor   Coefficient   SE Coeff   t       p-value
α           10.9449       0.1245     87.89   0.000
β           0.114115      0.009548   11.95   0.000
• The p-values for the parameters α and β show that both parameters in the model are highly significant.
[Figure: scatter plot of iron against haemoglobin]
• The scatter plot of iron versus haemoglobin depicts a moderate positive linear association between the two variables.
• Therefore, the haemoglobin content (y) and the iron level (x) can reasonably be assumed to follow a linear regression model with
$$\lambda(x) = \alpha + \beta x$$
• The least-squares estimates of the parameters are
$$\hat{\alpha} = 10.9449 \quad \text{and} \quad \hat{\beta} = 0.1141$$
Estimating the distribution
[Figure: relative frequency histogram of iron level (x); axis labels: Iron, Density]
• To determine the distribution of the auxiliary variable, the relative frequency histogram of iron level (x) is constructed.
• It shows that the distribution of x is right-skewed, which is consistent with the Weibull distribution.
[Figure: Weibull Q-Q plot of iron (x); observed values against expected Weibull values]
• The probability (Q-Q) plot of x shows the points clustered around the straight line, so the auxiliary variable is assumed to follow the Weibull distribution.
• The maximum likelihood estimates (MLE) of the Weibull parameters are found to be
shape $r = 2.342$ and scale $\theta = 13.40$
Estimating the variance of the error term
• It is assumed that the variance of the error term is $V(\epsilon \mid x) = \phi(x) > 0$ for all x in the range (a, b), and the expected value of the function $\phi(x)$, denoted $\mu_{h\phi}$, is obtained as
$$\mu_{h\phi} = \frac{SS_{Res}}{N - p} = MS_{Res}$$
• where $SS_{Res}$ and $MS_{Res}$ are the residual sum of squares and the residual mean square respectively, and p is the number of parameters in the regression model.
• In the given regression model
$$\lambda(x) = \alpha + \beta x$$
there are p = 2 parameters (α and β), and the ANOVA table gives $MS_{Res} = 1.54$.
RESULTS
• With the level of haemoglobin (y) as the main variable of interest, the minimum and maximum values of x (iron) are 1.5 and 25.1, so the range of the distribution of iron level is d = 23.6.
• The problem is solved in two stages (k = 1 and k ≥ 2) using the recurrence equations, and the optimum strata widths are obtained by implementing the dynamic programming solution procedure.
• To assess the effectiveness of the dynamic programming (DP) procedure, it is compared with the following methods from the literature:
1. Cum $\sqrt{f}$ method of Dalenius and Hodges (1959).
2. Geometric method of Gunning and Horgan (2004).
3. Lavallée-Hidiroglou method (Lavallée and Hidiroglou, 1988) with Kozak's algorithm (Kozak, 2004).
Strata (L)   OSW ($l_h^*$)                    OSB ($x_h^*$)                 OFV ($\sum_{h=1}^{L} W_h \sigma_h$)
2            10.72, 12.88                     12.22                         1.3658
3            7.79, 6.15, 9.66                 9.29, 15.44                   1.3462
4            6.22, 4.60, 4.98, 7.81           7.72, 12.31, 17.29            1.3384
5            5.20, 3.78, 3.75, 4.30, 6.57     6.70, 10.48, 14.23, 18.53     1.3346
• The table shows the optimum stratum widths (OSW) obtained with the dynamic programming procedure. The corresponding optimum stratum boundaries (OSB) are calculated from $x_h^* = x_{h-1}^* + l_h^*$ for the given number of strata.
• The objective function value (OFV) is also calculated for each number of strata, and the table shows that it decreases as the number of strata increases.
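The boundaries in the table can be rebuilt from the widths with the relation $x_h^* = x_{h-1}^* + l_h^*$, starting from the minimum iron level $x_0 = 1.5$. The Python sketch below does this as a consistency check; the dictionary layout and the function name are illustrative choices.

```python
from itertools import accumulate
from typing import Dict, List

X0 = 1.5  # minimum observed iron level

# Optimum stratum widths as tabulated, keyed by the number of strata L.
OSW: Dict[int, List[float]] = {
    2: [10.72, 12.88],
    3: [7.79, 6.15, 9.66],
    4: [6.22, 4.60, 4.98, 7.81],
    5: [5.20, 3.78, 3.75, 4.30, 6.57],
}


def boundaries_from_widths(x0: float, widths: List[float]) -> List[float]:
    """Cumulative sums x_h = x_0 + l_1 + ... + l_h for h = 1, ..., L."""
    return [x0 + partial for partial in accumulate(widths)]


rebuilt = {L: boundaries_from_widths(X0, widths) for L, widths in OSW.items()}
```

The last entry of each rebuilt list should land on the maximum iron level 25.1, and the interior entries agree with the tabulated boundaries up to rounding in the second decimal place.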
Methods: CSRF = cum $\sqrt{f}$; GEO = geometric; L-H Kozak = Lavallée-Hidiroglou with Kozak's algorithm; DP = dynamic programming.
L   Method      OSB                           OFV
2   CSRF        12.12                         1.366
2   GEO         6.14                          1.404
2   L-H Kozak   8.1                           1.384
2   DP          12.22                         1.366
3   CSRF        9.76, 15.66                   1.346
3   GEO         3.84, 9.81                    1.369
3   L-H Kozak   5.55, 9.15                    1.372
3   DP          9.29, 15.44                   1.346
4   CSRF        7.40, 12.12, 16.84            1.339
4   GEO         3.03, 6.14, 12.41             1.353
4   L-H Kozak   5.55, 9.15, 15.55             1.342
4   DP          7.71, 12.31, 17.29            1.338
5   CSRF        6.22, 9.76, 13.30, 18.02      1.335
5   GEO         2.64, 4.63, 8.13, 14.21       1.345
5   L-H Kozak   5.55, 9.15, 12.65, 17.00      1.335
5   DP          6.70, 10.48, 14.23, 18.53     1.335
• The table compares the established methods for calculating the optimum stratum boundaries with the dynamic programming procedure and shows that the DP procedure attains the minimum objective function value among all of them.
• However, the cum $\sqrt{f}$ method gives results close to those of the DP method.
Stratum sample sizes $n_h$ and objective function values (OFV) by method:
L   h     CSRF    GEO     L-H Kozak   DP
2   1     274     69      128         278
2   2     226     431     372         222
2   OFV   1.366   1.403   1.384       1.366
3   1     190     23      56          173
3   2     195     165     107         206
3   3     115     312     337         121
3   OFV   1.346   1.369   1.372       1.346
4   1     109     12      57          119
4   2     166     59      110         163
4   3     139     211     215         141
4   4     86      218     118         77
4   OFV   1.339   1.353   1.342       1.338
5   1     75      8       58          88
5   2     115     29      110         128
5   3     125     95      125         129
5   4     122     211     124         101
5   5     63      157     83          54
5   OFV   1.335   1.345   1.335       1.335
• The table compares the stratum sample sizes obtained by the different methods; again the DP method has the minimum objective function value.
No. of strata (L)   OSB for x ($x_h^*$)           OSB for y ($y_h^* = \alpha + \beta x_h^*$)   OFV of y ($\sum_{h=1}^{L} W_h \sigma_h$)
2                   12.22                         12.34                                        1.366
3                   9.29, 15.44                   12.01, 12.71                                 1.346
4                   7.72, 12.31, 17.29            11.82, 12.35, 12.92                          1.338
5                   6.70, 10.48, 14.23, 18.53     11.71, 12.14, 12.57, 13.06                   1.335
• The table shows the optimal stratum boundaries for the main variable of interest, the haemoglobin content in women (y), obtained with the help of the level of iron (x), which serves as the auxiliary variable because the two variables are linearly related.
• The formula used for obtaining the OSB for the variable y is $y_h^* = \alpha + \beta x_h^*$.
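The mapping from boundaries on x to boundaries on y can be reproduced with the least-squares estimates $\hat{\alpha} = 10.9449$ and $\hat{\beta} = 0.1141$ reported earlier. The Python sketch below is a simple check of the tabulated values; the names are illustrative.

```python
from typing import Dict, List

ALPHA_HAT, BETA_HAT = 10.9449, 0.1141  # least-squares estimates from the fitted model

# Optimum stratum boundaries for iron (x) as tabulated, keyed by number of strata L.
OSB_X: Dict[int, List[float]] = {
    2: [12.22],
    3: [9.29, 15.44],
    4: [7.72, 12.31, 17.29],
    5: [6.70, 10.48, 14.23, 18.53],
}


def y_boundaries(x_bounds: List[float]) -> List[float]:
    """Map each boundary on x to the haemoglobin scale via y = alpha + beta * x."""
    return [ALPHA_HAT + BETA_HAT * x for x in x_bounds]


OSB_Y = {L: y_boundaries(xs) for L, xs in OSB_X.items()}
```

Because $\lambda(x)$ is increasing in x, the order of the boundaries is preserved on the y scale, and the computed values agree with the table to within rounding.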
• The results from all these procedures aim at minimizing the objective function value $\sum_{h=1}^{L} \phi_h(l_h) = \sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$ for L = 2, 3, 4, 5.
• The sample sizes produced by the cum $\sqrt{f}$ method are closest to those produced by the dynamic programming procedure, whereas those of the other two methods differ considerably from the proposed method.
• The table shows that the geometric method allocates larger sample sizes to the strata in the upper tail, so its sample sizes differ markedly from those of the proposed method.
• Further, for all L = 2, 3, 4, 5 the dynamic programming method produces the minimum objective function value among these methods, and its values are very close to those of the cum $\sqrt{f}$ method.
CONCLUSION
• The results show that constructing strata with an auxiliary variable that follows the Weibull distribution yields a gain in the precision of estimates of the main study variable: for every L = 2, ..., 5 considered, the objective function value of the DP method is no larger than that of the competing methods, with stratum boundaries chosen so that the variance is minimized.
• The dynamic programming technique does not require an initial approximate solution, and it uses an auxiliary variable and parametric assumptions to capture the characteristics of the main variable.
• Thus, it can be concluded that this technique is efficient in determining the optimal sample sizes and optimal stratum boundaries.
• Further, the solution procedure is not restricted to the case where the auxiliary variable is Weibull distributed; it can be applied to other statistical distributions.
References
Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
Bühler, W., and T. Deutler. (1975). Optimal stratification and grouping by dynamic programming. Metrika, 22(1), 161–75.
De Gruijter, J. J., B. Minasny, and A. B. McBratney. (2015). Optimizing stratification and allocation for design-based estimation of spatial means using predictions with error. Journal of Survey Statistics and Methodology, 3(1), 19–42.
Dalenius, T., and J. L. Hodges. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54(285), 88–101.
Khan, M. G., N. Sehar, and M. J. Ahsan. (2005). Optimum stratification for exponential study variable under Neyman allocation. Journal of the Indian Society of Agricultural Statistics, 59(2), 146–50.
Khan, M. G. M., N. Nand, and N. Ahmad. (2008). Determining the optimum strata boundary points using dynamic programming. Survey Methodology, 34(2), 205–14.
Reddy, K. G., and M. G. Khan. (2019). Optimal stratification in stratified designs using Weibull-distributed auxiliary information. Communications in Statistics - Theory and Methods, 48(12), 3136–52.
59
60
61
62

More Related Content

Similar to seminar1!$%^*((^$#@$^@%@%@%$%@%@$@%$%$%@%@

Modeling the dynamics of molecular concentration during the diffusion procedure
Modeling the dynamics of molecular concentration during the  diffusion procedureModeling the dynamics of molecular concentration during the  diffusion procedure
Modeling the dynamics of molecular concentration during the diffusion procedure
International Journal of Engineering Inventions www.ijeijournal.com
 
Numerical analysis dual, primal, revised simplex
Numerical analysis  dual, primal, revised simplexNumerical analysis  dual, primal, revised simplex
Numerical analysis dual, primal, revised simplex
SHAMJITH KM
 
Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
Davide Nardone
 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Mengxi Jiang
 

Similar to seminar1!$%^*((^$#@$^@%@%@%$%@%@$@%$%$%@%@ (20)

Support Vector Machine.pptx
Support Vector Machine.pptxSupport Vector Machine.pptx
Support Vector Machine.pptx
 
Modeling the dynamics of molecular concentration during the diffusion procedure
Modeling the dynamics of molecular concentration during the  diffusion procedureModeling the dynamics of molecular concentration during the  diffusion procedure
Modeling the dynamics of molecular concentration during the diffusion procedure
 
#3Measures of central tendency
#3Measures of central tendency#3Measures of central tendency
#3Measures of central tendency
 
B02402012022
B02402012022B02402012022
B02402012022
 
Data Science Cheatsheet.pdf
Data Science Cheatsheet.pdfData Science Cheatsheet.pdf
Data Science Cheatsheet.pdf
 
Numerical analysis dual, primal, revised simplex
Numerical analysis  dual, primal, revised simplexNumerical analysis  dual, primal, revised simplex
Numerical analysis dual, primal, revised simplex
 
Blind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary LearningBlind Source Separation using Dictionary Learning
Blind Source Separation using Dictionary Learning
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
 
Unsteady MHD Flow Past A Semi-Infinite Vertical Plate With Heat Source/ Sink:...
Unsteady MHD Flow Past A Semi-Infinite Vertical Plate With Heat Source/ Sink:...Unsteady MHD Flow Past A Semi-Infinite Vertical Plate With Heat Source/ Sink:...
Unsteady MHD Flow Past A Semi-Infinite Vertical Plate With Heat Source/ Sink:...
 
Traveling Salesman Problem
Traveling Salesman Problem Traveling Salesman Problem
Traveling Salesman Problem
 
Estimation Theory Class (Summary and Revision)
Estimation Theory Class (Summary and Revision)Estimation Theory Class (Summary and Revision)
Estimation Theory Class (Summary and Revision)
 
An efficient approach to wavelet image Denoising
An efficient approach to wavelet image DenoisingAn efficient approach to wavelet image Denoising
An efficient approach to wavelet image Denoising
 
v39i11.pdf
v39i11.pdfv39i11.pdf
v39i11.pdf
 
Numerical Solution of Diffusion Equation by Finite Difference Method
Numerical Solution of Diffusion Equation by Finite Difference MethodNumerical Solution of Diffusion Equation by Finite Difference Method
Numerical Solution of Diffusion Equation by Finite Difference Method
 
Monte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptxMonte Carlo Berkeley.pptx
Monte Carlo Berkeley.pptx
 
Chapter14
Chapter14Chapter14
Chapter14
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
Application of Graphic LASSO in Portfolio Optimization_Yixuan Chen & Mengxi J...
 
201977 1-1-4-pb
201977 1-1-4-pb201977 1-1-4-pb
201977 1-1-4-pb
 
App8
App8App8
App8
 

Recently uploaded

Recently uploaded (20)

FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 

seminar1!$%^*((^$#@$^@%@%@%$%@%@$@%$%$%@%@

  • 1. Surya Prakash Tripathi, M.Sc. (Agricultural Statistics), Roll No. 21601, ICAR - Indian Agricultural Statistics Research Institute, Library Avenue, Pusa, New Delhi – 110012. COURSE SEMINAR (STAT 591): OPTIMAL STRATIFICATION USING WEIBULL-DISTRIBUTED AUXILIARY INFORMATION
  • 2. Outline of Seminar (10-02-2023, Course Seminar): Introduction; Objective; Overview of solution approach; Methodology; Dynamic programming approach; Weibull distribution; Estimation; Results; Conclusion; References.
  • 4. Introduction: Sampling is the process of selecting a fraction of the population, called a sample, that truly represents the population and is then used to draw inferences about the population's characteristics. Sampling theory has developed into a widely used method for understanding and analyzing large blocks of information to identify meaningful patterns and trends. A sample of optimum size can adequately capture the true features of a much larger population. When the population is homogeneous with respect to the characteristic under study, simple random sampling is used; when it is heterogeneous, stratified sampling is used.
  • 5. Introduction contd… In stratified sampling, the heterogeneous population is divided into smaller groups called strata, which are homogeneous with respect to the characteristic under study; the strata are formed so that heterogeneity is minimum within a stratum and maximum between strata. Stratified sampling has several advantages: high precision of the estimates, administrative convenience, a full cross-section of the population and a substantial gain in efficiency. Moreover, it provides estimates not only for the population but also for each subpopulation (stratum). Stratified sampling plays an important role in health surveys for estimating the prevalence of diseases, in business and the sciences, and in many other parameter estimation problems.
  • 7. Objective: In most cases, surveyors stratify the population by convenience, for example on geographical or administrative grounds (provinces, districts) or by natural criteria such as gender and age. However, such stratification is not always reasonable, because the strata so formed may not be internally homogeneous with respect to the variable of interest. Hence there is a need to find the optimum stratum boundaries (OSB) that maximize the precision of the estimate. Various methods are available for obtaining the OSB when the frequency distribution of the variable is known, but the stratum variance $\sigma_h^2$ still needs to be made as small as possible.
  • 9. Overview of solution approach: An efficient method of constructing optimum stratum boundaries (OSB), and of determining the optimum stratum widths and optimum sample sizes, is discussed. In practice it is difficult to obtain information about the variable of interest before the survey is conducted, so an auxiliary variable available from past surveys is used for stratification instead. In this approach the auxiliary variable follows the Weibull distribution and is linearly related to the main variable.
  • 10. Overview of solution approach contd… The stratification problem is framed as a mathematical programming problem (MPP) that minimizes the variance of the estimated population parameter under Neyman allocation. The MPP is a multistage decision problem and is solved by a dynamic programming technique based on Bellman's principle of optimality. This procedure resulted in remarkable gains in the precision of the estimates of the population characteristics.
  • 12. Methodology: Let the population be stratified into $L$ strata based on an auxiliary variable $x$, where the mean of a study variable $y$ is to be estimated. If a simple random sample of size $n_h$ is drawn from the $h$th stratum with sample mean $\bar{y}_h$ ($h = 1, 2, \ldots, L$), the stratified sample mean is $\bar{y}_{st} = \sum_{h=1}^{L} W_h \bar{y}_h$. Under Neyman allocation, the variance of $\bar{y}_{st}$ is $V(\bar{y}_{st})_N = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n} - \frac{1}{N} \sum_{h=1}^{L} W_h \sigma_{hy}^2$.
  • 13. Methodology contd… When the finite population correction is ignored, the variance becomes $V(\bar{y}_{st}) = \frac{\left(\sum_{h=1}^{L} W_h \sigma_{hy}\right)^2}{n}$, where $W_h$ and $\sigma_{hy}^2$ are the stratum weight and stratum variance of the $h$th stratum ($h = 1, 2, \ldots, L$) and $n$ is the total sample size, fixed in advance.
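
The step from Neyman allocation to the variance expression above can be checked numerically. The sketch below is only an illustration: the function names and the three-stratum weights and standard deviations are hypothetical, not taken from the seminar. It allocates $n_h \propto W_h \sigma_h$ and confirms that $\sum_h W_h^2 \sigma_h^2 / n_h$ collapses to $(\sum_h W_h \sigma_h)^2 / n$ when the finite population correction is ignored.

```python
def neyman_allocation(n, weights, sigmas):
    """Split a total sample size n across strata in proportion to W_h * sigma_h."""
    products = [w * s for w, s in zip(weights, sigmas)]
    total = sum(products)
    return [n * p / total for p in products]


def stratified_variance(weights, sigmas, sizes):
    """Variance of the stratified mean, ignoring the finite population correction."""
    return sum((w * s) ** 2 / m for w, s, m in zip(weights, sigmas, sizes))


# Hypothetical three-stratum example (illustrative values only).
W = [0.5, 0.3, 0.2]
sigma = [1.0, 2.0, 4.0]
n = 100

n_h = neyman_allocation(n, W, sigma)            # fractional sizes, not rounded
v_neyman = stratified_variance(W, sigma, n_h)   # sum of W_h^2 sigma_h^2 / n_h
v_formula = sum(w * s for w, s in zip(W, sigma)) ** 2 / n
```

In a real survey the allocated sizes would be rounded to integers, so the two variances would agree only approximately.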
  • 14. Methodology contd… The study variable is assumed to follow a regression model of the form $y = \lambda(x) + \epsilon$, where $\lambda(x)$ is a linear or non-linear function of $x$ and $\epsilon$ is an error term such that $E(\epsilon \mid x) = 0$ and $V(\epsilon \mid x) = \phi(x) > 0$ for all $x$.
  • 15. Methodology contd… Under this model, the stratum mean $\mu_{hy}$ and stratum variance $\sigma_{hy}^2$ of $y$ can be expressed in terms of $x$. The stratum mean is $\mu_{hy} = \mu_{h\lambda}$. If $\lambda$ and $\epsilon$ are uncorrelated, the stratum variance is $\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \sigma_{h\epsilon}^2$, where $\sigma_{h\lambda}^2$ is the variance of $\lambda(x)$ and $\sigma_{h\epsilon}^2$ is the variance of $\epsilon$ in the $h$th stratum. Because $E(\epsilon \mid x) = 0$, the error variance in the stratum equals the stratum mean of $\phi(x)$, so $\sigma_{hy}^2 = \sigma_{h\lambda}^2 + \mu_{h\phi}$, where $\mu_{h\lambda}$ and $\mu_{h\phi}$ are the expected values of $\lambda(x)$ and $\phi(x)$ in the $h$th stratum.
  • 16. Methodology contd… Let $f(x)$, $a \le x \le b$, be the frequency (density) function of the auxiliary variable $x$ used for stratification. To determine the strata boundaries, the range $d = b - a$ is cut at $L - 1$ intermediate points: $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$. Under Neyman allocation with fixed $n$, minimizing the variance of $\bar{y}_{st}$ amounts to minimizing $\sum_{h=1}^{L} W_h \sigma_{hy}$, which is the same as minimizing $\sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$.
  • 17. Methodology contd… If $f(x)$, $\lambda(x)$ and $\phi(x)$ are known and integrable functions, the quantities $W_h$, $\sigma_{h\lambda}^2$ and $\mu_{h\phi}$ can be expressed as functions of the boundary points $x_{h-1}$ and $x_h$: $W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$ and $\sigma_{h\lambda}^2 = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \lambda^2(x) f(x)\,dx - \mu_{h\lambda}^2$.
  • 18. Methodology contd… Similarly, $\mu_{h\phi} = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \phi(x) f(x)\,dx$ and $\mu_{h\lambda} = \frac{1}{W_h} \int_{x_{h-1}}^{x_h} \lambda(x) f(x)\,dx$, where $x_{h-1}$ and $x_h$ are the boundary points of the $h$th stratum. The objective function is therefore a function of the boundary points only, and the term to be minimized for each stratum is $\phi_h(x_h, x_{h-1}) = W_h \sigma_{hy} = W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$.
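
The per-stratum term $W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$ can be approximated numerically for a single stratum. The following is a minimal sketch, assuming a two-parameter Weibull density for $x$, a linear $\lambda(x) = \alpha + \beta x$ and a constant error variance $\phi(x)$; the function names and the midpoint-rule integration are illustrative choices, not the method used in the seminar. The numeric arguments are the fitted values and one boundary reported elsewhere in the deck, and the result is not claimed to reproduce the tabulated objective values.

```python
import math


def weibull_pdf(x, r, theta):
    """Two-parameter Weibull density with shape r and scale theta."""
    if x < 0:
        return 0.0
    return (r / theta) * (x / theta) ** (r - 1) * math.exp(-((x / theta) ** r))


def stratum_cost(lo, hi, r, theta, beta, mse, steps=2000):
    """Return (W_h, W_h * sqrt(sigma_hlambda^2 + mu_hphi)) for the stratum [lo, hi].

    Assumes lambda(x) = alpha + beta * x (alpha does not affect the variance)
    and a constant phi(x) = mse, so sigma_hlambda^2 = beta^2 * Var(x | stratum).
    """
    if hi <= lo:
        return 0.0, 0.0
    width = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width          # midpoint of the i-th sub-interval
        fx = weibull_pdf(x, r, theta) * width
        m0 += fx                            # integral of f
        m1 += x * fx                        # integral of x f
        m2 += x * x * fx                    # integral of x^2 f
    mean_x = m1 / m0
    var_x = m2 / m0 - mean_x ** 2
    return m0, m0 * math.sqrt(beta ** 2 * var_x + mse)


weight, cost = stratum_cost(1.5, 12.22, r=2.342, theta=13.40, beta=0.1141, mse=1.54)
```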
  • 19. Methodology contd… The optimization problem for the stratum boundaries is to find $x_1, x_2, \ldots, x_L$ such that $\sum_{h=1}^{L} \phi_h(x_h, x_{h-1})$ is minimum, subject to $a = x_0 \le x_1 \le x_2 \le \cdots \le x_{L-1} \le x_L = b$. The width of each stratum is defined as $l_h = x_h - x_{h-1}$; $h = 1, 2, \ldots, L$, where $l_h \ge 0$ denotes the range or width of the $h$th stratum.
  • 20. Methodology contd… With this definition of $l_h$, the range of the distribution, $d = b - a$, is expressed as a function of the stratum widths: $\sum_{h=1}^{L} l_h = \sum_{h=1}^{L} (x_h - x_{h-1}) = x_L - x_0 = b - a = d$. The $h$th stratification point $x_h$; $h = 1, 2, \ldots, L$, is then $x_h = x_0 + \sum_{i=1}^{h} l_i$, or equivalently $x_h = x_{h-1} + l_h$.
  • 21. Methodology contd… Taking the range of the distribution, written as the sum of the stratum widths, as a constraint, the problem can be treated as the equivalent problem of determining the optimum strata widths (OSW) $l_1, l_2, \ldots, l_L$. It is expressed as the following mathematical programming problem (MPP): Minimize $\sum_{h=1}^{L} \phi_h(l_h, x_{h-1})$ subject to $\sum_{h=1}^{L} l_h = d$ and $l_h \ge 0$; $h = 1, 2, \ldots, L$.
  • 22. Methodology contd… Initially, $x_0$ is known, so the first term $\phi_1(l_1, x_0)$ in the objective function of the MPP is a function of $l_1$ alone. Once $l_1$ is known, the second term $\phi_2(l_2, x_1)$ becomes a function of $l_2$ alone, and so on. Owing to this special structure, the MPP may be treated as a function of the $l_h$ alone and written as: Minimize $\sum_{h=1}^{L} \phi_h(l_h)$ subject to $\sum_{h=1}^{L} l_h = d$ and $l_h \ge 0$; $h = 1, 2, \ldots, L$.
  • 24. Dynamic programming approach: Dynamic programming determines the optimum solution of a multi-variable problem by decomposing it into stages, each stage comprising a single-variable sub-problem. A dynamic programming model is basically a recursive equation based on Bellman's principle of optimality. This recursive equation links the stages of the problem so that each stage's optimal feasible solution is also optimal and feasible for the entire problem.
  • 25. Dynamic programming approach contd… Determining the optimum stratum boundaries for an auxiliary variable following the Weibull distribution is a multistage problem. It is formulated as an MPP and solved using the dynamic programming approach. The formulated MPP minimizes the variance of the estimated population parameter under the chosen allocation, subject to the restriction that the sum of the widths of all the strata equals the total range of the distribution of the variable.
  • 26. Dynamic programming approach contd… Since Neyman allocation is considered, the sub-problem for the first $k < L$ strata is: Minimize $\sum_{h=1}^{k} \phi_h(l_h)$ subject to $\sum_{h=1}^{k} l_h = d_k$ and $l_h \ge 0$; $h = 1, 2, \ldots, k$, where $d_k < d$ is the total width available for division into $k$ strata, that is, the state value at stage $k$. Note that $d_k = d$ for $k = L$.
  • 27. Dynamic programming approach contd… The transformation functions are given by $d_k = l_1 + l_2 + \cdots + l_k$, $d_{k-1} = l_1 + l_2 + \cdots + l_{k-1} = d_k - l_k$, and so on down to $d_1 = l_1 = d_2 - l_2$. Let $\Phi_k(d_k)$ denote the minimum value of the objective function of the sub-problem, that is, $\Phi_k(d_k) = \min\left\{ \sum_{h=1}^{k} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k} l_h = d_k,\ l_h \ge 0;\ h = 1, 2, \ldots, k \right\}$ for $1 \le k \le L$.
  • 28. Dynamic programming approach contd… With this definition of $\Phi_k(d_k)$, solving the MPP is equivalent to finding $\Phi_L(d)$ recursively by finding $\Phi_k(d_k)$ for $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$. It can be written as $\Phi_k(d_k) = \min\left\{ \phi_k(l_k) + \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k;\ l_h \ge 0;\ h = 1, 2, \ldots, k \right\}$. For a fixed value of $l_k$; $0 \le l_k \le d_k$, this becomes $\Phi_k(d_k) = \phi_k(l_k) + \min\left\{ \sum_{h=1}^{k-1} \phi_h(l_h) \;\middle|\; \sum_{h=1}^{k-1} l_h = d_k - l_k;\ l_h \ge 0;\ h = 1, 2, \ldots, k-1 \right\}$.
  • 29. Dynamic programming approach contd… Using Bellman's principle of optimality, the forward recursive equation of the dynamic programming technique is $\Phi_k(d_k) = \min_{0 \le l_k \le d_k} \left[ \phi_k(l_k) + \Phi_{k-1}(d_k - l_k) \right]$ for $k \ge 2$. For the first stage, $k = 1$: $\Phi_1(d_1) = \phi_1(d_1) \Rightarrow l_1^* = d_1$.
  • 30. Dynamic programming approach contd… Here $l_1^* = d_1$ is the optimum width of the first stratum. The recurrence relations are solved for each $k = 1, 2, \ldots, L$ and $0 \le d_k \le d$, and $\Phi_L(d)$ is obtained. From $\Phi_L(d)$ the optimum width of the $L$th stratum, $l_L^*$, is obtained. From $\Phi_{L-1}(d - l_L^*)$ the optimum width of the $(L-1)$th stratum, $l_{L-1}^*$, is obtained, and so on until $l_1^*$, the optimum width of the first stratum, is obtained.
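
The forward recursion and the backward recovery of the widths can be sketched on a discretized range. This is an illustrative implementation under stated assumptions, not the seminar's own code: `optimal_widths`, the `grid` resolution and the `cost(lo, hi)` callable (standing in for $W_h \sigma_{hy}$ of a stratum with boundaries `lo` and `hi`) are all hypothetical names. The toy check uses a uniform density on $[0, 1]$ with $\lambda(x) = x$ and no error term, for which equal-width strata are optimal.

```python
def optimal_widths(cost, x0, d, L, grid=100):
    """Forward dynamic-programming recursion over a grid of cumulative widths.

    cost(lo, hi) is the contribution W_h * sigma_hy of a stratum [lo, hi].
    Returns (minimum objective value, [l_1*, ..., l_L*]).
    """
    step = d / grid
    # best[k][j] = Phi_k(d_k) for d_k = j * step; arg[k][j] = l_k* in grid units.
    best = [[0.0] * (grid + 1) for _ in range(L + 1)]
    arg = [[0] * (grid + 1) for _ in range(L + 1)]
    for j in range(grid + 1):
        best[1][j] = cost(x0, x0 + j * step)  # Phi_1(d_1) = phi_1(d_1), l_1* = d_1
        arg[1][j] = j
    for k in range(2, L + 1):
        for j in range(grid + 1):
            hi = x0 + j * step
            # Phi_k(d_k) = min over l_k of phi_k(l_k) + Phi_{k-1}(d_k - l_k)
            candidates = (
                (cost(hi - m * step, hi) + best[k - 1][j - m], m)
                for m in range(j + 1)
            )
            best[k][j], arg[k][j] = min(candidates)
    # Backtrack from Phi_L(d) to recover l_L*, l_{L-1}*, ..., l_1*.
    widths, j = [], grid
    for k in range(L, 0, -1):
        m = arg[k][j]
        widths.append(m * step)
        j -= m
    return best[L][grid], widths[::-1]


# Toy check: uniform density on [0, 1], so W_h = width and sigma_h = width / sqrt(12).
def uniform_cost(lo, hi):
    return (hi - lo) ** 2 / 12 ** 0.5


value, widths = optimal_widths(uniform_cost, x0=0.0, d=1.0, L=4, grid=100)
```

The grid makes the answer approximate: the widths are resolved only to `d / grid`, and the run time grows with `L * grid**2` cost evaluations, so a finer grid trades speed for precision.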
  • 32. Weibull distribution: The Weibull distribution is a two-parameter family of continuous probability distributions. Because of its versatility in fitting a variety of distributions, it is one of the most widely used distributions in applied statistics. If an auxiliary variable $x$ follows the Weibull distribution on the interval $[x_0, x_L]$, its two-parameter probability density function with state space $x \ge 0$ is $f(x; \theta, r) = \frac{r}{\theta} \left(\frac{x}{\theta}\right)^{r-1} e^{-(x/\theta)^r}$ for $x \ge 0$, and $f(x; \theta, r) = 0$ for $x < 0$.
  • 33. Weibull distribution contd… Here $r > 0$ is the shape parameter and $\theta > 0$ is the scale parameter of the distribution. The Weibull distribution is related to a number of other probability distributions. In particular, when $r = 1$ it reduces to the exponential distribution with parameter $\frac{1}{\theta}$: $f\left(x; \frac{1}{\theta}\right) = \frac{1}{\theta} e^{-x/\theta}$ for $x \ge 0$, and $0$ for $x < 0$.
  • 34. Weibull distribution contd… If the quantity $X$ is a "time-to-failure", the Weibull distribution gives a distribution for which the failure rate is proportional to a power of time. A value of $r < 1$ indicates that the failure rate decreases over time. A value of $r = 1$ indicates that the failure rate is constant over time. A value of $r > 1$ indicates that the failure rate increases with time.
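
The three failure-rate regimes follow from the Weibull hazard function $h(x) = f(x) / (1 - F(x)) = \frac{r}{\theta} \left(\frac{x}{\theta}\right)^{r-1}$. A small sketch makes this visible; the scale value and the time points are hypothetical and chosen only for illustration.

```python
def weibull_hazard(x, r, theta):
    """Failure rate h(x) = f(x) / (1 - F(x)) = (r / theta) * (x / theta)**(r - 1)."""
    return (r / theta) * (x / theta) ** (r - 1)


theta = 2.0  # hypothetical scale parameter
times = [0.5, 1.0, 2.0, 4.0]

decreasing = [weibull_hazard(t, 0.5, theta) for t in times]  # r < 1
constant = [weibull_hazard(t, 1.0, theta) for t in times]    # r = 1, exponential case
increasing = [weibull_hazard(t, 2.0, theta) for t in times]  # r > 1
```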
  • 35. Derivation of weight: Given $W_h = \int_{x_{h-1}}^{x_h} f(x)\,dx$, substituting the probability density function of the Weibull distribution and integrating over the interval gives the stratum weight $W_h = e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r}$. The expression for $\bar{y}_{st}$ then becomes $\bar{y}_{st} = \sum_{h=1}^{L} \left[ e^{-(x_{h-1}/\theta)^r} - e^{-(x_h/\theta)^r} \right] \bar{y}_h$.
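
The closed-form weight is easy to evaluate for a whole set of boundaries at once. The sketch below uses a hypothetical helper name, `weibull_weights`, and plugs in the shape and scale estimates and the $L = 3$ boundaries reported in the estimation and results slides. Since the density is not truncated to $[x_0, x_L]$ here, the weights need not sum exactly to one.

```python
import math


def weibull_weights(boundaries, r, theta):
    """W_h = exp(-(x_{h-1}/theta)^r) - exp(-(x_h/theta)^r) for consecutive boundaries."""
    surv = [math.exp(-((x / theta) ** r)) for x in boundaries]
    return [surv[h - 1] - surv[h] for h in range(1, len(surv))]


# Boundaries x_0, x_1*, x_2*, x_3 for L = 3, with the fitted shape and scale.
weights = weibull_weights([1.5, 9.29, 15.44, 25.1], r=2.342, theta=13.40)
```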
  • 37. Estimating the linear regression model: Health data of size N = 724 from the 2004 Fiji National Nutrition Survey on "Micronutrient Status of Women in Fiji" are used. The data contain two characteristics for each woman: the level of iron and the level of haemoglobin. A survey focusing on iron-deficiency anaemia is to be conducted in the country. Stratified random sampling is therefore used to collect the sample, taking haemoglobin ($y$) as the variable of interest and the level of iron ($x$), collected in a previous study, as the auxiliary variable.
  • 38. Estimating the linear regression model contd… The linear regression model is fitted, with the following analysis of variance:
      | Source | Sum of Squares | Degrees of Freedom | Mean Sum of Squares | F | p-value |
      |---|---|---|---|---|---|
      | Regression | 461.92 | 1 | 461.92 | 299.95 | 0.000 |
      | Residual | 1050.61 | 682 | 1.54 | | |
      | Lack of fit | 236.40 | 204 | 1.16 | 0.68 | 0.890 |
      | Pure error | 814.21 | 478 | 1.70 | | |
      | Total | 1512.54 | 683 | | | |
  • 39. Estimating the linear regression model contd… The data fit the linear regression model on iron level ($x$) significantly. The coefficient of determination, $R^2 = \frac{461.92}{1512.54} = 0.3054$, indicates a moderate linear relationship between the two variables. The table also shows no significant lack of fit in the linear regression (p-value = 0.890). Thus, the model fits the data well and gives no reason to consider an alternative model.
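
The arithmetic behind $R^2$ can be reproduced directly from the sums of squares quoted above. The square root is shown as well because, for a simple linear regression, the correlation coefficient is the signed square root of $R^2$; the variable names are illustrative.

```python
import math

ss_regression = 461.92
ss_total = 1512.54  # total sum of squares used in the R^2 computation

r_squared = ss_regression / ss_total
# Correlation coefficient for a one-predictor model; taken as positive here
# because the fitted slope is positive.
r = math.sqrt(r_squared)
```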
  • 40. Estimating the linear regression model contd… The p-values for the parameters α and β show that both parameters in the model are highly significant:
      | Predictor | Coefficient | SE Coeff | t | p-value |
      |---|---|---|---|---|
      | α | 10.9449 | 0.1245 | 87.89 | 0.000 |
      | β | 0.114115 | 0.009548 | 11.95 | 0.000 |
  • 41. Estimating the linear regression model contd… [Figure: scatter plot of haemoglobin against iron] The scatter plot of iron versus haemoglobin shows a moderate positive linear association between the two variables. Therefore, the haemoglobin content ($y$) and the iron level ($x$) are assumed to follow a linear regression model with $\lambda(x) = \alpha + \beta x$, and the least-squares estimates of the parameters are $\hat{\alpha} = 10.9449$ and $\hat{\beta} = 0.1141$.
  • 42. Estimating the distribution: To determine the distribution of the auxiliary variable, the relative frequency histogram of iron level ($x$) is constructed. It shows that the distribution of $x$ is right-skewed and resembles the Weibull distribution. [Figure: relative frequency histogram of iron, with density on the vertical axis]
  • 43. Estimating the distribution contd… [Figure: Weibull Q-Q plot of iron, X; observed values against expected Weibull values] The Q-Q probability plot of $x$ shows the points clustered around the straight line, so the auxiliary variable is assumed to follow the Weibull distribution. The maximum likelihood estimates (MLE) of the Weibull parameters are shape $r = 2.342$ and scale $\theta = 13.40$.
  • 44. Estimating the variance of the error term: The variance of the error term is assumed to be $V(\epsilon \mid x) = \phi(x) > 0$ for all $x$ in the range $(a, b)$, and the expected value of the function $\phi(x)$, denoted $\mu_{h\phi}$, is obtained as $\mu_{h\phi} = \frac{SS_{Res}}{N - p} = MS_{Res}$, where $SS_{Res}$ and $MS_{Res}$ are the residual sum of squares and the residual mean square respectively, and $p$ is the number of parameters in the regression model. In the given regression model, $\lambda(x) = \alpha + \beta x$, there are $p = 2$ parameters, and the analysis of variance table gives $MS_{Res} = 1.54$.
  • 46. Results: With the level of haemoglobin ($y$) as the main variable of interest, the minimum and maximum values of $x$ (iron) are 1.5 and 25.1, so the range of the distribution of iron level is 23.6. The problem is solved in two stages (for $k = 1$ and $k \ge 2$), using the recurrence equations of the dynamic programming procedure to obtain the optimum strata widths.
  • 47. Results contd… To assess its effectiveness, the dynamic programming procedure is compared with the following methods from the literature: (1) the cum $\sqrt{f}$ method of Dalenius and Hodges (1959); (2) the geometric method of Gunning and Horgan (2004); (3) the Lavallée-Hidiroglou method (Lavallée and Hidiroglou, 1988) with Kozak's algorithm (Kozak, 2004).
  • 48. Results contd… Optimum stratum widths (OSW), optimum stratum boundaries (OSB) and objective function values (OFV) obtained by dynamic programming:
      | Strata (L) | OSW $l_h^*$ | OSB $x_h^* = x_{h-1}^* + l_h^*$ | OFV $\sum_{h=1}^{L} W_h \sigma_h$ |
      |---|---|---|---|
      | 2 | 10.72, 12.88 | 12.22 | 1.3658 |
      | 3 | 7.79, 6.15, 9.66 | 9.29, 15.44 | 1.3462 |
      | 4 | 6.22, 4.60, 4.98, 7.81 | 7.72, 12.31, 17.29 | 1.3384 |
      | 5 | 5.20, 3.78, 3.75, 4.30, 6.57 | 6.70, 10.48, 14.23, 18.53 | 1.3346 |
    The table shows the optimum stratum widths obtained using the dynamic programming procedure; the corresponding stratum boundaries are calculated from $x_h^* = x_{h-1}^* + l_h^*$ for the given number of strata. The objective function value is also calculated for each number of strata, and the table shows that it decreases as the number of strata increases.
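
The boundaries in the table follow from the widths by cumulative summation starting at the minimum iron level $x_0 = 1.5$. A minimal sketch (the helper name is hypothetical) that reproduces the $L = 3$ and $L = 5$ rows, where the last cumulative sum is the upper end of the range, $x_L = 25.1$:

```python
from itertools import accumulate


def boundaries_from_widths(x0, widths):
    """Cumulative sums x_h* = x_{h-1}* + l_h*, rounded to two decimals."""
    return [round(x, 2) for x in accumulate(widths, initial=x0)][1:]


x0 = 1.5  # minimum observed iron level
osb_l3 = boundaries_from_widths(x0, [7.79, 6.15, 9.66])
osb_l5 = boundaries_from_widths(x0, [5.20, 3.78, 3.75, 4.30, 6.57])
```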
  • 49. Results contd… OSB and OFV for the cum $\sqrt{f}$ (CSRF), geometric (GEO), Lavallée-Hidiroglou with Kozak's algorithm (L-H Kozak) and dynamic programming (DP) methods:
      | L | CSRF OSB | CSRF OFV | GEO OSB | GEO OFV | L-H Kozak OSB | L-H Kozak OFV | DP OSB | DP OFV |
      |---|---|---|---|---|---|---|---|---|
      | 2 | 12.12 | 1.366 | 6.14 | 1.404 | 8.1 | 1.384 | 12.22 | 1.366 |
      | 3 | 9.76, 15.66 | 1.346 | 3.84, 9.81 | 1.369 | 5.55, 9.15 | 1.372 | 9.29, 15.44 | 1.346 |
      | 4 | 7.40, 12.12, 16.84 | 1.339 | 3.03, 6.14, 12.41 | 1.353 | 5.55, 9.15, 15.55 | 1.342 | 7.71, 12.31, 17.29 | 1.338 |
      | 5 | 6.22, 9.76, 13.30, 18.02 | 1.335 | 2.64, 4.63, 8.13, 14.21 | 1.345 | 5.55, 9.15, 12.65, 17.00 | 1.335 | 6.70, 10.48, 14.23, 18.53 | 1.335 |
    The table compares the known methods for calculating the optimum stratum boundaries with the dynamic programming procedure and shows that the objective function value of the dynamic programming procedure is the smallest (or equal smallest) among them; the cum $\sqrt{f}$ method gives the results closest to the DP method.
  • 50. Results contd… Stratum sample sizes $n_h$ ($h = 1, \ldots, L$) and OFV for each method:
      | L | CSRF $n_h$ | CSRF OFV | GEO $n_h$ | GEO OFV | L-H Kozak $n_h$ | L-H Kozak OFV | DP $n_h$ | DP OFV |
      |---|---|---|---|---|---|---|---|---|
      | 2 | 274, 226 | 1.366 | 69, 431 | 1.403 | 128, 372 | 1.384 | 278, 222 | 1.366 |
      | 3 | 190, 195, 115 | 1.346 | 23, 165, 312 | 1.369 | 56, 107, 337 | 1.372 | 173, 206, 121 | 1.346 |
      | 4 | 109, 166, 139, 86 | 1.339 | 12, 59, 211, 218 | 1.353 | 57, 110, 215, 118 | 1.342 | 119, 163, 141, 77 | 1.338 |
      | 5 | 75, 115, 125, 122, 63 | 1.335 | 8, 29, 95, 211, 157 | 1.345 | 58, 110, 125, 124, 83 | 1.335 | 88, 128, 129, 101, 54 | 1.335 |
    The table compares the sample sizes obtained by the different methods; again the DP method attains the minimum objective function value.
  • 51. Results contd… OSB for the study variable $y$, obtained from the OSB for $x$ through $y_h^* = \alpha + \beta x_h^*$:
      | No. of strata (L) | OSB for x ($x_h^*$) | OSB for y ($y_h^* = \alpha + \beta x_h^*$) | OFV of y $\sum_{h=1}^{L} W_h \sigma_h$ |
      |---|---|---|---|
      | 2 | 12.22 | 12.34 | 1.366 |
      | 3 | 9.29, 15.44 | 12.01, 12.71 | 1.346 |
      | 4 | 7.72, 12.31, 17.29 | 11.82, 12.35, 12.92 | 1.338 |
      | 5 | 6.70, 10.48, 14.23, 18.53 | 11.71, 12.14, 12.57, 13.06 | 1.335 |
    The table shows the optimum stratum boundaries of the main variable of interest, the haemoglobin content in women ($y$), obtained from the level of iron ($x$), which serves as the auxiliary variable because the two variables are linearly related. The formula used for obtaining the OSB for $y$ is $y_h = \alpha + \beta x_h$.
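
The mapping from boundaries on $x$ to boundaries on $y$ is a direct evaluation of the fitted line. A short sketch, using the coefficient estimates from the regression table and the $L = 5$ boundaries (the helper name is hypothetical):

```python
alpha, beta = 10.9449, 0.114115  # least-squares estimates from the coefficient table


def y_boundary(x):
    """Map a stratum boundary on x (iron) to the corresponding boundary on y."""
    return round(alpha + beta * x, 2)


y_osb_l5 = [y_boundary(x) for x in [6.70, 10.48, 14.23, 18.53]]
```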
  • 52. Results contd… The results from all these procedures are compared on the objective function value that each aims to minimize, $\sum_{h=1}^{L} \phi_h(l_h) = \sum_{h=1}^{L} W_h \sqrt{\sigma_{h\lambda}^2 + \mu_{h\phi}}$, for L = 2, 3, 4, 5. The sample sizes produced by the cum $\sqrt{f}$ method are closest to those produced by the dynamic programming procedure, whereas the other two methods differ considerably from the proposed method.
  • 53. Results contd… The table of sample sizes shows that the geometric method allocates larger sample sizes to the upper (tail) strata, so its sample sizes differ markedly from those of the proposed method. Further, comparing the objective function values for L = 2, 3, 4, 5, the dynamic programming method attains the minimum among all these methods, and its values are very close to those of the cum $\sqrt{f}$ method.
  • 55. Conclusion: The results show that constructing strata with an auxiliary variable that follows the Weibull distribution leads to a remarkable gain in the precision of the estimates of the main study variable and yields stratum boundaries that minimize the variance. The dynamic programming technique does not require an initial approximate solution; it uses an auxiliary variable and parametric assumptions to capture the characteristics of the main variable.
  • 56. Conclusion contd… Thus, it can be concluded that this technique performs much more efficiently in determining the optimal sample size and optimal stratum boundaries. Further, the solution procedure is not restricted to the case where the auxiliary variable is Weibull distributed but can be applied to other statistical distributions.
  • 58. References
      Bellman, R. E. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
      Bühler, W., and T. Deutler. (1975). Optimal stratification and grouping by dynamic programming. Metrika, 22 (1), 161–75.
      De Gruijter, J. J., B. Minasny, and A. B. McBratney. (2015). Optimizing stratification and allocation for design-based estimation of spatial means using predictions with error. Journal of Survey Statistics and Methodology, 3 (1), 19–42.
      Dalenius, T., and J. L. Hodges. (1959). Minimum variance stratification. Journal of the American Statistical Association, 54 (285), 88–101.
  • 59. References
      Khan, M. G., N. Sehar, and M. J. Ahsan. (2005). Optimum stratification for exponential study variable under Neyman allocation. Journal of the Indian Society of Agricultural Statistics, 59 (2), 146–50.
      Khan, M. G. M., N. Nand, and N. Ahmad. (2008). Determining the optimum strata boundary points using dynamic programming. Survey Methodology, 34 (2), 205–14.
      Reddy, K. G., and M. G. Khan. (2019). Optimal stratification in stratified designs using Weibull-distributed auxiliary information. Communications in Statistics - Theory and Methods, 48 (12), 3136–3152.