In this third segment of the Basics of Statistical Inference series, the elements, methods, and characteristics of estimation theory are discussed.
This document provides an overview of statistical inference. It discusses descriptive statistics, which summarize data, and inferential statistics, which are used to generalize from samples to populations. Key concepts covered include estimation, hypothesis testing, parameters, statistics, confidence intervals, significance levels, and types of errors. Examples are given of how to calculate confidence intervals for means and proportions and how to perform hypothesis tests using z-tests and t-tests. Steps for conducting hypothesis tests are outlined.
Dr. Abhay Pratap Pandey introduces statistical inference and its key concepts. Statistical inference allows making conclusions about a population based on a sample. It involves estimation and hypothesis testing. Estimation determines population parameters using sample statistics. Hypothesis testing determines if sample data provides sufficient evidence to reject claims about population parameters. The document defines key terms like population, sample, parameter, statistic, and discusses properties of estimators like unbiasedness and consistency. It also explains hypothesis testing concepts like null and alternative hypotheses, types of errors, and steps to conduct hypothesis tests on a population mean. An example demonstrates hypothesis testing for a population mean using a z-test.
Statistical inference concept, procedure of hypothesis testing – AmitaChaudhary19
This document discusses hypothesis testing in statistical inference. It defines statistical inference as using probability concepts to deal with uncertainty in decision making. Hypothesis testing involves setting up a null hypothesis and alternative hypothesis about a population parameter, collecting sample data, and using statistical tests to determine whether to reject or fail to reject the null hypothesis. The key steps are setting hypotheses, choosing a significance level, selecting a test criterion like t, F or chi-squared distributions, performing calculations on sample data, and making a decision to reject or fail to reject the null hypothesis based on the significance level.
This document provides an overview of key concepts related to the normal distribution, sampling distributions, estimation, and hypothesis testing. It defines important terms like the normal curve, z-scores, sampling distributions, point and interval estimates, and the steps of hypothesis testing including stating hypotheses, collecting data, and determining whether to reject the null hypothesis. It also reviews concepts like the central limit theorem, standard error, bias, confidence intervals, types of errors in hypothesis testing, and factors that influence test statistics.
This document discusses point estimation and the criteria for a good point estimator. It defines point estimation, estimators, and estimates. The key criteria for a good point estimator are discussed as unbiasedness, consistency, efficiency, and sufficiency. Unbiasedness means the expected value of the estimator is equal to the true parameter value. Consistency means the estimator approaches the true value as the sample size increases. Efficiency refers to the estimator having the minimum possible variance. Sufficiency means the estimator uses all the information in the sample. Examples are provided for each concept.
Estimation and hypothesis testing 1 (graduate statistics2) – Harve Abella
This document discusses two main areas of statistical inference: estimation and hypothesis testing. It provides details on point estimation and confidence interval estimation when estimating population parameters. It also explains the key concepts involved in hypothesis testing such as the null and alternative hypotheses, types of errors, critical regions, test statistics, and p-values. Examples are provided to illustrate estimating population means and proportions as well as conducting hypothesis tests.
This document provides an overview of hypothesis testing in inferential statistics. It defines a hypothesis as a statement or assumption about relationships between variables or tentative explanations for events. There are two main types of hypotheses: the null hypothesis (H0), which is the default position that is tested, and the alternative hypothesis (Ha or H1). Steps in hypothesis testing include establishing the null and alternative hypotheses, selecting a suitable test of significance or test statistic based on sample characteristics, formulating a decision rule to either accept or reject the null hypothesis based on where the test statistic value falls, and understanding the potential for errors. Key criteria for constructing hypotheses and selecting appropriate statistical tests are also outlined.
The document provides an overview of hypothesis testing. It begins by defining a hypothesis test and its purpose of ruling out chance as an explanation for research study results. It then outlines the logic and steps of a hypothesis test: 1) stating hypotheses, 2) setting decision criteria, 3) collecting data, 4) making a decision. Key concepts discussed include type I and type II errors, statistical significance, test statistics like the z-score, and assumptions of hypothesis testing. Factors that can influence a hypothesis test like effect size, sample size, and alpha level are also covered.
Confidence Intervals: Basic concepts and overview – Rizwan S A
This document provides an overview of confidence intervals. It defines confidence intervals and describes their use in statistical inference to estimate population parameters. It explains that a confidence interval provides a range of plausible values for an unknown population parameter based on a sample statistic. The document outlines the key steps in calculating a confidence interval, including determining the point estimate, standard error, and critical value corresponding to the desired confidence level. It discusses how the width of the confidence interval indicates the precision of the estimate and is affected by factors like the sample size and population variability.
Ppt for 1.1 introduction to statistical inference – vasu Chemistry
This document provides an introduction to statistical inference. It defines statistics as dealing with collecting, analyzing, and presenting data. The purpose of statistics is to make accurate conclusions or predictions about a population based on a sample. There are two main types of statistics: descriptive statistics, which describes data, and inferential statistics, which helps make predictions and generalizations from data. Statistical inference involves analyzing sample data and making conclusions about the population using statistical techniques, as it is impractical to study entire populations. The key concepts of population, sample, parameters, statistics, and sampling distribution are introduced.
The ppt introduces the basic concepts of estimation, both point and interval. Properties of a good estimate are also covered. Confidence intervals for a single mean, the difference between two means, a proportion, and the difference of two proportions, for different sample sizes, are included along with case studies.
This document discusses key concepts in estimation theory, including:
- Point estimators are statistics used to estimate unknown parameters based on sample data. Common point estimators include the sample mean and sample proportion.
- Estimator properties like unbiasedness, efficiency, and consistency are used to evaluate which estimators perform best. The minimum variance unbiased estimator (MVUE) has the lowest possible variance.
- Asymptotic properties like consistency ensure an estimator converges in probability to the true parameter value as the sample size increases. The sample mean is a consistent estimator of the population mean.
1. The sampling distribution is the distribution of all possible values that can be assumed by some statistic computed from samples of the same size randomly drawn from the same population.
2. To construct a sampling distribution, all possible samples of a given size are drawn from the population and the statistic is computed for each sample. The distinct observed values and their frequencies are listed.
3. According to the central limit theorem, the sampling distribution of the sample mean will be approximately normally distributed for large sample sizes, regardless of the population distribution.
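The construction described above can be approximated by simulation: instead of enumerating every possible sample, draw many random samples and inspect the distribution of their means. The sketch below (an assumed illustration, not taken from the slides) uses a deliberately skewed population to show the central limit theorem and the standard-error formula at work.

```python
import random
import statistics

random.seed(0)
# Skewed (exponential) population, so normality of sample means is non-trivial.
population = [random.expovariate(1.0) for _ in range(10_000)]

n = 50        # size of each sample
reps = 2_000  # number of samples drawn

# Approximate the sampling distribution of the mean by repeated sampling.
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(reps)]

mu = statistics.mean(population)
se = statistics.stdev(population) / n ** 0.5  # standard error predicted by the CLT

print(f"population mean {mu:.3f}, mean of sample means {statistics.mean(sample_means):.3f}")
print(f"predicted SE {se:.3f}, observed SD of sample means {statistics.stdev(sample_means):.3f}")
```

The mean of the sample means sits close to the population mean, and their spread matches the predicted standard error, as the theorem says it should for a large enough sample size.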
The Wishart and inverse-Wishart distribution – Pankaj Das
The document discusses the Wishart and inverse-Wishart distributions which are used to model covariance matrices. It provides mathematical background on how the Wishart distribution arises from sampling covariance matrices from multivariate normal distributions. It also describes key properties of the Wishart distribution including its probability density function and how it relates to the chi-squared distribution when the dimensionality is one. Estimation of covariance matrices plays an important role in multivariate statistics.
Parametric and non-parametric tests differ in their assumptions about the population. Parametric tests assume the population is normally distributed and that group variances are equal, while non-parametric tests make no such assumptions. Parametric tests are more powerful but require their assumptions to be met. Non-parametric tests are simpler and not affected by outliers. The document provides examples of common parametric and non-parametric tests for different study types, such as comparing two or more groups or measuring the association between variables.
BINOMIAL, POISSON AND NORMAL DISTRIBUTION.pptx – letbestrong
BINOMIAL DISTRIBUTION
In probability theory and statistics, the binomial distribution is the discrete probability distribution of an experiment that gives only two possible results, either Success or Failure. For example, if we toss a coin, there can be only two possible outcomes, heads or tails, and if any test is taken, there can be only two results, pass or fail. This distribution is also called a binomial probability distribution.
Number of trials (n) is a fixed number.
The outcome of a given trial is either success or failure.
The probability of success (p) remains constant from trial to trial which means an experiment is conducted under homogeneous conditions.
The trials are independent, which means the outcome of the previous trial does not affect the outcome of the next trial.
Binomial Probability Distribution
The binomial probability distribution gives the number of ‘Successes’ in a sequence of n experiments, where each experiment asks a yes-no question and the outcome is recorded either as success/yes/true/one (probability p) or as failure/no/false/zero (probability q = 1 − p). A single success/failure test is also called a Bernoulli trial or Bernoulli experiment, and a series of such outcomes is called a Bernoulli process. For n = 1, i.e. a single experiment, the binomial distribution is a Bernoulli distribution.
A binomial distribution has two parameters, n and p. The parameter n states the number of times the experiment runs and the parameter p gives the probability of success on any one run. Suppose a die is thrown randomly 10 times; then the probability of getting a 2 on any one throw is ⅙. When you throw the die 10 times, you have a binomial distribution with n = 10 and p = ⅙.
The binomial distribution formula, for a random variable X, is given by:
P(X = x) = nCx p^x (1 − p)^(n − x)
Where,
n = the number of experiments
x = 0, 1, 2, 3, 4, …
p = Probability of Success in a single experiment
q = Probability of Failure in a single experiment = 1 – p
The binomial distribution formula can also be written in terms of n Bernoulli trials, where nCx = n!/[x!(n − x)!]. Hence,
P(X = x) = [n!/(x!(n − x)!)] p^x q^(n − x)
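The formula above translates directly into code. A minimal sketch, using the die example from the text (n = 10 throws, p = ⅙ of showing a 2):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 1 / 6
print(binomial_pmf(2, n, p))  # probability of exactly two 2s in 10 throws, ~0.2907

# Sanity check: the probabilities over all possible counts x = 0..n sum to 1.
print(sum(binomial_pmf(x, n, p) for x in range(n + 1)))
```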
Binomial Distribution Mean and Variance
For a binomial distribution, the mean, variance and standard deviation of the number of successes are given by the following formulas:
Mean, μ = np
Variance, σ² = npq
Standard deviation, σ = √(npq)
Where p is the probability of success
q is the probability of failure, where q = 1-p
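These formulas can be checked directly against the pmf by computing the mean and variance from first principles; a short sketch, reusing the die example (n = 10, p = ⅙):

```python
from math import comb, sqrt

n, p = 10, 1 / 6
q = 1 - p

# Full pmf for x = 0..n, from the binomial formula.
pmf = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

# Mean and variance computed directly from the distribution.
mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))

print(mean, n * p)        # both equal np ≈ 1.667
print(var, n * p * q)     # both equal npq ≈ 1.389
print(sqrt(var))          # standard deviation √(npq)
```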
Properties of binomial distribution
The properties of the binomial distribution are:
• There are two possible outcomes: true or false, success or failure, yes or no.
• There is ‘n’ number of independent trials or a fixed number of n times repeated trials.
• The probability of success or failure remains the same for each trial.
• Only the number of successes out of the n independent trials is counted.
• Every trial is an independent trial, which means the outcome of one trial does not affect the outcome of another.
This document discusses statistical inference and its two main types: estimation of parameters and testing of hypotheses. Estimation of parameters has two forms: point estimation, which provides a single numerical value as an estimate, and interval estimation, which expresses the estimate as a range of values. Point estimation involves calculating estimators like the sample mean to estimate population parameters. Interval estimation provides an interval rather than a single point as the estimate. Statistical inference uses samples to draw conclusions about unknown population parameters.
This document provides an overview of statistical estimation and inference. It discusses point estimation, which provides a single value to estimate an unknown population parameter, and interval estimation, which gives a range of plausible values for the parameter. The key aspects of interval estimation are confidence intervals, which provide a probability statement about where the true population parameter lies. The document also covers important concepts like sampling distributions, the central limit theorem, and factors that influence the width of a confidence interval like sample size. Examples are provided to demonstrate calculating point estimates, confidence intervals, and dealing with independent samples.
Hypothesis testing involves proposing and testing hypotheses, or predictions, about relationships between variables. There are four main types of hypotheses: null, alternative, directional, and non-directional. The null hypothesis proposes no relationship between variables, while the alternative hypothesis contradicts the null. Directional hypotheses predict the nature of a relationship, while non-directional hypotheses do not. Common statistical tests used for hypothesis testing include the z-test, t-test, chi-square test, and F-test. Hypothesis testing is a crucial part of the scientific method for assessing theories through empirical observation.
This document provides an overview of hypothesis testing. It defines key terms like the null hypothesis (H0), alternative hypothesis (H1), type I and type II errors, significance level, p-values, and test statistics. It explains the basic steps in hypothesis testing as testing a claim about a population parameter by collecting a sample, determining the appropriate test statistic based on the sampling distribution, and comparing it to critical values to reject or fail to reject the null hypothesis. Examples are provided to demonstrate hypothesis tests for a mean when the population standard deviation is known or unknown.
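A z-test for a mean with known population standard deviation, as described above, can be sketched in a few lines. The sample numbers in the call are assumed for illustration only:

```python
from math import sqrt, erf

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test for a population mean when sigma is known (sketch)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Two-sided p-value from the standard normal CDF, Phi(z) = 0.5*(1 + erf(z/sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value, p_value < alpha

# Hypothetical sample: mean 52 from n = 36 observations, H0: mu = 50, sigma = 6.
z, p, reject = z_test_mean(xbar=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

Here z = 2.00 and p ≈ 0.0455 < 0.05, so the null hypothesis would be rejected at the 5% level.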
1. The document discusses the chi-square test, which is used to determine if there is a relationship between two categorical variables.
2. A contingency table is constructed with observed frequencies to calculate expected frequencies under the null hypothesis of no relationship.
3. The chi-square test statistic is calculated by summing the squared differences between observed and expected frequencies divided by the expected frequencies.
4. The calculated chi-square value is then compared to a critical value from the chi-square distribution to determine whether to reject or fail to reject the null hypothesis.
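The four steps above can be sketched end to end with the standard library. The 2×2 contingency table here is hypothetical, made up purely for illustration:

```python
# Observed counts for two groups crossed with a yes/no outcome (made-up data).
observed = [
    [30, 20],  # group A: yes / no
    [10, 40],  # group B: yes / no
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under H0 (no relationship): row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of squared differences between observed and expected, divided by expected.
chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2) for j in range(2)
)

# Critical value of the chi-square distribution with 1 df at alpha = 0.05 is 3.841.
print(f"chi-square = {chi_sq:.3f}, reject H0: {chi_sq > 3.841}")
```

For this table the statistic comes out around 16.67, well past the 3.841 critical value, so the null hypothesis of no relationship would be rejected.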
Statistical estimators are functions used to estimate unknown parameters of a theoretical probability distribution based on random variable observations. There are two main types of estimators: point estimators that provide a single value and interval estimators that provide a range of values within which the parameter is estimated to lie. Key properties for ideal estimators include being unbiased, consistent, sufficient, and having minimum variance. Examples are provided to illustrate calculating confidence intervals for population means based on sample statistics.
- Sampling distribution describes the distribution of sample statistics like means or proportions drawn from a population. It allows making statistical inferences about the population.
- The central limit theorem states that sampling distributions of sample means will be approximately normally distributed regardless of the population distribution, if the sample size is large.
- Standard error measures the amount of variability in values of a sample statistic across different samples. It is used to construct confidence intervals for population parameters.
Statistical inference involves drawing conclusions about a population based on a sample. It has two main areas: estimation and hypothesis testing. Estimation uses sample data to obtain point or interval estimates of unknown population parameters. Hypothesis testing determines whether to accept or reject statements about population parameters. Confidence intervals give a range of values that are likely to contain the true population parameter, with a specified level of confidence such as 90% or 95%.
The document describes the Wilcoxon Rank-Sum Test, a non-parametric statistical hypothesis test used to assess whether one of two independent samples of observations tends to have larger values than the other when normality cannot be assumed. It provides details on running the test, including ranking the combined observations and computing the test statistic to determine if it is less than or equal to the critical value, rejecting the null hypothesis. An example applies the test to compare the nicotine content of two cigarette brands, finding no significant difference between their medians.
Chapter 6 part2-Introduction to Inference-Tests of Significance, Stating Hyp... – nszakir
Mathematics, Statistics, Introduction to Inference, Tests of Significance, The Reasoning of Tests of Significance, Stating Hypotheses, Test Statistics, P-values, Statistical Significance, Test for a Population Mean, Two-Sided Significance Tests and Confidence Intervals
This document provides an overview of non-parametric tests presented by Ms. Prajakta Sawant. It discusses non-parametric tests as distribution-free statistical tests that do not require assumptions about the underlying population distribution. Common non-parametric tests described include the Wilcoxon rank-sum test, Kruskal-Wallis test, Spearman's rank correlation coefficient, and the chi-square test. Examples are provided for each test to illustrate their application and interpretation.
This document discusses key concepts in statistical estimation including:
- Estimation involves using sample data to infer properties of the population by calculating point estimates and interval estimates.
- A point estimate is a single value that estimates an unknown population parameter, while an interval estimate provides a range of plausible values for the parameter.
- A confidence interval is stated with a confidence level: the probability that an interval constructed this way from sample data contains the true population parameter. The 95% confidence interval is the most common.
- Formulas for confidence intervals depend on whether the population standard deviation is known or unknown, and the sample size.
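For the known-sigma case, the interval is the point estimate plus or minus z times the standard error. A minimal sketch, with assumed example numbers (sample mean 100, σ = 15, n = 25) and z = 1.96 for 95% confidence:

```python
from math import sqrt

def ci_mean_known_sigma(xbar, sigma, n, z=1.96):
    """95% CI for a population mean when sigma is known: xbar ± z * sigma/sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = ci_mean_known_sigma(xbar=100.0, sigma=15.0, n=25)
print(f"95% CI: ({lo:.2f}, {hi:.2f})")  # margin is 1.96 * 15 / 5 = 5.88
```

When sigma is unknown or the sample is small, the z critical value would be replaced by the appropriate t critical value with n − 1 degrees of freedom.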
This document discusses statistical estimation and inference. It defines key terms like parameter, statistic, estimator, and estimate. It explains that the goal of statistical inference is to use sample data to estimate unknown population parameters and test hypotheses about them. Specifically, it discusses point estimation, which aims to determine a single value for an unknown parameter, and interval estimation, which determines a range of values within which the parameter is expected to lie. It also outlines criteria for evaluating estimators, including unbiasedness, consistency, efficiency, and sufficiency.
Confidence Intervals: Basic concepts and overviewRizwan S A
This document provides an overview of confidence intervals. It defines confidence intervals and describes their use in statistical inference to estimate population parameters. It explains that a confidence interval provides a range of plausible values for an unknown population parameter based on a sample statistic. The document outlines the key steps in calculating a confidence interval, including determining the point estimate, standard error, and critical value corresponding to the desired confidence level. It discusses how the width of the confidence interval indicates the precision of the estimate and is affected by factors like the sample size and population variability.
Ppt for 1.1 introduction to statistical inferencevasu Chemistry
This document provides an introduction to statistical inference. It defines statistics as dealing with collecting, analyzing, and presenting data. The purpose of statistics is to make accurate conclusions or predictions about a population based on a sample. There are two main types of statistics: descriptive statistics, which describes data, and inferential statistics, which helps make predictions and generalizations from data. Statistical inference involves analyzing sample data and making conclusions about the population using statistical techniques, as it is impractical to study entire populations. The key concepts of population, sample, parameters, statistics, and sampling distribution are introduced.
The ppt gives an idea about basic concept of Estimation. point and interval. Properties of good estimate is also covered. Confidence interval for single means, difference between two means, proportion and difference of two proportion for different sample sizes are included along with case studies.
This document discusses key concepts in estimation theory, including:
- Point estimators are statistics used to estimate unknown parameters based on sample data. Common point estimators include the sample mean and sample proportion.
- Estimator properties like unbiasedness, efficiency, and consistency are used to evaluate which estimators perform best. The minimum variance unbiased estimator (MVUE) has the lowest possible variance.
- Asymptotic properties like consistency ensure an estimator converges in probability to the true parameter value as the sample size increases. The sample mean is a consistent estimator of the population mean.
1. The sampling distribution is the distribution of all possible values that can be assumed by some statistic computed from samples of the same size randomly drawn from the same population.
2. To construct a sampling distribution, all possible samples of a given size are drawn from the population and the statistic is computed for each sample. The distinct observed values and their frequencies are listed.
3. According to the central limit theorem, the sampling distribution of the sample mean will be approximately normally distributed for large sample sizes, regardless of the population distribution.
The Wishart and inverse-wishart distributionPankaj Das
The document discusses the Wishart and inverse-Wishart distributions which are used to model covariance matrices. It provides mathematical background on how the Wishart distribution arises from sampling covariance matrices from multivariate normal distributions. It also describes key properties of the Wishart distribution including its probability density function and how it relates to the chi-squared distribution when the dimensionality is one. Estimation of covariance matrices plays an important role in multivariate statistics.
Parametric and non-parametric tests differ in their assumptions about the population. Parametric tests assume the population is normally distributed and have equal variances, while non-parametric tests make no assumptions. Parametric tests are more powerful but require their assumptions to be met. Non-parametric tests are simpler and not affected by outliers. The document provides examples of common parametric and non-parametric tests for different study types such as comparing two or more groups or measuring the association between variables.
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxletbestrong
BINOMIAL DISTRIBUTION
In probability theory and statistics, the binomial distribution is the discrete probability distribution gives only two possible results in an experiment, either Success or Failure. For example, if we toss a coin, there could be only two possible outcomes: heads or tails, and if any test is taken, then there could be only two results: pass or fail. This distribution is also called a binomial probability distribution.
Number of trials (n) is a fixed number.
The outcome of a given trial is either success or failure.
The probability of success (p) remains constant from trial to trial which means an experiment is conducted under homogeneous conditions.
The trials are independent which means the outcome of previous trial does not affect the outcome of the next trial.
Binomial Probability Distribution
In binomial probability distribution, the number of ‘Success’ in a sequence of n experiments, where each time a question is asked for yes-no, then the valued outcome is represented either with success/yes/true/one (probability p) or failure/no/false/zero (probability q = 1 − p). A single success/failure test is also called a Bernoulli trial or Bernoulli experiment, and a series of outcomes is called a Bernoulli process. For n = 1, i.e. a single experiment, the binomial distribution is a Bernoulli distribution.
There are two parameters n and p used here in a binomial distribution. The variable ‘n’ states the number of times the experiment runs and the variable ‘p’ tells the probability of any one outcome. Suppose a die is thrown randomly 10 times, then the probability of getting 2 for anyone throw is ⅙. When you throw the dice 10 times, you have a binomial distribution of n = 10 and p = ⅙.
The binomial distribution formula is for any random variable X, given by;
P(x:n,p) = nCx px (1-p)n-x
Where,
n = the number of experiments
x = 0, 1, 2, 3, 4, …
p = Probability of Success in a single experiment
q = Probability of Failure in a single experiment = 1 – p
The binomial distribution formula can also be written in terms of n Bernoulli trials, where nCx = n!/[x!(n − x)!]. Hence,
P(X = x) = [n!/(x!(n − x)!)] · p^x · q^(n − x)
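As a quick check of the formula, here is a minimal Python sketch of the PMF, applied to the dice example from the text (n = 10, p = 1/6). The function name is our own choice for illustration, not from any particular library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Dice example from the text: n = 10 throws, p = 1/6 chance of rolling a 2.
# Probability of getting exactly three 2s in 10 throws:
print(round(binomial_pmf(3, 10, 1/6), 4))

# The PMF sums to 1 over x = 0..n, as any probability distribution must.
print(sum(binomial_pmf(x, 10, 1/6) for x in range(11)))
```

Note that `math.comb` computes nCx exactly with integer arithmetic, so no factorials need to be written out by hand.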
Binomial Distribution Mean and Variance
For a binomial distribution, the mean, variance and standard deviation of the number of successes are given by the formulas:
Mean, μ = np
Variance, σ^2 = npq
Standard deviation, σ = √(npq)
Where p is the probability of success
q is the probability of failure, where q = 1-p
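The closed-form results μ = np and σ^2 = npq can be verified numerically against the PMF. A small Python sketch (standard library only; n and p are arbitrary illustrative choices):

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 1/6
q = 1 - p

# Mean and variance computed directly from the distribution...
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean)**2 * binomial_pmf(x, n, p) for x in range(n + 1))

# ...agree with the closed-form results mu = np and sigma^2 = npq.
assert abs(mean - n * p) < 1e-9
assert abs(var - n * p * q) < 1e-9
print(mean, var, sqrt(var))
```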
Properties of the Binomial Distribution
The properties of the binomial distribution are:
• There are two possible outcomes: true or false, success or failure, yes or no.
• There is ‘n’ number of independent trials or a fixed number of n times repeated trials.
• The probability of success or failure remains the same for each trial.
• Only the number of successes out of n independent trials is counted.
• Every trial is independent, which means the outcome of one trial does not affect the outcome of another.
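These properties can also be seen empirically. The sketch below simulates many runs of n independent Bernoulli trials and compares the relative frequency of each success count with the theoretical PMF; the seed and parameters are arbitrary choices for illustration.

```python
import random
from collections import Counter
from math import comb

random.seed(42)
n, p, reps = 10, 0.5, 100_000

# Each repetition: n independent Bernoulli trials; record the number of successes.
counts = Counter(sum(random.random() < p for _ in range(n)) for _ in range(reps))

# Empirical relative frequencies should track the theoretical PMF.
for x in range(n + 1):
    theoretical = comb(n, x) * p**x * (1 - p)**(n - x)
    empirical = counts[x] / reps
    print(x, round(theoretical, 4), round(empirical, 4))
```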
This document discusses statistical inference and its two main types: estimation of parameters and testing of hypotheses. Estimation of parameters has two forms: point estimation, which provides a single numerical value as an estimate, and interval estimation, which expresses the estimate as a range of values. Point estimation involves calculating estimators like the sample mean to estimate population parameters. Interval estimation provides an interval rather than a single point as the estimate. Statistical inference uses samples to draw conclusions about unknown population parameters.
This document provides an overview of statistical estimation and inference. It discusses point estimation, which provides a single value to estimate an unknown population parameter, and interval estimation, which gives a range of plausible values for the parameter. The key aspects of interval estimation are confidence intervals, which provide a probability statement about where the true population parameter lies. The document also covers important concepts like sampling distributions, the central limit theorem, and factors that influence the width of a confidence interval like sample size. Examples are provided to demonstrate calculating point estimates, confidence intervals, and dealing with independent samples.
Hypothesis testing involves proposing and testing hypotheses, or predictions, about relationships between variables. There are four main types of hypotheses: null, alternative, directional, and non-directional. The null hypothesis proposes no relationship between variables, while the alternative hypothesis contradicts the null. Directional hypotheses predict the nature of a relationship, while non-directional hypotheses do not. Common statistical tests used for hypothesis testing include the z-test, t-test, chi-square test, and F-test. Hypothesis testing is a crucial part of the scientific method for assessing theories through empirical observation.
This document provides an overview of hypothesis testing. It defines key terms like the null hypothesis (H0), alternative hypothesis (H1), type I and type II errors, significance level, p-values, and test statistics. It explains the basic steps in hypothesis testing as testing a claim about a population parameter by collecting a sample, determining the appropriate test statistic based on the sampling distribution, and comparing it to critical values to reject or fail to reject the null hypothesis. Examples are provided to demonstrate hypothesis tests for a mean when the population standard deviation is known or unknown.
1. The document discusses the chi-square test, which is used to determine if there is a relationship between two categorical variables.
2. A contingency table is constructed with observed frequencies to calculate expected frequencies under the null hypothesis of no relationship.
3. The chi-square test statistic is calculated by summing the squared differences between observed and expected frequencies divided by the expected frequencies.
4. The calculated chi-square value is then compared to a critical value from the chi-square distribution to determine whether to reject or fail to reject the null hypothesis.
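The four steps above can be sketched in a few lines of Python; the 2×2 table of observed frequencies is hypothetical, used only to illustrate the computation.

```python
# Hypothetical 2x2 contingency table of observed frequencies.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq = sum((o - e)**2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

print(round(chi_sq, 3))  # compare against the critical value, e.g. 3.841 at df = 1, alpha = 0.05
```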
Statistical estimators are functions used to estimate unknown parameters of a theoretical probability distribution based on random variable observations. There are two main types of estimators: point estimators that provide a single value and interval estimators that provide a range of values within which the parameter is estimated to lie. Key properties for ideal estimators include being unbiased, consistent, sufficient, and having minimum variance. Examples are provided to illustrate calculating confidence intervals for population means based on sample statistics.
- Sampling distribution describes the distribution of sample statistics like means or proportions drawn from a population. It allows making statistical inferences about the population.
- The central limit theorem states that sampling distributions of sample means will be approximately normally distributed regardless of the population distribution, if the sample size is large.
- Standard error measures the amount of variability in values of a sample statistic across different samples. It is used to construct confidence intervals for population parameters.
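A small simulation illustrates both points above: even for a heavily skewed population (exponential), the distribution of sample means concentrates around the population mean, with spread close to the theoretical standard error σ/√n. The parameters are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)

# Population: exponential with mean 1 and standard deviation 1 (heavily skewed).
n, num_samples = 50, 5000
sample_means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
                for _ in range(num_samples)]

# CLT: sample means cluster around the population mean (1.0), with spread
# close to the theoretical standard error sigma / sqrt(n) = 1 / sqrt(50).
print(round(statistics.fmean(sample_means), 3))
print(round(statistics.stdev(sample_means), 3), round(1 / 50**0.5, 3))
```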
Statistical inference involves drawing conclusions about a population based on a sample. It has two main areas: estimation and hypothesis testing. Estimation uses sample data to obtain point or interval estimates of unknown population parameters. Hypothesis testing determines whether to accept or reject statements about population parameters. Confidence intervals give a range of values that are likely to contain the true population parameter, with a specified level of confidence such as 90% or 95%.
The document describes the Wilcoxon Rank-Sum Test, a non-parametric statistical hypothesis test used to assess whether one of two independent samples of observations tends to have larger values than the other when normality cannot be assumed. It provides details on running the test, including ranking the combined observations and computing the test statistic to determine if it is less than or equal to the critical value, rejecting the null hypothesis. An example applies the test to compare the nicotine content of two cigarette brands, finding no significant difference between their medians.
Chapter 6 part 2: Introduction to Inference - Tests of Significance, Stating Hyp... (nszakir)
Mathematics, Statistics, Introduction to Inference, Tests of Significance, The Reasoning of Tests of Significance, Stating Hypotheses, Test Statistics, P-values, Statistical Significance, Test for a Population Mean, Two-Sided Significance Tests and Confidence Intervals
This document provides an overview of non-parametric tests presented by Ms. Prajakta Sawant. It discusses non-parametric tests as distribution-free statistical tests that do not require assumptions about the underlying population distribution. Common non-parametric tests described include the Wilcoxon rank-sum test, Kruskal-Wallis test, Spearman's rank correlation coefficient, and the chi-square test. Examples are provided for each test to illustrate their application and interpretation.
This document discusses key concepts in statistical estimation including:
- Estimation involves using sample data to infer properties of the population by calculating point estimates and interval estimates.
- A point estimate is a single value that estimates an unknown population parameter, while an interval estimate provides a range of plausible values for the parameter.
- A confidence interval gives the probability that the interval calculated from the sample data contains the true population parameter. Common confidence intervals are 95% confidence intervals.
- Formulas for confidence intervals depend on whether the population standard deviation is known or unknown, and the sample size.
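To illustrate how the formula changes when the population standard deviation is known versus unknown, here is a Python sketch. The sample data, the assumed σ, and the t critical value (2.262 for n − 1 = 9 degrees of freedom at 95%, read from a t-table) are all illustrative assumptions.

```python
import statistics

# Hypothetical sample of measurements.
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
n = len(sample)
x_bar = statistics.fmean(sample)

# Case 1: population standard deviation known (assumed here to be 0.25):
# a 95% CI uses the z critical value 1.96.
sigma = 0.25
margin = 1.96 * sigma / n**0.5
print(f"95% CI (sigma known): ({x_bar - margin:.3f}, {x_bar + margin:.3f})")

# Case 2: sigma unknown: use the sample standard deviation s and a t critical
# value with n - 1 = 9 degrees of freedom (2.262 for 95%, from a t-table).
s = statistics.stdev(sample)
margin_t = 2.262 * s / n**0.5
print(f"95% CI (sigma unknown): ({x_bar - margin_t:.3f}, {x_bar + margin_t:.3f})")
```

As expected, the t-based interval is wider, reflecting the extra uncertainty from estimating σ with s at a small sample size.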
This document discusses statistical estimation and inference. It defines key terms like parameter, statistic, estimator, and estimate. It explains that the goal of statistical inference is to use sample data to estimate unknown population parameters and test hypotheses about them. Specifically, it discusses point estimation, which aims to determine a single value for an unknown parameter, and interval estimation, which determines a range of values within which the parameter is expected to lie. It also outlines criteria for evaluating estimators, including unbiasedness, consistency, efficiency, and sufficiency.
Here are the key differences between supervised and unsupervised learning:
Supervised Learning:
- Uses labeled examples/data to learn. The labels provide correct answers for the learning algorithm.
- The goal is to build a model that maps inputs to outputs based on example input-output pairs.
- Common algorithms include linear/logistic regression, decision trees, k-nearest neighbors, SVM, neural networks.
- Used for classification and regression predictive problems.
Unsupervised Learning:
- Uses unlabeled data where there are no correct answers provided.
- The goal is to find hidden patterns or grouping in the data.
- Common algorithms include clustering, association rule learning, self-organizing maps.
The document discusses the process of sampling from a population. It explains that sampling is used because it is not always possible to study the entire population. It then outlines the 7 key steps to analyzing sample data from a population: 1) estimating population parameters, 2) estimating population variance, 3) computing standard error, 4) specifying confidence level, 5) finding critical values, 6) computing margin of error, and 7) defining the confidence interval. The document provides formulas for estimating means, variances, standard errors, and computing confidence intervals.
1. Estimation involves using sample statistics to estimate population parameters. There are two types of estimation - point estimation and interval estimation.
2. Point estimation provides a single value for the population parameter while interval estimation provides a range of values within which the population parameter is estimated to fall.
3. Good estimators are unbiased, consistent, sufficient, and efficient. The margin of error used in interval estimation depends on the standard error of the estimator.
This document discusses estimating population parameters such as proportions, means, and standard deviations from sample data. It covers how to calculate confidence intervals for a population proportion based on a sample proportion. The key steps are to determine the sample proportion, calculate the margin of error using the sample size and a critical z-value, and use these to estimate the confidence interval. An example is provided to demonstrate calculating the confidence interval for a population proportion based on survey data.
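The steps described can be sketched as follows; the survey counts are hypothetical, and 1.96 is the usual z critical value for 95% confidence.

```python
# Hypothetical survey: 520 of 1000 respondents answered "yes".
successes, n = 520, 1000
p_hat = successes / n  # sample proportion

# 95% CI for a population proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
z = 1.96
margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
print(f"p_hat = {p_hat}, 95% CI = ({p_hat - margin:.4f}, {p_hat + margin:.4f})")
```

Note the interval here straddles 0.5, so this hypothetical survey would not establish a majority at the 95% level.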
This document provides an overview of key concepts in statistics including measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation), and central moments (skewness, kurtosis). It discusses calculating and comparing the mean, median, mode, and how they each describe the central position of a data distribution. It also explains how variance and standard deviation measure how spread out the data is from the mean. The document is intended as a textbook for students and general readers to learn basic statistical concepts.
This document proposes a unified approach to refine measures of central tendency and dispersion. It defines a generalized measure of central tendency as the value that minimizes the deviation between a point and a dataset. Various common measures of central tendency like mean, median, mode, geometric mean and harmonic mean are derived as special cases of this generalized definition. The concept is extended to introduce an "interval of central tendency" and methods to estimate it. Simulation studies show the interval of central tendency can capture more observations than a single point estimate, and allow comparison of different measures. The approach is also applied to derive confidence intervals for the population mean and probability of success in Bernoulli trials.
Refining Measure of Central Tendency and DispersionIOSR Journals
A unified approach is attempted to bring the descriptive statistics in to a more refined frame work. Different measure of central tendencies such as arithmetic mean, median, mode, geometric mean and harmonic mean are derived from a generalized notion of a measure of central tendency developed through an optimality criteria. This generalized notion is extended to introduce the concept of an interval of central tendency. Retaining the spirit of this notion, measure of central tendency may be called point of central tendency. The same notion is further extended to obtain confidence interval for population mean in a finite population model and confidence interval for probability of success in Bernoulli population.
This document discusses inferential statistics and confidence intervals. It introduces confidence intervals for a population mean using the t-distribution when the sample size is small (less than 30). When the population variance is known, the z-distribution can be used. It provides examples of how to calculate 95% and 99% confidence intervals for a population mean using the t-distribution and normal distribution. Formulas for the standard error and reliability coefficients are also presented.
The modal rating is the rating value that occurs most frequently in the dataset. To find the mode, we would need to analyze the rating frequencies and identify which rating has the highest count. Without access to the actual dataset values and frequencies, I cannot determine the modal rating directly. The mode is a measure of central tendency that is best for identifying the most common or typical value in a dataset.
Descriptive statistics are used to describe and summarize data, while inferential statistics allow inferences to be made about a population based on a sample. Descriptive statistics include measures of central tendency like the mean, median, and mode, as well as measures of variability like range, variance, and standard deviation. Inferential statistics techniques include point and interval estimation to calculate population parameters, hypothesis testing to accept or reject hypotheses, and prediction to forecast future observations. Regression analysis can be used to model relationships between variables and determine the conditional mean of the dependent variable given the independent variables.
Descriptive statistics are used to describe data, while inferential statistics allow inferences to be made about a population based on a sample. Descriptive statistics include measures of central tendency like the mean, median, and mode as well as measures of variability such as range, variance, and standard deviation. Inferential statistics comprise techniques like estimation, hypothesis testing, prediction, and regression. Estimation involves calculating point estimates and intervals to estimate unknown population parameters. Hypothesis testing structures a dilemma to test hypotheses against sample data. Prediction forecasts future observations based on past data. Regression models the relationship between variables as a linear function.
Descriptive statistics are used to describe data, while inferential statistics allow inferences to be made about a population based on a sample. Descriptive statistics include measures of central tendency like the mean, median, and mode as well as measures of variability such as range, variance, and standard deviation. Inferential statistics comprise techniques like estimation, hypothesis testing, prediction, and regression. Estimation involves calculating point estimates and intervals to estimate unknown population parameters. Hypothesis testing structures hypotheses to test using statistical tests and significance levels. Prediction forecasts future observations based on past data, while regression models relationships between variables.
Biostatistics is the science of collecting, summarizing, analyzing, and interpreting data in the fields of medicine, biology, and public health. It involves both descriptive and inferential statistics. Descriptive statistics summarize data through measures of central tendency like mean, median, and mode, and measures of dispersion like range and standard deviation. Inferential statistics allow generalization from samples to populations through techniques like hypothesis testing, confidence intervals, and estimation. Sample size determination and random sampling help ensure validity and minimize errors in statistical analyses.
- The document discusses key concepts in statistics including population, sampling, parameters, statistics, hypothesis testing, and different statistical tests.
- It defines population, sample, population parameters (mean, variance), sample statistics (mean, variance), and the differences between them.
- Hypothesis testing is explained as determining if a population parameter is likely to be true by stating the null and alternative hypotheses, criteria for decision making, computing a test statistic, and making a conclusion.
- Common statistical tests covered include the t-test, F-test, chi-square test, and z-test; and their applications to comparing means, variances, goodness of fit, and independence.
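As a sketch of one such test, here is a two-tailed one-sample z-test in Python (standard library only; the standard normal CDF is built from math.erf, so no SciPy is needed). The numbers are hypothetical.

```python
from math import erf, sqrt

def z_test_mean(x_bar, mu0, sigma, n):
    """Two-tailed one-sample z-test for a population mean (sigma known)."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Standard normal CDF via the error function.
    phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
    p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

# Hypothetical example: H0 claims mu = 100; a sample of n = 36 has
# mean 103, with known population standard deviation sigma = 9.
z, p = z_test_mean(103, 100, 9, 36)
print(round(z, 2), round(p, 4))  # reject H0 at alpha = 0.05 if p < 0.05
```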
Statistical inference involves using sample statistics to make estimates about unknown population parameters. Point estimates provide a single value, such as using the sample mean (x̅) to estimate the population mean (μ). Interval estimates provide a range of values that the population parameter is likely to fall within, such as a 95% confidence interval. The width of the confidence interval depends on factors like the desired confidence level, sample size, and standard error - generally, larger sample sizes and lower standard errors result in narrower intervals.
This document provides an introduction to inferential statistics and statistical significance. It discusses key concepts like standard error of the mean, confidence intervals, and comparing means from two samples using a t-test. The document explains how inferential statistics allow researchers to make inferences about populations based on samples and determine if observed differences are likely due to chance or a real effect.
This document discusses Monte Carlo simulation techniques for power analysis of circuits. It explains that Monte Carlo simulation requires a large number of input vectors to accurately estimate power dissipation. The key points are:
- Monte Carlo simulation collects switching activity from many input vectors to apply to a power model.
- More input vectors lead to higher accuracy but diminishing returns. There is a point where additional vectors do not meaningfully improve accuracy.
- Statistical techniques can determine the optimal number of vectors needed to estimate power within a given error tolerance with a specific confidence level, such as 90%. This avoids wasting computation on unnecessary vectors.
Basic of Statistical Inference Part-IV: An Overview of Hypothesis Testing (Dexlab Analytics)
The fourth part of the basic of statistical inference series puts its focus on discussing the concept of hypothesis testing explaining all the nuances.
Statistical Inference Part II: Types of Sampling Distribution (Dexlab Analytics)
This is an in-depth analysis of the way different types of sampling distribution works focusing on their specific functions and interrelations as part of the discussion on the theory of sampling.
This document provides an overview of classical sampling theory and statistical inference. It defines key terms like population, sample, parameter, estimator, and statistic. It also describes different types of sampling methods like random sampling, purposive sampling, stratified sampling, and simple random sampling with and without replacement. It explains the concept of sampling distribution and how the distribution of a statistic is approximated as the number of samples increases. It provides examples of sampling distributions for the sample mean and sample proportion. Finally, it reiterates the definitions of parameter, estimator, and statistic in the context of statistical analysis.