2. Two main divisions in Statistics:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics can be defined as those methods involving the collection,
presentation and characterization of a set of data in order to describe the various
features of that set of data properly.
Some of the descriptive measures are:
1. Measures of central tendency: Mean, mode, median etc.
2. Measures of variation: Standard deviation, variance, range etc.
3. Measures of position: Percentiles, deciles, quartiles etc.
4. Measures of association: Karl Pearson's correlation coefficient, Spearman's
correlation coefficient etc.
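As a small illustration of the first three groups of measures, the sketch below uses Python's standard `statistics` module. The data are the six absence counts from the worked example later in these notes; the variable names are illustrative only, and the quartile values depend on the interpolation method the module uses by default.

```python
import statistics

# Days absent per year for six employees (from the worked example in these notes).
days_absent = [8, 3, 1, 11, 4, 7]

# Measures of central tendency
mean = statistics.mean(days_absent)
median = statistics.median(days_absent)

# Measures of variation (population versions: the six values are the whole population)
data_range = max(days_absent) - min(days_absent)
variance = statistics.pvariance(days_absent)
std_dev = statistics.pstdev(days_absent)

# Measures of position: the three quartile cut points
quartiles = statistics.quantiles(days_absent, n=4)

print(f"mean={mean:.4f}, median={median}, range={data_range}")
print(f"variance={variance:.4f}, sd={std_dev:.4f}")
print("quartiles:", quartiles)
```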
3. Inferential statistics can be defined as those methods which aim at
drawing conclusions about a population based only on sample results.
Two main objectives of inferential statistics are:
1. Estimation of a characteristic of a population (Parameter Estimation)
2. Making decisions concerning a population (Hypothesis Testing)
Key Concepts: Sampling, Probability Distribution, Sampling Distribution.
4. Statistics/Parameters
Statistic                                   Parameter
Sample mean ($\bar{X}$)                     Population mean ($\mu$)
Sample proportion of success ($p$)          Population proportion of success ($\pi$)
Sample standard deviation ($s$)             Population standard deviation ($\sigma$)
Sample correlation coefficient ($r$)        Population correlation coefficient ($\rho$)
5. Concept of Sampling Distribution
The concept of Sampling distribution is very important since almost all
inferential statistics are based on sampling distribution.
Sample distribution: A distribution resulting from the collection of
actual data is called sample distribution.
Sampling distribution:
The probability distribution of all possible values that a statistic can
assume (e.g. sample mean, sample proportion etc.), computed from
samples of the same size, drawn randomly and repeatedly from the
same population is called the sampling distribution of that statistic.
6. Sampling distributions of some major statistics
• Sampling distribution of sample mean
• Sampling distribution of sample proportion of success
• Sampling distribution of sample variance or sample
standard deviation (not covered in the course)
• Sampling distribution of sample correlation
coefficient (not covered in the course)
7. Measures describing Sampling Distribution
Three measures describing a sampling distribution are:
1. Functional Form/Probability Histogram
If we plot a probability histogram with the observed values of the
statistic along the x-axis and the relative frequencies along the y-axis, the shape
may take different forms (probability distributions) such as the normal
distribution, t distribution, chi-square distribution etc.
2. Expected value: The average value for the statistic
3. Standard error: The standard deviation of the statistic.
8. Sampling distribution of sample mean ($\bar{X}$)
Functional Form/Probability distribution of $\bar{X}$
We have three sampling situations:
1. Sampling from normally distributed population
2. Sampling from non-normally distributed population
3. Sampling from a population whose functional form is
unknown
9. Probability distribution of $\bar{X}$
Sampling from normally distributed population
The sampling distribution of $\bar{X}$ is normal for any sample size; the
empirical distribution obtained from repeated samples will be approximately
normal.
Sampling from non-normally distributed population
For the case where sampling is from a non-normally distributed
population, we refer to an important mathematical theorem
known as the Central Limit Theorem.
10. Central Limit Theorem
The Central Limit Theorem (CLT) is a statistical concept that states that the
sample mean distribution of a random variable will assume a near-normal or
normal distribution if the sample size is large enough. In simple terms, the
theorem states that the sampling distribution of the mean approaches a normal
distribution as the size of the sample increases, regardless of the shape of the
original population distribution.
Key points of the CLT:
• The CLT states that the distribution of sample means approximates a normal
distribution as the sample size gets larger.
• Sample sizes equal to or greater than 30 are usually considered sufficient for the
CLT to hold.
• The average of the sample means equals the population mean, and the standard
deviation of the sample means (the standard error) equals the population standard
deviation divided by $\sqrt{n}$.
• With a sufficiently large sample size, sample statistics estimate the characteristics
of a population more precisely.
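The behaviour described by the CLT can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the population is taken to be exponential with rate 1 (right-skewed, mean 1, standard deviation 1), and the number of repetitions, the seed and all variable names are arbitrary choices, not part of the original notes.

```python
import math
import random
import statistics

random.seed(42)      # fixed seed so the run is repeatable

N_SAMPLES = 5000     # number of repeated samples (assumed)
SAMPLE_SIZE = 30     # n, the usual "large sample" threshold

# Draw repeated samples from a skewed population and keep each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to the population mean 1
sd_of_means = statistics.pstdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / math.sqrt(SAMPLE_SIZE)

print(f"mean of sample means : {mean_of_means:.4f}")
print(f"sd of sample means   : {sd_of_means:.4f}")
print(f"sigma / sqrt(n)      : {theoretical_se:.4f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is strongly skewed.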
12. What sample size is large enough for normal approximation of sampling
distribution of mean?
1. For most population distributions, regardless of shape, the sampling
distribution of the mean is approximately normally distributed if samples of
at least 30 observations are selected.
$n \geq 30$: large sample; $n < 30$: small sample.
2. If the population distribution is fairly symmetrical, the sampling distribution
of the mean is approximately normal if samples of at least 15 observations
are selected.
3. If the population is normally distributed, the sampling distribution of mean is
normally distributed regardless of the sample size.
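One way to see why the required sample size depends on the population shape is to track how the skewness of the sampling distribution shrinks as n grows. The sketch below assumes an exponential population with rate 1 (a deliberately skewed choice); the helper names `skewness` and `sample_mean_skewness`, the seed and the number of repetitions are hypothetical and chosen only for illustration.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_mean_skewness(n, reps=4000):
    """Skewness of the simulated sampling distribution of the mean for sample size n."""
    means = [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)

random.seed(0)  # fixed seed so the run is repeatable
skew_by_n = {n: sample_mean_skewness(n) for n in (5, 15, 30)}

for n, skew in skew_by_n.items():
    print(f"n={n:2d}  skewness of sample means = {skew:.3f}")
```

A skewness near zero indicates a symmetric, more nearly normal shape; for a symmetric population the values would already be close to zero at small n.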
13. Expected value
For any sample of n observations, the expected value of the sample mean is equal
to the population mean.
• Under random sampling, the sample mean $\bar{X}$ is called an unbiased estimator of
the population mean $\mu$.
• On average, $\bar{X}$ is neither too low nor too high: the average of all possible sample
means is equal to the population mean.
• This is true whether sampling is done with replacement or without replacement.
$E(\bar{X}) = \mu_{\bar{X}} = \mu$
(Mean of the $\bar{X}$'s = Population mean $\mu$)
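The result can be justified in one line using linearity of expectation. Here $X_1, \dots, X_n$ denote the n sample observations (notation assumed for this sketch; it is not used in the slides), each of which has expected value $\mu$ under simple random sampling, with or without replacement.

```latex
\begin{align*}
E(\bar{X})
  &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
   = \frac{1}{n}\cdot n\mu
   = \mu
\end{align*}
```

No independence between the observations is needed for this step, which is why the result holds for both sampling schemes.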
14. Standard Error
A measure of variability in the statistic from sample to sample is called
standard error.
Standard error of mean is the standard deviation of sample means and it
measures the fluctuation of the mean from sample to sample. When the distribution
of sample means is less spread out, the standard error is smaller and the sample
mean is a more precise estimator of the population mean.
15. Standard error of mean (SRSWR)
When sampling is from an infinite population, or sampling is done with
replacement, or the sampling fraction $f = n/N$ is less than 5 %, the standard
error of the sample mean is given by
$\mathrm{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Estimated $\mathrm{S.E.}(\bar{X}) = s_{\bar{X}} = \frac{s}{\sqrt{n}}$
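A short derivation shows where the $\sqrt{n}$ comes from. It assumes the n observations $X_1, \dots, X_n$ (notation introduced here for illustration) are independent, which holds for sampling with replacement or from an infinite population, and that each has variance $\sigma^2$.

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
   = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
   = \frac{n\sigma^{2}}{n^{2}}
   = \frac{\sigma^{2}}{n} \\[4pt]
\mathrm{S.E.}(\bar{X})
  &= \sqrt{\operatorname{Var}(\bar{X})}
   = \frac{\sigma}{\sqrt{n}}
\end{align*}
```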
16. Standard error of mean (SRSWOR)
When sampling is from a finite population, or sampling is done without
replacement, or the sampling fraction $f = n/N$ is more than 5 %:
$\mathrm{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
Estimated $\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
where s is the sample estimate of the population standard deviation $\sigma$
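The two standard error formulas can be written as small helper functions. This is a minimal sketch: the function names `se_mean_wr` and `se_mean_wor` are hypothetical, and the inputs ($\sigma$ = 3.35, n = 2, N = 6) are taken from the worked example later in these notes.

```python
import math

def se_mean_wr(sigma, n):
    """Standard error of the mean for sampling with replacement: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def se_mean_wor(sigma, n, N):
    """Standard error of the mean for sampling without replacement from N units."""
    fpc = math.sqrt((N - n) / (N - 1))   # finite population correction
    return se_mean_wr(sigma, n) * fpc

sigma, n, N = 3.35, 2, 6   # values from the absenteeism example

se_wr = se_mean_wr(sigma, n)
se_wor = se_mean_wor(sigma, n, N)

print(f"S.E. with replacement    : {se_wr:.4f}")
print(f"S.E. without replacement : {se_wor:.4f}")
```

To obtain the estimated standard errors, pass the sample standard deviation s in place of `sigma`.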
17. Finite Population Correction
The factor $\sqrt{\frac{N-n}{N-1}}$ is called the finite population multiplier or
correction and can be ignored when the sample size is small
in comparison with the population size. In practice, the finite
population correction is ignored if the sampling fraction f is less
than 5 %.
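To see why the correction is usually dropped for small sampling fractions, it helps to tabulate the factor for a few sample sizes. The population size N = 1000 below is a hypothetical value chosen for illustration, as is the helper name `fpc`.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

N = 1000  # hypothetical population size
for n in (10, 50, 200, 500):
    f = n / N  # sampling fraction
    print(f"n={n:4d}  f={f:5.1%}  fpc={fpc(N, n):.4f}")
```

When f is well below 5 % the factor is very close to 1, so multiplying by it barely changes the standard error; for larger sampling fractions it reduces the standard error noticeably.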
18. Factors affecting the standard error of
sample mean
The factors are:
• Sample size "n"
The standard error of the sample mean decreases as the sample size increases. The larger the
sample size, the smaller the fluctuation between sample means.
• Population standard deviation "$\sigma$"
If the population data are highly variable, then the standard error of the sample mean is large.
So, we can reduce the standard error of the mean by taking a larger sample, but we have no direct
control over $\sigma$, because it reflects the natural variability of the population observations or data.
19. Example: The number of days absent per year in the population of six health post
employees of a certain VDC are 8, 3, 1, 11, 4 and 7
1. Consider all possible samples of size two i.e. n = 2 which can be drawn with simple
random sampling with replacement and without replacement.
2. Draw histogram of distribution of population values and sampling distribution of means.
Comment on the functional form of sampling distribution of means.
3. Find the mean of the population and the means of these samples, and verify that the population
mean is equal to the mean of the sample means.
4. Find the population standard deviation and verify the following results.
(a) $\mathrm{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ for sampling without replacement
(b) $\mathrm{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ for sampling with replacement.
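Parts 1 and 2 of the example can be sketched in Python by enumerating every possible sample. This is only one way to set it up: samples with replacement are treated as ordered pairs (36 in total) and samples without replacement as unordered pairs (15 in total); the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations, product

population = [8, 3, 1, 11, 4, 7]   # days absent for the six employees
n = 2                              # sample size

# All possible samples of size 2.
samples_wr = list(product(population, repeat=n))    # with replacement: 6**2 ordered pairs
samples_wor = list(combinations(population, n))     # without replacement: C(6, 2) pairs

# The sample mean of each possible sample.
means_wr = [sum(s) / n for s in samples_wr]
means_wor = [sum(s) / n for s in samples_wor]

# Sampling distribution of the mean: each distinct value with its relative frequency.
for label, means in (("with replacement", means_wr), ("without replacement", means_wor)):
    counts = Counter(means)
    print(f"Sampling distribution of the mean, {label} ({len(means)} samples):")
    for value in sorted(counts):
        print(f"  mean={value:4.1f}  relative frequency={counts[value] / len(means):.4f}")
```

The relative frequencies printed for each scheme are the heights of the probability histogram asked for in part 2.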
27. To show
Population Mean
$\mu = \frac{8+3+1+11+4+7}{6} = 5.6667$
Sampling without replacement
$E(\bar{X})$ = mean of the sample means ($\bar{X}$) or $\mu_{\bar{X}} = \frac{5.5 + 4.5 + \dots + 5.5}{15} = 5.6667$
Sampling with replacement
$E(\bar{X})$ = mean of the sample means ($\bar{X}$) or $\mu_{\bar{X}} = \frac{8.0 + 5.5 + \dots + 7.0}{36} = 5.6667$
Hence it is verified that the mean of the sample means is equal to the population mean for
both sampling schemes. Hence the expected value of the sample mean is equal to the
population mean:
$E(\bar{X}) = \mu$
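The same check can be done exactly, without rounding, using rational arithmetic. The sketch below uses Python's `fractions` module; the helper name `mean_of_sample_means` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

population = [8, 3, 1, 11, 4, 7]
n = 2

# Population mean as an exact fraction.
mu = Fraction(sum(population), len(population))

def mean_of_sample_means(samples):
    """Average of the sample means over all listed samples, as an exact fraction."""
    means = [Fraction(sum(s), len(s)) for s in samples]
    return sum(means) / len(means)

mean_wr = mean_of_sample_means(list(product(population, repeat=n)))    # 36 samples
mean_wor = mean_of_sample_means(list(combinations(population, n)))     # 15 samples

print("population mean            :", mu, "=", round(float(mu), 4))
print("mean of means (with repl.) :", mean_wr)
print("mean of means (w/o repl.)  :", mean_wor)
```

Because the three quantities are compared as fractions, equality here is exact rather than agreement to four decimal places.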
30. To show $\mathrm{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}$ for SRSWR
L.H.S. = $\mathrm{S.E.}(\bar{X})$ = Standard deviation of $\bar{X}$
$= \sqrt{\frac{(8.0 - 5.6667)^2 + (5.5 - 5.6667)^2 + \dots + (7.0 - 5.6667)^2}{36}}$
$= 2.3688$
R.H.S. = $\frac{\sigma}{\sqrt{n}} = \frac{3.35}{\sqrt{2}} = 2.3688$
Hence it is verified.
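The standard error check can also be reproduced by enumeration. The sketch below compares the standard deviation of all possible sample means with the formula value for sampling with replacement, and applies the same comparison to sampling without replacement (part 4(a) of the example). Variable names are illustrative.

```python
import math
import statistics
from itertools import combinations, product

population = [8, 3, 1, 11, 4, 7]
N, n = len(population), 2

sigma = statistics.pstdev(population)   # population standard deviation

# Left-hand sides: standard deviation of all possible sample means.
means_wr = [sum(s) / n for s in product(population, repeat=n)]
means_wor = [sum(s) / n for s in combinations(population, n)]
lhs_wr = statistics.pstdev(means_wr)
lhs_wor = statistics.pstdev(means_wor)

# Right-hand sides: the standard error formulas.
rhs_wr = sigma / math.sqrt(n)
rhs_wor = rhs_wr * math.sqrt((N - n) / (N - 1))

print(f"sigma = {sigma:.4f}")
print(f"SRSWR : L.H.S. = {lhs_wr:.4f}, R.H.S. = {rhs_wr:.4f}")
print(f"SRSWOR: L.H.S. = {lhs_wor:.4f}, R.H.S. = {rhs_wor:.4f}")
```

Both pairs agree up to floating-point rounding, which confirms the two formulas for this small population.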