Sampling distributions

Nguyen Thi Ngoc Mai
May 23, 2017

Statistical inference: making guesses about the population from a sample
Truth (not observable)
Population parameters
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$
Sample (observation)
Sample statistics
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2$
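The difference between population parameters (divide by N) and sample statistics (divide by n − 1 for the variance) can be sketched in Python. The population below is hypothetical and simulated only for illustration; the mean of 100, the SD of 15, and the sample size of 25 are assumptions, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical finite population of N = 10,000 values (illustrative only).
population = [random.gauss(100, 15) for _ in range(10_000)]

# Population parameters: the variance divides by N.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)

# Sample statistics from n = 25 observations: s^2 divides by n - 1.
sample = random.sample(population, 25)
x_bar = statistics.fmean(sample)
s2 = statistics.variance(sample, x_bar)

print(f"population: mu = {mu:.1f}, sigma^2 = {sigma2:.1f}")
print(f"sample:     x_bar = {x_bar:.1f}, s^2 = {s2:.1f}")
```

In practice only the sample line is available; the population line is the unobservable "truth" that the sample statistics are used to estimate.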

Statistics vs. Parameters
› Sample statistic – any summary measure
calculated from data; e.g., could be a mean, a
difference in means or proportions, an odds ratio,
or a correlation coefficient
› Population parameter – the true value/true effect
in the entire population of interest

Examples of Sample Statistics:
› Mean
› Rate
› Risk
› Difference in means
› Relative risk (odds ratio/ risk ratio…)
› Correlation coefficient
› Regression coefficient
…

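› A statistic is a single number calculated from our sample data
› How can a single number (e.g., a mean) have a distribution?
– Answer: It’s a theoretical concept! If the sampling were repeated many times, the statistic would take a different value in each sample.
– Statistics follow distributions: the sampling distribution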
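One way to make this concept concrete is to repeat the sampling on a computer and watch the statistic change. The sketch below assumes a hypothetical population of 5,000 uniform values and samples of size 20; none of these numbers come from the slides.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of 5,000 values (illustrative numbers only).
population = [random.uniform(0, 10) for _ in range(5_000)]

# Draw many samples of size n and record the mean of each one.
n = 20
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(1_000)
]

# Each sample gives a different mean; together they form the
# (simulated) sampling distribution of the mean.
print("first five sample means:", [round(m, 2) for m in sample_means[:5]])
print("smallest:", round(min(sample_means), 2))
print("largest: ", round(max(sample_means), 2))
```

A histogram of `sample_means` would show the shape of this sampling distribution.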

Sampling distribution of the Mean

› The sampling distributions are defined by:
• Shape (e.g., normal distribution, t-distribution)
• Mean
• Standard error

The Central Limit Theorem
If all possible random samples, each of size n, are taken from any population with a mean $\mu$ and a standard deviation $\sigma$, the sampling distribution of the sample means will:
1. Have mean: $\mu_{\bar{x}} = \mu$
2. Have standard deviation (standard error): $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
3. Be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n)
$\mu_{\bar{x}}$: the mean of the sample means
$\sigma_{\bar{x}}$: the standard deviation of the sample means, also called “the standard error.” Common notations: $\sigma_{\bar{x}}$, $SD_{\bar{x}}$, $SE_{\bar{x}}$, SEM, SE
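The three claims of the theorem can be checked with a small simulation. This is a sketch under stated assumptions: the parent population is taken to be exponential with mean 10 (a strongly skewed shape, chosen only for illustration), each sample has n = 40, and 5,000 samples stand in for "all possible samples".

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed parent population: exponential with mean 10, so its SD is also 10.
mu, sigma = 10.0, 10.0
n = 40           # size of each sample
repeats = 5_000  # number of repeated samples

sample_means = [
    statistics.fmean([random.expovariate(1 / mu) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_means = statistics.fmean(sample_means)  # claim 1: close to mu
sd_of_means = statistics.stdev(sample_means)    # claim 2: close to sigma / sqrt(n)
se_theory = sigma / math.sqrt(n)

# Claim 3: if the means are roughly normal, about 95% of them
# should fall within 1.96 standard errors of mu.
within = sum(abs(m - mu) <= 1.96 * se_theory for m in sample_means) / repeats

print(f"mean of sample means: {mean_of_means:.2f} (mu = {mu})")
print(f"SD of sample means:   {sd_of_means:.2f} (sigma/sqrt(n) = {se_theory:.2f})")
print(f"share within 1.96 SE: {within:.3f}")
```

Rerunning with a smaller n (say 5) should show the third check drifting further from 0.95, which is the "normality improves with larger n" point.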

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
n > 1 ⇒ the SEM is always smaller than the SD of the population
As n increases, the variation of the sample means decreases
Finally, if n is large enough, the sampling distribution of the mean is approximately normal!
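How quickly the standard error shrinks with n follows directly from SEM = σ/√n. The short sketch below assumes an illustrative population SD of 46 and a handful of arbitrary sample sizes.

```python
import math

sigma = 46.0  # assumed population SD, for illustration only

# Standard error of the mean for several sample sizes.
sems = {n: sigma / math.sqrt(n) for n in (1, 4, 25, 100, 400)}

for n, sem in sems.items():
    print(f"n = {n:>3}  SEM = {sem:6.2f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.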

Applications Using the Sampling Distribution of the Mean
! Apply tables of the standard normal distribution
Serum cholesterol levels for all 20–74-year-old males in the US have:
$\mu$ = 211 mg/dL
$\sigma$ = 46 mg/dL
If we select repeated samples of size 25, what proportion of the samples will have a mean value of 230 mg/dL or above?
$\mu_{\bar{x}} = \mu = 211$
$\sigma_{\bar{x}} = \frac{46}{\sqrt{25}} = 9.2$
For a single observation: $z = \frac{X - \mu}{\sigma}$
For a sample mean: $z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
$z = \frac{230 - 211}{9.2} = 2.07$
$P(Z < 2.07) = 0.9808$
⇒ $P(Z \ge 2.07) = 1 - 0.9808 = 0.0192$
About 1.9% of samples will have a mean ≥ 230 mg/dL
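The same calculation can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative. Because it keeps z unrounded instead of using 2.07, the result may differ slightly from the table-based 0.0192.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211.0, 46.0, 25  # values from the cholesterol example

se = sigma / sqrt(n)                # standard error of the mean
z = (230 - mu) / se                 # z-score of a sample mean of 230
p_above = 1 - NormalDist().cdf(z)   # P(Z >= z) under the standard normal

print(f"SE = {se:.1f}, z = {z:.2f}, P(mean >= 230) = {p_above:.4f}")
```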

What are the upper and lower limits that enclose 95% of the means of samples of size 25 drawn from the population?
$P(-1.96 \le Z \le 1.96) = 0.95$
$-1.96 \le Z \le 1.96$
$-1.96 \le \frac{\bar{X} - 211}{9.2} \le 1.96$
$193.0 \le \bar{X} \le 229.0$
About 95% of the means of samples of size 25 lie between 193.0 mg/dL and 229.0 mg/dL
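These limits can also be computed directly. The sketch below is a minimal version using `statistics.NormalDist`; it obtains the 1.96 multiplier from the inverse CDF rather than from a table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211.0, 46.0, 25  # values from the cholesterol example

se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a central 95% interval

lower = mu - z_crit * se
upper = mu + z_crit * se

print(f"95% of sample means lie between {lower:.1f} and {upper:.1f} mg/dL")
```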

How large would the samples need to be for 95% of their means to lie within ± 5 mg/dL of the population mean $\mu$?
$\mu$ = 211 mg/dL
$\sigma$ = 46 mg/dL
$P(\mu - 5 \le \bar{X} \le \mu + 5) = 0.95$
$1.96 \times SE = 5$
$1.96 \times \frac{\sigma}{\sqrt{n}} = 5$
$n = \left(\frac{1.96 \times 46}{5}\right)^2 \approx 325.2$, rounded up to 326
A sample size of 326 would be required for 95% of the sample means to lie within 5 mg/dL of the population mean
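The sample-size calculation can be wrapped in a small helper. This is a sketch only: `required_sample_size` is a hypothetical name introduced here, and it rounds up because a fractional sample size is not possible and rounding down would leave the margin slightly wider than requested.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, margin: float, coverage: float = 0.95) -> int:
    """Smallest n for which `coverage` of sample means lie within `margin` of mu.

    Assumes the sampling distribution of the mean is approximately normal
    and that the population SD `sigma` is known.
    """
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    return math.ceil((z_crit * sigma / margin) ** 2)


n_needed = required_sample_size(sigma=46.0, margin=5.0)
print(f"required sample size: {n_needed}")
```

Halving the margin to 2.5 mg/dL roughly quadruples the required n, since n grows with the square of 1/margin.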

Practice
Q1. A laboratory value with a mean of 18 g/dL and a standard deviation of 1.5 g/dL implies
a. The true value is between 16.5 and 19.5 g/dL
b. The true value is between 15.0 and 21.0 g/dL
c. The error is too large for the determination to have any value
d. In repeated determinations on the same sample, 95% could be expected to fall between 15.0 and 21.0 g/dL
e. The true value has a 5% chance of being less than 16.5 or more
than 19.5 g/dL

Q2. Data for patients at a certain hospital show the mean length of stay is 10 days and the median is 8 days. The most frequent length of stay is 6 days. From these facts, we conclude
a. Approximately 50% of the patients stay less than 6 days
b. The distribution of length of stay follows the normal curve
c. The standard deviation is 2 days
d. The mean length of stay is shifted away from the center of the
distribution by stays of very long duration
e. The mean length of stay is shifted away from the center of the
distribution by stays of very short duration

Q3. A random sample of teenage prenatal patients seen at University
Hospital during 1973 had a mean hematocrit of 29 with a standard
error of 1.5. From this information, we may conclude
a. The hematocrit for any teenage prenatal patient in the sample
will not deviate from the mean by any more than 50%
b. The normal range for teenage prenatal patients seen at
University Hospital is 26 to 32
c. The range of 26 to 32 will include the mean of all teenage
prenatal patients seen at University Hospital in 1973 with 95%
probability
d. It is to be expected that 95% of all teenage prenatal patients seen at University Hospital in 1973 will have hematocrits in the range of 26 to 32

Q4. The IQs of a class of students attending a university are distributed according to the normal curve, with a mean of 115 and a standard deviation of 10. Therefore
a. 50% have IQs between 105 and 115
b. 95% have IQs between 105 and 115
c. 2.5% have IQs above 135
d. 5% have IQs above 135
e. 5% have IQs below 95

Q5. The primary use of the standard error of the mean is in
calculating the:
a. Confidence interval
b. Error rate
c. Standard deviation
d. Variance

