1. This document discusses concepts related to statistical inference, including point estimation, interval estimation, hypothesis testing, and Monte Carlo simulation.
2. It provides examples of how to calculate point estimators such as the sample mean, and maximum likelihood estimators for distributions such as the normal and exponential.
3. Confidence intervals are defined as intervals that have a known probability of containing the true population parameter, based on a sample. Formulas for 95% confidence intervals of the mean are given assuming normality and using the t-distribution.
3. Which is better: machine learning or statistics?
• Prediction vs. explanation.
• Machine learning models are designed to make the most accurate predictions possible.
• Statistical models are designed for inference about the relationships between variables.
• Many statistical models can make predictions, but predictive accuracy is not their strength. Likewise, machine learning models provide only varying degrees of interpretability.
4. Contents
1. Estimation
  1) Point estimation
  2) Interval estimation
  3) Order statistics
2. Hypothesis Testing
3. The Method of Monte Carlo & Bootstrap Procedure
5. Sampling and Statistics
• We have a random variable X of interest, but its pdf f(x) or pmf p(x) is not known. Two cases arise:
1. f(x) or p(x) is completely unknown.
2. The form of f(x) or p(x) is known, down to a parameter θ.
• Our information about the unknown distribution, or about the unknown parameters of the distribution of X, comes from a sample on X.
6. • We consider the second classification (parametric inference):
• X has an exponential distribution, Exp(θ), where θ is unknown.
• X has a binomial distribution, b(n, p), where n is known but p is unknown.
• X has a gamma distribution, Γ(α, β), where α and β are unknown.
• In general, the family of densities is written f(x; θ) or p(x; θ), θ ∈ Ω.
7. • The sample observations have the same distribution as X, and we denote them as the random variables X₁, X₂, …, Xₙ, where n denotes the sample size.
• When the sample is actually drawn, we use lowercase letters x₁, x₂, …, xₙ as realizations of the sample.
• Assume that X₁, X₂, …, Xₙ are mutually independent.
• Definition 1
• If X₁, X₂, …, Xₙ are iid, then these random variables constitute a random sample of size n from the common distribution.
• Compare with a stochastic process: in a random sample the observations are iid, while in a general stochastic process they may be dependent.
8. • Definition 2 (Statistic)
• Let X₁, X₂, …, Xₙ be a sample on a random variable X. Let T = T(X₁, X₂, …, Xₙ) be a function of the sample. Then T is called a statistic.
• Once the sample is actually drawn, t = T(x₁, x₂, …, xₙ) is called the realization of T.
• Example: $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are statistics.
• It makes sense to consider a statistic T that is an estimator of θ.
• While we call T an estimator of θ, we call its realization t an estimate of θ.
9. Estimation
• Point estimation
• "The sample mean is 15, so the population mean should be 15!"
• MLE
• Interval estimation
• "The sample mean is 15, so the population mean should lie in an interval around it, say (14.3, 15.7)!"
• Confidence interval
10. Several Properties of Point Estimators
• Unbiasedness, consistency, efficiency
• Definition 3 (Unbiasedness)
• Let X₁, X₂, …, Xₙ be a sample on a random variable X with pdf f(x; θ), θ ∈ Ω. Let T = T(X₁, X₂, …, Xₙ) be a statistic. We say that T is an unbiased estimator of θ if E(T) = θ.
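As a quick illustration of Definition 3 (a sketch not taken from the slides; it assumes NumPy and a normal population with made-up parameters), a Monte Carlo run makes bias visible: the sample variance with divisor n − 1 is unbiased for σ², while the divisor-n version is biased downward.

import numpy as np

rng = np.random.default_rng(0)
sigma2, n, reps = 4.0, 10, 100_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_unbiased = samples.var(axis=1, ddof=1)   # divides by n-1
s2_divn = samples.var(axis=1, ddof=0)       # divides by n
print(s2_unbiased.mean())  # ~4.0: E[S^2] = sigma^2 (unbiased)
print(s2_divn.mean())      # ~3.6 = (n-1)/n * sigma^2 (biased downward)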
11. Example: Maximum Likelihood Estimator (MLE)
• $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$, the likelihood function of the random sample.
• An often-used estimate is the value of θ that maximizes L(θ). If it is unique, it is called the maximum likelihood estimator, denoted $\hat{\theta}$, i.e., $\hat{\theta} = \operatorname{argmax}_{\theta} L(\theta)$.
12. MLE with the Exponential Distribution
• The random sample X₁, X₂, …, Xₙ has the Γ(1, θ) (exponential) density. The log-likelihood is
$l(\theta) = \log \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta} = -n\log\theta - \theta^{-1}\sum_{i=1}^{n} x_i$
• Solving $\partial l(\theta)/\partial\theta = 0$ and checking $\partial^2 l(\theta)/\partial\theta^2 < 0$ yields that $\hat{\theta} = \bar{X}$ is the MLE of θ.
• Because E(X) = θ, we have E(X̄) = θ, and hence $\hat{\theta}$ is an unbiased estimator of θ.
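A short numerical check of this result (a sketch with simulated data; the true θ = 3 is an assumption): minimizing the negative exponential log-likelihood should recover the sample mean.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=200)

def neg_loglik(theta):
    # -l(theta) = n*log(theta) + sum(x)/theta
    return len(x) * np.log(theta) + x.sum() / theta

res = minimize_scalar(neg_loglik, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())  # the two agree: the MLE is the sample mean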
13. MLE with the Normal Distribution
• X ~ N(μ, σ²), with parameter vector θ = (μ, σ).
• $f(x; \boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• If X₁, X₂, …, Xₙ is a random sample on X, then
$L(\boldsymbol{\theta}) = \prod_{i=1}^{n} f(x_i; \boldsymbol{\theta})$
$l(\boldsymbol{\theta}) = \log L(\boldsymbol{\theta}) = \sum_{i=1}^{n} \log f(x_i; \boldsymbol{\theta}) = -\frac{n}{2}\log 2\pi - n\log\sigma - \frac{1}{2}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^2$
• Setting $\partial l/\partial\mu = \partial l/\partial\sigma = 0$ yields $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$.
• $\hat{\mu}$ is an unbiased estimator of μ.
• $\hat{\sigma}^2$ is a biased estimator of σ², though it converges to σ² as n → ∞.
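The closed forms can be verified numerically (a sketch; the data and true parameters μ = 5, σ = 2 are made up, and log σ is optimized to keep σ positive):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=500)

def neg_loglik(params):
    mu, log_sigma = params
    return -norm.logpdf(x, mu, np.exp(log_sigma)).sum()

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, x.mean())             # mu_hat matches the sample mean
print(sigma_hat**2, x.var(ddof=0))  # sigma_hat^2 matches n^-1 * sum (x_i - xbar)^2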
14. Example 1 (Point Estimation)
• Assume that the weight of a Baskin Robbins pint follows a normal distribution. We went to Baskin Robbins 10 times and weighed a pint each time, with the results below. What is the MLE of the population mean weight?

Visit    1    2    3    4    5    6    7    8    9    10
Weight  310  315  330  320  325  325  310  315  320  340

• Sample mean: 321 g → this becomes the estimate of the population mean!
• What if the company website says 310 g? → Hypothesis testing
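The point estimate in code (a trivial sketch; weights in grams, taken from the table above):

weights = [310, 315, 330, 320, 325, 325, 310, 315, 320, 340]
mu_hat = sum(weights) / len(weights)  # MLE of the mean under normality
print(mu_hat)                         # 321.0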
15. Confidence Intervals
• We discussed estimating θ by a statistic $\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)$, where X₁, X₂, …, Xₙ is a sample from the distribution of X.
• When the sample is drawn, it is unlikely that the value of $\hat{\theta}$ is the true value of the parameter.
• Error must exist.
• We embody an estimate of that error in terms of a confidence interval.
16. Confidence Intervals
• Definition 4 (Confidence Interval)
• Let X₁, X₂, …, Xₙ be a sample on a random variable X, where X has pdf f(x; θ), θ ∈ Ω. Let α be specified. Let L = L(X₁, X₂, …, Xₙ) and U = U(X₁, X₂, …, Xₙ) be two statistics.
• We say that the interval (L, U) is a (1 − α)100% confidence interval for θ if
$1 - \alpha = P_{\theta}[\theta \in (L, U)]$
• 1 − α is called the confidence coefficient of the interval.
• Once the sample is drawn, the realized value of the confidence interval is
$(l, u) = (L(x_1, x_2, \ldots, x_n),\ U(x_1, x_2, \ldots, x_n))$
17. Example 2 (Interval Estimation)
• Assume that the weight of a Baskin Robbins pint follows a normal distribution. We went to Baskin Robbins 10 times and weighed a pint each time, with the results below. What is the 95% confidence interval for the mean pint weight?
• We do not know the population mean pint weight, but based on the sample, with 95% confidence it is contained in the interval __ to __.

Visit    1    2    3    4    5    6    7    8    9    10
Weight  310  315  330  320  325  325  310  315  320  340
18. Confidence Interval for μ Under Normality
• Suppose X₁, X₂, …, Xₙ are a random sample from N(μ, σ²). Let X̄ and S² denote the sample mean and sample variance, respectively.
• X̄ is the MLE of μ, and $\frac{n-1}{n}S^2$ is the MLE of σ².
• $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with n − 1 degrees of freedom.
$1 - \alpha = P(-t_{\alpha/2,\,n-1} < T < t_{\alpha/2,\,n-1}) = P\left(-t_{\alpha/2,\,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,\,n-1}\right) = P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right)$
$\therefore\ L(X_1, \ldots, X_n) = \bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \quad\text{and}\quad U(X_1, \ldots, X_n) = \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$
19. t-interval and standard error
• Once the sample is drawn, let x̄ and s denote the realized values of the statistics X̄ and S. Then a (1 − α)100% confidence interval for μ is given by
$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$
• This is the t-interval; $s/\sqrt{n}$ is the standard error.
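A sketch (assuming NumPy and SciPy are available) that computes the realized t-interval for the Example 2 pint data:

import numpy as np
from scipy import stats

weights = np.array([310, 315, 330, 320, 325, 325, 310, 315, 320, 340])
n = len(weights)
xbar, s = weights.mean(), weights.std(ddof=1)
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)  # t_{alpha/2, n-1}
half = t_crit * s / np.sqrt(n)
print((xbar - half, xbar + half))

Here the half-width t₀.₀₂₅,₉ · s/√10 is roughly 6.7 g, so the realized 95% interval is roughly (314.3, 327.7) g.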
21. Central Limit Theorem(CLT)
• The last example depends on the normality of the sampled items.
• However, the interval is approximately valid even if the sampled items are not drawn from a normal distribution.
• Theorem 1 (Central Limit Theorem)
• Let $X_1, X_2, \ldots, X_n$ be a random sample with mean $\mu$ and finite variance $\sigma^2$. Define the random variable $Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. Then $Z_n \xrightarrow{D} Z$, where $Z \sim N(0, 1)$.
• When using $\bar{X}$: if $n$ is sufficiently large, should we use the $t$-distribution or the normal distribution?
-> The $t$-distribution: it is more conservative.
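A small Monte Carlo illustration of the theorem, assuming NumPy: standardized means of exponential (clearly non-normal) data behave like $N(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 1.0, 1.0, 50             # exponential(1): mean 1, sd 1

xbar = rng.exponential(scale=mu, size=(100_000, n)).mean(axis=1)
z = (xbar - mu) / (sigma / np.sqrt(n))  # Z_n = (Xbar - mu) / (sigma / sqrt(n))

print(f"mean(Z) = {z.mean():.3f}, sd(Z) = {z.std():.3f}")  # ~0 and ~1
```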
2. Confidence Intervals
22. Confidence Intervals for Difference in Means
• We compare the means of $X$ and $Y$, denoted by $\mu_1$ and $\mu_2$, respectively. Assume that the variances of $X$ and $Y$ are finite, and denote them by $\sigma_1^2$ and $\sigma_2^2$, respectively.
• Goal: estimate $\Delta = \mu_1 - \mu_2$.
• What is a confidence interval for $\Delta = \mu_1 - \mu_2$?
• Sampling: $X_1, X_2, \ldots, X_{n_1}$ and $Y_1, Y_2, \ldots, Y_{n_2}$
• Let $\hat{\Delta} = \bar{X} - \bar{Y}$. The statistic $\hat{\Delta}$ is an unbiased estimator of $\Delta$.
• $\operatorname{Var}(\hat{\Delta}) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$. By the CLT, $Z = \frac{\hat{\Delta} - \Delta}{\sqrt{S_1^2/n_1 + S_2^2/n_2}} \sim N(0, 1)$ approximately, giving the interval
$\left( \bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\;\; \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right)$
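A sketch of this large-sample interval on hypothetical data, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=80)     # hypothetical sample from X
y = rng.normal(9.0, 3.0, size=100)     # hypothetical sample from Y

delta = x.mean() - y.mean()
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
z = stats.norm.ppf(0.975)              # z_{alpha/2} for alpha = 0.05

print(f"95% CI for mu1 - mu2: ({delta - z * se:.3f}, {delta + z * se:.3f})")
```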
2. Confidence Intervals
23. Confidence Intervals for Difference in Means
• Suppose $X$ and $Y$ are normal with the same variance; i.e., $\sigma^2 = \sigma_1^2 = \sigma_2^2$.
• The estimator $S_p^2$ of $\sigma^2$ is a weighted average of $S_1^2$ and $S_2^2$:
→ the pooled estimator
→ $S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$
• $T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{n_1^{-1} + n_2^{-1}}}$ has a $t$-distribution with $n - 2$ degrees of freedom, where $n = n_1 + n_2$.
• A $(1 - \alpha)100\%$ confidence interval for $\Delta = \mu_1 - \mu_2$:
$\left( (\bar{x} - \bar{y}) - t_{\alpha/2, n-2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\;\; (\bar{x} - \bar{y}) + t_{\alpha/2, n-2}\, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right)$
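The pooled version under the equal-variance assumption, in the same hypothetical framing:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(10.0, 2.0, size=12)     # hypothetical small samples,
y = rng.normal(9.0, 2.0, size=15)      # common variance assumed

n1, n2 = len(x), len(y)
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
tcrit = stats.t.ppf(0.975, df=n1 + n2 - 2)

delta = x.mean() - y.mean()
print(f"95% pooled t-interval: ({delta - tcrit * se:.3f}, {delta + tcrit * se:.3f})")
```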
2. Confidence Intervals
24. Order Statistics
• Suppose there are 100 light bulbs, and let the lifetime of the $i$th bulb be the random variable $X_i$. Let $Y_1$ be the lifetime of the first bulb to burn out, $Y_2$ the lifetime of the second bulb to burn out, ..., and $Y_{100}$ the lifetime of the 100th bulb to burn out.
• Random variables obtained in this order are called order statistics.
• Even if $X_1, X_2, \ldots, X_n$ are independent, $Y_1, Y_2, \ldots, Y_n$ need not be independent.
3. Order Statistics
25. Order Statistics
• Theorem 2
• Let $X$ be a continuous random variable whose pdf $f(x)$ has support $\mathcal{S} = (a, b)$, where $-\infty \le a < b \le \infty$. Let $Y_1 < Y_2 < \cdots < Y_n$ denote the $n$ order statistics of a random sample $X_1, X_2, \ldots, X_n$ from $X$. Then the joint pdf of $Y_1, Y_2, \ldots, Y_n$ is given by
$g(y_1, y_2, \ldots, y_n) = \begin{cases} n!\, f(y_1) f(y_2) \cdots f(y_n), & a < y_1 < y_2 < \cdots < y_n < b \\ 0, & \text{elsewhere} \end{cases}$
3. Order Statistics
27. Quantiles
• Let $X$ be a random variable with a continuous cdf $F(x)$. For $0 < p < 1$, define the $p$th quantile of $X$ to be $\xi_p = F^{-1}(p)$.
• e.g., $\xi_{0.5}$ is the median of $X$.
• What is an estimator of $\xi_p$?
• What is a confidence interval for $\xi_p$?
3. Order Statistics
28. • Let $X_1, X_2, \ldots, X_n$ be a random sample from the distribution of $X$ and let $Y_1 < Y_2 < \cdots < Y_n$ be the corresponding order statistics. Let $k$ be the greatest integer less than or equal to $p(n+1)$.
• $E[F(Y_k)] = \int_a^b F(y_k)\, g_k(y_k)\, dy_k = \int_0^1 \frac{n!}{(k-1)!\,(n-k)!}\, z^k (1-z)^{n-k}\, dz = \frac{k}{n+1} \approx p$
• We take $Y_k$ as an estimator of the quantile $\xi_p$.
• $Y_k$ is called the $p$th sample quantile (or the $100p$th percentile of the sample).
• Sample quantiles are useful descriptive statistics.
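This estimator translates directly into code; a sketch assuming NumPy, using the 15 observations from Example 2 below: sort the sample, take $k = \lfloor p(n+1) \rfloor$, and read off $Y_k$.

```python
import numpy as np

def sample_quantile(x, p):
    """p-th sample quantile Y_k with k = floor(p * (n + 1)), 1-indexed."""
    y = np.sort(x)
    k = int(np.floor(p * (len(x) + 1)))
    return y[k - 1]                    # convert to 0-indexing

x = np.array([56, 70, 89, 94, 96, 101, 102, 102,
              102, 105, 106, 108, 110, 113, 116])
print(sample_quantile(x, 0.5))         # 102, the sample median
```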
3. Order Statistics
29. Application of Quantiles: Five-number summary
• A five-number summary of the data consists of the following sample quantiles:
• the minimum ($Y_1$), the first quartile ($Y_{[.25(n+1)]}$, estimating $\xi_{.25}$), the median ($Y_{[.50(n+1)]}$, estimating $\xi_{.50}$), the third quartile ($Y_{[.75(n+1)]}$, estimating $\xi_{.75}$), and the maximum ($Y_n$).
• We use the notation $Q_1$, $Q_2$, and $Q_3$ to denote, respectively, the first quartile, median, and third quartile of the sample.
3. Order Statistics
30. Example 2
• Fifteen observations sampled on a random variable $X$ are given below.
• The minimum: $y_1 = 56$
• $Q_1 = y_4 = 94$
• $Q_2 = y_8 = 102$
• $Q_3 = y_{12} = 108$
• The maximum: $y_{15} = 116$
Data: 56 70 89 94 96 101 102 102 102 105 106 108 110 113 116
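Reusing `sample_quantile` and `x` from the sketch above, the five-number summary follows immediately:

```python
# Five-number summary of the 15 observations, continuing the previous sketch.
summary = {
    "min": x.min(),
    "Q1": sample_quantile(x, 0.25),    # y_4 = 94
    "Q2": sample_quantile(x, 0.50),    # y_8 = 102
    "Q3": sample_quantile(x, 0.75),    # y_12 = 108
    "max": x.max(),
}
print(summary)
```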
3. Order Statistics
31. Application of Quantiles: Box-Whisker plot
• The five-number summary is the basis for a useful and quick plot of the
data.
• $h = 1.5(Q_3 - Q_1)$
• lower fence: $LF = Q_1 - h$; upper fence: $UF = Q_3 + h$
• Points that lie outside the fences are called potential outliers and they
are denoted by ‘o’ on the boxplot.
• The whiskers protrude from the sides of the box to what are called
adjacent points, which are the points within the fences but closest to
the fences.
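The fence computation on the same 15 observations (NumPy assumed; $Q_1$ and $Q_3$ taken from the summary above):

```python
import numpy as np

x = np.array([56, 70, 89, 94, 96, 101, 102, 102,
              102, 105, 106, 108, 110, 113, 116])
q1, q3 = 94, 108                       # from the five-number summary above
h = 1.5 * (q3 - q1)                    # h = 21
lf, uf = q1 - h, q3 + h                # LF = 73, UF = 129
print("potential outliers:", x[(x < lf) | (x > uf)])   # [56 70]
```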
3. Order Statistics
33. Application of Quantiles: Diagnostic plot
• In practice, we often assume that the data follow a certain distribution. Such an assumption needs to be checked, and there are many statistical tests which do so (goodness-of-fit tests; see Wikipedia).
• We discuss one such diagnostic plot in this regard:
• the q-q plot
3. Order Statistics
34. q-q plot
• Suppose $X$ is a random variable with cdf $F\left(\frac{x - a}{b}\right)$, where $F(x)$ is known but $a$ and $b > 0$ may not be.
• Let $Z = (X - a)/b$. Then $Z$ has cdf $F(z)$. Let $0 < p < 1$, and let $\xi_{X,p}$ be the $p$th quantile of $X$ and $\xi_{Z,p}$ be the $p$th quantile of $Z$.
• $p = P[X \le \xi_{X,p}] = P\left[Z \le \frac{\xi_{X,p} - a}{b}\right] \Rightarrow \xi_{X,p} = b\,\xi_{Z,p} + a$
• Thus, the quantiles of $X$ are linearly related to the quantiles of $Z$.
3. Order Statistics
35. q-q plot
• $X_1, X_2, \ldots, X_n$: a random sample from $X$ $\Rightarrow$ $Y_1 < \cdots < Y_n$ the order statistics.
• For $k = 1, \ldots, n$, let $p_k = \frac{k}{n+1}$. $\Rightarrow$ $Y_k$ is an estimator of $\xi_{X, p_k}$.
• Denote the corresponding quantiles of the cdf $F(z)$ by $\xi_{Z, p_k} = F^{-1}(p_k)$.
• The plot of $Y_k$ versus $\xi_{Z, p_k}$ is called a q-q plot.
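A minimal q-q plot against the standard normal following this construction, assuming NumPy, SciPy, and Matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=200)               # data whose normality we want to check

n = len(x)
y = np.sort(x)                         # order statistics Y_1 < ... < Y_n
p = np.arange(1, n + 1) / (n + 1)      # p_k = k / (n + 1)
xi_z = stats.norm.ppf(p)               # theoretical quantiles F^{-1}(p_k)

plt.scatter(xi_z, y, s=10)             # near-linear if F is the right family
plt.xlabel("theoretical quantiles")
plt.ylabel("sample quantiles")
plt.show()
```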
3. Order Statistics
36. • Figure 4.1 contains q-q plots of three different distributions.
• Panel C appears to be the most linear. We may assume that the data were generated from a Laplace distribution.
3. Order Statistics
37. Confidence Intervals for Quantiles
• For a sample of size $n$ on $X$, let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics. Let $k = [(n+1)p]$. Then the $100p$th sample percentile $Y_k$ is a point estimator of $\xi_p$.
• We now derive a distribution-free confidence interval for $\xi_p$.
• Let $i < (n+1)p < j$, and consider the order statistics $Y_i < Y_j$ and the event
$\{Y_i < \xi_p\} \cap \{Y_j > \xi_p\}$ = [at least $i$, but fewer than $j$, of the observations are smaller than $\xi_p$]
• Counting each observation below $\xi_p$ as a success, the probability of success is $P[X < \xi_p] = F(\xi_p) = p$, so
$P[Y_i < \xi_p < Y_j] = \sum_{w=i}^{j-1} \binom{n}{w} p^w (1-p)^{n-w}$
(choose $i$ and $j$ so that this equals, say, $0.95 \rightarrow (Y_i, Y_j)$)
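A sketch of this search for $(i, j)$, assuming SciPy; `quantile_ci` is a hypothetical helper name, widening symmetrically around $k$ until the binomial coverage reaches the target:

```python
from scipy import stats

def quantile_ci(n, p, conf=0.95):
    """Find order-statistic indices (i, j), 1-indexed, with
    P(Y_i < xi_p < Y_j) = sum_{w=i}^{j-1} C(n,w) p^w (1-p)^(n-w) >= conf."""
    k = int((n + 1) * p)
    for half in range(1, n):           # widen symmetrically around k
        i, j = max(1, k - half), min(n, k + half)
        cover = stats.binom.cdf(j - 1, n, p) - stats.binom.cdf(i - 1, n, p)
        if cover >= conf:
            return i, j, cover
    return None

print(quantile_ci(n=15, p=0.5))        # e.g. indices for a median CI
```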
3. Order Statistics
38. Example: Confidence Interval for the Median
• $\xi_{.50}$: the median of $F(x)$; $Q_2$: the sample median, a point estimator of $\xi_{.50}$.
• Select $0 < \alpha < 1$ and take $c_{\alpha/2}$ to be the $\frac{\alpha}{2}$th quantile of a binomial $S \sim b(n, \frac{1}{2})$, so that
$P[S \le c_{\alpha/2}] = P[S \ge n - c_{\alpha/2}] = \alpha/2$
• Thus it follows that
$P[Y_{c_{\alpha/2}+1} < \xi_{.50} < Y_{n - c_{\alpha/2}}] = 1 - \alpha$
• Hence, when the sample is drawn, $(y_{c_{\alpha/2}+1}, y_{n - c_{\alpha/2}})$ is a $(1 - \alpha)100\%$ confidence interval for $\xi_{.50}$.
3. Order Statistics
39. Example 3
• We captured four mosquitoes and observed their lifetimes. If a mosquito's lifetime follows an exponential distribution with mean 1, what is the probability that the longest-lived mosquito survives at least 3?
• Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size 4 from the distribution having pdf $f(x) = e^{-x}$, $0 < x < \infty$. Find $P[3 \le Y_4]$.
• $P[Y_4 \le t] = (1 - e^{-t})^4$
$\therefore P[Y_4 \ge 3] = 1 - (1 - e^{-3})^4 = 0.1848$
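A Monte Carlo check of this value, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
lifetimes = rng.exponential(scale=1.0, size=(1_000_000, 4))
y4 = lifetimes.max(axis=1)             # largest order statistic Y_4

print((y4 >= 3).mean())                # ~0.1848, matching the closed form
```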
3. Order Statistics
40. Example 4
• We captured four mosquitoes and observed their lifetimes. If a mosquito's lifetime follows a uniform distribution on 0 to 1, what is the probability that the difference between the lifetimes of the longest-lived and shortest-lived mosquitoes is less than 0.5?
• Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size 4 from the distribution having pdf $f(x) = 1$, $0 < x < 1$. Find the probability that the range of the random sample is less than $\frac{1}{2}$.
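The slide leaves the answer open. A standard result for a Uniform(0,1) sample of size $n$ gives $P[\text{range} \le r] = n r^{n-1} - (n-1) r^n$, so here $P[Y_4 - Y_1 < 1/2] = 4(1/2)^3 - 3(1/2)^4 = 5/16 = 0.3125$. A Monte Carlo sketch (NumPy assumed) agrees:

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.uniform(size=(1_000_000, 4))
width = u.max(axis=1) - u.min(axis=1)  # sample range Y_4 - Y_1

print((width < 0.5).mean())            # ~0.3125 = 5/16
```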
3. Order Statistics
42. Introduction to Hypothesis Testing
Statistical inference comprises estimation (point estimation and interval estimation) and hypothesis testing.
4 Introduction to Hypothesis Testing
43. Example 5 (Point Estimation / Hypothesis Testing)
We assume that the weight of a Baskin Robbins pint follows a normal distribution. We visited Baskin Robbins 10 times and weighed the pints, with the results below.
What is the MLE of the population mean pint weight?
Sample mean: 321g -> this is the estimate of the population mean!
What if the company website says 310g?
Hypothesis testing
Visit:      1   2   3   4   5   6   7   8   9   10
Weight (g): 310 315 330 320 325 325 310 315 320 340
4 Introduction to Hypothesis Testing
44. • Null hypothesis: the pint weight has mean 310g.
• Research hypothesis: the pint weight does not have mean 310g.
• Estimate of the population mean: 321g -> reject the null hypothesis? (X)
• Set a rejection region -> check whether the estimate falls in the rejection region. (O)
Visit:      1   2   3   4   5   6   7   8   9   10  Mean
Weight (g): 310 315 330 320 325 325 310 315 320 340 321
4 Introduction to Hypothesis Testing
45. Population distribution: mean 100, variance 30
Sampling: 50 observations each
Sample mean: 90
Sample mean: 80
Sample mean: 40
Sample mean: 10
Sample mean: 1
Sample mean: 150
Null hypothesis
The farther a sample mean falls from the population mean under the null hypothesis, the stronger our belief that the null hypothesis is wrong.
WHY? With 50 observations, such a result is no coincidence!
4 Introduction to Hypothesis Testing
46. Hypothesis
• $H_0$: Null hypothesis
• Represents no change or no difference from the past
• $H_1$: Alternative hypothesis (research hypothesis)
• Represents change or difference
• The research worker's hypothesis
$H_0 : \theta \in \omega_0$ versus $H_1 : \theta \in \omega_1$
(where $\omega_0 \cup \omega_1 = \Omega$ and $\omega_0 \cap \omega_1 = \emptyset$)
• The decision rule to take $H_0$ or $H_1$ is based on a sample $X_1, X_2, \ldots, X_n$ from $X$.
• Hence, the decision could be wrong.
4 Introduction to Hypothesis Testing
47. Decision Rule
• 𝑋1, … , 𝑋 𝑛 : random sample from the distribution of 𝑋.
• Consider testing the hypotheses :
• 𝐻0 ∶ 𝜃 ∈ 𝜔0 versus 𝐻1 ∶ 𝜃 ∈ 𝜔1
• Denote the space of the sample by 𝒟 = space{(𝑋1, … , 𝑋 𝑛)}.
• Definition 5 (Statistical Test, Critical Region)
• A test of 𝐻0 versus 𝐻1 is based on a subset 𝐶 of 𝒟.
• This set $C$ is called the critical region, and its corresponding decision rule is:
Reject 𝐻0 if 𝑋1, … , 𝑋 𝑛 ∈ 𝐶
Retain 𝐻0 if 𝑋1, … , 𝑋 𝑛 ∈ 𝐶 𝑐
4 Introduction to Hypothesis Testing
48. • The decision could be wrong:
• Type 1 and Type 2 errors.
• Goal: select, from all possible critical regions, one that minimizes the probabilities of both errors.
• In general, this is not possible (there is a trade-off).
• Procedure:
• Select critical regions which bound the probability of Type 1 error.
• Among these critical regions, try to select one which minimizes the probability of Type 2 error.
4 Introduction to Hypothesis Testing
49. • Definition 5 (Critical region of size $\alpha$)
• We say a critical region $C$ is of size $\alpha$ if
$\alpha = \max_{\theta \in \omega_0} P_\theta[(X_1, \ldots, X_n) \in C]$
• Over all critical regions of size $\alpha$, we want to consider those with lower probabilities of Type 2 error:
$P_\theta[\text{Type 2 error}] = P_\theta[(X_1, \ldots, X_n) \in C^c], \quad \theta \in \omega_1$
• Equivalently, we desire to maximize $P_\theta[(X_1, \ldots, X_n) \in C]$ (the power of the test at $\theta$).
• Definition 6 (Power function)
• The power function of a critical region is
$\gamma_C(\theta) = P_\theta[(X_1, \ldots, X_n) \in C], \quad \theta \in \omega_1$
4 Introduction to Hypothesis Testing
50. Better critical region
• Given two critical regions $C_1$ and $C_2$ which are both of size $\alpha$, $C_1$ is better than $C_2$ if $\gamma_{C_1}(\theta) \ge \gamma_{C_2}(\theta)$ for all $\theta \in \omega_1$.
• If the maximum Type 1 error probability is the same $\alpha$ for both, the critical region with the smaller Type 2 error probability is the better one.
4 Introduction to Hypothesis Testing
51. Test for a Binomial Proportion of Success
• $X \sim \mathrm{Ber}(p)$; $p_0$: the probability of dying with some standard treatment.
• We want to test, at size $\alpha$,
$H_0 : p = p_0$ versus $H_1 : p < p_0$
• Let $X_1, \ldots, X_n$ be a random sample from the distribution of $X$ and let $S = \sum_{i=1}^{n} X_i$.
• An intuitive decision rule is:
Reject $H_0$ in favor of $H_1$ if $S \le k$,
where $k$ is such that $\alpha = P_{H_0}[S \le k]$.
• For example, suppose $n = 20$, $p_0 = 0.7$, and $\alpha = 0.15$.
• $P_{H_0}[S \le 11] = 0.1133$ and $P_{H_0}[S \le 12] = 0.2277$, so should $k$ be 11 or 12?
• On the conservative side, choose $k = 11$ and $\alpha = 0.1133$.
• Power: $\gamma(p) = P_p[S \le k]$ for $p < p_0$.
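The probabilities in this example, computed with SciPy (assumed available):

```python
from scipy import stats

n, p0 = 20, 0.7
print(stats.binom.cdf(11, n, p0))   # P_{H0}[S <= 11] ~= 0.1133
print(stats.binom.cdf(12, n, p0))   # P_{H0}[S <= 12] ~= 0.2277
# Choosing k = 11 keeps the size below the target 0.15 (conservative).

# Power at an alternative p < p0, e.g. p = 0.5:
print(stats.binom.cdf(11, n, 0.5))  # gamma(0.5) = P_{0.5}[S <= 11]
```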
4 Introduction to Hypothesis Testing
53. Nomenclature
• Since $H_0 : p = p_0$ completely specifies the underlying distribution, it is called a simple hypothesis ($|\omega_0| = 1$).
• $H_1 : p < p_0$ is a composite hypothesis.
• Frequently, $\alpha$ is also called the significance level of the test,
• or the "maximum probability of committing a Type 1 error",
• or the "maximum power of the test when $H_0$ is true".
4 Introduction to Hypothesis Testing
54. $t$-test: Test for $\mu$ Under Normality
• Let $X$ have a $N(\mu, \sigma^2)$ distribution. Consider the hypotheses
$H_0 : \mu = \mu_0$ versus $H_1 : \mu > \mu_0$
• Assume that the desired size of the test is $\alpha$, for $0 < \alpha < 1$.
• Our intuitive rejection rule is to reject $H_0$ if $\bar{X}$ is much larger than $\mu_0$.
• $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ has a $t$-distribution with $n - 1$ degrees of freedom under $H_0$. It follows that this rejection rule has exact level $\alpha$:
Reject $H_0$ if $T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \ge t_{\alpha, n-1}$, where $\alpha = P[T > t_{\alpha, n-1}]$
4 Introduction to Hypothesis Testing
55. • What about a large-sample rule?
• In practice, we may not be willing to assume that the population is normal. Usually $t$-critical values are larger than $z$-critical values.
• Hence, the $t$-test is conservative relative to the large-sample test. So, in practice, many statisticians use the $t$-test.
4 Introduction to Hypothesis Testing
56. Example 6 (Hypothesis Testing: $t$-test)
The weight of a Baskin Robbins pint is said to follow a normal distribution with population mean 310g. We visited Baskin Robbins 10 times and weighed the pints, with the results below.
Hypotheses:
• Null hypothesis ($H_0$): $\mu = 310$
• Research hypothesis ($H_1$): $\mu > 310$
Visit:      1   2   3   4   5   6   7   8   9   10  Mean
Weight (g): 310 315 330 320 325 325 310 315 320 340 321
4 Introduction to Hypothesis Testing
57. • Setting the rejection region (significance level $\alpha$ = 5%)
• Intuitive rejection rule:
• reject if $\bar{X}$ is much larger than 310
• $T = \frac{\bar{X} - 310}{S/\sqrt{10}} \ge t_{0.05, 9}$
• Critical region of size 0.05:
• $C = \{\bar{x} \ge 312.1170\}$
• $\bar{x} = 321$ -> reject the null hypothesis
• Conclusion:
• The population mean weight of a Baskin Robbins pint is not 310g; it is greater than 310g.
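A sketch of this one-sided $t$-test recomputed from the ten weights shown, assuming NumPy and SciPy; whatever the exact numeric threshold, the rejection of $H_0$ is unchanged.

```python
import numpy as np
from scipy import stats

x = np.array([310, 315, 330, 320, 325, 325, 310, 315, 320, 340])
mu0, alpha, n = 310, 0.05, len(x)

t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
t_crit = stats.t.ppf(1 - alpha, df=n - 1)   # one-sided critical value

print(f"T = {t_stat:.3f}, t_crit = {t_crit:.3f}, reject H0: {t_stat >= t_crit}")
```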
4 Introduction to Hypothesis Testing
58. Randomized tests and 𝑝-value
• Let $X_1, \ldots, X_{10}$ be a random sample of size $n = 10$ from a Poisson distribution with mean $\theta$.
• A critical region for testing $H_0 : \theta = 0.1$ against $H_1 : \theta > 0.1$ is given by $Y = \sum_{i=1}^{10} X_i \ge 3$. The statistic $Y$ has a Poisson distribution with mean $10\theta$.
• The significance level of the test is
$P[Y \ge 3] = 1 - P[Y \le 2] = 1 - 0.920 = 0.080$
• If the critical region defined by $\sum_{i=1}^{10} X_i \ge 4$ is used, the significance level is
$\alpha = P[Y \ge 4] = 1 - P[Y \le 3] = 0.019$
How can we achieve a significance level of $\alpha = 0.05$?
4 Introduction to Hypothesis Testing
59. • Let $W$ have a Bernoulli distribution with probability of success equal to
$P[W = 1] = \frac{0.050 - 0.019}{0.080 - 0.019} = \frac{31}{61}$
• Assume that $W$ is selected independently of the sample. Consider the rejection rule:
Reject $H_0$ if $\sum_{i=1}^{10} x_i \ge 4$, or if $\sum_{i=1}^{10} x_i = 3$ and $W = 1$.
• The significance level of this rule is
$P_{H_0}[Y \ge 4] + P_{H_0}[\{Y = 3\} \cap \{W = 1\}] = P_{H_0}[Y \ge 4] + P_{H_0}[Y = 3]\, P[W = 1] = 0.019 + 0.061 \cdot \frac{31}{61} = 0.05$
• The process of performing the auxiliary experiment is referred to as a randomized test.
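The arithmetic of this randomized test, assuming SciPy:

```python
from scipy import stats

mu0 = 10 * 0.1                         # Y ~ Poisson(1) under H0
a4 = 1 - stats.poisson.cdf(3, mu0)     # P[Y >= 4] ~= 0.019
a3 = 1 - stats.poisson.cdf(2, mu0)     # P[Y >= 3] ~= 0.080
p3 = stats.poisson.pmf(3, mu0)         # P[Y = 3]  ~= 0.061

w = (0.05 - a4) / (a3 - a4)            # randomization probability ~ 31/61
print(f"P[W=1] = {w:.4f}, level = {a4 + p3 * w:.4f}")  # level = 0.05
```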
4 Introduction to Hypothesis Testing
60. Observed Significance Level
• Not many statisticians like randomized tests in practice, because using them means that two statisticians could make the same assumptions, observe the same data, and yet reach different decisions.
• As a matter of fact, many statisticians report what are commonly called observed significance levels, or $p$-values.
If we have set a significance level but cannot construct a critical region of exactly that size, we simply examine the $p$-value.
4 Introduction to Hypothesis Testing
61. Population distribution: mean 100, variance 30
Sampling: 50 observations each
Sample mean: 30
Sample mean: 180
Null hypothesis
We want to measure how far each of these samples deviates from the null hypothesis.
What is the probability of observing a result as extreme as a sample mean of 30?
What is the probability of observing a result as extreme as a sample mean of 180?
Setting a significance level in advance specifies how far a sample must deviate from the null hypothesis before we reject it! This tail probability is the $p$-value.
4 Introduction to Hypothesis Testing
62. • The $p$-value is the observed "tail" probability of a statistic being at least as extreme as the particular observed value when $H_0$ is true.
• If $Y = u(X_1, X_2, \ldots, X_n)$ is the statistic to be used in a test of $H_0$ and the critical region is of the form
$u(x_1, x_2, \ldots, x_n) \le c$,
then an observed value $u(x_1, x_2, \ldots, x_n) = d$ yields the
$p$-value $= P(Y \le d; H_0)$
4 Introduction to Hypothesis Testing
63. Example 7
• $X_1, X_2, \ldots, X_{25}$ is a random sample from $N(\mu, \sigma^2 = 4)$, so $\bar{X} \sim N(\mu, 0.16)$.
• Test $H_0 : \mu = 77$ against the one-sided alternative hypothesis $H_1 : \mu < 77$.
• $\bar{x} = 76.1$
• $z$-score: $(76.1 - 77)/0.4 = -2.25$
• $p$-value: $\Phi(-2.25) = 0.012$
• If we were using a significance level of $\alpha = 0.05$, we would reject $H_0$ and accept $H_1 : \mu < 77$ because $0.012 < 0.05$.
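A numerical check of Example 7, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

xbar, mu0, sigma, n = 76.1, 77, 2, 25
z = (xbar - mu0) / (sigma / np.sqrt(n))   # = -2.25
p_value = stats.norm.cdf(z)               # left tail since H1: mu < 77

print(f"z = {z:.2f}, p-value = {p_value:.4f}")  # ~0.0122 < 0.05 -> reject H0
```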
4 Introduction to Hypothesis Testing
Editor's Notes
Suppose "success" is dying from a certain disease and $p_0$ is the probability of dying with some standard treatment. A new treatment is used on several patients, and it is hoped that the probability of dying under this new treatment is less than $p_0$.