Session 03 Probability & sampling Distribution NEW.pptx

Probability and Probability Distribution
Chaweewon Boonshuyar
chaweewon.boo@gmail.com

From Course Syllabus
• Probability and probability distributions
  ◦ Basics and rules
  ◦ Binomial distribution
  ◦ Normal distribution
  ◦ Application in health research (OR, RR)
2
9/2/2022

Probability and Sampling Distributions
Probability Distributions
  ◦ Basics and rules
  ◦ Binomial distribution
  ◦ Poisson distribution
  ◦ Normal distribution
Sampling Distribution
• Sample means
• Sample proportions
Application in Health Research (OR, RR)
3

Probability
Probability of an event E = the ratio of the number of outcomes in which the event occurs (a) to the number of all
possible outcomes (A).
P(E) = a/A
In a community, there are 256 people of working age (121
male and 135 female). If 1 person is randomly selected,
The chance that a male will be selected = P(Male) = 121/256
The chance that a female will be selected = P(Female) = 135/256
The numerator is a part or subset of the denominator.
The sum of the probabilities of all possible events = 1
4

Basic Rules of probability
Probability is a measure of the relative frequency with which an event occurs.
Let S = set of all possible outcomes = {A1, A2, A3, ..., An}
$0 \le P(A_i) \le 1$
Sum of all $P(A_i) = 1$
For any two events $A_i$ and $A_j$: $P(A_i \text{ or } A_j) = P(A_i) + P(A_j) - P(A_i \text{ and } A_j)$
If $A_i$ and $A_j$ are mutually exclusive events:
$P(A_i \text{ and } A_j) = 0$
$P(A_i \text{ or } A_j) = P(A_i) + P(A_j)$
Conditional probability: $P(A_i \mid A_j) = \dfrac{P(A_i \text{ and } A_j)}{P(A_j)}$ ; $P(A_j) \ne 0$
If $A_i$ and $A_j$ are independent events: $P(A_i \text{ and } A_j) = P(A_i) \times P(A_j)$
5

Probability deals with the occurrence of a random event.
The four basic rules of probability are :
1. Addition rule of probability : P(A or B)=P(A)+P(B)-P(A and B)
2. Multiplication rule of probability
: P(A and B)=P(A)×P(B|A) or P(B)×P(A|B)
3. Complement rule of probability : P(not A)=1-P(A)
4. Total probability rule : The sum of the probabilities of all possible
outcomes is 1.
6

Example 1:
Tossing an unbiased coin once, outcomes = {H, T}
P(H)=1/2
P(T)=1/2
H and T never occur together, so P(H and T)=0
P(H or T) = P(H)+P(T)-P(H and T)
= (1/2)+(1/2)-0 = 1
H and T = Mutually exclusive events
= Events that never occur together.
7

Example 2:
Tossing an unbiased coin twice, outcomes = {HH, HT, TH, TT}
P(HH) = P(2H) = P(0T) = 1/4
P(HT) = 1/4 = P(TH) >>> P(1T) = P(HT or TH) = P(1H) = 1/4 + 1/4 = 1/2
P(TT) = P(2T) = P(0H) = 1/4
HH, HT, TH, TT = Mutually exclusive events
= Events that never occur together.
The first toss is independent of the second toss:
P(HH)=P(H)*P(H), P(HT)=P(H)*P(T), P(TH)=P(T)*P(H), P(TT)=P(T)*P(T)
If A and B are 2 independent events, P(A and B) = P(A)*P(B),
or equivalently P(B|A) = P(B) and P(A|B) = P(A).
8
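
The rules used in the two coin examples can be checked by brute-force enumeration. The sketch below is only an illustration: the helper name `p` and the events "first toss is H" and "second toss is H" are assumptions chosen for the demonstration, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of an unbiased coin: HH, HT, TH, TT
outcomes = ["".join(t) for t in product("HT", repeat=2)]
prob = {o: Fraction(1, len(outcomes)) for o in outcomes}

def p(event):
    """Probability of an event given as a true/false test on an outcome."""
    return sum(pr for o, pr in prob.items() if event(o))

def first_h(o):
    return o[0] == "H"   # event A: first toss is a head

def second_h(o):
    return o[1] == "H"   # event B: second toss is a head

p_a, p_b = p(first_h), p(second_h)
p_a_and_b = p(lambda o: first_h(o) and second_h(o))
p_a_or_b = p(lambda o: first_h(o) or second_h(o))
p_one_tail = p(lambda o: o.count("T") == 1)

print(p_a_or_b == p_a + p_b - p_a_and_b)  # addition rule
print(p_a_and_b == p_a * p_b)             # independence
print(p_a_and_b / p_a == p_b)             # P(B|A) = P(B)
print(p_one_tail)                         # P(1T)
```

Using `Fraction` keeps every probability exact, so the equalities are tested without rounding error.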

Question 1. Find
P(M) = ………   P(F) = ………   P(M and D) = ………   P(M and D̄) = ………
P(D) = ………   P(D̄) = ………   P(F and D) = ………   P(F and D̄) = ………
P(D|M) = ………   P(D|F) = ………
P(F|D) = ………   P(M|D) = ………
P(D̄|M) = ………   P(D̄|F) = ………
Disease status by sex
Sex          Disease            Total
             D        D̄
Male (M)     40       460       500
Female (F)   260      140       400
Total        300      600       900
9

Outcomes of tossing a coin: let P(T) = π

Tossing 1 coin   Outcome   P(outcome)   X = # of T   P(X)                    Sum of P(X)
Once             H         P(H)         0            P(0T) = (1-π)           ((1-π) + π)^1 = 1
                 T         P(T)         1            P(1T) = π
Twice            HH        P(HH)        0            P(0T) = (1-π)^2         ((1-π) + π)^2 = 1
                 HT        P(HT)        1            P(1T) = 2π(1-π)
                 TH        P(TH)        1
                 TT        P(TT)        2            P(2T) = π^2
Thrice           HHH       P(HHH)       0            P(0T) = (1-π)^3         ((1-π) + π)^3 = 1
                 HHT       P(HHT)       1            P(1T) = 3π(1-π)^2
                 HTH       P(HTH)       1
                 THH       P(THH)       1
                 HTT       P(HTT)       2            P(2T) = 3π^2(1-π)
                 THT       P(THT)       2
                 TTH       P(TTH)       2
                 TTT       P(TTT)       3            P(3T) = π^3
10

Binomial Distribution
= The distribution of the number of successes in n
independent and identical Bernoulli trials.
Let x = Number of successes = 0, 1, 2, ..., n
π = Chance of success in each trial
$\Pr(X = x) = \binom{n}{x}\,\pi^{x}(1-\pi)^{n-x}$
Bernoulli trial or Binomial trial = a random experiment with exactly two
possible outcomes, success or failure, in which the probability of success
is the same every time the experiment is conducted.
11

Let X = # of T   P(X)          X·P(X)        (X-3π)^2 · P(X)
0                (1-π)^3       0             (0-3π)^2 · (1-π)^3
1                3π(1-π)^2     3π(1-π)^2     (1-3π)^2 · 3π(1-π)^2
2                3π^2(1-π)     6π^2(1-π)     (2-3π)^2 · 3π^2(1-π)
3                π^3           3π^3          (3-3π)^2 · π^3
Total            1             3π            3π(1-π)
                               Mean of X     Variance of X
12

$\Pr(X = x) = \dfrac{n!}{x!\,(n-x)!}\,\pi^{x}(1-\pi)^{n-x}$ ; x = 0, 1, 2, 3, ..., n
Mean of X: $\mu_X = n\pi$
Variance of X: $\sigma_X^2 = n\pi(1-\pi)$
π = parameter of the Binomial distribution
13
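
The binomial formula and its mean and variance can be verified numerically. The following is a minimal sketch, assuming three tosses with π = 0.5; the function names `binom_pmf` and `binom_cdf` are illustrative, and they play the role of the probability mass function and the cumulative distribution function respectively.

```python
from math import comb

def binom_pmf(x, n, pi):
    """Pr(X = x) for X ~ Binomial(n, pi)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def binom_cdf(x, n, pi):
    """Pr(X <= x), obtained by summing the probability mass function."""
    return sum(binom_pmf(k, n, pi) for k in range(x + 1))

n, pi = 3, 0.5  # assumed example: 3 tosses of an unbiased coin
pmf = [binom_pmf(x, n, pi) for x in range(n + 1)]

# Mean and variance computed directly from the distribution
mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print(pmf)
print(mean, n * pi)             # should agree with n*pi
print(var, n * pi * (1 - pi))   # should agree with n*pi*(1-pi)
```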

How to obtain Binomial probability in EXCEL?
fx >> Search for a function or select a category
Select a category …. Statistical
Select a function …. BINOM.DIST
Number_s = Number of successes (x)
Trials = Number of trials (n)
Probability_s = Probability of success (π)
Cumulative box
True = Cumulative distribution function = Prob(X ≤ x)
False = Probability mass function = Prob(X = x)
14

Question 2.
Suppose it is said that 67.8% of BKK people strictly practice
DMHTT (Distancing, Wearing Mask, Hand washing, Testing (temperature) and
Thai Cha Na).
1. For a family of 10 members, what is the chance that
a) none of them, b) half, c) all, d) at least half of them
strictly practice DMHTT?
2. In a community of 200 people, what is the
chance that
a) none of them, b) half, c) all, d) at least half of them
strictly practice DMHTT?
For each question, find the mean, median, mode and SD of the number of people who strictly
practice DMHTT.
15

Poisson Distribution
The distribution of the number of outcomes in equal, non-overlapping
and independent time intervals.
λ = Mean number of outcomes per time interval
Let X = Number of outcomes in each time interval = 0, 1, 2, 3, ...
$\Pr(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$
Mean of X: $\mu_X = \lambda$ = Variance of X: $\sigma_X^2$
λ = parameter of the Poisson distribution
[Figure: time axis from 0 to n divided into unit intervals 1, 2, 3, ..., n, with counts x1, x2, x3, ..., xn in successive intervals]
16
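
The Poisson formula can be evaluated directly. This is a small sketch under assumed values: λ = 2 is a hypothetical rate chosen for illustration, and `poisson_pmf` and `poisson_cdf` are illustrative names.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Pr(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_cdf(x, lam):
    """Pr(X <= x), obtained by summing the probability mass function."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

lam = 2.0  # hypothetical mean number of events per interval

p_none = poisson_pmf(0, lam)               # no event in the interval
p_at_least_3 = 1 - poisson_cdf(2, lam)     # complement rule: 1 - Pr(X <= 2)

# Mean and variance from the distribution (terms beyond 60 are negligible here)
mean = sum(k * poisson_pmf(k, lam) for k in range(60))
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in range(60))

print(p_none, p_at_least_3)
print(mean, var)  # both should be close to lam
```

"At least" probabilities come from the complement rule, which is why the cumulative function is evaluated one step below the threshold.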

How to obtain Poisson probability in EXCEL?
fx >> Search for a function or select a category
Select a category …. Statistical
Select a function …. POISSON.DIST
X = Number of events (x)
Mean = Lambda (λ)
Cumulative box
True = Cumulative distribution function = Prob(X ≤ x)
False = Probability mass function = Prob(X = x)
17

QUESTION 3:
A community hospital reported that the average number of road traffic injury patients
admitted to the emergency department was 5.1 per day.
What is the probability that, on a coming day, none, 3, 4, and at
least 6 such patients will be admitted to this emergency department?
18

Normal distribution
Mean=𝝁 , SD=𝝈
Skewness=0, Kurtosis=3
19

Measure of Shape
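Let X1, X2, ..., XN be N measured data points with mean $\bar{X}$
$SD(X) = \sqrt{\dfrac{\sum_{i=1}^{N}(X_i-\bar{X})^2}{N}}$
$\text{Kurtosis} = \dfrac{\sum_{i=1}^{N}(X_i-\bar{X})^4/N}{(SD(X))^4}$
$\text{Skewness} = \dfrac{\sum_{i=1}^{N}(X_i-\bar{X})^3/N}{(SD(X))^3}$
20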

Standard Normal distribution
Z ~ N(0,1)
[Figure: standard normal curve over the Z axis; central area = 1-α between Z(α/2) and Z(1-α/2), with area α/2 in each tail]
21

$f(X) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(X-\mu)^2}{2\sigma^2}}$ ; $-\infty < X < \infty$, $-\infty < \mu < \infty$, $\sigma^2 > 0$
X ~ N(μ, σ²)
1. Bell-shaped and symmetric about the mean
2. Mean = Median = Mode
3. -∞ < X < ∞
4. P(X = x) = 0 (in theory, but in practice?)
22

Standard Normal Distribution
= Normal with Mean = 0 and Variance = 1
$Z \sim N(0,1)$
$f(z) = \dfrac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}z^2}$ ; $-\infty < z < \infty$
Area of Normal or Std Normal …… from EXCEL
Relationship between Normal and Std Normal
$z = \dfrac{X-\mu}{\sigma}$ ; $\sigma \ne 0$
23

EXCEL for Standard Normal Distribution
>> Tool bar>>Function category>> Statistical>>Function name
NORM.S.DIST ….. P(Z<z)
NORM.S.INV ….. Z=?, if area under std normal curve is given
NORM.DIST …. Normal Dist. P(X<x) for known mean and variance
NORM.INV …. X=?, if area under normal curve is given for known
mean and variance
24
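
For readers working outside EXCEL, the same four lookups can be sketched with Python's standard library. The values μ = 100 and σ = 15 are hypothetical, chosen only for illustration. Note that `NormalDist`, like NORM.DIST and NORM.INV, takes the standard deviation rather than the variance.

```python
from statistics import NormalDist

std = NormalDist()                 # Z ~ N(0, 1)
p_left = std.cdf(1.96)             # P(Z < 1.96), like NORM.S.DIST(1.96, TRUE)
z_975 = std.inv_cdf(0.975)         # z with area 0.975 to its left, like NORM.S.INV(0.975)

x_dist = NormalDist(mu=100, sigma=15)   # hypothetical mean and SD
p_x = x_dist.cdf(115)              # P(X < 115), like NORM.DIST(115, 100, 15, TRUE)
x_90 = x_dist.inv_cdf(0.90)        # x with area 0.90 to its left, like NORM.INV(0.90, 100, 15)

# Standardising gives the same area: z = (X - mu) / sigma
z = (115 - 100) / 15
p_via_z = std.cdf(z)

print(round(p_left, 4), round(z_975, 2))
print(round(p_x, 4), round(p_via_z, 4), round(x_90, 2))
```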

Normal approximation to Binomial
If n is large, or n → ∞, X ~ N(nπ, nπ(1-π)) approximately
$Z = \dfrac{X-n\pi}{\sqrt{n\pi(1-\pi)}} \sim N(0,1)$
Normal approximation to Poisson
X ~ Poisson(λ), for large λ
$Z = \dfrac{X-\lambda}{\sqrt{\lambda}} \sim N(0,1)$
Poisson approximation to Binomial
If n is large, or n → ∞, and π → 0, then nπ ≈ nπ(1-π)
X ~ Poisson(nπ)
25
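
How close these approximations are can be checked numerically. The sketch below uses hypothetical values (n = 100, π = 0.5 for the normal case; n = 500, π = 0.004 for the Poisson case). The half-unit continuity correction is an addition not shown on the slide; it is a common refinement when a discrete count is approximated by a continuous curve.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist

def binom_cdf(x, n, pi):
    """Exact Pr(X <= x) for X ~ Binomial(n, pi)."""
    return sum(comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(x + 1))

def poisson_cdf(x, lam):
    """Exact Pr(X <= x) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(x + 1))

# Normal approximation to Binomial (hypothetical n and pi)
n, pi, x = 100, 0.5, 55
exact_binom = binom_cdf(x, n, pi)
normal_approx = NormalDist(n * pi, sqrt(n * pi * (1 - pi))).cdf(x + 0.5)  # continuity correction

# Poisson approximation to Binomial (large n, small pi; hypothetical values)
n2, pi2, x2 = 500, 0.004, 3
exact_binom2 = binom_cdf(x2, n2, pi2)
poisson_approx = poisson_cdf(x2, n2 * pi2)

print(round(exact_binom, 4), round(normal_approx, 4))
print(round(exact_binom2, 4), round(poisson_approx, 4))
```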

Question 4:
We assumed that the systolic blood pressure of adult men is
approximately normal with mean and standard deviation 125 and
12 mm.Hg. respectively.
a) If one adult man is randomly selected, what will be the chance
that his systolic blood pressure will be
a.1) less than 130 mm.Hg.
a.2) between 110 and 127 mm.Hg.
a.3) more than 120 mm.Hg.
b) If 300 adult men are randomly selected, how many of them will
have their blood pressure more than 120 mm.Hg.
26

Question 5:
In a small school of 150 students, suppose the caries prevalence among students in
this school is the same as at the provincial level, which is 30%. What is the
chance that you'll find exactly 10, at most 10, and 10-30 students who are
caries-free?
Question 6:
1% of workers under insurance are injured yearly. In a company with
1000 workers, what is the chance that not more than 8, and that 8-15, workers will
be injured in the coming year?
27

Sampling Distribution
Population N = 5: 3, 5, 7, 9, 11
$\mu = \dfrac{\sum X}{N} = 7$ , $\sigma^2 = \dfrac{\sum (X-\mu)^2}{N} = 8$
28

Population N = 5: 3, 5, 7, 9, 11
Sampling with replacement, n = 2
Total possible samples = 25
$\bar{x} = \dfrac{\sum_{i=1}^{2} x_i}{2}$ , $s^2 = \dfrac{\sum_{i=1}^{2}(x_i-\bar{x})^2}{n-1}$
29

Sampling with replacement, n = 2 from N = 5
             2nd draw
1st draw     3       5       7       9       11
3            3,3     3,5     3,7     3,9     3,11
5            5,3     5,5     5,7     5,9     5,11
7            7,3     7,5     7,7     7,9     7,11
9            9,3     9,5     9,7     9,9     9,11
11           11,3    11,5    11,7    11,9    11,11
30

Sample   x̄    s²      Sample   x̄    s²      Sample   x̄    s²
3,3      3    0       7,3      5    8       11,3     7    32
3,5      4    2       7,5      6    2       11,5     8    18
3,7      5    8       7,7      7    0       11,7     9    8
3,9      6    18      7,9      8    2       11,9     10   2
3,11     7    32      7,11     9    8       11,11    11   0
5,3      4    2       9,3      6    18
5,5      5    0       9,5      7    8
5,7      6    2       9,7      8    2
5,9      7    8       9,9      9    0
5,11     8    18      9,11     10   2
Total over all 25 samples: Sum(x̄) = 175, Sum(s²) = 200
$\bar{x} = \dfrac{\sum_{i=1}^{2} x_i}{2}$ , $s^2 = \dfrac{\sum_{i=1}^{2}(x_i-\bar{x})^2}{n-1}$
31

Sampling distribution of sample means     Sampling distribution of sample variances
x̄       Number                             s²      Number
3       1                                  0       5
4       2                                  2       8
5       3                                  8       6
6       4                                  18      4
7       5                                  32      2
8       4                                  Total   25
9       3
10      2
11      1
Total   25
32

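$\mu_{\bar{x}} = \dfrac{\sum \bar{x}}{25} = \dfrac{175}{25} = 7 = \mu_x$ = population mean
$\sigma_{\bar{x}}^2 = \dfrac{\sum(\bar{x}-\mu_{\bar{x}})^2}{25} = \dfrac{(3-7)^2+(4-7)^2+(5-7)^2+\cdots+(11-7)^2}{25} = \dfrac{100}{25} = 4 = \dfrac{\sigma^2}{n}$ = Variance of sample mean
$\mu_{s^2} = \dfrac{\sum s_i^2}{25} = \dfrac{200}{25} = 8 = \sigma_x^2$ = population variance
Sample mean = unbiased estimate of population mean
Sample variance = unbiased estimate of population variance
$\sigma_{\bar{x}} = SD(\bar{x}) = \dfrac{\sigma}{\sqrt{n}}$ = Standard error of the mean = SEM = SE($\bar{x}$)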
33
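
The 25-sample enumeration above can be reproduced in a few lines. This is a minimal sketch using Python's standard library; the variable names are illustrative.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [3, 5, 7, 9, 11]
n = 2

# All ordered samples of size 2 drawn with replacement: 5 * 5 = 25
samples = list(product(population, repeat=n))

sample_means = [mean(s) for s in samples]
sample_vars = [variance(s) for s in samples]   # divisor n - 1

mean_of_means = mean(sample_means)             # expected: population mean, 7
var_of_means = pvariance(sample_means)         # expected: sigma^2 / n = 8 / 2 = 4
mean_of_s2 = mean(sample_vars)                 # expected: population variance, 8

print(len(samples), sum(sample_means), sum(sample_vars))
print(mean_of_means, var_of_means, mean_of_s2)
```

Here `pvariance` divides by the number of values, matching the population formula on the slide, while `variance` divides by n - 1, matching the sample variance.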

Sampling without replacement, n = 2 from N = 5
             2nd draw
1st draw     3       5       7       9       11
3                    3,5     3,7     3,9     3,11
5                            5,7     5,9     5,11
7                                    7,9     7,11
9                                            9,11
11
34

Sample   x̄    s²
3,5      4    2
3,7      5    8
3,9      6    18
3,11     7    32
5,7      6    2
5,9      7    8
5,11     8    18
7,9      8    2
7,11     9    8
9,11     10   2
Total    70   100

Sampling distribution of sample means     Sampling distribution of sample variances
x̄              Number                      s²              Number
4              1                           2               4
5              1                           8               3
6              2                           18              2
7              2                           32              1
8              2                           Total number    10
9              1
10             1
Total number   10
Sum(s²) = 100
Mean(s²) = 10 ≠ σ² (slightly higher than σ²)
35

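$\mu_{\bar{x}} = \dfrac{\sum \bar{x}}{10} = \dfrac{70}{10} = 7 = \mu_x$ = population mean
$\sigma_{\bar{x}}^2 = \dfrac{\sum(\bar{x}-\mu_{\bar{x}})^2}{10} = \dfrac{(4-7)^2+(5-7)^2+(6-7)^2+\cdots+(10-7)^2}{10} = \dfrac{30}{10} = 3 = \dfrac{(N-n)}{(N-1)}\,\dfrac{\sigma^2}{n}$
$\mu_{s^2} = \dfrac{\sum s_i^2}{10} = \dfrac{100}{10} = 10 \ne \sigma_x^2$ = population variance (slightly higher than $\sigma^2$)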
36
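
The same check for sampling without replacement, including the finite population correction (N-n)/(N-1), can be sketched as follows. Variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

population = [3, 5, 7, 9, 11]
N, n = len(population), 2

# All unordered samples of size 2 drawn without replacement: C(5, 2) = 10
samples = list(combinations(population, n))

sample_means = [mean(s) for s in samples]
sample_vars = [variance(s) for s in samples]   # divisor n - 1

sigma2 = pvariance(population)                 # population variance, 8
var_of_means = pvariance(sample_means)         # variance of the 10 sample means
corrected = (N - n) / (N - 1) * sigma2 / n     # (N-n)/(N-1) * sigma^2 / n
mean_of_s2 = mean(sample_vars)

print(len(samples), mean(sample_means))
print(var_of_means, corrected)   # both should equal 3
print(mean_of_s2, sigma2)        # 10 versus 8
```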

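                                Mean of sample means        Variance of sample mean
Sampling with replacement       $\mu_{\bar{x}} = \mu_x$     $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$
Sampling without replacement    $\mu_{\bar{x}} = \mu_x$     $\sigma_{\bar{x}}^2 = \dfrac{(N-n)}{(N-1)}\,\dfrac{\sigma^2}{n}$
Sampling without replacement, for large N and small n, $\dfrac{(N-n)}{(N-1)} \to 1$, then $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$
n = sample size and N = size of population
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$ = Standard error of the mean = S.E.($\bar{x}$);
for unknown $\sigma^2$ >>> estimate by $s^2 = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$, then $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$ is an estimate of $\sigma_{\bar{x}}$.
37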

Central Limit Theorem
If we pick a sample of size n from a normally distributed population with
known mean ($\mu$) and variance ($\sigma^2$), the distribution of sample
means ($\bar{x}$) is normally distributed with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.
$\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$, then $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
38

Central Limit Theorem
If we pick a sample of size n from a non-normally distributed population
with known mean ($\mu$) and known variance ($\sigma^2$), for large n the
distribution of sample means ($\bar{x}$) is approximately normally distributed
with mean $\mu$ and variance $\dfrac{\sigma^2}{n}$.
$\bar{x} \sim N\left(\mu, \dfrac{\sigma^2}{n}\right)$, then $z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
For large sample size and unknown variance,
$\bar{x} \sim N\left(\mu, \dfrac{s^2}{n}\right)$, then $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}}$ ; df = n-1
39

Sampling distribution of sample proportion
Population N = 5: M1, M2, F3, F4, M5
X = 0; female and X = 1; male
$P(X=1) = P(\text{Male}) = \pi = \dfrac{3}{5} = \mu_x$
$P(X=0) = P(\text{Female}) = 1-\pi = \dfrac{2}{5} = \mu_{n-x}$
$Var(X) = \dfrac{6}{25} = \dfrac{3}{5} \cdot \dfrac{2}{5} = \pi(1-\pi)$
Sampling with replacement, n = 2
Total possible samples = 25
p = sample proportion of males
40

Sample   p     s²      Sample   p     s²      Sample   p     s²
M1,M1    1     0       F3,M1    1/2   1/2     M5,M1    1     0
M1,M2    1     0       F3,M2    1/2   1/2     M5,M2    1     0
M1,F3    1/2   1/2     F3,F3    0     0       M5,F3    1/2   1/2
M1,F4    1/2   1/2     F3,F4    0     0       M5,F4    1/2   1/2
M1,M5    1     0       F3,M5    1/2   1/2     M5,M5    1     0
M2,M1    1     0       F4,M1    1/2   1/2
M2,M2    1     0       F4,M2    1/2   1/2
M2,F3    1/2   1/2     F4,F3    0     0
M2,F4    1/2   1/2     F4,F4    0     0
M2,M5    1     0       F4,M5    1/2   1/2
Total over all 25 samples: Sum(p) = 15, Sum(s²) = 6
X = 0; female and X = 1; male
$p = \bar{x} = \dfrac{\sum_{i=1}^{2} x_i}{2}$ , $s^2 = \dfrac{\sum_{i=1}^{2}(x_i-\bar{x})^2}{2-1}$
41

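Sampling distribution of sample proportions     Sampling distribution of sample variances
p       Number                                   s²      Number
0       4                                        0       13
1/2     12                                       1/2     12
1       9                                        Total   25
Total   25
$\mu_p = \dfrac{\sum p}{25} = \dfrac{15}{25} = \dfrac{3}{5} = \pi$ = population proportion of males
$\sigma_p^2 = \dfrac{\sum (p-\mu_p)^2}{25} = \dfrac{(1-\frac{3}{5})^2+(1-\frac{3}{5})^2+(\frac{1}{2}-\frac{3}{5})^2+\cdots+(1-\frac{3}{5})^2}{25} = \dfrac{3}{25} = \dfrac{3}{5} \cdot \dfrac{2}{5} \Big/ n = \pi(1-\pi)/n$
$\mu_{s^2} = \dfrac{\sum s_i^2}{25} = \dfrac{6}{25} = \pi(1-\pi) = Var(X)$
42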

                                Mean of sample proportions    Variance of sample proportions
Sampling with replacement       $\mu_p = \pi$                 $\sigma_p^2 = \dfrac{\pi(1-\pi)}{n}$
Sampling without replacement    $\mu_p = \pi$                 $\sigma_p^2 = \dfrac{(N-n)}{(N-1)}\,\dfrac{\pi(1-\pi)}{n}$
Sampling without replacement, for large N and small n, $\dfrac{(N-n)}{(N-1)} \to 1$, then $\sigma_p^2 = \dfrac{\pi(1-\pi)}{n}$
n = sample size and N = size of population
$\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$ = Standard error of proportion = S.E.(p)
For unknown $\pi$, $s_p = \sqrt{\dfrac{p(1-p)}{n}}$ = estimate of $\sigma_p$.
43

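$\mu_p = \dfrac{\sum p}{25} = \dfrac{15}{25} = \dfrac{3}{5} = \pi$ = population proportion of males
$\sigma_p^2 = \dfrac{\sum (p-\mu_p)^2}{25} = \dfrac{(1-\frac{3}{5})^2+(1-\frac{3}{5})^2+(\frac{1}{2}-\frac{3}{5})^2+\cdots+(1-\frac{3}{5})^2}{25} = \dfrac{3}{25} = \dfrac{3}{5} \cdot \dfrac{2}{5} \Big/ 2 = \pi(1-\pi)/n$
If n is large, $p \sim N\left(\pi, \dfrac{\pi(1-\pi)}{n}\right)$, then $z = \dfrac{p-\pi}{\sqrt{\pi(1-\pi)/n}}$
If n is small, X ~ Binomial(n, π) >>>> Binomial test
44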

Suppose it is known that 90% of pregnant women entering their
3rd trimester had some prenatal care. If a sample of size 200 is
randomly recruited from this population, what is the chance
that the proportion having some prenatal care is not more
than 85%?
$P(p<0.85) = ?$ ; $z = \dfrac{p-\pi}{\sqrt{\pi(1-\pi)/n}} = \dfrac{0.85-\pi}{\sqrt{\pi(1-\pi)/n}} = \dfrac{0.85-0.90}{\sqrt{0.90(1-0.90)/200}} = -2.36$
$P(p<0.85) = P(z<-2.36) = 0.0091$
45
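
The prenatal-care calculation can be reproduced as a short sketch. The variable names are illustrative. Because the code keeps z unrounded, the probability may differ from the slide's 0.0091 in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

pi = 0.90    # population proportion with some prenatal care
n = 200      # sample size
p = 0.85     # sample proportion of interest

se = sqrt(pi * (1 - pi) / n)     # standard error of the sample proportion
z = (p - pi) / se                # standardised value
prob = NormalDist().cdf(z)       # P(p < 0.85) under the normal approximation

print(round(se, 4), round(z, 2), round(prob, 4))
```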

Point Estimation
Quantitative variable
1. One set of sample
Sample mean ($\bar{x}$) …. Estimate …. Population mean ($\mu$)
Sample variance ($s^2$) …. Estimate …. Population variance ($\sigma^2$)
Error of estimation = SEM = SD($\bar{x}$) = $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
For unknown $\sigma^2$, SD($\bar{x}$) …. Estimate …. $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
46

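Point Estimation
Quantitative variable
2. Two independent sets of sample (i = 1, 2)
Sample mean ($\bar{x}_i$) …. Estimate …. Population mean ($\mu_i$)
Sample variance ($s_i^2$) …. Estimate …. Population variance ($\sigma_i^2$)
Error of estimation = SEM = SD($\bar{x}_i$) = $\sigma_{\bar{x}_i} = \dfrac{\sigma_i}{\sqrt{n_i}}$
For unknown $\sigma_i^2$, SD($\bar{x}_i$) …. Estimate …. $s_{\bar{x}_i} = \dfrac{s_i}{\sqrt{n_i}}$
Difference of sample means = ($\bar{x}_1 - \bar{x}_2$) …. Estimate …. ($\mu_1 - \mu_2$)
Error of estimation …. $\sigma_{(\bar{x}_1-\bar{x}_2)} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
For unknown variances of both groups, SD($\bar{x}_1 - \bar{x}_2$) is estimated as follows:
Equal variances: SD($\bar{x}_1 - \bar{x}_2$) = $\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ ; $s_p^2 = \dfrac{\sum (n_i-1)s_i^2}{\sum (n_i-1)}$ = pooled variance
Unequal variances: SD($\bar{x}_1 - \bar{x}_2$) = $\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$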
47

Point Estimation
Qualitative variable
1. One set of sample
Sample proportion (p) …. Estimate …. Population proportion ($\pi$)
Error of estimation = SD(p) = $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$
For unknown $\pi$, $\sigma_p$ …. Estimate …. $s_p = \sqrt{\dfrac{p(1-p)}{n}}$
48

Point Estimation
Qualitative variable
2. Two independent sets of sample (i = 1, 2)
Sample proportion ($p_i$) …. Estimate …. Population proportion ($\pi_i$)
Error of estimation = SD($p_i$) = $\sigma_{p_i} = \sqrt{\dfrac{\pi_i(1-\pi_i)}{n_i}}$
For unknown $\pi_i$, $\sigma_{p_i}$ …. Estimate …. $s_{p_i} = \sqrt{\dfrac{p_i(1-p_i)}{n_i}}$
Difference of sample proportions = ($p_1 - p_2$) …. Estimate …. ($\pi_1 - \pi_2$) …. Risk difference
Known population proportions
Error of estimation …. $\sigma_{(p_1-p_2)} = \sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}$
Unknown population proportions
Error of estimation …. $s_{(p_1-p_2)} = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
49

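Interval Estimation
$\Pr\left(z_{\alpha/2} < z < z_{(1-\alpha/2)}\right) = 1-\alpha$
From CLT and known pop. variance: $\bar{x} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}}^2)$, $\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}}^2 = \dfrac{\sigma^2}{n}$, then $Z = \dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
Then $\Pr\left(\bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{(1-\alpha/2)}\dfrac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
$100(1-\alpha)\%$ Confidence Interval of $\mu$ = $\bar{x} \pm z_{(1-\alpha/2)}\dfrac{\sigma}{\sqrt{n}}$
[Figure: standard normal curve over the Z axis; central area = 1-α between Z(α/2) and Z(1-α/2), with area α/2 in each tail]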
50
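
The known-variance interval can be computed as below. This is a sketch only: the function name `mean_ci_z` and the inputs (sample mean 120, σ = 10, n = 25) are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_z(xbar, sigma, n, conf=0.95):
    """100*conf% confidence interval for the mean when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z at (1 - alpha/2)
    half_width = z * sigma / sqrt(n)          # z * standard error of the mean
    return xbar - half_width, xbar + half_width

# Hypothetical inputs, for illustration only
low, high = mean_ci_z(xbar=120.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))
```

When σ is unknown, the same structure applies with s in place of σ and a t quantile with n-1 degrees of freedom in place of z. The standard library has no t quantile function, so that case is left out of this sketch.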

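Interval Estimation
For unknown population variance, $s^2$ …. Estimate …. $\sigma^2$, then $\bar{x} \sim N(\mu_{\bar{x}}, s_{\bar{x}}^2)$ approximately,
$\mu_{\bar{x}} = \mu$, $s_{\bar{x}}^2 = \dfrac{s^2}{n}$, then $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} \sim t$ with df = (n-1)
Then $\Pr\left(\bar{x} + t_{\alpha/2}\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{(1-\alpha/2)}\dfrac{s}{\sqrt{n}}\right) = 1-\alpha$
$100(1-\alpha)\%$ Confidence Interval of $\mu$ = $\bar{x} \pm t_{(1-\alpha/2)}\dfrac{s}{\sqrt{n}}$
$100(1-\alpha)\%$ CI of True Mean =
$\bar{x} \pm z_{(1-\alpha/2)}\dfrac{\sigma}{\sqrt{n}}$ (known population variance), or
$\bar{x} \pm t_{(1-\alpha/2)}\dfrac{s}{\sqrt{n}}$, t with df = (n-1) (unknown population variance)
51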

Interval Estimation (Diff bet 2 independent pop. Means)
From the same concept, we can obtain the $100(1-\alpha)\%$ CI of ($\mu_1 - \mu_2$) as
follows:
1. Known pop. var.
$100(1-\alpha)\%$ CI of ($\mu_1 - \mu_2$) = $(\bar{x}_1 - \bar{x}_2) \pm z_{(1-\alpha/2)}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
2. Unknown pop. var.
2.1 Equal pop. var. …. estimate by $s_p^2 = \dfrac{\sum (n_i-1)s_i^2}{\sum (n_i-1)}$ = pooled variance
$100(1-\alpha)\%$ CI of ($\mu_1 - \mu_2$) = $(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)}\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$ ; t with df = $(n_1 + n_2 - 2)$
52

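Interval Estimation (Diff bet 2 independent pop. Means)
2.2 Unequal var., each sample var. …. Estimate …. pop. var.
$100(1-\alpha)\%$ CI of ($\mu_1 - \mu_2$) = $(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2)}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ ;
df = $\dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1-1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2-1}}$
53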

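Interval Estimation
$\Pr\left(z_{\alpha/2} < z < z_{(1-\alpha/2)}\right) = 1-\alpha$
From CLT and for large n: $p \sim N(\mu_p, \sigma_p^2)$ where $\mu_p = \pi$ and $\sigma_p = \sqrt{\dfrac{\pi(1-\pi)}{n}}$, then
$Z = \dfrac{p-\pi}{\sqrt{\pi(1-\pi)/n}} \sim N(0,1)$.
$\Pr\left(p + z_{\alpha/2}\sqrt{\dfrac{\pi(1-\pi)}{n}} < \pi < p + z_{(1-\alpha/2)}\sqrt{\dfrac{\pi(1-\pi)}{n}}\right) = 1-\alpha$, estimate $\pi$ again by p
$100(1-\alpha)\%$ Confidence Interval of $\pi$ = $p \pm z_{(1-\alpha/2)}\sqrt{\dfrac{p(1-p)}{n}}$
[Figure: standard normal curve over the Z axis; central area = 1-α between Z(α/2) and Z(1-α/2), with area α/2 in each tail]
54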

Interval Estimation of diff. bet 2 independent proportions
From CLT and for large sample size $n_i$ (i = 1, 2):
$p_i \sim N(\mu_{p_i}, \sigma_{p_i}^2)$ where $\mu_{p_i} = \pi_i$ and $\sigma_{p_i} = \sqrt{\dfrac{\pi_i(1-\pi_i)}{n_i}}$,
then $(p_1 - p_2) \sim N\left(\mu_{(p_1-p_2)}, \sigma_{(p_1-p_2)}^2\right)$,
where $\mu_{(p_1-p_2)} = (\pi_1 - \pi_2)$ and $\sigma_{(p_1-p_2)}^2 = \dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}$.
Since we do not know those 2 pop. proportions, they both will be estimated
by the sample proportion of each group.
$100(1-\alpha)\%$ CI of $(\pi_1 - \pi_2)$ = $(p_1 - p_2) \pm z_{(1-\alpha/2)}\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
55
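
The large-sample intervals for one proportion and for the difference of two independent proportions can be sketched as follows. The function names and all input values are hypothetical, chosen only to illustrate the formulas.

```python
from math import sqrt
from statistics import NormalDist

def z_quantile(conf):
    """z at (1 - alpha/2) for a 100*conf% interval."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def prop_ci(p, n, conf=0.95):
    """Large-sample CI for one population proportion."""
    half = z_quantile(conf) * sqrt(p * (1 - p) / n)
    return p - half, p + half

def diff_prop_ci(p1, n1, p2, n2, conf=0.95):
    """Large-sample CI for the difference of two independent proportions."""
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    half = z_quantile(conf) * se
    return (p1 - p2) - half, (p1 - p2) + half

# Hypothetical sample results, for illustration only
one_low, one_high = prop_ci(p=0.40, n=100)
diff_low, diff_high = diff_prop_ci(p1=0.50, n1=100, p2=0.40, n2=100)

print(round(one_low, 3), round(one_high, 3))
print(round(diff_low, 3), round(diff_high, 3))
```

An interval for the difference that contains 0 is compatible with no difference between the two population proportions at the chosen confidence level.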

Application in Health Research (RR, OR)
RR: Relative Risk
When we wish to compare the probabilities of a specified
event in two different groups, the concept of relative risk is
often useful.
56

Relative Risk: RR
Cohort or Follow-up or Experimental study
Population
Group              Outcome               Total
                   Cure      Not cure
A                  A         B           N1
B                  C         D           N2
Total              A+C       B+D         N
Sample
Group (EXPOSED)    Outcome               Total
                   Cure      Not cure
A                  a         b           n1
B                  c         d           n2
Total              a+c       b+d         n
STARTING POINT (marked on each table)
57

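True cure rate of A = A/N1 = $P_A$ and
true cure rate of B = C/N2 = $P_B$
TRUE RR = $\dfrac{P_A}{P_B}$
Estimated RR: $\widehat{RR} = \dfrac{p_A}{p_B} = \dfrac{a/n_1}{c/n_2}$
$SE(\ln \widehat{RR}) = \left(\dfrac{1}{a} + \dfrac{1}{c} - \dfrac{1}{n_1} - \dfrac{1}{n_2}\right)^{1/2}$
95% confidence interval of RR
= $\exp\left(\ln \widehat{RR} \pm 1.96 \times SE(\ln \widehat{RR})\right)$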
58
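
The relative risk and its confidence interval can be computed as in the sketch below. The function name `relative_risk` and the counts (30 of 100 cured in group A, 15 of 100 in group B) are hypothetical, used only to exercise the formulas.

```python
from math import exp, log, sqrt

def relative_risk(a, n1, c, n2, z=1.96):
    """Estimated RR with a confidence interval built on the log scale.

    a of n1 subjects in group A and c of n2 subjects in group B had the outcome.
    """
    rr = (a / n1) / (c / n2)
    se_log_rr = sqrt(1 / a + 1 / c - 1 / n1 - 1 / n2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical cohort counts, for illustration only
rr, rr_low, rr_high = relative_risk(a=30, n1=100, c=15, n2=100)
print(round(rr, 2), round(rr_low, 2), round(rr_high, 2))
```

The interval is not symmetric around the estimate because it is built on the log scale and then transformed back.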

TRUE Risk Difference:
RD = Cure rate of A - Cure rate of B = $P_A - P_B$
Estimated Risk Difference:
$\widehat{RD}$ = Sample cure rate of A - sample cure rate of B
= $p_A - p_B$
$SE(\widehat{RD}) = SE(p_A - p_B) = \sqrt{\dfrac{p_A(1-p_A)}{n_1} + \dfrac{p_B(1-p_B)}{n_2}}$
59

OR: Odds Ratio
Another measure that is often used to compare the
probabilities of an event in two different groups. Unlike the
relative risk, which compares the probabilities directly,
however, the odds ratio (as its name would suggest) relates
the odds of the event in the two populations.
60

From Design … to analysis
Cross-sectional Study : Strictly practice on DMHTT
Population
Sex       Strictly practice on DMHTT    Total
          Yes       No
Male      A         B                   N1
Female    C         D                   N2
Total     A+C       B+D                 N
Sample
Sex       Strictly practice on DMHTT    Total
          Yes       No
Male      a         b                   n1
Female    c         d                   n2
Total     a+c       b+d                 n
61

Rationale from population to sample
• Probability of a Yes outcome in Male = A/N1
• Probability of a No outcome in Male = B/N1
Odds of outcome in Male = (Probability of a Yes outcome in Male) / (Probability of a No outcome in Male)
= A/B
Odds of outcome in Female = (Probability of a Yes outcome in Female) / (Probability of a No outcome in Female)
= C/D
True OR: OR = (Odds of outcome in Male) / (Odds of outcome in Female) = $\dfrac{A/B}{C/D} = \dfrac{AD}{BC}$
Estimated OR: $\widehat{OR} = \dfrac{ad}{bc}$
62

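Estimated OR: $\widehat{OR} = \dfrac{ad}{bc}$
$SE(\ln \widehat{OR}) = \left(\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}\right)^{1/2}$
95% confidence interval of OR
= $\exp\left(\ln \widehat{OR} \pm 1.96 \times SE(\ln \widehat{OR})\right)$
63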

Probability of outcome and odds ratio
           Outcome          Total
           Yes      No
Group A    20       60      80
Group B    10       90      100
Total      30       150     180
• Probability of a Yes outcome in group A = 20/80 = 25%
• Probability of a No outcome in group A = 60/80 = 75%
• Odds of outcome in group A = (Probability of a Yes outcome in group A) / (Probability of a No outcome in group A)
= 0.25/0.75 = 0.33
• Odds of outcome in group B = (Probability of a Yes outcome in group B) / (Probability of a No outcome in group B)
= 0.10/0.90 = 0.11
64

Probability of outcome and odds ratio (2)
           Outcome          Total
           Yes      No
Group A    20       60      80
Group B    10       90      100
Total      30       150     180
• Odds ratio of group A to group B = (odds of outcome in group A) / (odds of outcome in group B)
= 0.33/0.11 = 3.00
65
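
The odds ratio of 3.00 can be reproduced from the cell counts, together with the 95% confidence interval formula given on the earlier SE(ln OR) slide. This is a minimal sketch; the function name `odds_ratio` is illustrative, and a, b, c, d follow the 2×2 layout used in the tables (a = 20, b = 60, c = 10, d = 90).

```python
from math import exp, log, sqrt

def odds_ratio(a, b, c, d, z=1.96):
    """Estimated OR = ad/bc with a confidence interval built on the log scale."""
    or_hat = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_hat) - z * se_log_or)
    upper = exp(log(or_hat) + z * se_log_or)
    return or_hat, lower, upper

# Counts from the Group A / Group B table
or_hat, or_low, or_high = odds_ratio(a=20, b=60, c=10, d=90)
print(round(or_hat, 2), round(or_low, 2), round(or_high, 2))
```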

Application in Health Research (OR)
Case-Control Study : Strictly practice on DMHTT
Population
Age (EXPOSED)    Strictly practice on DMHTT    Total
                 Yes       No
Young            A         B                   A+B
Old              C         D                   C+D
Total            N1        N2                  N
Sample
Age (EXPOSED)    Strictly practice on DMHTT    Total
                 Yes       No
Young            a         b                   a+b
Old              c         d                   c+d
Total            n1        n2                  n
STARTING POINT (marked on each table)
66

• Probability of EXPOSED among CASE = A/N1
• Probability of Non-Exposed among CASE = C/N1
Odds of Exposed among CASE = (Probability of EXPOSED among CASE) / (Probability of Non-Exposed among CASE)
= A/C
Odds of Exposed among CONTROL = (Probability of EXPOSED among CONTROL) / (Probability of Non-Exposed among CONTROL)
= B/D
True OR: OR = (Odds of Exposed among CASE) / (Odds of Exposed among CONTROL) = $\dfrac{A/C}{B/D} = \dfrac{AD}{BC}$
Estimated OR: $\widehat{OR} = \dfrac{ad}{bc}$
67

Assignment 3
Q1 Slide # 9
Q2 Slide # 15
Q3 Slide # 18
Q4 Slide # 26
Q5 Slide # 27
68

Relationships among the probabilities of an event in an exposed and an unexposed group. For each row, find the
odds ratio and the relative risk, and interpret them.
p(exp) p(unexp) OR RR
0.01 0.012
0.05 0.059
0.25 0.286
0.50 0.545
0.01 0.014
0.05 0.069
0.25 0.318
0.50 0.583
0.01 0.020
0.05 0.095
0.25 0.400
0.50 0.667
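
A small helper for this kind of exercise might look like the following. It is only a sketch: the function name and the probabilities 0.30 and 0.20 are hypothetical and deliberately not taken from the table, so the rows above remain an exercise.

```python
def rr_and_or(p_exposed, p_unexposed):
    """Relative risk and odds ratio from two event probabilities."""
    rr = p_exposed / p_unexposed
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_unexposed = p_unexposed / (1 - p_unexposed)
    return rr, odds_exposed / odds_unexposed

# Hypothetical probabilities, for illustration only
rr_demo, or_demo = rr_and_or(p_exposed=0.30, p_unexposed=0.20)
print(round(rr_demo, 3), round(or_demo, 3))

# For a rare event the two measures are close to each other
rr_rare, or_rare = rr_and_or(p_exposed=0.003, p_unexposed=0.002)
print(round(rr_rare, 3), round(or_rare, 3))
```

Comparing the two calls shows the pattern the exercise is meant to reveal: the odds ratio moves away from the relative risk as the event becomes more common.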
