A+.pptx

Confidence Intervals
• We can put confidence intervals around other sample statistics too
(not just the sample mean)
• The CI conveys the precision of the sample statistic:
100(1-α)% CI: estimate ± k × (standard error)
• So the CI equals our estimate plus/minus some multiple of
the standard error
• A 95% CI means that if we drew many samples of size n, about 95 out of every
100 of the resulting CIs would contain the population parameter
• Also common: 90% or 99% CIs

$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
The width of the confidence interval depends on:
• the variation in the population (i.e. the standard deviation 𝜎 which is
fixed or possibly unknown) – the more variation the wider the CI
• The sample size n – the larger the sample the narrower the CI
• The confidence level – a 99% CI is wider than a 95% CI because we
have to be “more sure” to capture the true value

$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
• Why does the sample size n have to be large?
• CIs are valid when the CLT holds
• The sample size n affects the standard error $\frac{\sigma}{\sqrt{n}}$
• Remember that a fraction gets closer to 0
as its denominator gets larger
• As $n \to \infty$ the SE $\frac{\sigma}{\sqrt{n}}$ gets closer and closer to 0
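A quick numerical check of how the standard error behaves as n grows may help here. This is only a sketch: the function name `standard_error` is illustrative, and the value 𝜎 = 20.2 is borrowed from the age-at-death example used in these notes.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 20.2  # population SD borrowed from the age-at-death example
for n in (10, 100, 1_000, 10_000):
    # The SE shrinks as n grows, so the CI gets narrower.
    print(f"n = {n:>6}: SE = {standard_error(sigma, n):.4f}")
```

Each hundred-fold increase in n divides the SE by ten, because n enters through its square root.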

Calculating Confidence Intervals
a) Single Mean
b) Difference between means - independent samples
c) Difference between means - dependent samples
d) Single Proportion
e) Difference between proportions - independent samples

a) CI for a Single Mean (known 𝜎)
• If the sample size is large and the population SD is
known…
• The CI formula for a single mean is:
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
$\bar{x}$ is the sample mean
𝜎 is the population standard deviation
𝑛 is the sample size
𝑍 is a cutoff from the standard normal distribution
90% CI: 𝑍 = 1.645
95% CI: 𝑍 = 1.96
99% CI: 𝑍 = 2.58
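The Z cut-offs can be recovered from the standard normal distribution rather than memorised. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `z_cutoff` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_cutoff(confidence: float) -> float:
    """Two-sided standard normal cut-off for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Leave alpha/2 in each tail, so look up the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: Z = {z_cutoff(level):.3f}")
```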

a) CI for a Single Mean (known 𝜎) EXAMPLE
• We want to estimate the age of death for the US population
• We are told the population SD is 20.2 years
• In a sample of 100 people we calculate a mean age of death of 72.1
years
• a 95% confidence interval for the mean age of death is:
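$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
where:
$\bar{x}$ = 72.1
𝜎 = 20.2
𝑛 = 100
𝑍 = 1.96
Upper limit: $72.1 + 1.96 \times \frac{20.2}{\sqrt{100}} \approx 76.06$
Lower limit: $72.1 - 1.96 \times \frac{20.2}{\sqrt{100}} \approx 68.14$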
95% CI: (68.14, 76.06)
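The same calculation can be sketched in a few lines of Python. The function name `z_interval` is hypothetical; the inputs are the values from the example above.

```python
import math


def z_interval(mean: float, sigma: float, n: int, z: float = 1.96):
    """CI for a single mean when the population SD is known."""
    margin = z * sigma / math.sqrt(n)  # Z times the standard error
    return mean - margin, mean + margin


lower, upper = z_interval(mean=72.1, sigma=20.2, n=100)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```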

C.I. for the difference between population means (normally distributed)
• Known variance (2 independent samples)
100(1‐α)% C.I. for μ1 ‐ μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

a) CI for a Single Mean (unknown 𝜎)
• Usually the population SD is not known
• We use the sample standard deviation 𝑠 instead
• Use t-distribution instead of standard normal
• The CI formula for a single mean is:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
$\bar{x}$ is the sample mean
𝑠 is the sample standard deviation
𝑛 is the sample size
𝑡 is a cut-off from the t-distribution with 𝑛 − 1 degrees of freedom

Student’s t-distribution
• Similar to the normal distribution but with fatter tails
• Accounts for the extra uncertainty from not knowing 𝜎
• A family of curves indexed by the degrees of freedom; the cut-off value depends on:
– significance level 𝛼
– degrees of freedom df
• The degrees of freedom is 𝑛 − 1 because one piece of information (the sample
mean) has already been used in estimating 𝑠

Student’s t-distribution
• Calculate cut-off values in Stata using the command:
. display invttail(df, p)
where 𝑑𝑓 = 𝑛 − 1 and p is the area in the right tail
• For a 95% CI use 𝑑𝑓 = 𝑛 − 1 and p=0.025
• Example: if n=100 the t-value for a 95% CI would be
. display invttail(99, 0.025)
=1.984217

a) CI for a Single Mean (unknown 𝜎) EXAMPLE
Sample summary for zinc: n = 462, $\bar{x}$ = 87.93723, s = 16.00469
Using formula:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
The t-value has df = n − 1 = 462 − 1 = 461 and p = 0.025
. display invttail(461,0.025)
1.9651232
. display 87.93723 - 1.9651232 * 16.00469 / sqrt(462)
86.473988
. display 87.93723 + 1.9651232 * 16.00469 / sqrt(462)
89.400472
The 95% CI for mean zinc is (86.5, 89.4)
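For readers without Stata, the zinc interval can be checked with a short Python sketch. The standard library has no t-quantile function, so the cut-off 1.9651232 reported by `invttail(461, 0.025)` is passed in as an input. The function name `t_interval` is an assumption.

```python
import math


def t_interval(mean: float, s: float, n: int, t_cutoff: float):
    """CI for a single mean when sigma is unknown (t cut-off supplied by the caller)."""
    margin = t_cutoff * s / math.sqrt(n)
    return mean - margin, mean + margin


# Summary values and t cut-off taken from the zinc example.
lower, upper = t_interval(mean=87.93723, s=16.00469, n=462, t_cutoff=1.9651232)
print(f"95% CI for mean zinc: ({lower:.1f}, {upper:.1f})")
```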

a) CI for a Single Mean (unknown 𝜎) EXAMPLE
Using Stata:
cii means n mean sd (from the summary statistics n, mean and sd)
ci means varname (from a variable in the dataset)

Effect of confidence level on width of CI:
95% CI is the default (𝛼 = 0.05): [86.4…, …89.4]
Use level() to change the confidence level to 99% (𝛼 = 0.01): [86.0…, …89.9]
The CI becomes WIDER

Effect of sample size on width of CI:
If we increase the sample size from 31 to 131
The standard error decreases and…
The CI becomes NARROWER
n=31: [145.4…, …149.8]
n=131: [145.6…, …148.6]

Effect of SD on width of CI:
If the standard deviation increases from 6 to 10
The standard error increases and…
The CI becomes WIDER
s=6: [145.4…, …148.8]
s=10: [143.9…, …151.3]

Comparing Z and t:
• For large n, the t-distribution approximates the normal
• Suppose the sample mean age of death is $\bar{x}$ = 72.1 yrs and that the sample SD is equal to
the population SD (s = 𝜎 = 20.2 yrs)
Normal dist., CI formula $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$:
– Large sample size (n=100), 95% CI: Z=1.96, $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{100}} \approx$ (68.14, 76.06)
– Small sample size (n=10), 95% CI: Z=1.96, $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{10}} \approx$ (59.58, 84.62)
t-dist., CI formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$:
– Large sample size (n=100), 95% CI: t=1.98, $72.1 \pm 1.98 \times \frac{20.2}{\sqrt{100}} \approx$ (68.10, 76.10)
– Small sample size (n=10), 95% CI: t=2.26, $72.1 \pm 2.26 \times \frac{20.2}{\sqrt{10}} \approx$ (57.66, 86.54)
Similar CI for large n; t gives a wider CI than Z for small n

b) CI for a difference in means (independent samples)
Suppose we have two independent groups of data and calculate a sample mean and sample
standard deviation for each. The CI formula is:
$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Where:
$\bar{x}_1$ and $\bar{x}_2$ are the sample means
$n_1$ and $n_2$ are the sample sizes
$t$ is a cut-off from the t-distribution with $df = n_1 + n_2 - 2$
$s_p^2$ is the pooled variance: $s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$
where $s_1$ and $s_2$ are the sample standard deviations

b) CI for a difference in means (independent samples)
Assumptions:
1. The population standard deviations are approximately equal.
We check this by comparing the sample standard deviations.
2. For small sample sizes (say n<100) the population distribution should
approximately follow the normal distribution.
This is checked by assessing the sample data for normality. The
assumption is fairly robust in that the formula is valid as long as the
distribution of data in the sample is approximately mound shaped and
symmetrical.
3. The 2 groups are independent.
4. The subjects within the 2 groups are independent.

b) CI for a difference in means (independent samples)
Example using Stata command:
ttesti $n_1$ $\bar{x}_1$ $s_1$ $n_2$ $\bar{x}_2$ $s_2$
• ttesti assumes the population SDs are equal by default
• Use the unequal option if $s_1$ and $s_2$ are not similar (the df is affected)

b) CI for a difference in means (independent samples)
Suppose we want a CI for the difference in mean height between men
and women (assuming independence and normality hold…)
ttest varname, by(groupvar)
Approx. equal SDs

c) CI for a difference in means (dependent samples)
• Dependent samples occur with two groups of paired or
matched data
• Usually equal sample sizes in the 2 groups (1:1)
e.g.
– patient blood pressure before and after a treatment
– patient left leg and right leg measurements
– Two groups where pairs of people have been matched on
important demographics (age, sex, etc.)

c) CI for a difference in means (dependent samples)
1. Calculate the pair differences $d$
e.g. for each patient, d = BP_after – BP_before
2. Find the mean $\bar{d}$ and standard deviation $s_d$ of the pair
differences
3. The CI for the mean pair difference is:
$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
Where 𝑛 is the number of pairs and t has 𝑑𝑓 = 𝑛 − 1
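The three steps can be sketched in Python as follows. The before/after values below are made up purely for illustration (they are not the heart-rate data from the slides), the function name `paired_interval` is hypothetical, and the t cut-off 3.182446 (df = 3, right-tail area 0.025) is supplied by hand because the standard library has no t-quantile function.

```python
import math
from statistics import mean, stdev


def paired_interval(before, after, t_cutoff):
    """CI for the mean of paired differences (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]       # step 1: pair differences
    d_bar = mean(diffs)                                   # step 2: mean of differences
    s_d = stdev(diffs)                                    # step 2: sample SD of differences
    margin = t_cutoff * s_d / math.sqrt(len(diffs))      # step 3: t * s_d / sqrt(n)
    return d_bar, (d_bar - margin, d_bar + margin)


# Hypothetical measurements for four patients (illustration only).
before = [70, 72, 68, 75]
after = [74, 75, 70, 80]
d_bar, (lower, upper) = paired_interval(before, after, t_cutoff=3.182446)
print(f"mean difference = {d_bar}, 95% CI = ({lower:.2f}, {upper:.2f})")
```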

c) CI for a difference in means (dependent samples) EXAMPLE
• The heartrates of 20 patients
before and after a treatment
• Want a 95% CI for the difference in
mean heartrate

c) CI for a difference in means (dependent samples) EXAMPLE
First, calculate the differences
Then use the formula $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
The 95% CI is: (1.5, 11.2)

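c) CI for a difference in means (dependent samples) EXAMPLE
OR using Stata on the summary statistics of the differences (number of pairs, mean, SD, null value 0):
ttesti $n$ $\bar{d}$ $s_d$ 0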

d) CI for a single proportion
• Suppose we have a population of subjects and some of
them have a characteristic of interest and the rest don’t
– e.g. being female, having a cancer diagnosis, survived
• We want to estimate the true proportion p who have the
characteristic of interest
• If r is the number of sample subjects that have the
characteristic and n is the sample size then the sample
proportion is:
$\hat{p} = \frac{r}{n}$

• If n is large enough and p is not too extreme, then the
sampling distribution of $\hat{p}$ is approximately normal (CLT)
• The standard error of a proportion is:
$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• The CI formula for a single proportion is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where Z is a standard normal cut-off (95% CI: Z=1.96)

$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• This formula assumes the rule of thumb:
$n\hat{p}$ and $n(1 - \hat{p})$ must both be greater than 5
• (n must be large enough too)
• Otherwise, the formula is not valid and we’d have to use
exact binomial values instead of Z cut-offs

d) CI for a single proportion EXAMPLE
• Consider the 5-yr survival for lung cancer patients (Pagano p328).
• We want to estimate the proportion p who survive 5 yrs since diagnosis (dx).
• In a random sample of n=52 patients only r=6 survive 5 yrs
($\hat{p}$ = r/n = 6/52 ≈ 0.12).
• Check $n\hat{p}$ ≈ 6.24 and $n(1 - \hat{p})$ ≈ 45.76 (using the rounded $\hat{p}$ = 0.12), so the rule of thumb holds
• A 95% CI for p is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.12 \pm 1.96 \sqrt{\frac{0.12 \times 0.88}{52}}$
95% CI: (0.03, 0.21)
So between 3% and 21% of patients with lung cancer survive 5 yrs after
dx.
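As a cross-check, the normal-approximation interval for the lung cancer example can be sketched in Python. The function name `wald_interval` is an assumption, and the rounded value $\hat{p}$ = 0.12 is used so that the result matches the slide.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96):
    """Normal-approximation CI for a single proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
    return p_hat - z * se, p_hat + z * se


p_hat, n = 0.12, 52  # rounded sample proportion (6/52) and sample size
# Rule of thumb: both expected counts should exceed 5.
rule_ok = n * p_hat > 5 and n * (1 - p_hat) > 5
lower, upper = wald_interval(p_hat, n)
print(f"rule of thumb holds: {rule_ok}; 95% CI: ({lower:.2f}, {upper:.2f})")
```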

d) CI for a single proportion EXAMPLE
• In a random sample of n=52 patients only r=6 survive 5 yrs
Using Stata to find the 95% CI:
cii proportions n r
Why is this answer different to the one we calculated with the formula?
Stata uses exact binomial values instead of the normal approximation
to the binomial

Similarly to the CI for a single mean, the width of the CI for
a single proportion is affected by:
• The sample size n
– increasing the sample size makes the CI narrower/more precise
i.e. small samples have wider CIs/less precision
– The standard error decreases as n increases
• The confidence level
– A 99% CI is wider than a 95% CI which is wider than a 90% CI

e) CI for a difference in proportions (independent
samples)
• One group has sample proportion $\hat{p}_1$ and sample size $n_1$
• Second group has sample proportion $\hat{p}_2$ and sample size $n_2$
• The CI formula is:
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The RHS looks complicated but really it’s just Z times the standard error for
$\hat{p}_1 - \hat{p}_2$:
$SE = \sqrt{\mathrm{var}(\hat{p}_1) + \mathrm{var}(\hat{p}_2)}$

e) CI for a difference in proportions (independent samples)
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The formula is only valid for large samples and not too extreme
values of $p_1 - p_2$
• The rule of thumb is: if $\hat{p} = \frac{r_1 + r_2}{n_1 + n_2}$, then
$n_1\hat{p}$ and $n_1(1 - \hat{p})$ and $n_2\hat{p}$ and $n_2(1 - \hat{p})$ must all be greater than 5
• You need to be able to check the rule of thumb, and use the CI
formula (we will calculate the CI using Stata)

e) CI for a difference in proportions (independent samples)
EXAMPLE
Stata command:
prtesti 𝑛1 𝑟1 𝑛2 𝑟2 , count
𝑛1 = 100 𝑟1= 80
𝑛2 = 100 𝑟2 = 50

e) CI for a difference in proportions (independent
samples) EXAMPLE
The difference in the proportion of patients who had pain relief by surgery and by meds
was between 17% and 43% (a higher % of surgery patients had pain relief
compared to med patients)
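The 17% to 43% interval can be reproduced from the counts given with the prtesti command (n1 = 100, r1 = 80, n2 = 100, r2 = 50). The following is a sketch of the formula only, not of Stata's internals; the function name is hypothetical.

```python
import math


def diff_proportions_interval(r1: int, n1: int, r2: int, n2: int, z: float = 1.96):
    """Normal-approximation CI for p1 - p2 from two independent samples."""
    p1, p2 = r1 / n1, r2 / n2
    # Variances add for independent samples, then take the square root.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)


diff, (lower, upper) = diff_proportions_interval(80, 100, 50, 100)
print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```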

One more thing…
• Some Stata commands used in the lectures/tutes and
Modules are different in previous versions of Stata:
If you are using Stata 14 (the latest version) the CI commands are:
cii means n mean sd
ci means varname
cii proportions n r
If you are using an older version of Stata the CI commands are:
cii n mean sd
ci varname
cii n r

And use Stata’s cii command to produce the 95% confidence interval:

Confirm using ttest in Stata:
ttesti 12 13.21 1.05 9 11 1.01, level(95)
Two-sample t test with equal variances
         |  Obs      Mean   Std. Err.  Std. Dev.   [95% Conf. Interval]
---------+-------------------------------------------------------------
       x |   12     13.21   .3031089       1.05    12.54286    13.87714
       y |    9        11   .3366667       1.01    10.22365    11.77635
---------+-------------------------------------------------------------
combined |   21  12.26286    .328802    1.50676    11.57699    12.94873
---------+-------------------------------------------------------------
    diff |           2.21    .455663               1.256286    3.163714
diff = mean(x) - mean(y)                  t = 4.8501
Ho: diff = 0                              degrees of freedom = 19
Ha: diff < 0              Ha: diff != 0               Ha: diff > 0
Pr(T < t) = 0.9999        Pr(|T| > |t|) = 0.0001      Pr(T > t) = 0.0001
Also see the results of:
ttesti 12 13.21 1.05 9 11 1.01, level(90)
ttesti 12 13.21 1.05 9 11 1.01, level(99)
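The "diff" row of this output can be reproduced by hand with the pooled-variance formula for two independent samples. The sketch below is an illustration under stated assumptions: the function name `pooled_interval` is hypothetical, and the t cut-off 2.093024 (df = 19, right-tail area 0.025) is supplied as an input rather than computed.

```python
import math


def pooled_interval(n1, mean1, s1, n2, mean2, s2, t_cutoff):
    """Equal-variance (pooled) CI for the difference between two means."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, (diff - t_cutoff * se, diff + t_cutoff * se)


# Same summary statistics as in the ttesti command.
diff, se, (lower, upper) = pooled_interval(12, 13.21, 1.05, 9, 11, 1.01, t_cutoff=2.093024)
print(f"diff = {diff:.2f}, SE = {se:.6f}, 95% CI = ({lower:.6f}, {upper:.6f})")
```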

Some Operations on
Events
1/4/2021 53
• Let A and B be two events defined on the sample space Ω.
1. Union of Two events: (A ∪ B)
– The event A ∪ B consists of all outcomes in A or in B or in
both A and B.
– The event A ∪ B occurs if A occurs, or B occurs, or both A
and B occur.

2. Intersection of Two Events: (A ∩ B)
– The event A ∩ B consists of all outcomes in both A and B.
– The event A ∩ B occurs if both A and B occur.

3. Complement of an Event: $\bar{A}$ or ($A^C$) or (A′)
– The complement of the event A is denoted by $\bar{A}$.
– The event $\bar{A}$ consists of all outcomes of Ω that are not in A.
– The event $\bar{A}$ occurs if A does not.

Example: (Classical Probability)
• Experiment: Selecting a patient randomly from a hospital
room having six beds numbered 1, 2, 3, 4, 5, and 6.
• Define the following events:
– E1 = {2, 4, 6} = selecting an even number
– E2 = {1, 2, 3} = selecting a number less than 4
– E4 = {1, 3, 5} = selecting an odd number
(1) E1 ∪ E2 = {1, 2, 3, 4, 6} = selecting an even number or a number
less than 4.

(2) E1 ∪ E4 = {1, 2, 3, 4, 5, 6} = Ω = selecting an even number
or an odd number.
• Since E1 ∪ E4 = Ω, E1 and E4 are called
exhaustive events.
• The union of these events gives the whole sample space.

(3) E1 ∩ E2 = {2} = selecting an even number and a
number less than 4.
(4) E1 ∩ E4 = ∅ = selecting an even number and an odd number.

• Since E1 ∩ E4 = ∅, E1 and E4 are called disjoint (or
mutually exclusive) events.
• These kinds of events cannot occur simultaneously (together at
the same time).

(5) The complement of E1

• Mutually Exclusive (Disjoint) Events
– The events A and B are disjoint (or mutually exclusive) if
A ∩ B = ∅. In that case:
I. P(A ∩ B) = 0
II. P(A ∪ B) = P(A) + P(B)
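The event operations in the six-bed example map directly onto Python sets. In this sketch the membership of E1, E2 and E4 is inferred from their verbal descriptions (even, less than 4, odd), which is an assumption; the variable names are illustrative.

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space: bed numbers
e1 = {2, 4, 6}               # even number (inferred from the description)
e2 = {1, 2, 3}               # number less than 4 (inferred)
e4 = {1, 3, 5}               # odd number (inferred)

print("E1 union E2:", sorted(e1 | e2))
print("E1 union E4 equals omega (exhaustive):", (e1 | e4) == omega)
print("E1 intersect E2:", sorted(e1 & e2))
print("E1 intersect E4 is empty (disjoint):", (e1 & e4) == set())
print("complement of E1:", sorted(omega - e1))


def classical_probability(event: set, sample_space: set) -> float:
    """Classical probability: equally likely outcomes."""
    return len(event) / len(sample_space)


# For disjoint events, P(A or B) = P(A) + P(B).
p_union = classical_probability(e1 | e4, omega)
p_sum = classical_probability(e1, omega) + classical_probability(e4, omega)
```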

• Exhaustive Events

General Probability Rules

Marginal Probability
• Given some variable that can be broken down into (m) categories
designated by A1, A2, . . ., Am and another jointly occurring
variable that is broken down into (s) categories designated by B1,
B2, . . . , Bs
• The marginal probability of Ai, P(Ai), is equal to the sum of the
joint probabilities of Ai with all categories of B.
• That is, $P(A_i) = \sum_{j=1}^{s} P(A_i \cap B_j)$

Example: Relative Frequency or
Empirical
• Let us consider a bivariate table for variables A and B.
• There are three categories for both the variables: A1, A2, and A3
for A and B1, B2, and B3 for B
• Joint frequency distribution for m categories of A and s categories of B

• Joint probability distribution for m categories of A and s
categories of B

• Number of elements in each cell
• Probabilities of events

Applications of Relative Frequency (Empirical Probability)
• Let us consider hypothetical data on four types of diseases of
200 patients from a hospital as shown below:
• Experiment: Selecting a patient at random and observing his/her
disease type. The total number of trials (the sample size) in this case is
n = 200
Disease type          A    B    C    D    Total
Number of patients    90   80   20   10   200

Conditional Probability
• The conditional probability of the event A when we know that
the event B has already occurred is defined by
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$

Multiplication Rules of Probability
Let us consider a hypothetical set of data on 600 adult males
classified by their ages and smoking habits, as summarized in a two-way table.
Consider the following event:
(B1|A2) = smokes daily given that age is between 30 and 39

• Two-way table displaying number of respondents by age and
smoking habit

Binomial Distribution,…
• Example:
– Suppose all birth records for a calendar year show that 85.8% of
the pregnancies had delivery in week 37 or later.
– The 85.8% is interpreted as the probability of a recorded birth
in week 37 or later
– If we randomly select five birth records from this
population, what is the probability that exactly three of the
records will be for full-term births?

Binomial Distribution,…
• Let us designate the occurrence of a record for a full-term birth
(F) as a “success” and hasten to add that a premature birth (P)
is not a failure
• It will also be convenient to assign the number 1 to a success
and the number 0 to a failure (record of a premature birth).

• Suppose the five birth records selected resulted in this
sequence of full-term (F) and premature (P) births
FPFFP
• In coded form we would write this as
10110
P(1, 0, 1, 1, 0) = pqppq = $q^2 p^3$

• Three successes and two failures could occur in any one of the
following additional sequences as well:
From the addition rule we know that this
probability is equal to the sum of the
individual probabilities. In the present
example we need to sum the 10 $q^2 p^3$’s or,
equivalently, multiply $q^2 p^3$ by 10.
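To see where the factor of 10 comes from, the sequences with exactly three full-term births (F) and two premature births (P) can simply be enumerated. This is a small sketch using the standard library; the variable name `sequences` is illustrative.

```python
from itertools import product

# All length-5 sequences of F (full-term) and P (premature) with exactly three F's.
sequences = ["".join(seq) for seq in product("FP", repeat=5) if seq.count("F") == 3]

for s in sequences:
    print(s)
print("number of sequences:", len(sequences))
```

Each of these sequences has the same probability $q^2 p^3$, which is why the individual probabilities can be summed by multiplying by the count.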

• Answer for the original question:
• Since in the population p = 0.858,
q = (1 − p) = (1 − 0.858) = 0.142
$10 q^2 p^3 = 10 (0.142)^2 (0.858)^3$
$= 10 (0.020164)(0.631629)$
$\approx 0.1274$

Combinations
• A combination of n objects taken x at a time is an unordered
subset of x of the n objects
• Combinations are used to count sequences when the sample is too large to list them all
• The number of combinations of n objects that can be formed
by taking x of them at a time is given by
$_nC_x = \frac{n!}{x!(n - x)!}$
• where x!, read x factorial, is the product of all the whole
numbers from x down to 1. That is,
x! = x(x-1)(x-2)…(1). We note that, by definition, 0! = 1.

• Let us return to our example in which we have a sample of
n=5 birth records and we are interested in finding the
probability that three of them will be for full-term births
$_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{120}{12} = 10$

• In our example we let x = 3, the number of successes, so that
n − x = 2, the number of failures. We then may write the
probability of obtaining exactly x successes in n trials as
$f(x) = {}_nC_x \, q^{n-x} p^x = {}_nC_x \, p^x q^{n-x}$, for x = 0, 1, 2, …, n
• The Binomial Parameters
– The binomial distribution has two parameters, n and p
– $\mu = np$
– $\sigma^2 = np(1 - p) = npq$
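The binomial formula is easy to evaluate directly. The sketch below uses `math.comb` from the standard library for the number of combinations; the function name `binomial_pmf` is an assumption. Applied to the birth-record example (n = 5, p = 0.858, x = 3) it gives a probability of about 0.127.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.858
prob_three = binomial_pmf(3, n, p)
mean = n * p                 # mu = np
variance = n * p * (1 - p)   # sigma^2 = npq
print(f"P(X = 3) = {prob_three:.4f}, mean = {mean:.3f}, variance = {variance:.4f}")
```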

Poisson Distribution
• Used to model a discrete random variable representing the number of occurrences or counts of some
random events in an interval of time or space (or some volume of matter)
• The possible values of X = x are x = 0, 1, 2, 3,…
• The discrete random variable, X, is said to have a Poisson distribution with parameter (mean) λ if the
probability distribution of X is given by $f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$

Poisson Distribution,…
• where e = 2.71828 (the base of the natural logarithm).
• λ (lambda) is the parameter of the distribution and is the
average number of occurrences of the random event in the
interval
• The Poisson Process
– The occurrences of the events are independent
– The probability of the single occurrence of the event in a
given interval is proportional to the length of the interval

– In any infinitesimally small portion of the interval, the probability of more
than one occurrence of the event is negligible
Example
– In a study of drug-induced anaphylaxis among patients taking rocuronium
bromide as part of their anesthesia the occurrence of anaphylaxis followed a
Poisson model with λ=12 incidents per year in Norway

– Find the probability that in the next year, among patients
receiving rocuronium, exactly three will experience
anaphylaxis
$f(x=3) = \frac{e^{-12} 12^3}{3!} = 0.00177$
• What is the probability that at least three patients in the next
year will experience anaphylaxis if rocuronium is administered
with anesthesia?
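Both anaphylaxis questions can be worked through with a few lines of Python. This is a sketch with a hypothetical helper `poisson_pmf`; the "at least three" probability is obtained from the complement, 1 − P(X = 0) − P(X = 1) − P(X = 2).

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x occurrences when the mean is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 12  # incidents per year, from the example
p_exactly_3 = poisson_pmf(3, lam)
# Complement rule: subtract the probabilities of 0, 1 and 2 occurrences.
p_at_least_3 = 1 - sum(poisson_pmf(x, lam) for x in range(3))
print(f"P(X = 3)  = {p_exactly_3:.5f}")
print(f"P(X >= 3) = {p_at_least_3:.5f}")
```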

Poisson Distribution
• Example: Suppose that the number of accidents per day in a city has
a Poisson distribution with average 2 accidents.
1. What is the probability that in a day
I. the number of accidents will be 5,
II. the number of accidents will be less than 2.
2. What is the probability that there will be six accidents in 2
days?
3. What is the probability that there will be no accidents in an
hour?

Poisson Distribution,…
1. I. $P(X = 5) = \frac{e^{-2} 2^5}{5!} = 0.036089$
II. $P(X < 2) = P(X = 0) + P(X = 1) = \frac{e^{-2} 2^0}{0!} + \frac{e^{-2} 2^1}{1!} = 0.135335 + 0.270671 = 0.406006$
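The remaining parts of the accident example (six accidents in two days, no accidents in an hour) follow by rescaling the mean to the length of the interval. The sketch below assumes the accident rate is constant over time, so that λ = 2 × 2 = 4 for two days and λ = 2/24 for one hour; that assumption and the helper name `poisson_pmf` are not stated in the slides.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x occurrences when the mean is lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam_day = 2                                   # average accidents per day
p_five = poisson_pmf(5, lam_day)              # question 1.I
p_less_than_two = poisson_pmf(0, lam_day) + poisson_pmf(1, lam_day)  # question 1.II
p_six_in_two_days = poisson_pmf(6, 2 * lam_day)   # question 2: mean scaled to 2 days
p_none_in_hour = poisson_pmf(0, lam_day / 24)     # question 3: mean scaled to 1 hour

print(f"P(5 in a day)         = {p_five:.6f}")
print(f"P(fewer than 2)       = {p_less_than_two:.6f}")
print(f"P(6 in two days)      = {p_six_in_two_days:.6f}")
print(f"P(none in one hour)   = {p_none_in_hour:.6f}")
```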

Probability Distributions of Continuous data
• A non-negative function f(x) is called a probability distribution of the continuous random
variable X if the total area bounded by its curve and the x-axis
is equal to 1 and if the subarea under the curve bounded by the
curve, the x-axis, and perpendiculars erected at any two points
a and b gives the probability that X is between the points a and
b.
• Also known as probability density function

Probability Distributions of Continuous data…
[Figure: graph of a continuous distribution showing the area between a and b.]

Normal distribution
• Known as the Gaussian distribution
• The normal density is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty$
where (e = 2.71828) and (π = 3.14159).
• The parameters of the distribution are the mean µ and the variance σ²:
X ~ N(μ, σ²).

Normal Distribution,…
• The density function of X, f(x), is a bell-shaped curve
– The highest point of the curve of f(x) is at the mean μ.
Hence, the mode = mean = μ.
– The curve of f(x) is symmetric about the mean μ.
– In other words, mean = mode = median
– The area under the curve is 1

Standard Normal Distribution
• The standard normal distribution is the normal distribution with mean µ = 0 and variance
σ² = 1
• Denoted by Normal (0,1) or N(0,1).
• The standard normal random variable is denoted by Z, and we
write Z~N(0,1)
$Z = \frac{x - \mu}{\sigma}$
• The equation for the standard normal density is written
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \quad -\infty < z < \infty$

Standard Normal Distribution,…
The standard normal distribution
The z-transformation is useful in applications of the normal distribution

• A z-transformation that yields Z = 1 indicates that
the value of x used in the transformation is 1 standard
deviation above the mean μ.
• A value of Z = -1 indicates that the value of x used in the
transformation is 1 standard deviation below the mean μ.

• Example:
– What is the probability that a z picked at random from the
population of z’s will have a value between -2.55 and 2.55?
Answer:
P(-2.55 < z < 2.55) = 0.9946 − 0.0054 = 0.9892

Application of normal distribution
• The normal distribution is not a law that is adhered to by all
measurable characteristics occurring in nature
• However, many of these characteristics are approximately
normally distributed
• Used to model the distribution of many variables that are of interest
• Allows us to make useful probability statements about some variables
more conveniently than would be the case if some more complicated model had to be
used

Application,…
• Example:
– Suppose the weight of women of reproductive age
follows a normal distribution with mean 49 kg and variance
25 kg²
a. Find the probability that a randomly chosen woman in
her reproductive age has weight less than 45 kg.
b. What is the percentage of women having weight less
than 45 kg?
c. In a population of 20,000 women of reproductive age,
how many would you expect to have weight less than
45 kg?

• Solution
– Here the random variable X = weight of women of
reproductive age, population mean μ = 49 kg, population
variance σ² = 25 kg², population standard deviation σ =
5 kg. Hence, X ~ Normal(49, 25).
a. The probability that a randomly chosen woman of
reproductive age has weight less than 45 kg is
$P(X < 45) = P\left(Z < \frac{45 - 49}{5}\right) = P(Z < -0.8) = 0.2119$

Application,…
b. The percentage of women of reproductive age who have
weight less than 45 kg is P(X < 45) × 100% = 0.2119 × 100%
= 21.19%
c. In a population of 20,000 women of reproductive age, we
would expect that the number of women with weight less than
45 kg is P(X < 45) × 20,000 = 0.2119 × 20,000 = 4238.
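The weight example can be checked without a Z table using Python's `statistics.NormalDist`. This is a sketch; the variable names are illustrative. Because the code does not round the probability to four decimals, the expected count comes out at about 4237 rather than the 4238 obtained from the table value 0.2119.

```python
from statistics import NormalDist

weight = NormalDist(mu=49, sigma=5)       # mean 49 kg, SD 5 kg (variance 25 kg^2)

z = (45 - weight.mean) / weight.stdev     # z-transformation of 45 kg
p_under_45 = weight.cdf(45)               # P(X < 45)
expected_count = p_under_45 * 20_000      # expected number among 20,000 women

print(f"z = {z:.1f}")
print(f"P(X < 45) = {p_under_45:.4f} ({p_under_45:.2%})")
print(f"expected number under 45 kg: {expected_count:.0f}")
```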

Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of
collecting the information.
• Reduced cost: Sampling reduces demands on resource such
as finance, personnel, and material.
• Greater accuracy: Sampling may lead to better accuracy of
collecting data
• Sampling error: Precise allowance can be made for sampling
error
• Greater speed: Data can be collected and summarized more
quickly

Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the
population.
• Sampling may be inadvisable where every unit in the
population is legally required to have a record.

Sampling technique
• There are two different approaches to sampling in survey research:
– Nonprobability sampling
– Probability sampling

Probability sampling methods
• A sample obtained in a way that every member of the
population has a known and non-zero probability of being included in the sample
– Involves random selection of the sample
– Involves the selection of a sample from a population,
based on chance
• Probability sampling is
– More complex,
– More time-consuming
– Usually more costly than non-probability sampling.

EXAMPLE OF SIMPLE RANDOM SAMPLING
• Age at first sex and associated factors for early sexual initiation
among students at University of AU, Central Ethiopia
– There are a total of 8,000 students
– We want to select a sample of 700 students
– In this case, we assume homogeneity with respect to age at first sex
– The list of student IDs can be taken as the sampling frame
– Hence we can use computer-generated random numbers to select 700
students randomly
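Drawing such a sample by computer might look like the sketch below. It assumes the student IDs can be represented as the integers 1 to 8,000, which is a simplification; the function name and the seed value are illustrative.

```python
import random


def simple_random_sample(population_size: int, sample_size: int, seed=None):
    """Draw a simple random sample of IDs (without replacement) from 1..population_size."""
    rng = random.Random(seed)                  # a seed makes the draw reproducible
    frame = range(1, population_size + 1)      # sampling frame of student IDs
    return sorted(rng.sample(frame, sample_size))


selected_ids = simple_random_sample(8000, 700, seed=2021)
print(len(selected_ids), "students selected; first five IDs:", selected_ids[:5])
```

Every student has the same chance (700/8000) of being selected, and no ID can be drawn twice.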

Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the
total population size and n is the sample size).
2. Determine the sampling interval (K) by dividing the number of
units in the population by the desired sample size: K = N/n, where K = sampling
interval, N = population size, n = sample size
3. Draw a random number between one and K. This number is called the
random start and would be the first number included in your sample.
– Let the selected number be j
4. Select every Kth unit after that first number: j, j+K, j+2K, j+3K, …, j+(n−1)K

EXAMPLE
• A systematic sample is to be selected from 1200 students of a school.
The sample size selected is 100. The sampling interval (skip interval) is
k = 1200/100 = 12
• The number of the first student to be included in the sample is chosen
randomly, for example by blindly picking one out of twelve pieces of
paper, numbered 1-12.
• If number 6 is picked, then every twelfth student will be included in the
sample, starting with student number 6, until 100 students are selected:
the numbers selected would be 6, 18, 30, 42, etc.
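The selection in this example can be sketched in Python. The function name `systematic_sample` is hypothetical, and the sketch assumes N is an exact multiple of n, as it is here (1200/100 = 12).

```python
def systematic_sample(population_size: int, sample_size: int, start: int):
    """Select every k-th unit beginning at a random start between 1 and k."""
    k = population_size // sample_size         # sampling interval
    if not 1 <= start <= k:
        raise ValueError("random start must lie between 1 and k")
    return [start + i * k for i in range(sample_size)]


# Random start 6 as in the example; in practice draw it with random.randint(1, k).
selected = systematic_sample(1200, 100, start=6)
print(selected[:4], "...", selected[-1])
```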

Stratified sampling …
The procedures are:
– Divide the total population into different homogeneous
subgroups (strata)
– Allocate a sample to each stratum (ni)
• Proportional allocation: ni = Ni(n/N), where
ni = sample size for each stratum
Ni = total population of each stratum
n = required sample size
N = total population
• Disproportional (equal) allocation is sometimes also
possible

Example
• A survey is conducted on household water supply in a district comprising
10,000 households, of which 40% are urban and 60% rural
• It is suspected that in urban areas the access to safe water sources is much
more satisfactory. The total number of households in the district is 10,000 (urban = 4,000
and rural = 6,000). The sample size required has been decided to be 300
• Allocate the sample proportionally to both strata:
n_urban = 4000 × 300 / 10,000 = 120
n_rural = 6000 × 300 / 10,000 = 180
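Proportional allocation is a one-line calculation per stratum. The sketch below uses the stratum sizes from the calculation above (4,000 urban and 6,000 rural); the function name `proportional_allocation` is an assumption.

```python
def proportional_allocation(strata_sizes: dict, total_sample: int) -> dict:
    """Allocate the sample to strata in proportion to stratum size: n_i = N_i * n / N."""
    population = sum(strata_sizes.values())
    return {name: size * total_sample / population for name, size in strata_sizes.items()}


allocation = proportional_allocation({"urban": 4000, "rural": 6000}, total_sample=300)
print(allocation)
```

When the result is not a whole number, the stratum sample sizes have to be rounded so that they still add up to the required total.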

Steps in cluster sampling
• The reference population (homogeneous) is divided into clusters.
– These clusters are often geographic units (e.g. districts, villages, etc.).
• A number of clusters are selected randomly to represent the total population,
and then all units within selected clusters are included in the sample.
• No units from non-selected clusters are included in the sample—they are
represented by those selected clusters
– This differs from stratified sampling, where some units are selected from
each group
– All the units in the selected clusters are studied

Example
• In a study of knowledge, attitudes, and practices related to family planning in
rural communities of a region, a list is made of all the villages.
• Using this list, a random sample of villages is chosen and all the adults in the
selected villages are interviewed

Multi-stage sampling
• In a study of utilization of pit latrines in a district, 150 homesteads are to be
visited for interviews with family members as well as for observations on
types and cleanliness of latrines.
• The district is composed of six wards and each ward has between six and
nine villages.
• The following four stage sampling procedure could be performed:
– Select three wards out of the six by simple random sampling
– For each ward, select five villages by simple random sampling (15 villages
in total)

– For each village, select ten households. Because simply choosing
households in the center of the village would produce a biased
sample, the following systematic sampling procedure is
proposed:
– Go to the center of the village
– Choose a direction in random way
– Walk in the chosen direction and select every third or every fifth
household (depending on the size of the village) until you have the ten you
need.

PROBLEM
A population of cancer patients has a survival-time standard
deviation of 43.4 months. If one wants to conduct a
study on this population, how large a sample is
needed so that 95% of the sample means of this size will
be within ±6 months of the population mean? Population
size is 480 patients. (85)

• In a survey of school children to determine the proportion of
children immunized against polio, an investigator set the
maximum discrepancy between the sample and population proportions of
immunized children at 0.04, at a confidence level of 99%. Further, the
investigator knew from a similar community that the prevalence among
children was 90%, and the total
population of school children is 800.

• The mean weight of 100 children who are 5 years old in a certain
locality is found to be 14 kg. A clinician wants to estimate the mean
weight of all the children in that locality with a 95% confidence
interval, given that the SD for all children is known to be 4 kg.

• Suppose a survey is conducted on a representative
sample of 900 newborn babies in A/A and it is
found that their average weight at birth is 3.5 kg
with an SD of 0.5 kg. Estimate the mean weight of newborn
babies in A/A at the 95% level of confidence.

• A sample of 20 houses was studied to estimate the
mean sprayable area of a house for controlling a
malaria epidemic. The sample mean was 22.9 m², with an SD of
6.0 m². Construct a CI for the mean sprayable area of
the population with 95% confidence.

• A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.

• In a clinical trial for a new drug to treat hypertension, N1 = 50
patients were randomly assigned to receive the new drug, and N2 =
50 patients to receive a placebo. 34 of the patients receiving the drug
showed improvement, while 15 of those receiving placebo showed
improvement.
– Compute a 95% CI estimate for the difference between proportions
improved.

• A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that the
mean age of the population is not 30? The variance is known
to be 20. Let CL = .95.
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
– Simple random sample
– Normally distributed population

• A simple random sample of 14 people from a certain
population gives a sample mean body mass index (BMI)
of 30.5 and SD of 10.64. Can we conclude that the mean BMI
is not 35 at α = 5%?

• The mean SUA levels of 12 individuals with Down’s
syndrome and 15 normal individuals are 4.5 and 3.4
mg/100 ml, respectively, with variances σ1² = 1 and
σ2² = 1.5, respectively. Is there a difference between the
means of both groups at α = 5%?

• We wish to know if we may conclude, at the 95%
confidence level, that smokers, in general, have
greater lung damage than do non-smokers.

• In the general population of 0 to 4-year-olds, the annual
incidence of asthma is 1.4%. If 10 cases of asthma are
observed over a single year in a sample of 500 children
whose mothers smoke, can we
conclude that this is different from the underlying
probability of p0 = 0.014 (or p = 1.4%)? CL = 95%

• Among the 225 students who ate the sandwiches, 109 became ill,
while among the 38 students who did not eat the sandwiches, 4
became ill. Is there a significant difference between the two
groups at α = 5%?
