Confidence Intervals
• We can also put confidence intervals (CIs) around other sample statistics, not just the sample mean
• The CI conveys the precision of the sample statistic:
100(1-α)% CI: estimate ± k × (standard error)
• So the CI equals our estimate plus or minus some multiple of the standard error
• A 95% CI means that if we drew many samples of size n, about 95 out of every 100 of the resulting CIs would contain the population parameter
• Also common: 90% or 99% CIs
Confidence Intervals
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
The width of the confidence interval depends on:
• The variation in the population (i.e. the standard deviation σ, which is fixed but possibly unknown): the more variation, the wider the CI
• The sample size n: the larger the sample, the narrower the CI
• The confidence level: a 99% CI is wider than a 95% CI because we have to be "more sure" of capturing the true value
Confidence Intervals
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$, where the standard error is $\frac{\sigma}{\sqrt{n}}$
• Why does the sample size n have to be large?
• CIs are valid when the central limit theorem (CLT) holds
• The sample size n affects the standard error
• Remember that a fraction with a fixed numerator gets closer to zero as its denominator gets larger
• As $n \to \infty$ the SE $\frac{\sigma}{\sqrt{n}}$ gets closer and closer to 0, so the CI becomes narrower
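A small numerical sketch of how the standard error σ/√n behaves as n grows. The value σ = 20.2 is borrowed from the age-of-death example later in the deck, and the function name `standard_error` is only illustrative.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 20.2  # assumed population SD (taken from the later example)
for n in (10, 100, 1000, 10000):
    # The SE shrinks toward zero as the sample size increases
    print(n, round(standard_error(sigma, n), 3))
```

Each tenfold increase in n divides the SE by about 3.16 (the square root of 10), which is why the CI narrows slowly.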
Calculating Confidence Intervals
4
a) Single Mean
b) Difference between means - independent samples
c) Difference between means - dependent samples
d) Single Proportion
e) Difference between proportions - independent samples
Calculating Confidence Intervals
a) CI for a Single Mean (known σ)
• If the sample size is large and the population SD is known...
• The CI formula for a single mean is:
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
$\bar{x}$ is the sample mean
σ is the population standard deviation
n is the sample size
Z is a cutoff from the standard normal distribution
90% CI: Z = 1.64
95% CI: Z = 1.96
99% CI: Z = 2.58
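The Z cutoffs above can be recovered from the standard normal quantile function. This is a minimal sketch using Python's standard library; `z_cutoff` is a hypothetical helper name, not something from the deck.

```python
from statistics import NormalDist

def z_cutoff(confidence: float) -> float:
    """Two-sided standard normal cutoff for a given confidence level."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail, so take the (1 - alpha/2) quantile
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_cutoff(level), 3))
```

The 90% cutoff comes out as 1.645 to three decimals, which the slide shows as 1.64.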
Calculating Confidence Intervals
a) CI for a Single Mean (known σ) EXAMPLE
• We want to estimate the age of death for the US population
• We are told the population SD is 20.2 years
• In a sample of 100 people we calculate a mean age of death of 72.1 years
• A 95% confidence interval for the mean age of death is:
$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
where:
$\bar{x}$ = 72.1
σ = 20.2
n = 100
Z = 1.96
$72.1 + 1.96 \times \frac{20.2}{\sqrt{100}} \approx 76.06$
$72.1 - 1.96 \times \frac{20.2}{\sqrt{100}} \approx 68.14$
95% CI: (68.14, 76.06)
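The arithmetic in this example can be checked with a few lines of Python. The function name `z_interval` is an illustrative choice; the inputs are the values given on the slide.

```python
import math

def z_interval(mean: float, sigma: float, n: int, z: float = 1.96):
    """CI for a single mean when the population SD is known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

low, high = z_interval(mean=72.1, sigma=20.2, n=100)
print(round(low, 2), round(high, 2))
```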
CI for the difference between population means (normally distributed)
• Known variances (2 independent samples)
A 100(1-α)% CI for μ1 - μ2 is
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Calculating Confidence Intervals
a) CI for a Single Mean (unknown σ)
• Usually the population SD is not known
• We use the sample standard deviation s instead
• Use the t-distribution instead of the standard normal
• The CI formula for a single mean is:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
$\bar{x}$ is the sample mean
s is the sample standard deviation
n is the sample size
t is a cut-off from the t-distribution with n - 1 degrees of freedom
Calculating Confidence Intervals
Student's t-distribution
• Similar to the normal distribution but with fatter tails
• Accounts for the extra uncertainty from not knowing σ
• A family of curves indexed by the degrees of freedom; the cut-off t used in a CI is determined by two quantities:
– significance level α
– degrees of freedom df
• The degrees of freedom is n - 1 because one piece of information (the sample mean) has already been used in estimating s
Calculating Confidence Intervals
Student's t-distribution
• Calculate cut-off values in Stata using the command:
. display invttail(df, p)
where df = n - 1 and p is the area in the right tail
• For a 95% CI use df = n - 1 and p = 0.025
• Example: if n = 100 the t-value for a 95% CI would be
. display invttail(99, 0.025)
1.984217
Calculating Confidence Intervals
a) CI for a Single Mean (unknown σ) EXAMPLE
Sample summary: n = 462, $\bar{x}$ = 87.93723, s = 16.00469
Using the formula:
$\bar{x} \pm t \frac{s}{\sqrt{n}}$
The t-value has df = n - 1 = 462 - 1 = 461 and p = 0.025
. display invttail(461,0.025)
1.9651232
. display 87.93723 - 1.9651232 * 16.00469 / sqrt(462)
86.473988
. display 87.93723 + 1.9651232 * 16.00469 / sqrt(462)
89.400472
The 95% CI for mean zinc is (86.5, 89.4)
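The same t-based interval can be reproduced outside Stata. In this sketch the t cut-off is passed in as a number (1.9651232, the `invttail(461, 0.025)` value quoted on the slide) because Python's standard library has no t quantile function; `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(mean: float, sd: float, n: int, t_cutoff: float):
    """CI for a single mean using the sample SD and a supplied t cut-off."""
    half_width = t_cutoff * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Zinc example: n = 462, mean = 87.93723, s = 16.00469, df = 461
low, high = t_interval(87.93723, 16.00469, 462, t_cutoff=1.9651232)
print(round(low, 1), round(high, 1))
```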
Calculating Confidence Intervals
a) CI for a Single Mean (unknown σ) EXAMPLE
Using Stata:
cii means n mean sd    (from the summary statistics n, mean and sd)
ci means varname       (from a variable in the dataset)
Calculating Confidence Intervals
Effect of confidence level on width of CI:
95% CI is the default (α = 0.05)
Use the level() option to change the confidence level to 99% (α = 0.01)
The CI becomes WIDER
95% CI: [86.4 ... 89.4]
99% CI: [86.0 ... 89.9]
Calculating Confidence Intervals
Effect of sample size on width of CI:
If we increase the sample size from 31 to 131
The standard error decreases and...
The CI becomes NARROWER
n = 31: [145.4 ... 149.8]
n = 131: [145.6 ... 148.6]
Calculating Confidence Intervals
Effect of SD on width of CI:
If the standard deviation increases from 6 to 10
The standard error increases and...
The CI becomes WIDER
s = 6: [145.4 ... 148.8]
s = 10: [143.9 ... 151.3]
Calculating Confidence
Intervals
Comparing Z and t:
• For large n, the t-distribution approximates the normal
• Suppose the sample mean age of death is $\bar{x}$ = 72.1 yrs and that the sample SD is equal to the population SD (s = σ = 20.2 yrs)
CI formula | Large sample size (n=100) 95% CI | Small sample size (n=10) 95% CI
Normal dist., $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$ | Z=1.96: $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{100}}$ ≈ (68.14, 76.06) | Z=1.96: $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{10}}$ ≈ (59.58, 84.62)
t-dist., $\bar{x} \pm t \frac{s}{\sqrt{n}}$ | t=1.98: $72.1 \pm 1.98 \times \frac{20.2}{\sqrt{100}}$ ≈ (68.10, 76.10) | t=2.26: $72.1 \pm 2.26 \times \frac{20.2}{\sqrt{10}}$ ≈ (57.66, 86.54)
Summary | Similar CI (large n) | t has wider CI than Z (small n)
Calculating Confidence Intervals
b) CI for a difference in means (independent samples)
Suppose we have two independent groups of data and calculate a sample mean and sample SD for each. The CI formula is:
$(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Where:
$\bar{x}_1$ and $\bar{x}_2$ are the sample means
$n_1$ and $n_2$ are the sample sizes
t is a cut-off from the t-distribution with df = $n_1 + n_2 - 2$
$s_p^2$ is the pooled variance: $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
where $s_1$ and $s_2$ are the sample standard deviations
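A sketch of the pooled-variance calculation. The inputs (n1 = 12, mean 13.21, SD 1.05; n2 = 9, mean 11, SD 1.01) are the numbers used in the `ttesti` exercise later in the deck; the cut-off 2.093024 is assumed to be the two-sided 95% t value for 19 degrees of freedom, and `pooled_interval` is a hypothetical name.

```python
import math

def pooled_interval(n1, mean1, sd1, n2, mean2, sd2, t_cutoff):
    """CI for a difference in means assuming equal population variances."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - t_cutoff * se, diff + t_cutoff * se, se

low, high, se = pooled_interval(12, 13.21, 1.05, 9, 11, 1.01, t_cutoff=2.093024)
print(round(se, 6), round(low, 3), round(high, 3))
```

With these inputs the standard error and interval should agree with the `diff` row of the Stata output shown in that exercise.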
Calculating Confidence Intervals
18
b) CI for a difference in means (independent
samples)
Assumptions:
1. The population standard deviations are approximately equal.
We check this by comparing the sample standard deviations.
2. For small sample sizes (say n<100) the population distribution should approximately follow the normal distribution.
This is checked by assessing the distribution of the sample data for normality. The assumption is fairly robust, in that the formula is valid as long as the distribution of data in the sample is approximately mound shaped and symmetrical.
3. The 2 groups are independent.
4. The subjects within the 2 groups are independent.
Calculating Confidence Intervals
19
b) CI for a difference in means (independent
samples)
Example using Stata command:
ttesti n1 mean1 sd1 n2 mean2 sd2
Calculating Confidence Intervals
20
b) CI for a difference in means (independent
samples)
ttesti assumes the population variances are equal by default
Use the unequal option if s1 and s2 are not similar (the df is affected)
Calculating Confidence
Intervals
21
b) CI for a difference in means (independent
samples)
Suppose we want a CI for the difference in mean height between men
and women (assuming independence and normality holdā€¦)
ttest varname, by(groupvar)    (assumes approx. equal SDs)
Calculating Confidence
Intervals
22
c) CI for a difference in means (dependent
samples)
• Dependent samples occur with two groups of paired or matched data
• Usually equal sample sizes in the 2 groups (1:1), e.g.
– patient blood pressure before and after a treatment
– patient left leg and right leg measurements
– two groups where pairs of people have been matched on important demographics (age, sex, etc.)
Calculating Confidence Intervals
c) CI for a difference in means (dependent samples)
1. Calculate the pair differences d
e.g. for each patient, d = BP_after - BP_before
2. Find the mean $\bar{d}$ and standard deviation $s_d$ of the pair differences
3. The CI for the mean pair difference is:
$\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
where n is the number of pairs and t has df = n - 1
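The three steps for paired data can be sketched as follows. The blood pressure readings are made-up illustrative numbers, not data from the deck, and the cut-off 2.776 is the usual table value for a 95% CI with 4 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical before/after readings for 5 patients (illustration only)
before = [140, 152, 138, 145, 150]
after = [135, 147, 136, 140, 143]

# Step 1: pair differences d = after - before
diffs = [a - b for a, b in zip(after, before)]

# Step 2: mean and sample SD of the differences
mean_d = mean(diffs)
sd_d = stdev(diffs)

# Step 3: CI for the mean difference, df = n - 1 = 4
n = len(diffs)
t_cutoff = 2.776  # assumed table value for df = 4, 95% CI
se = sd_d / math.sqrt(n)
low, high = mean_d - t_cutoff * se, mean_d + t_cutoff * se
print(mean_d, round(low, 2), round(high, 2))
```

Because the interval is built from the differences, the two original columns never enter the standard error separately.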
Calculating Confidence Intervals
24
c) CI for a difference in means (dependent
samples) EXAMPLE
ā€¢ The heartrates of 20 patients
before and after a treatment
ā€¢ Want a 95% CI for the difference in
mean heartrate
Calculating Confidence Intervals
25
c) CI for a difference in means (dependent
samples) EXAMPLE
First, calculate the differences
Then use the formula $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
The 95% CI is: (1.5, 11.2)
Calculating Confidence
Intervals
26
c) CI for a difference in means (dependent
samples) EXAMPLE
OR using any of these Stata commands...
ttesti n mean_d sd_d 0    (one-sample form applied to the pair differences)
Calculating Confidence Intervals
d) CI for a single proportion
• Suppose we have a population of subjects and some of them have a characteristic of interest and the rest don't
– e.g. being female, having a cancer diagnosis, having survived
• We want to estimate the true proportion p who have the characteristic of interest
• If r is the number of sample subjects that have the characteristic and n is the sample size then the sample proportion is:
$\hat{p} = \frac{r}{n}$
Calculating Confidence Intervals
d) CI for a single proportion
• If n is large enough and p is not too extreme, then the sampling distribution of $\hat{p}$ is approximately normal (CLT)
• The standard error of a proportion is:
$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• The CI formula for a single proportion is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where Z is a standard normal cut-off (95% CI: Z=1.96)
Calculating Confidence Intervals
d) CI for a single proportion
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
• This formula assumes the rule of thumb:
$n\hat{p}$ and $n(1 - \hat{p})$ must both be greater than 5
• (n must be large enough too)
• Otherwise, the formula is not valid and we'd have to use exact binomial values instead of Z cut-offs
Calculating Confidence Intervals
d) CI for a single proportion EXAMPLE
• Consider the 5-yr survival for lung cancer patients (Pagano p328).
• We want to estimate the proportion p who survive 5 yrs since diagnosis (dx).
• In a random sample of n=52 patients only r=6 survive 5 yrs ($\hat{p}$ = r/n = 6/52 ≈ 0.12).
• Check: $n\hat{p}$ ≈ 6.24 and $n(1 - \hat{p})$ ≈ 45.76, so the rule of thumb holds
• A 95% CI for p is:
$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
95% CI: (0.03, 0.21)
So between 3% and 21% of patients (ptx) with lung cancer survive 5 yrs after dx.
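A sketch of the normal-approximation interval and the rule-of-thumb check for this example. The helper names are illustrative. Note that the code uses the unrounded $\hat{p}$ = 6/52, so the upper limit comes out near 0.20, whereas the slide's 0.21 follows from rounding $\hat{p}$ to 0.12 first.

```python
import math

def rule_of_thumb_ok(r: int, n: int) -> bool:
    """Check that n*p_hat and n*(1 - p_hat) both exceed 5."""
    p_hat = r / n
    return n * p_hat > 5 and n * (1 - p_hat) > 5

def proportion_interval(r: int, n: int, z: float = 1.96):
    """Normal-approximation CI for a single proportion."""
    p_hat = r / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

low, high = proportion_interval(r=6, n=52)
print(rule_of_thumb_ok(6, 52), round(low, 2), round(high, 2))
```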
Calculating Confidence Intervals
31
d) CI for a single proportion EXAMPLE
ā€¢ In a random sample of n=52 patients only r=6 survive 5 yrs
Using Stata to find the 95% CI:
cii proportions n r
Why is this answer different to the one we calculated with the formula?
Stata uses exact binomial values instead of the normal approximation
to the binomial
Calculating Confidence Intervals
Similarly to the CI for a single mean, the width of the CI for a single proportion is affected by:
• The sample size n
– increasing the sample size makes the CI narrower/more precise, i.e. small samples have wider CI/less precision
– the standard error decreases as n increases
• The confidence level
– A 99% CI is wider than a 95% CI, which is wider than a 90% CI
Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples)
• One group has sample proportion $\hat{p}_1$ and sample size $n_1$
• Second group has sample proportion $\hat{p}_2$ and sample size $n_2$
• The CI formula is:
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The RHS looks complicated but really it's just Z times the standard error for $\hat{p}_1 - \hat{p}_2$, and the variances of independent estimates add:
$SE = \sqrt{var(\hat{p}_1) + var(\hat{p}_2)}$
Calculating Confidence Intervals
e) CI for a difference in proportions (independent samples)
$(\hat{p}_1 - \hat{p}_2) \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
• The formula is only valid for large samples and not too extreme values of $p_1 - p_2$
• The rule of thumb is: if $\hat{p} = \frac{r_1 + r_2}{n_1 + n_2}$, then
$n_1\hat{p}$ and $n_1(1 - \hat{p})$ must both be greater than 5
and
$n_2\hat{p}$ and $n_2(1 - \hat{p})$ must both be greater than 5
• You need to be able to check the rule of thumb and use the CI formula (we will calculate the CI using Stata)
Calculating Confidence Intervals
35
e) CI for a difference in proportions (independent samples)
EXAMPLE
Stata command:
prtesti n1 r1 n2 r2, count
n1 = 100, r1 = 80
n2 = 100, r2 = 50
Calculating Confidence Intervals
36
e) CI for a difference in proportions (independent
samples) EXAMPLE
The difference in the proportion of patients (ptx) who had pain relief by surgery and by meds was between 17% and 43% (a higher % of surgery ptx had pain relief compared to med ptx)
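The 17% to 43% interval can be reproduced by hand from the counts given with the `prtesti` command (80 of 100 versus 50 of 100). This is a sketch of the normal-approximation formula only; `diff_proportion_interval` is a hypothetical name.

```python
import math

def diff_proportion_interval(r1, n1, r2, n2, z=1.96):
    """Normal-approximation CI for p1 - p2 with independent samples."""
    p1, p2 = r1 / n1, r2 / n2
    # Variances of the two independent sample proportions add
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_proportion_interval(80, 100, 50, 100)
print(round(low, 2), round(high, 2))
```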
One more thing...
• Some Stata commands used in the lectures/tutes and Modules are different in previous versions of Stata:
If you are using Stata 14 (the latest version) the CI commands are:
cii means n mean sd
ci means varname
cii proportions n r
If you are using an older version of Stata the CI commands are:
cii n mean sd
ci varname
cii n r
Exercise
And use Stata's cii command to produce the 95% confidence interval:
Exercise
invttail(19,0.005)
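Confirm using ttest in Stata:
. ttesti 12 13.21 1.05 9 11 1.01, level(95)

Two-sample t test with equal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |      12       13.21    .3031089        1.05    12.54286    13.87714
       y |       9          11    .3366667        1.01    10.22365    11.77635
---------+--------------------------------------------------------------------
combined |      21    12.26286     .328802     1.50676    11.57699    12.94873
---------+--------------------------------------------------------------------
    diff |                2.21     .455663                1.256286    3.163714
------------------------------------------------------------------------------
    diff = mean(x) - mean(y)                                      t =   4.8501
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0001          Pr(T > t) = 0.0001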
Also see the results of:
ttesti 12 13.21 1.05 9 11 1.01, level(90)
ttesti 12 13.21 1.05 9 11 1.01, level(99)
Exercise 4.3
Exercise
Also note that:
Some Operations on Events
1/4/2021
• Let A and B be two events defined on the sample space Ω.
1. Union of Two Events: (A∪B)
– The event A∪B consists of all outcomes in A or in B or in both A and B.
– The event A∪B occurs if A occurs, or B occurs, or both A and B occur.
2. Intersection of Two Events: (A∩B)
– The event A∩B consists of all outcomes in both A and B.
– The event A∩B occurs if both A and B occur.
3. Complement of an Event: $\bar{A}$ or (A^C) or (A′)
– The complement of the event A is denoted by $\bar{A}$.
– The event $\bar{A}$ consists of all outcomes of Ω that are not in A.
– The event $\bar{A}$ occurs if A does not.
Example: (Classical Probability)
• Experiment: Selecting a patient randomly from a hospital room having six beds numbered 1, 2, 3, 4, 5, and 6.
• Define the following events (as the descriptions below imply): E1 = selecting an even number, E2 = selecting a number less than 4, E4 = selecting an odd number.
1) E1∪E2 = {1, 2, 3, 4, 6} = selecting an even number or a number less than 4.
2) E1∪E4 = {1, 2, 3, 4, 5, 6} = Ω = selecting an even number or an odd number.
• It can be shown that E1∪E4 = Ω, where E1 and E4 are called exhaustive events.
• The union of these events gives the whole sample space.
3) E1∩E2 = {2} = selecting an even number and a number less than 4.
4) E1∩E4 = ∅ = selecting an even number and an odd number.
• In this case, E1 and E4 are called disjoint (or mutually exclusive) events.
• These kinds of events cannot occur simultaneously (together at the same time).
5) The complement of E1
• Mutually Exclusive (Disjoint) Events
– The events A and B are disjoint (or mutually exclusive) if A∩B = ∅. In that case:
I. P(A∩B) = 0
II. P(A∪B) = P(A) + P(B)
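The event operations in the six-bed example map directly onto Python sets. The definitions of E1, E2 and E4 below are inferred from the verbal descriptions on the slides (even number, number less than 4, odd number) and should be read as an assumption.

```python
omega = {1, 2, 3, 4, 5, 6}                 # sample space: bed numbers

e1 = {x for x in omega if x % 2 == 0}      # even number (assumed definition)
e2 = {x for x in omega if x < 4}           # number less than 4 (assumed)
e4 = {x for x in omega if x % 2 == 1}      # odd number (assumed)

print(e1 | e2)        # union of E1 and E2
print(e1 | e4)        # union of E1 and E4: exhaustive, equals omega
print(e1 & e2)        # intersection of E1 and E2
print(e1 & e4)        # intersection of E1 and E4: disjoint, empty set
print(omega - e1)     # complement of E1

# Classical probability: favourable outcomes / total outcomes
def prob(event):
    return len(event) / len(omega)

# For disjoint events the probabilities add
print(prob(e1 | e4) == prob(e1) + prob(e4))
```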
• Exhaustive Events
General Probability Rules
The Addition Rule
Marginal Probability
• Given some variable that can be broken down into m categories designated by A1, A2, ..., Am and another jointly occurring variable that is broken down into s categories designated by B1, B2, ..., Bs
• The marginal probability of Ai, P(Ai), is equal to the sum of the joint probabilities of Ai with all categories of B.
• That is, $P(A_i) = \sum_{j=1}^{s} P(A_i \cap B_j)$
Example: Relative Frequency or Empirical
• Let us consider a bivariate table for variables A and B.
• There are three categories for both the variables: A1, A2, and A3 for A and B1, B2, and B3 for B
Joint frequency distribution for m categories of A and s categories of B
• Joint probability distribution for m categories of A and s categories of B
• Number of elements in each cell
Probabilities of events
Applications of Relative Frequency (Empirical Probability)
• Let us consider hypothetical data on four types of diseases of 200 patients from a hospital as shown below:
• Experiment: Selecting a patient at random and observing his/her disease type. The total number of trials (the sample size) in this case is n = 200
Disease type       | A  | B  | C  | D  | Total
Number of patients | 90 | 80 | 20 | 10 | 200
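The empirical probabilities for the disease table are just relative frequencies. A minimal sketch using the counts from the table; the variable names are illustrative.

```python
counts = {"A": 90, "B": 80, "C": 20, "D": 10}
total = sum(counts.values())  # 200 patients

# Relative frequency of each disease type
probs = {disease: n / total for disease, n in counts.items()}
print(probs)

# Disease types are mutually exclusive here, so probabilities add
p_a_or_b = probs["A"] + probs["B"]
print(p_a_or_b)
```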
Conditional Probability
• The conditional probability of the event A when we know that the event B has already occurred is defined by
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided P(B) ≠ 0
Multiplication Rules of Probability
Let us consider a hypothetical set of data on 600 adult males classified by their ages and smoking habits as summarized
Consider the following event:
(B1|A2) = smokes daily given that age is between 30 and 39
• Two-way table displaying number of respondents by age and smoking habit
Binomial Distribution
• Example:
– Suppose all birth records for a calendar year show that 85.8% of the pregnancies had delivery in week 37 or later.
– The 85.8% is interpreted as the probability of a recorded birth in week 37 or later
– If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for full-term births?
• Let us designate the occurrence of a record for a full-term birth (F) as a "success" and hasten to add that a premature birth (P) is not a failure
• It will also be convenient to assign the number 1 to a success and the number 0 to a failure (record of a premature birth).
• Suppose the five birth records selected resulted in this sequence of full-term (F) and premature (P) births:
FPFFP
• In coded form we would write this as
10110
$P(1, 0, 1, 1, 0) = pqppq = q^2p^3$
• Three successes and two failures could occur in any one of the following additional sequences as well:
From the addition rule we know that this probability is equal to the sum of the individual probabilities. In the present example we need to sum the 10 $q^2p^3$'s or, equivalently, multiply $q^2p^3$ by 10.
• Answer to the original question: since in the population p = 0.858 and q = (1 - p) = (1 - 0.858) = 0.142,
$10q^2p^3 = 10(0.142)^2(0.858)^3 = 10(0.0202)(0.6316) = 0.1276$
Combinations
• A combination of n objects taken x at a time is an unordered subset of x of the n objects
• Combination is used in large sample procedures
• The number of combinations of n objects that can be formed by taking x of them at a time is given by
$_nC_x = \frac{n!}{x!(n - x)!}$
• where x!, read x factorial, is the product of all the whole numbers from x down to 1. That is, x! = x(x-1)(x-2)...(1). We note that, by definition, 0! = 1.
Binomial Distribution
• Let us return to our example in which we have a sample of n = 5 birth records and we are interested in finding the probability that three of them will be for full-term births
$_5C_3 = \frac{5!}{3!(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{120}{12} = 10$
• In our example we let x = 3, the number of successes, so that n - x = 2, the number of failures. We then may write the probability of obtaining exactly x successes in n trials as
$f(x) = {}_nC_x\, q^{n-x} p^x = {}_nC_x\, p^x q^{n-x}$, for x = 0, 1, 2, ..., n
• The Binomial Parameters
– The binomial distribution has two parameters, n and p
– μ = np
– σ² = np(1-p) = npq
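The binomial formula can be checked numerically for the birth-records example (n = 5, x = 3, p = 0.858). This sketch uses `math.comb` for the number of combinations; `binomial_pmf` is an illustrative name. Computed without rounding the intermediate factors, the probability is about 0.1274, while the slide's 0.1276 reflects rounding 0.142² and 0.858³ before multiplying.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.858
prob = binomial_pmf(3, n, p)
print(comb(5, 3), round(prob, 4))

# Mean and variance of the binomial distribution
mean = n * p
variance = n * p * (1 - p)
print(mean, round(variance, 4))
```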
Poisson Distribution
• Used to model a discrete random variable representing the number of occurrences or counts of some random events in an interval of time or space (or some volume of matter)
• The possible values of X = x are x = 0, 1, 2, 3, ...
• The discrete random variable X is said to have a Poisson distribution with parameter (mean) λ if the probability distribution of X is given by
$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$
• where e = 2.71828 (the base of the natural logarithm).
• λ (lambda) is the parameter of the distribution and is the average number of occurrences of the random event in the interval
• The Poisson Process
– The occurrences of the events are independent
– The probability of a single occurrence of the event in a given interval is proportional to the length of the interval
– In any infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible
Example
– In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, the occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per year in Norway
– Find the probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis
$f(3) = \frac{e^{-12}\,12^3}{3!} = 0.00177$
• What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
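Both anaphylaxis questions can be worked through with the Poisson formula. The sketch below uses λ = 12 from the slide and obtains "at least three" through the complement, 1 - P(X ≤ 2); the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x events when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 12  # incidents per year, from the example

exactly_three = poisson_pmf(3, lam)
# P(X >= 3) = 1 - [P(0) + P(1) + P(2)]
at_least_three = 1 - sum(poisson_pmf(x, lam) for x in range(3))
print(round(exactly_three, 5), round(at_least_three, 4))
```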
Poisson Distribution
• Example: Suppose that the number of accidents per day in a city has a Poisson distribution with an average of 2 accidents.
1. What is the probability that in a day
I. the number of accidents will be 5,
II. the number of accidents will be less than 2?
2. What is the probability that there will be six accidents in 2 days?
3. What is the probability that there will be no accidents in an hour?
Solution to question 1:
I. $P(X = 5) = \frac{e^{-2}\,2^5}{5!} = 0.036089$
II. $P(X < 2) = P(X = 0) + P(X = 1) = \frac{e^{-2}\,2^0}{0!} + \frac{e^{-2}\,2^1}{1!} = 0.135335 + 0.270670 = 0.406005$
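A sketch covering all three accident questions. For questions 2 and 3 the mean is rescaled to the length of the interval, which assumes the Poisson rate is constant over time (λ = 4 for two days, λ = 2/24 for one hour); that rescaling is an assumption of this sketch, not a result stated on the slides.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x events when the mean is lam."""
    return exp(-lam) * lam**x / factorial(x)

daily_rate = 2  # average accidents per day

p_five = poisson_pmf(5, daily_rate)                               # question 1.I
p_less_than_two = poisson_pmf(0, daily_rate) + poisson_pmf(1, daily_rate)  # 1.II
p_six_in_two_days = poisson_pmf(6, daily_rate * 2)                # question 2
p_none_in_hour = poisson_pmf(0, daily_rate / 24)                  # question 3

print(round(p_five, 6), round(p_less_than_two, 6))
print(round(p_six_in_two_days, 4), round(p_none_in_hour, 4))
```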
Probability Distributions of Continuous data
• A non-negative function f(x) is called a probability distribution of the continuous random variable X if the total area bounded by its curve and the x-axis is equal to 1 and if the subarea under the curve bounded by the curve, the x-axis, and perpendiculars erected at any two points a and b gives the probability that X is between the points a and b.
• Also known as the probability density function
[Figure: graph of a continuous distribution showing the area between a and b.]
Normal distribution
• Known as the Gaussian distribution
• The normal density is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$, $-\infty < x < \infty$,
where e = 2.71828 and π = 3.14159.
• The parameters of the distribution are μ and σ²: X ~ N(μ, σ²).
Normal Distribution
• The density function of X, f(x), is a bell-shaped curve
– The highest point of the curve of f(x) is at the mean μ. Hence, the mode = mean = μ.
– The curve of f(x) is symmetric about the mean μ.
– In other words, mean = mode = median
– The area under the curve is 1
Standard Normal Distribution
• The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1
• Denoted by Normal(0,1) or N(0,1).
• The standard normal random variable is denoted by Z, and we write Z ~ N(0,1), where
$Z = \frac{x - \mu}{\sigma}$
• The equation for the standard normal density is written
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, $-\infty < z < \infty$
The z-transformation is useful in applications of the normal distribution
• A z-transformation that yields Z = 1 indicates that the value of x used in the transformation is 1 standard deviation above the mean μ.
• A value of Z = -1 indicates that the value of x used in the transformation is 1 standard deviation below the mean μ.
• Example:
– What is the probability that a z picked at random from the population of z's will have a value between -2.55 and 2.55?
Answer: P(-2.55 < z < 2.55) = 0.9946 - 0.0054 = 0.9892
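The table lookup in this example can be replaced by the standard normal CDF. A minimal sketch using `statistics.NormalDist` from Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

upper = z.cdf(2.55)    # area to the left of 2.55
lower = z.cdf(-2.55)   # area to the left of -2.55
prob = upper - lower   # area between -2.55 and 2.55
print(round(upper, 4), round(lower, 4), round(prob, 4))
```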
Application of normal distribution
• The normal distribution is not a law that is adhered to by all measurable characteristics occurring in nature
• However, many of these characteristics are approximately normally distributed
• Used to model the distribution of many variables that are of interest
• Allows us to make useful probability statements about some variables more conveniently than would be the case if some more complicated model had to be used
• Example:
– Suppose the weight of women of reproductive age follows a normal distribution with mean 49 kg and variance 25 kg²
a. Find the probability that a randomly chosen woman of reproductive age has weight less than 45 kg.
b. What is the percentage of women having weight less than 45 kg?
c. In a population of 20,000 women of reproductive age, how many would you expect to have weight less than 45 kg?
• Solution
– Here the random variable X = weight of women of reproductive age, population mean μ = 49 kg, population variance σ² = 25 kg², population standard deviation σ = 5 kg. Hence, X ~ Normal(49, 25).
a. The probability that a randomly chosen woman of reproductive age has weight less than 45 kg is P(X < 45) = P(Z < (45 - 49)/5) = P(Z < -0.80) = 0.2119
b. The percentage of women of reproductive age who have weight less than 45 kg is P(X < 45) × 100% = 0.2119 × 100% = 21.19%
c. In a population of 20,000 women of reproductive age, we would expect the number of women with weight less than 45 kg to be P(X < 45) × 20,000 = 0.2119 × 20,000 = 4238.
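The three parts of the weight example can be checked with a normal CDF. The sketch below uses `statistics.NormalDist` with μ = 49 and σ = 5. Because it keeps the unrounded probability, the expected count comes out near 4237 rather than the slide's 4238, which uses the rounded 0.2119.

```python
from statistics import NormalDist

weight = NormalDist(mu=49, sigma=5)   # X ~ Normal(49, 25), SD = 5 kg

z = (45 - 49) / 5                     # z-transformation of 45 kg
p_below_45 = weight.cdf(45)           # part a
percent_below_45 = p_below_45 * 100   # part b
expected_count = p_below_45 * 20_000  # part c

print(z, round(p_below_45, 4), round(percent_below_45, 2), round(expected_count))
```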
Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of collecting the information.
• Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
• Greater accuracy: Sampling may lead to better accuracy in collecting data
• Sampling error: Precise allowance can be made for sampling error
• Greater speed: Data can be collected and summarized more quickly
Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the population.
• Sampling may be inadvisable where every unit in the population is legally required to have a record.
Sampling technique
• There are two different approaches to sampling in survey research:
– Nonprobability sampling
– Probability sampling
Probability sampling methods
• A sample obtained in such a way that every member of the population has a known, non-zero probability of being included in the sample, i.e. it involves random selection of the sample
– Involves the selection of a sample from a population, based on chance
• Probability sampling is
– More complex
– More time-consuming
– Usually more costly than non-probability sampling
EXAMPLE OF SIMPLE RANDOM SAMPLING
Age at first sex and associated factors for early sexual initiation among students at University of AU, Central Ethiopia
– There are a total of 8,000 students
– We want to select a sample of 700 students
– In this case, we assume homogeneity with respect to age at first sex
– Their ID numbers can be taken as the sampling frame
– Hence we can use computer-generated random numbers to select 700 students randomly
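One way to draw such a sample by computer is sketched below. It assumes the sampling frame is simply the ID numbers 1 to 8,000 and fixes a seed (2021, an arbitrary choice) so the draw can be repeated.

```python
import random

rng = random.Random(2021)       # fixed seed so the draw is reproducible
student_ids = range(1, 8001)    # assumed frame: IDs 1..8000

# Simple random sample without replacement: every student has the
# same chance of selection
sample = rng.sample(student_ids, 700)
print(len(sample), sorted(sample)[:5])
```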
ļµ Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is the total population size and n is the sample size).
2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size: K = N/n, where K = sampling interval, N = population size, n = sample size.
3. Draw a random number between 1 and K. This number is called the random start and is the first number included in your sample.
– Let the selected number be j
4. Select every Kth unit after that first number: j, j+K, j+2K, j+3K, ..., j+(n-1)K
EXAMPLE
A systematic sample is to be selected from 1200 students of a school. The sample size selected is 100. The sampling interval (skip interval) is K = 1200/100 = 12
• The number of the first student to be included in the sample is chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1-12.
• If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected: the numbers selected would be 6, 18, 30, 42, etc.
Stratified sampling
The procedures are:
– Divide the total population into different homogeneous subgroups (strata)
– Allocate the sample to each stratum (n_i)
• Proportional allocation: $n_i = N_i \times \frac{n}{N}$, where
n_i = sample size for each stratum
N_i = total population of each stratum
n = required sample size
N = total population
• Disproportional (equal) allocation is sometimes also possible
Example
• A survey is conducted on household water supply in a district comprising 20,000 households, of which 20% are urban and 80% rural
• It is suspected that in urban areas the access to safe water sources is much more satisfactory. The total population of the district is 10,000 (urban = 4000 and rural = 6000). The sample size required has been decided to be 300
• Allocate the sample proportionally for both strata.
n_urban = 4000 × 300/10,000 = 120
n_rural = 6000 × 300/10,000 = 180
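Proportional allocation is a one-line calculation per stratum. The sketch below applies $n_i = N_i \times n / N$ to the urban and rural figures used in the calculation above; `proportional_allocation` is a hypothetical helper name.

```python
def proportional_allocation(strata_sizes: dict[str, int], n: int) -> dict[str, float]:
    """Allocate a total sample of size n in proportion to stratum sizes."""
    N = sum(strata_sizes.values())   # total population
    return {name: size * n / N for name, size in strata_sizes.items()}

allocation = proportional_allocation({"urban": 4000, "rural": 6000}, n=300)
print(allocation)
```

In practice the results may not be whole numbers, in which case they would need rounding so that the stratum samples still sum to n.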
ļµ Steps in cluster sampling
• The reference population (homogeneous) is divided into clusters.
– These clusters are often geographic units (e.g. districts, villages, etc.).
• A number of clusters are selected randomly to represent the total population, and then all units within selected clusters are included in the sample.
• No units from non-selected clusters are included in the sample; they are represented by those selected clusters
– This differs from stratified sampling, where some units are selected from each group
– All the units in the selected clusters are studied
Example
• In a study of knowledge, attitudes, and practices related to family planning in rural communities of a region, a list is made of all the villages.
• Using this list, a random sample of villages is chosen and all the adults in the selected villages are interviewed
Multi-stage sampling
◆ In a study of utilization of pit latrines in a district, 150 homesteads are to be
visited for interviews with family members as well as for observations on
types and cleanliness of latrines.
• The district is composed of six wards and each ward has between six and
nine villages.
• The following four-stage sampling procedure could be performed:
– Select three wards out of the six by simple random sampling
– For each ward, select five villages by simple random sampling (15 villages
in total)
– For each village, select ten households. Because simply choosing
households in the center of the village would produce a biased
sample, the following systematic sampling procedure is
proposed:
  – Go to the center of the village
  – Choose a direction at random
  – Walk in the chosen direction and select every third or every fifth
  household (depending on the size of the village) until you have the ten you
  need.
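The stages listed above could be sketched as follows. Everything about the data is an assumption made for illustration: `build_district` invents a district of six wards with six to nine villages each, the household lists stand in for the order in which households are met along the chosen direction, and the rule "every fifth household when a village has at least 50 households, otherwise every third" is a hypothetical reading of "depending on the size of the village".

```python
import random


def build_district(seed=0):
    """Hypothetical frame: 6 wards, 6-9 villages each, 30-80 households each."""
    rng = random.Random(seed)
    return {
        f"ward{w}": {
            f"ward{w}-village{v}": [
                f"ward{w}-village{v}-hh{h}" for h in range(rng.randint(30, 80))
            ]
            for v in range(rng.randint(6, 9))
        }
        for w in range(6)
    }


def multistage_sample(district, seed=None):
    rng = random.Random(seed)
    selected = []
    # Stage 1: three wards by simple random sampling.
    for ward in rng.sample(sorted(district), 3):
        villages = district[ward]
        # Stage 2: five villages per selected ward by simple random sampling.
        for village in rng.sample(sorted(villages), 5):
            households = villages[village]
            # Stage 3: systematic selection along the walk until ten are found.
            step = 5 if len(households) >= 50 else 3
            selected.extend(households[step - 1::step][:10])
    return selected


selected = multistage_sample(build_district(), seed=42)
print(len(selected))  # 150
```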
PROBLEM
◆ A population of cancer patients has a survival-time standard
deviation of 43.4 months. If one wants to conduct a
study on this population, how large a sample is
needed so that 95% of the sample means from samples of this size will
be within ±6 months of the population mean? The population
size is 480 patients. (85)
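One common way to approach this kind of problem is the formula $n_0 = (Z\sigma/d)^2$, followed by a finite population correction $n = n_0 / (1 + n_0/N)$. The sketch below assumes that approach; the function name `sample_size_for_mean` is illustrative, and other textbooks use slightly different forms of the correction.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95, population=None):
    """Sample size so the sample mean falls within +/- margin of the true mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z * sigma / margin) ** 2
    if population is not None:
        # Finite population correction (one common form; an assumption here).
        n = n / (1 + n / population)
    return math.ceil(n)


n_infinite = sample_size_for_mean(sigma=43.4, margin=6)
n_finite = sample_size_for_mean(sigma=43.4, margin=6, population=480)
print(n_infinite, n_finite)  # 201 142
```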
◆ In a survey of school children to determine the proportion of
children immunized against polio, an investigator set the
maximum discrepancy between the sample and population proportions
immunized at 0.04, at a confidence level of 99%. Further, the
investigator had previous knowledge that the prevalence among
children in a similar community was 90%, and the total
population of school children is 800. Determine the required sample size.
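A sketch for the proportion case, assuming the usual formula $n_0 = Z^2 p(1-p)/d^2$ and the same style of finite population correction, $n = n_0 / (1 + n_0/N)$. The function name is illustrative. The exact normal quantile is used here, so a hand calculation with the rounded table value Z = 2.58 can differ by one.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p, margin, confidence, population=None):
    """Sample size so the sample proportion falls within +/- margin of p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction (one common form; an assumption here).
        n = n / (1 + n / population)
    return math.ceil(n)


n_infinite = sample_size_for_proportion(p=0.90, margin=0.04, confidence=0.99)
n_finite = sample_size_for_proportion(
    p=0.90, margin=0.04, confidence=0.99, population=800
)
print(n_infinite, n_finite)  # 374 255
```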
◆ The mean weight of 100 children who are 5 years old in a certain
locality is found to be 14 kg. A clinician wants to estimate the mean
weight of all the children in that locality with a 95% confidence
interval, given that the SD for all children is known to be 4 kg.
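Because the population SD is given, this problem fits the known-σ formula $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$. A minimal sketch, with the hypothetical helper name `z_interval`:

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean when the population SD is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = z_interval(mean=14, sigma=4, n=100)
print(f"95% CI: ({low:.2f}, {high:.2f}) kg")  # 95% CI: (13.22, 14.78) kg
```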
◆ Suppose a survey is conducted on a representative
sample of 900 newborn babies in A/A, and it is
found that their average weight at birth is 3.5 kg
with an SD of 0.5 kg. Estimate the mean weight of newborn
babies in A/A at the 95% level of confidence.
◆ A sample of 20 houses was studied to estimate the
mean sprayable area of a house for the control of a
malaria epidemic. The result was a sample mean of 22.9 m² with an SD of
6.0 m². Construct a CI for the mean sprayable area of
the population with 95% confidence.
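With only 20 houses and a sample SD, the t-based formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$ applies. The sketch below takes the cut-off 2.093 (19 degrees of freedom, 2.5% in each tail) from a standard t-table rather than computing it, to keep the example free of third-party libraries; `t_interval` is an illustrative name.

```python
import math

T_975_DF19 = 2.093  # t-table cut-off for a 95% CI with 19 degrees of freedom


def t_interval(mean, sd, n, t_crit):
    """Confidence interval for a mean when the population SD is unknown."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


low, high = t_interval(mean=22.9, sd=6.0, n=20, t_crit=T_975_DF19)
print(f"95% CI: ({low:.2f}, {high:.2f}) m^2")  # 95% CI: (20.09, 25.71) m^2
```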
◆ A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
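A sketch using the large-sample (Wald) interval $\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$, which is assumed here to be the intended method; the function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a single proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_interval(successes=25, n=100)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.165, 0.335)
```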
◆ In a clinical trial for a new drug to treat hypertension, N1 = 50
patients were randomly assigned to receive the new drug, and N2 =
50 patients to receive a placebo. 34 of the patients receiving the drug
showed improvement, while 15 of those receiving placebo showed
improvement.
– Compute a 95% CI estimate for the difference between the proportions
improved.
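For two independent proportions, a common large-sample interval is $(\hat{p}_1 - \hat{p}_2) \pm Z\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$. The sketch assumes that unpooled form; the function name is illustrative.

```python
import math
from statistics import NormalDist


def diff_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = diff_proportion_interval(x1=34, n1=50, x2=15, n2=50)
print(f"95% CI: ({low:.3f}, {high:.3f})")  # 95% CI: (0.199, 0.561)
```

The interval excludes zero, which is consistent with a higher improvement rate on the drug than on placebo.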
◆ A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that the
mean age of the population is not 30? The variance is known
to be 20. Let CL = 0.95.
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
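With a known variance and a normal population, the natural test statistic is $Z = (\bar{x} - \mu_0)/\sqrt{\sigma^2/n}$. A minimal sketch of the two-sided test, with an illustrative function name:

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, variance, n):
    """Return the z statistic and two-sided p-value for H0: mu = mu0."""
    z = (sample_mean - mu0) / math.sqrt(variance / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_sample_z_test(sample_mean=27, mu0=30, variance=20, n=10)
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

Here |z| is about 2.12, which exceeds 1.96, so at α = 0.05 the data support the conclusion that the mean age is not 30.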
◆ A simple random sample of 14 people from a certain
population gives a sample mean body mass index (BMI)
of 30.5 and an SD of 10.64. Can we conclude that the mean BMI
is not 35 at α = 5%?
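Since only the sample SD is available and n is small, a one-sample t-test with n − 1 = 13 degrees of freedom is the usual choice. The sketch takes the two-sided 5% cut-off 2.160 from a standard t-table instead of computing it; the names are illustrative.

```python
import math

T_CRIT_DF13 = 2.160  # two-sided 5% cut-off from a t-table, 13 degrees of freedom


def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for H0: mu = mu0 when the population SD is unknown."""
    return (sample_mean - mu0) / (sd / math.sqrt(n))


t_stat = one_sample_t(sample_mean=30.5, mu0=35, sd=10.64, n=14)
reject_h0 = abs(t_stat) > T_CRIT_DF13
print(f"t = {t_stat:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The statistic is about −1.58, inside ±2.160, so the data do not allow us to conclude that the mean BMI differs from 35.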
◆ The mean SUA levels of 12 individuals with Down's
syndrome and 15 normal individuals are 4.5 and 3.4
mg/100 ml, respectively, with variances σ₁² = 1 and
σ₂² = 1.5, respectively. Is there a difference between the
means of both groups at α = 5%?
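Treating the two variances as known population values (an assumption about how the problem is meant to be read), the test statistic is $Z = (\bar{x}_1 - \bar{x}_2)/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$. A minimal sketch with an illustrative function name:

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, var1, n1, mean2, var2, n2):
    """z statistic and two-sided p-value for H0: mu1 = mu2, known variances."""
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_sample_z_test(4.5, 1.0, 12, 3.4, 1.5, 15)
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The statistic is about 2.57, beyond 1.96, so the difference between the group means is significant at the 5% level.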
◆ We wish to know if we may conclude, at the 95%
confidence level, that smokers, in general, have
greater lung damage than do non-smokers.
◆ In the general population of 0 to 4-year-olds, the annual
incidence of asthma is 1.4%. If 10 cases of asthma are
observed over a single year in a sample of 500 children
whose mothers smoke, can we
conclude that this is different from the underlying
probability of p0 = 0.014 (or p0 = 1.4%)? CL = 95%
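A sketch of the large-sample test for one proportion, $Z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, assumed here to be the intended method. The function name is illustrative.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """z statistic and two-sided p-value for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(successes=10, n=500, p0=0.014)
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The observed 2% incidence gives a statistic of about 1.14, inside ±1.96, so this sample does not show a difference from 1.4%. With only 7 expected cases under the null hypothesis, an exact binomial test would be a reasonable cross-check.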
◆ Among the 225 students who ate the sandwiches, 109 became ill,
while among the 38 students who did not eat the sandwiches, 4
became ill. Is there a significant difference between the two
groups at α = 5%?
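One standard approach is the two-proportion z-test with a pooled estimate $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ in the standard error. The sketch assumes that method; the function name is illustrative.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: p1 = p2 (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = two_proportion_z_test(x1=109, n1=225, x2=4, n2=38)
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

The statistic is about 4.37, far beyond 1.96, so the illness rates in the two groups differ significantly at the 5% level.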

A+.pptx

  • 1. Confidence Intervals
    • We can also put confidence intervals around other sample statistics (not just the sample mean)
    • The CI provides the precision for the sample statistic: 100(1−α)% CI: estimate ± k × (standard error)
    • So the CI equals our estimate of the sample statistic plus or minus some multiple of the standard error
    • A 95% CI means that in 95 out of 100 samples of size n, the CI would contain the population parameter
    • Also common: 90% or 99% CIs
  • 2. Confidence Intervals: $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
    The width of the confidence interval depends on:
    • The variation in the population (i.e. the standard deviation σ, which is fixed though possibly unknown): the more variation, the wider the CI
    • The sample size n: the larger the sample, the narrower the CI
    • The confidence level: a 99% CI is wider than a 95% CI because we have to be "more sure" of capturing the true value
    100(1−α)% CI: estimate ± k × (standard error)
  • 3. Confidence Intervals: $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$, with standard error $\frac{\sigma}{\sqrt{n}}$
    • Why does the sample size n have to be large?
    • CIs are valid when the CLT holds
    • The sample size n affects the standard error
    • Remember that a fraction gets closer to zero as its denominator gets larger
    • As $n \to \infty$ the SE $\frac{\sigma}{\sqrt{n}}$ gets closer and closer to 0
    100(1−α)% CI: estimate ± k × (standard error)
  • 4. Calculating Confidence Intervals
    a) Single mean
    b) Difference between means, independent samples
    c) Difference between means, dependent samples
    d) Single proportion
    e) Difference between proportions, independent samples
  • 5. Calculating Confidence Intervals: a) CI for a Single Mean (known σ)
    • If the sample size is large and the population SD is known...
    • The CI formula for a single mean is: $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$
      where $\bar{x}$ is the sample mean, σ is the population standard deviation, n is the sample size, and Z is a cutoff from the standard normal distribution
    • 90% CI: Z = 1.64; 95% CI: Z = 1.96; 99% CI: Z = 2.58
  • 6. Calculating Confidence Intervals: a) CI for a Single Mean (known σ), EXAMPLE
    • We want to estimate the age of death for the US population
    • We are told the population SD is 20.2 years
    • In a sample of 100 people we calculate a mean age of death of 72.1 years
    • A 95% confidence interval for the mean age of death is $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$, where $\bar{x} = 72.1$, $\sigma = 20.2$, $n = 100$, $Z = 1.96$:
      $72.1 + 1.96 \times \frac{20.2}{\sqrt{100}} \approx 76.06$
      $72.1 - 1.96 \times \frac{20.2}{\sqrt{100}} \approx 68.14$
    • 95% CI: (68.14, 76.06)
  • 7. C.I. for the difference between population means (normally distributed)
    • Known variance (2 independent samples): a 100(1−α)% C.I. for $\mu_1 - \mu_2$ is
      $(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  • 8. Calculating Confidence Intervals: a) CI for a Single Mean (unknown σ)
    • Usually the population SD is not known
    • We use the sample standard deviation s instead
    • Use the t-distribution instead of the standard normal
    • The CI formula for a single mean is: $\bar{x} \pm t \frac{s}{\sqrt{n}}$
      where $\bar{x}$ is the sample mean, s is the sample standard deviation, n is the sample size, and t is a cut-off from the t-distribution with n − 1 degrees of freedom
  • 9. Calculating Confidence Intervals: Student's t-distribution
    • Similar to the normal distribution but with fatter tails
    • Accounts for the extra uncertainty from not knowing σ
    • A family of curves indexed by the degrees of freedom df; the cut-off value depends on both:
      – the significance level α
      – the degrees of freedom df
    • The degrees of freedom is n − 1 because we have already used one piece of information (the sample mean) in estimating s
  • 10. Calculating Confidence Intervals: Student's t-distribution
    • Calculate cut-off values in Stata with the command:
      . display invttail(df, p)
      where df = n − 1 and p is the area in the right tail
    • For a 95% CI use df = n − 1 and p = 0.025
    • Example: if n = 100, the t-value for a 95% CI would be
      . display invttail(99, 0.025)
      1.984217
  • 11. Calculating Confidence Intervals: a) CI for a Single Mean (unknown σ), EXAMPLE
    • Using the formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$ with n = 462, $\bar{x}$ = 87.93723, s = 16.00469
    • The t-value has df = n − 1 = 462 − 1 = 461 and p = 0.025
      . display invttail(461, 0.025)
      1.9651232
      . display 87.93723 - 1.9651232 * 16.00469 / sqrt(462)
      86.473988
      . display 87.93723 + 1.9651232 * 16.00469 / sqrt(462)
      89.400472
    • The 95% CI for mean zinc is (86.5, 89.4)
  • 12. Calculating Confidence Intervals: a) CI for a Single Mean (unknown σ), EXAMPLE
    • Using Stata:
      cii means n mean sd
      ci means varname
  • 13. Calculating Confidence Intervals: effect of significance level on width of CI
    • 95% CI is the default (α = 0.05): [86.4…, …89.4]
    • Use level() to change the confidence level to 99% (α = 0.01): [86.0…, …89.9]
    • The CI becomes WIDER
  • 14. Calculating Confidence Intervals: effect of sample size on width of CI
    • If we increase the sample size from 31 to 131, the standard error decreases and the CI becomes NARROWER
    • n = 31: [145.4…, …149.8]; n = 131: [145.6…, …148.6]
  • 15. Calculating Confidence Intervals: effect of SD on width of CI
    • If the standard deviation increases from 6 to 10, the standard error increases and the CI becomes WIDER
    • s = 6: [145.4…, …148.8]; s = 10: [143.9…, …151.3]
  • 16. Calculating Confidence Intervals: comparing Z and t
    • For large n, the t-distribution approximates the normal
    • Suppose the sample mean age of death is $\bar{x}$ = 72.1 yrs and that the sample SD is equal to the population SD (s = σ = 20.2 yrs)
    • Normal distribution, CI formula $\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$:
      – Large sample (n = 100), Z = 1.96: $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{100}} \approx (68.14, 76.06)$
      – Small sample (n = 10), Z = 1.96: $72.1 \pm 1.96 \times \frac{20.2}{\sqrt{10}} \approx (59.58, 84.62)$
    • t-distribution, CI formula $\bar{x} \pm t \frac{s}{\sqrt{n}}$:
      – Large sample (n = 100), t = 1.98: $72.1 \pm 1.98 \times \frac{20.2}{\sqrt{100}} \approx (68.10, 76.10)$
      – Small sample (n = 10), t = 2.26: $72.1 \pm 2.26 \times \frac{20.2}{\sqrt{10}} \approx (57.66, 86.54)$
    • Large n: the two CIs are similar. Small n: t gives a wider CI than Z
  • 17. Calculating Confidence Intervals: b) CI for a difference in means (independent samples)
    • Suppose we have two independent groups of data and calculate a sample mean and sample standard deviation for each. The CI formula is:
      $(\bar{x}_1 - \bar{x}_2) \pm t \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
    • Where:
      – $\bar{x}_1$ and $\bar{x}_2$ are the sample means
      – $n_1$ and $n_2$ are the sample sizes
      – t is a cut-off from the t-distribution with $df = n_1 + n_2 - 2$
      – $s_p^2$ is the pooled variance, $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, where $s_1$ and $s_2$ are the sample standard deviations
  • 18. Calculating Confidence Intervals: b) CI for a difference in means (independent samples)
    Assumptions:
    1. The population standard deviations are approximately equal. We check this by comparing the sample standard deviations.
    2. For small sample sizes (say n < 100) the population distribution should approximately follow the normal distribution. This is checked by assessing the distribution of the sample data for normality. The assumption is fairly robust, in that the formula is valid as long as the distribution of data in the sample is approximately mound-shaped and symmetrical.
    3. The 2 groups are independent.
    4. The subjects within the 2 groups are independent.
  • 19. Calculating Confidence Intervals
      b) CI for a difference in means (independent samples)
      Example using the Stata command: ttesti n1 mean1 sd1 n2 mean2 sd2
  • 20. Calculating Confidence Intervals
      b) CI for a difference in means (independent samples)
      • ttesti assumes equal SDs in the two groups by default
      • Use the unequal option if s1 and s2 are not similar
      • The df is affected when the unequal option is used
  • 21. Calculating Confidence Intervals
      b) CI for a difference in means (independent samples)
      Suppose we want a CI for the difference in mean height between men and women (assuming independence and normality hold…)
      ttest varname, by(groupvar)
      The two sample SDs are approximately equal.
  • 22. Calculating Confidence Intervals
      c) CI for a difference in means (dependent samples)
      • Dependent samples occur with two groups of paired or matched data
      • Usually equal sample sizes in the 2 groups (1:1), e.g.
        – patient blood pressure before and after a treatment
        – patient left leg and right leg measurements
        – two groups where pairs of people have been matched on important demographics (age, sex, etc.)
  • 23. Calculating Confidence Intervals
      c) CI for a difference in means (dependent samples)
      1. Calculate the pair differences $d$, e.g. for each patient, d = BP_after – BP_before
      2. Find the mean $\bar{d}$ and standard deviation $s_d$ of the pair differences
      3. The CI for the mean pair difference is:
        $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$
      where $n$ is the number of pairs and t has $df = n - 1$
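The three steps above can be sketched in Python as follows. The blood pressure values are hypothetical, invented only to make the example runnable, and the cut-off 2.776 is an assumed table value for df = 4 at 95%.

```python
import math
import statistics

def paired_ci(before, after, t_cutoff):
    """CI for the mean pair difference (after - before)."""
    diffs = [a - b for a, b in zip(after, before)]   # step 1: pair differences
    n = len(diffs)
    d_bar = statistics.mean(diffs)                   # step 2: mean of differences
    s_d = statistics.stdev(diffs)                    # step 2: sample SD of differences
    margin = t_cutoff * s_d / math.sqrt(n)           # step 3: t * s_d / sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical data for 5 patients (not from the slides).
before = [140, 152, 138, 147, 160]
after = [135, 150, 130, 149, 151]

# t_cutoff = 2.776 is the tabulated 95% value for df = n - 1 = 4.
lower, upper = paired_ci(before, after, t_cutoff=2.776)
print(f"95% CI for mean difference: ({lower:.2f}, {upper:.2f})")
```

Because the interval is built on the differences, only the number of pairs enters the standard error, not the two group sizes separately.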
  • 24. Calculating Confidence Intervals
      c) CI for a difference in means (dependent samples)
      EXAMPLE
      • The heart rates of 20 patients before and after a treatment
      • We want a 95% CI for the difference in mean heart rate
  • 25. Calculating Confidence Intervals
      c) CI for a difference in means (dependent samples)
      EXAMPLE
      First, calculate the differences. Then use the formula $\bar{d} \pm t \frac{s_d}{\sqrt{n}}$.
      The 95% CI is: (1.5, 11.2)
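  • 26. Calculating Confidence Intervals
      c) CI for a difference in means (dependent samples)
      EXAMPLE
      Or use Stata, for example the one-sample immediate command applied to the pair differences (number of pairs, mean difference, SD of the differences, null value 0):
      ttesti n1 mean1 sd1 0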
  • 27. Calculating Confidence Intervals
      d) CI for a single proportion
      • Suppose we have a population of subjects, some of whom have a characteristic of interest while the rest don't, e.g. being female, having a cancer diagnosis, having survived
      • We want to estimate the true proportion p who have the characteristic of interest
      • If r is the number of sample subjects that have the characteristic and n is the sample size, then the sample proportion is:
        $\hat{p} = \frac{r}{n}$
  • 28. Calculating Confidence Intervals
      d) CI for a single proportion
      • If n is large enough and p is not too extreme, then the sampling distribution of $\hat{p}$ is approximately normal (CLT)
      • The standard error of a proportion is:
        $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
      • The CI formula for a single proportion is:
        $\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
        where Z is a standard normal cut-off (95% CI: Z = 1.96)
  • 29. Calculating Confidence Intervals
      d) CI for a single proportion
        $\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
      • This formula assumes the rule of thumb: $n\hat{p}$ and $n(1 - \hat{p})$ must both be greater than 5
      • (n must be large enough too)
      • Otherwise, the formula is not valid and we'd have to use exact binomial values instead of Z cut-offs
  • 30. Calculating Confidence Intervals
      d) CI for a single proportion
      EXAMPLE
      • Consider the 5-yr survival for lung cancer patients (Pagano p328).
      • We want to estimate the proportion p who survive 5 yrs after diagnosis.
      • In a random sample of n = 52 patients only r = 6 survive 5 yrs ($\hat{p} = r/n = 6/52 \approx 0.12$).
      • Check: using the rounded $\hat{p} = 0.12$, $n\hat{p} \approx 6.24$ and $n(1 - \hat{p}) \approx 45.76$, so the rule of thumb holds
      • A 95% CI for p is:
        $\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
        95% CI: (0.03, 0.21)
      So we are 95% confident that between 3% and 21% of patients with lung cancer survive 5 yrs after diagnosis.
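The calculation behind this example can be sketched as below. The function name is illustrative only. The slide's interval appears to use the rounded value 0.12 for the sample proportion, so the sketch shows both the rounded and the unrounded version.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

n, r = 52, 6
rounded = proportion_ci(0.12, n)       # p_hat rounded to 0.12, as on the slide
unrounded = proportion_ci(r / n, n)    # p_hat = 6/52 without rounding

print(f"rounded p_hat:   ({rounded[0]:.2f}, {rounded[1]:.2f})")
print(f"unrounded p_hat: ({unrounded[0]:.2f}, {unrounded[1]:.2f})")
```

The upper limit moves from 0.21 to 0.20 when the proportion is not rounded first, which is a reminder to round only at the end.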
  • 31. Calculating Confidence Intervals
      d) CI for a single proportion
      EXAMPLE
      • In a random sample of n = 52 patients only r = 6 survive 5 yrs
      Using Stata to find the 95% CI: cii proportions n r
      Why is this answer different from the one we calculated with the formula? Stata uses exact binomial values instead of the normal approximation to the binomial.
  • 32. Calculating Confidence Intervals
      Similarly to the CI for a single mean, the width of the CI for a single proportion is affected by:
      • The sample size n
        – increasing the sample size makes the CI narrower (more precise), i.e. small samples have wider CIs (less precision)
        – the standard error decreases as n increases
      • The confidence level
        – a 99% CI is wider than a 95% CI, which is wider than a 90% CI
  • 33. Calculating Confidence Intervals
      e) CI for a difference in proportions (independent samples)
      • One group has sample proportion $\hat{p}_1$ and sample size $n_1$
      • The second group has sample proportion $\hat{p}_2$ and sample size $n_2$
      • The CI formula is:
        $\hat{p}_1 - \hat{p}_2 \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
      • The RHS looks complicated but really it's just the standard error for $\hat{p}_1 - \hat{p}_2$; for independent samples the variances add:
        $SE = \sqrt{var(\hat{p}_1) + var(\hat{p}_2)}$
  • 34. Calculating Confidence Intervals
      e) CI for a difference in proportions (independent samples)
        $\hat{p}_1 - \hat{p}_2 \pm Z \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
      • The formula is only valid for large samples and not too extreme values of $p_1 - p_2$
      • The rule of thumb is: if $\hat{p} = \frac{r_1 + r_2}{n_1 + n_2}$, then $n_1\hat{p}$ and $n_1(1 - \hat{p})$ must both be greater than 5, and $n_2\hat{p}$ and $n_2(1 - \hat{p})$ must both be greater than 5
      • You need to be able to check the rule of thumb and use the CI formula (we will calculate the CI using Stata)
  • 35. Calculating Confidence Intervals
      e) CI for a difference in proportions (independent samples)
      EXAMPLE
      Stata command: prtesti n1 r1 n2 r2, count
      with n1 = 100, r1 = 80, n2 = 100, r2 = 50
  • 36. Calculating Confidence Intervals
      e) CI for a difference in proportions (independent samples)
      EXAMPLE
      The difference in the proportion of patients who had pain relief by surgery and by medication was between 17% and 43% (a higher % of surgery patients had pain relief compared to medication patients).
      Difference in sample proportions
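A minimal Python sketch of this example, using the counts given for the prtesti call (n1 = 100, r1 = 80, n2 = 100, r2 = 50). The function names are hypothetical, and the normal-approximation formula from the previous slides is assumed.

```python
import math

def diff_proportion_ci(n1, r1, n2, r2, z=1.96):
    """CI for p1 - p2 with SE = sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = r1 / n1, r2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def rule_of_thumb_ok(n1, r1, n2, r2):
    """Check n_i * p and n_i * (1 - p) > 5 using the pooled proportion p."""
    p = (r1 + r2) / (n1 + n2)
    return all(v > 5 for v in (n1 * p, n1 * (1 - p), n2 * p, n2 * (1 - p)))

lower, upper = diff_proportion_ci(100, 80, 100, 50)
print(rule_of_thumb_ok(100, 80, 100, 50))
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The limits round to 0.17 and 0.43, matching the 17% to 43% interpretation on the slide.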
  • 37. One more thing…
      • Some Stata commands used in the lectures/tutes and Modules are different in previous versions of Stata.
      If you are using Stata 14 (the latest version) the CI commands are:
        cii means n mean sd
        ci means varname
        cii proportions n r
      If you are using an older version of Stata the CI commands are:
        cii n mean sd
        ci varname
        cii n r
  • 41. And use Stata's cii command to produce the 95% confidence interval:
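  • 45. Confirm using ttest in Stata:
      ttesti 12 13.21 1.05 9 11 1.01, level(95)

      Two-sample t test with equal variances
      ------------------------------------------------------------------------------
               |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
             x |      12       13.21    .3031089        1.05    12.54286    13.87714
             y |       9          11    .3366667        1.01    10.22365    11.77635
      ---------+--------------------------------------------------------------------
      combined |      21    12.26286     .328802     1.50676    11.57699    12.94873
      ---------+--------------------------------------------------------------------
          diff |                2.21     .455663                1.256286    3.163714
      ------------------------------------------------------------------------------
          diff = mean(x) - mean(y)                                      t =   4.8501
      Ho: diff = 0                                     degrees of freedom =       19

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 0.9999         Pr(|T| > |t|) = 0.0001          Pr(T > t) = 0.0001

      Also see the results of:
      ttesti 12 13.21 1.05 9 11 1.01, level(90)
      ttesti 12 13.21 1.05 9 11 1.01, level(99)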
  • 53. Some Operations on Events 1/4/2021
      • Let A and B be two events defined on the sample space Ω.
      1. Union of Two Events: (A∪B)
        – The event A∪B consists of all outcomes in A or in B or in both A and B.
        – The event A∪B occurs if A occurs, or B occurs, or both A and B occur.
  • 54. 2. Intersection of Two Events: (A∩B)
        – The event A∩B consists of all outcomes in both A and B.
        – The event A∩B occurs if both A and B occur.
  • 55. 3. Complement of an Event: $\bar{A}$ or (A^C) or (A′)
        – The complement of the event A is denoted by $\bar{A}$.
        – The event $\bar{A}$ consists of all outcomes of Ω that are not in A.
        – The event $\bar{A}$ occurs if A does not.
  • 56. Example: (Classical Probability)
      • Experiment: Selecting a patient randomly from a hospital room having six beds numbered 1, 2, 3, 4, 5, and 6.
      • Define the following events:
      (1) E1∪E2 = {1, 2, 3, 4, 6} = selecting an even number or a number less than 4.
  • 57. (2) E1∪E4 = {1, 2, 3, 4, 5, 6} = Ω = selecting an even number or an odd number.
      • It can be shown that E1∪E4 = Ω, where E1 and E4 are called exhaustive events.
      • The union of these events gives the whole sample space.
  • 58. (3) E1∩E2 = {2} = selecting an even number and a number less than 4.
      (4) E1∩E4 = ∅ = selecting an even number and an odd number.
  • 59. • E1∩E4 = ∅. In this case, E1 and E4 are called disjoint (or mutually exclusive) events.
      • These kinds of events cannot occur simultaneously (together at the same time).
  • 60. (5) The complement of E1
  • 61. • Mutually Exclusive (Disjoint) Events
        – The events A and B are disjoint (or mutually exclusive) if A∩B = ∅. In that case:
          I. P(A∩B) = 0
          II. P(A∪B) = P(A) + P(B)
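The event operations in the six-bed example map directly onto Python sets. In this sketch the definitions of E1 (even numbers), E2 (numbers less than 4) and E4 (odd numbers) are inferred from the verbal descriptions on the slides, since the original definitions did not survive in the transcript; equally likely outcomes are assumed for the probabilities.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                    # sample space: bed numbers
e1 = {x for x in omega if x % 2 == 0}         # inferred: even number
e2 = {x for x in omega if x < 4}              # inferred: number less than 4
e4 = {x for x in omega if x % 2 == 1}         # inferred: odd number

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event), len(omega))

union_12 = e1 | e2            # E1 ∪ E2
inter_12 = e1 & e2            # E1 ∩ E2
inter_14 = e1 & e4            # E1 ∩ E4 (disjoint events)
complement_e1 = omega - e1    # complement of E1

print(union_12, inter_12, inter_14, complement_e1)
print(prob(e1 | e4) == prob(e1) + prob(e4))   # addition rule for disjoint events
```

For the overlapping events E1 and E2 the simple addition rule would double-count the outcome 2, which is why it is stated only for disjoint events.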
  • 66. Marginal Probability
      • Given some variable that can be broken down into m categories designated by A1, A2, …, Am and another jointly occurring variable that is broken down into s categories designated by B1, B2, …, Bs
      • The marginal probability of Ai, P(Ai), is equal to the sum of the joint probabilities of Ai with all categories of B.
      • That is, $P(A_i) = \sum_{j=1}^{s} P(A_i \cap B_j)$
  • 67. Example: Relative Frequency or Empirical
      • Let us consider a bivariate table for variables A and B.
      • There are three categories for both variables: A1, A2, and A3 for A, and B1, B2, and B3 for B
      Joint frequency distribution for m categories of A and s categories of B
  • 68. • Joint probability distribution for m categories of A and s categories of B
  • 69. • Number of elements in each cell; probabilities of events
  • 70. Applications of Relative Frequency (Empirical Probability)
      • Let us consider hypothetical data on four types of diseases of 200 patients from a hospital, as shown below:
        Disease type       | A  | B  | C  | D  | Total
        Number of patients | 90 | 80 | 20 | 10 | 200
      • Experiment: Selecting a patient at random and observing his/her disease type. The total number of trials (the sample size) in this case is n = 200.
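The empirical probabilities implied by this table are just the counts divided by n. A small sketch, using only the counts shown on the slide:

```python
# Counts from the slide's hypothetical table of 200 patients.
counts = {"A": 90, "B": 80, "C": 20, "D": 10}
n = sum(counts.values())

# Relative frequency of each disease type = count / total number of trials.
probs = {disease: count / n for disease, count in counts.items()}

for disease, p in probs.items():
    print(f"P({disease}) = {p:.2f}")
```

The relative frequencies sum to 1, as any probability distribution over the four mutually exclusive disease types must.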
  • 72. Conditional Probability
      • The conditional probability of the event A when we know that the event B has already occurred is defined by
        $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) \neq 0$
  • 74. Multiplication Rules of Probability
      Let us consider a hypothetical set of data on 600 adult males classified by their ages and smoking habits, as summarized in the following table. Consider the following event:
      (B1|A2) = smokes daily given that age is between 30 and 39
  • 75. • Two-way table displaying the number of respondents by age and smoking habit
  • 78. Binomial Distribution,…
      • Example:
        – Suppose all birth records for a calendar year show that 85.8% of the pregnancies had delivery in week 37 or later.
        – The 85.8% is interpreted as the probability of a recorded birth in week 37 or later.
        – If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for full-term births?
  • 79. Binomial Distribution,…
      • Let us designate a record of a full-term birth (F) as a "success", hastening to add that a premature birth (P) is a "failure" only in this technical sense
      • It will also be convenient to assign the number 1 to a success and the number 0 to a failure (record of a premature birth).
  • 80. Binomial Distribution,…
      • Suppose the five birth records selected resulted in this sequence of births: FPFFP
      • In coded form we would write this as 10110, with probability
        P(1, 0, 1, 1, 0) = pqppq = $q^2 p^3$
  • 81. Binomial Distribution,…
      • Three successes and two failures could occur in any one of the following additional sequences as well:
      From the addition rule we know that this probability is equal to the sum of the individual probabilities. In the present example we need to sum the 10 $q^2 p^3$'s or, equivalently, multiply $q^2 p^3$ by 10.
  • 82. Binomial Distribution,…
      • The answer to the original question: since in the population p = 0.858 and q = (1 - p) = (1 - 0.858) = 0.142,
        $10 q^2 p^3 = 10(0.142)^2(0.858)^3 = 10(0.0202)(0.6316) = 0.1276$
        (0.1274 if the intermediate factors are not rounded)
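The counting argument can be checked by brute force. The sketch below enumerates every coded sequence of five records with exactly three successes and adds up their probabilities; it uses only p = 0.858 from the example, and the variable names are illustrative.

```python
import math
from itertools import product

p = 0.858          # probability of a full-term birth (success, coded 1)
q = 1 - p          # probability of a premature birth (failure, coded 0)

# All 0/1 sequences of length 5 that contain exactly three 1s.
sequences = [s for s in product((1, 0), repeat=5) if sum(s) == 3]

# Each such sequence has probability p^3 * q^2; the addition rule sums them.
prob = sum(p ** sum(s) * q ** (5 - sum(s)) for s in sequences)

print(len(sequences))          # number of sequences
print(f"{prob:.4f}")           # probability of exactly three successes
```

The enumeration finds 10 sequences, the same count given by the combination formula, and the unrounded probability is about 0.1274.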
  • 83. Combinations
      • A combination of n objects taken x at a time is an unordered subset of x of the n objects
      • Combinations are useful when the sample is large and listing every sequence becomes tedious
      • The number of combinations of n objects that can be formed by taking x of them at a time is given by
        ${}_nC_x = \frac{n!}{x!(n - x)!}$
      • where x!, read x factorial, is the product of all the whole numbers from x down to 1. That is, x! = x(x-1)(x-2)…(1). We note that, by definition, 0! = 1.
  • 84. Binomial Distribution,…
      • Let us return to our example in which we have a sample of n = 5 birth records and we are interested in finding the probability that three of them will be for full-term births
        ${}_5C_3 = \frac{5!}{3!(5 - 3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(2 \times 1)} = \frac{120}{12} = 10$
  • 85. Binomial Distribution,…
      • In our example we let x = 3, the number of successes, so that n - x = 2, the number of failures. We then may write the probability of obtaining exactly x successes in n trials as
        $f(x) = {}_nC_x \, q^{n-x} p^x = {}_nC_x \, p^x q^{n-x}$, for x = 0, 1, 2, …, n
      • The Binomial Parameters
        – The binomial distribution has two parameters, n and p
        – $\mu = np$
        – $\sigma^2 = np(1 - p) = npq$
  • 86. Poisson Distribution
      • Used to model a discrete random variable representing the number of occurrences or counts of some random event in an interval of time or space (or some volume of matter)
      • The possible values of X = x are x = 0, 1, 2, 3, …
      • The discrete random variable X is said to have a Poisson distribution with parameter (mean) λ if the probability distribution of X is given by
        $f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$
  • 87. Poisson Distribution,…
      • where e = 2.71828 (the base of the natural logarithm).
      • λ (lambda) is the parameter of the distribution and is the average number of occurrences of the random event in the interval
      • The Poisson Process
        – The occurrences of the events are independent
        – The probability of a single occurrence of the event in a given interval is proportional to the length of the interval
  • 88. Poisson Distribution,…
        – In any infinitesimally small portion of the interval, the probability of more than one occurrence of the event is negligible
      Example
        – In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, the occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per year in Norway
  • 89. Poisson Distribution,…
        – Find the probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis
          $f(x = 3) = \frac{e^{-12} 12^3}{3!} = 0.00177$
      • What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
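Both questions on this slide can be worked through with a short sketch of the Poisson formula. The helper name `poisson_pmf` is made up for illustration; the "at least three" probability is obtained as the complement of 0, 1 or 2 occurrences.

```python
import math

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x! for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 12  # incidents per year, from the example

p_exactly_3 = poisson_pmf(3, lam)
# P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(f"P(X = 3)  = {p_exactly_3:.5f}")
print(f"P(X >= 3) = {p_at_least_3:.4f}")
```

With a mean of 12 incidents per year, seeing fewer than three is very unlikely, so the second probability is close to 1.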
  • 90. Poisson Distribution
      • Example: Suppose that the number of accidents per day in a city has a Poisson distribution with an average of 2 accidents.
      1. What is the probability that in a day
        I. the number of accidents will be 5,
        II. the number of accidents will be less than 2?
      2. What is the probability that there will be six accidents in 2 days?
      3. What is the probability that there will be no accidents in an hour?
  • 91. Poisson Distribution,…
      1. I. $P(X = 5) = \frac{e^{-2} 2^5}{5!} = 0.036089$
         II. $P(X < 2) = P(X = 0) + P(X = 1) = \frac{e^{-2} 2^0}{0!} + \frac{e^{-2} 2^1}{1!} = 0.135335 + 0.270671 = 0.406006$
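The remaining parts of the accident example follow from rescaling the mean to the length of the interval, which relies on the Poisson-process property that the rate is proportional to interval length. The sketch below assumes a 24-hour day for the hourly rate; the helper name is illustrative.

```python
import math

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam**x / x! for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

per_day = 2  # average accidents per day, from the example

p_five_in_day = poisson_pmf(5, per_day)
p_less_than_two = poisson_pmf(0, per_day) + poisson_pmf(1, per_day)
p_six_in_two_days = poisson_pmf(6, per_day * 2)    # mean over 2 days = 4
p_none_in_hour = poisson_pmf(0, per_day / 24)      # mean over 1 hour = 2/24 (assumes a 24-hour day)

print(f"{p_five_in_day:.6f} {p_less_than_two:.4f} "
      f"{p_six_in_two_days:.4f} {p_none_in_hour:.4f}")
```

Only the mean changes between the parts; the formula itself stays the same.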
  • 92. Probability Distributions of Continuous Data
      • A non-negative function f(x) is called a probability distribution of the continuous random variable X if the total area bounded by its curve and the x-axis is equal to 1, and if the subarea under the curve bounded by the curve, the x-axis, and perpendiculars erected at any two points a and b gives the probability that X is between the points a and b.
      • Also known as a probability density function
  • 93. Probability Distributions of Continuous Data…
      Graph of a continuous distribution showing the area between a and b.
  • 94. Normal distribution
      • Known as the Gaussian distribution
      • The normal density is given by
        $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / 2\sigma^2}$, $-\infty < x < \infty$,
        where e = 2.71828 and π = 3.14159.
      • The parameters of the distribution are μ and σ²: X ~ N(μ, σ²).
  • 95. Normal Distribution,…
      • The density function of X, f(x), is a bell-shaped curve
        – The highest point of the curve of f(x) is at the mean μ. Hence, the mode = mean = μ.
        – The curve of f(x) is symmetric about the mean μ.
        – In other words, mean = mode = median
        – The area under the curve is 1
  • 96. Standard Normal Distribution
      • The standard normal distribution is the normal distribution with mean μ = 0 and variance σ² = 1
      • Denoted by Normal(0,1) or N(0,1).
      • The standard normal random variable is denoted by Z, and we write Z ~ N(0,1), where
        $Z = \frac{x - \mu}{\sigma}$
      • The equation for the standard normal distribution is written
        $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$, $-\infty < z < \infty$
  • 97. Standard Normal Distribution,…
      The standard normal distribution
      The z-transformation is useful in applications of the normal distribution
  • 98. Standard Normal Distribution,…
      • A z-transformation that yields Z = 1 indicates that the value of x used in the transformation is 1 standard deviation above the mean μ (so z itself lies 1 unit above 0).
      • A value of Z = -1 indicates that the value of x used in the transformation is 1 standard deviation below the mean μ.
  • 99. Standard Normal Distribution,…
      • Example:
        – What is the probability that a z picked at random from the population of z's will have a value between -2.55 and 2.55?
          Answer: P(-2.55 < z < 2.55) = 0.9946 - 0.0054 = 0.9892
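Instead of a printed z-table, the same area can be read from the standard normal cumulative distribution function. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

z = NormalDist()                 # standard normal: mean 0, SD 1

upper_area = z.cdf(2.55)         # area to the left of  2.55
lower_area = z.cdf(-2.55)        # area to the left of -2.55
prob = upper_area - lower_area   # area between -2.55 and 2.55

print(f"{upper_area:.4f} - {lower_area:.4f} = {prob:.4f}")
```

The two cumulative areas correspond to the table values 0.9946 and 0.0054 used on the slide.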
  • 101. Application of normal distribution
      • The normal distribution is not a law that is adhered to by all measurable characteristics occurring in nature
      • However, many of these characteristics are approximately normally distributed
      • It is used to model the distribution of many variables that are of interest
      • It allows us to make useful probability statements about some variables more conveniently than would be the case if some more complicated model had to be used
  • 102. Application,…
      • Example:
        – Suppose the weight of women of reproductive age follows a normal distribution with mean 49 kg and variance 25 kg².
          a. Find the probability that a randomly chosen woman of reproductive age has a weight less than 45 kg.
          b. What is the percentage of women having a weight less than 45 kg?
          c. In a population of 20,000 women of reproductive age, how many would you expect to have a weight less than 45 kg?
  • 103. • Solution
        – Here the random variable is X = weight of women of reproductive age, with population mean μ = 49 kg, population variance σ² = 25 kg², and population standard deviation σ = 5 kg. Hence, X ~ Normal(49, 25).
          a. The probability that a randomly chosen woman of reproductive age has a weight less than 45 kg is
             $P(X < 45) = P\left(Z < \frac{45 - 49}{5}\right) = P(Z < -0.80) = 0.2119$
  • 104. Application,…
          b. The percentage of women of reproductive age who have a weight less than 45 kg is P(X < 45) × 100% = 0.2119 × 100% = 21.19%
          c. In a population of 20,000 women of reproductive age, the expected number of women with a weight less than 45 kg is P(X < 45) × 20,000 = 0.2119 × 20,000 = 4238.
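The three parts of this example can be verified with the sketch below, which uses the stated mean of 49 kg and SD of 5 kg. The probability is rounded to four decimals before scaling, to mirror the table-based calculation on the slide.

```python
from statistics import NormalDist

weight = NormalDist(mu=49, sigma=5)      # X ~ Normal(49, 25), so SD = 5 kg

z_value = (45 - 49) / 5                  # z-transformation of 45 kg
p_below_45 = round(weight.cdf(45), 4)    # a. P(X < 45), rounded as in a z-table
percentage = p_below_45 * 100            # b. percentage below 45 kg
expected = round(p_below_45 * 20000)     # c. expected count among 20,000 women

print(z_value, p_below_45, f"{percentage:.2f}%", expected)
```

Without the intermediate rounding the expected count would be about 4237, so the slide's 4238 reflects the four-decimal table value.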
  • 105. Advantages of sampling:
      • Feasibility: Sampling may be the only feasible method of collecting the information.
      • Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material.
      • Greater accuracy: Sampling may lead to better accuracy in collecting data.
      • Sampling error: Precise allowance can be made for sampling error.
      • Greater speed: Data can be collected and summarized more quickly.
  • 106. Disadvantages of sampling:
      • There is always a sampling error.
      • Sampling may create a feeling of discrimination within the population.
      • Sampling may be inadvisable where every unit in the population is legally required to have a record.
  • 107. Sampling technique
      • There are two different approaches to sampling in survey research:
        – Nonprobability sampling
        – Probability sampling
  • 108. Probability sampling methods
      • A sample obtained in such a way that every member of the population has a known and non-zero probability of being included in the sample, i.e. it involves random selection of the sample
        – Involves the selection of a sample from a population, based on chance
      • Probability sampling is
        – More complex
        – More time-consuming
        – Usually more costly than non-probability sampling
  • 109. EXAMPLE OF SIMPLE RANDOM SAMPLING
      Age at first sex and associated factors for early sexual initiation among students at University of AU, Central Ethiopia
        – There are a total of 8,000 students
        – We want to select a sample of 700 students
        – In this case, we assume homogeneity with respect to age at first sex
        – Their ID numbers can be taken as the sampling frame
        – Hence we can use computer-generated random numbers to select 700 students randomly
  • 110. Steps in systematic random sampling
      1. Number the units on your frame from 1 to N (where N is the total population size); let n be the sample size.
      2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size: K = N/n, where K = sampling interval, N = population size, n = sample size.
      3. Draw a random number between one and K. This number is called the random start and is the first number included in your sample.
        – Let the selected number be j
      4. Select every Kth unit after that first number: j, j+K, j+2K, j+3K, …, j+(n-1)K
  • 111. EXAMPLE
      A systematic sample is to be selected from 1200 students of a school. The sample size selected is 100. The sampling interval (skip interval) is K = 1200/100 = 12.
      • The number of the first student to be included in the sample is chosen randomly, for example by blindly picking one out of twelve pieces of paper, numbered 1-12.
      • If number 6 is picked, then every twelfth student will be included in the sample, starting with student number 6, until 100 students are selected: the numbers selected would be 6, 18, 30, 42, etc.
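The school example can be sketched in a few lines of Python. The function name and its `start` parameter are illustrative; passing `start=6` reproduces the slide, while leaving it out draws the random start between 1 and the sampling interval. The sketch assumes the population size is a multiple of the sample size, as it is here.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return unit numbers start, start+k, ... for interval k = N // n."""
    k = population_size // sample_size        # sampling interval
    if start is None:
        start = random.randint(1, k)          # random start between 1 and k
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(1200, 100, start=6)
print(sample[:4], "...", sample[-1])
```

The last selected unit is the start plus 99 intervals, so exactly 100 students are chosen and none falls outside the frame.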
  • 112. Stratified sampling …
      The procedures are:
        – Divide the total population into different homogeneous subgroups (strata)
        – Allocate a sample to each stratum (ni)
      • Proportional allocation: ni = Ni(n/N), where
        ni = sample size for each stratum
        Ni = total population of each stratum
        n = required sample size
        N = total population
      • Disproportional (equal) allocation is sometimes also possible
  • 113. Example
      • A survey is conducted on household water supply in a district comprising 10,000 households, of which 4,000 are urban and 6,000 are rural.
      • It is suspected that in urban areas the access to safe water sources is much more satisfactory. The sample size required has been decided to be 300.
      • Allocate the sample proportionally to both strata:
        n urban = 4000 × 300/10,000 = 120
        n rural = 6000 × 300/10,000 = 180
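Proportional allocation for this example can be sketched as below, using the stratum sizes from the calculation on the slide (4,000 urban and 6,000 rural) and a total sample of 300. The function name is hypothetical.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate n across strata in proportion to stratum size: n_i = N_i * n / N."""
    total = sum(strata_sizes.values())
    return {name: size * n / total for name, size in strata_sizes.items()}

allocation = proportional_allocation({"urban": 4000, "rural": 6000}, 300)
print(allocation)
```

In general the allocated sizes may not be whole numbers and would then need rounding; here they come out exactly.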
  • 114. Steps in cluster sampling
      • The reference population (homogeneous) is divided into clusters.
        – These clusters are often geographic units (e.g. districts, villages, etc.).
      • A number of clusters are selected randomly to represent the total population, and then all units within the selected clusters are included in the sample.
      • No units from non-selected clusters are included in the sample; they are represented by those in the selected clusters.
        – This differs from stratified sampling, where some units are selected from each group
        – All the units in the selected clusters are studied
  • 115. Example
      • In a study of knowledge, attitudes, and practices related to family planning in rural communities of a region, a list is made of all the villages.
      • Using this list, a random sample of villages is chosen and all the adults in the selected villages are interviewed.
  • 116. Multi-stage sampling
      In a study of utilization of pit latrines in a district, 150 homesteads are to be visited for interviews with family members as well as for observations on types and cleanliness of latrines.
      • The district is composed of six wards and each ward has between six and nine villages.
      • The following four-stage sampling procedure could be performed:
        – Select three wards out of the six by simple random sampling
        – For each ward, select five villages by simple random sampling (15 villages in total)
  • 117. For each village select ten households. Because simply choosing households in the center of the village would produce a biased sample, the following systematic sampling procedure is proposed:
        – Go to the center of the village
        – Choose a direction in a random way
        – Walk in the chosen direction and select every third or every fifth household (depending on the size of the village) until you have the ten you need.
  • 118. PROBLEM
      A population of cancer patients has a survival standard deviation of 43.4 months. If one wants to conduct a study on this population, how large a sample is needed so that 95% of the sample means from samples of this size will be within ±6 months of the population mean? The population size is 480 patients. (85)
  • 119. In a survey of school children to determine the proportion of children immunized against polio, an investigator set the maximum discrepancy between the sample and population proportions of immunized children at 0.04, at a confidence level of 99%. Further, the investigator had previous knowledge that the prevalence among children in a similar community was 90%, and the total population of school children is 800. What sample size is required?
  • 120. The mean weight of 100 children who are 5 years old in a certain locality is found to be 14 kg. A clinician wants to estimate the mean weight of all the children in that locality with a 95% confidence interval. It is known that the SD for all children is 4 kg.
  • 121. Suppose a survey is conducted on a representative sample of 900 newborn babies in A/A and it is found that their average weight at birth is 3.5 kg with an SD of 0.5 kg. Estimate the mean weight of newborn babies in A/A at the 95% level of confidence.
  • 122. A sample of 20 houses was studied to estimate the mean sprayable area of a house for controlling a malaria epidemic. The result was $\bar{x}$ = 22.9 m² with an SD of 6.0 m². Construct a CI for the mean sprayable area of the population with 95% confidence.
  • 123. A random sample of 100 people shows that 25 are left-handed. Form a 95% CI for the true proportion of left-handers.
  • 126. In a clinical trial for a new drug to treat hypertension, N1 = 50 patients were randomly assigned to receive the new drug, and N2 = 50 patients to receive a placebo. 34 of the patients receiving the drug showed improvement, while 15 of those receiving placebo showed improvement.
        – Compute a 95% CI estimate for the difference between the proportions improved.
  • 127. A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let CL = .95.
      A. Data: n = 10, sample mean = 27, σ² = 20, α = 0.05
      B. Assumptions: simple random sample; normally distributed population
  • 128. A simple random sample of 14 people from a certain population gives a sample mean body mass index (BMI) of 30.5 and an SD of 10.64. Can we conclude that the mean BMI is not 35 at α = 5%?
  • 129. The mean serum uric acid (SUA) levels of 12 individuals with Down's syndrome and 15 normal individuals are 4.5 and 3.4 mg/100 ml, respectively, with variances σ1² = 1 and σ2² = 1.5, respectively. Is there a difference between the means of the two groups at α = 5%?
  • 130. We wish to know if we may conclude, at the 95% confidence level, that smokers, in general, have greater lung damage than do non-smokers.
  • 131. In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, can we conclude that this is different from the underlying probability of p0 = 0.014 (or p = 1.4%)? CL = 95%
  • 132. Among the 225 students who ate the sandwiches, 109 became ill, while among the 38 students who did not eat the sandwiches, 4 became ill. Is there a significant difference between the two groups at α = 5%?