Chapter 5
The Role of Probability
Learning Objectives
• Define the terms “equally likely” and “at random”
• Compute and interpret unconditional and conditional
probabilities
• Evaluate and interpret independence of events
• Explain the key features of the binomial distribution
model
• Calculate probabilities using the binomial formula
Learning Objectives
• Explain the key features of the normal distribution
model
• Calculate probabilities using the standard normal
distribution table

• Compute and interpret percentiles of the normal
distribution
• Define and interpret the standard error
• Explain sampling variability
• Apply and interpret the results of the Central Limit
Theorem
Two Areas of Biostatistics
Goal: Statistical Inference
[Figure: a SAMPLE (size n, sample mean X̄), summarized with descriptive statistics, is used to draw inferences about the POPULATION]
Sampling from a Population
[Figure: many SAMPLES, each of size n, drawn from a population of size N]
Sampling:
Population Size=N, Sample Size=n
• Simple random sample
– Enumerate all members of population N (sampling
frame), select n individuals at random (each has
same probability of being selected)
• Systematic sample
– Start with sampling frame; determine sampling
interval (N/n); select first person at random from the
first (N/n) and every (N/n)th person thereafter
Sampling:
Population Size=N, Sample Size=n
• Stratified sample
– Organize population into mutually exclusive
strata; select individuals at random within each
stratum
• Convenience sample
– Non-probability sample (not for inference)
• Quota sample
– Select a pre-determined number of individuals into
sample from groups of interest
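The probability-based designs above can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 numbered IDs, the two strata, and the function names are hypothetical, and the systematic sample assumes that n divides N evenly.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every member of the sampling frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start within the first interval, then every k-th member."""
    k = len(frame) // n            # sampling interval N/n (assumes n divides N)
    start = rng.randrange(k)       # random position within the first interval
    return frame[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of a fixed size within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)                         # seeded only for repeatability
frame = list(range(1, 101))                    # hypothetical IDs 1..100 (N = 100)
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strata = {"A": frame[:50], "B": frame[50:]}    # hypothetical strata
strat = stratified_sample(strata, 5, rng)
```

Convenience and quota samples are not sketched because they do not rely on random selection from a sampling frame.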
Basics
• Probability reflects the likelihood that an outcome will
occur
• 0 ≤ Probability ≤ 1
\[ P(\text{outcome}) = \frac{\text{Number with outcome}}{N} \]

Example 5.1.
Basic Probability
Age 5 6 7 8 9 10 Total
Boys 432 379 501 410 420 418 2560
Girls 408 513 412 436 461 500 2730
Total 840 892 913 846 881 918 5290
P(Select any child) = 1/5290 = 0.0002
Example 5.1.
Basic Probability
P(Select a boy) = 2560/5290 = 0.484
P(Select boy age 10) = 418/5290
= 0.079
P(Select child at least 8 years of age)
= (846+881+918)/5290
= 2645/5290 = 0.500

Conditional Probability
• Probability of outcome in a specific sub-
population
• Example 5.1,
P(Select 9 year old from among girls) =
P(Select 9 year old|girl)
= 461/2730 = 0.169
P(Select boy|6 years of age)
= 379/892=0.425
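As a rough check, the unconditional and conditional probabilities of Example 5.1 can be recomputed directly from the table counts. The variable names below are illustrative only; the counts are those in the table.

```python
ages = [5, 6, 7, 8, 9, 10]
boys = dict(zip(ages, [432, 379, 501, 410, 420, 418]))
girls = dict(zip(ages, [408, 513, 412, 436, 461, 500]))

n_boys = sum(boys.values())     # 2560
n_girls = sum(girls.values())   # 2730
total = n_boys + n_girls        # 5290

# Unconditional probabilities: the denominator is the whole population
p_boy = n_boys / total
p_boy_age10 = boys[10] / total
p_at_least_8 = sum(boys[a] + girls[a] for a in ages if a >= 8) / total

# Conditional probabilities: the denominator is the sub-population
p_age9_given_girl = girls[9] / n_girls
p_boy_given_age6 = boys[6] / (boys[6] + girls[6])
```

The only difference between the two kinds of probability is the denominator: the full population for unconditional probabilities, the conditioning group for conditional ones.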
Example 5.2.
               Prostate Cancer   No Prostate Cancer   Total
Low PSA               3                  61             64
Moderate PSA         13                  28             41
High PSA             12                   3             15
Total                28                  92            120
Example 5.2.
P(Prostate Cancer|Low PSA)
= 3/64 = 0.047
P(Prostate Cancer|Moderate PSA)
= 13/41 = 0.317
P(Prostate Cancer|High PSA)
= 12/15 = 0.80
Sensitivity and Specificity
Sensitivity = true positive fraction
= P(test +|disease)
Specificity = true negative fraction
= P(test -|disease free)

False negative fraction=P(test -|disease)
False positive fraction=P(test +|disease free)
Example 5.4.
                  Affected Fetus   Unaffected Fetus   Total
Positive Screen         9                351            360
Negative Screen         1               4449           4450
Total                  10               4800           4810
Sensitivity = P(test +|disease) =9/10=0.90

Specificity = P(test -|disease free)
= 4449/4800 = 0.927
False negative fraction= P(test -|disease)
= 1/10 = 0.10
False positive fraction=P(test +|disease free)
= 351/4800 = 0.073
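The four screening-test measures of Example 5.4 follow from the cells of the 2×2 table. A minimal sketch, with illustrative variable names:

```python
# Cell counts from Example 5.4
true_pos, false_pos = 9, 351     # positive screen: affected, unaffected
false_neg, true_neg = 1, 4449    # negative screen: affected, unaffected

n_disease = true_pos + false_neg         # 10 affected fetuses
n_disease_free = false_pos + true_neg    # 4800 unaffected fetuses

sensitivity = true_pos / n_disease                 # P(test + | disease)
specificity = true_neg / n_disease_free            # P(test - | disease free)
false_neg_fraction = false_neg / n_disease         # P(test - | disease)
false_pos_fraction = false_pos / n_disease_free    # P(test + | disease free)
```

Sensitivity and the false negative fraction sum to 1, as do specificity and the false positive fraction, because each pair conditions on the same group.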
Independence
• Two events, A and B, are independent if
P(A|B) = P(A) or if P(B|A) = P(B)
• Example 5.2. Is screening test independent of
prostate cancer diagnosis?
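– P(Prostate Cancer) = 28/120 = 0.233
– P(Prostate Cancer|Low PSA) = 0.047
– P(Prostate Cancer|Moderate PSA) = 0.317
– P(Prostate Cancer|High PSA) = 0.80
– The conditional probabilities differ from the unconditional probability, so the screening test result and prostate cancer diagnosis are not independent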
Bayes Theorem
• Using Bayes Theorem we revise or update a
probability based on additional information
– Prior probability is an initial probability
– Posterior probability is a probability that is revised
or updated based on additional information
Bayes Theorem
\[ P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')} \]
Example
• In Boston, 51% of adults are male
• One adult is randomly selected to participate in
a study
Prior probability of selecting a male= 0.51

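Posterior probability of a male, given additional information B with P(B|male) = 0.095 and P(B|female) = 0.017:
\[ P(\text{male} \mid B) = \frac{0.095(0.51)}{0.095(0.51) + 0.017(0.49)} = 0.853 \]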
Example 5.8.
Bayes Theorem
P(disease) = 0.002
Sensitivity = 0.85 = P(test +|disease)
P(test +)=0.08 and P(test -) = 0.92
What is P(disease|test +)?
Example 5.8.
Bayes Theorem
What is P(disease|test +)?
P(disease) = 0.002
P(test +)=0.08 and P(test -) = 0.92

\[ P(\text{disease} \mid \text{test}+) = \frac{P(\text{test}+ \mid \text{disease})\,P(\text{disease})}{P(\text{test}+)} \]
Example 5.8.
Bayes Theorem
P(disease) = 0.002
P(test +)=0.08 and P(test -) = 0.92
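\[ P(\text{disease} \mid \text{test}+) = \frac{P(\text{test}+ \mid \text{disease})\,P(\text{disease})}{P(\text{test}+)} = \frac{0.85(0.002)}{0.08} = 0.021 \]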
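Both forms of Bayes Theorem used in the examples can be written as small functions. This is a sketch with hypothetical function names. The first call uses the values of Example 5.8. The second reuses the Boston example and treats 0.095 and 0.017 as the conditional probabilities of the additional information given male and female, which is an assumption based on where those numbers sit in the formula.

```python
def bayes(prior, likelihood, evidence):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    return likelihood * prior / evidence

def total_probability(prior, likelihood, likelihood_given_complement):
    """P(B) = P(B|A) P(A) + P(B|A') P(A')."""
    return likelihood * prior + likelihood_given_complement * (1 - prior)

# Example 5.8: P(disease) = 0.002, sensitivity = 0.85, P(test +) = 0.08
p_disease_given_pos = bayes(prior=0.002, likelihood=0.85, evidence=0.08)

# Boston example: prior P(male) = 0.51, updated with additional information B
p_b = total_probability(0.51, 0.095, 0.017)
p_male_given_b = bayes(prior=0.51, likelihood=0.095, evidence=p_b)
```

In Example 5.8 a positive test raises the probability of disease from 0.002 to roughly 0.021, which is still small because the disease is rare.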

Binomial Distribution
• Model for discrete outcome
• Process or experiment has 2 possible
outcomes: success and failure
• Replications of process are independent
• P(success) is constant for each replication
Notation:
n=number of times process is replicated,
p=P(success),
x=number of successes of interest
0 ≤ x ≤ n
\[ P(x \text{ successes}) = \frac{n!}{x!\,(n-x)!}\; p^{x}(1-p)^{n-x} \]

Example 5.9.
Medication for allergies is effective in reducing symptoms
in 80% of patients. If medication is given to 10 patients,
what is the probability it is effective in 7?
\[ P(7 \text{ successes}) = \frac{10!}{7!\,(10-7)!}\; 0.8^{7}(1-0.8)^{10-7} \]
= 120(0.2097)(0.008) = 0.2013
Antibiotic is claimed to be effective in 70% of the patients it is
given to. If antibiotic is given to 5 patients, what is the
probability it is effective on exactly three?
success = antibiotic is effective: n=5, p=0.7, x=3
\[ P(X=3) = \frac{5!}{3!\,(5-3)!}\; 0.7^{3}(1-0.7)^{5-3} \]
= 10(0.343)(0.09) = 0.3087
What is the probability that the antibiotic is
effective on all 5 ?
\[ P(X=5) = \frac{5!}{5!\,(5-5)!}\; 0.7^{5}(1-0.7)^{5-5} = 1(0.1681)(1) = 0.1681 \]

What is the probability that the antibiotic is
effective on at least 3 ?
P(X ≥ 3) = P(3) + P(4) + P(5)
= 0.3087 + 0.3601 + 0.1681 = 0.8369
Mean and Variance of the
Binomial Distribution
μ = n p
σ² = n p (1 − p)
For example, the mean (or expected)
number of patients in whom the antibiotic
is effective is 5*0.7 = 3.5
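The binomial calculations above can be reproduced with the standard library. A minimal sketch; `binomial_pmf` is a hypothetical helper name, and `math.comb` supplies the n!/(x!(n-x)!) term.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x successes in n independent replications), success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example 5.9: allergy medication, n=10, p=0.8, x=7
p_allergy_7_of_10 = binomial_pmf(7, 10, 0.8)

# Antibiotic example: n=5, p=0.7
p_exactly_3 = binomial_pmf(3, 5, 0.7)
p_all_5 = binomial_pmf(5, 5, 0.7)
p_at_least_3 = sum(binomial_pmf(x, 5, 0.7) for x in range(3, 6))

# Mean and variance of the binomial distribution
mean = 5 * 0.7                 # n p
variance = 5 * 0.7 * (1 - 0.7) # n p (1 - p)
```

Summing the probabilities for x = 3, 4 and 5 gives the "at least 3" probability, which is why the inequality in that example includes 3 itself.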
Normal Distribution
• Model for continuous outcome
• Mean=median=mode

Properties of Normal Distribution
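i) The normal distribution is symmetric about the mean.
ii) The mean (μ) and standard deviation (σ) completely characterize the normal
distribution.
iii) The mean = the median = the mode.
P(μ − σ < X < μ + σ) ≈ 0.68
P(μ − 2σ < X < μ + 2σ) ≈ 0.95
P(μ − 3σ < X < μ + 3σ) ≈ 0.997
iv) P(a < X < b) = the area under the normal curve from a to b.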
Example 5.11.

Normal Distribution
Body mass index (BMI) for men age 60 is normally
distributed with a mean of 29 and standard deviation
of 6.
What is the probability that a male has BMI less than
29?
Example 5.11.
Normal Distribution
[Figure: normal curve for BMI, axis marked at 11, 17, 23, 29, 35, 41, 47; area 0.5 on each side of the mean of 29]
P(X<29)=0.5

Example 5.11.
Normal Distribution
Body mass index (BMI) for men age 60 is normally
distributed with a mean of 29 and standard deviation
of 6.
What is the probability that a male has BMI less than
35?
Example 5.11.
Normal Distribution
[Figure: normal curve for BMI, axis marked at 11, 17, 23, 29, 35, 41, 47; area 0.5 below 29 and area 0.34 between 29 and 35]
P(X<35)=0.5 + 0.34 = 0.84

Standard Normal Distribution Z
[Figure: standard normal curve, axis marked from -3 to 3]
Example 5.11.
Normal Distribution
\[ Z = \frac{x - \mu}{\sigma} \]
P(X<35)= P(Z<1) = ?
Example 5.11.
Normal Distribution
P(X<35) = P(Z<1).
Using Table 1, P(Z<1.00) = 0.8413

Table 1. Probabilities of Z
Table entries represent P(Z < Zi)
Zi .00 .01 .02 .03 .04 …
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 …
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 …
.
.
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 …
Example 5.11.
Normal Distribution
What is the probability that a male has BMI less than
30?
P(X<30)=?
Example 5.11.
Normal Distribution
\[ Z = \frac{x - \mu}{\sigma} = \frac{30 - 29}{6} = 0.17 \]
P(X<30)= P(Z<0.17) = 0.5675
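The table lookups in Example 5.11 can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch with illustrative variable names. Rounding Z to two decimals mirrors the use of Table 1, while the unrounded calculation gives a slightly different answer for BMI below 30.

```python
from statistics import NormalDist

bmi = NormalDist(mu=29, sigma=6)   # BMI for men age 60
standard_normal = NormalDist()     # mean 0, standard deviation 1

p_below_29 = bmi.cdf(29)           # symmetry about the mean
p_below_35 = bmi.cdf(35)           # same as P(Z < 1)

# Table-style calculation: standardize, round Z to two decimals, look up
z_30 = round((30 - 29) / 6, 2)     # 0.17
p_below_30_table = standard_normal.cdf(z_30)

# Unrounded calculation (Z = 0.1667), slightly below the table-based value
p_below_30_exact = bmi.cdf(30)
```

The difference between the two results for BMI below 30 comes only from rounding Z before the lookup.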
Percentiles of the Normal Distribution
The kth percentile is defined as the score that holds k
percent of the scores below it.
E.g., the 90th percentile is the score that holds 90% of the
scores below it.
Q1 = 25th percentile, median = 50th percentile, Q3 = 75th
percentile
Percentiles
For the normal distribution, the following is used to compute
percentiles:
\[ X = \mu + Z\sigma \]
where
μ = mean of the random variable X,
σ = standard deviation, and
Z = value from the standard normal distribution for the desired
percentile (See Table 1A).
Percentiles
Percentiles of the Standard Normal Distribution
(Table 1A)
Percentile Z
1st -2.326
2.5th -1.960
5th -1.645
10th -1.282
50th 0
90th 1.282
95th 1.645
97.5th 1.960
99th 2.326
[Figure: standard normal curve, axis marked from -4 to 4; area 0.95 lies below Z = 1.645 and area 0.05 above]
Example 5.12.
Percentiles of the Normal Distribution
BMI in men follows a normal distribution with μ=29 and σ=6;
BMI in women follows a normal distribution with μ=28 and σ=7.
The 90th percentile of BMI for men:
X = 29 + 1.282 (6) = 36.69.
The 90th percentile of BMI for women:
X = 28 + 1.282 (7) = 36.97.
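The percentile formula X = μ + Zσ can be sketched with the inverse cumulative distribution function, which plays the role of Table 1A. The helper name `normal_percentile` is hypothetical.

```python
from statistics import NormalDist

def normal_percentile(mu, sigma, k):
    """k-th percentile of a normal distribution, computed as mu + Z * sigma."""
    z = NormalDist().inv_cdf(k / 100)   # Z for the desired percentile
    return mu + z * sigma

z_90 = NormalDist().inv_cdf(0.90)              # about 1.282, as in Table 1A
bmi_men_90 = normal_percentile(29, 6, 90)      # men: mean 29, SD 6
bmi_women_90 = normal_percentile(28, 7, 90)    # women: mean 28, SD 7
```

Although women have the lower mean BMI here, their larger standard deviation pushes their 90th percentile slightly above the men's.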

Central Limit Theorem
Suppose we have a population with known mean μ and
standard deviation σ. If we take
simple random samples of size n with
replacement, then for large n, the sampling
distribution of the sample means is
approximately normal with mean and
standard deviation
\[ \mu_{\bar{X}} = \mu, \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
Application
• Non-normal population
• Take samples of size n – as long as n is sufficiently
large (usually n ≥ 30 suffices)
• The distribution of the sample mean is approximately
normal, so Z can be used to compute probabilities
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Example 5.18.
HDL cholesterol has a mean of 54 and
standard deviation of 17 in patients over 50. A
physician has 40 patients over age 50 and
wants to know the probability that their mean
cholesterol is above 60.

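Example 5.18.
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{60 - 54}{17/\sqrt{40}} = 2.22 \]
(with the standard error 17/√40 rounded to 2.7)
P(X̄ > 60) = P(Z > 2.22) = 1 − 0.9868 = 0.0132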
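Example 5.18 can be recomputed without intermediate rounding. This is a sketch with illustrative variable names. The unrounded Z is about 2.23, so the probability comes out near 0.013, marginally different from the table-based value.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 54, 17, 40          # HDL cholesterol, patients over 50

standard_error = sigma / sqrt(n)   # standard deviation of the sample mean
z = (60 - mu) / standard_error     # standardize the sample mean of 60
p_mean_above_60 = 1 - NormalDist().cdf(z)
```

The Central Limit Theorem justifies the normal approximation here because the sample size is large, even if HDL cholesterol itself is not normally distributed.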
Example
Suppose we wish to estimate the mean μ of a
population whose standard deviation σ is known
and equal to 12. Suppose a simple random sample of
100 individuals is selected from the population.
Find the probability that the sample mean is no more
than 2 units from the population mean.
Sampling Distribution of Sample
Mean
P(μ − 2 < X̄ < μ + 2) = ?
Standard error: σ/√n = 12/√100 = 1.2
\[ Z = \frac{(\mu - 2) - \mu}{1.2} = \frac{-2}{1.2} = -1.67, \qquad Z = \frac{(\mu + 2) - \mu}{1.2} = \frac{2}{1.2} = 1.67 \]
Then: P(-1.67 < Z < 1.67) = 0.9525 − 0.0475 = 0.905
The probability that the sample mean is no more than 2 units
from the population mean is 0.905, or 90.5%.
Chapter 6
Confidence Interval Estimates
Learning Objectives
• Define point estimate, standard error,
confidence level and margin of error
• Compare and contrast standard error and
margin of error
• Compute and interpret confidence intervals for
means and proportions
• Differentiate independent and matched or
paired samples
Learning Objectives
• Compute confidence intervals for the
difference in means and proportions in
independent samples and for the mean
difference in paired samples
• Identify the appropriate confidence interval
formula based on type of outcome variable and
number of samples
Statistical Inference
• There are two broad areas of statistical
inference, estimation and hypothesis testing.
• In estimation, the population parameter is
unknown, and sample statistics are used to
generate estimates of the unknown parameter.
Statistical Inference
• In hypothesis testing, an explicit statement or hypothesis
is generated about the population parameter. Sample
statistics are analyzed to determine whether they
support or reject the hypothesis about the parameter.
• In both estimation and hypothesis testing, it is
assumed that the sample drawn from the population is
a random sample.

Estimation
• Process of determining likely values for
unknown population parameter
• Point estimate is best single-valued estimate
for parameter
• Confidence interval is range of values for
parameter:
point estimate ± margin of error
Estimation
A point estimate for a population parameter is the
"best" single number estimate of that parameter.
A confidence interval estimate is a range of values for
the population parameter with a level of confidence
attached (e.g., 95% confidence that the range or
interval contains the parameter).

Confidence Interval Estimates
point estimate ± margin of error
point estimate ± Z SE (point estimate)
where Z = value from standard normal
distribution for desired confidence level and
SE (point estimate) = standard error of the
point estimate
Confidence Intervals for μ
• Continuous outcome
• 1 Sample
n ≥ 30 (Find Z in Table 1B)
\[ \bar{X} \pm Z\,\frac{s}{\sqrt{n}} \]
n < 30 (Find t in Table 2,
df=n-1)
\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} \]

Table 2. Critical Values of the t
Distribution
Table entries represent values from t distribution with upper tail
area equal to α.
Confidence Level 80% 90% 95% 98% 99%
Two Sided Test α .20 .10 .05 .02 .01
One Sided Test α .10 .05 .025 .01 .005
df
1 3.078 6.314 12.71 31.82 63.66
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169

Example 6.1.
Confidence Interval for μ
In the Framingham Offspring Study (n=3534), the mean
systolic blood pressure (SBP) was 127.3 with a standard
deviation of 19.0. Generate a 95% confidence interval for the
true mean SBP.
\[ \bar{X} \pm Z\,\frac{s}{\sqrt{n}} = 127.3 \pm 1.96\,\frac{19.0}{\sqrt{3534}} \]
127.3 ± 0.63
(126.7, 127.9)
Example 6.2.
Confidence Interval for μ
In a subset of n=10 participants attending the Framingham
Offspring Study, the mean SBP was 121.2 with a standard
deviation of 11.1. Generate a 95% confidence interval for the
true mean SBP.
df=n-1=9, t=2.262
\[ \bar{X} \pm t\,\frac{s}{\sqrt{n}} = 121.2 \pm 2.262\,\frac{11.1}{\sqrt{10}} \]
(113.3, 129.1)
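Examples 6.1 and 6.2 use the same formula and differ only in the critical value. A minimal sketch, assuming the critical value is read from Table 1B (Z) or Table 2 (t) and passed in by hand; `mean_ci` is a hypothetical helper name.

```python
from math import sqrt

def mean_ci(xbar, s, n, critical):
    """Confidence interval for a mean: xbar ± critical * s / sqrt(n)."""
    margin = critical * s / sqrt(n)   # margin of error
    return xbar - margin, xbar + margin

# Example 6.1: large sample, Z = 1.96 for 95% confidence
ci_full_sample = mean_ci(127.3, 19.0, 3534, critical=1.96)

# Example 6.2: n = 10, so t with df = 9 is used (2.262 for 95% confidence)
ci_subset = mean_ci(121.2, 11.1, 10, critical=2.262)
```

The interval for the subset is far wider, both because the sample is much smaller and because the t value exceeds 1.96.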
New Scenario
• Outcome is dichotomous (p=population proportion)
– Result of surgery (success, failure)
– Cancer remission (yes/no)
• One study sample
• Data
– On each participant, measure outcome (yes/no)
– n, x=# positive responses,
\[ \hat{p} = \frac{x}{n} \]
Confidence Intervals for p
• Dichotomous outcome
• 1 Sample
(Find Z in Table 1B)
\[ \hat{p} \pm Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
valid when min[n p̂, n(1 − p̂)] ≥ 5
Example 6.3.
Confidence Interval for p
In the Framingham Offspring Study (n=3532), 1219 patients
were on antihypertensive medications. Generate a 95%
confidence interval for the true proportion on antihypertensive
medication.
\[ \hat{p} = \frac{1219}{3532} = 0.345 \]
\[ \hat{p} \pm Z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.345 \pm 1.96\sqrt{\frac{0.345(1-0.345)}{3532}} \]
0.345 ± 0.016
(0.329, 0.361)
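The one-sample interval for a proportion, including the sample-size check, can be sketched as follows. `proportion_ci` is a hypothetical helper name, and Z = 1.96 is assumed for 95% confidence.

```python
from math import sqrt

def proportion_ci(x, n, z=1.96):
    """Confidence interval for a proportion: p_hat ± z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    # The formula is appropriate only when min[n*p_hat, n*(1 - p_hat)] >= 5
    if min(n * p_hat, n * (1 - p_hat)) < 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 6.3: 1219 of 3532 participants on antihypertensive medication
lower, upper = proportion_ci(1219, 3532)
```

Raising an error when the condition fails is one possible design choice; the slides only state the condition and do not say what to do when it is not met.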
New Scenario
• Outcome is continuous
– SBP, Weight, cholesterol
• Two independent study samples
• Data
– On each participant, identify group and measure
outcome
– \( n_1, \bar{X}_1, s_1 \ (\text{or } s_1^2), \quad n_2, \bar{X}_2, s_2 \ (\text{or } s_2^2) \)
Two Independent Samples
[Diagram: RCT. Set of subjects who meet study eligibility criteria → Randomize → Treatment 1 and Treatment 2 → Mean Trt 1 and Mean Trt 2]
Two Independent Samples
[Diagram: Cohort study. Set of subjects who meet study inclusion criteria → Group 1 and Group 2 → Mean Group 1 and Mean Group 2]
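Confidence Intervals for (μ1 − μ2)
• 2 Independent Samples
n1 ≥ 30 and n2 ≥ 30 (Find Z in Table 1B)
\[ (\bar{X}_1 - \bar{X}_2) \pm Z\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
n1 < 30 or n2 < 30 (Find t in Table 2, df=n1+n2-2)
\[ (\bar{X}_1 - \bar{X}_2) \pm t\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]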
Pooled Estimate of Common Standard
Deviation, Sp
• Previous formulas assume equal variances
(σ1² = σ2²)
• If 0.5 ≤ s1²/s2² ≤ 2, assumption is reasonable
\[ S_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
Example 6.5.
Using data collected in the Framingham Offspring
Study, generate a 95% confidence interval for the
difference in mean SBP between men and women.
n Mean Std Dev
MEN 1623 128.2 17.5
WOMEN 1911 126.5 20.1
Assess Equality of Variances
• Ratio of sample variances: 17.5²/20.1² = 0.76
\[ S_p = \sqrt{\frac{(1623 - 1)17.5^2 + (1911 - 1)20.1^2}{1623 + 1911 - 2}} = \sqrt{359.12} = 19.0 \]
\[ (\bar{X}_1 - \bar{X}_2) \pm Z\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (128.2 - 126.5) \pm 1.96\,(19.0)\sqrt{\frac{1}{1623} + \frac{1}{1911}} \]
1.7 ± 1.26
(0.44, 2.96)
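The steps of Example 6.5 (variance ratio, pooled standard deviation, interval) can be sketched as below. The function names are hypothetical. The code carries the unrounded pooled standard deviation (about 18.95) through the calculation, so the limits can differ in the second decimal from those obtained with Sp rounded to 19.0.

```python
from math import sqrt

def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common standard deviation, Sp."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

def diff_means_ci(n1, xbar1, s1, n2, xbar2, s2, critical=1.96):
    """CI for the difference in means of two independent samples."""
    sp = pooled_sd(n1, s1, n2, s2)
    margin = critical * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

# Example 6.5: SBP in men (n=1623) and women (n=1911)
variance_ratio = 17.5**2 / 20.1**2     # should lie between 0.5 and 2
sp = pooled_sd(1623, 17.5, 1911, 20.1)
lower, upper = diff_means_ci(1623, 128.2, 17.5, 1911, 126.5, 20.1)
```

Because the interval excludes zero, the data are consistent with a small positive difference in mean SBP between men and women.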
New Scenario
• Outcome is continuous
– SBP, Weight, cholesterol

• Two matched study samples
• Data
– On each participant, measure outcome under each
experimental condition
– Compute differences (D=X1-X2)
– \( n, \bar{X}_d, s_d \)
Two Dependent/Matched Samples
Subject ID Measure 1 Measure 2
1 55 70
2 42 60
.
.
Measures taken serially in time or under different
experimental conditions
Crossover Trial
[Diagram: eligible participants are randomized (R) to receive treatment or placebo first, then cross over to the other condition]
Each participant measured on treatment and placebo
Confidence Intervals for μd
• 2 Matched/Paired Samples
n ≥ 30 (Find Z in Table 1B)
\[ \bar{X}_d \pm Z\,\frac{s_d}{\sqrt{n}} \]
n < 30 (Find t in Table 2,
df=n-1)
\[ \bar{X}_d \pm t\,\frac{s_d}{\sqrt{n}} \]
Example 6.8.
Confidence Interval for μd
In a crossover trial to evaluate a new
medication for depressive symptoms, patients’
depressive symptoms were measured after
taking new drug and after taking placebo.
Depressive symptoms were measured on a
scale of 0-100 with higher scores indicative of
more symptoms.
Example 6.8.
Construct a 95% confidence interval for the
mean difference in depressive symptoms
between drug and placebo.
The mean difference in the sample (n=100) is
-12.7 with a standard deviation of 8.9.
Example 6.8.
\[ \bar{X}_d \pm Z\,\frac{s_d}{\sqrt{n}} = -12.7 \pm 1.96\,\frac{8.9}{\sqrt{100}} \]
-12.7 ± 1.74
(-14.4, -11.0)
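The paired-sample interval of Example 6.8 is a one-sample interval applied to the differences. A minimal sketch with illustrative variable names, using Z = 1.96 because n = 100 is large:

```python
from math import sqrt

n = 100              # number of participants in the crossover trial
mean_diff = -12.7    # mean difference in symptom scores (drug minus placebo)
sd_diff = 8.9        # standard deviation of the differences
z = 1.96             # 95% confidence

margin = z * sd_diff / sqrt(n)   # margin of error
lower = mean_diff - margin
upper = mean_diff + margin
```

With a margin of error of about 1.74, subtracting from and adding to -12.7 gives limits of roughly -14.4 and -11.0.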
New Scenario
• Outcome is dichotomous
– Result of surgery (success, failure)
– Cancer remission (yes/no)

• Two independent study samples
• Data
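– On each participant, identify group and measure
outcome (yes/no)
– \( n_1, \hat{p}_1, n_2, \hat{p}_2 \)
Confidence interval for (p1 − p2), valid when min[n1 p̂1, n1(1 − p̂1), n2 p̂2, n2(1 − p̂2)] ≥ 5:
\[ (\hat{p}_1 - \hat{p}_2) \pm Z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]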
Example 6.10.
Confidence Interval for (p1-p2)
A clinical trial compares a new pain reliever to
that considered standard care in patients
undergoing joint replacement surgery. The
outcome of interest is reduction in pain by 3+
scale points. Construct a 95% confidence
interval for the difference in proportions of
patients reporting a reduction between
treatments.

Example 6.10.
Reduction of 3+ Points
Treatment n Number Proportion
New 50 23 0.46
Standard 50 11 0.22
Example 6.10.
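\[ (\hat{p}_1 - \hat{p}_2) \pm Z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.46 - 0.22) \pm 1.96\sqrt{\frac{0.46(1-0.46)}{50} + \frac{0.22(1-0.22)}{50}} \]
0.24 ± 0.18
(0.06, 0.42)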
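Example 6.10 can be reproduced from the counts in the table. A sketch with a hypothetical helper name, assuming Z = 1.96 for 95% confidence:

```python
from math import sqrt

def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """CI for the difference in proportions of two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

# Example 6.10: 23 of 50 (new) vs 11 of 50 (standard) report a 3+ point reduction
lower, upper = diff_proportions_ci(23, 50, 11, 50)
```

The smallest of the four counts used in the sample-size condition is 11, so the normal approximation is reasonable here.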
Confidence Intervals for Relative Risk (RR)
\[ \ln(\widehat{RR}) \pm Z\sqrt{\frac{(n_1 - x_1)/x_1}{n_1} + \frac{(n_2 - x_2)/x_2}{n_2}} \]
exp(lower limit), exp(upper limit)
Example 6.12.
Confidence Interval for RR
Treatment n Number Proportion
New 50 23 0.46
Standard 50 11 0.22
Construct a 95% CI for the relative risk.
Example 6.12.
Confidence Interval for RR
\[ \widehat{RR} = \frac{\hat{p}_1}{\hat{p}_2} = \frac{0.46}{0.22} = 2.09 \]
\[ \ln(2.09) \pm 1.96\sqrt{\frac{27/23}{50} + \frac{39/11}{50}} \]
0.737 ± 0.602
(0.135, 1.339)
exp(0.135), exp(1.339)
(1.14, 3.82)
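The two-step procedure for the relative risk (interval on the log scale, then exponentiate) can be sketched as follows. `relative_risk_ci` is a hypothetical helper name. Because the code does not round intermediate values, the limits may differ from the hand calculation in the second decimal place.

```python
from math import exp, log, sqrt

def relative_risk_ci(x1, n1, x2, n2, z=1.96):
    """Relative risk with a CI computed on the log scale and back-transformed."""
    rr = (x1 / n1) / (x2 / n2)
    se_log_rr = sqrt(((n1 - x1) / x1) / n1 + ((n2 - x2) / x2) / n2)
    log_lower = log(rr) - z * se_log_rr
    log_upper = log(rr) + z * se_log_rr
    return rr, exp(log_lower), exp(log_upper)

# Example 6.12: 23 of 50 (new) vs 11 of 50 (standard)
rr, lower, upper = relative_risk_ci(23, 50, 11, 50)
```

The resulting interval is not symmetric around the point estimate, which is expected when limits are computed on the log scale and then exponentiated.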
Confidence Intervals for Odds Ratio (OR)
\[ \ln(\widehat{OR}) \pm Z\sqrt{\frac{1}{x_1} + \frac{1}{n_1 - x_1} + \frac{1}{x_2} + \frac{1}{n_2 - x_2}} \]
exp(lower limit), exp(upper limit)
Example 6.14.
Confidence Interval for OR

Treatment n Number Proportion
New 50 23 0.46
Standard 50 11 0.22
Construct a 95% CI for the odds ratio.
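Example 6.14.
Confidence Interval for OR
\[ \widehat{OR} = \frac{x_1/(n_1 - x_1)}{x_2/(n_2 - x_2)} = \frac{23/27}{11/39} = 3.02 \]
\[ \ln(3.02) \pm 1.96\sqrt{\frac{1}{23} + \frac{1}{27} + \frac{1}{11} + \frac{1}{39}} \]
1.105 ± 0.870
(0.235, 1.975)
exp(0.235), exp(1.975)
(1.26, 7.21)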
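The odds ratio interval follows the same log-scale pattern as the relative risk. A sketch with a hypothetical helper name; intermediate values are not rounded, so the limits may differ slightly from the hand calculation.

```python
from math import exp, log, sqrt

def odds_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Odds ratio with a CI computed on the log scale and back-transformed."""
    odds1 = x1 / (n1 - x1)    # odds of the outcome in group 1
    odds2 = x2 / (n2 - x2)    # odds of the outcome in group 2
    odds_ratio = odds1 / odds2
    se_log_or = sqrt(1 / x1 + 1 / (n1 - x1) + 1 / x2 + 1 / (n2 - x2))
    log_lower = log(odds_ratio) - z * se_log_or
    log_upper = log(odds_ratio) + z * se_log_or
    return odds_ratio, exp(log_lower), exp(log_upper)

# Example 6.14: 23 of 50 (new) vs 11 of 50 (standard)
odds_ratio, lower, upper = odds_ratio_ci(23, 50, 11, 50)
```

For the same data the odds ratio (about 3.02) is further from 1 than the relative risk (about 2.09), which is typical when the outcome is common.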
