Binomial distribution and applications

Biostatistics
Introduction to BIOSTATISTICS
Lecturer:
Jalal Karimi, MSc, PhD of Epidemiology
Reference:
Introduction to Biostatistics and Research Methods, Fifth Edition
By Sunder Rao
Department of Community Medicine
Third session

Probability distribution
 For making inferences from samples, we have to think in
terms of the part played by chance.
 This is done by considering the sampling distribution and calculating the
probability.
 Three families of distributions that are fundamental in the theory of statistics are:
 Binomial distribution
 Poisson distribution
 Normal distribution

Binomial distribution
 Very often we are interested in knowing what proportion of individuals in a
population possess a particular characteristic.
 For example:
 The proportion of persons in a locality who are sick at a particular point in
time.
 An estimate of this proportion is calculated on the basis of a suitably
drawn sample from this population and the corresponding sampling
distribution.
 In this type of problem the sampling distribution is given by a theoretical
frequency distribution known as the binomial distribution.

 Example:
In a morbidity survey in a village, it is found that the proportion of sick persons is 40%.
A sample of 4 persons can be any one of five types: having no sick person in the
sample, or having 1, 2, 3, or 4 sick persons.
Assuming random sampling, there are sixteen ways in which we can get such a sample, as
shown in the diagram.
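The sixteen arrangements can also be listed directly. The following is a minimal sketch, assuming each sampled person is independently sick with probability 0.40 (the survey proportion); the variable names are illustrative and do not come from the slides.

```python
from itertools import product
from collections import defaultdict

p_sick = 0.40   # proportion of sick persons found in the survey
n = 4           # sample size

# Each sample is a sequence such as ('S', 'H', 'H', 'S'): S = sick, H = healthy.
samples = list(product("SH", repeat=n))

# Add up the probability of every arrangement, grouped by number of sick persons.
prob_by_count = defaultdict(float)
for sample in samples:
    k = sample.count("S")
    prob_by_count[k] += p_sick**k * (1 - p_sick)**(n - k)

print(len(samples), "possible arrangements")
for k in sorted(prob_by_count):
    print(f"{k} sick: probability {prob_by_count[k]:.4f}")
```

The sixteen arrangements fall into the five types (0 to 4 sick persons), and the five types are not equally likely because p is not 0.5.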

Binomial distribution, generally
Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X “successes” is

$$\binom{n}{X}\, p^{X} (1-p)^{n-X}$$

where
n = number of trials
X = # successes out of n trials
p = probability of success
1-p = probability of failure

The Binomial Distribution
Overview
 However, if order is not important, then

$$P(x) = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$$

where $\frac{n!}{x!\,(n-x)!}$ is the number of ways to obtain x successes
in n trials, q = 1 – p, and i! = i × (i – 1) × (i – 2) × … × 2 × 1

All probability distributions are characterized by
an expected value and a variance:
If X follows a binomial distribution with
parameters n and p: X ~ Bin (n, p)
Then:

$$\mu_x = E(X) = np$$
$$\sigma_x^{2} = \mathrm{Var}(X) = np(1-p)$$
$$\sigma_x = \mathrm{SD}(X) = \sqrt{np(1-p)}$$

Note: the variance will always lie between 0 and .25 × n,
because p(1-p) reaches its maximum at p = .5,
where p(1-p) = .25
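The three formulas above can be checked numerically. This is a small sketch only; the function name `binomial_moments` and the values n = 40, p = 0.20 are chosen here for illustration.

```python
import math

def binomial_moments(n, p):
    """Return (mean, variance, standard deviation) of X ~ Bin(n, p)."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, math.sqrt(variance)

# Illustrative values (an assumption): n = 40 trials, p = 0.20.
mean, variance, sd = binomial_moments(40, 0.20)
print(mean, variance, round(sd, 3))

# Scan p = 0.00, 0.01, ..., 1.00: the variance never exceeds 0.25 * n
# and reaches that bound at p = 0.5.
n = 40
largest = max(binomial_moments(n, k / 100)[1] for k in range(101))
print(largest, 0.25 * n)
```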

A binomial random variable X is defined to be the number
of “successes” in n independent trials where
P(“success”) = p is constant.
Notation: X ~ BIN(n,p)
In the definition above notice the following conditions
need to be satisfied for a binomial experiment:
1. There is a fixed number of n trials carried out.
2. The outcome of a given trial is either a “success”
or “failure”.
3. The probability of success (p) remains constant
from trial to trial.
4. The trials are independent: the outcome of a trial is
not affected by the outcome of any other trial.

Binomial Distribution
 If X ~ BIN(n, p), then

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$

 where
$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$, also 0! = 1 and 1! = 1
$\binom{n}{x}$ = "n choose x" = the number of ways to obtain x "successes" in n trials
P("success") = p
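The formula translates almost word for word into code. The sketch below is one possible implementation, not the lecturer's; it uses Python's built-in `math.comb` for "n choose x" and reuses n = 4, p = 0.40 from the morbidity survey example as input.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Values taken from the morbidity survey example: 4 persons, 40% sick.
n, p = 4, 0.40
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(probs):
    print(f"P(X = {x}) = {prob:.4f}")
print("total:", round(sum(probs), 10))
```

The probabilities for x = 0, 1, ..., n always add up to 1, which is a quick sanity check on any hand calculation.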

 E.g. when n = 3 and p = .50 there are 8 possible equally
likely outcomes (e.g. flipping a coin)
SSS SSF SFS FSS SFF FSF FFS FFF
X=3 X=2 X=2 X=2 X=1 X=1 X=1 X=0
P(X=3)=1/8, P(X=2)=3/8, P(X=1)=3/8, P(X=0)=1/8
 Now let’s use the binomial probability formula instead…

$$P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$$

 E.g. when n = 3, p = .50 find P(X = 2)

$$\binom{3}{2} = \frac{3!}{2!\,(3-2)!} = \frac{3!}{2!\,1!} = \frac{3 \cdot 2 \cdot 1}{(2 \cdot 1) \cdot 1} = 3 \text{ ways (SSF, SFS, FSS)}$$

$$P(X = 2) = \binom{3}{2} (.5)^{2} (.5)^{3-2} = 3\,(.5)^{2}(.5)^{1} = .375 \text{ or } \tfrac{3}{8}$$

Example: Treatment of Kidney
Cancer
 Suppose we have n = 40 patients who will be
receiving an experimental therapy which is
believed to be better than current treatments
which historically have had a 5-year survival rate
of 20%, i.e. the probability of 5-year survival is
p = .20.
 Thus the number of patients out of 40 in our
study surviving at least 5 years has a binomial
distribution, i.e. X ~ BIN(40,.20).

Results and “The Question”
 Suppose that using the new treatment we find
that 16 out of the 40 patients survive at least 5
years past diagnosis.
 Q: Does this result suggest that the new therapy
has a better 5-year survival rate than the current,
i.e. is the probability that a patient survives at
least 5 years greater than .20 or a 20% chance
when treated using the new therapy?

What do we consider in answering
the question of interest?
We essentially ask ourselves the following:
 If we assume that the new therapy is no better than
the current one, what is the probability we would see
these results by chance variation alone?
 More specifically what is the probability of
seeing 16 or more successes out of 40 if the
success rate of the new therapy is .20 or 20% as
well?

Connection to Binomial
 This is a binomial experiment situation…
There are n = 40 patients and we are counting the
number of patients that survive 5 or more years. The
individual patient outcomes are independent and, IF
WE ASSUME the new method is NOT better, then the
probability of success is p = .20 or 20% for all patients.
 So X = # of “successes” in the clinical trial is binomial
with n = 40 and p = .20,
i.e. X ~ BIN(40,.20)

Example: Treatment of Kidney Cancer
 X ~ BIN(40,.20), find the probability that exactly 16
patients survive at least 5 years.
 This requires some calculator gymnastics and some
scratchwork!
 Also, keep in mind we need to find the probability of
having 16 or more patients surviving at least 5 yrs.

$$P(X = 16) = \binom{40}{16} (.20)^{16} (.80)^{24} = .001945$$

Example: Treatment of Kidney
Cancer
 So we actually need to find:
P(X ≥ 16) = P(X = 16) + P(X = 17) + … + P(X = 40)

$$P(X = 16) = \binom{40}{16} (.20)^{16} (.80)^{24} = .001945$$
$$P(X = 17) = \binom{40}{17} (.20)^{17} (.80)^{23} = .000686$$
$$\vdots$$
$$P(X = 40) = \binom{40}{40} (.20)^{40} (.80)^{0} \approx 0$$

Adding these terms gives P(X ≥ 16) = .002936

The chance that we would see
16 or more patients out of 40
surviving at least 5 years if the
new method has the same
chance of success as the current
methods (20%) is VERY
SMALL, .0029!
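The "calculator gymnastics" can be left to a few lines of code. This is a sketch under the stated null assumption X ~ BIN(40, .20); the helper names `binomial_pmf` and `upper_tail` are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_tail(k, n, p):
    """P(X >= k) for X ~ BIN(n, p), by summing the individual terms."""
    return sum(binomial_pmf(x, n, p) for x in range(k, n + 1))

n, p = 40, 0.20   # 40 patients, 20% survival rate if the new therapy is no better
print(f"P(X = 16)  = {binomial_pmf(16, n, p):.6f}")
print(f"P(X >= 16) = {upper_tail(16, n, p):.6f}")
```

Summing from 16 up to 40 is what makes this the probability of "16 or more" survivors rather than "exactly 16".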

Conclusion
 Because it is highly unlikely (probability = .0029) that we would
see this many successes in a group of 40 patients if the
new method had the same probability of success as the
current method, we have to make a choice, either …
A) we have obtained a very rare result by dumb luck.
OR
B) our assumption about the success rate of the new
method is wrong and in actuality the new method has a
better than 20% 5-year survival rate making the
observed result more plausible.

The Poisson Distribution
 When there is a large number of trials, but a small
probability of success, binomial calculation becomes
impractical
 Example: Number of spells of diarrhea observed in a
group of infants over a predetermined period can be
counted but not the number of spells that did not
occur.
 The probability of observing one spell, two spells,
etc., in a given sample in such cases, can theoretically
be found out by the use of Poisson distribution

$$P(x) = \frac{e^{-\mu} \mu^{x}}{x!}$$
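To see how the Poisson formula relates to the binomial when n is large and p is small, the two can be compared side by side. This is a sketch with hypothetical values (n = 1000, p = 0.002, neither taken from the slides), using the standard choice µ = np.

```python
from math import comb, exp, factorial

def poisson_pmf(x, mu):
    """P(x) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ BIN(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Hypothetical values: many trials, small probability of success.
n, p = 1000, 0.002
mu = n * p   # Poisson mean matched to the binomial mean

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, mu), 5))
```

The Poisson column needs only µ, which is why it can be used when the events can be counted but the number of trials cannot.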

Assuming these are independent random events, the number
of people killed in a given year therefore has a Poisson
distribution:
Answer:
Poisson distribution



The Normal Distribution
 Properties of the Normal Distribution
 Shapes of Normal Distributions
 Standard (Z) Scores
 The Standard Normal Distribution
 Transforming Z Scores into Proportions
 Transforming Proportions into Z Scores
 Finding the Percentile Rank of a Raw Score
 Finding the Raw Score for a Percentile

 Normal Distribution – A bell-shaped and
symmetrical theoretical distribution, with the
mean, the median, and the mode all coinciding at its
peak and with frequencies gradually decreasing at
both ends of the curve.
Normal Distributions
• The normal distribution is a theoretical ideal
distribution. Real-life empirical distributions never
match this model perfectly. However, many things
in life do approximate the normal distribution, and
are said to be “normally distributed.”

Scores “Normally Distributed?”
 Is this distribution normal?
 There are two things to initially examine: (1) look at
the shape illustrated by the bar chart, and (2)
calculate the mean, median, and mode.
Table 10.1 Final Grades in Social Statistics of 1,200 Students (1983-1993)

Midpoint                                      Cum. Freq.           Cum. %
Score     Frequency Bar Chart        Freq.    (below)      %       (below)
 40       *                             4         4       0.33      0.33
 50       *******                      78        82       6.5       6.83
 60       ***************             275       357      22.92     29.75
 70       ***********************     483       840      40.25     70
 80       ***************             274      1114      22.83     92.83
 90       *******                      81      1195       6.75     99.58
100       *                             5      1200       0.42    100

Scores Normally Distributed!
 The Mean = 70.07
 The Median = 70
 The Mode = 70
 Since all three are essentially equal, and this is
reflected in the bar graph, we can assume that these
data are normally distributed.
 Also, since the median is approximately equal to
the mean, we know that the distribution is
symmetrical.
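The summary values can be recomputed from the frequencies in Table 10.1. The sketch below treats every student as scoring exactly at the midpoint of the class; the standard deviation is obtained by dividing by the number of students, which is an assumption, since the slides do not say how theirs was calculated.

```python
import math

# Midpoint score -> frequency, copied from Table 10.1.
table = {40: 4, 50: 78, 60: 275, 70: 483, 80: 274, 90: 81, 100: 5}

total = sum(table.values())
mean = sum(score * freq for score, freq in table.items()) / total
mode = max(table, key=table.get)   # midpoint with the highest frequency

# Median: the first midpoint at which the cumulative frequency reaches half the total.
cumulative = 0
for score in sorted(table):
    cumulative += table[score]
    if cumulative >= total / 2:
        median = score
        break

# Standard deviation, dividing by the number of students (not by n - 1).
sd = math.sqrt(sum(f * (s - mean) ** 2 for s, f in table.items()) / total)

print(total, round(mean, 2), median, mode, round(sd, 2))
```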

The Shape of a Normal Distribution:
The Normal Curve

The Shape of a Normal Distribution
Notice the shape of the normal curve in this graph. Some normal
distributions are tall and thin, while others are short and wide. All
normal distributions, though, are wider in the middle and
symmetrical.

Notice that the standard deviation changes the relative width of the
distribution; the larger the standard deviation, the wider the curve.
Different Shapes of the Normal Distribution

Areas Under the Normal Curve by
Measuring Standard Deviations

Standard (Z) Scores
 A standard score (also called Z score) is
the number of standard deviations that a
given raw score is above or below the
mean.

$$Z = \frac{Y - \bar{Y}}{S_y}$$
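As a quick illustration of the definition, a Z score is one subtraction and one division. The function name and the numbers below (mean 100, standard deviation 15) are hypothetical and chosen only to give round results.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Hypothetical values chosen for illustration only.
above = z_score(130, mean=100, sd=15)   # two standard deviations above the mean
below = z_score(85, mean=100, sd=15)    # one standard deviation below the mean
print(above, below)
```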



The Standard Normal Table
 A table showing the area (as a proportion,
which can be translated into a percentage) under
the standard normal curve corresponding to
any Z score or its fraction
Area up to
a given score

The Standard Normal Table
 A table showing the area (as a proportion,
which can be translated into a percentage) under
the standard normal curve corresponding to
any Z score or its fraction
Area beyond
a given score

Finding the Area Between the Mean
and a Positive Z Score
 Using the data presented in Table 10.1, find the
percentage of students whose scores range from the
mean (70.07) to 85.
 (1) Convert 85 to a Z score:
Z = (85-70.07)/10.27 = 1.45
(2) Look up the Z score (1.45) in the standard normal table (next slide)
to find the proportion (.4265)

Finding the Area Between the
Mean and a Positive Z Score
(3) Convert the proportion (.4265) to a percentage (42.65%); this
is the percentage of students scoring between the mean and 85 in
the course.
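The table lookup can be reproduced with Python's standard library. This is a sketch, assuming the mean (70.07) and standard deviation (10.27) used in the slides; `statistics.NormalDist` stands in for the printed table, and Z is rounded to two decimals because that is all a printed table offers.

```python
from statistics import NormalDist

mean, sd = 70.07, 10.27          # values used in the slides for Table 10.1
standard_normal = NormalDist()   # mean 0, standard deviation 1

z = round((85 - mean) / sd, 2)         # Z score of a raw score of 85
area = standard_normal.cdf(z) - 0.5    # area between the mean (Z = 0) and Z

print(z, round(area, 4))
```

Subtracting 0.5 removes the half of the curve that lies below the mean, leaving only the area between the mean and the score.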

Finding the Area Between the
Mean and a Negative Z Score
 Using the data presented in Table 10.1, find
the percentage of students scoring between
65 and the mean (70.07)
 (1) Convert 65 to a Z score:
Z = (65-70.07)/10.27 = -.49
(2) Since the curve is symmetrical and
negative area does not exist, use .49 to find
the area in the standard normal table: .1879

Finding the Area Between the Mean and a Negative Z Score
(3) Convert the proportion (.1879) to a percentage (18.79%); this is the
percentage of students scoring between 65 and the mean (70.07)

Finding the Area Between 2 Z Scores
on the Same Side of the Mean
 Using the same data presented in Table 10.1, find the
percentage of students scoring between 74 and 84.
 (1) Find the Z scores for 74 and 84:
Z = .38 and Z = 1.36
 (2) Look up the corresponding areas for those Z scores:
.1480 and .4131

Finding the Area Between 2 Z Scores on the Same
Side of the Mean
(3) To find the highlighted area, subtract the smaller area
from the larger area (.4131 - .1480 = .2651).
Now we have the percentage of students scoring
between 74 and 84: 26.51%.
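The subtraction of two table areas is the same as taking the difference of two cumulative areas. The sketch below assumes the mean and standard deviation used in the slides (70.07 and 10.27); the function name `area_between` is illustrative.

```python
from statistics import NormalDist

mean, sd = 70.07, 10.27
standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_between(low, high):
    """Proportion of the distribution lying between two raw scores."""
    z_low = round((low - mean) / sd, 2)     # rounded as for a printed table
    z_high = round((high - mean) / sd, 2)
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(area_between(74, 84), 4))
```

Because the cumulative area is measured from the far left of the curve, the same difference also works when the two scores lie on opposite sides of the mean.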

Finding the Area Between 2 Z Scores on Opposite
Sides of the Mean
 Using the same data, find the percentage of students
scoring between 62 and 72.
 (1) Find the Z scores for 62 and 72:
Z = (72-70.07)/10.27 = .19
Z = (62-70.07)/10.27 = -.79
(2) Look up the areas between these Z scores and
the mean, as in the previous 2 examples:
Z = .19 is .0753 and Z = -.79 is .2852
(3) Add the two areas together: .0753 + .2852 = .3605

Finding the Area Between 2 Z Scores
on Opposite Sides of the Mean
(4) Convert the proportion (.3605) to a percentage (36.05%); this
is the percentage of students scoring between 62 and 72.

Finding Area Above a Positive Z Score or Below a
Negative Z Score
 Find the percentage of students who did (a) very well,
scoring above 85, and (b) those students who did
poorly, scoring below 50.
 (a) Convert 85 to a Z score, then look up the value in
Column C of the Standard Normal Table:
Z = (85-70.07)/10.27 = 1.45, giving 7.35%
(b) Convert 50 to a Z score, then look up the value
(look for a positive Z score!) in Column C:
Z = (50-70.07)/10.27 = -1.95, giving 2.56%
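Both tail areas can be checked without the table. This is a sketch, assuming the mean and standard deviation used in the slides; `statistics.NormalDist` plays the role of Column C.

```python
from statistics import NormalDist

mean, sd = 70.07, 10.27
standard_normal = NormalDist()   # mean 0, standard deviation 1

z_high = round((85 - mean) / sd, 2)   # Z score for a raw score of 85
z_low = round((50 - mean) / sd, 2)    # Z score for a raw score of 50

above_85 = 1 - standard_normal.cdf(z_high)   # upper tail beyond a positive Z
below_50 = standard_normal.cdf(z_low)        # lower tail beyond a negative Z

print(f"{above_85:.2%} scored above 85, {below_50:.2%} scored below 50")
```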

Finding Area Above a Positive Z
Score or Below a Negative Z Score

Finding a Z Score Bounding an Area Above It
 Find the raw score that bounds the top 10 percent of
the distribution (Table 10.1)
 (1) 10% = a proportion of .10
 (2) Using the Standard Normal Table, look in Column
C for .1000, then take the value in Column A; this is
the Z score (1.28)
(3) Finally convert the Z score to a raw score:
Y=70.07 + 1.28 (10.27) = 83.22

Finding a Z Score Bounding an Area Above It
(4) 83.22 is the raw score that bounds the upper 10% of the
distribution. The Z score associated with 83.22 in this
distribution is 1.28
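Going from a proportion back to a raw score reverses the table lookup. The sketch below uses `NormalDist.inv_cdf` from Python's standard library in place of reading Columns C and A; the function name is illustrative, and Z is rounded to two decimals to mirror a printed table.

```python
from statistics import NormalDist

mean, sd = 70.07, 10.27
standard_normal = NormalDist()   # mean 0, standard deviation 1

def raw_score_for_upper_tail(proportion):
    """Return (Z, raw score) with the given proportion of the distribution above it."""
    z = round(standard_normal.inv_cdf(1 - proportion), 2)
    return z, mean + z * sd

z, y = raw_score_for_upper_tail(0.10)
print(z, round(y, 2))
```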

Finding a Z Score Bounding an Area
Below It
 Find the raw score that bounds the lowest 5 percent of
the distribution (Table 10.1)
 (1) 5% = a proportion of .05
 (2) Using the Standard Normal Table, look in Column
C for .05, then take the value in Column A; this is the
Z score (-1.65); negative, since it is on the left side of
the distribution
 (3) Finally convert the Z score to a raw score:
Y = 70.07 + (-1.65)(10.27) = 53.12

Finding a Z Score Bounding an Area Below It
(4) 53.12 is the raw score that bounds the lower 5% of the
distribution. The Z score associated with 53.12 in this
distribution is -1.65
