Health probabilities & estimation of parameters

1. Probability concepts
Probability: the likelihood of occurrence of events that are subject to chance.
Sample space: the collection of all possible outcomes (events).
Mutually exclusive (disjoint): events that cannot occur at the same time.
Not mutually exclusive (not disjoint): events that can occur at the same time.
Independent: the occurrence or non-occurrence of one event is not affected by the others.
Dependent (conditional): the probability of the second event is conditional on the first.

2. Properties of probability
A. 0 ≤ P(E) ≤ 1
B. P(E1) + P(E2) + … + P(En) = 1, where E1, …, En are mutually exclusive and together make up the whole sample space
C. P(Ē) = P(not E) = 1 − P(E)

• A community nurse knows that a type of infant skin rash can only mean one of three conditions: A, B or C.
• She knows from the literature that the probability of the rash being caused by condition A is 0.1 and by condition B is 0.65.
• Because the three probabilities must sum to 1, the probability that C is the underlying condition is 1 − (0.1 + 0.65) = 0.25.

3. Rules of probability
A. Addition rule of probability (disjoint events)
• P(A or B) = P(A) + P(B)
The table below gives the number of live births by birth weight and trimester of first prenatal care. What is the probability of late prenatal care (third trimester) or no prenatal care?

Number of live births by birth weight and trimester of first prenatal care

                       Trimester prenatal care begun
Birth weight (g)     1st       2nd      3rd    No care     Total
≤2,500             2,412       754      141        234     3,541
2,500-3,500       20,274     5,480    1,458      1,014    28,226
>3,500            15,250     3,271      738        447    19,706
Total             37,936     9,505    2,337      1,695    51,473

$P(\text{late prenatal care or no prenatal care}) = \dfrac{2337 + 1695}{51473} = 0.078$

B. Addition rule of probability (events that are not disjoint)
• P(A or B) = P(A) + P(B) − P(A ∩ B)
Using the table above, find, for a woman who had a live birth, the probability that either the birth weight was 2,500 g or less or the woman began her prenatal care during the first trimester.
$P(\le 2500\ \text{g or 1st trimester}) = \dfrac{3541 + 37936 - 2412}{51473} = 0.759$
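The two addition rules can be checked directly against the table counts. The following is a minimal sketch in Python; the dictionary layout and the helper names `column_total` and `row_total` are illustrative choices, not part of the original slides.

```python
# Counts copied from the live-births table
# (rows: birth weight in grams, columns: trimester prenatal care began).
births = {
    "<=2500":    {"1st": 2412,  "2nd": 754,  "3rd": 141,  "none": 234},
    "2500-3500": {"1st": 20274, "2nd": 5480, "3rd": 1458, "none": 1014},
    ">3500":     {"1st": 15250, "2nd": 3271, "3rd": 738,  "none": 447},
}

total = sum(sum(row.values()) for row in births.values())


def column_total(column):
    """Number of live births in one prenatal-care column."""
    return sum(row[column] for row in births.values())


def row_total(weight_class):
    """Number of live births in one birth-weight row."""
    return sum(births[weight_class].values())


# Disjoint events: a woman cannot be in both the "3rd" and "none" columns.
p_late_or_none = (column_total("3rd") + column_total("none")) / total

# Not disjoint: the cell (<=2500, 1st) belongs to both events,
# so it is subtracted once to avoid double counting.
p_low = row_total("<=2500") / total
p_first = column_total("1st") / total
p_low_and_first = births["<=2500"]["1st"] / total
p_low_or_first = p_low + p_first - p_low_and_first

print(round(p_late_or_none, 3))  # disjoint addition rule
print(round(p_low_or_first, 3))  # general addition rule
```

Subtracting the overlap cell is the only difference between the two rules; when the events are disjoint that cell is empty and the general rule reduces to the simple sum.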

C. Multiplication rule of probability (independent events)
• P(A and B) = P(A) × P(B)
Suppose we are interested in whether a mother or father is hypertensive. Let A be the event that the mother is hypertensive and B the event that the father is hypertensive, with P(A) = 0.1 and P(B) = 0.2. If the two events are independent, what is the probability that both mother and father are hypertensive?
P(A ∩ B) = P(A) × P(B) = 0.1 × 0.2 = 0.02

D. Multiplication rule of probability (dependent events)
• P(A and B) = P(A) × P(B│A)
Suppose 53% of the people living with HIV have tuberculosis (TB), and 27% of these have miliary tuberculosis (MTB). If a person living with HIV is selected at random, what is the probability that the person has both TB and MTB?
P(TB and MTB) = P(TB) × P(MTB│TB) = 0.53 × 0.27 = 0.14

E. Conditional probability
• $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$
Using the table of live births by birth weight and trimester of first prenatal care, what is the probability of a woman having a live birth weighing 2,500 g or less, given that she began her prenatal care during the first trimester?
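The conditional-probability question can be worked from the definition using three counts from the live-births table. This is a small sketch in Python; the variable names are illustrative only.

```python
# Counts taken from the live-births table.
n_total = 51473          # all live births
n_first = 37936          # prenatal care began in the 1st trimester (event B)
n_low_and_first = 2412   # birth weight <= 2,500 g AND 1st-trimester care (A and B)

# Definition: P(A | B) = P(A and B) / P(B)
p_joint = n_low_and_first / n_total
p_first = n_first / n_total
p_low_given_first = p_joint / p_first

print(round(p_low_given_first, 3))
```

Because the total cancels, the same value is obtained by dividing the cell count by the column total, 2,412 / 37,936, which is about 0.064.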

F. The law of complements applied to conditional probability
P(Ā│B) = P(not A│B) = 1 − P(A│B)
The probability of breast cancer recurring within five years after treatment, given that it had spread to the lymph glands at the time of initial diagnosis (stage II cancer), is 0.6. What is the probability that a stage II breast cancer will not recur within five years?
P(no recurrence│stage II) = 1 − P(recurrence│stage II) = 1 − 0.6 = 0.4

Notation for diagnostic tests:
D+ = the disease (event) is present
D− = free from the disease (event)
T+ = test positive
T− = test negative

In a certain population, a diagnostic test for the human immunodeficiency virus (HIV) will detect the disease in 90% of those who actually are infected, so P(T+│D+) = 0.90. Also, if a person is not infected, the test will be negative for HIV with probability 0.95, so P(T−│D−) = 0.95 and P(T+│D−) = 0.05. It is estimated that 0.1% (0.001) of this population is infected with HIV. A person is chosen at random from the population and is tested for HIV. The test is positive. What is the probability that the person is in fact infected with HIV?

$P(D^{+} \mid T^{+}) = \dfrac{P(T^{+} \mid D^{+})\,P(D^{+})}{P(T^{+} \mid D^{+})\,P(D^{+}) + P(T^{+} \mid D^{-})\,P(D^{-})} = \dfrac{0.90 \times 0.001}{(0.90 \times 0.001) + (0.05 \times 0.999)} = 0.018$
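The calculation above is an application of Bayes' theorem. A minimal Python sketch follows; the function name `positive_predictive_value` and its parameter names are illustrative labels, not terms used in the slides.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(D+ | T+) from Bayes' theorem.

    sensitivity = P(T+ | D+), specificity = P(T- | D-), prevalence = P(D+).
    """
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Values from the HIV example: 90% detection, 95% true negatives, 0.1% infected.
ppv = positive_predictive_value(sensitivity=0.90, specificity=0.95, prevalence=0.001)
print(round(ppv, 3))
```

Even with a fairly accurate test, the result is below 2% because the disease is rare: false positives from the large uninfected group outnumber true positives.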

Factorials, combinations and permutations
a) Factorial: the product of all integers from 1 up to and including a given integer k, denoted by the special symbol k! (read "k factorial").
e.g., 4! = 4 × 3 × 2 × 1 = 24
By convention, 0! = 1.
b) Combinations: the number of ways of selecting k items out of n, where the order of selection does not matter.
$\binom{n}{k} = {}_{n}C_{k} = \dfrac{n!}{k!\,(n-k)!}$
c) Permutations: the number of ways of selecting k objects out of n, where the order of selection matters.
${}_{n}P_{k} = \dfrac{n!}{(n-k)!}$
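These three quantities are available in Python's standard `math` module (`math.comb` and `math.perm` require Python 3.8 or later). The sketch below uses n = 5 and k = 2 purely as illustrative values.

```python
from math import comb, factorial, perm

n, k = 5, 2  # illustrative values, not taken from the slides

four_factorial = factorial(4)   # 4 x 3 x 2 x 1
zero_factorial = factorial(0)   # 1 by convention

# Combinations: order does not matter.
n_choose_k = comb(n, k)
# Permutations: order matters.
n_permute_k = perm(n, k)

# Both agree with the factorial formulas.
assert n_choose_k == factorial(n) // (factorial(k) * factorial(n - k))
assert n_permute_k == factorial(n) // factorial(n - k)

print(four_factorial, zero_factorial, n_choose_k, n_permute_k)
```

Each combination of k items can be ordered in k! ways, which is why the permutation count equals the combination count multiplied by k!.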

I. Discrete probability distributions
A. Binomial distribution
The probability distribution of the number of successes (k) among n independent trials, where each trial has probability of success p and probability of failure q = 1 − p.
We can write the binomial distribution in terms of factorials:
$P(X = k) = \dfrac{n!}{k!\,(n-k)!}\,p^{k}q^{n-k}$
where k = 0, 1, …, n.

Q1. What is the probability of obtaining 2 boys out of 5 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered independent random variables?
$P(X = 2) = \dfrac{5!}{2!\,3!}\,(0.51)^{2}(0.49)^{3} = 0.306$
Q2. Suppose the underlying rate of disease in the offspring is 0.05. Under this assumption, the number of households in which the infants develop chronic bronchitis will follow a binomial distribution with parameters n = 10, p = 0.05. What is the probability of observing at least 2 households with a bronchitic child?
P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)] = ?
B. Poisson distribution
The probability distribution of the number of events that occur randomly in a given interval of time (or space).

Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital? What is the probability of observing 2 or more births in a given hour at the hospital?
P(X ≥ 2) = 1 − [P(X = 0) + P(X = 1)] = ?
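The slides do not show the Poisson formula itself. The sketch below assumes the standard probability function P(X = k) = e^(−λ) λ^k / k!, with λ the average number of events per interval; `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam (standard formula, assumed here)."""
    return exp(-lam) * lam**k / factorial(k)


rate = 1.8  # average births per hour, from the example

# Probability of exactly 4 births in an hour.
p_four = poisson_pmf(4, rate)

# Probability of 2 or more births: complement of 0 or 1 births.
p_at_least_two = 1 - (poisson_pmf(0, rate) + poisson_pmf(1, rate))

print(round(p_four, 3))
print(round(p_at_least_two, 3))
```

Under this assumption the two answers are approximately 0.072 and 0.537.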

II. Continuous probability distribution
The normal distribution

If the total cholesterol values for a certain target population are approximately normally distributed with a mean of 200 (mg/100 mL) and a standard deviation of 20 (mg/100 mL), what is the probability that a person picked at random from this population will have a cholesterol value a) less than 230, b) between 240 and 250, c) greater than 240 (mg/100 mL)?
a) $Z = \dfrac{230 - 200}{20} = 1.5$, so $P(X < 230) = \Phi(1.5) = 0.9332$
b) $Z_1 = \dfrac{240 - 200}{20} = 2$ and $Z_2 = \dfrac{250 - 200}{20} = 2.5$, so $P(240 < X < 250) = \Phi(2.5) - \Phi(2) = 0.9938 - 0.9772 = 0.0166$, or 1.66%
c) $Z = \dfrac{240 - 200}{20} = 2$, so $P(X > 240) = 1 - \Phi(2) = 1 - 0.9772 = 0.0228$, or 2.28%
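The table look-ups can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; small differences from the values above come from rounding in the four-decimal standard normal table.

```python
from statistics import NormalDist

# Cholesterol distribution from the example: mean 200, standard deviation 20 (mg/100 mL).
cholesterol = NormalDist(mu=200, sigma=20)

# a) P(X < 230)
p_less_230 = cholesterol.cdf(230)

# b) P(240 < X < 250) = cdf(250) - cdf(240)
p_240_to_250 = cholesterol.cdf(250) - cholesterol.cdf(240)

# c) P(X > 240) = 1 - cdf(240)
p_greater_240 = 1 - cholesterol.cdf(240)

print(round(p_less_230, 4), round(p_240_to_250, 4), round(p_greater_240, 4))
```

Working with the distribution object directly skips the explicit conversion to Z, but the underlying computation is the same.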

Statistical inference: reaching a conclusion about a population on the basis of the information contained in a sample.
The sampled population: the population from which one actually draws a sample.
The target population: the population about which one wishes to make an inference.

Sampling distributions & the central-limit theorem
Sampling distribution: the distribution of values of a statistic obtained from repeated samples of the same size from a given population.
e.g.: Consider a population consisting of six subjects. The table below gives the subject names (for identification) and values of a variable under investigation. In this case the population mean μ = 0.5. We now consider all possible samples, without replacement, of size 3.

This sampling distribution has a few interesting properties:
1. Its mean (i.e., the mean of all possible sample means) is equal to the population mean, μ = 0.5. The sample mean (sample proportion) is an unbiased estimator of the population mean.

2. If we draw a bar graph of this sampling distribution, it shows a shape somewhat similar to that of a symmetric, bell-shaped normal curve.
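The two properties can be illustrated by enumerating every sample. The population table is not available in this text, so the sketch below assumes hypothetical values: six subjects A to F, three with value 1 and three with value 0, which is one population consistent with μ = 0.5.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: hypothetical values chosen only so that the population mean is 0.5;
# the original table of subjects is not reproduced in the source.
population = {"A": 1, "B": 1, "C": 1, "D": 0, "E": 0, "F": 0}

mu = Fraction(sum(population.values()), len(population))

# Every possible sample of size 3 drawn without replacement, with its sample mean.
sample_means = [
    Fraction(sum(population[name] for name in sample), 3)
    for sample in combinations(population, 3)
]

# Sampling distribution: how often each sample mean occurs.
distribution = Counter(sample_means)
mean_of_means = sum(sample_means) / len(sample_means)

for value in sorted(distribution):
    print(f"sample mean {value}: {distribution[value]} of {len(sample_means)} samples")
print("population mean:", mu, "| mean of all sample means:", mean_of_means)
```

With these assumed values there are 20 samples, the mean of the sample means equals μ exactly, and the counts are symmetric around 0.5, with most samples in the middle and few at the extremes.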

The central-limit theorem
Given any population with mean μ and variance σ², the sampling distribution of X̄ will be approximately normal with mean μ and variance σ²/n when the sample size n is large.
Transforming the normal distribution of X̄ to the standard normal distribution gives:
$Z = \dfrac{\bar{X} - \mu_{\bar{X}}}{\sigma/\sqrt{n}}$

Example
Suppose it is known that in a certain large human population cranial length is approximately normally distributed with a mean of 185.6 mm and a standard deviation of 12.7 mm. What is the probability that a random sample of size 10 from this population will have a mean greater than 190?
$Z = \dfrac{190 - 185.6}{12.7/\sqrt{10}} = \dfrac{4.4}{4.0161} = 1.10$
By consulting the standard normal table, we find that the area to the right of 1.10 is 0.1357; hence, we say that the probability is 0.1357 that a sample of size 10 will have a mean greater than 190.
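The same calculation can be sketched in Python with the standard library. Because the code does not round Z to 1.10 before the look-up, its probability differs slightly from the table value of 0.1357.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 185.6, 12.7, 10   # population mean, population SD, sample size
threshold = 190                  # sample mean of interest (mm)

# Standard deviation of the sample mean (standard error).
standard_error = sigma / sqrt(n)

# Standardise the threshold, then take the upper-tail area.
z = (threshold - mu) / standard_error
p_mean_above = 1 - NormalDist().cdf(z)

print(round(standard_error, 4), round(z, 2), round(p_mean_above, 4))
```

The unrounded Z is about 1.096, which gives an upper-tail probability of roughly 0.137.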

A point estimate: a single numerical value used to estimate the corresponding population parameter.
An interval estimate: two numerical values defining a range of values that, with a specified degree of confidence, most likely includes the parameter being estimated.

The standard error
$SE = \dfrac{s}{\sqrt{n}}$
The standard error is a measure of the precision of a sample estimate.
Interpretation:
• A large standard error indicates that the estimate is imprecise.
• A small standard error indicates that the estimate is precise.
• The standard error is reduced, that is, we obtain a more precise estimate, if the size of the sample is increased.

Find the SE of the birth weights of infants who were born prematurely, for which n = 98, X̄ = 1.31 kg and s = 0.42 kg.

Confidence interval
• A confidence interval (CI) is a type of interval estimate of a population parameter.
• A confidence interval consists of a range of values (an interval) that acts as a good estimate of the unknown population parameter.
• Researchers may use any confidence coefficient they wish.
• The most frequently used values are 90%, 95% and 99%, with associated reliability factors 1.645, 1.96 and 2.58.

Interval estimate for the mean
A 95% interval for the population mean μ is defined by X̄ ± 1.96 × SE.
The birth weights of 98 infants who were born prematurely have:
X̄ = 1.31 kg
s = 0.42 kg
Find a 95% confidence interval for the true population mean birth weight.
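The standard error and the 95% interval for the premature-infant data can be computed as follows. This is a sketch using the reliability factor 1.96 quoted above; the variable names are illustrative.

```python
from math import sqrt

n = 98          # number of infants
x_bar = 1.31    # sample mean birth weight (kg)
s = 0.42        # sample standard deviation (kg)
z_95 = 1.96     # reliability factor for 95% confidence

# Standard error of the mean: SE = s / sqrt(n)
standard_error = s / sqrt(n)

# 95% confidence interval: x_bar +/- 1.96 x SE
margin = z_95 * standard_error
lower, upper = x_bar - margin, x_bar + margin

print(round(standard_error, 4))
print(round(lower, 3), round(upper, 3))
```

The standard error is about 0.042 kg, so the interval runs from roughly 1.23 kg to 1.39 kg.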

Estimation of the proportion
Some studies have suggested a relationship between exposure to anesthetic gases and the incidence of breast cancer. To examine this relationship, a study was set up among 10,000 female operating-room nurses aged 40-59. Suppose that the 5-year risk of breast cancer in the general population in this age group is Po = 0.005. We find that among the 10,000 female operating-room nurses, 60 women have developed the disease over 5 years. Is this a significant excess over the expected rate based on the national rate?

Our best estimate of the risk is P̂ = 60/10,000 = 0.006.
P = true rate for 40-49-year-olds, which is unknown
Po = 0.005 = general population rate for 40-49-year-olds
P̂ = estimate of P = 0.006
We will use P̂ as an estimate of P. We need to obtain a 95% CI for P based on P̂.
Based on the central-limit theorem, an approximate 95% CI for P is given by the interval
$\hat{P} \pm 1.96\sqrt{\dfrac{\hat{P}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{P}$

$= \left(0.006 - 1.96\sqrt{\dfrac{0.006(0.994)}{10{,}000}};\ 0.006 + 1.96\sqrt{\dfrac{0.006(0.994)}{10{,}000}}\right) = (0.0045;\ 0.0075)$
Since this interval includes Po = 0.005, we would conclude that there is no significant excess risk of breast cancer among these female nurses.
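The interval for the proportion can be reproduced with a short Python sketch. It uses the normal approximation and the 1.96 factor from the text; the variable names are illustrative.

```python
from math import sqrt

cases = 60        # nurses who developed breast cancer over 5 years
n = 10_000        # nurses studied
p0 = 0.005        # 5-year risk in the general population
z_95 = 1.96       # reliability factor for 95% confidence

# Point estimate of the risk and its standard error.
p_hat = cases / n
q_hat = 1 - p_hat
standard_error = sqrt(p_hat * q_hat / n)

# Approximate 95% confidence interval for the true proportion.
lower = p_hat - z_95 * standard_error
upper = p_hat + z_95 * standard_error

# If the general-population risk lies inside the interval,
# the data do not show a significant excess.
contains_p0 = lower <= p0 <= upper

print(round(lower, 4), round(upper, 4), contains_p0)
```

The check on `contains_p0` mirrors the reasoning in the text: the observed rate is higher than 0.005, but the interval is wide enough to include it.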
