Micro-Mph (1).pptx

Why study statistics?
• Data is everywhere
• Statistical techniques are used to make many decisions that affect our
lives
• No matter what your career is, you will make professional decisions
that involve data.
• An understanding of statistical methods will help you make these
decisions effectively.
10/4/2023 By Degemu S (MPH) 1

Uses of biostatistics
• Organizing and summarizing data
• Assessing health status
• Evaluating health programs
• Allocating resources
• Estimating the magnitude of a disease or condition
• Assessing risk factors
• Evaluating new medicines or drugs
• Drawing inferences
• Compiling hospital utilization statistics
• Monitoring vaccine uptake

Biostatistics in Public Health?
• What is public health all about?
“Public Health is the science and art
of preventing disease, prolonging life, and promoting health
through the organized efforts of society.”
(World Health Organization)

The Functions of Public Health
• Assessment: Identify problems related to the
public’s health, and measure their extent
•Policy Setting: Prioritize problems, find possible solutions, set regulations to
achieve change and predict effect on the population.
•Assurance: Provide services as determined by policy, and monitor compliance
•Evaluation is a theme that cuts across all these functions, i.e., how well are
they performed?

Role of Biostatistics in PH
• Assessment: Identify problems related to the public’s health, and measure their
extent
•Role of Biostatistics in assessment: –
• Decide which information to gather,
• Find patterns in collected data, and
• Make the best summary description of the population and associated problems.
• Design general surveys of the population's needs,
• Plan experiments to supplement these surveys,
• Assist scientists in estimating the extent of health problems and associated risk factors.

• Policy Setting: Prioritize problems, find possible solutions, set regulations
to achieve change, and predict the effect on the population
•Role of Biostatistics in Policy Setting:
• Measure problems
• Prioritize problems
• Quantify associations of risk factors with the disease,
• Predict the effect of policy changes
• Estimate costs.

• Assurance: Provide services as determined by policy, and monitor compliance.
• Role of Biostatistics in Assurance & Evaluation:
• Use sampling and estimation methods to study the factors related to compliance and outcome.
• Decide if improvement is due to compliance or something else, how best to measure compliance, and how to increase the compliance level in the target population.

Role of Biostatistics in Health Research
• Purpose of Health Research:
–To create knowledge essential for action to improve health.
• Without good knowledge, health interventions would have neither a logical nor an empirical basis and would be bound to fail.

Role of Biostatistics in Health Research
Steps in research:
• Planning
• Designing
• Execution (data collection)
• Data processing
• Data analysis
• Interpretation
• Publication
• Presentation
Statistical thinking contributes to every step of research.

Introduction …
• Variable: any characteristic of an individual that is measured (like blood pressure) or recorded (like age or sex) and that can take different values for different individuals or cases.
• Quantitative variable: one that can be measured in the usual sense and expressed as a number. Examples: the heights of adult males, the weights of preschool children, and the ages of patients seen in a dental clinic.
• Qualitative variable: a characteristic that cannot be measured in the sense that height, weight, and age are measured; individuals are placed into categories instead. Examples: the sex, ethnicity, or religion of an individual.

Introduction….
• Population: the largest collection of entities (usually people) in which we have an interest at a particular time.
• Sample: a part of a population.

Scale of measurement
• Measurement: the assignment of numbers to objects or events according to a set of rules.
• The scale of measurement is concerned with the nature of the numbers that result from measurement.
• Measurements can be qualitative (categorical) or quantitative.
• Although variables can be broadly divided into categorical (qualitative) and quantitative, it is common practice to distinguish four basic types of data (scales of measurement): nominal, ordinal, interval, and ratio.

Count
• Most basic measure of disease frequency is a simple count of affected
individuals.
• Example:
• 350,000 cases of polio
• 350,000 cases of polio in 1988
• 350,000 cases of polio in 1988 in 125 countries

Ratio, proportion and rate

Ratio
• The quotient of 2 numbers
• Numerator NOT INCLUDED
in the denominator
• No relationship necessary between the numerator and denominator
• May be expressed as a/b or a:b

Ratio

When is a ratio used?
• Sex ratio: Male to female
• Number of health facilities per population
• Number of participants in the course per facilitator
• Number of inhabitants per latrine
• Odds ratio
• Relative risk
• Prevalence ratio
• Maternal mortality ratio

Ratio
• Example 1
• A university has 4000 male students and 2000 female students. The
ratio of male to female students is:
• 4000/2000 = 2/1 or 2:1
• For every 2 male students there is one female student

Ratio
• Example 2
• A foodborne epidemic occurred in an elementary school canteen. The
attack rate in the first grade was 24% while the attack rate in the second
grade was 16%. Compare these two attack rates.
• 24/16 = 3/2 or 3:2
• The attack rate in the first grade was 1.5 times the attack rate in the second grade.

Ratio
• Example 3
• A city of 4 million people has 400 clinics. Calculate the ratio of clinics to persons.
• Ratio = 400 / 4,000,000 = 0.0001 clinics per person
• Multiply by 10^4: 0.0001 x 10^4 = 1 clinic per 10,000 persons

Proportion
• Numerator is a sub-group of the population in the denominator
• Numerator is always INCLUDED in the denominator
• Proportion ranges between 0 and 1
• Percentage = proportion x 100

What is the proportion of cases?
• Proportion = cases / total = 2 / 4 = 0.5
• Percentage = 0.5 x 100 = 50%

When is a proportion used?
• Proportion of samples positive for P. falciparum
• 1000 samples, 236 positive
• Proportion of positive samples = 236/1000 = 0.236
• Percentage of positive samples = 0.236 x 100 = 23.6%
• Proportion of malaria deaths
• 123 malaria cases, 7 deaths
• Proportion of malaria deaths = 7/123 = 0.057
• Percentage of malaria deaths = 0.057 x 100 = 5.7%

Proportion
• Example 1
• A university has 4000 male students and 2000 female students.
Calculate the proportion of male and female students.
• Male: 4000/6000 x 100% = 66.7%
• Female: 2000/6000 x 100% = 33.3%

Proportion
• Example 2
• 40 children are currently ill with measles; 80 children altogether have had measles.
• 40 / 80 = 0.50 (proportion)
• 0.50 x 100 = 50% (percentage)

Rate
• Measures the probability of occurrence of an event over TIME
• Numerator: number of EVENTS
• Denominator: POPULATION at risk for the event in numerator
observed for a given TIME

What is the rate of death?

• 2 deaths among 100 people observed for one year
• Rate = 2 deaths per 100 population per year

When is a rate used?
• Morbidity rates
• Attack rates
• Prevalence rates
• Incidence rates
• Mortality rates
• Natality rates

Rate Example 1
• Mortality rate of tetanus in France in 1995
• Tetanus deaths: 17
• Population in 1995: 58 million
• Time period: 1 year
• Mortality rate = 0.029 per 100,000 population per year
• Rate may be expressed in any power of 10
• 100, 1,000, 10,000, 100,000
• Rate must include an aspect of time
• Per year, per month, per day

Rate Example 2
Maternal Mortality for Various Continents (1995)

Continent                  Rate
Africa                     273,000
Asia                       217,000
Europe                     2,000
Latin America/Caribbean    22,000
South America              15,000
North America              490
Australia/New Zealand      25

Summary: What is the measure of frequency?
• Is the numerator included in the denominator?
  No: the measure is a ratio.
  Yes: go to the next question.
• Is time included in the denominator?
  Yes: the measure is a rate.
  No: the measure is a proportion.

Nominal Data
• As the name implies, nominal data represent mutually exclusive categories that have no natural order or rank.
• Individuals are simply placed in the proper category or group.
• Each item must fit into exactly one category.
• The categories can be labelled with numbers, names, or symbols.
Sex
1. Male
2. Female
Marital status
1. Single
2. Married
3. Divorced
4. Widowed
Outcome of patient after car accident
1. Alive
2. Dead
Blood group
1. A
2. B
3. O
4. AB

Ordinal data
• Data representing mutually exclusive categories with a ranked order are called ordinal data.
• The spaces or intervals between the categories are not necessarily equal.
• The numbers assigned to ordinal data serve only to order (rank) the observations from lowest to highest, hence the term ordinal.
Example
Job satisfaction index
1. Strongly Disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly agree
Classroom rank
1. First
2. Second
3. Third
Degree of burn
1. First degree
2. Second degree
3. Third degree
4. Fourth degree
Health status of patient after admission
1. Unimproved
2. Improved
3. Much improved
Pain level:
1. None
2. Mild
3. Moderate
4. Severe

Interval data
• Interval data are truly quantitative.
• The distance between any two measurements is known, and equal intervals represent equal differences.
• Both the unit of distance and the zero point are arbitrary.
• Zero has no true meaning, i.e. it does not indicate a total absence of the quantity being measured.
• The ratio between two measurements has no meaning. For example, 40 degrees Fahrenheit is not twice as hot as 20 degrees Fahrenheit.
• Example
• Temperature in degrees Fahrenheit or Celsius. Here the unit of measurement is the degree, and the point of comparison is the arbitrarily chosen "zero degrees", which does not indicate a lack of heat.
• IQ

Ratio data
• The highest level of measurement is the ratio scale.
• This scale is characterized by the fact that equality of ratios as well as equality of intervals may be determined.
• Fundamental to the ratio scale is a true zero point.
• Example
• Height
• Weight
• Length
• Age
• Cholesterol level
• Serum sugar level
• The number of TB patients attending a hospital

Scale of Measurement
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
The degree of precision in measuring increases from the nominal scale to the ratio scale.

Numerical Discrete and Numerical Continuous Data
• Both interval and ratio data involve measurement.
• Most data analysis techniques that apply to ratio data also apply to interval data.
• For most practical purposes, both interval and ratio data can be classified as numerical discrete or numerical continuous.

Numerical Discrete
• For discrete data, both ordering and magnitude are important.
• The numbers represent actual measurable quantities rather than mere labels.
• Discrete data are restricted to taking on only specified values (often integers or counts) that differ by fixed amounts.
• No intermediate values are possible.
• Example
• The number of bacteria colonies on a plate
• The number of cells within a prescribed area upon microscopic examination
• The number of heartbeats within a specified time interval
• The number of times a woman has given birth (parity)
• The number of episodes of illness a patient experiences during some time period
• The number of motor vehicle accidents
• The number of beds available in a particular hospital
• Etc.

Numerical continuous
• The scale with the greatest degree of quantification is a numerical continuous scale.
• Each observation theoretically falls somewhere along a continuum.
• One is not restricted, in principle, to particular values such as the integers of the discrete scale.
• The restricting factor is the degree of accuracy of the measuring instrument.
• Example
• Most clinical measurements are on a numerical continuous scale, such as:
• Blood pressure
• Serum cholesterol level
• Height
• Weight
• Age

Categorizing Variables: Exercise
1. Year of birth: numerical
2. Marital status of women: nominal
3. Identification number of a study participant: nominal (the number is only a label)
4. Class rank: ordinal
5. Length of infants at ANC clinic: numerical
Discrete or Continuous?

Inferential Statistics

Probability And Probability Distributions
• Probability is the central idea behind statistical designs for producing data.
• Probabilities are used in everyday communication:
• A patient has a 50-50 chance of surviving a certain operation.
• The chance that a 30-year-old woman will celebrate her 70th birthday is 30%.
• These examples express the chance that some event of a random variable occurs.
• Probability theory grew out of attempts to solve problems related to games of chance, such as tossing a coin or rolling a die, i.e. attempts to quantify personal beliefs regarding degrees of uncertainty.

Probability And Probability distribution…..
• Probabilities and probability distributions are extensions of the ideas of relative frequency and histograms, respectively.
• Relative frequency probability: if some process is repeated a large number of times, n, and some resulting event E occurs m times, the relative frequency of E will be approximately equal to m/n.
• Symbolically: Pr(E) = m/n
• E.g. suppose that of 158 people who attended a dinner party, 99 became ill due to food poisoning. The probability of illness for a person selected at random is Pr(illness) = 99/158 = 0.63 or 63%.

• Results are not certain, uncertainty is high
•To evaluate how accurate our results are:
–Given how our data were collected, are our results accurate?
–Given the level of accuracy needed, how many observations need to
be collected?
–The sample size issue?

• When dealing with a process that has an uncertain outcome
–Birth of male or female child?
–Tossing a coin?
–A patient taking a certain drug(cure/no)?
–The fate of the patient?

• Experiment = any process with an uncertain outcome.
• A single performance of an experiment is a trial, and its possible outcomes are events.
• Event = something that may or may not happen when the experiment is performed.
• Events are represented by upper-case letters such as A, B, C, etc.

•Probability can be defined as the proportion of times an event occurs in a very large number of trials.
• The probability of an event E is a number between 0 and 1 representing the proportion of times that E is expected to happen when the experiment is repeated over and over again under the same conditions.

Probability And Probability Distributions…..
• Any event can be expressed as a subset of the set of all possible outcomes (the sample space, S).
• S = set of all possible outcomes; P(S) = 1
• An event is any set of outcomes of interest.

Why Probability in Medicine
• Because medicine is an inexact science, physicians can seldom predict an outcome with absolute certainty.
•E.g. to formulate a diagnosis, a physician must rely on available diagnostic information about a patient:
–History and physical examination
–Laboratory investigations, X-ray findings, ECG, etc.
• Although no test result is absolutely accurate, it does affect the probability of the presence (or absence) of a disease.
–Sensitivity and specificity
• An understanding of probability is fundamental for quantifying the
uncertainty that is inherent in the decision-making process.

cont.…
• Probability theory is a foundation for statistical inference.
• Allows us to draw conclusions about a population of patients based
on information obtained from a sample of patients drawn from that
population.
• Probability is used in:
• Probability distributions: Binomial, Poisson, and Normal distributions
• Sampling and sampling distributions
• Estimation
• Hypothesis testing
• Advanced statistical analysis

Categories of Probability
• Objective and Subjective Probabilities.
• Objective probability
1) Classical probability
2) Relative frequency probability
1. Classical Probability :
• Is based on gambling ideas
•Rolling a die: there are 6 possible outcomes.
• Set of all outcomes = {1, 2, 3, 4, 5, 6}.
• Each is equally likely to occur: P(i) = 1/6 for i = 1, 2, ..., 6.
• P(1) + P(2) + ... + P(6) = 1

Classical Probability
• Definition: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a characteristic E, the probability of the occurrence of E is m/N.
• P(E) = the probability of E = m/N
• If we toss a die, what is the probability of 4 coming up?
• m = 1 (the outcome 4) and N = 6
• The probability of 4 coming up is 1/6.

Classical Probability
• Another “equally likely” setting is the tossing of a coin:
–There are 2 possible outcomes in the set of all possible outcomes, {H, T}.
–P(H) = 0.5, P(T) = 0.5, SUM = 1.0

Relative Frequency Probability
•In the long run...
•The proportion of times the event A occurs in a large number of trials repeated under essentially identical conditions.

Relative Frequency Probability
• Definition: If a process is repeated a large number of times (n), and if an event with the characteristic E occurs m times, the relative frequency of E, m/n, will be approximately equal to the probability of E.
• Probability of E = P(E) = m/n.
• If you toss a coin 100 times and the head comes up 40 times, P(H) = 40/100 = 0.4.
• If we toss a coin 10,000 times and the head comes up 5562 times, P(H) = 0.5562.
•Therefore, the longer the series and the larger the sample size, the closer the estimate tends to be to the true value (0.5).
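
The long-run idea can be illustrated with a small simulation. This is a sketch only: it assumes a fair coin (true P(H) = 0.5) simulated with Python's pseudo-random generator, and the function name `relative_frequency_of_heads` and the seed are arbitrary choices, not part of the slides.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return m/n, the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# Relative frequency for increasingly long series of tosses.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```

Individual short runs can land noticeably away from 0.5, as in the 40/100 example; the estimate stabilises only as the number of tosses grows.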

Subjective Probability
• Personalistic (An opinion or judgment by a decision maker about the
likelihood of an event).
•Personal assessment of which is more effective to provide a cure–traditional/modern
•Personal assessment of which sports team will win a match.
•Also uses classical and relative frequency methods to assess the likelihood
of an event, but does not rely on the repeatability of any process.
E.g., if someone says that he/she is 90% certain that a cure for AIDS will be discovered within 5 years, then it means that:
P(discovery of a cure for AIDS within 5 years) = 90% = 0.90

Mutually Exclusive Events
• Two events A and B are mutually exclusive if they cannot both happen at the same time.
• P(A ∩ B) = 0
• If E1 occurs, then E2 cannot occur.
• E1 and E2 have no common element.
• Example: E1 = yellow card, E2 = black card. A card cannot be black and yellow at the same time.

Mutually Exclusive Events
• Example:
–A coin toss cannot produce heads and tails simultaneously.
–The weight of an individual can’t be classified simultaneously as “underweight”, “normal”, and “overweight”.
–Blood pressure reading: A = (DBP < 90) and B = (90 ≤ DBP < 95) can’t occur at the same time.

Independent Events
•Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out.
•The outcome of one event has no effect on the occurrence or non-occurrence of the other.
P(A ∩ B) = P(A) x P(B) (independent events)
• Example: The outcomes of the first and second coin tosses are independent.

Dependent event
• The occurrence of one affects the probability of the other.
• P(A ∩ B) ≠ P(A) x P(B)
•Example: Consider the DBP measurements from a mother and her first-born child.
Let A = {mother’s DBP ≥ 95} and B = {first-born child’s DBP ≥ 80}.
•Suppose P(A ∩ B) = 0.05, P(A) = 0.1, and P(B) = 0.2.
Then P(A ∩ B) = 0.05 > P(A) x P(B) = 0.02, so events A and B are dependent.

Dependent event
E1 = rain forecast on the news
E2 = taking an umbrella to work
The probability of the second event is affected by the occurrence of the first event.

Intersection and union
• The intersection of two events A and B, A ∩ B, is the event that A and B happen simultaneously. P(A and B) = P(A ∩ B)
•Let A represent the event that a randomly selected newborn is LBW (low birth weight), and B the event that he or she is from a multiple birth.
•The intersection of A and B is the event that the infant is both LBW and from a multiple birth.

Intersection, and union
• The union of A and B, A ∪ B, is the event that either A happens or B happens or they both happen simultaneously.
• P(A or B) = P(A ∪ B)
• Here, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both.

Properties of Probability
1. The numerical value of a probability always lies between 0 and 1, inclusive.
0 ≤ P(E) ≤ 1
• A value of 0 means the event cannot occur.
• A value of 1 means the event will definitely occur.
• A value of 0.5 means that the event is as likely to occur as not to occur.

2. The sum of the probabilities of all mutually exclusive outcomes is equal
to 1.
P(E1) + P(E2 ) + .... + P(En ) = 1.
3. For two mutually exclusive events A and B,
P(A or B ) = P(AUB)= P(A) + P(B).
If not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B)

4. The complement of an event A, denoted by Ā or Ac, is the event that A
does not occur
• Consists of all the outcomes in which event A does NOT occur
P(Ā) = P(not A) = 1 – P(A)
• Ā occurs only when A does not occur.
• These are complementary events.

• In the example, the complement of A is the event that a newborn is not LBW.
• In other words, Ā is the event that the child weighs 2500 grams or more at birth.
P(Ā) = 1 − P(A)
P(not low bwt) = 1 − P(low bwt)
= 1− 0.076
= 0.924

Basic Probability Rules
1. Addition rule
• If events A and B are mutually exclusive:
P(A or B) = P(A) + P(B)
P(A and B) = 0
• More generally:
P(A or B) = P(A) + P(B) - P(A and B)
where P(A or B) is the probability that event A or event B occurs, or that they both occur.

Example: The probabilities below represent years of
schooling completed by mothers of newborn infants

• What is the probability that a mother has completed < 12 years of schooling?
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159
• Since these two events are mutually exclusive,
P(≤ 8 or 9-11) = P(≤ 8 U 9-11)
= P(≤ 8) + P(9-11)
= 0.056 + 0.159
= 0.215

• What is the probability that a mother has completed 12 or more years of
schooling?
P(≥ 12) = P(12 or 13-15 or ≥ 16)
= P(12 U 13-15 U ≥ 16)
= P(12) + P(13-15) + P(≥ 16)
= 0.321 + 0.218 + 0.230
= 0.769
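
The two schooling calculations are the same operation: summing the probabilities of mutually exclusive categories. A minimal sketch, using only the five probabilities quoted in this example (the dictionary keys and the helper name `prob_any` are illustrative labels, not from the slides):

```python
# Probabilities quoted in the example, keyed by years of schooling completed.
schooling = {"<=8": 0.056, "9-11": 0.159, "12": 0.321, "13-15": 0.218, ">=16": 0.230}


def prob_any(categories, dist):
    """Addition rule for mutually exclusive categories: add their probabilities."""
    return sum(dist[c] for c in categories)


p_less_than_12 = prob_any(["<=8", "9-11"], schooling)
p_12_or_more = prob_any(["12", "13-15", ">=16"], schooling)

print(round(p_less_than_12, 3))  # 0.215
print(round(p_12_or_more, 3))    # 0.769
```

Note that the five quoted values add up to 0.984 rather than 1, so the two results do not sum to 1 either; the original table presumably contained a further category that is not reproduced in this text.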

If A and B are not mutually exclusive events,
then subtract the overlapping:
P(AU B) = P(A)+P(B) − P(A ∩ B)

• The following data are the results of electrocardiograms (ECGs) and radionuclide angiocardiograms (RAs) for 19 patients with post-traumatic myocardial contusions.
• 7 patients had both ECG and RA abnormalities
• 17 patients had an abnormal ECG
• 9 patients had an abnormal RA
P(ECG abnormal and RA abnormal) = 7/19 = 0.37
P(ECG abnormal or RA abnormal)
= P(ECG abnormal) + P(RA abnormal) - P(both ECG and RA abnormal)
= 17/19 + 9/19 - 7/19 = 19/19 = 1
Note: P(both) is subtracted because the 7 patients whose ECGs and RAs are both abnormal would otherwise be counted twice.
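
The same calculation can be checked with exact fractions. This is a small sketch using only the counts given in the example; the variable names are illustrative.

```python
from fractions import Fraction

n_patients = 19
p_ecg = Fraction(17, n_patients)   # P(ECG abnormal)
p_ra = Fraction(9, n_patients)     # P(RA abnormal)
p_both = Fraction(7, n_patients)   # P(ECG abnormal and RA abnormal)

# General addition rule: subtract the overlap so it is counted only once.
p_either = p_ecg + p_ra - p_both

print(p_either)                    # 1
print(p_ecg + p_ra)                # 26/19, larger than 1 because of double counting
```

Leaving out the subtraction gives 26/19, which cannot be a probability; that is a quick way to spot a forgotten overlap.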

2. Multiplication rule
• If A and B are independent events, then
P(A ∩ B) = P(A) × P(B)
• More generally,
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
P(A and B) denotes the probability that A and B both occur at the same time.

Conditional Probability
• Refers to the probability of an event, given that another event is known to
have occurred.
• “What happened first is assumed”
• Hint - When thinking about conditional probabilities, think in stages. Think
of the two events A and B occurring chronologically, one after the other,
either in time or space.

• The conditional probability that event B occurs given that event A has already occurred is denoted P(B|A) and is defined as
P(B|A) = P(A ∩ B) / P(A),
provided that P(A) ≠ 0.

Example:
A study investigated the effect of prolonged exposure to bright light on retina damage in premature infants.

                 Retinopathy YES   Retinopathy NO   TOTAL
Bright light           18                3            21
Reduced light          21               18            39
TOTAL                  39               21            60

• The probability of developing retinopathy is:
P(Retinopathy) = No. of infants with retinopathy / Total No. of infants
= (18 + 21) / (21 + 39)
= 39/60
= 0.65

• We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability given that the infant was exposed to reduced light.
• Exposure to bright light and exposure to reduced light are conditioning events, i.e. events we want to take into account when calculating conditional probabilities.

• The conditional probability of retinopathy, given exposure to bright
light, is:
• P(Retinopathy | exposure to bright light)
= No. of infants with retinopathy exposed to bright light / No. of infants exposed to bright light
= 18/21 = 0.86

• P(Retinopathy | exposure to reduced light)
= No. of infants with retinopathy exposed to reduced light / No. of infants exposed to reduced light
= 21/39 = 0.54
• The conditional probabilities suggest that premature infants exposed
to bright light have a higher risk of retinopathy than premature
infants exposed to reduced light.
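
The retinopathy figures can be reproduced directly from the cell counts. This is a minimal sketch; the dictionary layout and the function name `p_retinopathy_given` are illustrative choices, and the counts are those of the study table above.

```python
# Cell counts: (light exposure, retinopathy) -> number of infants.
counts = {
    ("bright", "yes"): 18, ("bright", "no"): 3,
    ("reduced", "yes"): 21, ("reduced", "no"): 18,
}


def p_retinopathy_given(light):
    """P(retinopathy | light): restrict to the exposed group, then take the share with retinopathy."""
    exposed = counts[(light, "yes")] + counts[(light, "no")]
    return counts[(light, "yes")] / exposed


total = sum(counts.values())
p_retinopathy = (counts[("bright", "yes")] + counts[("reduced", "yes")]) / total

print(round(p_retinopathy, 2))                   # 0.65
print(round(p_retinopathy_given("bright"), 2))   # 0.86
print(round(p_retinopathy_given("reduced"), 2))  # 0.54
```

Conditioning changes only the denominator: the unconditional probability divides by all 60 infants, while each conditional probability divides by the infants in one exposure group.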

• For independent events A and B:
P(A|B) = P(A)
• For any events A and B, including non-independent ones:
P(A and B) = P(A|B) P(B)
(General Multiplication Rule)

Test for Independence
• Two events A and B are independent if
P(B|A) = P(B)
or
P(A and B) = P(A) x P(B)
• Two events A and B are dependent if
P(B|A) ≠ P(B)
or
P(A and B) ≠ P(A) x P(B)

Example
• In a study of optic-nerve degeneration in Alzheimer’s disease,
postmortem examinations were conducted on 10 Alzheimer’s patients.
The following table shows the distribution of these patients according
to sex and evidence of optic-nerve degeneration.
• Are the events “patient has optic-nerve degeneration” and “patient is female” independent for this sample of 10 patients?

          Optic-nerve degeneration
Sex       Present   Not present
Female       4          1
Male         4          1

Solution
• P(Optic-nerve degeneration | Female)
= No. of females with optic-nerve degeneration / No. of females
= 4/5 = 0.80
• P(Optic-nerve degeneration)
= No. of patients with optic-nerve degeneration / Total No. of patients
= 8/10 = 0.80
• Since P(Optic-nerve degeneration | Female) = P(Optic-nerve degeneration), the events are independent for this sample.
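
The product form of the independence test can be sketched as follows. The function name `independent` is a hypothetical helper, and exact fractions are used so that the equality check is not disturbed by floating-point rounding. The second call reuses the mother and child DBP figures from the earlier dependent-event slide.

```python
from fractions import Fraction


def independent(p_a, p_b, p_a_and_b):
    """Product test: A and B are independent when P(A and B) = P(A) x P(B)."""
    return p_a_and_b == p_a * p_b


# Optic-nerve example (10 patients): A = degeneration present, B = female.
optic = independent(Fraction(8, 10), Fraction(5, 10), Fraction(4, 10))

# DBP example: P(A) = 0.1, P(B) = 0.2, P(A and B) = 0.05.
dbp = independent(Fraction("0.1"), Fraction("0.2"), Fraction("0.05"))

print(optic)  # True
print(dbp)    # False
```

With real sample data an exact equality is rare; in practice the question becomes whether the observed difference is larger than chance would explain, which is a hypothesis-testing problem.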

Exercise:
Culture and Gonodectin (GD) test results for 240 urethral discharge specimens.

                   Culture result
GD test result     Gonorrhea   No gonorrhea   Total
Positive              175           9          184
Negative                8          48           56
Total                 183          57          240

1. What is the probability that a man has gonorrhea? 183/240
2. What is the probability that a man has a positive GD test? 184/240
3. What is the probability that a man has a positive GD test and gonorrhea? 175/240
4. What is the probability that a man has a negative GD test and does not have gonorrhea? 48/240
5. What is the probability that a man with gonorrhea has a positive GD test? 175/183

6. What is the probability that a man who does not have gonorrhea has a negative GD test? 48/57
7. What is the probability that a man who does not have gonorrhea has a positive GD test? 9/57
8. What is the probability that a man with a positive GD test has gonorrhea? 175/184
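
Answers 5, 6, and 8 are the quantities usually called sensitivity, specificity, and positive predictive value. The sketch below recomputes them from the four cell counts; the dictionary layout and the helper name `marginal` are illustrative, not part of the exercise.

```python
from fractions import Fraction

# Cell counts from the exercise table: (GD test result, culture result) -> specimens.
counts = {
    ("positive", "gonorrhea"): 175,
    ("positive", "no gonorrhea"): 9,
    ("negative", "gonorrhea"): 8,
    ("negative", "no gonorrhea"): 48,
}
total = sum(counts.values())


def marginal(position, value):
    """Row or column total: sum the cells whose key matches `value` at `position`."""
    return sum(n for key, n in counts.items() if key[position] == value)


diseased = marginal(1, "gonorrhea")        # 183
healthy = marginal(1, "no gonorrhea")      # 57
test_positive = marginal(0, "positive")    # 184

sensitivity = Fraction(counts[("positive", "gonorrhea")], diseased)      # answer 5
specificity = Fraction(counts[("negative", "no gonorrhea")], healthy)    # answer 6
ppv = Fraction(counts[("positive", "gonorrhea")], test_positive)         # answer 8

print(total, float(sensitivity), float(specificity), float(ppv))
```

Each of the three is a conditional probability; they differ only in which margin of the table is used as the denominator.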

Probability Distributions
• A probability distribution is a device used to describe the behavior
that a random variable may have by applying the theory of probability.
• It is the way data are distributed, in order to draw conclusions about a
set of data
• Random Variable = Any quantity or characteristic that is able to
assume a number of different values such that any particular outcome
is determined by chance.

• Random variables can be either discrete or continuous
• A discrete random variable is able to assume only a finite or countable
number of outcomes.
• A continuous random variable can take on any value in a specified
interval.

• With categorical variables, we obtain the frequency distribution of each
variable
• With numeric variables, the aim is to determine whether or not normality
may be assumed
• If not, we may consider transforming the variable or categorizing it for analysis (e.g. age group).

Therefore, the probability distribution of a random variable is a table,
graph, or mathematical formula that gives the probabilities with which
the random variable takes different values or ranges of values.

A. Discrete Probability Distributions
• For a discrete random variable, the probability distribution
specifies each of the possible outcomes of the random variable
along with the probability that each will occur.
• Examples can be:
• Frequency distribution
• Relative frequency distribution
• Cumulative frequency

• We represent a potential outcome of the random variable X by x
0 ≤ P(X = x) ≤ 1
∑ P(X = x) = 1

The following data show the number of diagnostic services a patient receives:

No. of services, x   P(X = x)
0                    0.671
1                    0.229
2                    0.053
3                    0.031
4                    0.010
5                    0.006

• What is the probability that a patient receives exactly 3
diagnostic services?
P(X=3) = 0.031
• What is the probability that a patient receives at most one
diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900

• What is the probability that a patient receives at least four diagnostic
services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016

Probability distributions can also be displayed using a graph
[Figure: probability P(X = x) plotted against the number of diagnostic services, x]

The Expected Value of a Discrete Random variable
• If a random variable is able to take on a large number of values, then
a probability mass function might not be the most useful way to
summarize its behavior
• Instead, measures of location and dispersion can be calculated (as
long as the data are not categorical)

• The average value assumed by a random variable is
called its expected value, or the population mean
• It is represented by E(X) or µ
• To obtain the expected value of a discrete random
variable X, we multiply each possible outcome by its
associated probability and sum all values with a
probability greater than 0

• For the diagnostic service data:
Mean (X) = 0(0.671) +1(0.229) +2(0.053)
+3(0.031) +4(0.010) +5(0.006)
= 0.498 ≈ 0.5
• We would expect an average of 0.5 services for each
visit

The Variance of a Discrete Random Variable
• The variance of a random variable X is called the population variance and is represented by Var(X) or σ^2; its square root, σ, is the population standard deviation.
• It quantifies the dispersion of the possible outcomes of X around the expected value μ.

σ^2 = ∑ (x_i − µ)^2 P(X = x_i)
= (0 − 0.5)^2 (0.671) + (1 − 0.5)^2 (0.229)
+ (2 − 0.5)^2 (0.053) + (3 − 0.5)^2 (0.031)
+ (4 − 0.5)^2 (0.010) + (5 − 0.5)^2 (0.006)
= 0.782
Standard deviation = σ = √0.782 = 0.884
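
The expected value, variance, and standard deviation of the diagnostic-service distribution can be recomputed in a few lines. This is a sketch using the probabilities quoted in the worked example; the variable names are illustrative.

```python
import math

# P(X = x) for the number of diagnostic services, as used in the worked example.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# Expected value: each outcome weighted by its probability.
mean = sum(x * p for x, p in pmf.items())

# Variance: probability-weighted squared distance from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

print(round(mean, 3))      # 0.498
print(round(variance, 3))  # 0.782
print(round(sd, 3))        # 0.884
```

The code uses the unrounded mean 0.498 where the slide uses 0.5; the variance agrees to three decimal places either way.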

Binomial and Poisson Distribution

1. Binomial Distribution
• It is one of the most widely encountered discrete probability distributions.
• It applies to a dichotomous (binary) random variable.
• It is based on the Bernoulli trial: a single trial of an experiment that can result in only one of two mutually exclusive outcomes (success or failure; dead or alive; sick or well; male or female).

Example:
• We are interested in determining whether a newborn infant will survive
until his/her 70th birthday
• Let Y represent the survival status of the
child at age 70 years
• Y = 1 if the child survives and Y = 0 if he/she does not

•The outcomes are mutually exclusive and exhaustive
•Suppose that 72% of infants born survive to age 70
years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28


A binomial probability distribution occurs when the following
requirements are met.
1. The procedure has a fixed number of trials.
2. The trials must be independent.
3. Each trial must have outcomes that fall into one of two categories.
4. The probabilities must remain constant for each trial
[P(success) = p].

Characteristics of a Binomial Distribution
• The experiment consists of n identical trials.
• Only two possible outcomes on each trial.
• The probability of A (success), denoted by p, remains the same from trial to trial. The probability of B (failure), denoted by q, is q = 1 - p.
• The trials are independent.
• n and p are the parameters of the binomial distribution.
• The mean is np and the variance is np(1 - p).

• Suppose an event can have only two outcomes, A and B.
• Let the probability of A be p and that of B be 1 − p.
• The probability p stays the same each time the event occurs.

• If an experiment is repeated n times and the outcome is
independent from one trial to another, the probability that
outcome A occurs exactly x times is:
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^x q^{n - x}$, x = 0, 1, 2, ..., n.

• n denotes the number of fixed trials
• x denotes the number of successes in
the n trials
• p denotes the probability of success
• q denotes the probability of failure (1- p)
• $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ represents the number of ways of selecting x objects out of n where the
order of selection does not matter,
• where n! = n(n−1)(n−2)…(1) and 0! = 1

Example:
• Suppose we know that 40% of a certain population are cigarette smokers.
If we take a random sample of 10 people from this population, what is the
probability that we will have exactly 4 smokers in our sample?

• Taking the probability that any individual in the population is a smoker to
be p = 0.40, the probability of x = 4 smokers among n = 10 subjects
selected is:
$P(X = 4) = \binom{10}{4}(0.4)^4(1 - 0.4)^{10 - 4}$
$= \binom{10}{4}(0.4)^4(0.6)^6$
$= 210(0.0256)(0.046656)$
$= 0.25$
• The probability of obtaining exactly 4 smokers in the sample is about
0.25.
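The smoker calculation can be reproduced with a minimal sketch. The helper name `binomial_pmf` is an assumption for illustration; `math.comb` supplies the number of combinations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 4 smokers in a sample of 10 when 40% of the population smoke.
p_four_smokers = binomial_pmf(4, 10, 0.4)
print(round(p_four_smokers, 4))
```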

• We can compute the probability of observing zero smokers out of 10
subjects selected at random, exactly 1 smoker, and so on, and display the
results in a table, as given below.
• The third column, P(X ≤ x), gives the cumulative probability. E.g., the
probability of selecting 3 or fewer smokers into the sample of 10 subjects
is
P(X ≤ 3) = 0.3823, or about 38%.
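A table of this kind can be generated with a short sketch, assuming n = 10 and p = 0.4 as in the smoker example. The column layout here is illustrative and is not a copy of the original table.

```python
from math import comb

n, p = 10, 0.4

# P(X = x) for x = 0..n
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Cumulative probabilities P(X <= x), built as a running sum
cdf = []
running_total = 0.0
for prob in pmf:
    running_total += prob
    cdf.append(running_total)

print(f"{'x':>2} {'P(X = x)':>10} {'P(X <= x)':>10}")
for x in range(n + 1):
    print(f"{x:>2} {pmf[x]:>10.4f} {cdf[x]:>10.4f}")
```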

The probabilities in the above table can be displayed as the following graph.
[Figure: graph of the binomial probabilities; x-axis: No. of Smokers (0 to 10); y-axis: Probability (0 to 0.3)]

Exercise
Each child born to a particular set of parents
has a probability of 0.25 of having blood type
O. If these parents have 5 children,
what is the probability that
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d. 2 do not have blood type O.

Solution for ‘a’
a) $P(X = 2) = \binom{5}{2}(0.25)^2(0.75)^{5-2} = 0.2637$
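The remaining parts of the exercise follow the same pattern and can be checked with a sketch like the one below. It assumes part d means that exactly 2 of the 5 children do not have blood type O, so exactly 3 do; the helper name is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.25  # 5 children, each with probability 0.25 of blood type O

exactly_two = binomial_pmf(2, n, p)                          # part a
at_most_two = sum(binomial_pmf(x, n, p) for x in range(3))   # part b: x = 0, 1, 2
at_least_four = sum(binomial_pmf(x, n, p) for x in (4, 5))   # part c
two_without = binomial_pmf(3, n, p)                          # part d (assumed reading)

for label, value in [("a", exactly_two), ("b", at_most_two),
                     ("c", at_least_four), ("d", two_without)]:
    print(label, round(value, 4))
```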

The Mean and Variance of a Binomial
Distribution
• The proportion of successes observed in n trials is x/n
• Once n and p are specified, the mean and variance of the distribution are given by:
E(X) = μ = np, σ² = npq, σ = √(npq)

Example:
• 70% of a certain population has been immunized for polio. If a
sample of size 50 is taken, what is the “expected total number”, in the
sample who have been immunized?
µ = np = 50(.70) = 35
• This tells us that “on the average” we expect to see 35 immunized
subjects in a sample of 50 from this population.

• If repeated samples of size 10 are selected from the population of
infants born, the mean number of children per sample who
survive to age 70 would be
µ = np = (10)(0.72) = 7.2
• The variance would be npq = (10)(0.72)(0.28) = 2.02 and the SD
would be
√2.02 = 1.42
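Both worked examples use the same three formulas, which can be sketched as follows. The function name `binomial_summary` is an assumption for illustration.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial(n, p) variable."""
    mean = n * p
    variance = n * p * (1 - p)
    return mean, variance, sqrt(variance)

# Polio immunization example: n = 50, p = 0.70
polio_mean, polio_var, polio_sd = binomial_summary(50, 0.70)

# Survival-to-age-70 example: n = 10, p = 0.72
surv_mean, surv_var, surv_sd = binomial_summary(10, 0.72)

print(round(polio_mean, 1), round(surv_mean, 1), round(surv_var, 2), round(surv_sd, 2))
```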

2. The Poisson Distribution
• It is a discrete probability distribution used to model the number of
occurrences of an event that takes place infrequently in time or space
• Applicable for counts of events over a given interval of time, for
example:
• number of patients arriving at an emergency department in a day
• number of new cases of HIV diagnosed at a clinic in a month

• In such cases, we take a sample of days and observe the number of patients
arriving at the emergency department on each day,
• or a sample of months and observe the number of new cases of HIV
diagnosed at the clinic.
• We are observing a count or number of events, rather than a yes/no or
success/ failure outcome for each subject or trial, as in the binomial.

• In theory, a random variable X is a count that can assume any
integer value greater than or equal to 0

• Suppose events happen randomly and independently in time at a
constant rate. If events happen with rate λ events per unit time, the
probability of x events happening in unit time is:
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

• where x = 0, 1, 2, . . .∞
• x is a potential outcome of X
• The constant λ (lambda) represents the rate at which the event occurs, or
the expected number of events per unit time
• e ≈ 2.71828
• The distribution depends on just one parameter, the mean number of occurrences
(λ).

• Three assumptions must be met for a Poisson distribution to apply:
1. The probability that a single event occurs within a given small
subinterval is proportional to the length of the subinterval
P(event) ≈ λΔt for constant λ
2. The rate at which the event occurs is constant over the entire
interval t
3. Events occurring in consecutive subintervals are independent of
each other

Example
• The daily number of new registrations of cancer is 2.2 on
average.
What is the probability of
a) Getting no new cases
b) Getting 1 case
c) Getting 2 cases
d) Getting 3 cases
e) Getting 4 cases

Solutions
a) $P(X = 0) = \frac{(2.2)^0 e^{-2.2}}{0!} = 0.111$
b) P(X=1) = 0.244
c) P(X=2) = 0.268
d) P(X=3) = 0.197
e) P(X=4) = 0.108
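These five probabilities can be reproduced with a minimal sketch of the Poisson formula. The helper name `poisson_pmf` is an assumption for illustration.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

# Daily cancer registrations with an average of 2.2 per day.
cancer_probs = [poisson_pmf(x, 2.2) for x in range(5)]
for x, prob in enumerate(cancer_probs):
    print(x, round(prob, 3))
```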

[Figure: Poisson distribution with mean 2.2; x-axis: 0 to 7; y-axis: Probability (0.0 to 0.3)]

Example:
• In a given geographical area, cases of tetanus are reported at a rate of
λ = 4.5/month
• What is the probability that 0 cases of tetanus will be reported in a
given month?

• What is the probability that 1 case of tetanus will be
reported?
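One way to work out the two tetanus questions is to apply the Poisson formula with λ = 4.5. This is a sketch only; the helper name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 4.5  # reported tetanus cases per month

p_zero_cases = poisson_pmf(0, lam)  # no cases in a given month
p_one_case = poisson_pmf(1, lam)    # exactly one case in a given month
print(round(p_zero_cases, 4), round(p_one_case, 4))
```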

Characteristics
• The Poisson distribution is very asymmetric when its mean is small
• With large means it becomes nearly symmetric
• It has no theoretical maximum value, but the probabilities tail off towards
zero very quickly
• λ is the parameter of the Poisson distribution
• The mean is λ and the variance is also λ.

B. Continuous Probability Distributions
• A continuous random variable X can take on any value in a specified
interval or range
• With a large number of class intervals, the frequency polygon begins to
resemble a smooth curve.
• The probability distribution of X is represented by a smooth curve called
a probability density function

• The area under the smooth curve is equal to 1
• The area under the curve between any two points x1 and x2 is the
probability that X takes a value between x1 and x2
[Figure: Distribution of serum triglyceride]

• Instead of assigning probabilities to specific outcomes of the random
variable X, probabilities are assigned to ranges of values
• The probability associated with any one particular value is equal to 0
• Therefore, P(X=x) = 0
• Also, P(X ≥ x) = P(X > x)

• We calculate:
Pr [ a < X < b], the probability of an
interval of values of X.
• For the above reason,
• is also without meaning.

The Normal distribution
• The ND is the most important probability distribution in statistics
• Frequently called the “Gaussian distribution” or the bell-shaped curve.
• Variables such as blood pressure, weight, height, serum
cholesterol level, and IQ score are approximately normally
distributed

A random variable is said to have a normal distribution if it has a
probability distribution that is symmetric and bell-shaped

• The ND is vital to statistical work; most estimation procedures and
hypothesis tests are based on the ND
• The concept of “probability of X = x” in the discrete probability
distribution is replaced by the “probability density function” f(x).
• The ND is also an approximating distribution to other distributions
(e.g., binomial)

• A random variable X is said to follow the ND if and only
if its probability density function is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$, $-\infty < x < \infty$.

π (pi) ≈ 3.14159
e ≈ 2.71828, x = Value of X
Range of possible values of X: −∞ to +∞
µ = Expected value of X (“the long run average”)
σ² = Variance of X.
µ and σ are the parameters of the normal distribution; they
completely define its shape
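The density can be evaluated directly from these quantities. The sketch below is illustrative: the function name is an assumption, and µ = 110 and σ = 15 are simply convenient example values.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# The curve is highest at the mean and symmetric about it.
peak = normal_pdf(110, mu=110, sigma=15)
left = normal_pdf(95, mu=110, sigma=15)    # one SD below the mean
right = normal_pdf(125, mu=110, sigma=15)  # one SD above the mean
print(peak, left, right)
```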

1. The mean µ tells you about location -
• Increase µ - Location shifts right
• Decrease µ – Location shifts left
• Shape is unchanged
2. The variance σ² tells you about narrowness or flatness of
the bell -
• Increase σ² - Bell flattens. Extreme values are more likely
• Decrease σ² - Bell narrows. Extreme values are less likely
• Location is unchanged

Properties of the Normal Distribution(ND)
1. It is symmetrical about its mean, µ.
2. The mean, the median and the mode are equal. It is unimodal.
3. The total area under the curve above the x-axis is 1 square unit.
4. The curve never touches the x-axis.
5. As the value of σ increases, the curve becomes more and more flat, and
vice versa.

6. Perpendiculars drawn at:
µ ± 1 SD contain about 68%;
µ ± 2 SD contain about 95%;
µ ± 3 SD contain about 99.7%
of the area under the curve.
7. The distribution is completely determined by the parameters µ and σ.
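The 68%, 95% and 99.7% figures in property 6 can be verified numerically. This sketch relies on the identity P(|Z| < k) = erf(k/√2) for a standard normal Z; the function name is illustrative.

```python
from math import erf, sqrt

def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within ±{k} SD: {area:.4f}")
```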

• We have different normal distributions depending on the values of
μ and σ².
• We cannot tabulate every possible distribution
• Tabulated normal probability calculations are available only for
the ND with µ = 0 and σ² = 1.

Standard Normal Distribution
• It is a normal distribution that has a mean equal to 0 and an SD equal to
1, and is denoted by N(0, 1).
• The main idea is to standardize all the given data by using Z-scores.
• These Z-scores can then be used to find the area (and thus the
probability) under the normal curve.

The standard normal distribution has
mean 0 and variance 1
• Approximately 68% of the area under the standard normal curve lies
between ±1, about 95% between ±2, and about 99% between ±2.5

Z - Transformation
• If a random variable X is normally distributed with mean µ and SD σ, then we can transform it to the SND
with the help of the Z-transformation:
$Z = \frac{x - \mu}{\sigma}$
• Z represents the Z-score for a given x value

• Consider redefining the scale in terms of how many SDs a value lies
away from the mean, for a normal distribution with μ = 110 and σ = 15.
Value x:                                  50   65   80   95   110   125   140   155   170
SDs from mean, (x − μ)/σ = (x − 110)/15:  −4   −3   −2   −1    0     1     2     3     4

• This process is known as standardization and gives the position on a
normal curve with μ = 0 and σ =1, i.e., the SND, Z.
• A Z-score is the number of standard deviations that a given x value is
above or below the mean.

Finding normal curve areas
1. The table gives areas between -∞ and the value of z₀.
2. Find the z value in tenths in the column at left margin and locate its
row. Find the hundredth place in the appropriate column.
3. Read the value of the area (P) from the body of the table where the
row and column intersect. Values of P are in the form of a decimal
point and four places.

Some Useful Tips

a) What is the probability that z < -1.96?
(1) Sketch a normal curve
(2) Draw a perpendicular line for z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line P(z < -1.96) = 0.0250

b) What is the probability that -1.96 < z < 1.96?
The area between the values P(-1.96 < z < 1.96)
= .9750 - .0250 = .9500

c) What is the probability that z > 1.96?
• The answer is the area to the right of the line; found by subtracting table
value from 1.0000; P(z > 1.96) =1.0000 - .9750 = .0250
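The three table look-ups (a to c) can be cross-checked without a printed table. This sketch computes the standard normal cumulative area from the error function; the function name is an assumption for illustration.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve between -infinity and z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

left_tail = standard_normal_cdf(-1.96)                           # P(z < -1.96)
middle = standard_normal_cdf(1.96) - standard_normal_cdf(-1.96)  # P(-1.96 < z < 1.96)
right_tail = 1 - standard_normal_cdf(1.96)                       # P(z > 1.96)
print(round(left_tail, 4), round(middle, 4), round(right_tail, 4))
```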

Exercise
1. Compute P(-1 ≤ Z ≤ 1.5) (Ans: 0.7745)
2. Find the area under the SND from 0 to 1.45 (Ans: 0.4265)
3. Compute P(-1.66 < Z < 2.85) (Ans: 0.9493)

Applications of the Normal Distribution
• The ND is used as a model to study many different variables.
• The ND can be used to answer probability questions about continuous
random variables.
• Following the model of the ND, a given value of x must be converted to a
z score before it can be looked up in the z table.

Example:
• The diastolic blood pressures (DBP) of males 35–44 years of age are normally
distributed with µ = 80 mm Hg and σ² = 144 mm Hg², so
σ = 12 mm Hg
• Therefore, a DBP of 80 + 12 = 92 mm Hg lies 1 SD above the mean
• Suppose individuals with a DBP above 95 mm Hg are considered to be
hypertensive

a. What is the probability that a randomly selected male has a DBP above 95
mm Hg?
Z = (95 − 80)/12 = 1.25
P (Z > 1.25) = 0.1056
• Approximately 10.6% of this population would be classified as
hypertensive.

b. What is the probability that a randomly selected male has a DBP above
110 mm Hg?
Z = (110 − 80)/12 = 2.50
P (Z > 2.50) = 0.0062
• Approximately 0.6% of the population has a DBP above 110 mm Hg

c. What is the probability that a randomly selected male has a DBP below 60
mm Hg?
Z = (60 − 80)/12 = −1.67
P (Z < −1.67) = 0.0475
• Approximately 4.8% of the population has a DBP below 60 mm Hg

d. What value of DBP cuts off the upper 5% of this population?
• Looking at the table, the value Z = 1.645 cuts off an area of 0.05 in the
upper tail
• We want the value of X that corresponds to Z = 1.645
Z = (X − μ)/σ
1.645 = (X − 80)/12, so X = 80 + 1.645(12) = 99.7
• Approximately 5% of the men in this population have a DBP greater than
99.7 mm Hg
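Parts a to d can be checked with Python's standard library, as a sketch. `statistics.NormalDist` (Python 3.8 or later) works on the original scale, so no manual Z-transformation is needed. Small differences from the slide values come from rounding z to two decimals before using the table.

```python
from statistics import NormalDist

# DBP of males aged 35-44: mean 80 mm Hg, SD 12 mm Hg
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1 - dbp.cdf(95)    # part a: hypertensive
p_above_110 = 1 - dbp.cdf(110)  # part b
p_below_60 = dbp.cdf(60)        # part c (z is not rounded to -1.67 here)
upper_5_cutoff = dbp.inv_cdf(0.95)  # part d: value cutting off the upper 5%

print(round(p_above_95, 4), round(p_above_110, 4),
      round(p_below_60, 4), round(upper_5_cutoff, 1))
```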

Chapter: Sampling and Sampling Distribution

Sampling

• Researchers often use sample survey methodology to obtain
information about a larger population by selecting and measuring a
sample from that population.
• Since the population is often too large to study in full, we rely on the information collected
from the sample.

• Inferences about the population are based on the information from the
sample drawn from that population.
• However, due to the variability in the characteristics of the population,
scientific sample designs should be applied to select a representative
sample.
• If not, there is a high risk of distorting the view of the population.

• A sample is a collection of individuals selected from a larger population.
• For example, we may have a single sample composed of 50 cases,
representing a population of 1000 individuals.

• Sampling enables us to estimate the characteristic of a population by
directly observing a portion of the population.
• Researchers are not interested in the sample itself, but in what can be
learned from the sample—and how this information can be applied
to the entire population.

[Figure: diagram linking Sample, Information, and Population]

• Therefore, it is essential that a sample should be correctly defined
and organized.
• If the wrong questions are posed to the wrong people, reliable
information will not be obtained, leading to wrong conclusions
when the results are applied to the entire population.

Steps needed to select a sample and ensure that this sample will
fulfill its goals.
1. Establish the study's objectives
• The first step in planning a useful and efficient survey is to specify the
objectives with as much detail as possible.
• Without objectives, the survey is unlikely to generate valuable results.
• Clarifying the aims of the survey is critical to its ultimate success.
• The initial users and uses of the data should be identified at this stage.

2. Define the target population
• The target population is the total population for which the information is
required.
• Specifically, the target population is defined by the following characteristics:
• Nature of data required
• Geographic location
• Reference period
• Other characteristics, such as socio-demographic characteristics

3. Decide on the data to be collected
• The data requirements of the survey must be established.
• To ensure that the requirements are operationally sound, the necessary data
terms and definitions also need to be determined.

4. Set the level of precision
• There is a level of uncertainty associated with estimates
coming from a sample.
• The sample-to-sample variation is what causes the
sampling error.
• Researchers can estimate the sampling error associated
with a particular sampling plan, and try to minimize it.

5. Decide on the methods of measurement
• Choose the measuring instrument and the method of approach to the population
• Data about a person’s state of health may be obtained from statements that
he/she makes or from a medical examination
• The survey may employ a self-administered questionnaire, an interviewing

6. Prepare the sampling frame
• A list of all members of the population
• The elements must not overlap

The sample design
• Sample design: how the sample will be collected.
• Estimation techniques: how the results from the sample will be extended to the
whole population.
• Measures of precision: how the sampling error will be measured.

Other Considerations
• Sample size determination
• Questionnaire development
• Pretest
• Organization of the field work
• Data collection
• Summary and analysis of the data
• Edit the completed questionnaires
• Decide on computation procedures

Sampling theory in public health
• A health survey (sampling) is a planned study to investigate the health
characteristics of a population

A health survey is used to:
• Measure the total amount of illness in the population;
• Measure the amount of illness caused by a specified disease;
• Examine the utilization of existing health care facilities and demand
for new ones;
• Measure the distribution of a particular characteristic, e.g., breastfeeding
practice in the population;
• Examine the role and relationship of one or more factors in the
etiology of a disease.

Sampling
• The process of selecting a portion of the population to represent the
entire population.
• A main concern in sampling:
• Ensure that the sample represents the population, and
• The findings can be generalized.

Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of collecting
information.
• Reduced cost: Sampling reduces demands on resources such as finance,
personnel, and material.
• Greater accuracy: Sampling may lead to better accuracy in data
collection
• Sampling error: Precise allowance can be made for sampling error
• Greater speed: Data can be collected and summarized more quickly

Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the
population.
• Sampling may be inadvisable where every unit in the population is
legally required to have a record.

Errors in sampling
1) Sampling error: Error introduced because only a sample, rather than the
whole population, is observed.
• It cannot be avoided or totally eliminated.
2) Non-sampling error:
- Observational error
- Respondent error
- Lack of preciseness of definition
- Errors in editing and tabulation of data

Random number table
• It is a table of random numbers constructed by a process such that:
1. In any position in the table, each of the numbers 0 through 9 has a probability
1/10 of occurring.
2. The occurrence of any number in one part of the table is independent of the
occurrence of any number in any other part of the table.

Sampling Methods
Two broad divisions:
A. Probability sampling methods
B. Non-probability sampling methods

A. Probability sampling
• Involves random selection of a sample
• A sample is obtained in a way that ensures that every member of the
population has a known, nonzero probability of being included in the
sample.
• Involves the selection of a sample from a population, based on chance.

• Probability sampling is:
• more complex,
• more time-consuming and
• usually more costly than non-probability
sampling.
• However, because study samples are randomly selected and their
probability of inclusion can be calculated,
• reliable estimates can be produced and
• inferences can be made about the population.

• There are several different ways in which a probability sample
can be selected.
• The method chosen depends on a number of factors, such as
• the available sampling frame,
• how spread out the population is,
• how costly it is to survey members of the population

• When choosing a probability sample design,
• Our goal should be to minimize the sampling error of the
estimates for the most important survey variables,
• While simultaneously minimizing the time and cost of
conducting the survey.

Most common probability
sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Sampling with probability proportional to size
4. Stratified random sampling
5. Cluster sampling
6. Multi-stage sampling

1. Simple random sampling
• Involves random selection
• Each member of a population has an equal chance of
being included in the sample.

• To use a SRS method:
• Make a numbered list of all the units in the population
• Each unit should be numbered from 1 to N (where N is the
size of the population)
• Select the required number.

• The randomness of the sample is ensured by:
• use of “lottery” methods
• a table of random numbers

Example
• Suppose your school has 500 students and you need to conduct a
short survey on the quality of the food served in the cafeteria.
• You decide that a sample of 10 students should be sufficient for your
purposes.
• In order to get your sample, you assign a number from 1 to 500 to
each student in your school.

• To select the sample, you use a table of randomly generated numbers.
• Pick a starting point in the table (a row and column number) and look
at the random numbers that appear there. In this case, since the data
run into three digits, the random numbers would need to contain three
digits as well.

• Ignore all random numbers greater than 500 (and 000) because they do not correspond to
any of the students in the school.
• Remember that the sample is without replacement, so if a number recurs,
skip over it and use the next random number.
• The first 10 different numbers between 001 and 500 make up your
sample.
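In practice the random number table is often replaced by a software random number generator. The sketch below draws the cafeteria sample that way. The function name and the seed value are assumptions for illustration; `random.sample` draws without replacement, so repeated numbers never need to be skipped by hand.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers 1..population_size without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(range(1, population_size + 1), sample_size)

# 10 students out of 500, numbered 1 to 500 as in the example.
students = simple_random_sample(500, 10, seed=2023)
print(sorted(students))
```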

• SRS has certain limitations:
• Requires a sampling frame.
• Difficult if the reference population is dispersed.
• Minority subgroups of interest may not be selected.

2. Systematic random sampling
• Sometimes called interval sampling, systematic
sampling means that there is a gap, or interval,
between each selected unit in the sample
• The selection is systematic rather than random

• Important if the reference population is arranged
in some order:
• Order of registration of patients
• Numerical order of house numbers
• Student’s registration books
• Taking individuals at fixed intervals (every kth)
based on the sampling fraction, e.g., if the sample
includes 20% of the population, then every fifth individual is taken.

Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where
N is the total population size).
2. Determine the sampling interval (K) by dividing the
number of units in the population by the desired
sample size.

3. Select a number between one and K at random.
This number is called the random start and would
be the first number included in your sample.
4. Select every Kth unit after that first number
Note: Systematic sampling should not be used when
a cyclic repetition is inherent in the sampling
frame.

Example
• To select a sample of 100 from a population of 400,
you would need a sampling interval of 400 ÷ 100 = 4.
• Therefore, K = 4.
• You will need to select one unit out of every four
units to end up with a total of 100 units in your
sample.
• Select a number between 1 and 4 from a table of
random numbers.

• If you choose 3, the third unit on your frame would be
the first unit included in your sample;
• The sample might consist of the following units to
make up a sample of 100: 3 (the random start), 7, 11,
15, 19...395, 399 (up to N, which is 400 in this case).
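The selection in this example can be sketched as follows. The function name is an assumption for illustration, and the code assumes the population size is a multiple of the sample size, as it is here (400 and 100).

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every Kth unit after a random start between 1 and K."""
    k = population_size // sample_size  # sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and K")
    return list(range(start, population_size + 1, k))[:sample_size]

# Random start of 3 with N = 400 and n = 100, as in the example.
units = systematic_sample(400, 100, start=3)
print(units[:5], units[-2:])
```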

• Using the above example, you can see that with a
systematic sample approach there are only four
possible samples that can be selected,
corresponding to the four possible random starts:
A. 1, 5, 9, 13...393, 397
B. 2, 6, 10, 14...394, 398
C. 3, 7, 11, 15...395, 399
D. 4, 8, 12, 16...396, 400

• Each member of the population belongs to only one
of the four samples and each sample has the same
chance of being selected.
• The main difference is that with SRS, any combination of
100 units would have a chance of making up the
sample, while with systematic sampling, there are
only four possible samples.

3. Sampling with probability
proportional to size
• Probability sampling requires that each member of the
survey population has a chance of being included in
the sample, but it does not require that this chance be
the same for everyone.

• If information is available on the frame about the size of
each unit and if those units vary in size, this information
can be used in the sampling selection in order to
increase the efficiency.
• This is known as sampling with probability
proportional to size (PPS).

• With this method, the bigger the size of the unit, the
higher the chance it has of being included in the
sample.
• For this method to achieve increased efficiency, the
measure of size needs to be accurate.

Steps in PPS
• List all Kebeles/clusters with their population size
• Calculate the cumulative frequency
• Calculate the sampling interval, K, by dividing the total
population size by the number of clusters to be sampled
• Randomly choose a number between 1 and K, say j
• Kebeles/clusters whose cumulative frequency contains the
jth, (j+K)th, (j+2K)th, … individual will be included in the sample
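A minimal sketch of these steps is given below. It uses the common systematic form of PPS, in which the selection points are spaced one sampling interval apart after the random start. The kebele names and population sizes are hypothetical, and the function name is an assumption for illustration.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(cluster_sizes, n_clusters, start=None, seed=None):
    """Select clusters with probability proportional to size.

    cluster_sizes maps a cluster name to its population size. A very
    large cluster can be selected more than once.
    """
    names = list(cluster_sizes)
    cumulative = list(accumulate(cluster_sizes[name] for name in names))
    interval = cumulative[-1] // n_clusters  # sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, interval)  # random start j
    targets = [start + i * interval for i in range(n_clusters)]
    # Each target falls in the first cluster whose cumulative size reaches it.
    return [names[bisect_left(cumulative, t)] for t in targets]

# Hypothetical kebele population sizes, for illustration only.
kebeles = {"A": 1200, "B": 800, "C": 3000, "D": 500, "E": 1500}
selected = pps_systematic(kebeles, n_clusters=2, start=1000)
print(selected)
```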

4. Stratified random sampling
• It is done when the population is known to be
heterogeneous with regard to some factors, and those
factors are used for stratification
• Using stratified sampling, the population is divided into
homogeneous, mutually exclusive groups called strata
• A population can be stratified by any variable that is
available for all units prior to sampling (e.g., age, sex,
province of residence, income, etc.).

• A separate sample is taken independently from each
stratum.
• Any of the sampling methods mentioned in this
section (and others that exist) can be used to sample
within each stratum.

Why do we need to create strata?
• Stratification can make the sampling strategy more efficient.
• A larger sample is required to get a more accurate estimation if a
characteristic varies greatly from one unit to the other.
• For example, if every person in a population had the same salary, then
a sample of one individual would be enough to get a precise estimate
of the average salary.

• This is the idea behind the efficiency gain obtained with
stratification.
• If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata (e.g.,
occupation, type of dwelling) then you would only need a small sample from
each stratum to get a precise estimate of total income for that stratum.

• Then you could combine these estimates to get a precise estimate
of total income for the whole population.
• If you use a SRS approach in the whole population without
stratification, the sample would need to be larger than the
total of all stratum samples to get an estimate of total
income with the same level of precision.

• Stratified sampling ensures an adequate sample size for sub-
groups in the population of interest.
• When a population is stratified, each stratum becomes an
independent population and you will need to decide the sample
size for each stratum.

• Equal allocation:
• Allocate equal sample size to each stratum
• Proportionate allocation:
$n_j = \frac{n}{N} N_j$, j = 1, 2, ..., k, where k is
the number of strata and
• nj is sample size of the jth stratum
• Nj is population size of the jth stratum
• n = n1 + n2 + ...+ nk is the total sample size
• N = N1 + N2 + ...+ Nk is the total population
size
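Proportionate allocation can be sketched as below. The stratum names and sizes are hypothetical, and the function name is an assumption for illustration. Each allocation is rounded to a whole number, so the rounded values may differ slightly from n in total when the shares are not whole numbers.

```python
def proportionate_allocation(stratum_sizes, total_sample):
    """Allocate n_j = (n / N) * N_j to each stratum, rounded to whole units."""
    population_total = sum(stratum_sizes.values())  # N
    return {
        name: round(total_sample * size / population_total)
        for name, size in stratum_sizes.items()
    }

# Hypothetical strata, for illustration only: N = 10,000 and n = 500.
strata = {"urban": 6000, "rural": 4000}
allocation = proportionate_allocation(strata, 500)
print(allocation)
```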

5. Cluster sampling
• Sometimes it is too expensive to spread a sample across the population
as a whole.
• Travel costs can become expensive if interviewers have to survey people
from one end of the country to the other.
• To reduce costs, researchers may choose a cluster sampling technique
• The clusters should be similar to one another (homogeneous between clusters), unlike stratified sampling,
where the strata differ from one another (heterogeneous between strata)

Steps in cluster sampling
• Cluster sampling divides the population into groups or
clusters.
• A number of clusters are selected randomly to
represent the total population, and then all units within
selected clusters are included in the sample.
• No units from non-selected clusters are included in the
sample—they are represented by those from selected
clusters.
• This differs from stratified sampling, where some
units are selected from each group.

Example
• In a school based study, we assume students of the same
school are homogeneous.
• We can select randomly sections and include all students
of the selected sections only
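The school example can be sketched as follows. The section names and student identifiers are hypothetical, and the function name is an assumption for illustration. Whole sections are drawn at random and every student in a chosen section enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical sections and student IDs, for illustration only.
sections = {
    "Section A": ["a1", "a2", "a3"],
    "Section B": ["b1", "b2"],
    "Section C": ["c1", "c2", "c3", "c4"],
    "Section D": ["d1", "d2", "d3"],
}
sample = cluster_sample(sections, n_clusters=2, seed=7)
print(sample)
```

The final sample size depends on which sections are drawn, because the sections differ in size.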

• As mentioned, cost reduction is a reason for using
cluster sampling.
• It creates 'pockets' of sampled units instead of
spreading the sample over the whole territory.
• Another reason is that sometimes a list of all units in
the population is not available, while a list of all
clusters is either available or easy to create.

• In most cases, the main drawback is a loss of efficiency
when compared with SRS.
• It is usually better to survey a large number of small
clusters instead of a small number of large clusters.
• This is because neighboring units tend to be more alike,
resulting in a sample that does not represent the whole
spectrum of opinions or situations present in the overall
population.

• Another drawback to cluster sampling is that you do not have
total control over the final sample size.
• For example, not all schools have the same number of (say, Grade 11)
students, and city blocks do not all have the same number of
households. Because you must interview every student or household
in the selected clusters, the final sample size may be larger or
smaller than you expected.

6. Multi-stage sampling
• Similar to the cluster sampling, except that it involves
picking a sample from within each chosen cluster,
rather than including all units in the cluster.
• This type of sampling requires at least two stages.

• In the first stage, large groups or clusters are identified and selected.
These clusters contain more population units than are needed for the
final sample.
• In the second stage, population units are picked from within the
selected clusters (using any of the possible probability sampling
methods) for a final sample.
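A rough two-stage sketch, assuming simple random sampling at both stages (the slides allow any probability method at the second stage). The function `two_stage_sample` and the school and student labels are hypothetical.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        units = clusters[name]
        k = min(n_per_cluster, len(units))  # cannot draw more than exist
        sample.extend(rng.sample(units, k))
    return chosen, sample


# Hypothetical frame: 5 schools with 20 made-up student IDs each
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(20)] for i in range(5)}
chosen, sample = two_stage_sample(schools, n_clusters=2, n_per_cluster=5, seed=1)
print(chosen, len(sample))
```

Unlike one-stage cluster sampling, the final sample size is fixed in advance here (2 x 5 = 10) as long as every selected cluster has at least 5 units.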

• If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
• With multi-stage sampling, you still have the benefit of a more
concentrated sample for cost reduction.
• However, the sample is not as concentrated as in cluster sampling, and the
required sample size is still larger than for a simple random sample.

• Also, you do not need to have a list of all of the units in the
population. All you need is a list of clusters and list of the units in the
selected clusters.
• Admittedly, more information is needed in this type of sample than
what is required in cluster sampling. However, multi-stage sampling
still saves a great amount of time and effort by not having to create a
list of all the units in a population.

B. Non-probability sampling
• The difference between probability and non-probability sampling has
to do with a basic assumption about the nature of the population under
study.
• In probability sampling, every item has a known chance of being
selected.
• In non-probability sampling, there is an assumption that there is an even
distribution of a characteristic of interest within the population.

• This is what makes the researcher believe that any sample would be
representative and because of that, results will be accurate.
• For probability sampling, random is a feature of the selection process,
rather than an assumption about the structure of the population.

• In non-probability sampling, since elements are chosen arbitrarily,
there is no way to estimate the probability of any one element being
included in the sample.
• Also, no assurance is given that each item has a chance of being
included, making it impossible either to estimate sampling variability
or to identify possible bias

• Reliability cannot be measured in non-probability sampling; the only way
to address data quality is to compare some of the survey results with
available information about the population.
• Still, there is no assurance that the estimates will meet an acceptable level
of error.
• Researchers are reluctant to use these methods because there is no way to
measure the precision of the resulting sample.

• Despite these drawbacks, non-probability sampling methods can be useful
when descriptive comments about the sample itself are desired.
• Secondly, they are quick, inexpensive and convenient.
• There are also other circumstances, such as in some research studies, when it is
unfeasible or impractical to conduct probability sampling.

The most common types of non-probability
sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique

1. Convenience or haphazard sampling
• Convenience sampling is sometimes referred to as haphazard or
accidental sampling.
• It is not normally representative of the target population because
sample units are only selected if they can be accessed easily and
conveniently.

• The obvious advantage is that the method is easy to use, but that
advantage is greatly offset by the presence of bias.
• Although useful applications of the technique are limited, it can deliver
accurate results when the population is homogeneous.

• For example, a scientist could use this method to determine whether a
lake is polluted or not.
• Assuming that the lake water is well-mixed, any sample would yield
similar information.
• A scientist could safely draw water anywhere on the lake without
bothering about whether or not the sample is representative

2. Volunteer sampling
• As the term implies, this type of sampling occurs when people volunteer
to be involved in the study.
• In psychological experiments or pharmaceutical trials (drug testing), for
example, it would be difficult and unethical to enlist random participants
from the general public.
• In these instances, the sample is taken from a group of volunteers.

• Sometimes, the researcher offers payment to attract respondents.
• In exchange, the volunteers accept the possibility of a lengthy,
demanding or sometimes unpleasant process.

• Sampling voluntary participants as opposed to the general population
may introduce strong biases.
• Often in opinion polling, only the people who care strongly enough
about the subject tend to respond.
• The silent majority does not typically respond, resulting in large
selection bias.

3. Judgment sampling
• This approach is used when a sample is taken based on certain judgments
about the overall population.
• The underlying assumption is that the investigator will select units that are
characteristic of the population.
• The critical issue here is objectivity: how much can judgment be relied
upon to arrive at a typical sample?

• Judgment sampling is subject to the researcher's biases and is perhaps
even more biased than haphazard sampling.
• Since any preconceptions the researcher may have are reflected in the
sample, large biases can be introduced if these preconceptions are
inaccurate.

• Researchers often use this method in exploratory studies like pre-testing
of questionnaires and focus groups.
• They also prefer to use this method in laboratory settings where the
choice of experimental subjects (i.e., animal, human) reflects the
investigator's pre-existing beliefs about the population.

• One advantage of judgment sampling is the reduced cost and time
involved in acquiring the sample.

4. Quota sampling
• This is one of the most common forms of non-probability sampling.
• Sampling is done until a specific number of units (quotas) for various
sub-populations have been selected.

• Since there are no rules as to how these quotas are to be filled, quota
sampling is really a means for satisfying sample size objectives for
certain sub-populations.

• As with all other non-probability sampling methods, in order to make
inferences about the population, it is necessary to assume that persons
selected are similar to those not selected.
• Such strong assumptions are rarely valid.

• The main argument against quota sampling is that it does not meet the
basic requirement of randomness.
• Some units may have no chance of selection or the chance of selection
may be unknown.
• Therefore, the sample may be biased.

• Quota sampling is generally less expensive than random sampling.
• It is also easy to administer, especially considering the tasks of listing the
whole population, randomly selecting the sample and following-up on
non-respondents can be omitted from the procedure.

• Quota sampling is an effective sampling method when information is
urgently required and can be carried out without a sampling frame.
• In many cases where the population has no suitable frame, quota
sampling may be the only appropriate sampling method.

5. Snowball sampling
• A technique for selecting a research sample where existing study
subjects recruit future subjects from among their acquaintances.
• Thus the sample group appears to grow like a rolling snowball.

• This sampling technique is often used in hidden populations which are
difficult for researchers to access; example populations would be drug
users or commercial sex workers.
• Because sample members are not selected from a sampling frame,
snowball samples are subject to numerous biases. For example, people
who have many friends are more likely to be recruited into the sample.

Sampling Distributions

•A sampling distribution is a distribution of all possible
values of a statistic computed from samples of the
same size randomly selected from the same population.
•Serves to answer probability questions about sample
statistics.

• When sampling a discrete, finite population, a sampling distribution can
be constructed.
• However, this construction is difficult with a large population and
impossible with an infinite population.

• We consider sample statistics as random variables.
Example:
• Age of individuals is a random variable.
• Similarly, mean age is a random variable.

• Conclusions about the values of population parameters cannot be drawn
from one individual value.
• They should be based on sample statistics computed from an adequate
sample size.

• Similarly, take a sample and calculate the statistic, e.g., mean.
• Take another sample (same size) and calculate mean.
• Repeat this many times.
• Do you expect all the sample means to be the same? No.
• They will vary, but less than the individual observations do.
• Put all these sample statistics together to get a distribution of sample
statistics.

Construction of sampling distributions
1. From a population of size N, randomly
draw all possible samples of size n.
2. Compute the statistic of interest for
each sample.
3. Create a frequency distribution of the
statistic.

Main types of sampling distributions
A. Distribution of the sample mean
B. Distribution of the difference between two means
C. Distribution of the sample proportion
D. Distribution of the difference between two proportions

A. Sampling distribution of sample mean
• Suppose we have a population of size N=4, constituting the ages of
four outpatients.
x, Age (years): 18, 20, 22, 24
$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$

Now consider all possible samples of size n = 2
• 16 possible samples (with replacement)

1st Obs   2nd Observation
          18      20      22      24
18        18,18   18,20   18,22   18,24
20        20,18   20,20   20,22   20,24
22        22,18   22,20   22,22   22,24
24        24,18   24,20   24,22   24,24

• 16 sample means

1st Obs   2nd Observation
          18   20   22   24
18        18   19   20   21
20        19   20   21   22
22        20   21   22   23
24        21   22   23   24

Sample mean (x̄)   Freq   P(x̄)
18                 1      0.0625
19                 2      0.1250
20                 3      0.1875
21                 4      0.2500
22                 3      0.1875
23                 2      0.1250
24                 1      0.0625
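The frequency table can be reproduced by brute-force enumeration. The sketch below assumes sampling with replacement where order matters (as in the 4 x 4 grid of samples), and uses the four ages from the example; variable names are illustrative.

```python
from collections import Counter
from itertools import product

ages = [18, 20, 22, 24]  # the population of N = 4 outpatient ages
n = 2                    # sample size

# All ordered samples of size n drawn with replacement: 4**2 = 16
samples = list(product(ages, repeat=n))
means = [sum(s) / n for s in samples]

# Frequency and relative frequency of each distinct sample mean
freq = Counter(means)
for m in sorted(freq):
    print(m, freq[m], freq[m] / len(samples))
```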

Sampling distribution of all sample means
[Figure: bar chart of the 16 sample means; horizontal axis: sample mean x̄ (18 to 24), vertical axis: P(x̄) (0 to 0.3)]

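Summary measures of this sampling distribution: Add the 16
sample means & divide by 16. Also calculate the SD of the sample
means.
$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$
$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$
(here N = 16 is the number of sample means)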

Comparing the population with its sampling
distribution
[Figure: two bar charts of P(x), shown side by side]
Population (N = 4): values 18, 20, 22, 24; $\mu = 21$, $\sigma = 2.236$
Sample means distribution (n = 2): values 18 to 24; $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 1.58$

• We note that the mean of the sampling distribution of x̄
has the same value as the mean of the original
population.
• However, its variance is not equal to the original population
variance; it is equal to the population variance
divided by the sample size used to obtain the sampling
distribution.
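Both claims can be checked numerically on the four-age example. The sketch below uses population (divide-by-N) formulas throughout, as the slides do; the variable names are illustrative.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

ages = [18, 20, 22, 24]
n = 2

# Population parameters (pstdev divides by N, matching the slides)
mu, sigma = mean(ages), pstdev(ages)

# Sampling distribution of the mean: all 16 samples with replacement
sample_means = [sum(s) / n for s in product(ages, repeat=n)]
mu_xbar, sigma_xbar = mean(sample_means), pstdev(sample_means)

print(mu, round(sigma, 3))            # 21 2.236
print(mu_xbar, round(sigma_xbar, 2))  # 21.0 1.58
# Variance of the sample means equals sigma^2 / n (5 / 2 = 2.5)
print(isclose(sigma_xbar ** 2, sigma ** 2 / n))
```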

• The square root of the sampling distribution variance is called
the standard error of the mean or, simply, standard error.
• OR, the standard deviation of any sample statistic is called its
standard error.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

• SE is determined by both the sample size and the degree of variability
among the individual observations
• SD quantifies the amount of variability among individuals in a
population, while
• SE quantifies the variability among means of repeated samples drawn
from that population
• The SE is always smaller than the SD (except when n = 1)

Sampling Error
• Sample statistics are used to estimate
population parameters
Example: x̄ is an estimate of the population mean, μ
• Problems:
• Different samples provide different estimates of the population
parameter
• Sample results have potential variability, thus sampling error exists

Calculating sampling error
• Sampling error:
The difference between a value (a statistic) computed from a sample
and the corresponding value (a parameter) computed from a
population
Example: (for the mean)
Sampling Error = $\bar{x} - \mu$
where:
$\bar{x}$ = sample mean
$\mu$ = population mean

Example
If the population mean is μ = 98.6 degrees and a
sample of n = 5 temperatures yields a sample mean
of x̄ = 99.2 degrees, then the sampling error is:
x̄ − μ = 99.2 − 98.6 = 0.6 degrees

Note:
• The sampling error may be positive or negative (x̄ may be
greater than or less than μ)
• The expected sampling error decreases as the sample size
increases

Properties of sampling distribution of mean
A. Sampling from normally distributed
populations.
a. If a population is normal with mean μ and standard
deviation σ, the sampling distribution of x̄ is
also normally distributed with
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

b. The mean, μx̄, of the distribution of the sample mean is equal to
the mean of the population from which the samples were
drawn
c. The variance of the distribution of the sample mean is equal to
the variance of the population divided by the sample size

B. Sampling from non-normally distributed
populations
• When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
• The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.

• We can apply the Central Limit Theorem:
• Even if the population is not normal,
• …sample means from the population will be approximately
normal as long as the sample size is large enough
• …and the sampling distribution will have
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

As the sample size gets large enough (n ↑), the sampling distribution of x̄
becomes almost normal, regardless of the shape of the population.

If the population is not normal
Sampling distribution properties:
• Central tendency: $\mu_{\bar{x}} = \mu$
• Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
[Figure: a population distribution compared with sampling distributions of x̄ for a smaller and a larger sample size; the sampling distribution becomes normal as n increases]

[Figure: results from a sampling activity] Samples were taken at increasing sizes,
from 4 cases to 98 cases. As sample size increases, not only do the sample
means become closer to the population mean, but fluctuations in the sample means also become
smaller.

• Generally, as n increases, the sample mean x̄ and sample variance S²
approach the values of the true population parameters µ and σ²,
respectively.
• The average of the sample means based on repeated samples of size
n approaches the population mean µ as the number of samples
selected gets large.
E(x̄) = µ
• The estimator x̄ is said to be unbiased

How large is large enough?
• For most distributions, n > 30 will give a sampling distribution that is
nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling distribution of the
mean is always normally distributed.
• However, the general answer depends on the shape of the distribution
of the sampled population.
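A small simulation can make the effect of sample size visible. This is only a sketch: the exponential population (mean 1, strongly right-skewed), the seed, the number of repetitions and the helper names `skewness` and `simulate_means` are all assumptions chosen for illustration, not part of the original material.

```python
import random
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: average cubed standardized deviation."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


def simulate_means(n, reps, rng):
    """Draw `reps` samples of size n from an exponential population (mean 1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]


rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (2, 15, 30):
    means = simulate_means(n, reps=5000, rng=rng)
    results[n] = (mean(means), pstdev(means), skewness(means))
    print(n, [round(v, 3) for v in results[n]])
```

In a run like this, the average of the sample means should stay near the population mean of 1, while the spread and the skewness of the sample means shrink as n grows, which is the behaviour the rule of thumb describes.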

[Figure: sampling distribution of x̄ for different populations and different sample sizes]

Applications of the sampling distributions of sample
mean
• Helps in computing the probability of obtaining a sample with a
mean of some specified magnitude.

z-value for sampling distribution of x̄
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
where: x̄ = sample mean
μ = population mean
σ = population standard deviation
n = sample size

Finite Population Correction
• Apply the Finite Population Correction if:
• the sample is large relative to the
population (n/N > 5%) and…
• Sampling is without replacement
Then
$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$

• When the population is much larger than the
sample, the difference between σ²/n and
(σ²/n)[(N-n)/(N-1)] will be negligible.
• Example: N = 10,000; n=25
• Finite Population Correction = (N-n)/(N-1)
= (10,000-25)/(10,000-1) =0.9976 ≈ 1
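As a quick check of the arithmetic, here is a sketch of the correction factor and of a standard error that applies it. The function names `fpc` and `standard_error` are hypothetical; the factor (N - n)/(N - 1) multiplies the variance, so its square root multiplies the standard error. The values σ = 16 and n = 64 in the last line are illustrative.

```python
from math import sqrt


def fpc(N, n):
    """Finite population correction factor for the variance: (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), multiplied by sqrt(FPC) when a population size N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt(fpc(N, n))
    return se


factor = fpc(10_000, 25)
print(round(factor, 4))  # 0.9976
print(standard_error(16, 64), standard_error(16, 64, N=10_000))
```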

Example 1
• Given: μ = 50, σ = 16, n = 64
Find: P(x̄ > 53)
Solution
1. Write the given information, μ=50, σ=16, n=64
2. Sketch a normal curve

3. Convert x̄ to a z score: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$
4. Find the appropriate value(s) in the Table
The area of the SND above z = 1.5 is 0.0668, so P(z > 1.5) = 0.0668
5. Complete the answer
The probability that the sample mean x̄ is greater than 53 is 0.0668.
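The table lookup in this example can be cross-checked with Python's standard library. This is a sketch using `statistics.NormalDist` in place of the printed table; it assumes the sampling distribution of the mean is normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 16, 64

se = sigma / sqrt(n)         # standard error: 16 / 8 = 2.0
z = (53 - mu) / se           # z score of a sample mean of 53
p = 1 - NormalDist().cdf(z)  # upper-tail area of the standard normal

print(z, round(p, 4))  # 1.5 0.0668
```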

Example 2
• Suppose a population has mean μ = 8 and standard deviation σ
= 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8
and 8.2?

Solution:
• Even if the population is not normally distributed, the
central limit theorem can be used (n > 30)
• … so the sampling distribution of x̄ is approximately
normal
• … with mean $\mu_{\bar{x}}$ = 8
• …and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$

$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$
[Figure: population distribution (μ = 8) → sample → sampling distribution of x̄ (μx̄ = 8, shaded from 7.8 to 8.2) → standardize → standard normal distribution (μz = 0, shaded from -0.4 to 0.4, area 0.1554 + 0.1554)]
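The interval probability in Example 2 can be verified directly, without standardizing by hand. The sketch below assumes the normal approximation justified by the central limit theorem and models the sample mean with `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36

# Sampling distribution of the mean: normal with mean 8 and SE 3 / 6 = 0.5
xbar_dist = NormalDist(mu, sigma / sqrt(n))

# P(7.8 < x-bar < 8.2) as a difference of two cumulative probabilities
p = xbar_dist.cdf(8.2) - xbar_dist.cdf(7.8)
print(round(p, 4))  # 0.3108
```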

Example 3
• The distribution of serum cholesterol levels for all 20-70 year-old males
has mean µ = 211 mg/100 ml and SD = 46 mg/100 ml.
a. If a sample of size 25 is selected from this population, what is the
probability that the sample has a mean of 230 or above?
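• Since x̄ has an approximately normal distribution with mean 211 and standard error 46/√25 = 9.2,
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{230 - 211}{9.2} = 2.07$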

• The area under the standard normal curve to the right
of z = 2.07 is 0.0192
• Consequently, the probability that a sample of size 25
has a mean of 230 mg/100 ml or higher is 0.0192.
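Part (a) can be recomputed as a check. The sketch below assumes the normal model for the sample mean and shows the tail area both for the unrounded z and for z rounded to two decimals, as a printed table would require; the two differ slightly, so the answer is roughly 0.019 either way.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211, 46, 25

se = sigma / sqrt(n)  # 46 / 5 = 9.2
z = (230 - mu) / se   # about 2.065 before rounding

p_exact = 1 - NormalDist().cdf(z)            # tail area at the unrounded z
p_table = 1 - NormalDist().cdf(round(z, 2))  # tail area at z = 2.07

print(round(z, 2), p_exact, round(p_table, 4))
```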

b. What mean value of serum cholesterol level cuts off the lower 10%
of the sampling distribution?
• An area of 0.1003 in the lower tail of the SND is marked by the
value z = −1.28
• What is the corresponding value of x̄?
$z = \frac{\bar{x} - 211}{9.2} = -1.28$, so $\bar{x} = 211 + (-1.28)(9.2) = 199.2$

Approximately 10% of samples of size 25
have means that are less than or equal to
199.2 mg/100 ml.
The other 90% of the samples have means
that are greater than 199.2 mg/100 ml
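The 10% cutoff in part (b) is an inverse lookup, which can be sketched with `NormalDist.inv_cdf` instead of reading the table backwards. It assumes the same normal model for the sample mean, with mean 211 and standard error 9.2.

```python
from math import sqrt
from statistics import NormalDist

# Sampling distribution of the mean for samples of size 25
xbar_dist = NormalDist(211, 46 / sqrt(25))

# Value below which 10% of sample means fall
cutoff = xbar_dist.inv_cdf(0.10)
print(round(cutoff, 1))  # 199.2
```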

B. Distribution of the difference between two
sample means
• Important to compare two population means (comparative studies)
• Are the two population means different?
• If yes by how much do they differ?
• For example, mean serum cholesterol (MSC) level for sedentary office
workers vs laborers.

• It is generally assumed that the two populations are normally distributed.
• For sampling from non-normal populations, large samples are
recommended by the application of the CLT.
• Plotting sample differences (Mean1-Mean2) against frequency gives a
normal distribution with mean equal to μ1-μ2 which is the difference
between the two population means.

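• The variance of the distribution of the sample differences is:
$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$
• Thus, the standard error of the difference between sample means is:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$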

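• To convert to the SND, we use the formula
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
• We find the z score by assuming that there is no
difference between the population means, i.e., $\mu_1 - \mu_2 = 0$.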
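A minimal sketch of the z score for a difference between two sample means, assuming known population standard deviations and independent samples. The function `z_diff_means` and every number in the call are hypothetical, chosen only so the arithmetic is easy to follow; they are not data from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, delta=0.0):
    """z = ((xbar1 - xbar2) - delta) / sqrt(sigma1^2/n1 + sigma2^2/n2).

    delta is the assumed difference mu1 - mu2 (0 means no difference).
    """
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    return ((xbar1 - xbar2) - delta) / se


# Hypothetical values for illustration only
z = z_diff_means(xbar1=215, xbar2=205, sigma1=40, sigma2=30, n1=100, n2=100)
p_upper = 1 - NormalDist().cdf(z)  # chance of a difference this large or larger
print(z, round(p_upper, 4))  # 2.0 0.0228
```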
