Why study statistics?
• Data is everywhere
• Statistical techniques are used to make many decisions that affect our
lives
• No matter what your career is, you will make professional decisions
that involve data.
• An understanding of statistical methods will help you make these
decisions effectively.
10/4/2023 By Degemu S (MPH) 1
Uses of biostatistics
• Organizing data
• Assessing health status
• Evaluating health programs
• Allocating resources
• Measuring the magnitude of a disease or condition
• Assessing risk factors
• Evaluating new medicines or drugs
• Drawing inferences
• Hospital utilization statistics
• Assessing vaccine uptake
Biostatistics in Public Health?
• What is public health all about?
“Public Health is the science and art
of preventing disease, prolonging life, and promoting health
through the organized efforts of society.”
(World Health Organization)
The Functions of Public Health
• Assessment: Identify problems related to the
public’s health, and measure their extent
•Policy Setting: Prioritize problems, find possible solutions, set regulations to
achieve change and predict effect on the population.
•Assurance: Provide services as determined by policy, and monitor compliance
•Evaluation is a theme that cuts across all these functions, i.e., how well are
they performed?
Role of Biostatistics in PH
• Assessment: Identify problems related to the public’s health, and measure their
extent
•Role of Biostatistics in assessment: –
• Decide which information to gather,
• Find patterns in collected data, and
• Make the best summary description of the population and associated problems.
• Design general surveys of the population's needs and plan experiments to supplement these surveys
• Assist scientists in estimating the extent of health problems and associated risk factors.
Role of Biostatistics in PH
• Policy Setting: Prioritize problems, find possible solutions, set regulations
to achieve change, and predict the effect on the population
•Role of the Biostatistics in Policy Setting:
• Measure problems
• Prioritize problems
• Quantify associations of risk factors with the disease,
• Predict the effect of policy changes
• Estimate costs.
Role of Biostatistics in PH
• Assurance: Provide services as determined by policy, and monitor compliance.
• Role of Biostatistics in Assurance & Evaluation:
• Use sampling and estimation methods to study the factors related to compliance and outcome.
• Decide if improvement is due to compliance or something else, how best to measure compliance, and how to increase the compliance level in the target population.
Role of Biostatistics in Health Research
• Purpose of Health Research:
–To create knowledge essential for action to improve health.
• Without good knowledge, health interventions would have neither a logical nor an empirical basis and would be bound to fail.
Role of Biostatistics in Health Research
Steps in research:
• Planning
• Designing
• Execution (data collection)
• Data processing
• Data analysis
• Interpretation
• Publication
• Presentation
Statistical thinking contributes to every step of research.
Introduction …
• Variable: any characteristic of an individual that is measured (e.g., blood pressure) or recorded (e.g., age, sex) and that can take different values for different individuals or cases.
• Quantitative variable: one that can be measured in the usual sense and expressed as a number. Examples: the heights of adult males, the weights of preschool children, and the ages of patients seen in a dental clinic.
• Qualitative variable: a characteristic that cannot be measured in the sense that height, weight, and age are measured, only categorized. Examples: the sex, ethnicity, or religion of an individual.
Introduction …
• Population: the largest collection of entities (usually people) in which we have an interest at a particular time.
• Sample: a part of a population.
Scale of measurement
• Measurement is the assignment of numbers to objects or events according to a set of rules.
• The scale of measurement is concerned with the nature of the numbers that result from measurement.
• Measurements can be qualitative (categorical) or quantitative.
• Although variables can be broadly divided into categorical (qualitative) and quantitative, it is common practice to distinguish four basic types of data (scales of measurement): nominal, ordinal, interval, and ratio.
Count
• Most basic measure of disease frequency is a simple count of affected
individuals.
• Example:
• 350,000 cases of polio
• 350,000 cases of polio in 1988
• 350,000 cases of polio in 1988 in 125 countries
Ratio, proportion and rate
Ratio
• The quotient of 2 numbers
• Numerator NOT INCLUDED
in the denominator
• No relationship necessary between the numerator and denominator
• May be expressed as a/b or a:b
When is a ratio used?
• Sex ratio: Male to female
• Number of health facilities per population
• Number of participants in the course per facilitator
• Number of inhabitants per latrine
• Odds ratio
• Relative risk
• Prevalence ratio
• Maternal mortality ratio
Ratio
• Example 1
• A university has 4000 male students and 2000 female students. The
ratio of male to female students is:
• 4000/2000 = 2/1 or 2:1
• For every 2 male students there is one female student
Ratio
• Example 2
• A foodborne epidemic occurred in an elementary school canteen. The
attack rate in the first grade was 24% while the attack rate in the second
grade was 16%. Compare these two attack rates.
• 24/16 = 3/2 or 3:2
• For every 3 first graders who fell ill, 2 second graders fell ill, assuming the same number of pupils at risk in each grade.
Ratio
• Example 3
• A city of 4 million people has 400 clinics. Calculate the ratio of clinics per person.
• Ratio = 400 / 4,000,000 = 0.0001 clinics per person
• Multiply by 10^4: 0.0001 x 10^4 = 1 clinic per 10,000 persons
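The three ratio examples above can be checked with a few lines of Python. This is a minimal sketch: the helper name `ratio` is an illustrative assumption, and exact fractions are used only so that the results reduce to lowest terms.

```python
from fractions import Fraction


def ratio(a, b):
    """Return a/b as an exact fraction in lowest terms (hypothetical helper)."""
    return Fraction(a, b)


# Example 1: male-to-female student ratio (2:1)
sex_ratio = ratio(4000, 2000)

# Example 2: ratio of the two attack rates, 24% vs 16% (3:2)
attack_rate_ratio = ratio(24, 16)

# Example 3: clinics per person, rescaled to clinics per 10,000 persons
clinics_per_10k = ratio(400, 4_000_000) * 10**4

print(sex_ratio, attack_rate_ratio, clinics_per_10k)
```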
Proportion
• The quotient of 2 numbers
• Numerator is a sub-group of the population in the denominator
• Numerator is always INCLUDED in the denominator
• Proportion ranges between 0 and 1
• Percentage = proportion x 100
What is the proportion of cases?
Proportion = cases / total = 2 / 4 = 0.5
Percentage = 0.5 x 100 = 50%
When is a proportion used?
• Proportion of samples positive for P. falciparum
• 1000 samples, 236 positive
• Proportion of positive samples = 236/1000 = 0.236
• Percentage of positive samples = 0.236 x 100 = 23.6%
• Proportion of malaria deaths
• 123 malaria cases, 7 deaths
• Proportion of malaria deaths = 7/123 = 0.057
• Percentage of malaria deaths = 0.057 x 100 = 5.7%
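The two malaria calculations above can be reproduced as follows. This is a sketch only; the function name `proportion` and its input check are assumptions added for illustration, reflecting the rule that the numerator must be included in the denominator.

```python
def proportion(part, whole):
    """Share of `whole` made up by `part` (hypothetical helper).

    The numerator must be a subgroup of the denominator, so the
    result always lies between 0 and 1.
    """
    if not 0 <= part <= whole:
        raise ValueError("numerator must be a subgroup of the denominator")
    return part / whole


positive_samples = proportion(236, 1000)   # samples positive for P. falciparum
malaria_deaths = proportion(7, 123)        # deaths among malaria cases

# Print each proportion and the matching percentage (proportion x 100)
print(f"{positive_samples:.3f} -> {positive_samples * 100:.1f}%")
print(f"{malaria_deaths:.3f} -> {malaria_deaths * 100:.1f}%")
```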
Proportion
• Example 1
• A university has 4000 male students and 2000 female students.
Calculate the proportion of male and female students.
• Male: 4000/6000 x 100% = 66.7%
• Female: 2000/6000 x 100% = 33.3%
Proportion
• Example 2
• 40 children are currently ill with measles; 80 children altogether have had measles.
• 40 / 80 = 0.50 (proportion)
• 0.50 x 100 = 50% (percentage)
Rate
• The quotient of 2 numbers
• Measures the probability of occurrence of an event over TIME
• Numerator: number of EVENTS
• Denominator: POPULATION at risk for the event in numerator
observed for a given TIME
What is the rate of death?
Rate = 2 deaths / 100 population, observed in one year
= 2 deaths per 100 population per year
When is a rate used?
• Morbidity rates
• Attack rates
• Prevalence rates
• Incidence rates
• Mortality rates
• Natality rates
Rate Example 1
• Mortality rate of tetanus in France in 1995
• Tetanus deaths: 17
• Population in 1995: 58 million
• Time period: 1 year
• Mortality rate = 0.029 per 100,000 population per year
• Rate may be expressed in any power of 10
• 100, 1,000, 10,000, 100,000
• Rate must include an aspect of time
• Per year, per month, per day
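The tetanus figure above follows directly from the three inputs. The sketch below assumes a helper named `rate_per` (not part of the original slides) and a one-year observation period, as stated in the example.

```python
def rate_per(events, population, per=100_000):
    """Events per `per` population over the observation period (hypothetical helper)."""
    return events / population * per


# Tetanus deaths in France, 1995: 17 deaths among 58 million people in 1 year
tetanus_mortality = rate_per(events=17, population=58_000_000, per=100_000)

print(f"{tetanus_mortality:.3f} per 100,000 population per year")
```

Changing `per` to another power of 10 rescales the same rate, for example `per=1_000_000` expresses it per million population per year.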
Rate Example 2
Maternal Mortality for Various Continents (1995)
Continent                  Rate
Africa                     273000
Asia                       217000
Europe                     2000
Latin America/Caribbean    22000
South America              15000
North America              490
Australia/New Zealand      25
Summary: What is the measure of frequency?
• Numerator not included in the denominator: Ratio
• Numerator included in the denominator, time not included in the denominator: Proportion
• Numerator included in the denominator, time included in the denominator: Rate
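The two questions in the summary can be written as a small decision function. This is an illustrative sketch; the function and parameter names are assumptions.

```python
def measure_of_frequency(numerator_in_denominator, time_in_denominator):
    """Classify a measure using the two summary questions (hypothetical helper)."""
    if not numerator_in_denominator:
        return "ratio"
    return "rate" if time_in_denominator else "proportion"


# Male-to-female ratio: males are not part of the female denominator
print(measure_of_frequency(False, False))
# Positive samples out of all samples tested
print(measure_of_frequency(True, False))
# Deaths per 100,000 population per year
print(measure_of_frequency(True, True))
```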
Nominal Data
• Data that represent mutually exclusive categories with no natural order or rank
• Individuals are simply placed in the proper category or group
• Each item must fit into exactly one category
• Categories can be labeled with numbers, names, or symbols
Sex
1. Male
2. Female
Marital status
1. Single
2. Married
3. Divorced
4. Widowed
Outcome of patient after car accident
1. Alive
2. Dead
Blood group
1. A
2. B
3. O
4. AB
Ordinal data
• Data representing mutually exclusive categories with a ranked order are called ordinal data.
• The spaces or intervals between the categories are not necessarily equal.
• The function of numbers assigned to ordinal data is to order (or rank) the observations from lowest to highest, hence the term ordinal.
Examples
Job satisfaction index
1. Strongly disagree
2. Disagree
3. Neutral
4. Agree
5. Strongly agree
Classroom rank
1. First
2. Second
3. Third
Degree of burn
1. First degree
2. Second degree
3. Third degree
4. Fourth degree
Progressive health status of patient after admission
1. Unimproved
2. Improved
3. Much improved
Pain level
1. None
2. Mild
3. Moderate
4. Severe
Interval data
• Interval data are truly quantitative.
• The distance between any two measurements is known, and equal intervals represent equal differences.
• Both the unit of distance and the zero point are arbitrary.
• Zero has no true meaning, i.e., it does not indicate a total absence of the quantity being measured.
• The ratio between two measurements has no meaning. For example, 40 degrees Fahrenheit is not twice as warm as 20 degrees Fahrenheit.
Examples
• Temperature in degrees Fahrenheit or Celsius. The unit of measurement is the degree, and the point of comparison is the arbitrarily chosen "zero degrees", which does not indicate a lack of heat.
• IQ
Ratio data
• The highest level of measurement is the ratio scale.
• This scale is characterized by the fact that equality of ratios as well as equality of intervals may be determined.
• Fundamental to the ratio scale is a true zero point.
Examples
• Height
• Weight
• Length
• Age
• Cholesterol level
• Serum sugar level
• Number of TB patients visiting the hospital
Scale of Measurement
Nominal scale → Ordinal scale → Interval scale → Ratio scale
(the degree of precision in measuring increases from the nominal to the ratio scale)
Numerical Discrete and Numerical Continuous Data
• Both interval and ratio data involve measurement.
• Most data analysis techniques that apply to ratio data also apply to interval data.
• For most practical purposes, both interval and ratio data can be classified as numerical discrete or numerical continuous.
Numerical Discrete
• For discrete data, both ordering and magnitude are important.
• The numbers represent actual measurable quantities rather than mere labels.
• Discrete data are restricted to taking on only specified values (often integers or counts) that differ by fixed amounts.
• No intermediate values are possible.
Examples
• The number of bacteria colonies on a plate
• The number of cells within a prescribed area upon microscopic examination
• The number of heartbeats within a specified time interval
• The number of times a woman has given birth (parity)
• The number of episodes of illness a patient experiences during some time period
• The number of motor vehicle accidents
• The number of beds available in a particular hospital
Numerical continuous
• The scale with the greatest degree of quantification is a numerical continuous scale.
• Each observation theoretically falls somewhere along a continuum.
• One is not restricted, in principle, to particular values such as the integers of the discrete scale.
• The restricting factor is the degree of accuracy of the measuring instrument.
Examples
• Most clinical measurements, such as blood pressure, serum cholesterol level, height, weight, and age, are on a numerical continuous scale.
Categorizing Variables-Exercise
1. Year of birth: numerical
2. Marital status of women: nominal
3. Identification number of study participant: nominal (the number is only a label)
4. Class rank: ordinal
5. Length of infants at ANC clinic: numerical
For the numerical variables: discrete or continuous?
Inferential Statistics
Probability And Probability Distributions
• Probability is the central idea behind statistical designs for producing data.
• Probabilities are used in everyday communication:
• A patient has a 50-50 chance of surviving a certain operation
• The chance that a 30-year-old woman will celebrate her 70th birthday is 30%
• These examples express the chance of occurrence of some event of a random variable.
• Probability theory grew out of attempts to solve problems related to games of chance, such as tossing a coin or rolling a die, i.e., attempts to quantify personal beliefs regarding degrees of uncertainty.
Probability And Probability distribution…..
• Probabilities and probability distributions are extensions of the ideas of relative frequency and histograms, respectively.
• Relative frequency probability: if some process is repeated a large number of times, n, and some resulting event E occurs m times, the relative frequency of E, m/n, will be approximately equal to the probability of E.
• Symbolically: Pr(E) = m/n
• E.g., suppose that of 158 people who attended a dinner party, 99 became ill due to food poisoning. The probability of illness for a person selected at random is Pr(illness) = 99/158 = 0.63 or 63%.
Probability And Probability distribution…..
• Results are not certain, uncertainty is high
•To evaluate how accurate our results are:
–Given how our data were collected, are our results accurate?
–Given the level of accuracy needed, how many observations need to
be collected?
–The sample size issue?
Probability And Probability distribution…..
• Probability is needed when dealing with a process that has an uncertain outcome:
–Birth of a male or female child?
–Tossing a coin?
–A patient taking a certain drug (cured or not)?
–The fate of the patient?
Probability And Probability distribution…..
• Experiment = any process with an uncertain outcome. Each performance of the experiment is a trial.
• Event = something that may or may not happen when the experiment is performed; an event is a set of possible outcomes.
• Events are represented by upper-case letters such as A, B, C, etc.
Probability And Probability distribution…..
• The probability of an event can be defined as the proportion of times that the event occurs in a very large number of trials.
• The probability of an event E is a number between 0 and 1 representing the proportion of times that E is expected to happen when the experiment is done over and over again under the same conditions.
Probability And Probability Distributions…..
• Any event can be expressed as a subset of the set of all possible outcomes (the sample space, S).
• S = set of all possible outcomes; P(S) = 1
• An event is any set of outcomes of interest.
Why Probability in Medicine
• Because medicine is an inexact science, physicians can seldom predict an outcome with absolute certainty.
• E.g., to formulate a diagnosis, a physician must rely on available diagnostic information about a patient:
–History and physical examination
–Laboratory investigations, X-ray findings, ECG, etc.
• Although no test result is absolutely accurate, it does affect the probability of the presence (or absence) of a disease.
–Sensitivity and specificity
• An understanding of probability is fundamental for quantifying the uncertainty that is inherent in the decision-making process.
cont.…
• Probability theory is a foundation for statistical inference.
• It allows us to draw conclusions about a population of patients based on information obtained from a sample of patients drawn from that population.
• Probability is used in:
• Probability distributions: Binomial, Poisson, and Normal
• Sampling and sampling distributions
• Estimation
• Hypothesis testing
• Advanced statistical analysis
Categories of Probability
• Probabilities are either objective or subjective.
• Objective probability:
1) Classical probability
2) Relative frequency probability
1. Classical Probability:
• Is based on gambling ideas
• Rolling a die: there are 6 possible outcomes
• Sample space = {1, 2, 3, 4, 5, 6}
• Each is equally likely to occur: P(i) = 1/6 for i = 1, 2, ..., 6
• The probabilities sum to 1
Classical Probability
• Definition: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a characteristic E, the probability of the occurrence of E is m/N.
• P(E) = the probability of E = m/N
• If we toss a die, what is the probability of 4 coming up?
• m = 1 (the outcome 4) and N = 6
• The probability of 4 coming up is 1/6.
Classical Probability
• Another “equally likely” setting is the tossing of a coin
–There are 2 possible outcomes in the set of all possible outcomes, {H, T}
–P(H) = 0.5, P(T) = 0.5, SUM = 1.0
Relative Frequency Probability
• The long-run proportion of times the event A occurs in a large number of trials repeated under essentially identical conditions.
Relative Frequency Probability
• Definition: If a process is repeated a large number of times (n), and if an event with the characteristic E occurs m times, the relative frequency of E, m/n, is approximately equal to the probability of E.
• Probability of E = P(E) = m/n.
• If you toss a coin 100 times and the head comes up 40 times,
• P(H) = 40/100 = 0.4
• If we toss a coin 10,000 times and the head comes up 5562 times,
• P(H) = 5562/10,000 = 0.5562
• Therefore, the longer the series (the larger the number of trials), the closer the estimate tends to be to the true value (0.5).
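The long-run behaviour described above can be illustrated with a small simulation. This is a sketch under stated assumptions: a fair coin is simulated with Python's pseudo-random generator, and the function name and seed are illustrative choices, so the exact frequencies will differ from the numbers quoted in the example.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return heads / n_tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


# The relative frequency m/n tends to settle near 0.5 as n grows
for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```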
Subjective Probability
• Personalistic: an opinion or judgment by a decision maker about the likelihood of an event.
• Personal assessment of which is more effective at providing a cure, traditional or modern medicine
• Personal assessment of which sports team will win a match
• May also draw on classical and relative frequency methods to assess the likelihood of an event, but does not rely on the repeatability of any process.
• E.g., if someone says that he/she is 90% certain that a cure for AIDS will be discovered within 5 years, then it means that:
P(discovery of a cure for AIDS within 5 years) = 90% = 0.90
Mutually Exclusive Events
• Two events A and B are mutually exclusive if they cannot both happen at the same time.
• P(A ∩ B) = 0
• If E1 occurs, then E2 cannot occur
• E1 and E2 have no common element
• Example: E1 = yellow card, E2 = black card. A card cannot be black and yellow at the same time.
Mutually Exclusive Events
• Examples:
–A coin toss cannot produce heads and tails simultaneously.
–The weight of an individual can't be classified simultaneously as “underweight”, “normal”, and “overweight”.
–Blood pressure reading: A = (DBP < 90) and B = (90 ≤ DBP < 95) can't occur at the same time.
Independent Events
• Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out.
• The outcome of one event has no effect on the occurrence or non-occurrence of the other.
P(A ∩ B) = P(A) x P(B) (independent events)
• Example: The outcomes on the first and second coin tosses are independent.
Dependent event
• Occurrence of one affects the probability of the other
• P(A ∩ B) ≠ P(A) x P(B)
• Example: Consider the DBP measurements from a mother and her first-born child.
Let A = {mother’s DBP ≥ 95} and B = {first-born child’s DBP ≥ 80}
• Suppose P(A ∩ B) = 0.05, P(A) = 0.1, and P(B) = 0.2
Then P(A ∩ B) = 0.05 > P(A) x P(B) = 0.02, so events A and B are dependent.
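The comparison in the DBP example can be expressed as a simple check of the product rule. This is a minimal sketch; the function name and the tolerance used for floating-point comparison are assumptions.

```python
import math


def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True when P(A and B) equals P(A) x P(B), within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)


# Mother's DBP >= 95 (A) and first-born child's DBP >= 80 (B)
print(are_independent(p_a=0.1, p_b=0.2, p_a_and_b=0.05))

# Two tosses of a fair coin: heads on the first (A) and heads on the second (B)
print(are_independent(p_a=0.5, p_b=0.5, p_a_and_b=0.25))
```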
Dependent event
• E1 = rain is forecast on the news
• E2 = taking an umbrella to work
• The probability of the second event is affected by the occurrence of the first event.
Intersection and union
• The intersection of two events A and B, A ∩ B, is the event that A and B happen simultaneously. P(A and B) = P(A ∩ B)
• Let A represent the event that a randomly selected newborn has low birth weight (LBW), and B the event that he or she is from a multiple birth.
• The intersection of A and B is the event that the infant is both LBW and from a multiple birth.
Intersection and union
• The union of A and B, A ∪ B, is the event that either A happens or B happens or they both happen simultaneously.
• P(A or B) = P(A ∪ B)
• Here, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both.
Properties of Probability
1. The numerical value of a probability always lies between 0 and 1, inclusive.
0 ≤ P(E) ≤ 1
• A value of 0 means the event cannot occur
• A value of 1 means the event definitely will occur
• A value of 0.5 means that the probability that the event will occur is the same as the probability that it will not occur.
2. The sum of the probabilities of all mutually exclusive outcomes is equal
to 1.
P(E1) + P(E2 ) + .... + P(En ) = 1.
3. For two mutually exclusive events A and B,
P(A or B) = P(A ∪ B) = P(A) + P(B).
If A and B are not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B)
4. The complement of an event A, denoted by Ā or A^c, is the event that A does not occur.
• It consists of all the outcomes in which event A does NOT occur
P(Ā) = P(not A) = 1 – P(A)
• A and Ā are complementary events.
• In the example, the complement of A is the event that a newborn is not LBW.
• In other words, Ā is the event that the child weighs 2500 grams or more at birth.
P(Ā) = 1 − P(A)
P(not low bwt) = 1 − P(low bwt)
= 1 − 0.076
= 0.924
Basic Probability Rules
1. Addition rule
• If events A and B are mutually exclusive:
P(A or B) = P(A) + P(B)
P(A and B) = 0
• More generally:
P(A or B) = P(A) + P(B) - P(A and B)
where P(A or B) is the probability that event A or event B occurs, or that they both occur.
Example: The probabilities below represent years of
schooling completed by mothers of newborn infants
• What is the probability that a mother has completed < 12 years of schooling?
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159
• Since these two events are mutually exclusive,
P(≤ 8 or 9-11) = P(≤ 8 ∪ 9-11)
= P(≤ 8) + P(9-11)
= 0.056 + 0.159
= 0.215
• What is the probability that a mother has completed 12 or more years of schooling?
P(≥ 12) = P(12 or 13-15 or ≥ 16)
= P(12 ∪ 13-15 ∪ ≥ 16)
= P(12) + P(13-15) + P(≥ 16)
= 0.321 + 0.218 + 0.230
= 0.769
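Both schooling calculations apply the addition rule for mutually exclusive categories. The sketch below uses only the five probabilities quoted in the example; the dictionary keys and the helper name `prob_any` are illustrative assumptions.

```python
# Probabilities quoted in the example, keyed by years of schooling completed
schooling = {"<=8": 0.056, "9-11": 0.159, "12": 0.321, "13-15": 0.218, ">=16": 0.230}


def prob_any(categories, dist=schooling):
    """Addition rule for mutually exclusive categories: sum their probabilities."""
    return sum(dist[c] for c in categories)


p_less_than_12 = prob_any(["<=8", "9-11"])
p_12_or_more = prob_any(["12", "13-15", ">=16"])

print(round(p_less_than_12, 3), round(p_12_or_more, 3))
```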
If A and B are not mutually exclusive events, then subtract the overlap:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• The following data are the results of electrocardiograms (ECGs) and radionuclide angiocardiograms (RAs) for 19 patients with post-traumatic myocardial contusions.
• 7 patients had both an abnormal ECG and an abnormal RA
• 17 patients had an abnormal ECG
• 9 patients had an abnormal RA
P(ECG abnormal and RA abnormal) = 7/19 = 0.37
P(ECG abnormal or RA abnormal)
= P(ECG abnormal) + P(RA abnormal) − P(both ECG and RA abnormal)
= 17/19 + 9/19 − 7/19 = 19/19 = 1.
Note: P(both) is subtracted because the 7 patients whose ECGs and RAs are both abnormal would otherwise be counted twice.
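The ECG/RA calculation can be verified with exact fractions, which avoids rounding in the intermediate steps. This is a sketch; the variable names are illustrative.

```python
from fractions import Fraction

n_patients = 19
n_ecg_abnormal = 17
n_ra_abnormal = 9
n_both_abnormal = 7

p_ecg = Fraction(n_ecg_abnormal, n_patients)
p_ra = Fraction(n_ra_abnormal, n_patients)
p_both = Fraction(n_both_abnormal, n_patients)

# General addition rule: subtract the overlap so it is counted only once
p_either = p_ecg + p_ra - p_both

print(p_either, round(float(p_both), 2))
```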
2. Multiplication rule
• If A and B are independent events, then
P(A ∩ B) = P(A) × P(B)
• More generally,
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
P(A and B) denotes the probability that A and B both occur at the same time.
Conditional Probability
• Refers to the probability of an event, given that another event is known to
have occurred.
• “What happened first is assumed”
• Hint - When thinking about conditional probabilities, think in stages. Think
of the two events A and B occurring chronologically, one after the other,
either in time or space.
• The conditional probability that event B occurs given that event A has already occurred is denoted P(B|A) and is defined as
P(B|A) = P(A ∩ B) / P(A)
provided that P(A) ≠ 0.
Example:
A study investigating the effect of prolonged exposure to bright light on retina damage in premature infants.
                 Retinopathy YES   Retinopathy NO   TOTAL
Bright light     18                3                21
Reduced light    21                18               39
TOTAL            39                21               60
• The probability of developing retinopathy is:
P(Retinopathy) = No. of infants with retinopathy / Total No. of infants
= (18+21)/(21+39)
= 39/60 = 0.65
• We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability given that the infant was exposed to reduced light.
• Exposure to bright light and exposure to reduced light are conditioning events: events we want to take into account when calculating conditional probabilities.
• The conditional probability of retinopathy, given exposure to bright light, is:
P(Retinopathy | exposure to bright light)
= No. of infants with retinopathy exposed to bright light / No. of infants exposed to bright light
= 18/21 = 0.86
• P(Retinopathy | exposure to reduced light)
= No. of infants with retinopathy exposed to reduced light / No. of infants exposed to reduced light
= 21/39 = 0.54
• The conditional probabilities suggest that premature infants exposed to bright light have a higher risk of retinopathy than premature infants exposed to reduced light.
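The marginal and conditional probabilities in the retinopathy example can be computed from the four cell counts of the table. This is a minimal sketch; the dictionary layout and the function name are assumptions made for illustration.

```python
# Cell counts from the study table: (exposure, retinopathy) -> number of infants
counts = {
    ("bright", "yes"): 18, ("bright", "no"): 3,
    ("reduced", "yes"): 21, ("reduced", "no"): 18,
}


def p_retinopathy(exposure=None):
    """P(retinopathy), or P(retinopathy | exposure) when an exposure is given."""
    cells = [(key, n) for key, n in counts.items()
             if exposure is None or key[0] == exposure]
    total = sum(n for _, n in cells)
    cases = sum(n for (_, outcome), n in cells if outcome == "yes")
    return cases / total


print(round(p_retinopathy(), 2))            # all infants
print(round(p_retinopathy("bright"), 2))    # given bright light
print(round(p_retinopathy("reduced"), 2))   # given reduced light
```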
• For independent events A and B:
P(A|B) = P(A)
• For non-independent events A and B:
P(A and B) = P(A|B) P(B)
(General Multiplication Rule)
Test for Independence
• Two events A and B are independent
if:
P(B|A)=P(B)
or
P(A and B) = P(A) • P(B)
• Two events A and B are dependent
if
P(B|A) ≠P(B)
or
P(A and B) ≠P(A) • P(B)
Example
• In a study of optic-nerve degeneration in Alzheimer’s disease,
postmortem examinations were conducted on 10 Alzheimer’s patients.
The following table shows the distribution of these patients according
to sex and evidence of optic-nerve degeneration.
• Are the events “patient has optic-nerve degeneration” and “patient is female” independent for this sample of 10 patients?
Sex       Optic-nerve degeneration
          Present   Not present
Female    4         1
Male      4         1
Solution
• P(Optic-nerve degeneration | Female)
= No. of females with optic-nerve degeneration / No. of females
= 4/5 = 0.80
• P(Optic-nerve degeneration)
= No. of patients with optic-nerve degeneration / Total No. of patients
= 8/10 = 0.80
• Since P(Optic-nerve degeneration | Female) = P(Optic-nerve degeneration), the events are independent for this sample.
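The same test for independence can be run directly on the table counts. The sketch below uses exact fractions so that the equality P(B|A) = P(B) is checked without rounding; the nested-dictionary layout is an illustrative assumption.

```python
from fractions import Fraction

# Counts from the table: sex -> optic-nerve degeneration status -> patients
table = {
    "female": {"present": 4, "absent": 1},
    "male": {"present": 4, "absent": 1},
}

n_total = sum(sum(row.values()) for row in table.values())
n_present = sum(row["present"] for row in table.values())

p_degeneration = Fraction(n_present, n_total)
p_degeneration_given_female = Fraction(
    table["female"]["present"], sum(table["female"].values())
)

# Independent in this sample when P(degeneration | female) == P(degeneration)
independent = p_degeneration_given_female == p_degeneration
print(p_degeneration_given_female, p_degeneration, independent)
```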
Exercise:
Culture and Gonodectin (GD) test results for 240 urethral discharge specimens.
                  Culture result
GD test result    Gonorrhea   No gonorrhea   Total
Positive          175         9              184
Negative          8           48             56
Total             183         57             240
1. What is the probability that a man has gonorrhea? 183/240
2. What is the probability that a man has a positive GD test? 184/240
3. What is the probability that a man has a positive GD test and gonorrhea? 175/240
4. What is the probability that a man has a negative GD test and does not have gonorrhea? 48/240
5. What is the probability that a man with gonorrhea has a positive GD test? 175/183
6. What is the probability that a man who does not have gonorrhea has a negative GD test? 48/57
7. What is the probability that a man who does not have gonorrhea has a positive GD test? 9/57
8. What is the probability that a man with a positive GD test has gonorrhea? 175/184
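Several of the answers above are standard test-accuracy measures: question 5 is the sensitivity of the GD test, question 6 its specificity, and question 8 its positive predictive value. A minimal sketch computing them from the four cell counts follows; the variable names are illustrative, and culture is treated as the reference result.

```python
from fractions import Fraction

# Cell counts from the GD test table (culture result taken as the truth)
true_pos, false_pos = 175, 9     # GD positive: gonorrhea / no gonorrhea
false_neg, true_neg = 8, 48      # GD negative: gonorrhea / no gonorrhea
n_total = true_pos + false_pos + false_neg + true_neg

prevalence = Fraction(true_pos + false_neg, n_total)            # question 1
sensitivity = Fraction(true_pos, true_pos + false_neg)          # question 5
specificity = Fraction(true_neg, true_neg + false_pos)          # question 6
false_positive_share = Fraction(false_pos, true_neg + false_pos)  # question 7
positive_predictive_value = Fraction(true_pos, true_pos + false_pos)  # question 8

for name, value in [
    ("prevalence", prevalence),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("positive predictive value", positive_predictive_value),
]:
    print(f"{name}: {float(value):.3f}")
```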
Probability Distributions
• A probability distribution is a device used to describe the behavior
that a random variable may have by applying the theory of probability.
• It describes how the values of the variable are distributed, which allows conclusions to be drawn about a set of data
• Random Variable = Any quantity or characteristic that is able to
assume a number of different values such that any particular outcome
is determined by chance.
• Random variables can be either discrete or continuous
• A discrete random variable is able to assume only a finite or countable
number of outcomes.
• A continuous random variable can take on any value in a specified
interval.
• With categorical variables, we obtain the frequency distribution of each
variable
• With numeric variables, the aim is to determine whether or not normality
may be assumed
• If not, we may consider transforming the variable or categorizing it for analysis (e.g., age group)
Therefore, the probability distribution of a random variable is a table,
graph, or mathematical formula that gives the probabilities with which
the random variable takes different values or ranges of values.
A. Discrete Probability Distributions
• For a discrete random variable, the probability distribution
specifies each of the possible outcomes of the random variable
along with the probability that each will occur.
• Examples can be:
• Frequency distribution
• Relative frequency distribution
• Cumulative frequency
• We represent a potential outcome of the random variable X by x
0 ≤ P(X = x) ≤ 1
∑ P(X = x) = 1
The following data show the number of diagnostic services a patient receives, X, and the probability of each value:
x          0      1      2      3      4      5
P(X = x)   0.671  0.229  0.053  0.031  0.010  0.006
• What is the probability that a patient receives exactly 3
diagnostic services?
P(X=3) = 0.031
• What is the probability that a patient receives at most one
diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
• What is the probability that a patient receives at least four diagnostic
services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
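The three questions above are all sums over parts of the same probability distribution. The sketch below stores the probabilities used in the worked answers in a dictionary; the helper name `prob` and the use of a predicate function are illustrative assumptions.

```python
# P(X = x) for the number of diagnostic services, as used in the worked answers
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}


def prob(condition, dist=pmf):
    """Sum P(X = x) over every outcome x that satisfies `condition`."""
    return sum(p for x, p in dist.items() if condition(x))


p_exactly_3 = prob(lambda x: x == 3)
p_at_most_1 = prob(lambda x: x <= 1)
p_at_least_4 = prob(lambda x: x >= 4)

print(round(p_exactly_3, 3), round(p_at_most_1, 3), round(p_at_least_4, 3))
```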
Probability distributions can also be displayed using a graph
[Figure: probability P(X = x) (vertical axis, 0 to 0.8) plotted against the number of diagnostic services, x (horizontal axis, 0 to 5)]
The Expected Value of a Discrete Random variable
• If a random variable is able to take on a large number of values, then
a probability mass function might not be the most useful way to
summarize its behavior
• Instead, measures of location and dispersion can be calculated (as
long as the data are not categorical)
10/4/2023 By Degemu S (MPH) 101
• The average value assumed by a random variable is
called its expected value, or the population mean
• It is represented by E(X) or µ
• To obtain the expected value of a discrete random
variable X, we multiply each possible outcome by its
associated probability and sum all values with a
probability greater than 0
10/4/2023 By Degemu S (MPH) 102
• For the diagnostic service data:
Mean (X) = 0(0.671) +1(0.229) +2(0.053)
+3(0.031) +4(0.010) +5(0.006)
= 0.498 ≈ 0.5
• We would expect an average of 0.5 services for each
visit
10/4/2023 By Degemu S (MPH) 103
The Variance of a Discrete Random Variable
• The variance of a random variable X is called the population
variance and is represented by Var(X) or σ²; its square root, σ, is the population standard deviation
• It quantifies the dispersion of the possible outcomes of X around the
expected value μ
10/4/2023 By Degemu S (MPH) 104
σ² = ∑(xi − µ)² P(X = xi)
= (0 − 0.5)²(0.671) + (1 − 0.5)²(0.229)
+ (2 − 0.5)²(0.053) + (3 − 0.5)²(0.031)
+ (4 − 0.5)²(0.010) + (5 − 0.5)²(0.006)
= 0.782
Standard deviation = σ = √0.782 = 0.884
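As a rough check of the expected value and variance calculations for the diagnostic service data, the sums can be written out in Python. The variable names are illustrative. Note that this sketch uses the unrounded mean (0.498) rather than 0.5, which gives the same variance to three decimals.

```python
import math

# P(X = x) for the number of diagnostic services, as quoted in the slides.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# E(X): each outcome weighted by its probability.
mean = sum(x * p for x, p in pmf.items())

# Var(X): squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation is the square root of the variance.
sd = math.sqrt(variance)

print(round(mean, 3), round(variance, 3), round(sd, 3))
```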
10/4/2023 By Degemu S (MPH) 105
10/4/2023 By Degemu S (MPH) 106
Binomial and Poisson Distribution
1. Binomial Distribution
• It is one of the most widely encountered discrete probability distributions.
• It applies to a dichotomous (binary) random variable
• It is based on the Bernoulli trial: a single trial of an experiment that can result in only
one of two mutually exclusive outcomes (success or failure; dead or alive; sick or well; male or female)
10/4/2023 By Degemu S (MPH) 107
Example:
• We are interested in determining whether a newborn infant will survive
until his/her 70th birthday
• Let Y represent the survival status of the
child at age 70 years
• Y = 1 if the child survives and Y = 0 if he/she does not
10/4/2023 By Degemu S (MPH) 108
•The outcomes are mutually exclusive and exhaustive
•Suppose that 72% of infants born survive to age 70
years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28
10/4/2023 By Degemu S (MPH) 109
10/4/2023 By Degemu S (MPH) 110
A binomial probability distribution occurs when the following
requirements are met.
1. The procedure has a fixed number of trials.
2. The trials must be independent.
3. Each trial must have outcomes that fall into exactly two categories.
4. The probabilities must remain constant for each trial
[P(success) = p].
10/4/2023 By Degemu S (MPH) 111
Characteristics of a Binomial Distribution
• The experiment consists of n identical trials.
• Only two possible outcomes on each trial.
• The probability of A (success), denoted by p, remains the same from trial
to trial. The probability of B (failure) is denoted by q, where
q = 1 − p.
• The trials are independent.
• n and p are the parameters of the binomial distribution.
• The mean is np and the variance is np(1 − p)
10/4/2023 By Degemu S (MPH) 112
• Suppose an event can have only binary outcomes A and B.
• Let the probability of A be p and that of B be 1 − p.
• The probability p stays the same each time the event occurs.
10/4/2023 By Degemu S (MPH) 113
• If an experiment is repeated n times and the outcome is
independent from one trial to another, the probability that
outcome A occurs exactly x times is:
$P(X = x) = \binom{n}{x}\, p^{x} (1-p)^{n-x} = \dfrac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
10/4/2023 By Degemu S (MPH) 114
• n denotes the number of fixed trials
• x denotes the number of successes in
the n trials
• p denotes the probability of success
• q denotes the probability of failure (1- p)
$\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$ (also written nCx)
• It represents the number of ways of selecting x objects out of n where the
order of selection does not matter.
• Here n! = n(n−1)(n−2)…(1), and 0! = 1
10/4/2023 By Degemu S (MPH) 115
Example:
• Suppose we know that 40% of a certain population are cigarette smokers.
If we take a random sample of 10 people from this population, what is the
probability that we will have exactly 4 smokers in our sample?
10/4/2023 By Degemu S (MPH) 116
• If the probability that any individual in the population is a smoker is
p = 0.40, then the probability of x = 4 smokers out of n = 10 subjects
selected is:
P(X = 4) = 10C4 (0.4)⁴ (1 − 0.4)¹⁰⁻⁴
= 10C4 (0.4)⁴ (0.6)⁶
= 210(0.0256)(0.04666)
= 0.25
• The probability of obtaining exactly 4 smokers in the sample is about
0.25.
10/4/2023 By Degemu S (MPH) 117
• We can compute the probability of observing zero smokers out of 10
subjects selected at random, exactly 1 smoker, and so on, and display the
results in a table, as given, below.
• The third column, P(X ≤ x), gives the cumulative probability. E.g. the
probability of selecting 3 or fewer smokers into the sample of 10 subjects
is
P(X ≤ 3) =.3823, or about 38%.
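The smoker example can be reproduced with a minimal Python sketch of the binomial formula. The function names `binom_pmf` and `binom_cdf` are illustrative; `math.comb` (Python 3.8+) supplies the number of combinations nCx.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """Cumulative probability P(X <= x), summing the pmf from 0 up to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.4                        # 10 people sampled, 40% smokers
p_four = binom_pmf(4, n, p)           # exactly 4 smokers
p_at_most_three = binom_cdf(3, n, p)  # 3 or fewer smokers

# Rebuild the table of x, P(X = x) and P(X <= x) for x = 0..10.
for x in range(n + 1):
    print(x, round(binom_pmf(x, n, p), 4), round(binom_cdf(x, n, p), 4))
```

Rounded to four decimals, `p_four` is 0.2508 and `p_at_most_three` is 0.3823, in line with the values quoted in the slides.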
10/4/2023 By Degemu S (MPH) 118
10/4/2023 By Degemu S (MPH) 119
The probabilities in the above table can be displayed in
the following graph
[Figure: bar chart of probability, from 0 to 0.3, against the number of smokers, 0 to 10]
10/4/2023 By Degemu S (MPH) 120
Exercise
Each child born to a particular set of parents
has a probability of 0.25 of having blood type
O. If these parents have 5 children,
what is the probability that:
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d. 2 do not have blood type O.
10/4/2023 By Degemu S (MPH) 121
Solution for ‘a’
a.) $P(X = 2) = \binom{5}{2}(0.25)^{2}(0.75)^{5-2} = 0.2637$
10/4/2023 By Degemu S (MPH) 122
The Mean and Variance of a Binomial
Distribution
• Once n and p are specified, the proportion of successes observed in a
sample is
x/n
• and the mean and variance of the distribution are given by:
E(X) = μ = np, σ² = npq, σ = √(npq)
10/4/2023 By Degemu S (MPH) 123
Example:
• 70% of a certain population has been immunized for polio. If a
sample of size 50 is taken, what is the “expected total number”, in the
sample who have been immunized?
µ = np = 50(.70) = 35
• This tells us that “on the average” we expect to see 35 immunized
subjects in a sample of 50 from this population.
10/4/2023 By Degemu S (MPH) 124
• If repeated samples of size 10 are selected from the population of
infants born, the mean number of children per sample who
survive to age 70 would be
µ = np = (10)(0.72) = 7.2
• The variance would be npq = (10)(0.72)(0.28) = 2.02 and the SD
would be
√2.02 = 1.42
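One way to see that μ = np and σ² = npq is to compute the mean and variance directly from the binomial probabilities and compare them with the formulas. This is only a sketch for the infant survival example (n = 10, p = 0.72); the function name `binom_pmf` is illustrative.

```python
from math import comb, sqrt


def binom_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.72
q = 1 - p

# Shortcut formulas from the slides.
mean_formula = n * p          # np
var_formula = n * p * q       # npq
sd_formula = sqrt(var_formula)

# The same quantities from the definition of expected value and variance.
mean_from_pmf = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * binom_pmf(x, n, p) for x in range(n + 1))

print(round(mean_formula, 2), round(var_formula, 2), round(sd_formula, 2))
print(round(mean_from_pmf, 2), round(var_from_pmf, 2))
```

Both routes give a mean of 7.2 and a variance of about 2.02.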
10/4/2023 By Degemu S (MPH) 125
2. The Poisson Distribution
• Is a discrete probability distribution used to model the number of
occurrences of an event that takes place infrequently in time or space
• Applicable for counts of events over a given interval of time, for
example:
• number of patients arriving at an emergency department in a day
• number of new cases of HIV diagnosed at a clinic in a month
10/4/2023 By Degemu S (MPH) 126
• In such cases, we take a sample of days and observe the number of patients
arriving at the emergency department on each day,
• or a sample of months and observe the number of new cases of HIV
diagnosed at the clinic.
• We are observing a count or number of events, rather than a yes/no or
success/ failure outcome for each subject or trial, as in the binomial.
10/4/2023 By Degemu S (MPH) 127
• In theory, a random variable X is a count that can assume any
integer value greater than or equal to 0
10/4/2023 By Degemu S (MPH) 128
• Suppose events happen randomly and independently in time at a
constant rate. If events happen with rate λ events per unit time, the
probability of x events happening in unit time is:
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$
10/4/2023 By Degemu S (MPH) 129
• where x = 0, 1, 2, ...
• x is a potential outcome of X
• The constant λ (lambda) represents the rate at which the event occurs, or
the expected number of events per unit time
• e ≈ 2.71828
• The distribution depends on just one parameter, the mean number of occurrences
(λ).
10/4/2023 By Degemu S (MPH) 130
• Three assumptions must be met for a Poisson distribution to apply:
1. The probability that a single event occurs within a given small
subinterval is proportional to the length of the subinterval
P(event) ≈ λΔt for constant λ
2. The rate at which the event occurs is constant over the entire
interval t
3. Events occurring in consecutive subintervals are independent of
each other
10/4/2023 By Degemu S (MPH) 131
Example
• The daily number of new registrations of cancer is 2.2 on
average.
What is the probability of
a) Getting no new cases
b) Getting 1 case
c) Getting 2 cases
d) Getting 3 cases
e) Getting 4 cases
10/4/2023 By Degemu S (MPH) 132
Solutions
a) $P(X = 0) = \dfrac{e^{-2.2}(2.2)^{0}}{0!} = 0.111$
b) P(X=1) = 0.244
c) P(X=2) = 0.268
d) P(X=3) = 0.197
e) P(X=4) = 0.108
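The cancer registration answers can be reproduced with a short Python sketch of the Poisson formula. The function name `poisson_pmf` is illustrative; only the standard `math` module is used.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean (rate) lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.2  # average number of new cancer registrations per day

# Probabilities of 0, 1, 2, 3 and 4 new cases in a day.
probs = [poisson_pmf(x, lam) for x in range(5)]

for x, p in enumerate(probs):
    print(x, round(p, 3))
```

Rounded to three decimals, the five probabilities match solutions a) to e).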
10/4/2023 By Degemu S (MPH) 133
[Figure: Poisson distribution with mean 2.2; probability, from 0.0 to 0.3, plotted against x = 0 to 7]
10/4/2023 By Degemu S (MPH) 134
Example:
• In a given geographical area, cases of tetanus are reported at a rate of
λ = 4.5/month
• What is the probability that 0 cases of tetanus will be reported in a
given month?
10/4/2023 By Degemu S (MPH) 135
• What is the probability that 1 case of tetanus will be
reported?
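A worked sketch of both tetanus questions, applying the Poisson formula with λ = 4.5 per month (values rounded to three decimals):

```latex
P(X = 0) = \frac{e^{-4.5}(4.5)^{0}}{0!} = e^{-4.5} \approx 0.011
\qquad
P(X = 1) = \frac{e^{-4.5}(4.5)^{1}}{1!} = 4.5\,e^{-4.5} \approx 0.050
```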
10/4/2023 By Degemu S (MPH) 136
Characteristics
• The Poisson distribution is very asymmetric when its mean is small
• With large means it becomes nearly symmetric
• It has no theoretical maximum value, but the probabilities tail off towards
zero very quickly
• λ is the parameter of the Poisson distribution
• The mean is λ and the variance is also λ.
10/4/2023 By Degemu S (MPH) 137
B. Continuous Probability Distributions
• A continuous random variable X can take on any value in a specified
interval or range
• With a large number of class intervals, the frequency polygon begins to
resemble a smooth curve.
• The probability distribution of X is represented by a smooth curve called
a probability density function
10/4/2023 By Degemu S (MPH) 138
• The area under the smooth curve is equal to 1
• The area under the curve between any two points x1 and x2 is the
probability that X takes a value between x1 and x2
[Figure: Distribution of serum triglyceride]
10/4/2023 By Degemu S (MPH) 139
• Instead of assigning probabilities to specific outcomes of the random
variable X, probabilities are assigned to ranges of values
• The probability associated with any one particular value is equal to 0
• Therefore, P(X=x) = 0
• Also, P(X ≥ x) = P(X > x)
10/4/2023 By Degemu S (MPH) 140
• We calculate:
Pr [ a < X < b], the probability of an
interval of values of X.
• For the above reason,
• is also without meaning.
10/4/2023 By Degemu S (MPH) 141
The Normal distribution
• The ND is the most important probability distribution in statistics
• Frequently called the “Gaussian distribution” or bell-shaped curve.
• Variables such as blood pressure, weight, height, serum
cholesterol level, and IQ score are approximately normally
distributed
10/4/2023 By Degemu S (MPH) 142
A random variable is said to have a normal distribution if it has a
probability distribution that is symmetric and bell-shaped
10/4/2023 By Degemu S (MPH) 143
• The ND is vital to statistical work: it underlies most estimation procedures and
hypothesis tests
• The concept of “probability of X=x” in the discrete probability
distribution is replaced by the “probability density function” f(x).
• The ND is also an approximating distribution to other distributions
(e.g., binomial)
10/4/2023 By Degemu S (MPH) 144
• A random variable X is said to follow ND, if and only
if, its probability density function is:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$
10/4/2023 By Degemu S (MPH) 145
π (pi) ≈ 3.14159
e ≈ 2.71828, x = Value of X
Range of possible values of X: -∞ to +∞
µ = Expected value of X (“the long run average”)
σ² = Variance of X.
µ and σ are the parameters of the normal distribution; they
completely define its shape
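The density function can be evaluated directly. The sketch below is an illustration only: the function name `normal_pdf` and the crude midpoint-rule sum are assumptions of this example, used to show that the curve peaks at the mean and that the total area under it is (numerically) 1.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))


# Height of the standard normal curve (mu = 0, sigma = 1) at its mean.
peak = normal_pdf(0.0, mu=0.0, sigma=1.0)

# Approximate the total area under the curve between -8 and +8
# with a midpoint-rule sum; the tails beyond that are negligible.
step = 0.01
area = sum(normal_pdf(-8 + (i + 0.5) * step, 0.0, 1.0) * step for i in range(1600))

print(round(peak, 4), round(area, 6))
```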
10/4/2023 By Degemu S (MPH) 146
10/4/2023 By Degemu S (MPH) 147
1. The mean µ tells you about location:
• Increase µ: location shifts right
• Decrease µ: location shifts left
• Shape is unchanged
2. The variance σ² tells you about the narrowness or flatness of
the bell:
• Increase σ²: bell flattens. Extreme values are more likely
• Decrease σ²: bell narrows. Extreme values are less likely
• Location is unchanged
10/4/2023 By Degemu S (MPH) 148
10/4/2023 By Degemu S (MPH) 149
Properties of the Normal Distribution (ND)
1. It is symmetrical about its mean, µ.
2. The mean, the median and the mode are equal. It is unimodal.
3. The total area under the curve above the x-axis is 1 square unit.
4. The curve never touches the x-axis.
5. As the value of σ increases, the curve becomes more and more flat and
vice versa.
10/4/2023 By Degemu S (MPH) 150
6. The intervals
µ ± 1 SD contain about 68%;
µ ± 2 SD contain about 95%;
µ ± 3 SD contain about 99.7%
of the area under the curve.
7. The distribution is completely determined by the parameters µ and σ.
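The 68%, 95% and 99.7% figures in property 6 can be checked numerically. This sketch relies on a standard identity that is not stated in the slides: for any normal distribution, the area within k standard deviations of the mean equals erf(k/√2). The function name `area_within` is illustrative.

```python
from math import erf, sqrt


def area_within(k):
    """Area under any normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))


# Areas within 1, 2 and 3 standard deviations of the mean.
areas = {k: area_within(k) for k in (1, 2, 3)}

for k, a in areas.items():
    print(k, round(a, 4))
```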
10/4/2023 By Degemu S (MPH) 151
10/4/2023 By Degemu S (MPH) 152
• We have different normal distributions depending on the values of
μ and σ².
• We cannot tabulate every possible distribution
• Tabulated normal probability calculations are available only for
the ND with µ = 0 and σ² = 1.
10/4/2023 By Degemu S (MPH) 153
Standard Normal Distribution
• It is a normal distribution that has a mean equal to 0 and a SD equal to
1, and is denoted by N(0, 1).
• The main idea is to standardize all the data that is given by using Z-scores.
• These Z-scores can then be used to find the area (and thus the
probability) under the normal curve.
10/4/2023 By Degemu S (MPH) 154
The standard normal distribution has
mean 0 and variance 1
• Approximately 68% of the area under the standard normal curve lies
between ±1, about 95% between ±2, and about 99% between ±2.5
10/4/2023 By Degemu S (MPH) 155
Z - Transformation
• If a random variable X ~ N(µ, σ), then we can transform it to a SND
with the help of the Z-transformation
Z = (x − µ)/σ
• Z represents the Z-score for a given x value
10/4/2023 By Degemu S (MPH) 156
• Consider redefining the scale in terms of how many SDs a value lies
away from the mean, for a normal distribution with μ = 110 and σ = 15.
Value x        50  65  80  95  110  125  140  155  170
SDs from mean  -4  -3  -2  -1    0    1    2    3    4
SDs from mean are computed as (x − 110)/15 = (x − μ)/σ
10/4/2023 By Degemu S (MPH) 157
• This process is known as standardization and gives the position on a
normal curve with μ = 0 and σ =1, i.e., the SND, Z.
• A Z-score is the number of standard deviations that a given x value is
above or below the mean.
10/4/2023 By Degemu S (MPH) 158
Finding normal curve areas
1. The table gives areas between -∞ and the value of z₀.
2. Find the z value in tenths in the column at left margin and locate its
row. Find the hundredth place in the appropriate column.
3. Read the value of the area (P) from the body of the table where the
row and column intersect. Values of P are in the form of a decimal
point and four places.
10/4/2023 By Degemu S (MPH) 159
Some Useful Tips
10/4/2023 By Degemu S (MPH) 160
a) What is the probability that z < -1.96?
(1) Sketch a normal curve
(2) Draw a perpendicular line for z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line P(z < -1.96) = 0.0250
10/4/2023 By Degemu S (MPH) 161
10/4/2023 By Degemu S (MPH) 162
b) What is the probability that -1.96 < z < 1.96?
The area between the values P(-1.96 < z < 1.96)
= .9750 - .0250 = .9500
10/4/2023 By Degemu S (MPH) 163
c) What is the probability that z > 1.96?
• The answer is the area to the right of the line; found by subtracting table
value from 1.0000; P(z > 1.96) =1.0000 - .9750 = .0250
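The table look-ups in a) to c) can also be done in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+), whose `cdf` method returns the area to the left of a value, just like the table described above. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, SD 1

left_tail = Z.cdf(-1.96)                # a) P(z < -1.96)
between = Z.cdf(1.96) - Z.cdf(-1.96)    # b) P(-1.96 < z < 1.96)
right_tail = 1 - Z.cdf(1.96)            # c) P(z > 1.96)

print(round(left_tail, 4), round(between, 4), round(right_tail, 4))
```

The same three lines, with different cut-off values, can be used to check the exercise that follows.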
10/4/2023 By Degemu S (MPH) 164
10/4/2023 By Degemu S (MPH) 165
Exercise
1. Compute P(-1 ≤ Z ≤ 1.5). Ans: 0.7745
2. Find the area under the SND from 0 to 1.45. Ans: 0.4265
3. Compute P(-1.66 < Z < 2.85). Ans: 0.9493
10/4/2023 By Degemu S (MPH) 166
Applications of the Normal Distribution
• The ND is used as a model to study many different variables.
• The ND can be used to answer probability questions about continuous
random variables.
• Following the model of the ND, a given value of x must be converted to a
z score before it can be looked up in the z table.
10/4/2023 By Degemu S (MPH) 167
Example:
• The diastolic blood pressures (DBP) of males 35–44 years of age are normally
distributed with µ = 80 mm Hg and σ² = 144 mm Hg², so
σ = 12 mm Hg
• Therefore, a DBP of 80 + 12 = 92 mm Hg lies 1 SD above the mean
• Suppose individuals with a DBP above 95 mm Hg are considered to be
hypertensive
10/4/2023 By Degemu S (MPH) 168
a. What is the probability that a randomly selected male has a DBP above 95
mm Hg?
Z = (95 − 80)/12 = 1.25
P (Z > 1.25) = 0.1056
• Approximately 10.6% of this population would be classified as
hypertensive.
10/4/2023 By Degemu S (MPH) 169
b. What is the probability that a randomly selected male has a DBP above
110 mm Hg?
Z = (110 − 80)/12 = 2.50
P (Z > 2.50) = 0.0062
• Approximately 0.6% of the population has a DBP above 110 mm Hg
10/4/2023 By Degemu S (MPH) 170
c. What is the probability that a randomly selected male has a DBP below 60
mm Hg?
Z = (60 − 80)/12 = −1.67
P (Z < -1.67) = 0.0475
• Approximately 4.8% of the population has a DBP below 60 mm Hg
10/4/2023 By Degemu S (MPH) 171
d. What value of DBP cuts off the upper 5% of this population?
• Looking at the table, the value Z = 1.645 cuts off an area of 0.05 in the
upper tail
• We want the value of X that corresponds to Z = 1.645
Z = (X − μ)/σ
1.645 = (X − 80)/12, so X = 80 + 1.645(12) = 99.7
• Approximately 5% of the men in this population have a DBP greater than
99.7 mm Hg
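The four DBP questions can be sketched in Python with `statistics.NormalDist` (Python 3.8+), which standardizes internally, so no explicit Z-score is needed. The variable names are illustrative.

```python
from statistics import NormalDist

# DBP of males aged 35-44: mean 80 mm Hg, SD 12 mm Hg (from the example).
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1 - dbp.cdf(95)     # a. hypertensive (DBP above 95)
p_above_110 = 1 - dbp.cdf(110)   # b. DBP above 110
p_below_60 = dbp.cdf(60)         # c. DBP below 60
cutoff_top_5 = dbp.inv_cdf(0.95) # d. value cutting off the upper 5%

print(round(p_above_95, 4), round(p_above_110, 4),
      round(p_below_60, 4), round(cutoff_top_5, 1))
```

The result for part c comes out at about 0.0478 rather than 0.0475, because the table look-up rounds the Z-score to −1.67 while the code uses the unrounded value −1.6667.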
10/4/2023 By Degemu S (MPH) 172
Chapter - Sampling and Sampling Distribution
10/4/2023 By Degemu S (MPH) 173
Sampling
10/4/2023 By Degemu S (MPH) 174
• Researchers often use sample survey methodology to obtain
information about a larger population by selecting and measuring a
sample from that population.
• Since the population is often too large to study in full, we rely on the information collected
from the sample.
10/4/2023 By Degemu S (MPH) 175
• Inferences about the population are based on the information from the
sample drawn from that population.
• However, due to the variability in the characteristics of the population,
scientific sample designs should be applied to select a representative
sample.
• If not, there is a high risk of distorting the view of the population.
10/4/2023 By Degemu S (MPH) 176
• A sample is a collection of individuals selected from a larger population.
• For example, we may have a single sample composed of 50 cases,
representing a population of 1000 individuals.
10/4/2023 By Degemu S (MPH) 177
• Sampling enables us to estimate the characteristic of a population by
directly observing a portion of the population.
• Researchers are not interested in the sample itself, but in what can be
learned from the sample and how this information can be applied
to the entire population.
10/4/2023 By Degemu S (MPH) 178
[Figure: diagram labelled Sample, Information, Population]
10/4/2023 By Degemu S (MPH) 179
• Therefore, it is essential that a sample should be correctly defined
and organized.
• If the wrong questions are posed to the wrong people, reliable
information will not be obtained, and wrong conclusions will follow
when the results are applied to the entire population.
10/4/2023 By Degemu S (MPH) 180
Steps needed to select a sample and ensure that this sample will
fulfill its goals.
1. Establish the study's objectives
• The first step in planning a useful and efficient survey is to specify the
objectives with as much detail as possible.
• Without objectives, the survey is unlikely to generate valuable results.
• Clarifying the aims of the survey is critical to its ultimate success.
• The initial users and uses of the data should be identified at this stage.
10/4/2023 By Degemu S (MPH) 181
2. Define the target population
• The target population is the total population for which the information is
required.
• Specifically, the target population is defined by the following characteristics:
• Nature of data required
• Geographic location
• Reference period
• Other characteristics, such as socio-demographic characteristics
10/4/2023 By Degemu S (MPH) 182
3. Decide on the data to be collected
• The data requirements of the survey must be established.
• To ensure that the requirements are operationally sound, the necessary data
terms and definitions also need to be determined.
10/4/2023 By Degemu S (MPH) 183
4. Set the level of precision
• There is a level of uncertainty associated with estimates
coming from a sample.
• The sample-to-sample variation is what causes the
sampling error.
• Researchers can estimate the sampling error associated
with a particular sampling plan, and try to minimize it.
10/4/2023 By Degemu S (MPH) 184
5. Decide on the methods of measurement
• Choose measuring instrument and method of approach to the population
• Data about a person’s state of health may be obtained from statements that
he/she makes or from a medical examination
• The survey may employ a self-administered questionnaire, an interviewing
10/4/2023 By Degemu S (MPH) 185
6. Prepare the sampling frame
• The frame is a list of all members of the population
• The elements must not overlap
10/4/2023 By Degemu S (MPH) 186
The sample design
• Sample design: how the sample will be collected.
• Estimation techniques: how the results from the sample will be extended to the
whole population.
• Measures of precision: how the sampling error will be measured.
10/4/2023 By Degemu S (MPH) 187
Other Considerations
• Sample size determination
• Questionnaire development
• Pretest
• Organization of the field work
• Data collection
• Summary and analysis of the data
• Edit the completed questionnaires
• Decide on computation procedures
10/4/2023 By Degemu S (MPH) 188
Sampling theory in public health
• A health survey (sampling) is a planned study to investigate the health
characteristics of a population
10/4/2023 By Degemu S (MPH) 189
A health survey is used to:
• Measure the total amount of illness in the population;
• Measure the amount of illness caused by a specified disease;
• Examine the utilization of existing health care facilities and demand
for new ones;
• Measure the distribution of a particular characteristic, e.g., breast-feeding
practice in the population;
• Examine the role and relationship of one or more factors in the
etiology of a disease.
10/4/2023 By Degemu S (MPH) 190
Sampling
• The process of selecting a portion of the population to represent the
entire population.
• A main concern in sampling:
• Ensure that the sample represents the population, and
• The findings can be generalized.
10/4/2023 By Degemu S (MPH) 191
Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of collecting
information.
• Reduced cost: Sampling reduces demands on resources such as finance,
personnel, and material.
• Greater accuracy: Sampling may improve the accuracy of data
collection
• Sampling error: Precise allowance can be made for sampling error
• Greater speed: Data can be collected and summarized more quickly
10/4/2023 By Degemu S (MPH) 192
Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of discrimination within the
population.
• Sampling may be inadvisable where every unit in the population is
legally required to have a record.
10/4/2023 By Degemu S (MPH) 193
Errors in sampling
1) Sampling error: Error introduced because a sample, rather than the
whole population, is selected and observed.
• It cannot be avoided or totally eliminated.
2) Non-sampling error:
- Observational error
- Respondent error
- Lack of preciseness of definition
- Errors in editing and tabulation of data
10/4/2023 By Degemu S (MPH) 194
Random number table
• It is a table of random numbers constructed by a process such that:
1. In any position in the table, each of the numbers 0 through 9 has a probability
1/10 of occurring.
2. The occurrence of any number in one part of the table is independent of the
occurrence of any number in any other part of the table.
10/4/2023 By Degemu S (MPH) 195
Sampling Methods
Two broad divisions:
A. Probability sampling methods
B. Non-probability sampling methods
10/4/2023 By Degemu S (MPH) 196
A. Probability sampling
• Involves random selection of a sample
• A sample is obtained in a way that ensures every member of the
population has a known, nonzero probability of being included in the
sample.
• Involves the selection of a sample from a population, based on chance.
10/4/2023 By Degemu S (MPH) 197
• Probability sampling is:
• more complex,
• more time-consuming and
• usually more costly than non-probability
sampling.
• However, because study samples are randomly selected and their
probability of inclusion can be calculated,
• reliable estimates can be produced and
• inferences can be made about the population.
10/4/2023 By Degemu S (MPH) 198
• There are several different ways in which a probability sample
can be selected.
• The method chosen depends on a number of factors, such as
• the available sampling frame,
• how spread out the population is,
• how costly it is to survey members of the population
10/4/2023 By Degemu S (MPH) 199
• When choosing a probability sample design,
• Our goal should be to minimize the sampling error of the
estimates for the most important survey variables,
• While simultaneously minimizing the time and cost of
conducting the survey.
10/4/2023 By Degemu S (MPH) 200
Most common probability
sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Sampling with probability proportional to size
4. Stratified random sampling
5. Cluster sampling
6. Multi-stage sampling
10/4/2023 By Degemu S (MPH) 201
1. Simple random sampling
• Involves random selection
• Each member of a population has an equal chance of
being included in the sample.
10/4/2023 By Degemu S (MPH) 202
• To use a SRS method:
• Make a numbered list of all the units in the population
• Each unit should be numbered from 1 to N (where N is the
size of the population)
• Select the required number of units at random.
10/4/2023 By Degemu S (MPH) 203
• The randomness of the sample is ensured by:
• use of “lottery” methods
• a table of random numbers
10/4/2023 By Degemu S (MPH) 204
Example
• Suppose your school has 500 students and you need to conduct a
short survey on the quality of the food served in the cafeteria.
• You decide that a sample of 10 students should be sufficient for your
purposes.
• In order to get your sample, you assign a number from 1 to 500 to
each student in your school.
10/4/2023 By Degemu S (MPH) 205
• To select the sample, you use a table of randomly generated numbers.
• Pick a starting point in the table (a row and column number) and look
at the random numbers that appear there. In this case, since the data
run into three digits, the random numbers would need to contain three
digits as well.
10/4/2023 By Degemu S (MPH) 206
• Ignore all random numbers greater than 500 (and 000) because they do not correspond to
any of the students in the school.
• Remember that the sample is without replacement, so if a number recurs,
skip over it and use the next random number.
• The first 10 different numbers between 001 and 500 make up your
sample.
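The cafeteria example can be imitated in Python. In this sketch a seeded random number generator stands in for the printed random number table; the function name `draw_from_table` and the seed value are assumptions made only for illustration.

```python
import random


def draw_from_table(rng, population_size, sample_size):
    """Imitate reading three-digit numbers from a random number table."""
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, 999)  # next three-digit entry, 000 to 999
        # Skip numbers that match no student, and numbers already drawn
        # (sampling without replacement).
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen


rng = random.Random(2023)  # fixed seed only so the sketch is reproducible
sample = draw_from_table(rng, population_size=500, sample_size=10)
print(sample)
```

In practice the same result is obtained more directly with `random.sample(range(1, 501), k=10)`, which also samples without replacement.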
10/4/2023 By Degemu S (MPH) 207
• SRS has certain limitations:
• Requires a sampling frame.
• Difficult if the reference population is dispersed.
• Minority subgroups of interest may not be selected.
10/4/2023 By Degemu S (MPH) 208
2. Systematic random sampling
• Sometimes called interval sampling, systematic
sampling means that there is a gap, or interval,
between each selected unit in the sample
• The selection is systematic rather than random
10/4/2023 By Degemu S (MPH) 209
• Important if the reference population is arranged
in some order:
• Order of registration of patients
• Numerical order of house numbers
• Student’s registration books
• Taking individuals at fixed intervals (every kth)
based on the sampling fraction, e.g., if the sample
includes 20% of the population, then every fifth individual is taken.
10/4/2023 By Degemu S (MPH) 210
Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where
N is the total population size).
2. Determine the sampling interval (K) by dividing the
number of units in the population by the desired
sample size.
10/4/2023 By Degemu S (MPH) 211
3. Select a number between one and K at random.
This number is called the random start and would
be the first number included in your sample.
4. Select every Kth unit after that first number
Note: Systematic sampling should not be used when
a cyclic repetition is inherent in the sampling
frame.
10/4/2023 By Degemu S (MPH) 212
Example
• To select a sample of 100 from a population of 400,
you would need a sampling interval of 400 ÷ 100 = 4.
• Therefore, K = 4.
• You will need to select one unit out of every four
units to end up with a total of 100 units in your
sample.
• Select a number between 1 and 4 from a table of
random numbers.
10/4/2023 By Degemu S (MPH) 213
• If you choose 3, the third unit on your frame would be
the first unit included in your sample;
• The sample might consist of the following units to
make up a sample of 100: 3 (the random start), 7, 11,
15, 19...395, 399 (up to N, which is 400 in this case).
10/4/2023 By Degemu S (MPH) 214
• Using the above example, you can see that with a
systematic sample approach there are only four
possible samples that can be selected,
corresponding to the four possible random starts:
A. 1, 5, 9, 13...393, 397
B. 2, 6, 10, 14...394, 398
C. 3, 7, 11, 15...395, 399
D. 4, 8, 12, 16...396, 400
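The steps for systematic sampling can be sketched as follows. The function name `systematic_sample` is illustrative, and the sketch assumes that the population size is an exact multiple of the sample size, as in the example (400 ÷ 100 = 4).

```python
import random


def systematic_sample(population_size, sample_size, rng):
    """Select every Kth unit after a random start between 1 and K."""
    k = population_size // sample_size   # sampling interval K
    start = rng.randint(1, k)            # random start
    return [start + i * k for i in range(sample_size)]


sample = systematic_sample(400, 100, random.Random(1))

# The only four samples that can ever be selected when K = 4.
all_possible = [[start + i * 4 for i in range(100)] for start in range(1, 5)]

print(sample[:4], sample[-1])
print(sample in all_possible)
```

Whatever the random start, the selected sample is always one of the four lists A to D.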
10/4/2023 By Degemu S (MPH) 215
• Each member of the population belongs to only one
of the four samples and each sample has the same
chance of being selected.
• The main difference is that with SRS, any combination of
100 units would have a chance of making up the
sample, while with systematic sampling, there are
only four possible samples.
10/4/2023 By Degemu S (MPH) 216
3. Sampling with probability
proportional to size
• Probability sampling requires that each member of the
survey population has a chance of being included in
the sample, but it does not require that this chance be
the same for everyone.
10/4/2023 By Degemu S (MPH) 217
• If information is available on the frame about the size of
each unit and if those units vary in size, this information
can be used in the sampling selection in order to
increase the efficiency.
• This is known as sampling with probability
proportional to size (PPS).
10/4/2023 By Degemu S (MPH) 218
• With this method, the bigger the size of the unit, the
higher the chance it has of being included in the
sample.
• For this method to achieve increased efficiency, the
measure of size needs to be accurate.
10/4/2023 By Degemu S (MPH) 219
Steps in PPS
• List all Kebeles/clusters with their population size
• Calculate the cumulative frequency
• Calculate the sampling interval, say K, by dividing the total
population size by the number of clusters to be selected, say m
• Randomly choose a number between 1 and K, say j
• Kebeles/clusters whose cumulative frequency ranges contain the
jth, (j+K)th, …, (j+(m−1)K)th individuals will be included in the sample
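A minimal sketch of PPS selection using cumulative population sizes and a fixed sampling interval is given below. The kebele names and population figures are hypothetical, invented only for illustration, and the function name `pps_systematic` is not from the slides. The sketch takes the selection points to be j, j + K, j + 2K and so on, the usual form of systematic PPS selection.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes, n_clusters, rng):
    """Select clusters with probability proportional to size."""
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] // n_clusters   # sampling interval K
    start = rng.randint(1, interval)          # random start j, 1 <= j <= K
    targets = [start + i * interval for i in range(n_clusters)]
    # Each target falls in the first cluster whose cumulative size reaches it.
    return [names[bisect_left(cumulative, t)] for t in targets]


# Hypothetical kebeles and population sizes (total 6000).
kebeles = {"Kebele A": 1200, "Kebele B": 300, "Kebele C": 2500,
           "Kebele D": 800, "Kebele E": 1200}

selected = pps_systematic(kebeles, n_clusters=3, rng=random.Random(7))
print(selected)
```

A cluster larger than the sampling interval can be hit by more than one selection point, which is how larger units obtain their higher chance of inclusion.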
10/4/2023 By Degemu S (MPH) 220
4. Stratified random sampling
• It is done when the population is known to be
heterogeneous with regard to some factors, and those
factors are used for stratification
• Using stratified sampling, the population is divided into
homogeneous, mutually exclusive groups called strata
• A population can be stratified by any variable that is
available for all units prior to sampling (e.g., age, sex,
province of residence, income, etc.).
10/4/2023 By Degemu S (MPH) 221
• A separate sample is taken independently from each
stratum.
• Any of the sampling methods mentioned in this
section (and others that exist) can be used to sample
within each stratum.
10/4/2023 By Degemu S (MPH) 222
Why do we need to create strata?
• Stratification can make the sampling strategy more efficient.
• A larger sample is required to get a more accurate estimation if a
characteristic varies greatly from one unit to the other.
• For example, if every person in a population had the same salary, then
a sample of one individual would be enough to get a precise estimate
of the average salary.
10/4/2023 By Degemu S (MPH) 223
• This is the idea behind the efficiency gain obtained with
stratification.
• If you create strata within which units share similar characteristics (e.g.,
income) and are considerably different from units in other strata (e.g.,
occupation, type of dwelling) then you would only need a small sample from
each stratum to get a precise estimate of total income for that stratum.
10/4/2023 By Degemu S (MPH) 224
• Then you could combine these estimates to get a precise estimate
of total income for the whole population.
• If you use a SRS approach in the whole population without
stratification, the sample would need to be larger than the
total of all stratum samples to get an estimate of total
income with the same level of precision.
10/4/2023 By Degemu S (MPH) 225
• Stratified sampling ensures an adequate sample size for sub-
groups in the population of interest.
• When a population is stratified, each stratum becomes an
independent population and you will need to decide the sample
size for each stratum.
10/4/2023 By Degemu S (MPH) 226
• Equal allocation:
• Allocate equal sample size to each stratum
• Proportionate allocation:
$$n_j = \frac{n}{N} N_j, \quad j = 1, 2, \ldots, k$$
where k is the number of strata and
• $n_j$ is the sample size of the jth stratum
• $N_j$ is the population size of the jth stratum
• $n = n_1 + n_2 + \cdots + n_k$ is the total sample size
• $N = N_1 + N_2 + \cdots + N_k$ is the total population size
10/4/2023 By Degemu S (MPH) 227
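Proportionate allocation can be computed directly from the stratum sizes. The sketch below assumes two hypothetical strata ("urban" and "rural") and a total sample size of 200; the function name is illustrative only.

```python
def proportionate_allocation(stratum_sizes, n):
    """Return n_j = (n / N) * N_j for each stratum, rounded to whole units.

    stratum_sizes: dict mapping stratum name -> population size N_j.
    n: total sample size.
    Rounded values may not always add up exactly to n.
    """
    N = sum(stratum_sizes.values())
    return {name: round(n * N_j / N) for name, N_j in stratum_sizes.items()}

# Hypothetical strata: N = 10,000 in total
strata = {"urban": 6000, "rural": 4000}
allocation = proportionate_allocation(strata, n=200)
print(allocation)
```

Each stratum receives the same sampling fraction n/N, so larger strata contribute proportionally more units to the sample.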
5. Cluster sampling
• Sometimes it is too expensive to spread a sample across the population
as a whole.
• Travel costs can become expensive if interviewers have to survey people
from one end of the country to the other.
• To reduce costs, researchers may choose a cluster sampling technique
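• Ideally, each cluster is internally heterogeneous (a small-scale copy of the
population) and the clusters resemble one another, unlike stratified sampling,
where each stratum is internally homogeneous and the strata differ from one another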
10/4/2023 By Degemu S (MPH) 228
Steps in cluster sampling
• Cluster sampling divides the population into groups or
clusters.
• A number of clusters are selected randomly to
represent the total population, and then all units within
selected clusters are included in the sample.
• No units from non-selected clusters are included in the
sample—they are represented by those from selected
clusters.
• This differs from stratified sampling, where some
units are selected from each group.
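A minimal sketch of one-stage cluster sampling, assuming hypothetical school sections as clusters. The helper name `cluster_sample` and the data are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly pick n_clusters clusters and keep every unit inside them.

    clusters: dict mapping cluster id -> list of units (hypothetical data).
    """
    rng = rng or random
    chosen = rng.sample(sorted(clusters), n_clusters)
    # All units of the chosen clusters enter the sample; other clusters add none
    return chosen, [unit for c in chosen for unit in clusters[c]]

# Hypothetical sections with student IDs
sections = {"sec_A": [1, 2, 3], "sec_B": [4, 5], "sec_C": [6, 7, 8, 9]}
chosen_sections, students = cluster_sample(sections, 2, rng=random.Random(0))
print(chosen_sections, students)
```

Note that the number of sampled students depends on which sections happen to be drawn, so the final sample size is not fixed in advance.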
10/4/2023 By Degemu S (MPH) 229
Example
• In a school-based study, we assume students of the same
school are homogeneous.
• We can randomly select sections and include all students
of the selected sections only
10/4/2023 By Degemu S (MPH) 230
• As mentioned, cost reduction is a reason for using
cluster sampling.
• It creates 'pockets' of sampled units instead of
spreading the sample over the whole territory.
• Another reason is that sometimes a list of all units in
the population is not available, while a list of all
clusters is either available or easy to create.
10/4/2023 By Degemu S (MPH) 231
• In most cases, the main drawback is a loss of efficiency
when compared with SRS.
• It is usually better to survey a large number of small
clusters instead of a small number of large clusters.
• This is because neighboring units tend to be more alike,
resulting in a sample that does not represent the whole
spectrum of opinions or situations present in the overall
population.
10/4/2023 By Degemu S (MPH) 232
• Another drawback to cluster sampling is that you do not have
total control over the final sample size.
• For example, not all schools have the same number of (say Grade 11)
students, and city blocks do not all have the same number of
households. Because you must interview every student or household
in the selected clusters, the final sample size may be larger or
smaller than you expected.
10/4/2023 By Degemu S (MPH) 233
6. Multi-stage sampling
• Similar to the cluster sampling, except that it involves
picking a sample from within each chosen cluster,
rather than including all units in the cluster.
• This type of sampling requires at least two stages.
10/4/2023 By Degemu S (MPH) 234
• In the first stage, large groups or clusters are identified and selected.
These clusters contain more population units than are needed for the
final sample.
• In the second stage, population units are picked from within the
selected clusters (using any of the possible probability sampling
methods) for a final sample.
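The two stages described above can be sketched as follows. This is an illustration under simple assumptions (simple random sampling at both stages, hypothetical villages and household IDs); `two_stage_sample` is a name introduced here, not one from the slides.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster.

    clusters: dict mapping cluster id -> list of units (hypothetical data).
    """
    rng = rng or random
    chosen = rng.sample(sorted(clusters), n_clusters)          # first stage
    sample = {}
    for c in chosen:
        units = clusters[c]
        k = min(n_per_cluster, len(units))                     # cluster may be small
        sample[c] = rng.sample(units, k)                       # second stage
    return sample

# Hypothetical villages with household IDs
villages = {
    "village_1": list(range(100, 120)),
    "village_2": list(range(200, 230)),
    "village_3": list(range(300, 310)),
}
result = two_stage_sample(villages, n_clusters=2, n_per_cluster=5, rng=random.Random(3))
print(result)
```

Only the selected villages need a household list, which is the practical saving the slides describe.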
10/4/2023 By Degemu S (MPH) 235
• If more than two stages are used, the process of choosing population units
within clusters continues until there is a final sample.
• With multi-stage sampling, you still have the benefit of a more
concentrated sample for cost reduction.
• However, the sample is not as concentrated as in cluster sampling, and the
required sample size is still larger than for a simple random sample.
10/4/2023 By Degemu S (MPH) 236
• Also, you do not need to have a list of all of the units in the
population. All you need is a list of clusters and list of the units in the
selected clusters.
• Admittedly, more information is needed in this type of sample than
what is required in cluster sampling. However, multi-stage sampling
still saves a great amount of time and effort by not having to create a
list of all the units in a population.
10/4/2023 By Degemu S (MPH) 237
B. Non-probability sampling
• The difference between probability and non-probability sampling has
to do with a basic assumption about the nature of the population under
study.
• In probability sampling, every item has a known chance of being
selected.
• In non-probability sampling, there is an assumption that there is an even
distribution of a characteristic of interest within the population.
10/4/2023 By Degemu S (MPH) 238
• This is what makes the researcher believe that any sample would be
representative and because of that, results will be accurate.
• For probability sampling, randomization is a feature of the selection process,
rather than an assumption about the structure of the population.
10/4/2023 By Degemu S (MPH) 239
• In non-probability sampling, since elements are chosen arbitrarily,
there is no way to estimate the probability of any one element being
included in the sample.
• Also, no assurance is given that each item has a chance of being
included, making it impossible either to estimate sampling variability
or to identify possible bias
10/4/2023 By Degemu S (MPH) 240
• Reliability cannot be measured in non-probability sampling; the only way
to address data quality is to compare some of the survey results with
available information about the population.
• Still, there is no assurance that the estimates will meet an acceptable level
of error.
• Researchers are reluctant to use these methods because there is no way to
measure the precision of the resulting sample.
10/4/2023 By Degemu S (MPH) 241
• Despite these drawbacks, non-probability sampling methods can be useful
when descriptive comments about the sample itself are desired.
• Secondly, they are quick, inexpensive and convenient.
• There are also other circumstances, such as in some kinds of research, when it is
infeasible or impractical to conduct probability sampling.
10/4/2023 By Degemu S (MPH) 242
The most common types of non-probability
sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique
10/4/2023 By Degemu S (MPH) 243
1. Convenience or haphazard sampling
• Convenience sampling is sometimes referred to as haphazard or
accidental sampling.
• It is not normally representative of the target population because
sample units are only selected if they can be accessed easily and
conveniently.
10/4/2023 By Degemu S (MPH) 244
• The obvious advantage is that the method is easy to use, but that
advantage is greatly offset by the presence of bias.
• Although useful applications of the technique are limited, it can deliver
accurate results when the population is homogeneous.
10/4/2023 By Degemu S (MPH) 245
• For example, a scientist could use this method to determine whether a
lake is polluted or not.
• Assuming that the lake water is well-mixed, any sample would yield
similar information.
• A scientist could safely draw water anywhere on the lake without
bothering about whether or not the sample is representative
10/4/2023 By Degemu S (MPH) 246
2. Volunteer sampling
• As the term implies, this type of sampling occurs when people volunteer
to be involved in the study.
• In psychological experiments or pharmaceutical trials (drug testing), for
example, it would be difficult and unethical to enlist random participants
from the general public.
• In these instances, the sample is taken from a group of volunteers.
10/4/2023 By Degemu S (MPH) 247
• Sometimes, the researcher offers payment to attract respondents.
• In exchange, the volunteers accept the possibility of a lengthy,
demanding or sometimes unpleasant process.
10/4/2023 By Degemu S (MPH) 248
• Sampling voluntary participants as opposed to the general population
may introduce strong biases.
• Often in opinion polling, only the people who care strongly enough
about the subject tend to respond.
• The silent majority does not typically respond, resulting in large
selection bias.
10/4/2023 By Degemu S (MPH) 249
3. Judgment sampling
• This approach is used when a sample is taken based on certain judgments
about the overall population.
• The underlying assumption is that the investigator will select units that are
characteristic of the population.
• The critical issue here is objectivity: how much can judgment be relied
upon to arrive at a typical sample?
10/4/2023 By Degemu S (MPH) 250
• Judgment sampling is subject to the researcher's biases and is perhaps
even more biased than haphazard sampling.
• Since any preconceptions the researcher may have are reflected in the
sample, large biases can be introduced if these preconceptions are
inaccurate.
10/4/2023 By Degemu S (MPH) 251
• Researchers often use this method in exploratory studies like pre-testing
of questionnaires and focus groups.
• They also prefer to use this method in laboratory settings where the
choice of experimental subjects (i.e., animal, human) reflects the
investigator's pre-existing beliefs about the population.
10/4/2023 By Degemu S (MPH) 252
• One advantage of judgment sampling is the reduced cost and time
involved in acquiring the sample.
10/4/2023 By Degemu S (MPH) 253
4. Quota sampling
• This is one of the most common forms of non-probability sampling.
• Sampling is done until a specific number of units (quotas) for various
sub-populations have been selected.
10/4/2023 By Degemu S (MPH) 254
• Since there are no rules as to how these quotas are to be filled, quota
sampling is really a means for satisfying sample size objectives for
certain sub-populations.
10/4/2023 By Degemu S (MPH) 255
• As with all other non-probability sampling methods, in order to make
inferences about the population, it is necessary to assume that persons
selected are similar to those not selected.
• Such strong assumptions are rarely valid.
10/4/2023 By Degemu S (MPH) 256
• The main argument against quota sampling is that it does not meet the
basic requirement of randomness.
• Some units may have no chance of selection or the chance of selection
may be unknown.
• Therefore, the sample may be biased.
10/4/2023 By Degemu S (MPH) 257
• Quota sampling is generally less expensive than random sampling.
• It is also easy to administer, especially considering the tasks of listing the
whole population, randomly selecting the sample and following-up on
non-respondents can be omitted from the procedure.
10/4/2023 By Degemu S (MPH) 258
• Quota sampling is an effective sampling method when information is
urgently required, and it can be carried out without sampling frames.
• In many cases where the population has no suitable frame, quota
sampling may be the only appropriate sampling method.
10/4/2023 By Degemu S (MPH) 259
5. Snowball sampling
• A technique for selecting a research sample where existing study
subjects recruit future subjects from among their acquaintances.
• Thus the sample group appears to grow like a rolling snowball.
10/4/2023 By Degemu S (MPH) 260
• This sampling technique is often used in hidden populations which are
difficult for researchers to access; example populations would be drug
users or commercial sex workers.
• Because sample members are not selected from a sampling frame,
snowball samples are subject to numerous biases. For example, people
who have many friends are more likely to be recruited into the sample.
10/4/2023 By Degemu S (MPH) 261
Sampling Distributions
10/4/2023 By Degemu S (MPH) 262
•A sampling distribution is a distribution of all possible
values of a statistic computed from samples of the
same size randomly selected from the same population.
•Serves to answer probability questions about sample
statistics.
10/4/2023 By Degemu S (MPH) 263
• When sampling a discrete, finite population, a sampling distribution can
be constructed.
• However, this construction is difficult with a large population and
impossible with an infinite population.
10/4/2023 By Degemu S (MPH) 264
• We consider sample statistics as random variables.
Example:
• Age of individuals is a random variable.
• Similarly, mean age is a random variable.
10/4/2023 By Degemu S (MPH) 265
• Conclusions about the values of population parameters cannot be drawn from
one individual value.
• They should be based on sample statistics computed from an adequate
sample size.
10/4/2023 By Degemu S (MPH) 266
• Similarly, take a sample and calculate the statistic, e.g., mean.
• Take another sample (same size) and calculate mean.
• Repeat this many times.
• Do you expect all the sample means to be the same? No.
• They will vary, but less than the individual observations do.
• Put all these sample statistics together to get a distribution of sample
statistics.
10/4/2023 By Degemu S (MPH) 267
Construction of sampling distributions
1. From a population of size N, randomly
draw all possible samples of size n.
2. Compute the statistic of interest for
each sample.
3. Create a frequency distribution of the
statistic.
10/4/2023 By Degemu S (MPH) 268
Main types of sampling distributions
A. Distribution of the sample mean
B. Distribution of the difference between two means
C. Distribution of the sample proportion
D. Distribution of the difference between two proportions
10/4/2023 By Degemu S (MPH) 269
A. Sampling distribution of sample mean
• Suppose we have a population of size N=4, constituting the ages of
four outpatients.
x, Age (years): 18, 20, 22, 24
$$\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$$
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$$
10/4/2023 By Degemu S (MPH) 270
Now consider all possible samples of size
n=2
• 16 possible samples (with replacement)
1st Obs    2nd Observation
           18      20      22      24
18         18,18   18,20   18,22   18,24
20         20,18   20,20   20,22   20,24
22         22,18   22,20   22,22   22,24
24         24,18   24,20   24,22   24,24
• 16 sample means
1st Obs    2nd Observation
           18      20      22      24
18         18      19      20      21
20         19      20      21      22
22         20      21      22      23
24         21      22      23      24
10/4/2023 By Degemu S (MPH) 271
Sample mean    Freq    P($\bar{x}$)
18             1       0.0625
19             2       0.1250
20             3       0.1875
21             4       0.2500
22             3       0.1875
23             2       0.1250
24             1       0.0625
10/4/2023 By Degemu S (MPH) 272
Sampling distribution of all sample means
[Figure: bar chart of P($\bar{x}$) against the 16 sample means, $\bar{x}$ = 18 to 24, with the vertical axis running from 0 to 0.3]
10/4/2023 By Degemu S (MPH) 273
Summary measures of this sampling distribution: Add the 16
sample means & divide by 16. Also calculate the SD of the sample
means.
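$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + \cdots + 24}{16} = 21$$
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$$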
10/4/2023 By Degemu S (MPH) 274
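The construction can be checked by brute force. The sketch below enumerates all 16 ordered samples of size 2 (with replacement) from the four ages, then computes the mean and standard deviation of the resulting sample means using only the Python standard library.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]                      # ages of the four outpatients

# All ordered samples of size n = 2, drawn with replacement (4 * 4 = 16)
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

freq = Counter(sample_means)                       # frequency of each sample mean
mu = mean(population)                              # population mean
sigma = pstdev(population)                         # population SD (divides by N)
mu_xbar = mean(sample_means)                       # mean of the sample means
sigma_xbar = pstdev(sample_means)                  # SD of the sample means

for value in sorted(freq):
    print(value, freq[value], freq[value] / len(samples))
print(mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

The mean of the sample means matches the population mean, and the standard deviation of the sample means equals the population standard deviation divided by the square root of 2.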
Comparing the population with its sampling
distribution
[Figure: two bar charts of probability, one for the population values 18, 20, 22, 24 and one for the sample means 18 to 24]
Population (N = 4): $\mu = 21$, $\sigma = 2.236$
Sample means distribution (n = 2): $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 1.58$
10/4/2023 By Degemu S (MPH) 275
• We note that the mean of the sampling distribution of
$\bar{x}$ has the same value as the mean of the original
population.
• However, its variance is not equal to the original population
variance; it is equal to the population variance
divided by the sample size used to obtain the sampling
distribution.
10/4/2023 By Degemu S (MPH) 276
• The square root of the sampling distribution variance is called
the standard error of the mean or, simply, standard error.
• OR, the standard deviation of any sample statistic is called its
standard error.
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
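As a quick check against the four-outpatient example (population standard deviation 2.236, samples of size 2), the formula reproduces the standard deviation of the 16 sample means:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{2}} \approx 1.58
```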
10/4/2023 By Degemu S (MPH) 277
• SE is determined by both the sample size and the degree of variability
among the individual observations
• SD quantifies the amount of variability among individuals in a
population, while
• SE quantifies the variability among means of repeated samples drawn
from that population
• The SE is always smaller than the SD (except when n = 1)
10/4/2023 By Degemu S (MPH) 278
Sampling Error
• Sample statistics are used to estimate
population parameters
Example: $\bar{x}$ is an estimate of the population mean, μ
• Problems:
• Different samples provide different estimates of the population
parameter
• Sample results have potential variability, thus sampling error exists
10/4/2023 By Degemu S (MPH) 279
Calculating sampling error
• Sampling error:
The difference between a value (a statistic) computed from a sample
and the corresponding value (a parameter) computed from a
population
Example (for the mean):
$$\text{Sampling error} = \bar{x} - \mu$$
where:
$\bar{x}$ = sample mean
$\mu$ = population mean
10/4/2023 By Degemu S (MPH) 280
Example
If the population mean is μ = 98.6 degrees and a
sample of n = 5 temperatures yields a sample mean
of $\bar{x}$ = 99.2 degrees, then the sampling error is:
$\bar{x} - \mu$ = 99.2 − 98.6 = 0.6 degrees
10/4/2023 By Degemu S (MPH) 281
Note:
• The sampling error may be positive or negative ($\bar{x}$ may be
greater than or less than μ)
• The expected sampling error decreases as the sample size
increases
10/4/2023 By Degemu S (MPH) 282
Properties of sampling distribution of mean
A. Sampling from normally distributed
populations.
a. If a population is normal with mean μ and standard
deviation σ, the sampling distribution of $\bar{x}$ is
also normally distributed with
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
10/4/2023 By Degemu S (MPH) 283
b. The mean, $\mu_{\bar{x}}$, of the distribution of the sample mean is equal to
the mean of the population from which the samples were
drawn
c. The variance of the distribution of the sample mean is equal to
the variance of the population divided by the sample size
10/4/2023 By Degemu S (MPH) 284
B. Sampling from non-normally distributed
populations
• When the sampling is done from a non-normally distributed
population, the central limit theorem is used.
• The larger the sample size, the better will be the normal
approximation to the sampling distribution of the mean.
10/4/2023 By Degemu S (MPH) 285
• We can apply the Central Limit Theorem:
• Even if the population is not normal,
• …sample means from the population will be approximately
normal as long as the sample size is large enough
• …and the sampling distribution will have
$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
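A small simulation can make the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a strongly right-skewed choice made only for illustration), a sample size of 40, and 5,000 repeated samples; none of these values come from the slides.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)          # fixed seed so the run is repeatable
n, reps = 40, 5000              # sample size and number of repeated samples

# Exponential population with rate 1: mean 1, SD 1, clearly non-normal
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

observed_mean = mean(sample_means)      # should be close to mu = 1
observed_se = pstdev(sample_means)      # should be close to sigma / sqrt(n)
theoretical_se = 1.0 / sqrt(n)

# Share of sample means within 1.96 standard errors of mu
# (roughly 0.95 if the sampling distribution is nearly normal)
within = sum(abs(m - 1.0) <= 1.96 * theoretical_se for m in sample_means) / reps

print(round(observed_mean, 3), round(observed_se, 3), round(theoretical_se, 3), within)
```

Increasing `n` in this sketch narrows the spread of the sample means and brings their histogram closer to a bell shape.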
10/4/2023 By Degemu S (MPH) 286
As the sample size gets large enough (n↑), the sampling distribution of $\bar{x}$
becomes almost normal regardless of the shape of the population.
10/4/2023 By Degemu S (MPH) 287
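If the population is not normal
[Figure: a non-normal population distribution above the sampling distribution of $\bar{x}$, which becomes normal as n increases; the curve is narrower for a larger sample size and wider for a smaller sample size]
Sampling distribution properties:
Central tendency: $\mu_{\bar{x}} = \mu$
Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$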
10/4/2023 By Degemu S (MPH) 288
[Figure: sample means from a sampling activity plotted against sample size]
Samples were taken at increasing sizes, from 4 cases to 98 cases. As sample size
increases, not only do the sample means become closer to the population mean, but
the fluctuations in sample means also become smaller.
10/4/2023 By Degemu S (MPH) 289
• Generally, as n increases, the sample mean $\bar{x}$ and sample variance $S^2$
approach the values of the true population parameters µ and $\sigma^2$,
respectively.
• The average of the sample means based on repeated samples of size
n approaches the population mean µ as the number of samples
selected gets large.
$$E(\bar{x}) = \mu$$
• The estimator $\bar{x}$ is said to be unbiased
10/4/2023 By Degemu S (MPH) 290
How large is large enough?
• For most distributions, n > 30 will give a sampling distribution that is
nearly normal
• For fairly symmetric distributions, n > 15
• For normal population distributions, the sampling distribution of the
mean is always normally distributed.
• However, the general answer depends on the shape of the distribution
of the sampled population.
10/4/2023 By Degemu S (MPH) 291
[Figure: sampling distribution of $\bar{x}$ for different populations and different sample sizes]
10/4/2023 By Degemu S (MPH) 292
Applications of the sampling distributions of sample
mean
• Helps in computing the probability of obtaining a sample with a
mean of some specified magnitude.
10/4/2023 By Degemu S (MPH) 293
z-value for the sampling distribution of $\bar{x}$
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
where: $\bar{x}$ = sample mean
μ = population mean
σ = population standard deviation
n = sample size
10/4/2023 By Degemu S (MPH) 294
Finite Population Correction
• Apply the Finite Population Correction if:
• the sample is large relative to the
population (n/N > 5%) and…
• Sampling is without replacement
Then
$$z = \frac{\bar{x} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$$
10/4/2023 By Degemu S (MPH) 295
• When the population is much larger than the
sample, the difference between $\sigma^2/n$ and
$(\sigma^2/n)[(N-n)/(N-1)]$ will be negligible.
• Example: N = 10,000; n=25
• Finite Population Correction = (N-n)/(N-1)
= (10,000-25)/(10,000-1) =0.9976 ≈ 1
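The effect of the correction on the standard error can be sketched as below. The first calculation uses the slide's N = 10,000 and n = 25; the second uses a hypothetical small population (N = 100, σ = 10) chosen only to show a case where the correction matters. The function name is illustrative.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the mean; applies the finite population
    correction sqrt((N - n) / (N - 1)) when a population size N is given."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# Slide example: correction applied to the variance, N = 10,000 and n = 25
fpc = (10_000 - 25) / (10_000 - 1)
print(round(fpc, 4))

# Hypothetical case where n/N = 25% > 5%: the correction is no longer negligible
se_plain = standard_error(sigma=10, n=25)
se_corrected = standard_error(sigma=10, n=25, N=100)
print(se_plain, round(se_corrected, 3))
```

The correction always shrinks the standard error, since sampling a sizeable share of a finite population without replacement leaves less room for sample-to-sample variation.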
10/4/2023 By Degemu S (MPH) 296
Example 1
• Given: μ = 50, σ = 16, n = 64
Find: P($\bar{x}$ > 53)
Solution
1. Write the given information, μ=50, σ=16, n=64
2. Sketch a normal curve
10/4/2023 By Degemu S (MPH) 297
3. Convert $\bar{x}$ to a z score
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$$
4. Find the appropriate value(s) in the Table
The area of the SND above a value of z = 1.5 is
0.0668. The probability P (z > 1.5) = 0.0668
5. Complete the answer
The probability that $\bar{x}$ is greater than 53 is 0.0668.
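The same calculation can be reproduced with Python's standard library instead of a printed table. This is a sketch using `statistics.NormalDist`; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 16, 64

se = sigma / sqrt(n)                 # standard error of the mean
z = (53 - mu) / se                   # z score for a sample mean of 53
p = 1 - NormalDist().cdf(z)          # upper-tail area of the standard normal

print(se, z, round(p, 4))
```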
10/4/2023 By Degemu S (MPH) 298
Example 2
• Suppose a population has mean μ = 8 and standard deviation σ
= 3. Suppose a random sample of size n = 36 is selected.
• What is the probability that the sample mean is between 7.8
and 8.2?
10/4/2023 By Degemu S (MPH) 299
Solution:
• Even if the population is not normally distributed, the
central limit theorem can be used (n > 30)
• … so the sampling distribution of $\bar{x}$ is approximately
normal
• … with mean $\mu_{\bar{x}}$ = 8
• …and standard deviation
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$$
10/4/2023 By Degemu S (MPH) 300
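$$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu_{\bar{x}}}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.3108$$
[Figure: the population distribution (μ = 8, shape unknown) is sampled to give the sampling distribution of $\bar{x}$ ($\mu_{\bar{x}} = 8$, limits 7.8 and 8.2), which is standardized to the standard normal distribution ($\mu_z = 0$, limits −0.4 and 0.4); the shaded area is 0.1554 + 0.1554]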
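Example 2 can be verified numerically. The sketch below models the sampling distribution of the mean as a normal distribution with mean 8 and standard error 0.5, relying on the central limit theorem as the solution does.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36
se = sigma / sqrt(n)                             # standard error of the mean

sampling_dist = NormalDist(mu=mu, sigma=se)      # approximate distribution of the sample mean
p = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)

z_low, z_high = (7.8 - mu) / se, (8.2 - mu) / se
print(se, round(z_low, 1), round(z_high, 1), round(p, 4))
```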
10/4/2023 By Degemu S (MPH) 301
Example 3
• The distribution of serum cholesterol levels for all 20-70 year-old males
has mean µ = 211 mg/100 ml and SD = 46 mg/100 ml.
a. If a sample of size 25 is selected from this population, what is the
probability that the sample has a mean of 230 or above?
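• Since $\bar{x}$ has an approximately normal distribution with mean 211 and standard error 46/√25 = 9.2,
10/4/2023 By Degemu S (MPH) 302
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{230 - 211}{9.2} = 2.07$$
10/4/2023 By Degemu S (MPH) 303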
• The area under the standard normal curve to the right
of z = 2.07 is 0.0192
• Consequently, the probability that a sample of size 25
has a mean of 230 mg/100 ml or higher is about 0.019.
10/4/2023 By Degemu S (MPH) 304
b. What mean value of serum cholesterol level cuts off the lower 10%
of the sampling distribution?
• An area of 0.1003 in the lower tail of the SND is marked by the
value z = −1.28
• What is the corresponding value of $\bar{x}$?
$$\bar{x} = \mu + z \sigma_{\bar{x}} = 211 + (-1.28)(9.2) = 199.2$$
10/4/2023 By Degemu S (MPH) 305
Approximately 10% of samples of size 25
have means that are less than or equal to
199.2 mg/100 ml.
The other 90% of the samples have means
that are greater than 199.2 mg/100 ml
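Both parts of Example 3 can be reproduced with `statistics.NormalDist`. This sketch works with the unrounded z value, so the probability in part (a) comes out close to, but not exactly equal to, a value read from a table at a z rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211, 46, 25
se = sigma / sqrt(n)                          # standard error of the mean

sampling_dist = NormalDist(mu=mu, sigma=se)   # approximate distribution of the sample mean

# (a) Probability that the sample mean is 230 mg/100 ml or above
p_a = 1 - sampling_dist.cdf(230)

# (b) Sample mean that cuts off the lower 10% of the sampling distribution
cutoff = sampling_dist.inv_cdf(0.10)

print(round(se, 1), round(p_a, 3), round(cutoff, 1))
```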
10/4/2023 By Degemu S (MPH) 306
B. Distribution of the difference between two
sample means
• Important to compare two population means (comparative studies)
• Are the two population means different?
• If yes by how much do they differ?
• For example, mean serum cholesterol(MSC) level for sedentary office
workers vs laborers.
10/4/2023 By Degemu S (MPH) 307
• It is generally assumed that the two populations are normally distributed.
• For sampling from non-normal populations, large samples are
recommended by the application of the CLT.
• Plotting sample differences (Mean1-Mean2) against frequency gives a
normal distribution with mean equal to μ1-μ2 which is the difference
between the two population means.
10/4/2023 By Degemu S (MPH) 308
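• The variance of the distribution of the sample differences is:
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
• Thus, the standard error of the difference between sample means is:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$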
10/4/2023 By Degemu S (MPH) 309
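• To convert to the SND, we use the formula
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• We find the z score by assuming that there is no
difference between the population means (μ1 − μ2 = 0).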
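The z score for a difference between two sample means can be sketched as follows. All numbers in the usage example (sample means, standard deviations, sample sizes) are hypothetical values for the sedentary-worker versus laborer comparison, and `z_two_means` is a name introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, diff0=0.0):
    """z score for the difference between two independent sample means.

    diff0 is the assumed difference mu1 - mu2 (0 when the population
    means are assumed to be equal).
    """
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)   # standard error of the difference
    return ((xbar1 - xbar2) - diff0) / se

# Hypothetical mean serum cholesterol: office workers vs laborers
z = z_two_means(xbar1=215, xbar2=205, sigma1=30, sigma2=40, n1=100, n2=100)
p_upper = 1 - NormalDist().cdf(z)    # chance of a difference this large or larger

print(z, round(p_upper, 4))
```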
10/4/2023 By Degemu S (MPH) 310
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx
Micro-Mph (1).pptx

Micro-Mph (1).pptx

  • 1. Why study statistics? • Data is everywhere • Statistical techniques are used to make many decisions that affect our lives • No matter what your career is, you will make professional decisions that involve data. • An understanding of statistical methods will help you make these decisions effectively. 10/4/2023 By Degemu S (MPH) 1
  • 2. Uses of biostatistics • Organizing data • Assessing health status • Evaluating health programs • Allocating resources • Measuring the magnitude of a disease or condition • Assessing risk factors • Evaluating new medicines or drugs • Drawing inferences • Compiling hospital utilization statistics • Assessing vaccine uptake
  • 3. Biostatistics in Public Health? • What is public health all about? “Public Health is the science and art of preventing disease, prolonging life, and promoting health through the organized efforts of society.” (World Health Organization)
  • 4. The Functions of Public Health • Assessment: Identify problems related to the public’s health, and measure their extent • Policy Setting: Prioritize problems, find possible solutions, set regulations to achieve change, and predict the effect on the population • Assurance: Provide services as determined by policy, and monitor compliance • Evaluation is a theme that cuts across all these functions, i.e., how well are they performed?
  • 5. Role of Biostatistics in PH • Assessment: Identify problems related to the public’s health, and measure their extent • Role of Biostatistics in Assessment: • Decide which information to gather • Find patterns in collected data • Make the best summary description of the population and associated problems • Design general surveys of the population’s needs • Plan experiments to supplement these surveys • Assist scientists in estimating the extent of health problems and associated risk factors
  • 6. Role of Biostatistics in PH • Policy Setting: Prioritize problems, find possible solutions, set regulations to achieve change, and predict the effect on the population • Role of Biostatistics in Policy Setting: • Measure problems • Prioritize problems • Quantify associations of risk factors with disease • Predict the effect of policy changes • Estimate costs
  • 7. Role of Biostatistics in PH • Assurance: Provide services as determined by policy, and monitor compliance • Role of Biostatistics in Assurance and Evaluation: • Use sampling and estimation methods to study the factors related to compliance and outcome • Decide whether an improvement is due to compliance or to something else, how best to measure compliance, and how to increase the compliance level in the target population
  • 8. Role of Biostatistics in Health Research • Purpose of health research: to create the knowledge essential for action to improve health. • Without sound knowledge, health interventions would have neither a logical nor an empirical basis and would be bound to fail.
  • 9. Role of Biostatistics in Health Research • Steps in research: • Planning • Designing • Execution (data collection) • Data processing • Data analysis • Presentation • Interpretation • Publication • Statistical thinking contributes to every step of research.
  • 10. Introduction … • Variable: any aspect or characteristic of an individual that is measured (like blood pressure) or recorded (like age or sex) and that can take different values for different individuals or cases. • Quantitative variable: one that can be measured in the usual sense, as a number. Examples: the heights of adult males, the weights of preschool children, and the ages of patients seen in a dental clinic. • Qualitative variable: a characteristic that cannot be measured in the sense that height, weight, and age are measured. Examples: an individual’s sex, ethnicity, or religion.
  • 11. Introduction … • Population: the largest collection of entities (usually people) in which we have an interest at a particular time. • Sample: a part of a population.
  • 12. Scale of measurement • Measurement is the assignment of numbers to objects or events according to a set of rules. • The scale of measurement is concerned with the nature of the numbers that result from measurement. • Measurements can be qualitative (categorical) or quantitative. • Although variables can be broadly divided into categorical (qualitative) and quantitative, it is common practice to distinguish four basic types of data (scales of measurement).
  • 13. Count • The most basic measure of disease frequency is a simple count of affected individuals. • Example: • 350,000 cases of polio • 350,000 cases of polio in 1988 • 350,000 cases of polio in 1988 in 125 countries
  • 14. Ratio, proportion and rate
  • 15. Ratio • The quotient of two numbers • The numerator is NOT INCLUDED in the denominator • No relationship is necessary between the numerator and the denominator • May be expressed as a/b or a:b
  • 17. When is a ratio used? • Sex ratio: male to female • Number of health facilities per population • Number of participants in the course per facilitator • Number of inhabitants per latrine • Odds ratio • Relative risk • Prevalence ratio • Maternal mortality ratio
  • 18. Ratio • Example 1 • A university has 4000 male students and 2000 female students. The ratio of male to female students is: • 4000/2000 = 2/1 or 2:1 • For every 2 male students there is 1 female student.
  • 19. Ratio • Example 2 • A foodborne epidemic occurred in an elementary school canteen. The attack rate in the first grade was 24%, while the attack rate in the second grade was 16%. Compare these two attack rates. • 24/16 = 3/2 or 3:2 • The attack rate in the first grade was 1.5 times that in the second grade. If the two grades are the same size, 3 first graders fell ill for every 2 second graders.
  • 20. Ratio • Example 3 • A city of 4 million people has 400 clinics. Calculate the ratio of clinics per person. • Ratio = 400 / 4,000,000 = 0.0001 clinics per person • Multiply by 10^4: 0.0001 x 10^4 = 1 clinic per 10,000 persons
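The three ratio examples above can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `ratio` and `per_unit` are invented here and do not come from the slides.

```python
from fractions import Fraction


def ratio(a, b):
    """Quotient a/b as a reduced fraction; a does not have to be part of b."""
    return Fraction(a, b)


def per_unit(count, population, unit=10_000):
    """Express a count per `unit` members of the population."""
    return count * unit / population


# Example 1: male to female students
sex_ratio = ratio(4000, 2000)
# Example 2: first-grade attack rate (24%) compared with second-grade (16%)
attack_rate_ratio = ratio(24, 16)
# Example 3: clinics per 10,000 persons
clinics_per_10k = per_unit(400, 4_000_000)

print(f"male:female = {sex_ratio.numerator}:{sex_ratio.denominator}")
print(f"attack rates = {attack_rate_ratio.numerator}:{attack_rate_ratio.denominator}")
print(f"clinics per 10,000 persons = {clinics_per_10k}")
```

`Fraction` reduces 4000/2000 and 24/16 to lowest terms automatically, which is why the results read as 2:1 and 3:2.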
  • 21. Proportion • The quotient of two numbers • The numerator is a subgroup of the population in the denominator • The numerator is always INCLUDED in the denominator • A proportion ranges between 0 and 1 • Percentage = proportion x 100
  • 22. What is the proportion of cases? • Proportion = 2 cases / 4 total = 0.5 • Percentage = 0.5 x 100 = 50%
  • 23. When is a proportion used? • Proportion of samples positive for P. falciparum • 1000 samples, 236 positive • Proportion of positive samples = 236/1000 = 0.236 • Percentage of positive samples = 0.236 x 100 = 23.6% • Proportion of malaria deaths • 123 malaria cases, 7 deaths • Proportion of malaria deaths = 7/123 = 0.057 • Percentage of malaria deaths = 0.057 x 100 = 5.7%
  • 24. Proportion • Example 1 • A university has 4000 male students and 2000 female students. Calculate the proportions of male and female students. • Male: 4000/6000 x 100% = 66.7% • Female: 2000/6000 x 100% = 33.3%
  • 25. Proportion • Example 2 • 40 children are currently ill with measles; 80 children altogether have had measles. • 40 / 80 = 0.50 (proportion) • 0.50 x 100 = 50% (percentage)
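As a rough sketch, the proportion examples above can be reproduced in Python. The function names `proportion` and `percentage` are hypothetical; the input check encodes the rule that the numerator must be included in the denominator.

```python
def proportion(part, whole):
    """Return part/whole, where `part` is a subgroup of `whole`."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("the numerator must be included in the denominator")
    return part / whole


def percentage(part, whole):
    """Percentage = proportion x 100."""
    return proportion(part, whole) * 100


print(round(proportion(236, 1000), 3))   # samples positive for P. falciparum
print(round(percentage(7, 123), 1))      # malaria deaths among malaria cases
print(round(percentage(4000, 6000), 1))  # male students
print(round(percentage(2000, 6000), 1))  # female students
print(proportion(40, 80))                # children currently ill with measles
```

Passing a numerator larger than the denominator raises an error, which is one simple way to catch a ratio that has been mistaken for a proportion.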
  • 27. Rate • The quotient of two numbers • Measures the probability of occurrence of an event over TIME • Numerator: number of EVENTS • Denominator: POPULATION at risk for the event in the numerator, observed for a given TIME
  • 28. What is the rate of death? • Observed in one year: 2 deaths in a population of 100 • Rate = 2/100 per year = 2 deaths per 100 population per year
  • 29. When is a rate used? • Morbidity rates • Attack rates • Prevalence rates • Incidence rates • Mortality rates • Natality rates
  • 30. Rate • Example 1 • Mortality rate of tetanus in France in 1995 • Tetanus deaths: 17 • Population in 1995: 58 million • Time period: 1 year • Mortality rate = 17 / 58,000,000 x 100,000 = 0.029 per 100,000 population per year • A rate may be expressed in any power of 10: 100, 1,000, 10,000, 100,000 • A rate must include an aspect of time: per year, per month, per day
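The tetanus calculation above can be sketched in Python as follows. The function `rate` and its parameters are illustrative names, and the sketch assumes the whole population is observed for the full period.

```python
def rate(events, population, per=100_000, years=1):
    """Events per `per` population per year.

    Assumes the whole population is at risk and observed for `years` years.
    """
    return events * per / (population * years)


# Tetanus mortality, France, 1995: 17 deaths among 58 million people in 1 year
tetanus_rate = rate(17, 58_000_000)
print(f"{tetanus_rate:.3f} per 100,000 population per year")

# The earlier slide: 2 deaths in a population of 100 observed for one year
print(rate(2, 100, per=100), "per 100 population per year")
```

Changing `per` only rescales the same rate to a different power of 10; the time unit stays part of the result.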
  • 31. Rate • Example 2 • Maternal Mortality for Various Continents (1995) • Continent: Rate • Africa: 273,000 • Asia: 217,000 • Europe: 2,000 • Latin America/Caribbean: 22,000 • South America: 15,000 • North America: 490 • Australia/New Zealand: 25
  • 32. Summary: What is the measure of frequency? • Is the numerator included in the denominator? • No: the measure is a ratio • Yes: is time included in the denominator? • Yes: the measure is a rate • No: the measure is a proportion
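The decision tree in this summary can be written as a small function. This is a sketch of the two yes/no questions only; the name `measure_of_frequency` and its boolean arguments are made up for illustration.

```python
def measure_of_frequency(numerator_in_denominator, time_in_denominator):
    """Classify a quotient as a ratio, proportion, or rate.

    Follows the two questions of the summary slide:
    1. Is the numerator included in the denominator?
    2. If so, is time included in the denominator?
    """
    if not numerator_in_denominator:
        return "ratio"
    return "rate" if time_in_denominator else "proportion"


# Males per female: the numerator is not part of the denominator
print(measure_of_frequency(False, False))
# Positive samples out of all samples tested
print(measure_of_frequency(True, False))
# Deaths per 100,000 population per year
print(measure_of_frequency(True, True))
```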
  • 33. Nominal Data • As the name implies, nominal data represent mutually exclusive categories that have no natural order or rank. • Individuals are simply placed in the proper category or group. • Each item must fit into exactly one category. • The categories can be labeled with numbers, names, or symbols. • Examples: • Sex: 1. Male 2. Female • Marital status: 1. Single 2. Married 3. Divorced 4. Widowed • Outcome of a patient after a car accident: 1. Alive 2. Dead • Blood group: 1. A 2. B 3. O 4. AB
  • 34. Ordinal data • Data representing mutually exclusive categories with a ranked order are called ordinal data. • The spaces or intervals between the categories are not necessarily equal. • The function of the numbers assigned to ordinal data is to order (or rank) the observations from lowest to highest; hence the term ordinal. • Examples: • Job satisfaction index: 1. Strongly disagree 2. Disagree 3. Neutral 4. Agree 5. Strongly agree • Classroom rank: 1. First 2. Second 3. Third • Degree of burn: 1. First degree 2. Second degree 3. Third degree 4. Fourth degree • Progressive health status of a patient after admission: 1. Unimproved 2. Improved 3. Much improved • Pain level: 1. None 2. Mild 3. Moderate 4. Severe
  • 35. Interval data • Interval data are truly quantitative. • The intervals between measured values are equal: the distance between any two measurements is known and is the same across the scale. • The scale has a unit of distance and a zero point, both of which are arbitrary. • Zero may have no true meaning, i.e., it may not indicate a total absence of the quantity being measured. • The ratio between two measurements has no meaning. For example, 40 degrees Fahrenheit is not twice as warm as 20 degrees Fahrenheit. • Examples: • Temperature in degrees Fahrenheit or Celsius. Here the unit of measurement is the degree, and the point of comparison is the arbitrarily chosen "zero degrees", which does not indicate a lack of heat. • IQ
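A short numerical check may make the point about ratios on an interval scale concrete. The sketch below converts the same two temperatures to Celsius; the helper `f_to_c` is introduced here for illustration only.

```python
import math


def f_to_c(fahrenheit):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (fahrenheit - 32) * 5 / 9


# The ratio of two readings depends on the arbitrary zero point:
ratio_f = 40 / 20                  # 2.0 in Fahrenheit
ratio_c = f_to_c(40) / f_to_c(20)  # a different (here negative) value in Celsius
print(ratio_f, ratio_c)

# Ratios of intervals (differences) are the same on both scales:
interval_ratio_f = (40 - 20) / (30 - 20)
interval_ratio_c = (f_to_c(40) - f_to_c(20)) / (f_to_c(30) - f_to_c(20))
print(interval_ratio_f, interval_ratio_c)
print(math.isclose(interval_ratio_f, interval_ratio_c))
```

The same two temperatures give a ratio of 2 in Fahrenheit and a negative ratio in Celsius, so "twice as warm" has no fixed meaning, while differences keep their relative size under a change of unit.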
  • 36. Ratio data • The highest level of measurement is the ratio scale. • This scale is characterized by the fact that equality of ratios, as well as equality of intervals, can be determined. • Fundamental to the ratio scale is a true zero point. • Examples: • Height • Weight • Length • Age • Cholesterol level • Serum sugar level • Number of TB patients attending the hospital
  • 37. Scale of Measurement • In increasing degree of precision in measuring: nominal scale, ordinal scale, interval scale, ratio scale
  • 38. Numerical Discrete and Numerical Continuous Data • Both interval and ratio data involve measurement. • Most data analysis techniques that apply to ratio data also apply to interval data. • For most practical purposes, both interval and ratio data can be classified as numerical discrete or numerical continuous.
  • 39. Numerical Discrete • For discrete data, both ordering and magnitude are important. • The numbers represent actual measurable quantities rather than mere labels. • Discrete data are restricted to specified values, often integers or counts, that differ by fixed amounts. • No intermediate values are possible. • Examples: • The number of bacterial colonies on a plate • The number of cells within a prescribed area upon microscopic examination • The number of heartbeats within a specified time interval • The number of times a woman has given birth (parity) • The number of episodes of illness a patient experiences during some time period • The number of motor vehicle accidents • The number of beds available in a particular hospital
  • 40. Numerical continuous • The scale with the greatest degree of quantification is the numerical continuous scale. • Each observation theoretically falls somewhere along a continuum. • One is not restricted, in principle, to particular values such as the integers of the discrete scale. • The restricting factor is the degree of accuracy of the measuring instrument. • Examples: most clinical measurements, such as blood pressure, serum cholesterol level, height, weight, and age, are on a numerical continuous scale.
  • 41. Categorizing Variables: Exercise 1. Year of birth: numerical 2. Marital status of women: nominal 3. Identification number of study participant: nominal (the number is only a label) 4. Class rank: ordinal 5. Length of infants at ANC clinic: numerical
  • 42. Discrete or Continuous? 10/4/2023 By Degemu S (MPH) 42
  • 44. Inferential Statistics
  • 46. Probability And Probability Distributions. Probability is the central idea behind statistical designs for producing data. Probabilities are used in everyday communication: a patient has a 50–50 chance of surviving a certain operation; the chance that a 30-year-old woman will celebrate her 70th birthday is 30%. These examples express the chance that some event of a random variable occurs. Probability theory grew out of attempts to solve problems related to games of chance, such as tossing a coin or rolling a die, i.e. attempts to quantify personal beliefs regarding degrees of uncertainty.
  • 47. Probability And Probability distribution….. • Probabilities and probability distributions are extensions of the ideas of relative frequency and histograms, respectively. Relative frequency probability: if some process is repeated a large number of times, n, and some resulting event E occurs m times, the relative frequency of E will be approximately equal to m/n. Symbolically: Pr(E) = m/n. E.g. suppose that of 158 people who attended a dinner party, 99 were ill due to food poisoning. The probability of illness for a person selected at random is Pr(illness) = 99/158 = 0.63 or 63%.
  • 48. Probability And Probability distribution….. • Results are not certain, uncertainty is high •To evaluate how accurate our results are: –Given how our data were collected, are our results accurate? –Given the level of accuracy needed, how many observations need to be collected? –The sample size issue? 10/4/2023 By Degemu S (MPH) 48
  • 49. Probability And Probability distribution….. • When dealing with a process that has an uncertain outcome –Birth of male or female child? –Tossing a coin? –A patient taking a certain drug(cure/no)? –The fate of the patient? 10/4/2023 By Degemu S (MPH) 49
  • 50. Probability And Probability distribution….. • Experiment = any process with an uncertain outcome. • Each performance of an experiment is a trial, and its possible results are outcomes. • Event = something that may or may not happen when the experiment is performed. • Events are represented by upper case letters such as A, B, C, etc.
  • 51. Probability And Probability distribution….. • The probability of an event can be defined as the proportion of times in which that event occurs in a very large number of trials. • The probability of an event E is a number between 0 and 1 representing the proportion of times that E is expected to happen when the experiment is done over and over again under the same conditions.
  • 52. Probability And Probability Distributions….. • Any event can be expressed as a subset of the set of all possible outcomes(sample space=S) • S = set of all possible outcomes P(S) = 1 • An event is any set of outcomes of interest. 10/4/2023 By Degemu S (MPH) 52
  • 53. Why Probability in Medicine • Because medicine is an inexact science, physicians seldom predict an outcome with absolute certainty. • E.g. to formulate a diagnosis, a physician must rely on available diagnostic information about a patient: – History and physical examination – Laboratory investigations, X-ray findings, ECG, etc. • Although no test result is absolutely accurate, it does affect the probability of the presence (or absence) of a disease. – Sensitivity and specificity • An understanding of probability is fundamental for quantifying the uncertainty that is inherent in the decision-making process.
  • 54. cont.… • Probability theory is a foundation for statistical inference. • It allows us to draw conclusions about a population of patients based on information obtained from a sample of patients drawn from that population. • Probability is used in: • Probability distributions: Binomial, Poisson, and Normal • Sampling and sampling distributions • Estimation • Hypothesis testing • Advanced statistical analysis
  • 55. Categories of Probability • Objective and subjective probabilities. • Objective probability: 1) Classical probability 2) Relative frequency probability. 1. Classical Probability: • Is based on gambling ideas • Rolling a die: there are 6 possible outcomes, S = {1, 2, 3, 4, 5, 6}. • Each is equally likely to occur: P(i) = 1/6 for i = 1, 2, ..., 6, so P(1) = P(2) = ... = P(6) = 1/6 • The probabilities sum to 1.
  • 56. Classical Probability • Definition: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a characteristic E, the probability of the occurrence of E is m/N. • P(E) = the probability of E = m/N • If we toss a die, what is the probability of 4 coming up? • m = 1 (the outcome 4) and N = 6 • The probability of 4 coming up is 1/6.
  • 57. Classical Probability • Another “equally likely” setting is the tossing of a coin: there are 2 possible outcomes in the set of all possible outcomes, {H, T}. P(H) = 0.5, P(T) = 0.5, SUM = 1.0. Relative Frequency Probability • In the long run process….. • The proportion of times the event A occurs in a large number of trials repeated under essentially identical conditions.
  • 58. Relative Frequency Probability • Definition: If a process is repeated a large number of times (n), and if an event with the characteristic E occurs m times, the relative frequency of E, m/n, is approximately equal to the probability of E. • Probability of E = P(E) = m/n. • If you toss a coin 100 times and the head comes up 40 times, P(H) = 40/100 = 0.4 • If we toss a coin 10,000 times and the head comes up 5562 times, P(H) = 0.5562. • Therefore, the longer the series and the larger the sample size, the closer the estimate tends to be to the true value (0.5).
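The long-run idea on the relative frequency slide can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency_of_heads`, the fixed seed, and the numbers of tosses are illustrative choices and do not come from the slides.

```python
import random


def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return m/n, the share of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


# The estimate m/n tends to settle near the true value 0.5 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```

Short series can land well away from 0.5, as in the 40-out-of-100 example, while long series rarely do.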
  • 59. Subjective Probability • Personalistic (an opinion or judgment by a decision maker about the likelihood of an event). • Personal assessment of which is more effective in providing a cure: traditional or modern medicine • Personal assessment of which sports team will win a match. • May also use classical and relative frequency methods to assess the likelihood of an event, but does not rely on the repeatability of any process. E.g., if someone says that he/she is 90% certain that a cure for AIDS will be discovered within 5 years, then it means that: P(discovery of a cure for AIDS within 5 years) = 90% = 0.90
  • 60. Mutually Exclusive Events • Two events A and B are mutually exclusive if they cannot both happen at the same time. • P(A ∩ B) = 0 • If E1 occurs, then E2 cannot occur • E1 and E2 have no common element. [Figure: two non-overlapping sets, E1 = yellow card and E2 = black card] A card cannot be black and yellow at the same time.
  • 61. Mutually Exclusive Events • Example: – A coin toss cannot produce heads and tails simultaneously. – The weight of an individual can’t be classified simultaneously as “underweight”, “normal”, and “overweight”. – Blood pressure reading: A = (DBP < 90) and B = (90 ≤ DBP < 95) can’t occur at the same time. Independent Events • Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out. • The outcome of one event has no effect on the occurrence or non-occurrence of the other. • P(A ∩ B) = P(A) × P(B) (independent events) • Example: – The outcomes on the first and second coin tosses are independent.
  • 62. Dependent event • Occurrence of one affects the probability of the other • P(A ∩ B) ≠ P(A) × P(B) • Example: Consider the DBP measurements from a mother and her first-born child. Let: A = {mother’s DBP ≥ 95} and B = {first-born child’s DBP ≥ 80} • Suppose P(A ∩ B) = 0.05, P(A) = 0.1, P(B) = 0.2. Then P(A ∩ B) = 0.05 > P(A) × P(B) = 0.02, and events A and B would be dependent.
  • 63. Dependent event: E1 = rain forecast on the news, E2 = take an umbrella to work. The probability of the second event is affected by the occurrence of the first event. Intersection and union • The intersection of two events A and B, A ∩ B, is the event that A and B happen simultaneously. P(A and B) = P(A ∩ B) • Let A represent the event that a randomly selected newborn is LBW, and B the event that he or she is from a multiple birth • The intersection of A and B is the event that the infant is both LBW and from a multiple birth.
  • 64. Intersection and union • The union of A and B, A ∪ B, is the event that either A happens or B happens or they both happen simultaneously • P(A or B) = P(A ∪ B) • Here, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both
  • 65. Properties of Probability 1. The numerical value of a probability always lies between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1. A value of 0 means the event cannot occur. A value of 1 means the event definitely will occur. A value of 0.5 means that the probability that the event will occur is the same as the probability that it will not occur.
  • 66. 2. The sum of the probabilities of all mutually exclusive outcomes is equal to 1. P(E1) + P(E2 ) + .... + P(En ) = 1. 3. For two mutually exclusive events A and B, P(A or B ) = P(AUB)= P(A) + P(B). If not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B) 10/4/2023 By Degemu S (MPH) 66
  • 67. 4. The complement of an event A, denoted by Ā or Ac, is the event that A does not occur • Consists of all the outcomes in which event A does NOT occur P(Ā) = P(not A) = 1 – P(A) • Ā occurs only when A does not occur. • These are complementary events. 10/4/2023 By Degemu S (MPH) 67
  • 68. • In the example, the complement of A is the event that a newborn is not LBW • In other words, Ā is the event that the child weighs 2500 grams or more at birth. P(Ā) = 1 − P(A): P(not low bwt) = 1 − P(low bwt) = 1 − 0.076 = 0.924
  • 69. Basic Probability Rules 1. Addition rule. If events A and B are mutually exclusive, then P(A and B) = 0 and P(A or B) = P(A) + P(B). More generally: P(A or B) = P(A) + P(B) − P(A and B), where P(A or B) is the probability that event A or event B occurs or they both occur.
  • 70. Example: The probabilities below represent years of schooling completed by mothers of newborn infants 10/4/2023 By Degemu S (MPH) 70
  • 71. • What is the probability that a mother has completed < 12 years of schooling? P(≤ 8 years) = 0.056 and P(9–11 years) = 0.159 • Since these two events are mutually exclusive, P(≤ 8 or 9–11) = P(≤ 8 ∪ 9–11) = P(≤ 8) + P(9–11) = 0.056 + 0.159 = 0.215
  • 72. • What is the probability that a mother has completed 12 or more years of schooling? P(≥ 12) = P(12 or 13–15 or ≥ 16) = P(12 ∪ 13–15 ∪ ≥ 16) = P(12) + P(13–15) + P(≥ 16) = 0.321 + 0.218 + 0.230 = 0.769
  • 73. If A and B are not mutually exclusive events, then subtract the overlapping: P(AU B) = P(A)+P(B) − P(A ∩ B) 10/4/2023 By Degemu S (MPH) 73
  • 74. • The following data are the results of electrocardiograms (ECGs) and radionuclide angiocardiograms (RAs) for 19 patients with post-traumatic myocardial contusions. • 7 patients had both ECG and RA abnormalities • 17 patients had an abnormal ECG • 9 patients had an abnormal RA. P(ECG abnormal and RA abnormal) = 7/19 = 0.37. P(ECG abnormal or RA abnormal) = P(ECG abnormal) + P(RA abnormal) − P(both ECG and RA abnormal) = 17/19 + 9/19 − 7/19 = 19/19 = 1. Note: if P(ECG abnormal) and P(RA abnormal) were simply added, the 7 patients whose ECGs and RAs are both abnormal would be counted twice, which is why P(both) is subtracted.
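The addition rule for the ECG/RA example can be checked with exact fractions. The variable names below are illustrative; the counts are the ones quoted on the slide.

```python
from fractions import Fraction

n_patients = 19
p_ecg = Fraction(17, n_patients)   # abnormal ECG
p_ra = Fraction(9, n_patients)     # abnormal RA
p_both = Fraction(7, n_patients)   # both abnormal

# General addition rule: subtract the overlap so it is counted only once.
p_either = p_ecg + p_ra - p_both

# Same answer by counting patients in non-overlapping groups.
ecg_only = 17 - 7
ra_only = 9 - 7
p_either_by_counting = Fraction(ecg_only + ra_only + 7, n_patients)

print(p_either, p_either_by_counting)
```

Both routes give the same probability, which is the point of subtracting P(A and B).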
  • 75. 2. Multiplication rule • If A and B are independent events, then P(A ∩ B) = P(A) × P(B) • More generally, P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B) P(A and B) denotes the probability that A and B both occur at the same time. 10/4/2023 By Degemu S (MPH) 75
  • 76. Conditional Probability • Refers to the probability of an event, given that another event is known to have occurred. • “What happened first is assumed” • Hint - When thinking about conditional probabilities, think in stages. Think of the two events A and B occurring chronologically, one after the other, either in time or space. 10/4/2023 By Degemu S (MPH) 76
  • 77. • The conditional probability that event B occurs given that event A has already occurred is denoted P(B|A) and is defined as P(B|A) = P(A ∩ B) / P(A), provided that P(A) ≠ 0.
  • 78. Example: A study investigating the effect of prolonged exposure to bright light on retina damage in premature infants.
      Exposure         Retinopathy YES   Retinopathy NO   TOTAL
      Bright light            18                3           21
      Reduced light           21               18           39
      TOTAL                   39               21           60
  • 79. • The probability of developing retinopathy is: P(Retinopathy) = (No. of infants with retinopathy) / (Total No. of infants) = (18 + 21)/(21 + 39) = 39/60 = 0.65
  • 80. • We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability of retinopathy given that the infant was exposed to reduced light. • Exposure to bright light and exposure to reduced light are conditioning events, i.e. events we want to take into account when calculating conditional probabilities.
  • 81. • The conditional probability of retinopathy, given exposure to bright light, is: • P(Retinopathy | exposure to bright light) = (No. of infants with retinopathy exposed to bright light) / (No. of infants exposed to bright light) = 18/21 = 0.86
  • 82. • P(Retinopathy | exposure to reduced light) = (No. of infants with retinopathy exposed to reduced light) / (No. of infants exposed to reduced light) = 21/39 = 0.54 • The conditional probabilities suggest that premature infants exposed to bright light have a higher risk of retinopathy than premature infants exposed to reduced light.
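The conditional probabilities in the retinopathy example can be reproduced directly from the 2 × 2 counts. The dictionary layout and the helper name `p_retinopathy_given` are illustrative assumptions, not part of the slides.

```python
# Counts from the retinopathy table, keyed by (light exposure, retinopathy).
counts = {
    ("bright", "yes"): 18, ("bright", "no"): 3,
    ("reduced", "yes"): 21, ("reduced", "no"): 18,
}


def p_retinopathy_given(light):
    """P(retinopathy | light exposure): restrict to one row, then divide."""
    with_retinopathy = counts[(light, "yes")]
    exposed = with_retinopathy + counts[(light, "no")]
    return with_retinopathy / exposed


# Unconditional probability uses all 60 infants.
p_overall = (counts[("bright", "yes")] + counts[("reduced", "yes")]) / sum(counts.values())

print(round(p_overall, 2))
print(round(p_retinopathy_given("bright"), 2))
print(round(p_retinopathy_given("reduced"), 2))
```

Conditioning changes only the denominator: the row total replaces the grand total.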
  • 83. For independent events A and B: P(A|B) = P(A). For non-independent events A and B: P(A and B) = P(A|B) P(B) (General Multiplication Rule)
  • 84. Test for Independence • Two events A and B are independent if: P(B|A)=P(B) or P(A and B) = P(A) • P(B) • Two events A and B are dependent if P(B|A) ≠P(B) or P(A and B) ≠P(A) • P(B) 10/4/2023 By Degemu S (MPH) 84
  • 85. Example • In a study of optic-nerve degeneration in Alzheimer’s disease, postmortem examinations were conducted on 10 Alzheimer’s patients. The following table shows the distribution of these patients according to sex and evidence of optic-nerve degeneration. • Are the events “patient has optic-nerve degeneration” and “patient is female” independent for this sample of 10 patients?
  • 86. Optic-nerve degeneration by sex:
      Sex       Present   Not present
      Female       4           1
      Male         4           1
  • 87. Solution • P(Optic-nerve degeneration | Female) = (No. of females with optic-nerve degeneration) / (No. of females) = 4/5 = 0.80. P(Optic-nerve degeneration) = (No. of patients with optic-nerve degeneration) / (Total No. of patients) = 8/10 = 0.80. Since the two probabilities are equal, the events are independent for this sample.
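Both forms of the independence test, P(B|A) = P(B) and P(A and B) = P(A) × P(B), can be checked on the optic-nerve table with exact fractions. The nested-dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Counts from the optic-nerve degeneration table.
table = {
    "female": {"present": 4, "absent": 1},
    "male": {"present": 4, "absent": 1},
}
total = sum(sum(row.values()) for row in table.values())

p_degeneration = Fraction(sum(row["present"] for row in table.values()), total)
p_female = Fraction(sum(table["female"].values()), total)
p_both = Fraction(table["female"]["present"], total)

# Conditional probability from its definition P(B|A) = P(A and B) / P(A).
p_degeneration_given_female = p_both / p_female

independent = (
    p_degeneration_given_female == p_degeneration
    and p_both == p_degeneration * p_female
)
print(p_degeneration_given_female, p_degeneration, independent)
```

Exact fractions avoid false "dependent" verdicts caused by floating-point rounding when the two sides are compared for equality.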
  • 88. Exercise: Culture and Gonodectin (GD) test results for 240 urethral discharge specimens.
      GD test result   Culture: Gonorrhea   Culture: No gonorrhea   Total
      Positive                175                     9               184
      Negative                  8                    48                56
      Total                   183                    57               240
  • 89. 1. What is the probability that a man has gonorrhea? 183/240 2. What is the probability that a man has a positive GD test? 184/240 3. What is the probability that a man has a positive GD test and gonorrhea? 175/240 4. What is the probability that a man has a negative GD test and does not have gonorrhea? 48/240 5. What is the probability that a man with gonorrhea has a positive GD test? 175/183
  • 90. 6. What is the probability that a man who does not have gonorrhea has a negative GD test? 48/57 7. What is the probability that a man who does not have gonorrhea has a positive GD test? 9/57 8. What is the probability that a man with a positive GD test has gonorrhea? 175/184
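The exercise answers can be recomputed from the four cell counts. In the sketch below, the labels sensitivity, specificity and positive predictive value for answers 5, 6 and 8 are an added interpretation that assumes the culture result is the reference standard; the slides give only the fractions.

```python
# Cell counts from the GD test table (culture treated as the reference, an assumption).
tp, fp = 175, 9    # GD positive: gonorrhea / no gonorrhea
fn, tn = 8, 48     # GD negative: gonorrhea / no gonorrhea
total = tp + fp + fn + tn

p_gonorrhea = (tp + fn) / total              # answer 1: 183/240
p_positive = (tp + fp) / total               # answer 2: 184/240
p_positive_and_gonorrhea = tp / total        # answer 3: 175/240
sensitivity = tp / (tp + fn)                 # answer 5: 175/183
specificity = tn / (tn + fp)                 # answer 6: 48/57
false_positive_rate = fp / (tn + fp)         # answer 7: 9/57
positive_predictive_value = tp / (tp + fp)   # answer 8: 175/184

for name, value in [
    ("P(gonorrhea)", p_gonorrhea),
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("PPV", positive_predictive_value),
]:
    print(f"{name}: {value:.3f}")
```

Answers 5 to 8 are conditional probabilities, so their denominators are column or row totals rather than 240.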
  • 91. Probability Distributions • A probability distribution is a device used to describe the behavior that a random variable may have by applying the theory of probability. • It is the way data are distributed, in order to draw conclusions about a set of data • Random Variable = Any quantity or characteristic that is able to assume a number of different values such that any particular outcome is determined by chance. 10/4/2023 By Degemu S (MPH) 91
  • 92. • Random variables can be either discrete or continuous • A discrete random variable is able to assume only a finite or countable number of outcomes. • A continuous random variable can take on any value in a specified interval. 10/4/2023 By Degemu S (MPH) 92
  • 93. • With categorical variables, we obtain the frequency distribution of each variable • With numeric variables, the aim is to determine whether or not normality may be assumed • If not, we may consider transforming the variable or categorizing it for analysis (e.g., age group)
  • 94. Therefore, the probability distribution of a random variable is a table, graph, or mathematical formula that gives the probabilities with which the random variable takes different values or ranges of values. 10/4/2023 By Degemu S (MPH) 94
  • 95. A. Discrete Probability Distributions • For a discrete random variable, the probability distribution specifies each of the possible outcomes of the random variable along with the probability that each will occur. • Examples can be: • Frequency distribution • Relative frequency distribution • Cumulative frequency 10/4/2023 By Degemu S (MPH) 95
  • 96. • We represent a potential outcome of the random variable X by x, with 0 ≤ P(X = x) ≤ 1 and ∑ P(X = x) = 1
  • 97. The following data show the number of diagnostic services a patient receives:
      No. of services, x:   0       1       2       3       4       5
      P(X = x):             0.671   0.229   0.053   0.031   0.010   0.006
  • 98. • What is the probability that a patient receives exactly 3 diagnostic services? P(X=3) = 0.031 • What is the probability that a patient receives at most one diagnostic service? P (X≤1) = P(X = 0) + P(X = 1) = 0.671 + 0.229 = 0.900 10/4/2023 By Degemu S (MPH) 98
  • 99. • What is the probability that a patient receives at least four diagnostic services? P (X≥4) = P(X = 4) + P(X = 5) = 0.010 + 0.006 = 0.016 10/4/2023 By Degemu S (MPH) 99
  • 100. Probability distributions can also be displayed using a graph. [Figure: probability P(X = x), from 0 to 0.8, against number of diagnostic services, x = 0 to 5]
  • 101. The Expected Value of a Discrete Random variable • If a random variable is able to take on a large number of values, then a probability mass function might not be the most useful way to summarize its behavior • Instead, measures of location and dispersion can be calculated (as long as the data are not categorical) 10/4/2023 By Degemu S (MPH) 101
  • 102. • The average value assumed by a random variable is called its expected value, or the population mean • It is represented by E(X) or µ • To obtain the expected value of a discrete random variable X, we multiply each possible outcome by its associated probability and sum all values with a probability greater than 0 10/4/2023 By Degemu S (MPH) 102
  • 103. • For the diagnostic service data: Mean (X) = 0(0.671) +1(0.229) +2(0.053) +3(0.031) +4(0.010) +5(0.006) = 0.498 ≈ 0.5 • We would expect an average of 0.5 services for each visit 10/4/2023 By Degemu S (MPH) 103
  • 104. The Variance of a Discrete Random Variable • The variance of a random variable X is called the population variance and is represented by Var(X) or σ²; its square root, σ, is the population standard deviation • It quantifies the dispersion of the possible outcomes of X around the expected value μ
  • 105. $\sigma^{2} = \sum_{i} (x_{i}-\mu)^{2}\,P(X = x_{i}) = (0-0.5)^{2}(0.671) + (1-0.5)^{2}(0.229) + (2-0.5)^{2}(0.053) + (3-0.5)^{2}(0.031) + (4-0.5)^{2}(0.010) + (5-0.5)^{2}(0.006) = 0.782$. Standard deviation: $\sigma = \sqrt{0.782} = 0.884$
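The expected value and variance of the diagnostic-services distribution can be recomputed as a check. The dictionary `pmf` is an illustrative container for the probabilities quoted on the slides; the slide rounds the mean to 0.5 before computing the variance, while this sketch uses the unrounded mean, and the results agree to three decimals.

```python
import math

# P(X = x) for the number of diagnostic services, as given on the slides.
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# A valid probability distribution must sum to 1.
assert abs(sum(pmf.values()) - 1.0) < 1e-9

mean = sum(x * p for x, p in pmf.items())                    # E(X) = sum of x * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())  # sum of (x - mu)^2 * P(X = x)
sd = math.sqrt(variance)

print(round(mean, 3), round(variance, 3), round(sd, 3))
```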
  • 106. Binomial and Poisson Distribution
  • 107. 1. Binomial Distribution • It is one of the most widely encountered discrete probability distributions. • It concerns a dichotomous (binary) random variable • It is based on the Bernoulli trial: a single trial of an experiment that can result in only one of two mutually exclusive outcomes (success or failure; dead or alive; sick or well; male or female)
  • 108. Example: • We are interested in determining whether a newborn infant will survive until his/her 70th birthday • Let Y represent the survival status of the child at age 70 years • Y = 1 if the child survives and Y = 0 if he/she does not 10/4/2023 By Degemu S (MPH) 108
  • 109. •The outcomes are mutually exclusive and exhaustive •Suppose that 72% of infants born survive to age 70 years P(Y = 1) = p = 0.72 P(Y = 0) = 1 − p = 0.28 10/4/2023 By Degemu S (MPH) 109
  • 111. A binomial probability distribution occurs when the following requirements are met. 1. The procedure has a fixed number of trials. 2. The trials must be independent. 3. Each trial must have all outcomes that fall into two categories. 4. The probabilities must remain constant for each trial [P(success) = p]. 10/4/2023 By Degemu S (MPH) 111
  • 112. Characteristics of a Binomial Distribution • The experiment consists of n identical trials. • Only two outcomes are possible on each trial. • The probability of A (success), denoted by p, remains the same from trial to trial. The probability of B (failure) is denoted by q, where q = 1 − p. • The trials are independent. • n and p are the parameters of the binomial distribution. • The mean is np and the variance is np(1 − p)
  • 113. • Suppose an event can have only binary outcomes A and B. • Let the probability of A be p and that of B be 1 − p. • The probability p stays the same each time the event occurs.
  • 114. • If an experiment is repeated n times and the outcome is independent from one trial to another, the probability that outcome A occurs exactly x times is: • $P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, x = 0, 1, 2, ..., n.
  • 115. • n denotes the fixed number of trials • x denotes the number of successes in the n trials • p denotes the probability of success • q denotes the probability of failure (1 − p) • $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ represents the number of ways of selecting x objects out of n where the order of selection does not matter • where n! = n(n−1)(n−2)…(1), and 0! = 1
  • 116. Example: • Suppose we know that 40% of a certain population are cigarette smokers. If we take a random sample of 10 people from this population, what is the probability that we will have exactly 4 smokers in our sample? 10/4/2023 By Degemu S (MPH) 116
  • 117. • If the probability that any individual in the population is a smoker is p = 0.40, then the probability of x = 4 smokers among n = 10 subjects selected is: $P(X=4) = \binom{10}{4}(0.4)^{4}(1-0.4)^{10-4} = \binom{10}{4}(0.4)^{4}(0.6)^{6} = 210(0.0256)(0.04666) = 0.25$ • The probability of obtaining exactly 4 smokers in the sample is about 0.25.
  • 118. • We can compute the probability of observing zero smokers out of 10 subjects selected at random, exactly 1 smoker, and so on, and display the results in a table, as given, below. • The third column, P(X ≤ x), gives the cumulative probability. E.g. the probability of selecting 3 or fewer smokers into the sample of 10 subjects is P(X ≤ 3) =.3823, or about 38%. 10/4/2023 By Degemu S (MPH) 118
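The smoker probabilities can be computed for every x from the binomial formula. The helper names `binomial_pmf` and `binomial_cdf` are illustrative; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Cumulative probability P(X <= x), summing the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.4  # sample of 10 people, 40% smokers in the population
print("x   P(X = x)   P(X <= x)")
for x in range(n + 1):
    print(f"{x:<3} {binomial_pmf(x, n, p):.4f}     {binomial_cdf(x, n, p):.4f}")
```

The rows for x = 4 and x = 3 can be compared with the values quoted on the slides (about 0.25 and 0.3823).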
  • 120. The probabilities in the above table can be displayed in the following graph. [Figure: bar chart of probability, from 0 to 0.3, against number of smokers, 0 to 10]
  • 121. Exercise: Each child born to a particular set of parents has a probability of 0.25 of having blood type O. If these parents have 5 children, what is the probability that a. Exactly two of them have blood type O b. At most 2 have blood type O c. At least 4 have blood type O d. 2 do not have blood type O.
  • 122. Solution for ‘a’: $P(X = 2) = \binom{5}{2}(0.25)^{2}(0.75)^{5-2} = 0.2637$
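The slides give the solution only for part ‘a’. The sketch below applies the same binomial formula to the other parts. It assumes that part ‘d’ means exactly 2 of the 5 children do not have blood type O (so exactly 3 do); that reading, and the helper name `binomial_pmf`, are assumptions rather than something the slides state.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 5, 0.25  # 5 children, P(blood type O) = 0.25 for each child

part_a = binomial_pmf(2, n, p)                          # exactly 2 have type O
part_b = sum(binomial_pmf(k, n, p) for k in range(3))   # at most 2: x = 0, 1, 2
part_c = binomial_pmf(4, n, p) + binomial_pmf(5, n, p)  # at least 4: x = 4, 5
part_d = binomial_pmf(3, n, p)                          # assumed reading: exactly 3 have type O

for label, value in zip("abcd", (part_a, part_b, part_c, part_d)):
    print(f"{label}. {value:.4f}")
```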
  • 123. The Mean and Variance of a Binomial Distribution • The observed proportion of successes in n trials is x/n • Once n and p are specified, the mean and variance of the distribution are given by: E(X) = μ = np, σ² = npq, σ = √(npq)
  • 124. Example: • 70% of a certain population has been immunized for polio. If a sample of size 50 is taken, what is the “expected total number”, in the sample who have been immunized? µ = np = 50(.70) = 35 • This tells us that “on the average” we expect to see 35 immunized subjects in a sample of 50 from this population. 10/4/2023 By Degemu S (MPH) 124
  • 125. • If repeated samples of size 10 are selected from the population of infants born, the mean number of children per sample who survive to age 70 would be µ = np = (10)(0.72) = 7.2 • The variance would be npq = (10)(0.72)(0.28) = 2.02 and the SD would be √2.02 = 1.42 10/4/2023 By Degemu S (MPH) 125
  • 126. 2. The Poisson Distribution • Is a discrete probability distribution used to model the number of occurrences of an event that takes place infrequently in time or space • Applicable for counts of events over a given interval of time, for example: • number of patients arriving at an emergency department in a day • number of new cases of HIV diagnosed at a clinic in a month 10/4/2023 By Degemu S (MPH) 126
  • 127. • In such cases, we take a sample of days and observe the number of patients arriving at the emergency department on each day, • or a sample of months and observe the number of new cases of HIV diagnosed at the clinic. • We are observing a count or number of events, rather than a yes/no or success/ failure outcome for each subject or trial, as in the binomial. 10/4/2023 By Degemu S (MPH) 127
  • 128. • In theory, a random variable X is a count that can assume any integer value greater than or equal to 0 10/4/2023 By Degemu S (MPH) 128
  • 129. • Suppose events happen randomly and independently in time at a constant rate. If events happen at a rate of λ events per unit time, the probability of x events happening in unit time is: $P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$
  • 130. • where x = 0, 1, 2, . . ., ∞ • x is a potential outcome of X • The constant λ (lambda) represents the rate at which the event occurs, or the expected number of events per unit time • e = 2.71828 • The distribution depends upon just one parameter, the mean number of occurrences (λ).
  • 131. • Three assumptions must be met for a Poisson distribution to apply: 1. The probability that a single event occurs within a given small subinterval is proportional to the length of the subinterval P(event) ≈ λΔt for constant λ 2. The rate at which the event occurs is constant over the entire interval t 3. Events occurring in consecutive subintervals are independent of each other 10/4/2023 By Degemu S (MPH) 131
  • 132. Example • The daily number of new registrations of cancer is 2.2 on average. What is the probability of a) Getting no new cases b) Getting 1 case c) Getting 2 cases d) Getting 3 cases e) Getting 4 cases 10/4/2023 By Degemu S (MPH) 132
  • 133. Solutions a) $P(X=0) = \frac{(2.2)^{0} e^{-2.2}}{0!} = 0.111$ b) P(X=1) = 0.244 c) P(X=2) = 0.268 d) P(X=3) = 0.197 e) P(X=4) = 0.108
  • 134. [Figure: Poisson distribution with mean 2.2; probability, from 0.0 to 0.3, plotted against x = 0 to 7]
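The five answers for the cancer-registration example follow from the Poisson formula with λ = 2.2. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 2.2  # average number of new cancer registrations per day
probabilities = [poisson_pmf(x, lam) for x in range(5)]
for x, prob in enumerate(probabilities):
    print(f"P(X = {x}) = {prob:.3f}")
```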
  • 135. Example: • In a given geographical area, cases of tetanus are reported at a rate of λ = 4.5/month • What is the probability that 0 cases of tetanus will be reported in a given month? 10/4/2023 By Degemu S (MPH) 135
  • 136. • What is the probability that 1 case of tetanus will be reported? 10/4/2023 By Degemu S (MPH) 136
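The two tetanus questions are left without worked answers in the transcript. A minimal sketch that applies the Poisson formula from the earlier slides with λ = 4.5 per month is shown below; the variable names are illustrative.

```python
from math import exp, factorial

lam = 4.5  # reported tetanus cases per month


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)


p_zero_cases = poisson_pmf(0, lam)  # no cases reported in a given month
p_one_case = poisson_pmf(1, lam)    # exactly one case reported

print(f"P(X = 0) = {p_zero_cases:.4f}")
print(f"P(X = 1) = {p_one_case:.4f}")
```

With a mean of 4.5 cases per month, months with zero or one reported case are expected to be uncommon.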
  • 137. Characteristics • The Poisson distribution is very asymmetric when its mean is small • With large means it becomes nearly symmetric • It has no theoretical maximum value, but the probabilities tail off towards zero very quickly • λ is the parameter of the Poisson distribution • The mean is λ and the variance is also λ.
  • 138. B. Continuous Probability Distributions • A continuous random variable X can take on any value in a specified interval or range • With a large number of class intervals, the frequency polygon begins to resemble a smooth curve. • The probability distribution of X is represented by a smooth curve called a probability density function 10/4/2023 By Degemu S (MPH) 138
  • 139. • The area under the smooth curve is equal to 1 • The area under the curve between any two points x1 and x2 is the probability that X takes a value between x1 and x2 Distribution of serum triglyceride 10/4/2023 By Degemu S (MPH) 139
  • 140. • Instead of assigning probabilities to specific outcomes of the random variable X, probabilities are assigned to ranges of values • The probability associated with any one particular value is equal to 0 • Therefore, P(X=x) = 0 • Also, P(X ≥ x) = P(X > x) 10/4/2023 By Degemu S (MPH) 140
  • 141. • We calculate: Pr [ a < X < b], the probability of an interval of values of X. • For the above reason, • is also without meaning. 10/4/2023 By Degemu S (MPH) 141
  • 142. The Normal distribution • The ND is the most important probability distribution in statistics • Frequently called the “Gaussian distribution” or bell-shape curve. • Variables such as blood pressure, weight, height, serum cholesterol level, and IQ score — are approximately normally distributed 10/4/2023 By Degemu S (MPH) 142
  • 143. A random variable is said to have a normal distribution if it has a probability distribution that is symmetric and bell-shaped 10/4/2023 By Degemu S (MPH) 143
  • 144. • The ND is vital to statistical work: most estimation procedures and hypothesis tests are based on the ND • The concept of “probability of X = x” in the discrete probability distribution is replaced by the “probability density function f(x)” • The ND is also an approximating distribution to other distributions (e.g., binomial)
  • 145. • A random variable X is said to follow the ND if and only if its probability density function is: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$, $-\infty < x < \infty$.
  • 146. π (pi) = 3.14159 e = 2.71828, x = Value of X Range of possible values of X: -∞ to +∞ µ = Expected value of X (“the long run average”) σ2 = Variance of X. µ and σ are the parameters of the normal distribution — they completely define its shape 10/4/2023 By Degemu S (MPH) 146
  • 148. 1. The mean µ tells you about location - • Increase µ - Location shifts right • Decrease µ – Location shifts left • Shape is unchanged 2. The variance σ2 tells you about narrowness or flatness of the bell - • Increase σ2 - Bell flattens. Extreme values are more likely • Decrease σ2 - Bell narrows. Extreme values are less likely • Location is unchanged 10/4/2023 By Degemu S (MPH) 148
  • 149. 10/4/2023 By Degemu S (MPH) 149
  • 150. Properties of the Normal Distribution (ND) 1. It is symmetrical about its mean, μ. 2. The mean, the median and the mode are equal. It is unimodal. 3. The total area under the curve above the x-axis is 1 square unit. 4. The curve never touches the x-axis. 5. As the value of σ increases, the curve becomes more and more flat, and vice versa.
  • 151. 6. Perpendiculars at ±1 SD from the mean contain about 68%, at ±2 SD about 95%, and at ±3 SD about 99.7% of the area under the curve. 7. The distribution is completely determined by the parameters μ and σ.
  • 153. • We have different normal distributions depending on the values of μ and σ². • We cannot tabulate every possible distribution • Tabulated normal probability calculations are available only for the ND with µ = 0 and σ² = 1.
  • 154. Standard Normal Distribution • It is a normal distribution that has a mean equal to 0 and an SD equal to 1, and is denoted by N(0, 1). • The main idea is to standardize all the given data by using Z-scores. • These Z-scores can then be used to find the area (and thus the probability) under the normal curve.
  • 155. The standard normal distribution has mean 0 and variance 1 • Approximately 68% of the area under the standard normal curve lies between ±1, about 95% between ±2, and about 99% between ±2.5 10/4/2023 By Degemu S (MPH) 155
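The areas quoted for the standard normal curve can be reproduced without a printed table. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1


def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 2.5, 3)}
for k, area in areas.items():
    print(f"within ±{k} SD: {area:.4f}")
```

The printed areas should be close to the rounded figures on the slides (about 68%, 95%, 99%, and 99.7%).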
  • 156. Z-Transformation • If a random variable X ~ N(μ, σ), then we can transform it to the standard normal distribution (SND) with the help of the Z-transformation: $Z = \frac{x - \mu}{\sigma}$ • Z represents the Z-score for a given x value
  • 157. • Consider redefining the scale in terms of how many SDs a value lies away from the mean, for a normal distribution with μ = 110 and σ = 15, using (x − μ)/σ = (x − 110)/15:
    Value x:        50   65   80   95  110  125  140  155  170
    SDs from mean:  -4   -3   -2   -1    0    1    2    3    4
  • 158. • This process is known as standardization and gives the position on a normal curve with μ = 0 and σ =1, i.e., the SND, Z. • A Z-score is the number of standard deviations that a given x value is above or below the mean. 10/4/2023 By Degemu S (MPH) 158
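Standardization is a one-line calculation. The sketch below recomputes the SD scale for the μ = 110, σ = 15 example; the function name `z_score` is hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


values = [50, 65, 80, 95, 110, 125, 140, 155, 170]
z_scores = [z_score(x, mu=110, sigma=15) for x in values]

for x, z in zip(values, z_scores):
    print(f"x = {x:>3}  ->  z = {z:+.0f}")
```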
  • 159. Finding normal curve areas 1. The table gives areas between -∞ and the value of z₀. 2. Find the z value in tenths in the column at the left margin and locate its row. Find the hundredths place in the appropriate column. 3. Read the value of the area (P) from the body of the table where the row and column intersect. Values of P are given as a decimal with four places.
  • 160. Some Useful Tips 10/4/2023 By Degemu S (MPH) 160
  • 161. a) What is the probability that z < -1.96? (1) Sketch a normal curve (2) Draw a perpendicular line at z = -1.96 (3) Find the area in the table (4) The answer is the area to the left of the line: P(z < -1.96) = 0.0250
  • 163. b) What is the probability that -1.96 < z < 1.96? The area between the values P(-1.96 < z < 1.96) = .9750 - .0250 = .9500 10/4/2023 By Degemu S (MPH) 163
  • 164. c) What is the probability that z > 1.96? • The answer is the area to the right of the line; found by subtracting table value from 1.0000; P(z > 1.96) =1.0000 - .9750 = .0250 10/4/2023 By Degemu S (MPH) 164
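The three table look-ups for z = ±1.96 can be cross-checked in software. This is a minimal sketch that uses the standard library's cumulative distribution function in place of the printed z table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

p_left = Z.cdf(-1.96)                   # a) area to the left of -1.96
p_between = Z.cdf(1.96) - Z.cdf(-1.96)  # b) area between -1.96 and 1.96
p_right = 1 - Z.cdf(1.96)               # c) area to the right of 1.96

print(f"P(z < -1.96)        = {p_left:.4f}")
print(f"P(-1.96 < z < 1.96) = {p_between:.4f}")
print(f"P(z > 1.96)         = {p_right:.4f}")
```

Because the curve is symmetric, the left and right tail areas are equal, and the three areas sum to 1.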
  • 166. Exercise 1. Compute P(-1 ≤ Z ≤ 1.5) (Ans: 0.7745) 2. Find the area under the SND from 0 to 1.45 (Ans: 0.4265) 3. Compute P(-1.66 < Z < 2.85) (Ans: 0.9493)
  • 167. Applications of the Normal Distribution • The ND is used as a model to study many different variables. • The ND can be used to answer probability questions about continuous random variables. • Following the model of the ND, a given value of x must be converted to a z score before it can be looked up in the z table. 10/4/2023 By Degemu S (MPH) 167
  • 168. Example: • The diastolic blood pressures (DBP) of males 35–44 years of age are normally distributed with µ = 80 mm Hg and σ² = 144 mm Hg², so σ = 12 mm Hg • Therefore, a DBP of 80 + 12 = 92 mm Hg lies 1 SD above the mean • Suppose individuals with a DBP above 95 mm Hg are considered to be hypertensive
  • 169. a. What is the probability that a randomly selected male has a DBP above 95 mm Hg? $Z = \frac{95 - 80}{12} = 1.25$, P(Z > 1.25) = 0.1056 • Approximately 10.6% of this population would be classified as hypertensive.
  • 170. b. What is the probability that a randomly selected male has a DBP above 110 mm Hg? $Z = \frac{110 - 80}{12} = 2.50$, P(Z > 2.50) = 0.0062 • Approximately 0.6% of the population has a DBP above 110 mm Hg
  • 171. c. What is the probability that a randomly selected male has a DBP below 60 mm Hg? $Z = \frac{60 - 80}{12} = -1.67$, P(Z < -1.67) = 0.0475 • Approximately 4.8% of the population has a DBP below 60 mm Hg
  • 172. d. What value of DBP cuts off the upper 5% of this population? • Looking at the table, the value Z = 1.645 cuts off an area of 0.05 in the upper tail • We want the value of X that corresponds to Z = 1.645: $Z = \frac{X - \mu}{\sigma}$, so $1.645 = \frac{X - 80}{12}$ and X = 80 + 1.645 × 12 ≈ 99.7 • Approximately 5% of the men in this population have a DBP greater than 99.7 mm Hg
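The four parts of the DBP example can be recomputed directly from the stated parameters (µ = 80, σ = 12). This is a minimal sketch using the standard library; the variable names are illustrative. Small differences from the table answers are expected because the table method rounds z to two decimals.

```python
from statistics import NormalDist

dbp = NormalDist(mu=80, sigma=12)  # DBP model from the example, in mm Hg

p_above_95 = 1 - dbp.cdf(95)     # a) about 0.106
p_above_110 = 1 - dbp.cdf(110)   # b) about 0.006
p_below_60 = dbp.cdf(60)         # c) about 0.048 (the table gives 0.0475 for z = -1.67)
cutoff_upper_5 = dbp.inv_cdf(0.95)  # d) value with 5% of the area above it, about 99.7

print(f"P(DBP > 95)  = {p_above_95:.4f}")
print(f"P(DBP > 110) = {p_above_110:.4f}")
print(f"P(DBP < 60)  = {p_below_60:.4f}")
print(f"Upper 5% cutoff = {cutoff_upper_5:.1f} mm Hg")
```

`inv_cdf` performs the reverse look-up used in part d: it goes from an area to the corresponding value of X.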
  • 175. • Researchers often use sample survey methodology to obtain information about a larger population by selecting and measuring a sample from that population. • Because the population is usually too large to measure in full, we rely on the information collected from the sample.
  • 176. • Inferences about the population are based on the information from the sample drawn from that population. • However, due to the variability in the characteristics of the population, scientific sample designs should be applied to select a representative sample. • If not, there is a high risk of distorting the view of the population. 10/4/2023 By Degemu S (MPH) 176
  • 177. • A sample is a collection of individuals selected from a larger population. • For example, we may have a single sample composed of 50 cases, representing a population of 1000 individuals. 10/4/2023 By Degemu S (MPH) 177
  • 178. • Sampling enables us to estimate the characteristic of a population by directly observing a portion of the population. • Researchers are not interested in the sample itself, but in what can be learned from the sample—and how this information can be applied to the entire population. 10/4/2023 By Degemu S (MPH) 178
  • 180. • Therefore, it is essential that a sample be correctly defined and organized. • If the wrong questions are posed to the wrong people, reliable information will not be obtained, and applying the results to the entire population will lead to wrong conclusions.
  • 181. Steps needed to select a sample and ensure that this sample will fulfill its goals. 1. Establish the study's objectives • The first step in planning a useful and efficient survey is to specify the objectives with as much detail as possible. • Without objectives, the survey is unlikely to generate valuable results. • Clarifying the aims of the survey is critical to its ultimate success. • The initial users and uses of the data should be identified at this stage. 10/4/2023 By Degemu S (MPH) 181
  • 182. 2. Define the target population • The target population is the total population for which the information is required. • Specifically, the target population is defined by the following characteristics: • Nature of data required • Geographic location • Reference period • Other characteristics, such as socio-demographic characteristics 10/4/2023 By Degemu S (MPH) 182
  • 183. 3. Decide on the data to be collected • The data requirements of the survey must be established. • To ensure that the requirements are operationally sound, the necessary data terms and definitions also need to be determined. 10/4/2023 By Degemu S (MPH) 183
  • 184. 4. Set the level of precision • There is a level of uncertainty associated with estimates coming from a sample. • The sample-to-sample variation is what causes the sampling error. • Researchers can estimate the sampling error associated with a particular sampling plan, and try to minimize it. 10/4/2023 By Degemu S (MPH) 184
  • 185. 5. Decide on the methods of measurement • Choose the measuring instrument and the method of approach to the population • Data about a person’s state of health may be obtained from statements that he/she makes or from a medical examination • The survey may employ a self-administered questionnaire or an interview
  • 186. 6. Prepare the sampling frame • The frame is a list of all members of the population • The elements must not overlap
  • 187. The sample design • Sample design: how the sample will be collected. • Estimation techniques: how the results from the sample will be extended to the whole population. • Measures of precision: how the sampling error will be measured. 10/4/2023 By Degemu S (MPH) 187
  • 188. Other Considerations • Sample size determination • Questionnaire development • Pretest • Organization of the field work • Data collection • Summary and analysis of the data • Edit the completed questionnaires • Decide on computation procedures 10/4/2023 By Degemu S (MPH) 188
  • 189. Sampling theory in public health • A health survey (sampling) is a planned study to investigate the health characteristics of a population 10/4/2023 By Degemu S (MPH) 189
  • 190. A health survey is used to: • Measure the total amount of illness in the population; • Measure the amount of illness caused by a specified disease; • Examine the utilization of existing health care facilities and the demand for new ones; • Measure the distribution of a particular characteristic, e.g., breast-feeding practice in the population; • Examine the role and relationship of one or more factors in the etiology of a disease.
  • 191. Sampling • The process of selecting a portion of the population to represent the entire population. • A main concern in sampling: • Ensure that the sample represents the population, and • The findings can be generalized. 10/4/2023 By Degemu S (MPH) 191
  • 192. Advantages of sampling: • Feasibility: Sampling may be the only feasible method of collecting information. • Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material. • Greater accuracy: Sampling may lead to more accurate data collection • Sampling error: Precise allowance can be made for sampling error • Greater speed: Data can be collected and summarized more quickly
  • 193. Disadvantages of sampling: • There is always a sampling error. • Sampling may create a feeling of discrimination within the population. • Sampling may be inadvisable where every unit in the population is legally required to have a record. 10/4/2023 By Degemu S (MPH) 193
  • 194. Errors in sampling 1) Sampling error: Error that arises because a sample, rather than the whole population, is observed (sample-to-sample variation). • It cannot be avoided or totally eliminated. 2) Non-sampling error: - Observational error - Respondent error - Lack of preciseness of definition - Errors in editing and tabulation of data
  • 195. Random number table • It is a table of numbers constructed by a process such that: 1. In any position in the table, each of the numbers 0 through 9 has a probability of 1/10 of occurring. 2. The occurrence of any number in one part of the table is independent of the occurrence of any number in any other part of the table.
  • 196. Sampling Methods Two broad divisions: A. Probability sampling methods B. Non-probability sampling methods 10/4/2023 By Degemu S (MPH) 196
  • 197. A. Probability sampling • Involves random selection of a sample • A sample is obtained in a way that ensures every member of the population has a known, non-zero probability of being included in the sample. • Involves the selection of a sample from a population, based on chance.
  • 198. • Probability sampling is: • more complex, • more time-consuming and • usually more costly than non-probability sampling. • However, because study samples are randomly selected and their probability of inclusion can be calculated, • reliable estimates can be produced and • inferences can be made about the population. 10/4/2023 By Degemu S (MPH) 198
  • 199. • There are several different ways in which a probability sample can be selected. • The method chosen depends on a number of factors, such as • the available sampling frame, • how spread out the population is, • how costly it is to survey members of the population 10/4/2023 By Degemu S (MPH) 199
  • 200. • When choosing a probability sample design, • Our goal should be to minimize the sampling error of the estimates for the most important survey variables, • While simultaneously minimizing the time and cost of conducting the survey. 10/4/2023 By Degemu S (MPH) 200
  • 201. Most common probability sampling methods 1. Simple random sampling 2. Systematic random sampling 3. Sampling with probability proportional to size 4. Stratified random sampling 5. Cluster sampling 6. Multi-stage sampling 10/4/2023 By Degemu S (MPH) 201
  • 202. 1. Simple random sampling • Involves random selection • Each member of a population has an equal chance of being included in the sample. 10/4/2023 By Degemu S (MPH) 202
  • 203. • To use a SRS method: • Make a numbered list of all the units in the population • Each unit should be numbered from 1 to N (where N is the size of the population) • Select the required number. 10/4/2023 By Degemu S (MPH) 203
  • 204. • The randomness of the sample is ensured by: • use of “lottery” methods • a table of random numbers
  • 205. Example • Suppose your school has 500 students and you need to conduct a short survey on the quality of the food served in the cafeteria. • You decide that a sample of 10 students should be sufficient for your purposes. • In order to get your sample, you assign a number from 1 to 500 to each student in your school. 10/4/2023 By Degemu S (MPH) 205
  • 206. • To select the sample, you use a table of randomly generated numbers. • Pick a starting point in the table (a row and column number) and look at the random numbers that appear there. In this case, since the data run into three digits, the random numbers would need to contain three digits as well. 10/4/2023 By Degemu S (MPH) 206
  • 207. • Ignore all random numbers greater than 500 (and 000) because they do not correspond to any of the students in the school. • Remember that the sampling is without replacement, so if a number recurs, skip over it and use the next random number. • The first 10 different numbers between 001 and 500 make up your sample.
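The cafeteria example can be reproduced with a pseudo-random number generator in place of a printed random number table. This is a minimal sketch; the seed value is an arbitrary assumption, used only so the draw can be repeated.

```python
import random

rng = random.Random(2023)  # arbitrary seed (an assumption) for a repeatable draw

student_ids = range(1, 501)  # students numbered 1 to N = 500
sample_ids = sorted(rng.sample(student_ids, k=10))  # 10 students, without replacement

print(sample_ids)
```

`random.sample` draws without replacement, so it handles the "skip repeated numbers" rule automatically, and every student has the same chance of selection.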
  • 208. • SRS has certain limitations: • Requires a sampling frame. • Difficult if the reference population is dispersed. • Minority subgroups of interest may not be selected. 10/4/2023 By Degemu S (MPH) 208
  • 209. 2. Systematic random sampling • Sometimes called interval sampling, systematic sampling means that there is a gap, or interval, between each selected unit in the sample • After a random start, the selection is systematic rather than random
  • 210. • Useful if the reference population is arranged in some order: • Order of registration of patients • Numerical order of house numbers • Students’ registration books • Individuals are taken at fixed intervals (every kth) based on the sampling fraction, e.g., if the sample includes 20% of the population, then every fifth individual is taken.
  • 211. Steps in systematic random sampling 1. Number the units on your frame from 1 to N (where N is the total population size). 2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size. 10/4/2023 By Degemu S (MPH) 211
  • 212. 3. Select a number between one and K at random. This number is called the random start and would be the first number included in your sample. 4. Select every Kth unit after that first number Note: Systematic sampling should not be used when a cyclic repetition is inherent in the sampling frame. 10/4/2023 By Degemu S (MPH) 212
  • 213. Example • To select a sample of 100 from a population of 400, you would need a sampling interval of 400 ÷ 100 = 4. • Therefore, K = 4. • You will need to select one unit out of every four units to end up with a total of 100 units in your sample. • Select a number between 1 and 4 from a table of random numbers. 10/4/2023 By Degemu S (MPH) 213
  • 214. • If you choose 3, the third unit on your frame would be the first unit included in your sample; • The sample might consist of the following units to make up a sample of 100: 3 (the random start), 7, 11, 15, 19...395, 399 (up to N, which is 400 in this case). 10/4/2023 By Degemu S (MPH) 214
  • 215. • Using the above example, you can see that with a systematic sample approach there are only four possible samples that can be selected, corresponding to the four possible random starts: A. 1, 5, 9, 13...393, 397 B. 2, 6, 10, 14...394, 398 C. 3, 7, 11, 15...395, 399 D. 4, 8, 12, 16...396, 400 10/4/2023 By Degemu S (MPH) 215
  • 216. • Each member of the population belongs to only one of the four samples and each sample has the same chance of being selected. • The main difference is that with SRS, any combination of 100 units would have a chance of making up the sample, while with systematic sampling, there are only four possible samples.
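The systematic sampling steps can be sketched in a few lines. The function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n, as in the 400/100 example.

```python
import random


def systematic_sample(N: int, n: int, start=None, rng=random) -> list:
    """Take every K-th unit from a frame numbered 1..N, with K = N // n."""
    k = N // n  # sampling interval K
    if start is None:
        start = rng.randint(1, k)  # random start between 1 and K
    return list(range(start, N + 1, k))[:n]


sample = systematic_sample(N=400, n=100, start=3)
print(sample[:5], "...", sample[-2:])
# prints [3, 7, 11, 15, 19] ... [395, 399]
```

Calling the function with starts 1, 2, 3 and 4 produces the only four possible samples, and together they cover every unit from 1 to 400 exactly once.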
  • 217. 3. Sampling with probability proportional to size • Probability sampling requires that each member of the survey population has a chance of being included in the sample, but it does not require that this chance be the same for everyone. 10/4/2023 By Degemu S (MPH) 217
  • 218. • If information is available on the frame about the size of each unit and if those units vary in size, this information can be used in the sampling selection in order to increase the efficiency. • This is known as sampling with probability proportional to size (PPS). 10/4/2023 By Degemu S (MPH) 218
  • 219. • With this method, the bigger the size of the unit, the higher the chance it has of being included in the sample. • For this method to achieve increased efficiency, the measure of size needs to be accurate. 10/4/2023 By Degemu S (MPH) 219
  • 220. Steps in PPS • List all kebeles/clusters with their population sizes • Calculate the cumulative population size down the list • Calculate the sampling interval K by dividing the total population size by the number of clusters to be sampled, n • Randomly choose a number j between 1 and K • The kebeles/clusters whose cumulative population ranges contain the jth, (j+K)th, (j+2K)th, …, (j+(n−1)K)th units are included in the sample
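The PPS steps can be illustrated with a small sketch of systematic PPS selection. The kebele names and population sizes below are made up for illustration, and the function name `pps_systematic` is hypothetical. Selection points are taken at the random start j and then at every interval K after it.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(sizes: dict, n_clusters: int, start=None, rng=random) -> list:
    """Select clusters with probability proportional to size (systematic PPS)."""
    names = list(sizes)
    cumulative = list(accumulate(sizes[name] for name in names))
    interval = cumulative[-1] // n_clusters  # sampling interval K
    if start is None:
        start = rng.randint(1, interval)     # random start j between 1 and K
    points = [start + i * interval for i in range(n_clusters)]
    # Each point falls in the first cluster whose cumulative size reaches it
    return [names[bisect_left(cumulative, p)] for p in points]


# Hypothetical frame: kebele names and population sizes are illustrative only
kebele_sizes = {
    "Kebele A": 1200, "Kebele B": 800, "Kebele C": 2500,
    "Kebele D": 500, "Kebele E": 3000, "Kebele F": 2000,
}
selected = pps_systematic(kebele_sizes, n_clusters=4, start=1700)
print(selected)
# prints ['Kebele B', 'Kebele C', 'Kebele E', 'Kebele F']
```

With a total of 10,000 and K = 2,500, the selection points are 1,700, 4,200, 6,700 and 9,200. A cluster larger than K can be hit by more than one point, which is how larger units get a higher chance of selection.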
  • 221. 4. Stratified random sampling • It is done when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification • Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata • A population can be stratified by any variable that is available for all units prior to sampling (e.g., age, sex, province of residence, income, etc.).
  • 222. • A separate sample is taken independently from each stratum. • Any of the sampling methods mentioned in this section (and others that exist) can be used to sample within each stratum. 10/4/2023 By Degemu S (MPH) 222
  • 223. Why do we need to create strata? • Stratification can make the sampling strategy more efficient. • A larger sample is required to get a more accurate estimate if a characteristic varies greatly from one unit to another. • For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.
  • 224. • This is the idea behind the efficiency gain obtained with stratification. • If you create strata within which units share similar characteristics (e.g., income) and are considerably different from units in other strata (e.g., occupation, type of dwelling) then you would only need a small sample from each stratum to get a precise estimate of total income for that stratum. 10/4/2023 By Degemu S (MPH) 224
  • 225. • Then you could combine these estimates to get a precise estimate of total income for the whole population. • If you use a SRS approach in the whole population without stratification, the sample would need to be larger than the total of all stratum samples to get an estimate of total income with the same level of precision. 10/4/2023 By Degemu S (MPH) 225
  • 226. • Stratified sampling ensures an adequate sample size for subgroups in the population of interest. • When a population is stratified, each stratum becomes an independent population and you will need to decide the sample size for each stratum.
  • 227. • Equal allocation: allocate an equal sample size to each stratum • Proportionate allocation: $n_j = \frac{n}{N} N_j$, j = 1, 2, ..., k, where k is the number of strata and • $n_j$ is the sample size of the jth stratum • $N_j$ is the population size of the jth stratum • $n = n_1 + n_2 + ... + n_k$ is the total sample size • $N = N_1 + N_2 + ... + N_k$ is the total population size
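Proportionate allocation is simple arithmetic: each stratum receives the share n × N_j / N of the total sample. This is a minimal sketch; the stratum names and sizes are hypothetical and were chosen so that the shares come out as whole numbers.

```python
def proportionate_allocation(stratum_sizes: dict, n: int) -> dict:
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())  # total population size
    return {name: round(n * N_j / N) for name, N_j in stratum_sizes.items()}


# Hypothetical strata (illustrative numbers only)
strata = {"urban": 500, "semi-urban": 300, "rural": 200}
allocation = proportionate_allocation(strata, n=100)
print(allocation)
# prints {'urban': 50, 'semi-urban': 30, 'rural': 20}
```

When the shares are not whole numbers, rounding can make the allocated sizes add up to slightly more or less than n, so the totals should be checked and adjusted by hand.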
  • 228. 5. Cluster sampling • Sometimes it is too expensive to spread a sample across the population as a whole. • Travel costs can become expensive if interviewers have to survey people from one end of the country to the other. • To reduce costs, researchers may choose a cluster sampling technique • The clusters should be similar to one another, each ideally reflecting the variety in the population, unlike stratified sampling, where the strata are internally homogeneous but differ from one another
  • 229. Steps in cluster sampling • Cluster sampling divides the population into groups or clusters. • A number of clusters are selected randomly to represent the total population, and then all units within selected clusters are included in the sample. • No units from non-selected clusters are included in the sample—they are represented by those from selected clusters. • This differs from stratified sampling, where some units are selected from each group. 10/4/2023 By Degemu S (MPH) 229
  • 230. Example • In a school-based study, we assume students of the same school are homogeneous. • We can randomly select sections and include all students of the selected sections only
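The school example can be sketched as a one-stage cluster sample: sections are the clusters, a few are drawn at random, and every student in a chosen section is included. The section names, sizes and seed below are assumptions for illustration only.

```python
import random

rng = random.Random(7)  # arbitrary seed (an assumption) for a repeatable draw

# Hypothetical frame of clusters: section name -> list of students
section_sizes = {"Section A": 30, "Section B": 28, "Section C": 35, "Section D": 32}
sections = {
    name: [f"{name} / student {i}" for i in range(1, size + 1)]
    for name, size in section_sizes.items()
}

# Randomly select clusters, then take ALL units inside each selected cluster
chosen_sections = rng.sample(list(sections), k=2)
cluster_sample = [student for name in chosen_sections for student in sections[name]]

print(chosen_sections, len(cluster_sample))
```

Only a list of sections is needed to draw the sample, and the final sample size depends on which sections happen to be chosen.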
  • 231. • As mentioned, cost reduction is a reason for using cluster sampling. • It creates 'pockets' of sampled units instead of spreading the sample over the whole territory. • Another reason is that sometimes a list of all units in the population is not available, while a list of all clusters is either available or easy to create. 10/4/2023 By Degemu S (MPH) 231
  • 232. • In most cases, the main drawback is a loss of efficiency when compared with SRS. • It is usually better to survey a large number of small clusters instead of a small number of large clusters. • This is because neighboring units tend to be more alike, resulting in a sample that does not represent the whole spectrum of opinions or situations present in the overall population. 10/4/2023 By Degemu S (MPH) 232
  • 233. • Another drawback of cluster sampling is that you do not have total control over the final sample size. • For example, not all schools have the same number of (say) Grade 11 students, and city blocks do not all have the same number of households. Because you must interview every student or household in the selected clusters, the final sample size may be larger or smaller than you expected.
  • 234. 6. Multi-stage sampling • Similar to the cluster sampling, except that it involves picking a sample from within each chosen cluster, rather than including all units in the cluster. • This type of sampling requires at least two stages. 10/4/2023 By Degemu S (MPH) 234
  • 235. • In the first stage, large groups or clusters are identified and selected. These clusters contain more population units than are needed for the final sample. • In the second stage, population units are picked from within the selected clusters (using any of the possible probability sampling methods) for a final sample. 10/4/2023 By Degemu S (MPH) 235
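The two stages can be sketched as follows. The cluster labels, cluster sizes, numbers selected at each stage and the seed are all illustrative assumptions; simple random sampling is used at both stages, although any probability method could be used in the second stage.

```python
import random

rng = random.Random(11)  # arbitrary seed (an assumption) for a repeatable draw

# Hypothetical frame: 6 clusters of 200 units each
clusters = {
    f"Cluster {c}": [f"{c}-{i:03d}" for i in range(1, 201)]
    for c in "ABCDEF"
}

# Stage 1: randomly select clusters
stage1 = rng.sample(list(clusters), k=3)

# Stage 2: randomly select units within each selected cluster
stage2 = {name: rng.sample(clusters[name], k=20) for name in stage1}

final_sample = [unit for units in stage2.values() for unit in units]
print(stage1, len(final_sample))
```

Unlike one-stage cluster sampling, the final sample size is fixed in advance here (3 clusters × 20 units = 60).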
  • 236. • If more than two stages are used, the process of choosing population units within clusters continues until there is a final sample. • With multi-stage sampling, you still have the benefit of a more concentrated sample for cost reduction. • However, the sample is not as concentrated as in cluster sampling, and the required sample size is still larger than for a simple random sample.
  • 237. • Also, you do not need to have a list of all of the units in the population. All you need is a list of clusters and list of the units in the selected clusters. • Admittedly, more information is needed in this type of sample than what is required in cluster sampling. However, multi-stage sampling still saves a great amount of time and effort by not having to create a list of all the units in a population. 10/4/2023 By Degemu S (MPH) 237
  • 238. B. Non-probability sampling • The difference between probability and non-probability sampling has to do with a basic assumption about the nature of the population under study. • In probability sampling, every item has a known chance of being selected. • In non-probability sampling, there is an assumption that there is an even distribution of a characteristic of interest within the population. 10/4/2023 By Degemu S (MPH) 238
  • 239. • This is what makes the researcher believe that any sample would be representative and that, because of this, the results will be accurate. • For probability sampling, randomness is a feature of the selection process, rather than an assumption about the structure of the population.
  • 240. • In non-probability sampling, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample. • Also, no assurance is given that each item has a chance of being included, making it impossible either to estimate sampling variability or to identify possible bias 10/4/2023 By Degemu S (MPH) 240
  • 241. • Reliability cannot be measured in non-probability sampling; the only way to address data quality is to compare some of the survey results with available information about the population. • Still, there is no assurance that the estimates will meet an acceptable level of error. • Researchers are reluctant to use these methods because there is no way to measure the precision of the resulting sample. 10/4/2023 By Degemu S (MPH) 241
  • 242. • Despite these drawbacks, non-probability sampling methods can be useful when descriptive comments about the sample itself are desired. • Secondly, they are quick, inexpensive and convenient. • There are also other circumstances, in some kinds of research, when it is infeasible or impractical to conduct probability sampling.
  • 243. The most common types of non-probability sampling 1. Convenience or haphazard sampling 2. Volunteer sampling 3. Judgment sampling 4. Quota sampling 5. Snowball sampling technique 10/4/2023 By Degemu S (MPH) 243
  • 244. 1. Convenience or haphazard sampling • Convenience sampling is sometimes referred to as haphazard or accidental sampling. • It is not normally representative of the target population because sample units are only selected if they can be accessed easily and conveniently. 10/4/2023 By Degemu S (MPH) 244
  • 245. • The obvious advantage is that the method is easy to use, but that advantage is greatly offset by the presence of bias. • Although useful applications of the technique are limited, it can deliver accurate results when the population is homogeneous. 10/4/2023 By Degemu S (MPH) 245
  • 246. • For example, a scientist could use this method to determine whether a lake is polluted or not. • Assuming that the lake water is well-mixed, any sample would yield similar information. • A scientist could safely draw water anywhere on the lake without bothering about whether or not the sample is representative 10/4/2023 By Degemu S (MPH) 246
  • 247. 2. Volunteer sampling • As the term implies, this type of sampling occurs when people volunteer to be involved in the study. • In psychological experiments or pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public. • In these instances, the sample is taken from a group of volunteers. 10/4/2023 By Degemu S (MPH) 247
  • 248. • Sometimes, the researcher offers payment to attract respondents. • In exchange, the volunteers accept the possibility of a lengthy, demanding or sometimes unpleasant process. 10/4/2023 By Degemu S (MPH) 248
  • 249. • Sampling voluntary participants as opposed to the general population may introduce strong biases. • Often in opinion polling, only the people who care strongly enough about the subject tend to respond. • The silent majority does not typically respond, resulting in large selection bias. 10/4/2023 By Degemu S (MPH) 249
  • 250. 3. Judgment sampling • This approach is used when a sample is taken based on certain judgments about the overall population. • The underlying assumption is that the investigator will select units that are characteristic of the population. • The critical issue here is objectivity: how much can judgment be relied upon to arrive at a typical sample? 10/4/2023 By Degemu S (MPH) 250
  • 251. • Judgment sampling is subject to the researcher's biases and is perhaps even more biased than haphazard sampling. • Since any preconceptions the researcher may have are reflected in the sample, large biases can be introduced if these preconceptions are inaccurate. 10/4/2023 By Degemu S (MPH) 251
  • 252. • Researchers often use this method in exploratory studies like pre-testing of questionnaires and focus groups. • They also prefer to use this method in laboratory settings where the choice of experimental subjects (i.e., animal, human) reflects the investigator's pre-existing beliefs about the population. 10/4/2023 By Degemu S (MPH) 252
  • 253. • One advantage of judgment sampling is the reduced cost and time involved in acquiring the sample. 10/4/2023 By Degemu S (MPH) 253
  • 254. 4. Quota sampling • This is one of the most common forms of non-probability sampling. • Sampling is done until a specific number of units (quotas) for various sub-populations have been selected.
  • 255. • Since there are no rules as to how these quotas are to be filled, quota sampling is really a means for satisfying sample size objectives for certain sub-populations.
  • 256. • As with all other non-probability sampling methods, in order to make inferences about the population, it is necessary to assume that persons selected are similar to those not selected. • Such strong assumptions are rarely valid.
  • 257. • The main argument against quota sampling is that it does not meet the basic requirement of randomness. • Some units may have no chance of selection or the chance of selection may be unknown. • Therefore, the sample may be biased.
  • 258. • Quota sampling is generally less expensive than random sampling. • It is also easy to administer, especially since the tasks of listing the whole population, randomly selecting the sample and following up on non-respondents can be omitted from the procedure.
  • 259. • Quota sampling is an effective sampling method when information is urgently required, and it can be carried out without a sampling frame. • In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method.
  • 260. 5. Snowball sampling • A technique for selecting a research sample where existing study subjects recruit future subjects from among their acquaintances. • Thus the sample group appears to grow like a rolling snowball.
  • 261. • This sampling technique is often used in hidden populations which are difficult for researchers to access, for example drug users or commercial sex workers. • Because sample members are not selected from a sampling frame, snowball samples are subject to numerous biases. For example, people who have many friends are more likely to be recruited into the sample.
  • 262. Sampling Distributions
  • 263. • A sampling distribution is a distribution of all possible values of a statistic computed from samples of the same size randomly selected from the same population. • Serves to answer probability questions about sample statistics.
  • 264. • When sampling a discrete, finite population, a sampling distribution can be constructed. • However, this construction is difficult with a large population and impossible with an infinite population.
  • 265. • We consider sample statistics as random variables. Example: • Age of individuals is a random variable. • Similarly, mean age is a random variable.
  • 266. • Conclusions about the values of population parameters cannot be drawn from one individual value. • They should be based on sample statistics computed from a sample of adequate size.
  • 267. • Take a sample and calculate the statistic, e.g., the mean. • Take another sample (same size) and calculate the mean. • Repeat again and again. • Do you expect all the sample means to be the same? NO • They will vary, BUT with less variation than the individual observations. • Put all these sample statistics together to get a distribution of sample statistics.
  • 268. Construction of sampling distributions 1. From a population of size N, randomly draw all possible samples of size n. 2. Compute the statistic of interest for each sample. 3. Create a frequency distribution of the statistic.
  • 269. Main types of sampling distributions A. Distribution of the sample mean B. Distribution of the difference between two means C. Distribution of the sample proportion D. Distribution of the difference between two proportions
  • 270. A. Sampling distribution of sample mean • Suppose we have a population of size N = 4, constituting the ages of four outpatients. x, Age (years): 18, 20, 22, 24 • $\mu = \frac{\sum x_i}{N} = \frac{18 + 20 + 22 + 24}{4} = 21$ • $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = 2.236$
  • 271. Now consider all possible samples of size n = 2 • 16 possible samples (sampling with replacement):
      1st obs \ 2nd obs    18      20      22      24
      18                   18,18   18,20   18,22   18,24
      20                   20,18   20,20   20,22   20,24
      22                   22,18   22,20   22,22   22,24
      24                   24,18   24,20   24,22   24,24
    • 16 sample means:
      1st obs \ 2nd obs    18      20      22      24
      18                   18      19      20      21
      20                   19      20      21      22
      22                   20      21      22      23
      24                   21      22      23      24
  • 272. Frequency distribution of the 16 sample means:
      Sample mean    Freq    P(x̄)
      18             1       0.0625
      19             2       0.1250
      20             3       0.1875
      21             4       0.2500
      22             3       0.1875
      23             2       0.1250
      24             1       0.0625
  • 273. Sampling distribution of all sample means • [Figure: bar chart of the distribution of the 16 sample means; x-axis: sample mean $\bar{x}$ (18 to 24), y-axis: $P(\bar{x})$ (0 to 0.3)]
  • 274. Summary measures of this sampling distribution: Add the 16 sample means and divide by 16. Also calculate the SD of the sample means (here N = 16 is the number of sample means). • $\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N} = \frac{18 + 19 + 21 + \cdots + 24}{16} = 21$ • $\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N}} = \sqrt{\frac{(18 - 21)^2 + (19 - 21)^2 + \cdots + (24 - 21)^2}{16}} = 1.58$
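  • 275. Comparing the population with its sampling distribution • Population (N = 4): $\mu = 21$, $\sigma = 2.236$ • Distribution of sample means (n = 2): $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 1.58$ • [Figure: two bar charts side by side, the population distribution (x = 18, 20, 22, 24) and the distribution of sample means ($\bar{x}$ = 18 to 24); y-axis: probability (0 to 0.3)]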
  • 276. • We note that the mean of the sampling distribution of $\bar{x}$ has the same value as the mean of the original population. • However, the variance of the sampling distribution is not equal to the original population variance; it is equal to the population variance divided by the sample size used to obtain the sampling distribution.
  • 277. • The square root of the sampling distribution variance is called the standard error of the mean or, simply, standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ • More generally, the standard deviation of any sample statistic is called its standard error.
  • 278. • SE is determined by both the sample size and the degree of variability among the individual observations • SD quantifies the amount of variability among individuals in a population, while • SE quantifies the variability among means of repeated samples drawn from that population • The SE is always smaller than the SD (except when n = 1)
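The four-outpatient example above can be checked by brute force. The following is a minimal Python sketch, assuming sampling with replacement as on the slides; the helper names `pop_mean` and `pop_sd` are illustrative and not part of the original material. It enumerates all 16 samples of size 2, tabulates the sample means, and compares their standard deviation with σ/√n.

```python
from collections import Counter
from itertools import product
from math import sqrt

ages = [18, 20, 22, 24]  # population of N = 4 outpatient ages (from the slides)
n = 2                    # sample size


def pop_mean(values):
    """Mean of a complete set of values (divide by the count)."""
    return sum(values) / len(values)


def pop_sd(values):
    """Population standard deviation (divide by the count, not count - 1)."""
    m = pop_mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


# All ordered samples of size n drawn with replacement: 4 ** 2 = 16 samples.
samples = list(product(ages, repeat=n))
sample_means = [sum(s) / n for s in samples]
freq = Counter(sample_means)

mu, sigma = pop_mean(ages), pop_sd(ages)
mu_xbar, se = pop_mean(sample_means), pop_sd(sample_means)

for mean in sorted(freq):
    print(f"{mean:5.1f}  freq={freq[mean]}  p={freq[mean] / len(samples):.4f}")
print(f"mu={mu}, sigma={sigma:.3f}")
print(f"mu_xbar={mu_xbar}, SE={se:.3f}, sigma/sqrt(n)={sigma / sqrt(n):.3f}")
```

The last line shows the standard deviation of the 16 sample means and σ/√n side by side; both are about 1.58 for this population.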
  • 279. Sampling Error • Sample statistics are used to estimate population parameters, e.g., $\bar{x}$ is an estimate of the population mean, μ • Problems: • Different samples provide different estimates of the population parameter • Sample results have potential variability, thus sampling error exists
  • 280. Calculating sampling error • Sampling error: The difference between a value (a statistic) computed from a sample and the corresponding value (a parameter) computed from a population • Example (for the mean): Sampling error $= \bar{x} - \mu$, where $\bar{x}$ = sample mean and μ = population mean
  • 281. Example • If the population mean is μ = 98.6 degrees and a sample of n = 5 temperatures yields a sample mean of $\bar{x}$ = 99.2 degrees, then the sampling error is: $\bar{x} - \mu$ = 99.2 − 98.6 = 0.6 degrees
  • 282. Note: • The sampling error may be positive or negative ($\bar{x}$ may be greater than or less than μ) • The expected size of the sampling error decreases as the sample size increases
  • 283. Properties of sampling distribution of mean A. Sampling from normally distributed populations. a. If a population is normal with mean μ and standard deviation σ, the sampling distribution of $\bar{x}$ is also normally distributed with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  • 284. b. The mean, $\mu_{\bar{x}}$, of the distribution of the sample mean is equal to the mean of the population from which the samples were drawn c. The variance of the distribution of the sample mean is equal to the variance of the population divided by the sample size
  • 285. B. Sampling from non-normally distributed populations • When the sampling is done from a non-normally distributed population, the central limit theorem is used. • The larger the sample size, the better will be the normal approximation to the sampling distribution of the mean.
  • 286. • We can apply the Central Limit Theorem: • Even if the population is not normal, • …sample means from the population will be approximately normal as long as the sample size is large enough • …and the sampling distribution will have $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
  • 287. As the sample size gets large enough (n↑), the sampling distribution of $\bar{x}$ becomes almost normal regardless of the shape of the population.
  • 288. If the population is not normal, sampling distribution properties: • Central tendency: $\mu_{\bar{x}} = \mu$ • Variation: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ • [Figure: a non-normal population distribution and the sampling distributions of $\bar{x}$ for a smaller and a larger sample size; the sampling distribution becomes normal as n increases]
  • 289. Below is a graph of results from a sampling activity. Samples were taken at increasing sizes, from 4 cases to 98 cases. You can see that as sample size increases, not only do the sample means become closer to the population mean, but fluctuations in sample means also become smaller.
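The central limit theorem behaviour described above can be explored with a small simulation. This is only a sketch under stated assumptions: the skewed population is an exponential distribution with rate 1 (mean 1, SD 1), chosen here for illustration and not taken from the slides, and the seed, the number of replicates and the function name `simulate_sample_means` are likewise arbitrary.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(2023)  # fixed seed so the run is reproducible
POP_MEAN = 1.0             # an exponential(rate=1) population has mean 1
POP_SD = 1.0               # ...and standard deviation 1
REPLICATES = 5000          # number of repeated samples per sample size


def simulate_sample_means(n, replicates=REPLICATES):
    """Draw `replicates` samples of size n and return their sample means."""
    return [
        fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(replicates)
    ]


results = {}
for n in (4, 30, 100):
    means = simulate_sample_means(n)
    # (mean of sample means, SD of sample means, theoretical sigma / sqrt(n))
    results[n] = (fmean(means), pstdev(means), POP_SD / sqrt(n))
    print(f"n={n:3d}  mean of means={results[n][0]:.3f}  "
          f"SD of means={results[n][1]:.3f}  sigma/sqrt(n)={results[n][2]:.3f}")
```

The mean of the sample means should stay close to the population mean for every n, while the spread of the sample means shrinks roughly like σ/√n as n grows.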
  • 290. • Generally, as n increases, the sample mean $\bar{x}$ and sample variance $S^2$ approach the values of the true population parameters µ and $\sigma^2$, respectively. • The average of the sample means based on repeated samples of size n approaches the population mean µ as the number of samples selected gets large: $E(\bar{x}) = \mu$ • The estimator $\bar{x}$ is therefore said to be unbiased
  • 291. How large is large enough? • For most distributions, n > 30 will give a sampling distribution that is nearly normal • For fairly symmetric distributions, n > 15 • For normal population distributions, the sampling distribution of the mean is always normally distributed. • However, the general answer depends on the shape of the distribution of the sampled population.
  • 293. Applications of the sampling distributions of sample mean • Helps in computing the probability of obtaining a sample with a mean of some specified magnitude.
  • 294. z-value for the sampling distribution of $\bar{x}$: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ where: $\bar{x}$ = sample mean, μ = population mean, σ = population standard deviation, n = sample size
  • 295. Finite Population Correction • Apply the Finite Population Correction if: • the sample is large relative to the population (n/N > 5%) and… • Sampling is without replacement • Then $z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}}$
  • 296. • When the population is much larger than the sample, the difference between $\sigma^2/n$ and $(\sigma^2/n)[(N-n)/(N-1)]$ will be negligible. • Example: N = 10,000; n = 25 • Finite Population Correction = (N − n)/(N − 1) = (10,000 − 25)/(10,000 − 1) = 0.9976 ≈ 1
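The finite population correction above can be written as a small helper. This is a hedged sketch: the function names `fpc_factor` and `standard_error` and the 5% threshold argument are illustrative, and the second example (N = 200, n = 50, σ = 10) uses hypothetical numbers that do not come from the slides.

```python
from math import sqrt


def fpc_factor(N, n):
    """Finite population correction applied to the variance: (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


def standard_error(sigma, n, N=None, threshold=0.05):
    """Standard error of the mean.

    The correction is applied only when a population size N is given and the
    sampling fraction n / N exceeds the threshold (5% on the slides). It is
    meant for sampling without replacement.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N > threshold:
        se *= sqrt(fpc_factor(N, n))
    return se


# Slide example: N = 10,000 and n = 25 give a factor very close to 1.
factor = fpc_factor(10_000, 25)
print(round(factor, 4))

# Hypothetical example: n / N = 25%, so the correction matters.
se_uncorrected = standard_error(10, 50)
se_corrected = standard_error(10, 50, N=200)
print(se_uncorrected, se_corrected)
```

Note that the factor (N − n)/(N − 1) multiplies the variance, so the standard error is multiplied by its square root.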
  • 297. Example 1 • Given: μ = 50, σ = 16, n = 64 Find: $P(\bar{x} > 53)$ Solution 1. Write the given information: μ = 50, σ = 16, n = 64 2. Sketch a normal curve
  • 298. 3. Convert $\bar{x}$ to a z score: $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$ 4. Find the appropriate value(s) in the table: the area of the SND above z = 1.5 is 0.0668, so P(z > 1.5) = 0.0668 5. Complete the answer: the probability that $\bar{x}$ is greater than 53 is 0.0668.
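Example 1 can be reproduced without a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) in place of the table lookup; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 16, 64   # values given in Example 1
x_bar = 53                  # sample mean of interest

se = sigma / sqrt(n)        # standard error of the mean
z = (x_bar - mu) / se       # z score of the sample mean

# Upper-tail area of the standard normal distribution above z.
p_upper = 1 - NormalDist().cdf(z)
print(se, z, round(p_upper, 4))  # prints: 2.0 1.5 0.0668
```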
  • 299. Example 2 • Suppose a population has mean μ = 8 and standard deviation σ = 3. Suppose a random sample of size n = 36 is selected. • What is the probability that the sample mean is between 7.8 and 8.2?
  • 300. Solution: • Even if the population is not normally distributed, the central limit theorem can be used (n > 30) • … so the sampling distribution of $\bar{x}$ is approximately normal • … with mean $\mu_{\bar{x}} = 8$ • …and standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
  • 301. $P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8 - 8}{3/\sqrt{36}} < \frac{\bar{x} - \mu_{\bar{x}}}{\sigma/\sqrt{n}} < \frac{8.2 - 8}{3/\sqrt{36}}\right) = P(-0.4 < z < 0.4) = 0.1554 + 0.1554 = 0.3108$ • [Figure: population distribution (μ = 8), sampled to give the sampling distribution of $\bar{x}$ ($\mu_{\bar{x}} = 8$, interval 7.8 to 8.2), standardized to the standard normal distribution ($\mu_z = 0$, interval −0.4 to 0.4)]
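The interval probability in Example 2 can be checked directly on the sampling distribution, without standardizing first. This is a minimal sketch that assumes the normal approximation from the central limit theorem holds; `sampling_dist` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8, 3, 36     # values given in Example 2
se = sigma / sqrt(n)        # standard error: 3 / 6 = 0.5

# Approximate sampling distribution of the sample mean (CLT assumption).
sampling_dist = NormalDist(mu=mu, sigma=se)

# P(7.8 < x_bar < 8.2) as a difference of two cumulative probabilities.
p_between = sampling_dist.cdf(8.2) - sampling_dist.cdf(7.8)
print(se, round(p_between, 4))  # prints: 0.5 0.3108
```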
  • 302. Example 3 • The distribution of serum cholesterol levels for all 20-70 year-old males has mean µ = 211 mg/100 ml and SD σ = 46 mg/100 ml. a. If a sample of size 25 is selected from this population, what is the probability that the sample has a mean of 230 mg/100 ml or above? • Since $\bar{x}$ has an approximately normal distribution with mean 211 and standard error $\sigma/\sqrt{n} = 46/\sqrt{25} = 9.2$,
  • 303. $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{230 - 211}{9.2} = 2.07$
  • 304. • The area under the standard normal curve to the right of z = 2.07 is 0.0192 • Consequently, the probability that a sample of size 25 has a mean of 230 mg/100 ml or higher is about 0.019.
  • 305. b. What mean value of serum cholesterol level cuts off the lower 10% of the sampling distribution? • An area of 0.1003 in the lower tail of the SND is marked by the value z = −1.28 • What is the corresponding value of $\bar{x}$?
  • 306. $\bar{x} = \mu + z \frac{\sigma}{\sqrt{n}} = 211 + (-1.28)(9.2) = 199.2$ mg/100 ml • Approximately 10% of samples of size 25 have means that are less than or equal to 199.2 mg/100 ml. The other 90% of the samples have means that are greater than 199.2 mg/100 ml
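Both parts of Example 3 can be sketched in a few lines, again assuming the sampling distribution of the mean is approximately normal. Because the code keeps the unrounded z value instead of rounding it to two decimals for a table lookup, the tail probability in part (a) comes out close to, but not identical with, the table-based value. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 211, 46, 25           # serum cholesterol example (mg/100 ml)
se = sigma / sqrt(n)                 # standard error: 46 / 5 = 9.2

sampling_dist = NormalDist(mu=mu, sigma=se)

# (a) Probability that the sample mean is 230 mg/100 ml or above.
z_a = (230 - mu) / se
p_a = 1 - sampling_dist.cdf(230)

# (b) Sample mean that cuts off the lower 10% of the sampling distribution.
cutoff_b = sampling_dist.inv_cdf(0.10)

print(round(z_a, 2), round(p_a, 3))
print(round(cutoff_b, 1))
```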
  • 307. B. Distribution of the difference between two sample means • Important to compare two population means (comparative studies) • Are the two population means different? • If yes by how much do they differ? • For example, mean serum cholesterol (MSC) level for sedentary office workers vs laborers.
  • 308. • It is generally assumed that the two populations are normally distributed. • For sampling from non-normal populations, large samples are recommended by the application of the CLT. • Plotting sample differences (Mean1-Mean2) against frequency gives a normal distribution with mean equal to μ1-μ2 which is the difference between the two population means.
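  • 309. • The variance of the distribution of the sample differences is: $\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$ • Thus, the standard error of the difference between sample means is: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  • 310. • To convert to the SND, we use the formula $z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$ • We find the z score by assuming that there is no difference between the population means, i.e., $\mu_1 - \mu_2 = 0$.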
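The standardization of a difference between two sample means can be sketched as a small function. This assumes known population standard deviations and independent samples, as in the discussion above. The function names `se_difference` and `z_difference` and all numeric inputs below are hypothetical illustrations, not data from the slides.

```python
from math import sqrt
from statistics import NormalDist


def se_difference(sigma1, n1, sigma2, n2):
    """Standard error of (x_bar1 - x_bar2) for two independent samples."""
    return sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)


def z_difference(x_bar1, x_bar2, sigma1, n1, sigma2, n2, mu_diff=0.0):
    """z score of an observed difference in sample means.

    mu_diff is the assumed difference mu1 - mu2 between the population means;
    the default of 0 corresponds to assuming no difference.
    """
    se = se_difference(sigma1, n1, sigma2, n2)
    return ((x_bar1 - x_bar2) - mu_diff) / se


# Hypothetical inputs chosen so the arithmetic is easy to follow:
# sqrt(3**2 / 25 + 4**2 / 25) = sqrt(25 / 25) = 1.
se = se_difference(sigma1=3, n1=25, sigma2=4, n2=25)
z = z_difference(x_bar1=12, x_bar2=10, sigma1=3, n1=25, sigma2=4, n2=25)

# Probability of a difference at least this large if mu1 - mu2 = 0.
p_upper = 1 - NormalDist().cdf(z)
print(se, z, round(p_upper, 4))
```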