School of Public Health
Name: Huruy Assefa
E-mail: Huruyame@yahoo.com
Mob.: 0914-728565
Chapter-1
Introduction to Biostatistics
Introduction
• What is statistics?
• Statistics: A field of study concerned with:
– collection, organization, analysis, summarization and
interpretation of numerical data, and
– the drawing of inferences about a body of data when only a
small part of the data is observed.
• Statistics helps us use numbers to communicate ideas
• Statisticians try to interpret and communicate the results
to others.
• Biostatistics: The application of statistical methods to
the fields of biological and medical sciences.
• Concerned with the interpretation of biological data and the
communication of information derived from these data
• Has a central role in medical investigations
Uses of biostatistics
• Provide methods of organizing information
• Assessment of health status
• Health program evaluation
• Resource allocation
• Magnitude of association
– Strong vs weak association between exposure and
outcome
• Assessing risk factors
– Cause & effect relationship
• Evaluation of a new vaccine or drug
– What can be concluded if the proportion of people
free from the disease is greater among the vaccinated
than the unvaccinated?
– How effective is the vaccine (drug)?
– Is the effect due to chance or some bias?
• Drawing of inferences
– Information from sample to population
What does biostatistics cover?
Research Planning
Design
Execution (Data collection)
Data Processing
Data Analysis
Presentation
Interpretation
Publication
The best way to learn about biostatistics is to follow the flow of a
research project from inception to the final publication.
Variable:
• A characteristic that takes on different
values in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Types of variables
Quantitative variables
• Can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Qualitative variables
• Many characteristics cannot be measured numerically. Some
of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
Types of variables & scale of measurement
• Quantitative variables (numerical): interval, ratio
• Qualitative variables (categorical): nominal, ordinal
Types of Statistics
1. Descriptive statistics:
• Ways of organizing and summarizing data
• Helps to identify the general features and trends in a
set of data and to extract useful information
• Also very important in conveying the final results of a
study
• Example: tables, graphs, numerical summary measures
2. Inferential statistics:
• Methods used for drawing conclusions about a
population based on the information obtained
from a sample of observations drawn from that
population
• Example: Principles of probability, estimation,
confidence interval, comparison of two or more
means or proportions, hypothesis testing, etc.
Data
• Data are numbers which can be obtained by
measurement or counting
• The raw material for statistics
• Can be obtained from:
– Routinely kept records, literature
– Surveys
– Counting
– Experiments
– Reports
– Observation
– Etc
Types of Data
1. Primary data: collected from the items or
individual respondents directly by the researcher
for the purpose of a study.
2. Secondary data: data that were previously collected and
statistically treated by some person or organization, and
are now used by other people for a different purpose.
Population and Sample
• Population:
– Refers to any collection of objects
• Target population:
– A collection of items that have something in common
for which we wish to draw conclusions at a particular
time.
• E.g., All hospitals in Ethiopia
– The whole group of interest
Study (Sampled) Population:
• The subset of the target population that has at least
some chance of being sampled
• The specific population group from which samples are
drawn and data are collected
Sample:
• A subset of a study population, about which
information is actually obtained.
• The individuals who are actually measured
and comprise the actual data.
[Figure: Population, Sample, Information]
• Role of statistics: using information from a sample to make
inferences about the population
[Figure: Sample, Study Population, Target Population]
E.g.: In a study of the prevalence of HIV among adolescents in
Ethiopia, a random sample of adolescents in Ayder sub-city of
Mekelle was included.
• Target population: All adolescents in Ethiopia
• Study population: All adolescents in Mekelle
• Sample: Adolescents in Ayder sub-city who were included in
the study
Generalizability
• Is a two-stage procedure:
• We need to be able to generalize from:
– the sample to the study population, &
– then from the study population to the target
population
• If the sample is not representative of the
population, the conclusions are restricted to
the sample & don’t have general applicability
Parameter and Statistic
• Parameter: A descriptive measure computed
from the data of a population.
– E.g., the mean (µ) age of the target population
• Statistic: A descriptive measure computed
from the data of a sample.
– E.g., the sample mean age ($\bar{x}$)
Descriptive Statistics:
Summarizing data
Measures of Central Tendency (MCT)
• On the scale of values of a variable there is a certain
point around which the largest number of items tend to
cluster.
• Since this point is usually in the centre of the distribution,
the tendency of the statistical data to concentrate
around a certain value is called “central tendency”
• The various methods of determining the point about
which the observations tend to concentrate are called
Measures of Central Tendency.
• The objective of calculating MCT is to determine a
single figure/value which may be used to
represent the whole data set.
• In that sense it is an even more compact
description of the statistical data than the
frequency distribution.
• Since a MCT represents the entire data, it
facilitates comparison within one group or
between groups of data.
Characteristics of a good MCT
An MCT is good or satisfactory if it possesses the following
characteristics:
1. It should be based on all the observations
2. It should not be affected by the extreme values
3. It should be as close to the maximum number of values as
possible
4. It should have a definite value
5. It should not be subjected to complicated and tedious
calculations (easy)
• The most common measures of central
tendency include:
– Arithmetic Mean
– Median
– Mode
– Others
1. Arithmetic Mean
A. Ungrouped Data
• The arithmetic mean is the "average" of the data set
and by far the most widely used measure of central
location
• It is the sum of all the observations divided by the total
number of observations:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The heart rates for n=10 patients were as follows (beats
per minute):
167, 120, 150, 125, 150, 140, 40, 136, 120, 150
What is the arithmetic mean for the heart rate of these
patients?
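As a quick check of the question above, the mean can be computed directly from its definition. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean

# Heart rates (beats per minute) for the n = 10 patients listed above.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]

# Definition: sum of all observations divided by the number of observations.
manual_mean = sum(heart_rates) / len(heart_rates)

# The standard library gives the same result.
library_mean = mean(heart_rates)

print(manual_mean)  # 129.8
```

Note how the single low value of 40 pulls the mean below most of the other observations.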
B. Grouped Data
In calculating the mean from grouped data, we assume that all values falling into a
particular class interval are located at the mid-point of the interval. It is calculated as
follows:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$
where,
k = the number of class intervals
mi = the mid-point of the ith class interval
fi = the frequency of the ith class interval
Example. Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   mi·fi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            __               169              5810.5

Mean = 5810.5/169 = 34.38 years
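The grouped-data calculation can be sketched in a few lines. The function name `grouped_mean` is an illustrative choice; the mid-points and frequencies are copied from the table above.

```python
# Class mid-points and frequencies from the age table.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

def grouped_mean(mids, freqs):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i)."""
    weighted_total = sum(m * f for m, f in zip(mids, freqs))
    return weighted_total / sum(freqs)

mean_age = grouped_mean(midpoints, frequencies)
print(round(mean_age, 2))
```

Each mid-point stands in for every observation in its class, so the result is an approximation of the mean of the raw ages.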
When the data are skewed, the mean is “dragged” in the direction of
the skewness
• It is possible in extreme cases for all but one of the sample points to be on
one side of the arithmetic mean & in this case, the mean is a poor measure of
central location or does not reflect the center of the sample.
Properties of the Arithmetic Mean
• For a given set of data there is one and only one
arithmetic mean (uniqueness)
• Easy to calculate and understand (simple)
• Influenced by each and every value in a data set
• Greatly affected by the extreme values
• In the case of grouped data, if any class interval is open-ended, the
arithmetic mean cannot be calculated
Median
a) Ungrouped data
• The median is the value which divides the data set into two
equal parts.
• If the number of values is odd, the median will be the middle
value when all values are arranged in order of magnitude.
• When the number of observations is even, there is no single
middle value but two middle observations.
• In this case the median is the mean of these two middle
observations, when all observations have been arranged in the
order of their magnitude.
• The median is a better description (than the mean) of the
majority when the distribution is skewed
• Example:- Data: 14, 89, 93, 95, 96
– Skewness is reflected in the outlying low value of 14
– The sample mean is 77.4
– The median is 93
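The figures in this example can be reproduced with the standard library. This is a minimal sketch; the second data set is an illustrative addition (not from the slides) that shows the even-n rule described earlier.

```python
from statistics import mean, median

scores = [14, 89, 93, 95, 96]

sample_mean = mean(scores)      # pulled down by the outlying value 14
sample_median = median(scores)  # middle value of the ordered data

print(sample_mean, sample_median)  # 77.4 93

# With an even number of observations the median is the mean of the
# two middle values (illustrative data).
even_median = median([89, 93, 95, 96])
print(even_median)  # 94.0
```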
b) Grouped data
• In calculating the median from grouped data, we
assume that the values within a class interval are
evenly distributed through the interval.
• The first step is to locate the class interval in which
the median is located, using the following procedure.
• Find n/2 and locate the first class interval whose
cumulative frequency is at least n/2.
• Then, use the following formula:
$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$
where,
Lm = lower true class boundary of the interval containing the median
Fc = cumulative frequency of the interval immediately preceding the median class
interval
fm = frequency of the interval containing the median
W = class interval width
n = total number of observations
Example. Compute the median age of 169 subjects from
the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
10-19            14.5             4                4
20-29            24.5             66               70
30-39            34.5             47               117
40-49            44.5             36               153
50-59            54.5             12               165
60-69            64.5             4                169
Total                             169

n/2 = 169/2 = 84.5
• n/2 = 84.5 falls in the 3rd class interval (30-39)
• Lower true boundary = 29.5, upper true boundary = 39.5
• Frequency of the class = 47
• (n/2 – Fc) = 84.5 – 70 = 14.5
• Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33
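The steps above can be sketched as a small function. The name `grouped_median` and the 0.5 boundary correction (which assumes class limits recorded as whole years) are assumptions made for illustration.

```python
from itertools import accumulate

# Lower class limits, frequencies and class width from the age table.
lower_limits = [10, 20, 30, 40, 50, 60]
frequencies = [4, 66, 47, 36, 12, 4]
width = 10

def grouped_median(lower_limits, freqs, width):
    """Median of grouped data: L_m + ((n/2 - F_c) / f_m) * W."""
    n = sum(freqs)
    half = n / 2
    cumulative = list(accumulate(freqs))
    # First class whose cumulative frequency reaches n/2.
    k = next(i for i, c in enumerate(cumulative) if c >= half)
    # True lower boundary, assuming limits are recorded as integers.
    lower_boundary = lower_limits[k] - 0.5
    # Cumulative frequency of the class just before the median class.
    below = cumulative[k - 1] if k > 0 else 0
    return lower_boundary + (half - below) / freqs[k] * width

median_age = grouped_median(lower_limits, frequencies, width)
print(round(median_age, 2))
```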
Properties of the median
• There is only one median for a given set of data (uniqueness)
• The median is easy to calculate
• Median is a positional average and hence it is insensitive to very
large or very small values (not affected by extreme values)
• Median can be calculated even in the case of open end intervals
• It is determined mainly by the middle points and less sensitive to
the remaining data points (weakness).
Mode
• The mode is the most frequently occurring value
among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no
mode.
• It is not a good summary of the majority of the
data.
[Figure: bar chart with the mode marked. Credit: T. Ancelle, D. Coulombie]
a) Ungrouped data
• It is a value which occurs most frequently in
a set of values.
• If all the values are different there is no
mode; on the other hand, a set of values
may have more than one mode.
• Example
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4 “Unimodal”
• Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes - 2 & 5
• This distribution is said to be “bi-modal”
• Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
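The three cases above can be sketched with a small helper. The function `modes` is a hypothetical name; it returns an empty list when every value occurs only once, matching the "no mode" convention used in these slides.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # all values are different: no mode
    return sorted(v for v, c in counts.items() if c == highest)

unimodal = modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6])
bimodal = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
no_mode = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])

print(unimodal, bimodal, no_mode)  # [4] [2, 5] []
```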
b) Grouped data
• To find the mode of grouped data, we
usually refer to the modal class, where the
modal class is the class interval with the
highest frequency.
• If a single value for the mode of grouped
data must be specified, it is taken as the
mid-point of the modal class interval.
Properties of mode
• It is not affected by extreme values
• It can be calculated for distributions with open
end classes
• Often its value is not unique
• The main drawback of mode is that often it
does not exist
Which measure of central tendency is best with a
given set of data?
• Two factors are important in making this decision:
– The scale of measurement (type of data)
– The shape of the distribution of the
observations
• The mean can be used for discrete and
continuous data.
• The median is appropriate for discrete and
continuous data as well, but can also be used for
ordinal data.
• The mode can be used for all types of data, but
may be especially useful for nominal and ordinal
measurements.
Relationship between Mean, Median and Mode
(A) Symmetric and unimodal distribution:
mean, median, and mode should all be
approximately the same
[Figure: symmetric curve; mean, median and mode coincide]
(B) Bimodal: mean and median should be
about the same, but may take a value that is
unlikely to occur; two modes might be best
(C) Skewed to the right (positively skewed):
the mean is sensitive to extreme values, so the median
might be more appropriate
[Figure: right-skewed curve marking the mode, median and mean]
(D) Skewed to the left (negatively skewed):
same as (C)
[Figure: left-skewed curve marking the mode, median and mean]
Measures of Dispersion
Consider the following two sets of data:
A: 177 193 195 209 226 Mean = 200
B: 192 197 200 202 209 Mean = 200
Two or more sets may have the same mean and/or median but they may be
quite different.
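A short sketch makes the point concrete: the two sets share a mean and yet differ in spread. Range and standard deviation are used here ahead of their formal definitions; the variable names are illustrative.

```python
from statistics import mean, stdev

set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]

# Same centre ...
mean_a, mean_b = mean(set_a), mean(set_b)

# ... but different spread.
range_a = max(set_a) - min(set_a)
range_b = max(set_b) - min(set_b)
sd_a, sd_b = stdev(set_a), stdev(set_b)

print(mean_a, mean_b)    # 200 200
print(range_a, range_b)  # 49 17
print(round(sd_a, 2), round(sd_b, 2))
```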
[Figure: two distributions that have the same mean, median, and mode]
• MCT are not enough to give a clear
understanding about the distribution of the
data.
• We need to know something about the
variability or spread of the values — whether
they tend to be clustered close together, or
spread out over a broad range
• Measures that quantify the variation or dispersion of a
set of data from its central location
• Dispersion refers to the variety exhibited by the values
of the data.
• The amount may be small when the values are close
together.
• If all the values are the same, there is no dispersion
• Measures of dispersion include:
– Range
– Inter-quartile range
– Variance
– Standard deviation
– Coefficient of variation
– Standard error
– Others
Range (R)
• The difference between the largest and smallest
observations in a sample.
• Range = Maximum value – Minimum value
• Example –
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42-5 = 37
• A data set with a higher range exhibits more variability
Properties of range
• It is the simplest crude measure and can be easily
understood
• It takes into account only two values, which makes it
a poor measure of dispersion
• Very sensitive to extreme observations
• The range tends to increase with the sample size
Interquartile range (IQR)
• Indicates the spread of the middle 50% of the
observations, and used with median
IQR = Q3 - Q1
• Example: Suppose the first and third quartiles for weights
of girls 12 months of age are 8.8 kg and 10.2 kg,
respectively.
IQR = 10.2 kg – 8.8 kg = 1.4 kg
i.e., the middle 50% of the infant girls weigh between 8.8 and 10.2 kg.
Properties of IQR:
• It is a simple and versatile measure
• It encloses the central 50% of the observations
• It is not based on all observations but only on two
specific values
• It is important in selecting cut-off points in the
formulation of clinical standards
• Since it excludes the lowest and highest 25% values, it
is not affected by extreme values
• Less sensitive to the size of the sample
Variance (σ², s²)
• The variance is the average of the squares of the
deviations taken from the mean.
• The deviations are squared because the sum of the deviations of the
individual observations of a sample about the sample
mean is always 0:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
• The variance can be thought of as an average of
squared deviations
• Variance is used to measure the dispersion of
values relative to the mean.
• When values are close to their mean (narrow
range) the dispersion is less than when there
is scattering over a wide range.
– Population variance = σ²
– Sample variance = S²
a) Ungrouped data
• Let X1, X2, ..., XN be the measurements on N
population units, then:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}, \quad \text{where } \mu = \frac{\sum_{i=1}^{N} X_i}{N} \text{ is the population mean.}$$
School of Public Health
A sample variance is calculated for a sample of individual values (X1, X2,
… Xn) and uses the sample mean (e.g:- ) rather than the population
mean µ.
Cont.
𝑿
69
School of Public Health
Degrees of freedom
• In computing the variance there are (n-1) degrees of
freedom because only (n-1) of the deviations are
independent from each other
• The last one can always be calculated from the
others automatically.
• This is because the sum of the deviations from their
mean (Xi-Mean) must add to zero.
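The idea can be illustrated with a short sketch. The sample below reuses data set A from the dispersion slide purely as an example; the variable names are assumptions.

```python
from statistics import variance

sample = [177, 193, 195, 209, 226]
n = len(sample)
sample_mean = sum(sample) / n

# Deviations from the sample mean always sum to zero ...
deviations = [x - sample_mean for x in sample]
total_deviation = sum(deviations)

# ... so the last deviation is fixed once the first n - 1 are known.
last_from_others = -sum(deviations[:-1])

# Sample variance divides the sum of squares by n - 1, not n.
manual_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(total_deviation, last_from_others, manual_variance)  # 0.0 26.0 340.0
```

The standard library's `statistics.variance` also uses the n - 1 divisor, so it agrees with the manual value.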
b) Grouped data
$$S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$$
where
mi = the mid-point of the ith class interval
fi = the frequency of the ith class interval
$\bar{x}$ = the sample mean
k = the number of class intervals
Properties of Variance:
• The main disadvantage of variance is that its
unit is the square of the unit of the original
measurement values.
• The variance gives more weight to the extreme
values as compared to those which are near to
the mean value, because the difference is squared
in variance.
• The unit problem is overcome by the
standard deviation, which is expressed in the original units.
Standard deviation (σ, s)
• It is the square root of the variance.
• This produces a measure having the same
scale as that of the individual values.
$$\sigma = \sqrt{\sigma^2} \quad \text{and} \quad S = \sqrt{S^2}$$
Example:
• The following are the survival times of n = 11
patients after heart transplant surgery.
• The survival time for the ith patient is
represented as Xi for i = 1, …, 11.
• Calculate the sample variance and SD.
Example. Compute the variance and SD of the age of 169 subjects from the
grouped data.

Class interval   mi     fi    (mi-Mean)   (mi-Mean)²   (mi-Mean)²·fi
10-19            14.5   4     -19.88      395.2144     1580.8576
20-29            24.5   66    -9.88       97.6144      6442.5504
30-39            34.5   47    0.12        0.0144       0.6768
40-49            44.5   36    10.12       102.4144     3686.9184
50-59            54.5   12    20.12       404.8144     4857.7728
60-69            64.5   4     30.12       907.2144     3628.8576
Total                   169                            20197.6336

Mean = 5810.5/169 = 34.38 years
S² = 20197.63/(169 − 1) = 120.22
SD = √S² = √120.22 = 10.96
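The grouped variance and SD can be recomputed from the mid-points and frequencies in the table. This is a sketch under the mid-point assumption; `grouped_variance` is an illustrative name, and the code uses the unrounded mean, so intermediate values may differ slightly from a hand calculation.

```python
from math import sqrt

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

def grouped_variance(mids, freqs):
    """Sample variance of grouped data: sum((m_i - mean)^2 * f_i) / (n - 1)."""
    n = sum(freqs)
    grouped_mean = sum(m * f for m, f in zip(mids, freqs)) / n
    sum_of_squares = sum((m - grouped_mean) ** 2 * f for m, f in zip(mids, freqs))
    return sum_of_squares / (n - 1)

age_variance = grouped_variance(midpoints, frequencies)
age_sd = sqrt(age_variance)

print(round(age_variance, 2), round(age_sd, 2))
```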
Properties of SD
• The SD has the advantage of being expressed in the
same units of measurement as the mean
• SD is considered to be the best measure of dispersion
and is used widely because of the properties of the
theoretical normal curve.
• However, the SD cannot be used to compare the variability
of two data sets whose variables are measured in
different units.
Coefficient of variation (CV)
• When two data sets have different units of
measurements, or their means differ
sufficiently in size, the CV should be used as a
measure of dispersion.
• It is the best measure to compare the
variability of two sets of
observations.
• A data set with a smaller coefficient of variation is
considered more consistent.
• CV is the ratio of the SD to the mean multiplied by 100:
$$CV = \frac{SD}{\bar{x}} \times 100$$

              SD         Mean        CV (%)
SBP           15 mmHg    130 mmHg    11.5
Cholesterol   40 mg/dl   200 mg/dl   20.0

• “Cholesterol is more variable than systolic blood
pressure”
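The comparison can be reproduced with a one-line function. The name `coefficient_of_variation` is illustrative; the SD and mean values are those given in the table.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100; unit-free, so it compares across scales."""
    return sd / mean * 100

cv_sbp = coefficient_of_variation(15, 130)          # systolic blood pressure
cv_cholesterol = coefficient_of_variation(40, 200)  # cholesterol

print(round(cv_sbp, 1), round(cv_cholesterol, 1))  # 11.5 20.0
```

Because the units cancel, the two CVs can be compared even though the underlying measurements use different units.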
NOTE:
• The range often appears with the median as a
numerical summary measure
• The IQR is used with the median as well
• The SD is used with the mean
• For nominal and ordinal data, a table or graph is often
more effective than any numerical summary measure
Probability and Probability Distributions
Probability
• Chance of observing a particular outcome.
• Likelihood of an event.
• Probability theory developed from the study of
games of chance like dice and cards.
• Processes such as flipping a coin, rolling a die or
drawing a card from a deck are probability
experiments.
Why Probability in Statistics?
• Results are not certain
• To evaluate how accurate our results are:
– Given how our data were collected, are our
results accurate ?
– Given the level of accuracy needed, how many
observations need to be collected ?
When can we talk about probability ?
When dealing with a process that has an uncertain
outcome
Experiment = any process with an uncertain outcome
• When an experiment is performed, one and only one
outcome is obtained
• Event = something that may happen or not when the
experiment is performed
Two Categories of Probability
• Objective and Subjective Probabilities.
• Objective probability
1) Classical probability and
2) Relative frequency probability.
Classical Probability
• Is based on gambling ideas
• Rolling a die -
– There are 6 possible outcomes:
– Set of outcomes = {1, 2, 3, 4, 5, 6}.
• Each is equally likely
– P(i) = 1/6, i = 1, 2, ..., 6.
– P(1) + P(2) + ... + P(6) = 1
• Definition: If an event can occur in N mutually exclusive
and equally likely ways, and if m of these possess a
characteristic, E, the probability of the occurrence of E is
P(E) = m/N
• If we toss a die, what is the probability of 4 coming up?
m = 1 (the outcome 4) and N = 6
The probability of 4 coming up is 1/6.
Relative Frequency Probability
• The proportion of times the event A occurs — in
a large number of trials repeated under
essentially identical conditions
• Definition: If a process is repeated a large number of
times (n), and if an event with the characteristic E occurs
m times, the relative frequency of E estimates its probability:
P(E) = m/n.
• If you toss a coin 100 times and head comes up 40 times,
P(H) = 40/100 = 0.4.
• If we toss a coin 10,000 times and the head comes up 5562,
P(H) = 0.5562.
• Therefore, the longer the series of trials (the larger the sample size), the closer the
estimate tends to be to the true value.
Example:
Of 158 people who attended a dinner party, 99 were ill.
P (Illness) = 99/158 = 0.63 = 63%.
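The relative-frequency idea can be explored with a small simulation. This is only a sketch: the fair-coin probability of 0.5, the fixed seed and the function name are assumptions made for illustration, and the exact frequencies depend on the random number generator.

```python
import random

def relative_frequency_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))

# The observed proportion from the dinner-party example above.
p_illness = 99 / 158
print(round(p_illness, 2))  # 0.63
```

With more tosses the simulated proportion tends to settle closer to 0.5, which is the behaviour the relative-frequency definition relies on.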
Subjective Probability
• Personalistic (represents one’s degree of belief in the
occurrence of an event).
• Personal assessment of which is more effective to
provide cure – traditional/modern
• Personal assessment of which sports team will win a
match.
• Also uses classical and relative frequency methods to
assess the likelihood of an event.
• E.g., If someone says that he is 95% certain that
a cure for Ebola will be discovered within 5
years, then he means that:
P(discovery of cure for Ebola within 5 years) = 95%
= 0.95
• Although the subjective view of probability has enjoyed
increased attention over the years, it has not been fully accepted by
scientists.
Mutually Exclusive Events
• Two events A and B are mutually exclusive if they
cannot both happen at the same time
P (A ∩ B) = 0
• Example:
– A coin toss cannot produce heads and tails
simultaneously.
– The weight of an individual cannot be classified simultaneously
as “underweight”, “normal” and “overweight”
Independent Events
• Two events A and B are independent if the
probability of the first one happening is the
same no matter how the second one turns
out; that is, the outcome of one event has no effect on the occurrence or
non-occurrence of the other.
P(A ∩ B) = P(A) × P(B) (Independent events)
P(A ∩ B) ≠ P(A) × P(B) (Dependent events)
Example:
– The outcomes on the first and second coin tosses
are independent
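The product rule for independence can be checked by enumerating the four equally likely outcomes of two coin tosses. This is a sketch; the helper `prob` and the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

def at_least_one_head(o):
    return "H" in o

# Independent: P(A and B) equals P(A) * P(B).
p_both = prob(lambda o: first_heads(o) and second_heads(o))
independent = p_both == prob(first_heads) * prob(second_heads)

# Dependent: "first toss is heads" and "at least one head".
p_joint = prob(lambda o: first_heads(o) and at_least_one_head(o))
dependent = p_joint != prob(first_heads) * prob(at_least_one_head)

print(p_both, independent, dependent)  # 1/4 True True
```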
Intersection and union
• The intersection of two events A and B, A ∩ B, is the event that A and B
happen simultaneously
P(A and B) = P(A ∩ B)
• Let A represent the event that a randomly selected newborn has low birth weight (LBW), and B
the event that he or she is from a multiple birth
• The intersection of A and B is the event that the infant is both LBW and
from a multiple birth
• The union of A and B, A ∪ B, is the event that either A happens or B
happens or they both happen simultaneously
P(A or B) = P(A ∪ B)
• In the example above, the union of A and B is the event that the newborn is
either LBW or from a multiple birth, or both
Properties of Probability
1. The numerical value of a probability always lies
between 0 and 1, inclusive.
• A value of 0 means the event cannot occur
• A value of 1 means the event definitely will occur
• A value of 0.5 means that the probability that the event
will occur is the same as the probability that it will not
occur.
2. The sum of the probabilities of all mutually exclusive
and exhaustive outcomes is equal to 1.
P(E1) + P(E2) + .... + P(En) = 1.
3. For two mutually exclusive events A and B,
P(A or B) = P(A ∪ B) = P(A) + P(B).
• If not mutually exclusive:
P(A or B) = P(A) + P(B) - P(A and B)
4. The complement of an event A, denoted by Ā or Ac, is the event that
A does not occur
• Consists of all the outcomes in which event A does NOT occur
P(Ā) = P(not A) = 1 – P(A)
• Ā occurs only when A does not occur.
• These are complementary events.
Basic Probability Rules
1. Addition rule
• If events A and B are mutually exclusive:
P(A or B) = P(A) + P(B)
P(A and B) = 0
• More generally:
P(A or B) = P(A) + P(B) - P(A and B)
where P(A or B) is the probability that event A or event B occurs, or that they both occur
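The general addition rule can be verified on a single roll of a fair die. The events chosen here (an even number, a number greater than 3) are illustrative and not taken from the slides.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}          # event A
greater_than_3 = {4, 5, 6}  # event B

def prob(event):
    """Classical probability for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# Direct count of the union ...
p_union_direct = prob(even | greater_than_3)

# ... and the addition rule, subtracting the overlap counted twice.
p_union_rule = prob(even) + prob(greater_than_3) - prob(even & greater_than_3)

print(p_union_direct, p_union_rule)  # 2/3 2/3
```

Subtracting P(A and B) matters here because the outcomes 4 and 6 belong to both events.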
Example:
The probabilities below represent years of
schooling completed by mothers of newborn infants
1. What is the probability that a mother has completed < 12
years of schooling?
2. What is the probability that a mother has completed 12 or more years
of schooling?
1. The probability that a mother has
completed < 12 years of schooling is:
P(≤ 8 years) = 0.056 and
P(9-11 years) = 0.159
• Since these two events are mutually exclusive,
P(≤ 8 or 9-11) = P(≤ 8 ∪ 9-11)
= P(≤ 8) + P(9-11)
= 0.056 + 0.159
= 0.215
2. The probability that a mother has completed 12
or more years of schooling is:
P(≥ 12) = P(12 or 13-15 or ≥ 16)
= P(12 ∪ 13-15 ∪ ≥ 16)
= P(12) + P(13-15) + P(≥ 16)
= 0.321 + 0.218 + 0.230
= 0.769
2. Multiplication rule
– If A and B are independent events, then
P(A ∩ B) = P(A) × P(B)
– More generally,
P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
P(A and B) denotes the probability that A and B
both occur at the same time.
Conditional Probability
• The conditional probability that event B occurs
given that event A has already
occurred is denoted P(B|A) and is defined as
P(B|A) = P(A ∩ B) / P(A),
provided that P(A) ≠ 0.
Example:
A study investigating the effect of prolonged exposure to
bright light on retina damage in premature infants.

                 Retinopathy YES   Retinopathy NO   TOTAL
Bright light     18                3                21
Reduced light    21                18               39
TOTAL            39                21               60
• The probability of developing retinopathy is:
P(Retinopathy) = (No. of infants with retinopathy) / (Total No. of infants)
= (18+21)/(21+39)
= 0.65
• We want to compare the probability of
retinopathy given that the infant was exposed
to bright light with the probability given that the infant was exposed
to reduced light.
• Exposure to bright light and exposure to
reduced light are conditioning events, events
we want to take into account when calculating
conditional probabilities.
• The conditional probability of retinopathy, given
exposure to bright light, is:
• P(Retinopathy | exposure to bright light) =
(No. of infants with retinopathy exposed to bright light) / (No. of infants exposed to bright light)
= 18/21 = 0.86
• P(Retinopathy | exposure to reduced light) =
(No. of infants with retinopathy exposed to reduced light) / (No. of infants exposed to reduced light)
= 21/39 = 0.54
• The conditional probabilities suggest that premature infants
exposed to bright light have a higher risk of retinopathy than
premature infants exposed to reduced light.
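The marginal and conditional probabilities from the retinopathy table can be computed as follows. This is a minimal sketch; the dictionary layout and the function name are illustrative choices.

```python
# Counts from the 2 x 2 table: exposure -> (retinopathy yes, retinopathy no).
counts = {
    "bright": (18, 3),
    "reduced": (21, 18),
}

total_infants = sum(yes + no for yes, no in counts.values())
total_retinopathy = sum(yes for yes, _ in counts.values())

# Marginal probability of retinopathy.
p_retinopathy = total_retinopathy / total_infants

def p_retinopathy_given(exposure):
    """Conditional probability: restrict attention to one exposure row."""
    yes, no = counts[exposure]
    return yes / (yes + no)

p_given_bright = p_retinopathy_given("bright")
p_given_reduced = p_retinopathy_given("reduced")

print(round(p_retinopathy, 2), round(p_given_bright, 2), round(p_given_reduced, 2))
# 0.65 0.86 0.54
```

Conditioning simply changes the denominator from all 60 infants to the infants in the chosen exposure group.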
• For independent events A and B:
P(A|B) = P(A).
• For non-independent events A and B:
P(A and B) = P(A|B) P(B)
(General Multiplication Rule)
Exercise:
Culture and Gonodectin (GD) test results for 240 urethral
discharge specimens

GD test result   Culture: gonorrhea   Culture: no gonorrhea   Total
Positive         175                  9                       184
Negative         8                    48                      56
Total            183                  57                      240
1. What is the probability that a man has
gonorrhea?
2. What is the probability that a man has a
positive GD test?
3. What is the probability that a man has a
positive GD test and gonorrhea?
4. What is the probability that a man has a
negative GD test and does not have gonorrhea
5. What is the probability that a man with
gonorrhea has a positive GD test?
6. What is the probability that a man who does not
have gonorrhea has a negative GD test?
7. What is the probability that a man who does not
have gonorrhea has a positive GD test?
8. What is the probability that a man with a positive
GD test has gonorrhea?
Probability Distributions
• Describes how the values of a variable are distributed; it is used to draw
conclusions about a set of data
• Random Variable = Any quantity or characteristic that is
able to assume a number of different values such that any
particular outcome is determined by chance
• The probability distribution of a random variable is a
table, graph, or mathematical formula that gives the
probabilities with which the random variable takes
different values or ranges of values.
Discrete Probability Distributions
• For a discrete random variable, the probability
distribution specifies each of the possible outcomes
of the random variable along with the probability
that each will occur
• Examples can be:
– Frequency distribution
– Relative frequency distribution
– Cumulative frequency
• We represent a potential outcome of the
random variable X by x
• 0 ≤ P(X = x) ≤ 1
• ∑ P(X = x) = 1
The following data show the number of diagnostic
services a patient receives
• What is the probability that a patient receives
exactly 3 diagnostic services?
P(X=3) = 0.031
• What is the probability that a patient receives at
most one diagnostic service?
P (X≤1) = P(X = 0) + P(X = 1)
= 0.671 + 0.229
= 0.900
• What is the probability that a patient receives
at least four diagnostic services?
P (X≥4) = P(X = 4) + P(X = 5)
= 0.010 + 0.006
= 0.016
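These calculations can be sketched with a dictionary that holds the probability distribution. The full table is not reproduced in this copy, so only the probabilities quoted in the calculations are used; the value for X = 2 is inferred here from the requirement that the probabilities sum to 1, which is an assumption rather than a figure from the slides.

```python
# Probabilities quoted in the worked answers.
pmf = {0: 0.671, 1: 0.229, 3: 0.031, 4: 0.010, 5: 0.006}

# Assumption: X takes only the values 0..5, so P(X = 2) is the remainder.
pmf[2] = round(1 - sum(pmf.values()), 3)

def prob_at_most(k):
    """P(X <= k): add the probabilities of all values up to k."""
    return sum(p for x, p in pmf.items() if x <= k)

def prob_at_least(k):
    """P(X >= k): add the probabilities of all values from k upward."""
    return sum(p for x, p in pmf.items() if x >= k)

p_exactly_3 = pmf[3]
p_at_most_1 = prob_at_most(1)
p_at_least_4 = prob_at_least(4)

print(p_exactly_3, round(p_at_most_1, 3), round(p_at_least_4, 3))
# 0.031 0.9 0.016
```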
[Figure: bar chart of the probability P(X = x) against the number of diagnostic services, x]
• Examples of discrete probability distributions
are the binomial distribution and the Poisson
distribution.
Binomial Distribution
• Considers a dichotomous (binary) random variable
• Is based on the Bernoulli trial
– A single trial of an experiment that can result in only one of
two mutually exclusive outcomes (success or failure; dead or
alive; sick or well; male or female)
120
School of Public Health
Example:
• We are interested in determining whether a newborn
infant will survive until his/her 70th birthday
• Let Y represent the survival status of the
child at age 70 years
• Y = 1 if the child survives and Y = 0 if he/she does not
• The outcomes are mutually exclusive and exhaustive
• Suppose that 72% of infants born survive to age 70 years
P(Y = 1) = p = 0.72
P(Y = 0) = 1 − p = 0.28
121
School of Public Health
Characteristics of a Binomial Distribution
• The experiment consists of a fixed number, n, of identical trials.
• Only two possible outcomes on each trial.
• The probability of A (success), denoted by p, remains the
same from trial to trial. The probability of B (failure),
denoted by q, is q = 1 − p.
• The trials are independent.
• n and p are the parameters of the binomial distribution.
• The mean is np and the variance is np(1 − p).
122
• If an experiment is repeated n times and the
outcome is independent from one trial to
another, the probability that outcome A occurs
exactly x times is:
• $P(X = x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$, x = 0, 1, 2, ..., n.
Cont.
123
• n denotes the number of fixed trials
• x denotes the number of successes in
the n trials
• p denotes the probability of success
• q denotes the probability of failure (1- p)
$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$
• Represents the number of ways of selecting x objects out of n where the
order of selection does not matter.
• where n! = n(n−1)(n−2)…(1), and 0! = 1
Cont.
124
School of Public Health
Example:
• Suppose we know that 40% of a certain
population are cigarette smokers. If we take a
random sample of 10 people from this
population, what is the probability that we will
have exactly 4 smokers in our sample?
125
• If the probability that any individual in the
population is a smoker is p = 0.40, then the
probability of x = 4 smokers out of n = 10 subjects
selected is:
$P(X = 4) = \binom{10}{4}(0.4)^4(1-0.4)^{10-4}$
$= \binom{10}{4}(0.4)^4(0.6)^6$
$= 210(0.0256)(0.04666)$
$= 0.25$
• The probability of obtaining exactly 4 smokers in the
sample is about 0.25.
Cont.
126
• We can compute the probability of observing zero
smokers out of 10 subjects selected at random, exactly
1 smoker, and so on, and display the results in a table,
as given, below.
• The third column, P(X ≤ x), gives the cumulative
probability. E.g. the probability of selecting 3 or fewer
smokers into the sample of 10 subjects is
P(X ≤ 3) =.3823, or about 38%.
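The table of P(X = x) and P(X ≤ x) described above can be regenerated with a short Python sketch. The helper name `binomial_pmf` is illustrative, not something taken from the slides; it simply applies the binomial formula with n = 10 and p = 0.40.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.40   # sample size and proportion of smokers from the example

cumulative = 0.0
table = []        # rows of (x, P(X = x), P(X <= x))
for x in range(n + 1):
    px = binomial_pmf(x, n, p)
    cumulative += px
    table.append((x, px, cumulative))

for x, px, cx in table:
    print(f"{x:2d}  {px:.4f}  {cx:.4f}")
```

The row for x = 4 reproduces the probability of about 0.25 worked out earlier, and the cumulative column at x = 3 gives the 0.3823 quoted above.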
Cont.
127
Cont.
128
School of Public Health
The probability in the above table can be
converted into the following graph
[Figure: bar chart of Probability (vertical axis, 0 to 0.3) against No. of Smokers, 0 to 10 (horizontal axis)]
Cont.
129
School of Public Health
Exercise
Each child born to a particular set of parents
has a probability of 0.25 of having blood type
O. If these parents have 5 children,
what is the probability that:
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d. 2 do not have blood type O.
130
School of Public Health
Solution for ‘a’
a.) $P(X = 2) = \binom{5}{2}(0.25)^2(0.75)^{5-2} = 0.2637$
131
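Parts b to d of the blood type exercise follow from the same binomial formula with n = 5 and p = 0.25. The working below is a sketch; it reads part d as "exactly 2 of the 5 children do not have blood type O", that is, exactly 3 do, which is an interpretation rather than something the exercise states.

```latex
\begin{align*}
\text{b. } P(X \le 2) &= P(X=0) + P(X=1) + P(X=2) \\
                      &= 0.2373 + 0.3955 + 0.2637 = 0.8965 \\
\text{c. } P(X \ge 4) &= \binom{5}{4}(0.25)^4(0.75)^1 + \binom{5}{5}(0.25)^5(0.75)^0 \\
                      &= 0.0146 + 0.0010 = 0.0156 \\
\text{d. } P(X = 3)   &= \binom{5}{3}(0.25)^3(0.75)^2 = 0.0879
\end{align*}
```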
School of Public Health
2. The Poisson Distribution
• Is a discrete probability distribution used to
model the number of occurrences of an event
that takes place infrequently in time or space
• Applicable for counts of events over a given
interval of time, for example:
– number of patients arriving at an emergency
department in a day
– number of new cases of HIV diagnosed at a clinic in
a month
132
• In such cases, we take a sample of days and observe the
number of patients arriving at the emergency department
on each day,
• We are observing a count or number of events, rather than a
yes/no or success/ failure outcome for each subject or trial,
as in the binomial.
• Suppose events happen randomly and independently in time
at a constant rate. If events happen with rate λ events per
unit time, the probability of x events happening in unit time
is:
Cont.
133
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$
• where x = 0, 1, 2, . . .∞
• x is a potential outcome of X
• The constant λ (lambda) represents the rate at
which the event occurs, or the expected number
of events per unit time
• e = 2.71828
• It depends on just one parameter, the mean
number of occurrences (λ).
Cont.
134
School of Public Health
Example
• The daily number of new registrations of
cancer is 2.2 on average.
What is the probability of
a) Getting no new cases
b) Getting 1 case
c) Getting 2 cases
d) Getting 3 cases
e) Getting 4 cases
135
School of Public Health
Solutions
a) $P(X = 0) = \frac{(2.2)^0 e^{-2.2}}{0!} = 0.111$
b) P(X=1) = 0.244
c) P(X=2) = 0.268
d) P(X=3) = 0.197
e) P(X=4) = 0.108
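The five answers for the cancer registration example can be reproduced with a short Python sketch. The helper name `poisson_pmf` is illustrative; it applies the Poisson formula with λ = 2.2.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average rate of lam per unit time."""
    return lam**x * exp(-lam) / factorial(x)

lam = 2.2  # average number of new cancer registrations per day
probs = [poisson_pmf(x, lam) for x in range(5)]
for x, p in enumerate(probs):
    print(f"P(X={x}) = {p:.3f}")
```

Only the rate λ has to be supplied, which reflects the fact that the Poisson distribution has a single parameter.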
136
[Figure: Poisson distribution with mean 2.2; Probability (vertical axis, 0.0 to 0.3) against x = 0 to 7 (horizontal axis)]
137
Example:
• In a given geographical area, cases of tetanus are
reported at a rate of λ = 4.5/month
• What is the probability that 0 cases of tetanus will
be reported in a given month?
138
• What is the probability that 1 case of tetanus
will be reported?
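Applying the Poisson formula with λ = 4.5 to both tetanus questions gives, as a sketch of the arithmetic (using e^{-4.5} ≈ 0.0111):

```latex
\begin{align*}
P(X = 0) &= \frac{(4.5)^0 e^{-4.5}}{0!} = e^{-4.5} \approx 0.0111 \\
P(X = 1) &= \frac{(4.5)^1 e^{-4.5}}{1!} = 4.5\, e^{-4.5} \approx 0.0500
\end{align*}
```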
Cont.
139
School of Public Health
Continuous Probability Distributions
• A continuous random variable X can take on any value in
a specified interval or range
• The probability distribution of X is represented by a
smooth curve called a probability density function
• The area under the smooth curve is equal to 1
• The area under the curve between any two points x1 and
x2 is the probability that X takes a value between x1 and
x2
140
• The probability associated with any one particular value
is equal to 0
• Therefore, P(X=x) = 0
• Also, P(X ≥ x) = P(X > x)
• We calculate:
Pr [ a < X < b], the probability of an
interval of values of X.
Cont.
141
School of Public Health
The Normal distribution
• Frequently called the “Gaussian distribution” or
bell-shape curve.
• Variables such as blood pressure, weight,
height, serum cholesterol level, and IQ score —
are approximately normally distributed
142
A random variable is said to have a normal distribution if it
has a probability distribution that is symmetric and bell-
shaped
Cont.
143
• A random variable X is said to follow a normal distribution (ND) if and
only if its probability density function is:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty.$
Cont.
144
π (pi) = 3.14159
e = 2.71828, x = Value of X
Range of possible values of X: -∞ to +∞
µ = Expected value of X (“the long run
average”)
σ² = Variance of X.
µ and σ are the parameters of the normal
distribution — they completely define its
shape
Cont.
145
1. The mean µ tells you about location -
– Increase µ - Location shifts right
– Decrease µ – Location shifts left
– Shape is unchanged
2. The variance σ² tells you about narrowness or
flatness of the bell -
– Increase σ² - Bell flattens. Extreme values are more likely
– Decrease σ² - Bell narrows. Extreme values are less likely
– Location is unchanged
Cont.
146
147
School of Public Health
Properties of the Normal Distribution
1. It is symmetrical about its mean, µ.
2. The mean, the median and the mode are equal. It is unimodal.
3. The total area under the curve above the x-axis is 1 square unit.
4. The curve never touches the x-axis.
5. As the value of σ increases, the curve becomes more and more
flat and vice versa.
6. The distribution is completely determined by the parameters µ
and σ.
148
149
• We cannot tabulate every possible
distribution
• Tabulated normal probability calculations are
available only for the ND with µ = 0 and σ² = 1.
Cont.
150
School of Public Health
Standard Normal Distribution
 It is a normal distribution that has a mean equal
to 0 and a SD equal to 1, and is denoted by N(0,
1).
 The main idea is to standardize all the data that is
given by using Z-scores.
 These Z-scores can then be used to find the area
(and thus the probability) under the normal
curve.
151
School of Public Health
The standard normal distribution has
mean 0 and variance 1
• Approximately 68% of the area under the standard normal
curve lies between ±1, about 95% between ±2, and about 99%
between ±2.5
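These areas can be checked numerically. The sketch below computes the standard normal cumulative probability from the error function in Python's `math` module; the helper names `phi` and `area_within` are illustrative, not from the slides.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative distribution function of the standard normal, P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return phi(k) - phi(-k)

for k in (1, 2, 2.5):
    print(f"P(-{k} < Z < {k}) = {area_within(k):.4f}")
```

The three areas come out at roughly 0.68, 0.95 and 0.99, in line with the statement above.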
152
School of Public Health
Z - Transformation
• If a random variable X ~ N(µ, σ), then we can
transform it to a SND with the help of the Z-transformation
$Z = \frac{x - \mu}{\sigma}$
• Z represents the Z-score for a given x value
153
• Consider redefining the scale to be in terms of
how many SDs away from mean for normal
distribution, μ=110 and σ=15.
Value
X = 50 65 80 95 110 125 140 155 170
Z = -4 -3 -2 -1 0 1 2 3 4
SDs from mean using $Z = \frac{x-\mu}{\sigma} = \frac{x-110}{15}$
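The X-to-Z conversion for this example can be written as a one-line function. This is a minimal sketch; the function name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 110, 15   # mean and SD from the example
values = [50, 65, 80, 95, 110, 125, 140, 155, 170]
z_values = [z_score(x, mu, sigma) for x in values]
print(z_values)
```

Each X value maps to a whole number of SDs from the mean, running from -4 to 4 as in the X and Z rows of the example.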
Cont.
154
• This process is known as standardization and
gives the position on a normal curve with μ=0
and σ=1, i.e., the SND, Z.
• A Z-score is the number of standard deviations
that a given x value is above or below the
mean.
Cont.
155
School of Public Health
Some Useful Tips
156
a) What is the probability that z < -1.96?
(1) Sketch a normal curve
(2) Draw a perpendicular line for z = -1.96
(3) Find the area in the table
(4) The answer is the area to the left of the line:
P(z < -1.96) = 0.0250
157
b) What is the probability that -1.96 < z < 1.96?
The area between the values
P(-1.96 < z < 1.96) = 0.9750 - 0.0250 = 0.9500
158
c) What is the probability that z > 1.96?
• The answer is the area to the right of the line; found by
subtracting table value from 1.0000;
P(z > 1.96) =1.0000 - 0.9750 = 0.0250
159
160
School of Public Health
Exercise
1. Compute P(-1 ≤ Z ≤ 1.5)
Ans: 0.7745
2. Find the area under the SND from 0 to 1.45
Ans: 0.4265
3. Compute P(-1.66 < Z < 2.85)
Ans: 0.9493
161
School of Public Health
Applications of the Normal Distribution
Example:
• The diastolic blood pressures (DBP) of males 35–44 years
of age are normally distributed with µ = 80 mm Hg
and σ² = 144 (mm Hg)²
[σ = 12 mm Hg].
• Individuals with DBP above 95 mm Hg are
considered to be hypertensive
162
Cont.
163
a. What is the probability that a randomly selected
male has a DBP above 95 mm Hg?
Z = (95 − 80)/12 = 1.25
P (Z > 1.25) = 0.1056
• Approximately 10.6% of this population would be
classified as hypertensive.
b. What is the probability that a randomly
selected male has a DBP above 110 mm Hg?
Z = (110 − 80)/12 = 2.50
P (Z > 2.50) = 0.0062
• Approximately 0.6% of the population has a
DBP above 110 mm Hg
Cont.
164
c. What is the probability that a randomly selected
male has a DBP below 60 mm Hg?
Z = (60 − 80)/12 = −1.67
P (Z < -1.67) = 0.0475
• Approximately 4.8% of the population has a
DBP below 60 mm Hg
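The three blood pressure probabilities can also be obtained without a Z table. The sketch below uses `statistics.NormalDist` from the Python standard library with µ = 80 and σ = 12; the variable names are illustrative.

```python
from statistics import NormalDist

# Diastolic blood pressure model from the example: mean 80 mm Hg, SD 12 mm Hg.
dbp = NormalDist(mu=80, sigma=12)

p_above_95 = 1 - dbp.cdf(95)    # part a
p_above_110 = 1 - dbp.cdf(110)  # part b
p_below_60 = dbp.cdf(60)        # part c

print(f"{p_above_95:.4f} {p_above_110:.4f} {p_below_60:.4f}")
```

Because the software does not round Z to two decimals before the lookup, part c comes out slightly above the table-based 0.0475 (about 0.048).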
Cont.
165
School of Public Health
Other Distributions
1. Student t-distribution
2. F- Distribution
3. χ² (chi-square) distribution
166
Sampling and Sampling
Distributions
167
• Researchers often use sample survey
methodology to obtain information about a
larger population by selecting and measuring a
sample from that population.
• Since the population is usually too large to study in full,
we rely on the information collected from the sample.
• Inferences about the population are based on the
information from the sample drawn from that
population.
Cont.
168
• A sample is a collection of individuals selected
from a larger population.
• Sampling enables us to estimate the
characteristic of a population by directly
observing a portion of the population.
Cont.
169
Sample Information
Population
Cont.
170
Steps needed to select a sample and ensure that
this sample will fulfill its goals.
1. Establish the study's objectives
– The first step in planning a useful and efficient survey is
to specify the objectives with as much detail as
possible.
– Without objectives, the survey is unlikely to generate
valuable results.
– Clarifying the aims of the survey is critical to its
ultimate success.
– The initial users and uses of the data should be
identified at this stage.
171
School of Public Health
2. Define the target population
– The target population is the total population for which
the information is required.
– Specifically, the target population is defined by the
following characteristics:
• Nature of data required
• Geographic location
• Reference period
• Other characteristics, such as socio-demographic characteristics
Cont.
172
3. Decide on the data to be collected
– The data requirements of the survey must be established.
– To ensure that the requirements are operationally sound, the
necessary data terms and definitions also need to be determined.
4. Set the level of precision
– There is a level of uncertainty associated with estimates
coming from a sample.
– The sample-to-sample variation is what causes the sampling
error.
– Researchers can estimate the sampling error associated
with a particular sampling plan, and try to minimize it.
Cont.
173
5. Decide on the methods of measurement
– Choose measuring instrument and method of approach to
the population
– Data about a person’s state of health may be obtained from
statements that he/she makes or from a medical
examination
– The survey may employ a self-administered questionnaire
or an interview
6. Prepare the sampling frame
– List of all members of the population
– The elements must not overlap
Cont.
174
School of Public Health
Sampling
• The process of selecting a portion of the
population to represent the entire population.
• A main concern in sampling:
– Ensure that the sample represents the population,
and
– The findings can be generalized.
175
School of Public Health
Advantages of sampling:
• Feasibility: Sampling may be the only feasible method of
collecting information.
• Reduced cost: Sampling reduces demands on resources
such as finance, personnel, and material.
• Greater accuracy: Sampling may lead to more accurate
data collection
• Sampling error: Precise allowance can be made for
sampling error
• Greater speed: Data can be collected and summarized
more quickly
176
School of Public Health
Disadvantages of sampling:
• There is always a sampling error.
• Sampling may create a feeling of
discrimination within the population.
• Sampling may be inadvisable where every
unit in the population is legally
required to have a record.
177
School of Public Health
Errors in sampling
1) Sampling error: Errors that arise because a sample,
rather than the whole population, is observed.
– They cannot be avoided or totally eliminated.
2) Non-sampling error:
- Observational error
- Respondent error
- Lack of preciseness of definition
- Errors in editing and tabulation of data
178
School of Public Health
Sampling Methods
Two broad divisions:
A. Probability sampling methods
B. Non-probability sampling methods
179
School of Public Health
Probability sampling
• Involves random selection of a sample
• A sample is obtained in a way that ensures every
member of the population has a known, nonzero
probability of being included in the sample.
• The method chosen depends on a number of
factors, such as
– the available sampling frame,
– how spread out the population is,
– how costly it is to survey members of the population
180
School of Public Health
Most common probability
sampling methods
1. Simple random sampling
2. Systematic random sampling
3. Stratified random sampling
4. Cluster sampling
5. Multi-stage sampling
181
School of Public Health
1. Simple random sampling
• Involves random selection
• Each member of a population has an equal
chance of being included in the sample.
182
• To use a SRS method:
– Make a numbered list of all the units in the
population
– Each unit should be numbered from 1 to N (where
N is the size of the population)
– Select the required number.
Cont.
183
• The randomness of the sample is ensured
by:
• use of “lottery” methods
• a table of random numbers
Cont.
184
School of Public Health
Example
• Suppose your school has 500 students and
you need to conduct a short survey on the
quality of the food served in the cafeteria.
• You decide that a sample of 10 students
should be sufficient for your purposes.
• In order to get your sample, you assign a
number from 1 to 500 to each student in
your school.
185
• To select the sample, you use a table of
randomly generated numbers.
• Pick a starting point in the table (a row and
column number) and look at the random
numbers that appear there. In this case, since
the data run into three digits, the random
numbers would need to contain three digits as
well.
Cont.
186
• Ignore all random numbers greater than 500 (and 000) because they do
not correspond to any of the students in the school.
• Remember that the sample is without replacement,
so if a number recurs, skip over it and use the next
random number.
• The first 10 different numbers between 001 and 500
make up your sample.
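In software, the random number table can be replaced by a pseudo-random generator. The sketch below assumes Python's standard `random` module as that replacement; `simple_random_sample` and the seed value are illustrative names and choices, not part of the example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

sample = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(sample))
```

`Random.sample` draws without replacement, so no student number can appear twice, which matches the rule of skipping repeated numbers in the table.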
Cont.
187
• SRS has certain limitations:
– Requires a sampling frame.
– Difficult if the reference population is dispersed.
– Minority subgroups of interest may not be
selected.
Cont.
188
School of Public Health
2. Systematic random sampling
• Sometimes called interval sampling,
systematic sampling means that there is a
gap, or interval, between each selected unit in
the sample
• The selection is systematic rather than
random
189
• Important if the reference population is
arranged in some order:
– Order of registration of patients
– Numerical order of house numbers
– Student’s registration books
• Taking individuals at fixed intervals (every kth)
based on the sampling fraction, e.g., if the
sample includes 20% of the population, then every fifth individual is taken.
Cont.
190
School of Public Health
Steps in systematic random sampling
1. Number the units on your frame from 1 to N (where N is
the total population size).
2. Determine the sampling interval (K) by dividing the number
of units in the population by the desired sample size.
3. Select a number between one and K at random. This
number is called the random start and would be the first
number included in your sample.
4. Select every Kth unit after that first number
Note: Systematic sampling should not be used when a cyclic
repetition is inherent in the sampling frame.
191
School of Public Health
Example
• To select a sample of 100 from a population of 400, you
would need a sampling interval of 400 ÷ 100 = 4.
• Therefore, K = 4.
• You will need to select one unit out of every four units
to end up with a total of 100 units in your sample.
• Select a number between 1 and 4 from a table of
random numbers.
192
• If you choose 3, the third unit on your frame
would be the first unit included in your
sample;
• The sample might consist of the following
units to make up a sample of 100: 3 , 7, 11, 15,
19...395, 399 (up to N, which is 400 in this
case).
Cont.
193
• Using the above example, you can see that
with a systematic sample approach there are
only four possible samples that can be
selected, corresponding to the four possible
random starts:
A. 1, 5, 9, 13...393, 397
B. 2, 6, 10, 14...394, 398
C. 3, 7, 11, 15...395, 399
D. 4, 8, 12, 16...396, 400
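The systematic selection in this example can be sketched in Python as follows. The function name `systematic_sample` and the seed are illustrative, and the sketch assumes the population size is an exact multiple of the sample size, as it is here (400 ÷ 100 = 4).

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every K-th unit after a random start between 1 and K."""
    k = population_size // sample_size          # sampling interval K
    rng = random.Random(seed)
    start = rng.randint(1, k)                   # random start, inclusive of 1 and K
    return list(range(start, population_size + 1, k))[:sample_size]

sample = systematic_sample(population_size=400, sample_size=100, seed=7)
print(sample[:5], "...", sample[-2:])
```

Whatever the random start, the result is one of the four possible samples A to D listed above.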
Cont.
194
School of Public Health
3. Stratified random sampling
• It is done when the population is known to be
heterogeneous with regard to some factors, and those
factors are used for stratification
• Using stratified sampling, the population is divided into
homogeneous, mutually exclusive groups called strata
• A population can be stratified by any variable that is
available for all units prior to sampling (e.g., age, sex,
province of residence, income, etc.).
• A separate sample is taken independently from each
stratum.
195
School of Public Health
Why do we need to create strata?
• It can make the sampling strategy more
efficient.
• A larger sample is required to get a more accurate
estimation if a characteristic varies greatly from
one unit to the other.
• For example, if every person in a population had
the same salary, then a sample of one individual
would be enough to get a precise estimate of the
average salary.
196
• Equal allocation:
– Allocate equal sample size to each stratum
• Proportionate allocation:
$n_j = \frac{n}{N} N_j$, j = 1, 2, ..., k, where k is
the number of strata and
– n_j is the sample size of the jth stratum
– N_j is the population size of the jth stratum
– n = n_1 + n_2 + ... + n_k is the total sample size
– N = N_1 + N_2 + ... + N_k is the total population
size
Cont.
197
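Proportionate allocation gives each stratum the same share of the sample as it has of the population. The sketch below illustrates this in Python; the stratum names and sizes are hypothetical and are not taken from the slides.

```python
def proportionate_allocation(stratum_sizes, total_sample_size):
    """Return n_j = (n / N) * N_j for each stratum, before rounding."""
    population_size = sum(stratum_sizes.values())          # N
    return {name: total_sample_size * size / population_size
            for name, size in stratum_sizes.items()}

# Hypothetical strata, not taken from the slides.
strata = {"urban": 6000, "rural": 3000, "pastoral": 1000}
allocation = proportionate_allocation(strata, total_sample_size=500)
print(allocation)
```

In practice the n_j are rounded to whole numbers, so the rounded values may need a small adjustment to add up to n exactly.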
School of Public Health
4. Cluster sampling
• Sometimes it is too expensive to spread a sample
across the population as a whole.
• Travel costs can become expensive if interviewers
have to survey people from one end of the country to
the other.
• To reduce costs, researchers may choose a cluster
sampling technique
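• The clusters should be similar to one another, each ideally
reflecting the variability of the whole population; in stratified
sampling, by contrast, the strata differ from one another and each
stratum is internally homogeneous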
198
School of Public Health
Steps in cluster sampling
• Cluster sampling divides the population into groups or
clusters.
• A number of clusters are selected randomly to represent the
total population, and then all units within selected clusters
are included in the sample.
• No units from non-selected clusters are included in the
sample—they are represented by those from selected
clusters.
• This differs from stratified sampling, where some units are
selected from each group.
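The steps above can be sketched in Python. The frame of class sections below is hypothetical, and `cluster_sample` is an illustrative name; the point is that whole clusters are drawn at random and every unit inside a chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # select whole clusters
    units = [u for name in chosen for u in clusters[name]]
    return chosen, units

# Hypothetical frame: 6 class sections with 4 students each.
clusters = {f"section_{s}": [f"s{s}_{i}" for i in range(1, 5)] for s in "ABCDEF"}
chosen, units = cluster_sample(clusters, n_clusters=2, seed=3)
print(chosen, len(units))
```

Only a list of clusters is needed to draw the sample; the units are enumerated after the clusters have been chosen.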
199
School of Public Health
Example
• In a school based study, we assume students of
the same school are homogeneous.
• We can randomly select sections and include all
students of the selected sections only
200
• Sometimes a list of all units in the population is not
available, while a list of all clusters is either available or
easy to create.
• In most cases, the main drawback is a loss of efficiency
when compared with SRS.
• It is usually better to survey a large number of small
clusters instead of a small number of large clusters.
– This is because neighboring units tend to be more alike,
resulting in a sample that does not represent the whole
spectrum of opinions or situations present in the overall
population.
• Another drawback to cluster sampling is that you do not have
total control over the final sample size.
Cont.
201
School of Public Health
5. Multi-stage sampling
• Similar to the cluster sampling.
• But it involves picking a sample from within each chosen
cluster, rather than including all units in the cluster.
• This type of sampling requires at least two stages.
• In the first stage, large groups or clusters are identified and
selected.
• In the second stage, population units are picked from
within the selected clusters (using any of the possible
probability sampling methods) for a final sample.
202
• If more than two stages are used, the process of
choosing population units within clusters continues until
there is a final sample.
• Also, you do not need to have a list of all of the units in
the population. All you need is a list of clusters and list of
the units in the selected clusters.
• Admittedly, more information is needed in this type of
sample than what is required in cluster sampling.
However, multi-stage sampling still saves a great amount
of time and effort by not having to create a list of all the
units in a population.
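A two-stage version of the procedure can be sketched in Python as follows. The villages and households are hypothetical, `two_stage_sample` is an illustrative name, and simple random sampling is assumed at both stages, although any probability method could be used at the second stage.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        # Take at most the cluster size, in case a cluster is smaller than requested.
        sample[name] = rng.sample(units, min(n_per_cluster, len(units)))
    return sample

# Hypothetical frame: 5 villages, each with 20 households.
villages = {f"village_{v}": [f"v{v}_hh{h}" for h in range(1, 21)] for v in range(1, 6)}
sample = two_stage_sample(villages, n_clusters=2, n_per_cluster=5, seed=11)
print({name: len(units) for name, units in sample.items()})
```

Unit lists are consulted only for the selected clusters, which is the saving in listing effort described above.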
Cont.
203
School of Public Health
B. Non-probability sampling
• The difference between probability and non-
probability sampling has to do with a basic
assumption about the nature of the population under
study.
• In probability sampling, every item has a known
chance of being selected.
• In non-probability sampling, there is an assumption
that there is an even distribution of a characteristic of
interest within the population.
204
• In non-probability sampling, since elements
are chosen arbitrarily, there is no way to
estimate the probability of any one element
being included in the sample.
• Also, no assurance is given that each item has
a chance of being included, making it
impossible either to estimate sampling
variability or to identify possible bias
Cont.
205
• Reliability cannot be measured in non-probability
sampling; the only way to address data quality is to
compare some of the survey results with available
information about the population.
• Still, there is no assurance that the estimates will meet
an acceptable level of error.
• Researchers are reluctant to use these methods
because there is no way to measure the precision of the
resulting sample.
Cont.
206
• Despite these drawbacks, non-probability
sampling methods can be useful when
descriptive comments about the sample itself
are desired.
• There are also other circumstances in
research when it is unfeasible or
impractical to conduct probability sampling.
Cont.
207
School of Public Health
The most common types of non-
probability sampling
1. Convenience or haphazard sampling
2. Volunteer sampling
3. Judgment sampling
4. Quota sampling
5. Snowball sampling technique
208
School of Public Health
1. Convenience or haphazard sampling
• Convenience sampling is sometimes referred
to as haphazard or accidental sampling.
• It is not normally representative of the target
population because sample units are only
selected if they can be accessed easily and
conveniently.
209
• The obvious advantage is that the method is
easy to use, but that advantage is greatly
offset by the presence of bias.
• Although useful applications of the technique
are limited, it can deliver accurate results
when the population is homogeneous.
Cont.
210
• For example, a scientist could use this method to
determine whether a lake is polluted or not.
• Assuming that the lake water is well-mixed, any
sample would yield similar information.
• A scientist could safely draw water anywhere on the
lake without bothering about whether or not the
sample is representative
Cont.
211
School of Public Health
2. Volunteer sampling
• As the term implies, this type of sampling occurs
when people volunteer to be involved in the
study.
• In psychological experiments or pharmaceutical
trials (drug testing), for example, it would be
difficult and unethical to enlist random
participants from the general public.
• In these instances, the sample is taken from a
group of volunteers.
212
• Sometimes, the researcher offers payment to
attract respondents.
• In exchange, the volunteers accept the
possibility of a lengthy, demanding or
sometimes unpleasant process.
Cont.
213
• Sampling voluntary participants as opposed
to the general population may introduce
strong biases.
• Often in opinion polling, only the people
who care strongly enough about the subject
tend to respond.
• The silent majority does not typically
respond, resulting in large selection bias.
Cont.
214
School of Public Health
3. Judgment sampling
• This approach is used when a sample is taken based
on certain judgments about the overall population.
• The underlying assumption is that the investigator
will select units that are characteristic of the
population.
• The critical issue here is objectivity: how much can
judgment be relied upon to arrive at a typical
sample?
215
• Judgment sampling is subject to the
researcher's biases and is perhaps even more
biased than haphazard sampling.
• Since any preconceptions the researcher may
have are reflected in the sample, large biases
can be introduced if these preconceptions are
inaccurate.
Cont.
216
• Researchers often use this method in
exploratory studies like pre-testing of
questionnaires and focus groups.
• They also prefer to use this method in
laboratory settings where the choice of
experimental subjects (i.e., animal, human)
reflects the investigator's pre-existing beliefs
about the population.
Cont.
217
• One advantage of judgment sampling is the
reduced cost and time involved in acquiring
the sample.
Cont.
218
School of Public Health
4. Quota sampling
• This is one of the most common forms of non-
probability sampling.
• Sampling is done until a specific number of
units (quotas) for various sub-populations have
been selected.
219
• Since there are no rules as to how these
quotas are to be filled, quota sampling is
really a means for satisfying sample size
objectives for certain sub-populations.
Cont.
220
• As with all other non-probability sampling
methods, in order to make inferences about
the population, it is necessary to assume
that persons selected are similar to those not
selected.
• Such strong assumptions are rarely valid.
Cont.
221
• The main argument against quota sampling is
that it does not meet the basic requirement of
randomness.
• Some units may have no chance of selection
or the chance of selection may be unknown.
• Therefore, the sample may be biased.
Cont.
222
• Quota sampling is generally less expensive than
random sampling.
• It is also easy to administer, especially considering
the tasks of listing the whole population, randomly
selecting the sample and following-up on non-
respondents can be omitted from the procedure.
Cont.
223
• Quota sampling is an effective sampling
method when information is urgently required,
and it can be carried out without sampling frames.
• In many cases where the population has no
suitable frame, quota sampling may be the
only appropriate sampling method.
Cont.
224
School of Public Health
5. Snowball sampling
• A technique for selecting a research sample
where existing study subjects recruit future
subjects from among their acquaintances.
• Thus the sample group appears to grow like a
rolling snowball.
225
• This sampling technique is often used in hidden
populations which are difficult for researchers to
access; example populations would be drug users or
commercial sex workers.
• Because sample members are not selected from a
sampling frame, snowball samples are subject to
numerous biases. For example, people who have
many friends are more likely to be recruited into the
sample.
Cont.
226
School of Public Health
Estimation
• Up until this point, we have assumed that the
values of the parameters of a probability
distribution are known.
• In the real world, the values of these population
parameters are usually not known
• Instead, we must try to say something about the
way in which a random variable is distributed
using the information contained in a sample of
observations
228
School of Public Health
• The process of drawing conclusions about an
entire population based on the data in a sample is
known as statistical inference.
• Methods of inference usually fall into one of two
broad categories:
** Estimation or Hypothesis testing **
• For now, we will focus on using the observations in
a sample to estimate a population parameter
229
School of Public Health
Estimation
• It is concerned with estimating the values of
specific population parameters based on
sample statistics.
• It is about using information in a sample to
make estimates of the characteristics
(parameters) of the source population.
230
School of Public Health
Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample
data, often yielding a value that is an approximation
(guess) of its target, an unknown true population
parameter value.
♣ The statistic itself is called an estimator and can be of two
types - point estimator or interval estimator.
♣ The value or values that the estimator assumes are called
estimates.
231
School of Public Health
• Point estimation involves the calculation of a
single number to estimate the population
parameter
• Interval estimation specifies a range of
reasonable values for the parameter
Thus,
– A point estimate is of the form: [ Value ],
– Whereas, an interval estimate is of the form:
[ lower limit, upper limit ]
Point versus Interval Estimators
232
School of Public Health
1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
Sample Statistics are Estimators of Population Parameters
Sample statistic → Population parameter
Sample mean, $\bar{X}$ → µ
Sample variance, S² → σ²
Sample proportion, $\hat{p}$ → P or π
Sample Odds Ratio, $\widehat{OR}$ → OR
Sample Relative Risk, $\widehat{RR}$ → RR
Sample correlation coefficient, r → ρ
233
School of Public Health
2. Interval Estimation
• Interval estimation specifies a range of reasonable values for the
population parameter based on a point estimate.
• A confidence interval is a particular type of interval estimator. It:
– gives a plausible range of values likely to include
the “true” (population) value with a given confidence level;
– gives information about the precision of an estimate
(wider CIs indicate less certainty);
– can also answer the question of whether or not an association
exists.
234
School of Public Health
Confidence Level
• Confidence Level
– The confidence with which the interval will contain the
unknown population parameter
• P(L ≤ parameter ≤ U) = 1 − α, where L and U are the lower and upper limits of the interval
235
School of Public Health
Estimation for Single Population
236
School of Public Health
1. CI for a Single Population Mean
(normally distributed)
A. Known variance (large sample size)
• There are 3 elements to a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence coefficient
• Consider the task of computing a CI estimate of μ for a
population distribution that is normal with σ known.
• Available are data from a random sample of size = n.
237
School of Public Health
Assumptions
• Population standard deviation (σ) is known
• Population is normally distributed
• A 100(1-α)% C.I. for µ is: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
• α is to be chosen by the researcher; the most common values of α are 0.05,
0.01 and 0.1.
Cont.
238
School of Public Health
Margin of Error
(Precision of the estimate)
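For the known-variance interval, the margin of error can be written as below. This is the standard form, stated here as an assumption about the intended formula; the symbol d follows the usage in the sample-size slides of this chapter, and w denotes the full width of the CI.

```latex
d = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}, \qquad
\text{CI} = \bar{x} \pm d, \qquad
w = 2d
```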
239
School of Public Health
Cont.
• As n increases, the length of the CI decreases.
• As σ increases, the length of the CI increases.
 As the confidence level increases (α decreases),
the length of CI increases.
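These statements about CI length can be checked numerically. The following is a minimal Python sketch, assuming the known-σ interval x̄ ± z·σ/√n; the function name `z_confidence_interval` is illustrative only, and the inputs (mean 1.52 hours, variance 2.25) are borrowed from the waiting-time example in this chapter.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for a CI of a mean when sigma is known."""
    alpha = 1 - confidence
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)  # margin of error
    return mean - margin, mean + margin


sigma = sqrt(2.25)  # standard deviation from the known variance
ci_n20 = z_confidence_interval(1.52, sigma, 20)
ci_n32 = z_confidence_interval(1.52, sigma, 32)
ci_n20_99 = z_confidence_interval(1.52, sigma, 20, confidence=0.99)

for label, (lo, hi) in [("n=20, 95%", ci_n20),
                        ("n=32, 95%", ci_n32),
                        ("n=20, 99%", ci_n20_99)]:
    print(f"{label}: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

Carrying full precision gives roughly (0.86, 2.18) for n = 20; a hand calculation that rounds the standard error to 0.33 gives (0.87, 2.17) instead. The interval narrows when n grows to 32 and widens when the confidence level rises to 99%.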
240
School of Public Health
Example:
1. Waiting times (in hours) at a particular hospital are
believed to be approximately normally distributed with
a variance of 2.25 hr².
a. A sample of 20 outpatients revealed a mean waiting
time of 1.52 hours. Construct the 95% CI for the
estimate of the population mean.
b. Suppose that the mean of 1.52 hours had resulted from
a sample of 32 patients. Find the 95% CI.
c. What effect does larger sample size have on the CI?
241
a. $1.52 \pm 1.96\sqrt{2.25/20} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$
• We are 95% confident that the true mean waiting
time is between 0.87 and 2.17 hrs.
 An incorrect interpretation is that there is 95%
probability that this interval contains the true
population mean.
Solution:
242
b. $1.52 \pm 1.96\sqrt{2.25/32} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$
c. A larger sample size makes the CI
narrower (more precise).
Cont.
243
School of Public Health
Cont.
B. Unknown variance (small sample size, n ≤ 30)
• What if the σ for the underlying population is
unknown and the sample size is small?
• As an alternative we estimate σ with the sample standard deviation s and use Student’s t
distribution with n - 1 degrees of freedom.
244
School of Public Health
Cont.
245
School of Public Health
Example
• Standard error =
• t-value at 90% CL at 19 df =1.729
246
School of Public Health
Cont.
247
School of Public Health
Exercise
• Compute a 95% CI for the mean birth weight
based on n = 10, sample mean = 116.9 oz and
s =21.70.
• From the t Table, t9, 0.975 = 2.262
• Ans: (101.4, 132.4)
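The exercise can be verified with a short Python sketch, assuming the usual small-sample interval x̄ ± t·s/√n. The function name `t_confidence_interval` is illustrative, and the critical value is passed in from the t table rather than computed.

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """CI for a mean when sigma is unknown; t_crit comes from a t table (n - 1 df)."""
    standard_error = s / sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


# Birth-weight exercise: n = 10, mean = 116.9 oz, s = 21.70, t(9, 0.975) = 2.262
lower, upper = t_confidence_interval(116.9, 21.70, 10, 2.262)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Rounded to one decimal place, this reproduces the stated answer of (101.4, 132.4).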
248
School of Public Health
2. CIs for single population proportion, p
• Is based on three elements of CI
– Point estimate
– SE of point estimate
– Confidence coefficient
249
School of Public Health
Cont.
250
School of Public Health
Example 1
• A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
251
School of Public Health
Interpretation
252
School of Public Health
Example 2
• Suppose that among 10,000 female operating-room nurses,
60 women have developed breast cancer over five years. Find
the 95% CI for p based on the point estimate.
• Point estimate = 60/10,000 = 0.006
• The 95% CI for p is given by the interval:
• The 95% CI for p is:
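A minimal Python sketch for both proportion examples follows. It assumes the normal-approximation interval p̂ ± z·√(p̂(1 − p̂)/n), which is the usual way to combine the point estimate, its SE and the confidence coefficient; the function name `proportion_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    p_hat = successes / n  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # confidence coefficient
    standard_error = sqrt(p_hat * (1 - p_hat) / n)  # SE of the point estimate
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


left_handed = proportion_confidence_interval(25, 100)       # Example 1
breast_cancer = proportion_confidence_interval(60, 10_000)  # Example 2

print(f"Left-handers: ({left_handed[0]:.3f}, {left_handed[1]:.3f})")
print(f"Breast cancer: ({breast_cancer[0]:.4f}, {breast_cancer[1]:.4f})")
```

Under this assumption the intervals come out at about (0.165, 0.335) for the left-handers and about (0.0045, 0.0075) for the nurses.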
253
School of Public Health
Hypothesis Testing
School of Public Health
• The purpose of Hypothesis Testing is to aid the
researcher in reaching a decision (conclusion)
concerning a population by examining a
sample from that population.
255
School of Public Health
Hypothesis
• Is a statement about one or more populations
• Is a claim (assumption) about a population
parameter
256
School of Public Health
Examples of Research Hypotheses
Population Mean
• The average length of stay of patients
admitted to the hospital is five days
• The mean birth weight of babies delivered by
mothers with low SES is lower than those from
higher SES.
• Etc
257
School of Public Health
Types of Hypothesis
1. The Null Hypothesis, H0
 Is a statement claiming that there is no difference between
the hypothesized value and the population value.
 (The effect of interest is zero = no difference)
 States the assumption (hypothesis) to be tested
 H0 is always about a population parameter, not about a
sample statistic
 Begin with the assumption that the Ho is true
– Similar to the notion of innocent until proven guilty
258
School of Public Health
2. The Alternative Hypothesis, HA
• Is a statement of what we will believe is true if our
sample data causes us to reject Ho.
• Is generally the hypothesis that is believed (or needs to
be supported) by the researcher.
• Is a statement that disagrees (opposes) with Ho
(The effect of interest is not zero)
Cont.
259
School of Public Health
Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses
clearly
• Specify HO and HA
Ho: µ = µo        Ho: µ ≤ µo        Ho: µ ≥ µo
HA: µ ≠ µo        HA: µ > µo        HA: µ < µo
two-tailed        one-tailed        one-tailed
2. State the assumptions necessary for computing
probabilities
• A distribution is approximately normal (Gaussian)
• Variance is known or unknown
260
School of Public Health
3. Select a sample and collect data
• Categorical, continuous
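4. Decide on the appropriate test statistic for
the hypothesis. E.g., for one population mean:
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (σ known) OR $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ (σ unknown)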
Cont.
261
School of Public Health
5. Specify the desired level of significance for
the statistical test (α = 0.05, 0.01, etc.)
6. Determine the critical value.
– A value the test statistic must attain to be
declared significant.
– For a Z test at α = 0.05: ±1.96 (two-tailed), 1.645 (upper-tailed) or -1.645 (lower-tailed)
Cont.
262
School of Public Health
7. Obtain sample evidence and compute the
test statistic
8. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true (or
accepted).
• If Ho is not rejected, we conclude that Ho may
be true.
263
School of Public Health
Rules for Stating Statistical Hypotheses
1. One population
• Indication of equality (either =, ≤ or ≥) must appear
in Ho.
Ho: μ = μo, HA: μ ≠ μo
Ho: P = Po, HA: P ≠ Po
• Can we conclude that a certain population mean is
– not 50?
Ho: μ = 50 and HA: μ ≠ 50
– greater than 50?
Ho: μ ≤ 50 HA: μ > 50
264
School of Public Health
• Can we conclude that the proportion of
patients with leukemia who survive more than
six years is not 60%?
Ho: P = 0.6 HA: P ≠ 0.6
Cont.
265
School of Public Health
Statistical Decision
• Reject Ho if the value of the test statistic that
we compute from our sample is one of the
values in the rejection region
• Don’t reject Ho if the computed value of the
test statistic is one of the values in the non-
rejection region.
266
School of Public Health
Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
• P-value is the probability of obtaining a test
statistic as extreme as or more extreme than the
actual test statistic obtained if the Ho is true
• The larger the absolute value of the test statistic, the smaller the P-value.
OR, the smaller the P-value the stronger the
evidence against the Ho.
267
School of Public Health
Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject the Ho, we
risk committing an error.
• Two types of errors can be committed.
– Type I Error
– Type II Error
268
School of Public Health
Type I Error
• The probability of a type I error is the
probability of rejecting the Ho when it is true
• The probability of type I error is α
• Called level of significance of the test
• Set by researcher in advance
269
School of Public Health
Type II Error
• The error committed when a false Ho is not
rejected
• The probability of Type II Error is β
• Usually unknown but larger than α
270
Action (Conclusion) | Reality: Ho True | Reality: Ho False
Do not reject Ho | Correct action (Prob. = 1-α) | Type II error (β) (Prob. = β = 1-Power)
Reject Ho | Type I error (α) (Prob. = α = Sign. level) | Correct action (Prob. = Power = 1-β)
Cont.
271
School of Public Health
Type I & II Error Relationship
272
School of Public Health
Hypothesis Testing of a Single Mean
(Normally Distributed)
273
School of Public Health
Known Variance
274
School of Public Health
Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain population
has a mean age of 27. Can we conclude that the mean age of the
population is not 30? The variance is known to be 20. Let α = 0.05.
 Answer, "Yes we can, if we can reject the Ho that it is 30."
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
275
School of Public Health
C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z as
the test statistic.
Cont.
276
School of Public Health
E. Decision Rule
 Reject Ho if the Z value falls in the rejection region.
 Don’t reject Ho if the Z value falls in the non-rejection region.
 Because of the structure of Ho it is a two tail test. Therefore, reject Ho if
Z ≤ -1.96 or Z ≥ 1.96.
Cont.
277
School of Public Health
F. Calculation of test statistic
$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20/10}} = -2.12$
G. Statistical decision
We reject the Ho because Z = -2.12 is in the rejection region. The
value is significant at 5% α.
H. Conclusion
We conclude that µ is not 30. P-value = 0.0340
A Z value of -2.12 corresponds to an area of 0.0170. Since there are two
parts to the rejection region in a two tail test, the P-value is twice this which
is .0340.
278
School of Public Health
Hypothesis test using
confidence interval
• A problem like the above example can also be solved
using a confidence interval.
• The 95% confidence interval for µ will show that the hypothesized
value (µo = 30) does not fall within the boundaries of the
interval, so Ho is rejected. However, it will not give a P-value.
• Confidence interval: $27 \pm 1.96\sqrt{20/10} = 27 \pm 2.77 = (24.23,\ 29.77)$
279
School of Public Health
Example: One -Tailed Test
• A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that the
mean age of the population is less than 30? The variance is
known to be 20. Let α = 0.05.
• Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
• Hypotheses
Ho: µ ≥ 30, HA: µ < 30
280
School of Public Health
• Test statistic: $z = \dfrac{27 - 30}{\sqrt{20/10}} = -2.12$
• Rejection Region (lower tail test)
• With α = 0.05 and the inequality, we have the entire rejection region at the
left. The critical value will be Z = -1.645. Reject Ho if Z < -1.645.
281
School of Public Health
• Statistical decision
– We reject the Ho because -2.12 < -1.645.
• Conclusion
– We conclude that µ < 30.
– p = .0170 this time because it is only a one tail test and not a two tail test.
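Both age examples (two-tailed and lower-tailed) can be reproduced with one small Python sketch. It assumes the known-variance statistic z = (x̄ − µo)/√(σ²/n); the function name `one_sample_z_test` and its `tail` argument are illustrative, not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, variance, n, tail="two-sided"):
    """Return (z, p_value) for a one-sample Z test with known variance."""
    z = (sample_mean - mu0) / sqrt(variance / n)
    cdf = NormalDist().cdf  # standard normal CDF
    if tail == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    elif tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1 - cdf(z)
    else:
        raise ValueError("tail must be 'two-sided', 'lower' or 'upper'")
    return z, p_value


z_two, p_two = one_sample_z_test(27, 30, 20, 10)                    # HA: mu != 30
z_lower, p_lower = one_sample_z_test(27, 30, 20, 10, tail="lower")  # HA: mu < 30

print(f"two-tailed: z = {z_two:.2f}, p = {p_two:.4f}")
print(f"lower-tailed: z = {z_lower:.2f}, p = {p_lower:.4f}")
```

The P-values agree with the slide values of 0.0340 and 0.0170 to about three decimal places; the small difference arises because the slides look up the table at z rounded to -2.12.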
Cont.
282
School of Public Health
Unknown Variance
• In most practical applications the standard deviation of
the underlying population is not known
• In this case, σ can be estimated by the sample standard
deviation s.
• If the underlying population is normally distributed,
then the test statistic is: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with n - 1 degrees of freedom
283
School of Public Health
Example: Two-Tailed Test
• A simple random sample of 14 people from a certain population
gives a sample mean body mass index (BMI) of 30.5 and sd of
10.64. Can we conclude that the mean BMI is not 35 at α = 5%?
• Ho: µ = 35, HA: µ ≠ 35
• Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{30.5 - 35}{10.64/\sqrt{14}} = -1.58$
• If the assumptions are correct and Ho is true, the test statistic
follows Student's t distribution with 13 degrees of freedom.
284
School of Public Health
• Decision rule
– We have a two tailed test. With α = 0.05 it means that each tail is
0.025. The critical t values with 13 df are -2.1604 and 2.1604.
– We reject Ho if the t ≤ -2.1604 or t ≥ 2.1604.
• Do not reject Ho because -1.58 is not in the rejection region. Based
on the data of the sample, it is possible that µ = 35. P-value = 0.1375
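The BMI example can be checked with a few lines of Python. This is a sketch under the assumption that the test statistic is t = (x̄ − µo)/(s/√n); the function name is illustrative, and the critical value 2.1604 is taken from the t table as on the slide rather than computed.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, s, n):
    """t statistic for a one-sample test when sigma is unknown (n - 1 df)."""
    return (sample_mean - mu0) / (s / sqrt(n))


t_stat = one_sample_t_statistic(30.5, 35, 10.64, 14)
t_crit = 2.1604  # t table, 13 df, two-tailed alpha = 0.05
reject_h0 = abs(t_stat) >= t_crit

print(f"t = {t_stat:.2f}, reject Ho: {reject_h0}")
```

The statistic rounds to -1.58, which lies inside the non-rejection region, so Ho is not rejected.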
Cont.
285
School of Public Health
Sampling from a population that is not normally
distributed
• Here, we do not know if the population displays a
normal distribution.
• However, with a large sample size, we know from the
Central Limit Theorem that the sampling distribution
of the sample mean is approximately normal.
286
School of Public Health
• With a large sample, we can use Z as the test statistic
calculated using the sample sd.
Cont.
287
School of Public Health
Hypothesis Tests for Proportions
• Involves categorical values
• Two possible outcomes
– “Success” (possesses a certain
characteristic)
– “Failure” (does not possess that
characteristic)
• Fraction or proportion of population in the
“success” category is denoted by p
288
School of Public Health
Proportions
289
School of Public Health
(Normal Approximation to Binomial Distribution)
Hypothesis Testing about a Single Population
Proportion
290
School of Public Health 291
School of Public Health
Example
• We are interested in the probability of developing asthma
over a given one-year period for children 0 to 4 years of age
whose mothers smoke in the home. In the general population
of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If
10 cases of asthma are observed over a single year in a
sample of 500 children whose mothers smoke, can we
conclude that this is different from the underlying probability
of p0 = 0.014? α = 5%
H0 : p = 0.014
HA: p ≠ 0.014
292
School of Public Health
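• The test statistic is given by: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.02 - 0.014}{\sqrt{0.014(0.986)/500}} = 1.14$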
Cont.
293
School of Public Health
• The critical value of Zα/2 at α=5% is ±1.96.
• Don’t reject Ho since Z (= 1.14) lies in the non-rejection
region between ±1.96.
• P-value = 0.2548
• We do not have sufficient evidence to conclude that
the probability of developing asthma for children
whose mothers smoke in the home is different from
the probability in the general population
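The asthma example can be reproduced with the sketch below, assuming the normal approximation to the binomial with the standard error computed under Ho, that is z = (p̂ − p0)/√(p0(1 − p0)/n). The function name `one_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided Z test for a single proportion (normal approximation)."""
    p_hat = successes / n
    se_under_h0 = sqrt(p0 * (1 - p0) / n)  # SE uses the hypothesized p0
    z = (p_hat - p0) / se_under_h0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_stat, p_value = one_proportion_z_test(10, 500, 0.014)
print(f"z = {z_stat:.2f}, two-sided P-value = {p_value:.3f}")
```

The statistic rounds to 1.14 and the two-sided P-value is about 0.25, well above α = 0.05, which matches the decision not to reject Ho.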
Cont.
294
School of Public Health
Sample size determination
295
School of Public Health
Sample Size
• Sample Size: The number of study subjects
selected to represent a given study population.
• In estimating a certain characteristic of a
population, sample size calculations are
important to ensure that estimates are obtained
with required precision or confidence
• Should be sufficient to represent the
characteristics of interest of the study population.
296
School of Public Health
Sample size determination depends on the:
– Objective of the study
– Design of the study
• Descriptive/Analytic
– Accuracy of the measurements to be made
– Degree of precision required for generalization
– Plan for statistical analysis
– Degree of confidence with which to conclude
Cont.
297
School of Public Health
• The feasible sample size is also determined by
the availability of resources:
– time
– manpower
– transport
– available facility, and
– money
Cont.
298
School of Public Health
Sample size for single sample
A. Sample size for estimating a single
population mean
B. Sample size to estimate a single population
proportion
299
School of Public Health
A. Sample size for estimating a single
population mean
• $n = \dfrac{z_{\alpha/2}^2\,\sigma^2}{d^2}$
• where d = Margin of error
= Absolute precision
= Half of the width (w) of CI
300
Where d = e in some text books
School of Public Health
Examples:
1. Find the minimum sample size needed to estimate the drop
in heart rate (µ) for a new study using a higher dose of
propranolol than the standard one. We require that the
two-sided 95% CI for µ be no wider than 5 beats per minute
and the sample sd for change in heart rate equals 10 beats
per minute.
$n = (1.96)^2 (10)^2 / (2.5)^2 \approx 62$ patients
301
2. Suppose that for a certain group of cancer patients, we
are interested in estimating the mean age at diagnosis.
We would like a 95% CI that is 5 years wide (d = 2.5). If the population
SD is 12 years, how large should our sample be?
302
• Suppose d=1
• Then the sample size increases
Cont.
303
School of Public Health
3. A hospital director wishes to estimate the
mean weight of babies born in the hospital.
How large a sample of birth records should be
taken if she/he wants a 95% CI that is 0.5 wide?
Assume that a reasonable estimate of σ is 2.
Ans: 246 birth records.
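The three sample-size examples for a mean can be worked through with a short Python sketch. It assumes n = (z·σ/d)², rounded up to the next whole subject, with d equal to half the desired CI width; the function name `sample_size_for_mean` is illustrative, and z = 1.96 is the table value used in the slides.

```python
from math import ceil


def sample_size_for_mean(sigma, d, z=1.96):
    """Minimum n so that the CI for a mean has margin of error at most d."""
    return ceil((z * sigma / d) ** 2)


n_heart_rate = sample_size_for_mean(sigma=10, d=2.5)     # CI no wider than 5 bpm
n_cancer_age = sample_size_for_mean(sigma=12, d=2.5)     # CI 5 years wide
n_cancer_age_d1 = sample_size_for_mean(sigma=12, d=1.0)  # tighter precision, d = 1
n_birth_weight = sample_size_for_mean(sigma=2, d=0.25)   # CI 0.5 wide

print(n_heart_rate, n_cancer_age, n_cancer_age_d1, n_birth_weight)
```

Under these assumptions the results are 62, 89, 554 and 246, so tightening d from 2.5 to 1 raises the required sample for the cancer example roughly sixfold.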
Cont.
304
School of Public Health
But the population σ² is most of the
time unknown
As a result, it has to be estimated from:
• Pilot or preliminary sample:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous or similar studies
305
School of Public Health
B. Sample size to estimate a single
population proportion
306
1. Suppose that you are interested to know the
proportion of infants who were breastfed beyond 18 months
of age in a rural area. Suppose that in a similar
area, the proportion (p) of breastfed infants was
found to be 0.20. What sample size is required to
estimate the true proportion within ±3 percentage points
with 95% confidence? Let p = 0.20, d = 0.03, α = 5%
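The breastfeeding example can be computed with the sketch below. It assumes the usual single-proportion formula n = z²·p(1 − p)/d², rounded up; the function name `sample_size_for_proportion` is hypothetical, and z = 1.96 is the 95% table value.

```python
from math import ceil


def sample_size_for_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion p within +/- d."""
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


n_breastfed = sample_size_for_proportion(p=0.20, d=0.03)
print(n_breastfed)
```

With p = 0.20 and d = 0.03 this gives 683 infants.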
Cont.
307
School of Public Health
Sample Size: Two Samples
A. Estimation of the difference between two
population means
B. Estimation of the difference between two
population proportions
308
School of Public Health
A. Sample size for estimating a difference in two
means
309
School of Public Health
B. Sample size for estimating a difference in two
proportions
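As a hedged sketch of the two-sample case, the single-sample formulas extend as follows when both groups have the same size n and d is the desired margin of error for the estimated difference. These are standard forms given as an assumption; the subscripts 1 and 2 label the two populations.

```latex
% Difference in two means (n per group)
n = \frac{z_{\alpha/2}^2\,(\sigma_1^2 + \sigma_2^2)}{d^2}

% Difference in two proportions (n per group)
n = \frac{z_{\alpha/2}^2\,\bigl[p_1(1 - p_1) + p_2(1 - p_2)\bigr]}{d^2}
```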
310
School of Public Health
Data Screening
311
School of Public Health
Checking data entry
• One of the first steps to proper data screening is to
ensure the data are entered correctly
– Check each person’s entry individually
• Makes sense for a small data set or with a proper data checking
procedure
• Can be too costly, so…
– the range of each variable should be checked
312
School of Public Health
Normality
• All of the continuous data we are covering need
to follow a normal curve
• Skewness (univariate) – this represents the
asymmetry of the distribution of the data
313
School of Public Health
Cont.
• The skewness statistic ($S_{skewness}$) is output by SPSS, together with its standard error ($SE_{skewness}$)
• $Z_{skewness} = \dfrac{S_{skewness}}{SE_{skewness}}$
• $|Z_{skewness}| > 3.2$ indicates a violation of the skewness assumption
314
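The skewness check can be sketched in Python for readers without SPSS. Two assumptions are made here and are not taken from the slides: the statistic is the adjusted Fisher-Pearson sample skewness, and its standard error is √(6n(n − 1)/((n − 2)(n + 1)(n + 3))). The function name `skewness_z` and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def skewness_z(values):
    """Return (skewness, SE of skewness, Z) for a sample of at least 3 values."""
    n = len(values)
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 denominator
    # Adjusted Fisher-Pearson skewness (assumed definition)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)
    # Standard error of skewness (assumed formula)
    se_skew = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return skew, se_skew, skew / se_skew


symmetric = skewness_z([1, 2, 3, 4, 5])      # hypothetical symmetric sample
right_skewed = skewness_z([1, 1, 1, 2, 10])  # hypothetical right-skewed sample

for name, (skew, se, z) in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    flag = "violation" if abs(z) > 3.2 else "ok"
    print(f"{name}: skewness = {skew:.3f}, SE = {se:.3f}, Z = {z:.3f} ({flag})")
```

The symmetric sample gives a Z of essentially zero and the right-skewed sample gives a positive Z; the 3.2 cut-off is applied to the absolute value.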
School of Public Health
Cont.
• Kurtosis (univariate) – is how peaked the data are;
the kurtosis statistic ($S_{kurtosis}$) is output by SPSS
• Kurtosis standard error ($SE_{kurtosis}$)
– for most statistics the skewness assumption is more
important than the kurtosis assumption
• $Z_{kurtosis} = \dfrac{S_{kurtosis}}{SE_{kurtosis}}$
• $|Z_{kurtosis}| > 3.2$ indicates a violation of the kurtosis assumption
315
School of Public Health
Outliers
• technically it is a data point outside of your
distribution; so potentially detrimental
because it may have an undue effect on the
distribution
316
School of Public Health
Linearity
• relationships among variables are linear in
nature; assumption in most analyses
317
School of Public Health
Homoscedasticity
• For grouped data this is the same as
homogeneity of variance
• For ungrouped data – variability for one
variable is the same at all levels of another
variable (no variance interaction)
318
School of Public Health
Multicollinearity/Singularity
• If correlations between two variables are excessive
(e.g. 0.95) then this represents multicollinearity
• If correlation is 1 then you have singularity
• Often Multicollinearity/Singularity occurs in data
because one variable is a near duplicate of another
319

Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Biostatistics.school of public healthppt

  • 1. School of Public Health Name: Huruy Assefa E-mail: Huruyame@yahoo.com Mob.: 0914-728565 Chapter-1 Introduction to Biostatistics
  • 2. School of Public Health Introduction • What is statistics? • Statistics: A field of study concerned with: – collection, organization, analysis, summarization and interpretation of numerical data, and – the drawing of inferences about a body of data when only a small part of the data is observed. • Statistics helps us use numbers to communicate ideas • Statisticians try to interpret and communicate the results to others. 2
  • 3. School of Public Health  Biostatistics: The application of statistical methods to the fields of biological and medical sciences.  Concerned with interpretation of biological data & the communication of information derived from these data  Has central role in medical investigations Cont. 3
  • 4. School of Public Health Uses of biostatistics • Provide methods of organizing information • Assessment of health status • Health program evaluation • Resource allocation • Magnitude of association – Strong vs weak association between exposure and outcome 4
  • 5. School of Public Health Cont. • Assessing risk factors – Cause & effect relationship • Evaluation of a new vaccine or drug – What can be concluded if the proportion of people free from the disease is greater among the vaccinated than the unvaccinated? – How effective is the vaccine (drug)? – Is the effect due to chance or some bias? • Drawing of inferences – Information from sample to population 5
  • 6. What does biostatistics cover? Research Planning Design Execution (Data collection) Data Processing Data Analysis Presentation Interpretation Publication The best way to learn about biostatistics is to follow the flow of a research from inception to the final publication 6
  • 7. School of Public Health variable:  It is a characteristic that takes on different values in different persons, places, or things. For example: - heart rate, - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. 7
  • 8. Types of variables: Quantitative and Qualitative. Quantitative Variables: It can be measured in the usual sense. For example: - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Qualitative Variables: Many characteristics are not capable of being measured. Some of them can be ordered or ranked. For example: - classification of people into socio-economic groups, - social classes based on income, education, etc.
  • 9. Types of variables & scale of measurement: Quantitative variables (Numerical): Interval, Ratio. Qualitative variables (Categorical): Nominal, Ordinal.
  • 10. School of Public Health Types of Statistics 1. Descriptive statistics: • Ways of organizing and summarizing data • Helps to identify the general features and trends in a set of data and extracting useful information • Also very important in conveying the final results of a study • Example: tables, graphs, numerical summary measures 10
  • 11. School of Public Health Cont. 2. Inferential statistics: • Methods used for drawing conclusions about a population based on the information obtained from a sample of observations drawn from that population • Example: Principles of probability, estimation, confidence interval, comparison of two or more means or proportions, hypothesis testing, etc. 11
  • 12. School of Public Health Data • Data are numbers which can be obtained by measurement or counting • The raw material for statistics • Can be obtained from: – Routinely kept records, literature – Surveys – Counting – Experiments – Reports – Observation – Etc 12
  • 13. Types of Data 1. Primary data: collected directly from the items or individual respondents by the researcher for the purpose of a study. 2. Secondary data: data that were collected, and often statistically treated, by other people or organizations, and are then used by someone else for a different purpose.
  • 14. School of Public Health Population and Sample • Population: – Refers to any collection of objects • Target population: – A collection of items that have something in common for which we wish to draw conclusions at a particular time. • E.g., All hospitals in Ethiopia – The whole group of interest 14
  • 15. School of Public Health Cont. Study (Sampled) Population: • The subset of the target population that has at least some chance of being sampled • The specific population group from which samples are drawn and data are collected 15
  • 16. School of Public Health Cont. Sample:  A subset of a study population, about which information is actually obtained.  The individuals who are actually measured and comprise the actual data. 16
  • 17. School of Public Health Population Sample Information • Role of statistics in using information from a sample to make inferences about the population Cont. 17
  • 18. Sample, Study Population, Target Population. E.g.: In a study of the prevalence of HIV among adolescents in Ethiopia, a random sample of adolescents in Ayder sub-city of Mekelle was included. Target Population: All adolescents in Ethiopia. Study population: All adolescents in Mekelle. Sample: Adolescents in Ayder sub-city who were included in the study.
  • 19. School of Public Health Generalizability • Is a two-stage procedure: • We need to be able to generalize from: – the sample to the study population, & – then from the study population to the target population • If the sample is not representative of the population, the conclusions are restricted to the sample & don’t have general applicability 19
  • 20. Parameter and Statistic • Parameter: A descriptive measure computed from the data of a population. – E.g., the mean (µ) age of the target population • Statistic: A descriptive measure computed from the data of a sample. – E.g., the sample mean age ($\bar{x}$)
  • 21. School of Public Health Descriptive Statistics: Summarizing data 21
  • 22. School of Public Health Measures of Central Tendency (MCT) • On the scale of values of a variable there is a certain stage at which the largest number of items tend to cluster. • Since this stage is usually in the centre of distribution, the tendency of the statistical data to get concentrated at a certain value is called “central tendency” • The various methods of determining the point about which the observations tend to concentrate are called Measures of Central Tendency. 22
  • 23. • The objective of calculating MCT is to determine a single figure/value which may be used to represent the whole data set. • In that sense it is an even more compact description of the statistical data than the frequency distribution. • Since a MCT represents the entire data, it facilitates comparison within one group or between groups of data. Cont. 23
  • 24. School of Public Health Characteristics of a good MCT MCT is good or satisfactory if it possesses the following characteristics. 1. It should be based on all the observations 2. It should not be affected by the extreme values 3. It should be as close to the maximum number of values as possible 4. It should have a definite value 5. It should not be subjected to complicated and tedious calculations (easy) 24
  • 25. • The most common measures of central tendency include: – Arithmetic Mean – Median – Mode – Others Cont. 25
  • 26. 1. Arithmetic Mean A. Ungrouped Data • The arithmetic mean is the "average" of the data set and by far the most widely used measure of central location • Is the sum of all the observations divided by the total number of observations. 26
  • 28. School of Public Health The heart rates for n=10 patients were as follows (beats per minute): 167, 120, 150, 125, 150, 140, 40, 136, 120, 150 What is the arithmetic mean for the heart rate of these patients? Cont. 28
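The question on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is an illustrative choice, not something from the slides.

```python
# Heart rates (beats per minute) of the n = 10 patients from the example above.
heart_rates = [167, 120, 150, 125, 150, 140, 40, 136, 120, 150]


def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


mean_hr = arithmetic_mean(heart_rates)
print(f"Mean heart rate: {mean_hr:.1f} beats per minute")  # 1298 / 10 = 129.8
```

Note that the single low value of 40 pulls the mean below most of the other observations.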
  • 29. b) Grouped data: In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows: $\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$ where k = the number of class intervals, $m_i$ = the mid-point of the ith class interval, $f_i$ = the frequency of the ith class interval.
  • 30. Example. Compute the mean age of 169 subjects from the grouped data.
      Class interval   Mid-point (mi)   Frequency (fi)   mi·fi
      10-19            14.5             4                58.0
      20-29            24.5             66               1617.0
      30-39            34.5             47               1621.5
      40-49            44.5             36               1602.0
      50-59            54.5             12               654.0
      60-69            64.5             4                258.0
      Total            __               169              5810.5
      Mean = 5810.5/169 = 34.38 years
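The grouped-mean calculation can be reproduced as a short sketch. The mid-points and frequencies are taken from the table; the function name `grouped_mean` is an illustrative assumption.

```python
# Mid-points and frequencies of the six age classes (10-19, ..., 60-69).
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]


def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(m_i * f_i) / sum(f_i)."""
    weighted_total = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_total / sum(frequencies)


weighted_total = sum(m * f for m, f in zip(midpoints, frequencies))
mean_age = grouped_mean(midpoints, frequencies)
print(f"sum(m*f) = {weighted_total}, n = {sum(frequencies)}")
print(f"Grouped mean age = {mean_age:.2f} years")  # 5810.5 / 169, about 34.38
```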
  • 31. School of Public Health When the data are skewed, the mean is “dragged” in the direction of the skewness • It is possible in extreme cases for all but one of the sample points to be on one side of the arithmetic mean & in this case, the mean is a poor measure of central location or does not reflect the center of the sample. Cont. 31
  • 32. School of Public Health Properties of the Arithmetic Mean • For a given set of data there is one and only one arithmetic mean (uniqueness) • Easy to calculate and understand (simple) • Influenced by each and every value in a data set • Greatly affected by the extreme values • In case of grouped data if any class interval is open, arithmetic mean can not be calculated 32
  • 33. School of Public Health Median a) Ungrouped data • The median is the value which divides the data set into two equal parts. • If the number of values is odd, the median will be the middle value when all values are arranged in order of magnitude. • When the number of observations is even, there is no single middle value but two middle observations. • In this case the median is the mean of these two middle observations, when all observations have been arranged in the order of their magnitude. 33
  • 37. School of Public Health  The median is a better description (than the mean) of the majority when the distribution is skewed • Example:- Data: 14, 89, 93, 95, 96 – Skewness is reflected in the outlying low value of 14 – The sample mean is 77.4 – The median is 93 Cont. 37
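The effect of the outlying value on the two measures can be verified with Python's standard `statistics` module. This is only a check of the numbers quoted on the slide.

```python
import statistics

# Skewed sample from the slide: 14 is an outlying low value.
data = [14, 89, 93, 95, 96]

mean_value = statistics.mean(data)      # dragged towards the outlier
median_value = statistics.median(data)  # middle value of the ordered data

print(f"mean = {mean_value}, median = {median_value}")  # mean = 77.4, median = 93
```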
  • 38. b) Grouped data • In calculating the median from grouped data, we assume that the values within a class interval are evenly distributed through the interval. • The first step is to locate the class interval that contains the median. • Find n/2, then take the first class interval whose cumulative frequency is at least n/2. • Then, use the following formula.
  • 39. $\tilde{x} = L_m + \left( \frac{\frac{n}{2} - F_c}{f_m} \right) W$ where $L_m$ = lower true class boundary of the interval containing the median, $F_c$ = cumulative frequency of the class intervals before the median class interval, $f_m$ = frequency of the interval containing the median, W = class interval width, n = total number of observations.
  • 40. Example. Compute the median age of 169 subjects from the grouped data. n/2 = 169/2 = 84.5
      Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
      10-19            14.5             4                4
      20-29            24.5             66               70
      30-39            34.5             47               117
      40-49            44.5             36               153
      50-59            54.5             12               165
      60-69            64.5             4                169
      Total                             169
  • 41. • n/2 = 84.5 falls in the 3rd class interval (30-39), the first whose cumulative frequency (117) reaches 84.5 • True class boundaries: lower = 29.5, upper = 39.5, so W = 10 • Frequency of the class (fm) = 47 • (n/2 – Fc) = 84.5 – 70 = 14.5 • Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33
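The steps above (locate the median class, then interpolate within it) can be sketched as follows. The function name `grouped_median` and the list of lower true class boundaries are illustrative assumptions; equal class widths are assumed.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: L_m + ((n/2 - F_c) / f_m) * W."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequencies must contain at least one observation")
    half = n / 2
    cumulative = 0  # F_c: cumulative frequency before the current class
    for lower, freq in zip(lower_boundaries, frequencies):
        if freq > 0 and cumulative + freq >= half:
            # First class whose cumulative frequency reaches n/2.
            return lower + (half - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median class not found")


# Lower true class boundaries of the classes 10-19, 20-29, ..., 60-69.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_boundaries, frequencies, width=10)
print(f"Grouped median age = {median_age:.2f} years")  # 29.5 + (14.5/47)*10
```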
  • 42. School of Public Health Properties of the median • There is only one median for a given set of data (uniqueness) • The median is easy to calculate • Median is a positional average and hence it is insensitive to very large or very small values (not affected by extreme values) • Median can be calculated even in the case of open end intervals • It is determined mainly by the middle points and less sensitive to the remaining data points (weakness). 42
  • 43. School of Public Health Mode • The mode is the most frequently occurring value among all the observations in a set of data. • It is not influenced by extreme values. • It is possible to have more than one mode or no mode. • It is not a good summary of the majority of the data. 43
  • 45. School of Public Health • It is a value which occurs most frequently in a set of values. • If all the values are different there is no mode, on the other hand, a set of values may have more than one mode. a) Ungrouped data 45
  • 46. School of Public Health • Example • Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6 • Mode is 4 “Unimodal” • Example • Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8 • There are two modes - 2 & 5 • This distribution is said to be “bi-modal” • Example • Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12 • No mode, since all the values are different Cont. 46
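The three cases on this slide can be reproduced with a small helper. This sketch adopts the slide's convention that a data set whose values are all different has no mode; the function name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6]))           # [4]     unimodal
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8]))     # [2, 5]  bi-modal
print(modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12]))     # []      no mode
```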
  • 47. School of Public Health b) Grouped data • To find the mode of grouped data, we usually refer to the modal class, where the modal class is the class interval with the highest frequency. • If a single value for the mode of grouped data must be specified, it is taken as the mid-point of the modal class interval. 47
  • 49. School of Public Health Properties of mode  It is not affected by extreme values  It can be calculated for distributions with open end classes  Often its value is not unique  The main drawback of mode is that often it does not exist 49
  • 50. School of Public Health Cont. Which measure of central tendency is best with a given set of data? • Two factors are important in making this decisions: – The scale of measurement (type of data) – The shape of the distribution of the observations 50
  • 51. School of Public Health • The mean can be used for discrete and continuous data. • The median is appropriate for discrete and continuous data as well, but can also be used for ordinal data. • The mode can be used for all types of data, but may be especially useful for nominal and ordinal measurements. Cont. 51
  • 52. Relationship between Mean, Median and Mode (A) Symmetric and unimodal distribution: Mean, median, and mode should all be approximately the same
  • 53. (B) Bimodal: Mean and median should be about the same, but may take a value that is unlikely to occur; reporting the two modes might be best
  • 54. (C) Skewed to the right (positively skewed): Mean is sensitive to extreme values, so median might be more appropriate (figure labels: Mode, Median, Mean)
  • 55. (D) Skewed to the left (negatively skewed): Same as (C) (figure labels: Mode, Median, Mean)
  • 56. School of Public Health Measures of Dispersion 56
  • 57. School of Public Health Consider the following two sets of data: A: 177 193 195 209 226 Mean = 200 B: 192 197 200 202 209 Mean = 200 Two or more sets may have the same mean and/or median but they may be quite different. 57
  • 58. School of Public Health These two distributions have the same mean, median, and mode 58
  • 59. • MCT are not enough to give a clear understanding about the distribution of the data. • We need to know something about the variability or spread of the values — whether they tend to be clustered close together, or spread out over a broad range Cont. 59
  • 60. School of Public Health • Measures that quantify the variation or dispersion of a set of data from its central location • Dispersion refers to the variety exhibited by the values of the data. • The amount may be small when the values are close together. • If all the values are the same, no dispersion Measures of Dispersion 60
  • 61. • Measures of dispersion include: – Range – Inter-quartile range – Variance – Standard deviation – Coefficient of variation – Standard error – Others Cont. 61
  • 62. Range (R) • The difference between the largest and smallest observations in a sample. • Range = Maximum value – Minimum value • Example – Data values: 5, 9, 12, 16, 23, 34, 37, 42 – Range = 42 – 5 = 37 • A data set with a larger range exhibits more variability
  • 63. School of Public Health Properties of range  It is the simplest crude measure and can be easily understood  It takes into account only two values which causes it to be a poor measure of dispersion  Very sensitive to extreme observations  The larger the sample size, the larger the range 63
  • 64. Interquartile range (IQR) • Indicates the spread of the middle 50% of the observations, and is used with the median. IQR = Q3 – Q1 • Example: Suppose the first and third quartiles for weights of girls 12 months of age are 8.8 kg and 10.2 kg, respectively. IQR = 10.2 kg – 8.8 kg = 1.4 kg, i.e., 50% of the infant girls weigh between 8.8 and 10.2 kg.
  • 65. • It is a simple and versatile measure • It encloses the central 50% of the observations • It is not based on all observations but only on two specific values • It is important in selecting cut-off points in the formulation of clinical standards • Since it excludes the lowest and highest 25% values, it is not affected by extreme values • Less sensitive to the size of the sample Properties of IQR: 65
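As a rough illustration, the range and IQR can be computed for the eight data values used in the range example. Quartiles are taken from `statistics.quantiles` with its default method; other textbooks and packages use slightly different quartile conventions, so the IQR of a small sample may differ by method.

```python
import statistics

# Data values from the range example.
data = [5, 9, 12, 16, 23, 34, 37, 42]

value_range = max(data) - min(data)  # depends only on the two extreme values

# n=4 returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the observations

print(f"Range = {value_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```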
  • 66. Variance ($\sigma^2$, $s^2$) • The variance is the average of the squares of the deviations taken from the mean. • The deviations are squared because the sum of the deviations of the individual observations of a sample about the sample mean is always 0: $\sum (x_i - \bar{x}) = 0$ • The variance can be thought of as an average of squared deviations
  • 67. • Variance is used to measure the dispersion of values relative to the mean. • When values are close to their mean (narrow range) the dispersion is less than when there is scattering over a wide range. – Population variance = $\sigma^2$ – Sample variance = $S^2$
  • 68. a) Ungrouped data: Let X1, X2, ..., XN be the measurements on N population units, then: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$, where $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ is the population mean.
  • 69. A sample variance is calculated for a sample of individual values (X1, X2, … Xn) and uses the sample mean ($\bar{X}$) rather than the population mean µ: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
  • 70. School of Public Health Degrees of freedom • In computing the variance there are (n-1) degrees of freedom because only (n-1) of the deviations are independent from each other • The last one can always be calculated from the others automatically. • This is because the sum of the deviations from their mean (Xi-Mean) must add to zero. 70
  • 71. b) Grouped data: $S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$ where $m_i$ = the mid-point of the ith class interval, $f_i$ = the frequency of the ith class interval, $\bar{x}$ = the sample mean, k = the number of class intervals.
  • 72.  The main disadvantage of variance is that its unit is the square of the unit of the original measurement values.  The variance gives more weight to the extreme values as compared to those which are near to mean value, because the difference is squared in variance. • The drawbacks of variance are overcome by the standard deviation. Properties of Variance: 72
  • 73. Standard deviation ($\sigma$, s) • It is the square root of the variance. • This produces a measure having the same scale as that of the individual values. $\sigma = \sqrt{\sigma^2}$ and $S = \sqrt{S^2}$
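To make the definitions concrete, the sketch below computes the sample variance (divisor n - 1) and SD for the two data sets A and B introduced earlier under Measures of Dispersion, which share a mean of 200. The function name `sample_variance` is an illustrative choice.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


set_a = [177, 193, 195, 209, 226]  # mean = 200
set_b = [192, 197, 200, 202, 209]  # mean = 200

# The raw deviations always sum to zero, which is why they are squared.
deviations_a = [x - 200 for x in set_a]
print(sum(deviations_a))  # 0

var_a, var_b = sample_variance(set_a), sample_variance(set_b)
sd_a, sd_b = math.sqrt(var_a), math.sqrt(var_b)
print(f"A: variance = {var_a:.1f}, SD = {sd_a:.2f}")  # 340.0, 18.44
print(f"B: variance = {var_b:.1f}, SD = {sd_b:.2f}")  # 39.5, 6.28
```

Both sets have the same mean, yet A is far more spread out than B, which the SD makes visible.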
  • 74. • Following are the survival times of n=11 patients after heart transplant surgery. • The survival time for the “ith” patient is represented as Xi for i= 1, …, 11. • Calculate the sample variance and SD? Example: 74
  • 76. Example. Compute the variance and SD of the age of 169 subjects from the grouped data. Mean = 5810.5/169 = 34.38 years
      Class interval   (mi)   (fi)   (mi-Mean)   (mi-Mean)²   (mi-Mean)²·fi
      10-19            14.5   4      -19.88      395.21       1580.86
      20-29            24.5   66     -9.88       97.61        6442.55
      30-39            34.5   47     0.12        0.01         0.68
      40-49            44.5   36     10.12       102.41       3686.92
      50-59            54.5   12     20.12       404.81       4857.77
      60-69            64.5   4      30.12       907.21       3628.86
      Total            __     169    __          __           20197.63
      S² = 20197.63/(169 – 1) = 120.22
      SD = √S² = √120.22 = 10.96
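The grouped variance and SD can be recomputed directly from the mid-points and frequencies. This is a minimal sketch; the function name `grouped_variance` is an illustrative assumption, and it uses the unrounded mean rather than a two-decimal value.

```python
import math

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]


def grouped_variance(midpoints, frequencies):
    """Sample variance of grouped data: sum((m_i - mean)^2 * f_i) / (n - 1)."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared_dev = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))
    return squared_dev / (n - 1)


variance_age = grouped_variance(midpoints, frequencies)
sd_age = math.sqrt(variance_age)
print(f"S^2 = {variance_age:.2f}, SD = {sd_age:.2f}")  # about 120.22 and 10.96
```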
  • 77. Properties of SD • The SD has the advantage of being expressed in the same units of measurement as the mean • SD is considered to be the best measure of dispersion and is used widely because of the properties of the theoretical normal curve. • However, a drawback of the SD is that it cannot be used to compare the variability of two data sets whose variables are measured in different units.
  • 78. School of Public Health Coefficient of variation (CV) • When two data sets have different units of measurements, or their means differ sufficiently in size, the CV should be used as a measure of dispersion. • It is the best measure to compare the variability of two series of sets of observations. • Data with less coefficient of variation is considered more consistent. 78
  • 79. • CV is the ratio of the SD to the mean multiplied by 100: $CV = \frac{SD}{\bar{x}} \times 100$
      Variable      SD         Mean        CV (%)
      SBP           15 mm      130 mm      11.5
      Cholesterol   40 mg/dl   200 mg/dl   20.0
      • “Cholesterol is more variable than systolic blood pressure”
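The two CV values can be checked in a few lines. The function name `coefficient_of_variation` is an illustrative choice; the SD and mean values are those given on the slide.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean * 100; unit-free, so different scales can be compared."""
    return sd / mean * 100


cv_sbp = coefficient_of_variation(sd=15, mean=130)          # systolic blood pressure
cv_cholesterol = coefficient_of_variation(sd=40, mean=200)  # cholesterol

print(f"CV (SBP) = {cv_sbp:.1f}%")                  # 11.5%
print(f"CV (cholesterol) = {cv_cholesterol:.1f}%")  # 20.0%
```

Because the units cancel in the ratio, the larger CV for cholesterol supports the slide's conclusion even though the two variables are measured on different scales.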
  • 80. School of Public Health NOTE: • The range often appears with the median as a numerical summary measure • The IQR is used with the median as well • The SD is used with the mean • For nominal and ordinal data, a table or graph is often more effective than any numerical summary measure 80
  • 81. School of Public Health Probability and Probability Distributions 81
  • 82. School of Public Health Probability • Chance of observing a particular outcome. • Likelihood of an event. • Probability theory developed from the study of games of chance like dice and cards. • A process like flipping a coin, rolling a die or drawing a card from a deck are probability experiments. 82
  • 83. School of Public Health Why Probability in Statistics? • Results are not certain • To evaluate how accurate our results are: – Given how our data were collected, are our results accurate ? – Given the level of accuracy needed, how many observations need to be collected ? 83
  • 84. School of Public Health When can we talk about probability ? When dealing with a process that has an uncertain outcome Experiment = any process with an uncertain outcome • When an experiment is performed, one and only one outcome is obtained • Event = something that may happen or not when the experiment is performed 84
  • 85. School of Public Health Two Categories of Probability • Objective and Subjective Probabilities. • Objective probability 1) Classical probability and 2) Relative frequency probability. 85
  • 86. Classical Probability • Is based on gambling ideas • Rolling a die: – There are 6 possible outcomes: {1, 2, 3, 4, 5, 6}. – Each is equally likely: P(i) = 1/6, i = 1, 2, ..., 6. – P(1) + P(2) + ... + P(6) = 1
  • 87. • Definition: If an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a characteristic, E, the probability of the occurrence of E is P(E) = m/N. • If we toss a die, what is the probability of 4 coming up? m = 1 (the outcome 4) and N = 6, so the probability of 4 coming up is 1/6.
  • 88. School of Public Health Relative Frequency Probability • The proportion of times the event A occurs — in a large number of trials repeated under essentially identical conditions • Definition: If a process is repeated a large number of times (n), and if an event with the characteristic E occurs m times, the relative frequency of E, Probability of E = P(E) = m/n. 88
  • 89. • If you toss a coin 100 times and head comes up 40 times, P(H) = 40/100 = 0.4. • If we toss a coin 10,000 times and the head comes up 5562 times, P(H) = 5562/10,000 = 0.5562. • Therefore, the longer the series and the larger the sample size, the closer the estimate to the true value. Example: Of 158 people who attended a dinner party, 99 were ill. P(Illness) = 99/158 = 0.63 = 63%.
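The idea that the relative frequency settles near the true probability as the number of trials grows can be illustrated with a small simulation. This is a sketch under the assumption of a fair coin (true P(H) = 0.5); the seed and the numbers of tosses are arbitrary choices, not values from the slides.

```python
import random


def relative_frequency_of_heads(n_tosses, rng):
    """Simulate n_tosses of a fair coin and return m/n, the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses


rng = random.Random(2024)  # fixed seed so the run is repeatable
estimates = {n: relative_frequency_of_heads(n, rng) for n in (100, 10_000, 100_000)}

for n, estimate in estimates.items():
    print(f"n = {n:>7}: P(H) estimated as {estimate:.4f}")
```

With few tosses the estimate can be noticeably off; with many tosses it typically lies very close to 0.5.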
  • 90. School of Public Health Subjective Probability • Personalistic (represents one’s degree of belief in the occurrence of an event). • Personal assessment of which is more effective to provide cure – traditional/modern • Personal assessment of which sports team will win a match. • Also uses classical and relative frequency methods to assess the likelihood of an event. 90
  • 91. • E.g., If someone says that he is 95% certain that a cure for EBOLA will be discovered within 5 years, then he means that: P(discovery of cure for EBOLA within 5 years) = 95% = 0.95 • Although the subjective view of probability has enjoyed increased attention over the years, it has not been fully accepted by scientists.
  • 92. Mutually Exclusive Events • Two events A and B are mutually exclusive if they cannot both happen at the same time: P(A ∩ B) = 0 • Example: – A coin toss cannot produce heads and tails simultaneously. – The weight of an individual cannot be classified simultaneously as “underweight”, “normal” and “overweight”
  • 93. Independent Events • Two events A and B are independent if the probability of the first one happening is the same no matter how the second one turns out. Equivalently, the outcome of one event has no effect on the occurrence or non-occurrence of the other. P(A ∩ B) = P(A) × P(B) (independent events) P(A ∩ B) ≠ P(A) × P(B) (dependent events) Example: – The outcomes of the first and second coin tosses are independent
  • 94. School of Public Health Intersection and union • The intersection of two events A and B, A ∩ B, is the event that A and B happen simultaneously P ( A and B ) = P (A ∩ B ) • Let A represent the event that a randomly selected newborn is LBW, and B the event that he or she is from a multiple birth • The intersection of A and B is the event that the infant is both LBW and from a multiple birth • The union of A and B, A U B, is the event that either A happens or B happens or they both happen simultaneously P ( A or B ) = P ( A U B ) • In the example above, the union of A and B is the event that the newborn is either LBW or from a multiple birth, or both 94
  • 95. Properties of Probability 1. The numerical value of a probability always lies between 0 and 1, inclusive. – A value of 0 means the event cannot occur – A value of 1 means the event definitely will occur – A value of 0.5 means that the probability that the event will occur is the same as the probability that it will not occur. 2. The sum of the probabilities of all mutually exclusive and exhaustive outcomes is equal to 1: P(E1) + P(E2) + ... + P(En) = 1.
  • 96. 3. For two mutually exclusive events A and B, P(A or B) = P(A U B) = P(A) + P(B). – If not mutually exclusive: P(A or B) = P(A) + P(B) − P(A and B) 4. The complement of an event A, denoted by Ā or A^c, is the event that A does not occur • It consists of all the outcomes in which event A does NOT occur: P(Ā) = P(not A) = 1 − P(A) • Ā occurs only when A does not occur. • These are complementary events.
  • 97. Basic Probability Rules 1. Addition rule – If events A and B are mutually exclusive: P(A or B) = P(A) + P(B), because P(A and B) = 0 – More generally: P(A or B) = P(A) + P(B) − P(A and B), where P(A or B) is the probability that event A or event B occurs or they both occur
  • 98. School of Public Health Example: The probabilities below represent years of schooling completed by mothers of newborn infants 1. What is the probability that a mother has completed < 12 years of schooling? 2. What is the probability that a mother has completed 12 or more years of schooling? 98
  • 99. 1. The probability that a mother has completed < 12 years of schooling: P(≤ 8 years) = 0.056 and P(9–11 years) = 0.159 • Since these two events are mutually exclusive, P(≤ 8 or 9–11) = P(≤ 8 U 9–11) = P(≤ 8) + P(9–11) = 0.056 + 0.159 = 0.215
  • 100. 2. The probability that a mother has completed 12 or more years of schooling is: P(≥ 12) = P(12 or 13–15 or ≥ 16) = P(12 U 13–15 U ≥ 16) = P(12) + P(13–15) + P(≥ 16) = 0.321 + 0.218 + 0.230 = 0.769
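The two answers above apply the addition rule for mutually exclusive events, which can be sketched in a few lines of code. The probabilities are the ones quoted in the worked example; the category labels (such as "<=8" and ">=16") and the helper name `prob_union` are assumptions made for illustration.

```python
# Probabilities quoted in the schooling example; the categories are
# mutually exclusive, so probabilities of unions are simple sums.
schooling = {
    "<=8": 0.056,
    "9-11": 0.159,
    "12": 0.321,
    "13-15": 0.218,
    ">=16": 0.230,
}

def prob_union(categories):
    """Addition rule for mutually exclusive events: add the probabilities."""
    return sum(schooling[c] for c in categories)

p_under_12 = prob_union(["<=8", "9-11"])
p_12_plus = prob_union(["12", "13-15", ">=16"])
print(round(p_under_12, 3), round(p_12_plus, 3))
```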
  • 101. 2. Multiplication rule – If A and B are independent events, then P(A ∩ B) = P(A) × P(B) – More generally, P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B) • P(A and B) denotes the probability that A and B both occur at the same time.
  • 102. Conditional Probability • The conditional probability that event B occurs given that event A has already occurred is denoted P(B|A) and is defined, provided that P(A) ≠ 0, as P(B|A) = P(A ∩ B) / P(A).
  • 103. Example: A study investigating the effect of prolonged exposure to bright light on retina damage in premature infants.
  Exposure | Retinopathy YES | Retinopathy NO | TOTAL
  Bright light | 18 | 3 | 21
  Reduced light | 21 | 18 | 39
  TOTAL | 39 | 21 | 60
  • 104. • The probability of developing retinopathy is: P(Retinopathy) = (No. of infants with retinopathy) / (Total No. of infants) = (18 + 21)/(21 + 39) = 39/60 = 0.65
  • 105. • We want to compare the probability of retinopathy given that the infant was exposed to bright light with the probability given that the infant was exposed to reduced light. • Exposure to bright light and exposure to reduced light are conditioning events: events we want to take into account when calculating conditional probabilities.
  • 106. • The conditional probability of retinopathy, given exposure to bright light, is: • P(Retinopathy | exposure to bright light) = (No. of infants with retinopathy exposed to bright light) / (No. of infants exposed to bright light) = 18/21 = 0.86
  • 107. • P(Retinopathy | exposure to reduced light) = (No. of infants with retinopathy exposed to reduced light) / (No. of infants exposed to reduced light) = 21/39 = 0.54 • The conditional probabilities suggest that premature infants exposed to bright light have a higher risk of retinopathy than premature infants exposed to reduced light.
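The retinopathy calculations can be reproduced directly from the cell counts. The sketch below uses the counts from the example table; the dictionary layout and the function name `p_retinopathy_given` are illustrative assumptions.

```python
# Counts from the retinopathy example: exposure -> (retinopathy yes, retinopathy no).
counts = {
    "bright light": (18, 3),
    "reduced light": (21, 18),
}

def p_retinopathy_given(exposure):
    """P(retinopathy | exposure) = cases in the exposure group / size of the group."""
    yes, no = counts[exposure]
    return yes / (yes + no)

# Unconditional probability: all cases / all infants.
total_yes = sum(yes for yes, _ in counts.values())
total = sum(yes + no for yes, no in counts.values())
p_retinopathy = total_yes / total

print(round(p_retinopathy, 2))
print(round(p_retinopathy_given("bright light"), 2))
print(round(p_retinopathy_given("reduced light"), 2))
```

Conditioning simply restricts the denominator to the row of interest instead of the whole table.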
  • 108. • For independent events A and B: P(A|B) = P(A). • For non-independent events A and B: P(A and B) = P(A|B) P(B) (General Multiplication Rule)
  • 109. Exercise: Culture and Gonodectin (GD) test results for 240 urethral discharge specimens
  GD test result | Culture: Gonorrhea | Culture: No gonorrhea | Total
  Positive | 175 | 9 | 184
  Negative | 8 | 48 | 56
  Total | 183 | 57 | 240
  • 110. 1. What is the probability that a man has gonorrhea? 2. What is the probability that a man has a positive GD test? 3. What is the probability that a man has a positive GD test and gonorrhea? 4. What is the probability that a man has a negative GD test and does not have gonorrhea? 5. What is the probability that a man with gonorrhea has a positive GD test?
  • 111. 6. What is the probability that a man who does not have gonorrhea has a negative GD test? 7. What is the probability that a man who does not have gonorrhea has a positive GD test? 8. What is the probability that a man with a positive GD test has gonorrhea?
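One way to check answers to this exercise is to compute marginal, joint and conditional probabilities from the four cell counts. The following is a sketch only; the helper names (`p_joint`, `p_test`, `p_culture`, and so on) are hypothetical and the counts are taken from the exercise table.

```python
# Cell counts from the exercise table: (GD test result, culture result) -> count.
n = {
    ("positive", "gonorrhea"): 175,
    ("positive", "no gonorrhea"): 9,
    ("negative", "gonorrhea"): 8,
    ("negative", "no gonorrhea"): 48,
}
total = sum(n.values())

def p_joint(test, culture):
    """P(test result and culture result)."""
    return n[(test, culture)] / total

def p_test(test):
    """Marginal probability of a GD test result."""
    return sum(v for (t, _), v in n.items() if t == test) / total

def p_culture(culture):
    """Marginal probability of a culture result."""
    return sum(v for (_, c), v in n.items() if c == culture) / total

def p_test_given_culture(test, culture):
    """P(test | culture) = P(test and culture) / P(culture)."""
    return p_joint(test, culture) / p_culture(culture)

def p_culture_given_test(culture, test):
    """P(culture | test) = P(test and culture) / P(test)."""
    return p_joint(test, culture) / p_test(test)

print(round(p_culture("gonorrhea"), 4))                          # question 1
print(round(p_test_given_culture("positive", "gonorrhea"), 4))   # question 5
print(round(p_culture_given_test("gonorrhea", "positive"), 4))   # question 8
```

The remaining questions follow the same pattern by changing the arguments.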
  • 112. Probability Distributions • A probability distribution describes how the values of a variable are distributed, which lets us draw conclusions about a set of data • Random variable = any quantity or characteristic that is able to assume a number of different values such that any particular outcome is determined by chance • The probability distribution of a random variable is a table, graph, or mathematical formula that gives the probabilities with which the random variable takes different values or ranges of values.
  • 113. School of Public Health Discrete Probability Distributions • For a discrete random variable, the probability distribution specifies each of the possible outcomes of the random variable along with the probability that each will occur • Examples can be: – Frequency distribution – Relative frequency distribution – Cumulative frequency 113
  • 114. • We represent a potential outcome of the random variable X by x  0 ≤ P(X = x) ≤ 1  ∑ P(X = x) = 1 Cont. 114
  • 115. School of Public Health The following data shows the number of diagnostic services a patient receives 115
  • 116. • What is the probability that a patient receives exactly 3 diagnostic services? P(X=3) = 0.031 • What is the probability that a patient receives at most one diagnostic service? P (X≤1) = P(X = 0) + P(X = 1) = 0.671 + 0.229 = 0.900 Cont. 116
  • 117. • What is the probability that a patient receives at least four diagnostic services? P (X≥4) = P(X = 4) + P(X = 5) = 0.010 + 0.006 = 0.016 Cont. 117
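The two answers above can be reproduced by summing entries of the probability distribution. In this sketch the values for x = 0, 1, 3, 4 and 5 are the ones quoted in the worked answers; P(X = 2) = 0.053 is not quoted and is inferred here on the assumption that the six probabilities sum to 1. The helper name `prob` is illustrative.

```python
# P(X = x) for the number of diagnostic services, x = 0..5.
# P(X = 2) = 0.053 is an inferred value (the probabilities must sum to 1).
pmf = {0: 0.671, 1: 0.229, 2: 0.053, 3: 0.031, 4: 0.010, 5: 0.006}

# A valid discrete distribution has probabilities in [0, 1] that sum to 1.
assert all(0.0 <= p <= 1.0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-9

def prob(event):
    """Sum P(X = x) over every outcome x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

p_at_most_one = prob(lambda x: x <= 1)
p_at_least_four = prob(lambda x: x >= 4)
print(round(p_at_most_one, 3), round(p_at_least_four, 3))
```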
  • 118. Probability distributions can also be displayed using a graph. [Figure: bar chart of the probability P(X = x) against the number of diagnostic services, x = 0 to 5]
  • 119. • Examples of discrete probability distributions are the binomial distribution and the Poisson distribution. Cont. 119
  • 120. School of Public Health Binomial Distribution • Consider dichotomous (binary) random variable • Is based on Bernoulli trial – When a single trial of an experiment can result in only one of two mutually exclusive outcomes (success or failure; dead or alive; sick or well, male or female) 120
  • 121. School of Public Health Example: • We are interested in determining whether a newborn infant will survive until his/her 70th birthday • Let Y represent the survival status of the child at age 70 years • Y = 1 if the child survives and Y = 0 if he/she does not • The outcomes are mutually exclusive and exhaustive • Suppose that 72% of infants born survive to age 70 years P(Y = 1) = p = 0.72 P(Y = 0) = 1 − p = 0.28 121
  • 122. Characteristics of a Binomial Distribution • The experiment consists of a fixed number, n, of identical trials. • Only two outcomes are possible on each trial. • The probability of A (success), denoted by p, remains the same from trial to trial. The probability of B (failure) is denoted by q, where q = 1 − p. • The trials are independent. • n and p are the parameters of the binomial distribution. • The mean is np and the variance is np(1 − p).
  • 123. • If an experiment is repeated n times and the outcome is independent from one trial to another, the probability that outcome A occurs exactly x times is:
  $$P(X = x) = \binom{n}{x} p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n.$$
  • 124. • n denotes the fixed number of trials • x denotes the number of successes in the n trials • p denotes the probability of success • q denotes the probability of failure (1 − p) • $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ (also written nCx) represents the number of ways of selecting x objects out of n where the order of selection does not matter, • where n! = n(n−1)(n−2)…(1), and 0! = 1
  • 125. School of Public Health Example: • Suppose we know that 40% of a certain population are cigarette smokers. If we take a random sample of 10 people from this population, what is the probability that we will have exactly 4 smokers in our sample? 125
  • 126. • If the probability that any individual in the population is a smoker is p = 0.40, then the probability of x = 4 smokers among n = 10 subjects selected is: P(X = 4) = 10C4 (0.4)^4 (1 − 0.4)^(10−4) = 10C4 (0.4)^4 (0.6)^6 = 210 (0.0256)(0.04666) = 0.25 • The probability of obtaining exactly 4 smokers in the sample is about 0.25.
  • 127. • We can compute the probability of observing zero smokers out of 10 subjects selected at random, exactly 1 smoker, and so on, and display the results in a table, as given, below. • The third column, P(X ≤ x), gives the cumulative probability. E.g. the probability of selecting 3 or fewer smokers into the sample of 10 subjects is P(X ≤ 3) =.3823, or about 38%. Cont. 127
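The full table of binomial probabilities for the smoker example can be generated with a few lines of code. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, and n = 10, p = 0.4 come from the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable: C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.4  # sample of 10 people, 40% smokers in the population

# Columns: x, P(X = x), cumulative P(X <= x)
cumulative = 0.0
for x in range(n + 1):
    prob = binomial_pmf(x, n, p)
    cumulative += prob
    print(f"{x:2d}  {prob:.4f}  {cumulative:.4f}")
```

The row for x = 4 corresponds to the probability of about 0.25 computed earlier, and the cumulative column at x = 3 corresponds to P(X ≤ 3).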
  • 129. The probabilities in the above table can be converted into the following graph. [Figure: bar chart of probability against the number of smokers, 0 to 10]
  • 130. Exercise: Each child born to a particular set of parents has a probability of 0.25 of having blood type O. If these parents have 5 children, what is the probability that: a. Exactly two of them have blood type O b. At most 2 have blood type O c. At least 4 have blood type O d. 2 do not have blood type O
  • 131. Solution for ‘a’:
  $$P(X = 2) = \binom{5}{2} (0.25)^{2} (0.75)^{5-2} = 0.2637$$
  • 132. School of Public Health 2. The Poisson Distribution • Is a discrete probability distribution used to model the number of occurrences of an event that takes place infrequently in time or space • Applicable for counts of events over a given interval of time, for example: – number of patients arriving at an emergency department in a day – number of new cases of HIV diagnosed at a clinic in a month 132
  • 133. • In such cases, we take a sample of days and observe the number of patients arriving at the emergency department on each day. • We are observing a count or number of events, rather than a yes/no or success/failure outcome for each subject or trial, as in the binomial. • Suppose events happen randomly and independently in time at a constant rate. If events happen with rate λ events per unit time, the probability of x events happening in unit time is:
  $$P(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$$
  • 134. • where x = 0, 1, 2, ... • x is a potential outcome of X • The constant λ (lambda) represents the rate at which the event occurs, or the expected number of events per unit time • e = 2.71828 • The distribution depends on just one parameter, λ, the mean number of occurrences.
  • 135. School of Public Health Example • The daily number of new registrations of cancer is 2.2 on average. What is the probability of a) Getting no new cases b) Getting 1 case c) Getting 2 cases d) Getting 3 cases e) Getting 4 cases 135
  • 136. Solutions a) $P(X = 0) = \frac{(2.2)^{0} e^{-2.2}}{0!} = 0.111$ b) P(X=1) = 0.244 c) P(X=2) = 0.268 d) P(X=3) = 0.197 e) P(X=4) = 0.108
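The solutions above can be checked with a direct implementation of the Poisson formula. This is a sketch using only the standard library; the function name `poisson_pmf` is illustrative, and λ = 2.2 is the daily average from the example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam: e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.2  # average number of new cancer registrations per day

# Probabilities of 0, 1, 2, 3 and 4 new cases in a day.
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 3))
```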
  • 137. [Figure: Poisson distribution with mean 2.2; probability (0.0 to 0.3) plotted against x = 0 to 7]
  • 138. Example: • In a given geographical area, cases of tetanus are reported at a rate of λ = 4.5/month • What is the probability that 0 cases of tetanus will be reported in a given month? 138
  • 139. • What is the probability that 1 case of tetanus will be reported? Cont. 139
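As a worked sketch for the tetanus example, substituting λ = 4.5 into the Poisson formula gives the following values (rounded to three decimal places; the rounding is a choice made here):

```latex
\begin{align*}
P(X = 0) &= \frac{e^{-4.5}\,(4.5)^{0}}{0!} = e^{-4.5} \approx 0.011 \\
P(X = 1) &= \frac{e^{-4.5}\,(4.5)^{1}}{1!} = 4.5\,e^{-4.5} \approx 0.050
\end{align*}
```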
  • 140. School of Public Health Continuous Probability Distributions • A continuous random variable X can take on any value in a specified interval or range • The probability distribution of X is represented by a smooth curve called a probability density function • The area under the smooth curve is equal to 1 • The area under the curve between any two points x1 and x2 is the probability that X takes a value between x1 and x2 140
  • 141. • The probability associated with any one particular value is equal to 0 • Therefore, P(X=x) = 0 • Also, P(X ≥ x) = P(X > x) • We calculate: Pr [ a < X < b], the probability of an interval of values of X. Cont. 141
  • 142. School of Public Health The Normal distribution • Frequently called the “Gaussian distribution” or bell-shape curve. • Variables such as blood pressure, weight, height, serum cholesterol level, and IQ score — are approximately normally distributed 142
  • 143. A random variable is said to have a normal distribution if it has a probability distribution that is symmetric and bell-shaped
  • 144. • A random variable X is said to follow a normal distribution (ND) if and only if its probability density function is:
  $$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty.$$
  • 145. π (pi) = 3.14159, e = 2.71828, x = value of X. Range of possible values of X: −∞ to +∞. µ = expected value of X (“the long run average”). σ² = variance of X. µ and σ are the parameters of the normal distribution; they completely define its shape
  • 146. 1. The mean µ tells you about location: – Increase µ: location shifts right – Decrease µ: location shifts left – Shape is unchanged 2. The variance σ² tells you about the narrowness or flatness of the bell: – Increase σ²: bell flattens; extreme values are more likely – Decrease σ²: bell narrows; extreme values are less likely – Location is unchanged
  • 148. Properties of the Normal Distribution 1. It is symmetrical about its mean, µ. 2. The mean, the median and the mode are equal. It is unimodal. 3. The total area under the curve above the x-axis is 1 square unit. 4. The curve never touches the x-axis. 5. As the value of σ increases, the curve becomes flatter, and vice versa. 6. The distribution is completely determined by the parameters µ and σ.
  • 150. • We cannot tabulate every possible distribution • Tabulated normal probability calculations are available only for the ND with µ = 0 and σ² = 1.
  • 151. Standard Normal Distribution • It is a normal distribution that has a mean equal to 0 and a SD equal to 1, and is denoted by N(0, 1). • The main idea is to standardize the given data by using Z-scores. • These Z-scores can then be used to find the area (and thus the probability) under the normal curve.
  • 152. School of Public Health The standard normal distribution has mean 0 and variance 1 • Approximately 68% of the area under the standard normal curve lies between ±1, about 95% between ±2, and about 99% between ±2.5 152
  • 153. Z-Transformation • If a random variable X ~ N(µ, σ), then we can transform it to a SND with the help of the Z-transformation:
  $$Z = \frac{x - \mu}{\sigma}$$
  • Z represents the Z-score for a given x value
  • 154. • Consider redefining the scale in terms of how many SDs a value lies from the mean, for a normal distribution with μ = 110 and σ = 15, using Z = (x − μ)/σ = (x − 110)/15:
  Value X | 50 | 65 | 80 | 95 | 110 | 125 | 140 | 155 | 170
  Z (SDs from mean) | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4
  • 155. • This process is known as standardization and gives the position on a normal curve with μ=0 and σ=1, i.e., the SND, Z. • A Z-score is the number of standard deviations that a given x value is above or below the mean. Cont. 155
  • 156. School of Public Health Some Useful Tips 156
  • 157. a) What is the probability that z < -1.96? (1) Sketch a normal curve (2) Draw a perpendicular line at z = -1.96 (3) Find the area in the table (4) The answer is the area to the left of the line: P(z < -1.96) = 0.0250
  • 158. b) What is the probability that -1.96 < z < 1.96? The area between the values P(-1.96 < z < 1.96) = 0.9750 - 0.0250 = 0.9500 158
  • 159. c) What is the probability that z > 1.96? • The answer is the area to the right of the line; found by subtracting table value from 1.0000; P(z > 1.96) =1.0000 - 0.9750 = 0.0250 159
  • 161. School of Public Health Exercise 1. Compute P(-1 ≤ Z ≤ 1.5) Ans: 0.7745 2. Find the area under the SND from 0 to 1.45 Ans: 0.4265 3. Compute P(-1.66 < Z < 2.85) Ans: 0.9493 161
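Table look-ups like these can be cross-checked in software. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_between` is an illustrative choice. Because printed tables round each area to four decimals, the computed values may differ from the table answers in the last digit.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution, N(0, 1)

def area_between(a, b):
    """P(a < Z < b): area under the standard normal curve between a and b."""
    return Z.cdf(b) - Z.cdf(a)

print(round(Z.cdf(-1.96), 4))              # area to the left of -1.96
print(round(area_between(-1.96, 1.96), 4)) # area between -1.96 and 1.96
print(round(1 - Z.cdf(1.96), 4))           # area to the right of 1.96
print(round(area_between(-1.0, 1.5), 4))   # exercise 1
print(round(area_between(0.0, 1.45), 4))   # exercise 2
print(round(area_between(-1.66, 2.85), 4)) # exercise 3
```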
  • 162. Applications of the Normal Distribution Example: • The diastolic blood pressures (DBP) of males 35–44 years of age are normally distributed with µ = 80 mm Hg and σ² = 144 mm Hg² [σ = 12 mm Hg]. • Individuals with a DBP above 95 mm Hg are considered to be hypertensive
  • 163. a. What is the probability that a randomly selected male has a DBP above 95 mm Hg? Z = (95 − 80)/12 = 1.25, P(Z > 1.25) = 0.1056 • Approximately 10.6% of this population would be classified as hypertensive.
  • 164. b. What is the probability that a randomly selected male has a DBP above 110 mm Hg? Z = (110 − 80)/12 = 2.50, P(Z > 2.50) = 0.0062 • Approximately 0.6% of the population has a DBP above 110 mm Hg
  • 165. c. What is the probability that a randomly selected male has a DBP below 60 mm Hg? Z = (60 − 80)/12 = -1.67, P(Z < -1.67) = 0.0475 • Approximately 4.8% of the population has a DBP below 60 mm Hg
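The three blood pressure questions can also be answered without standardizing by hand. This sketch again relies on `statistics.NormalDist` (standard library, Python 3.8 or later) with the µ = 80 and σ = 12 given in the example; the variable names are illustrative. Small differences from the slide values arise because the slides round Z to two decimals before using the table.

```python
from statistics import NormalDist

dbp = NormalDist(mu=80, sigma=12)  # diastolic blood pressure, mm Hg

p_above_95 = 1 - dbp.cdf(95)    # upper-tail area beyond 95 mm Hg
p_above_110 = 1 - dbp.cdf(110)  # upper-tail area beyond 110 mm Hg
p_below_60 = dbp.cdf(60)        # lower-tail area below 60 mm Hg

print(round(p_above_95, 4), round(p_above_110, 4), round(p_below_60, 4))
```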
  • 166. Other Distributions 1. Student t-distribution 2. F-distribution 3. χ² (chi-square) distribution
  • 168. • Researchers often use sample survey methodology to obtain information about a larger population by selecting and measuring a sample from that population. • Since the population is often too large to study in full, we rely on the information collected from the sample. • Inferences about the population are based on the information from the sample drawn from that population.
  • 169. • A sample is a collection of individuals selected from a larger population. • Sampling enables us to estimate the characteristic of a population by directly observing a portion of the population. Cont. 169
  • 171. Steps needed to select a sample and ensure that this sample will fulfill its goals. 1. Establish the study's objectives – The first step in planning a useful and efficient survey is to specify the objectives with as much detail as possible. – Without objectives, the survey is unlikely to generate valuable results. – Clarifying the aims of the survey is critical to its ultimate success. – The initial users and uses of the data should be identified at this stage. 171
  • 172. School of Public Health 2. Define the target population – The target population is the total population for which the information is required. – Specifically, the target population is defined by the following characteristics: • Nature of data required • Geographic location • Reference period • Other characteristics, such as socio-demographic characteristics Cont. 172
  • 173. 3. Decide on the data to be collected – The data requirements of the survey must be established. – To ensure that the requirements are operationally sound, the necessary data terms and definitions also need to be determined. 4. Set the level of precision – There is a level of uncertainty associated with estimates coming from a sample. – The sample-to-sample variation is what causes the sampling error. – Researchers can estimate the sampling error associated with a particular sampling plan, and try to minimize it. Cont. 173
  • 174. 5. Decide on the methods of measurement – Choose the measuring instrument and the method of approach to the population – Data about a person’s state of health may be obtained from statements that he/she makes or from a medical examination – The survey may employ a self-administered questionnaire or an interview 6. Prepare the sampling frame – A list of all members of the population – The elements must not overlap
  • 175. School of Public Health Sampling • The process of selecting a portion of the population to represent the entire population. • A main concern in sampling: – Ensure that the sample represents the population, and – The findings can be generalized. 175
  • 176. Advantages of sampling: • Feasibility: Sampling may be the only feasible method of collecting information. • Reduced cost: Sampling reduces demands on resources such as finance, personnel, and material. • Greater accuracy: Sampling may lead to more accurate data collection • Sampling error: Precise allowance can be made for sampling error • Greater speed: Data can be collected and summarized more quickly
  • 177. School of Public Health Disadvantages of sampling: • There is always a sampling error. • Sampling may create a feeling of discrimination within the population. • Sampling may be inadvisable where every unit in the population is legally required to have a record. 177
  • 178. Errors in sampling 1) Sampling error: Error that arises from selecting a sample rather than observing the whole population. – It cannot be avoided or totally eliminated. 2) Non-sampling error: - Observational error - Respondent error - Lack of preciseness of definition - Errors in editing and tabulation of data
  • 179. School of Public Health Sampling Methods Two broad divisions: A. Probability sampling methods B. Non-probability sampling methods 179
  • 180. Probability sampling • Involves random selection of a sample • A sample is obtained in a way that ensures every member of the population has a known, non-zero probability of being included in the sample. • The method chosen depends on a number of factors, such as – the available sampling frame, – how spread out the population is, – how costly it is to survey members of the population
  • 181. School of Public Health Most common probability sampling methods 1. Simple random sampling 2. Systematic random sampling 3. Stratified random sampling 4. Cluster sampling 5. Multi-stage sampling 181
  • 182. School of Public Health 1. Simple random sampling • Involves random selection • Each member of a population has an equal chance of being included in the sample. 182
  • 183. • To use a SRS method: – Make a numbered list of all the units in the population – Each unit should be numbered from 1 to N (where N is the size of the population) – Select the required number. Cont. 183
  • 184. • The randomness of the sample is ensured by: • use of “lottery’ methods • a table of random numbers Cont. 184
  • 185. School of Public Health Example • Suppose your school has 500 students and you need to conduct a short survey on the quality of the food served in the cafeteria. • You decide that a sample of 10 students should be sufficient for your purposes. • In order to get your sample, you assign a number from 1 to 500 to each student in your school. 185
  • 186. • To select the sample, you use a table of randomly generated numbers. • Pick a starting point in the table (a row and column number) and look at the random numbers that appear there. In this case, since the data run into three digits, the random numbers would need to contain three digits as well. Cont. 186
  • 187. • Ignore all random numbers greater than 500 (and 000) because they do not correspond to any of the students in the school. • Remember that sampling is without replacement, so if a number recurs, skip over it and use the next random number. • The first 10 different numbers between 001 and 500 make up your sample.
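In practice a random number generator often replaces the printed table. The sketch below draws a simple random sample without replacement for the school example (500 students, sample of 10); the function name `simple_random_sample` and the seed value are illustrative assumptions.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # random.sample selects without replacement, so no number can recur.
    return rng.sample(range(1, population_size + 1), sample_size)

student_ids = simple_random_sample(population_size=500, sample_size=10, seed=42)
print(sorted(student_ids))
```

Every student number from 1 to 500 has the same chance of selection, which is the defining property of simple random sampling.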
  • 188. • SRS has certain limitations: – Requires a sampling frame. – Difficult if the reference population is dispersed. – Minority subgroups of interest may not be selected. Cont. 188
  • 189. 2. Systematic random sampling • Sometimes called interval sampling, systematic sampling means that there is a gap, or interval, between each selected unit in the sample • After a random start, the selection is systematic rather than random
  • 190. • Useful if the reference population is arranged in some order: – Order of registration of patients – Numerical order of house numbers – Students’ registration books • Individuals are taken at fixed intervals (every kth) based on the sampling fraction, e.g. if the sample includes 20% of the population, then every fifth individual is taken.
  • 191. School of Public Health Steps in systematic random sampling 1. Number the units on your frame from 1 to N (where N is the total population size). 2. Determine the sampling interval (K) by dividing the number of units in the population by the desired sample size. 3. Select a number between one and K at random. This number is called the random start and would be the first number included in your sample. 4. Select every Kth unit after that first number Note: Systematic sampling should not be used when a cyclic repetition is inherent in the sampling frame. 191
  • 192. School of Public Health Example • To select a sample of 100 from a population of 400, you would need a sampling interval of 400 ÷ 100 = 4. • Therefore, K = 4. • You will need to select one unit out of every four units to end up with a total of 100 units in your sample. • Select a number between 1 and 4 from a table of random numbers. 192
  • 193. • If you choose 3, the third unit on your frame would be the first unit included in your sample; • The sample might consist of the following units to make up a sample of 100: 3 , 7, 11, 15, 19...395, 399 (up to N, which is 400 in this case). Cont. 193
  • 194. • Using the above example, you can see that with a systematic sample approach there are only four possible samples that can be selected, corresponding to the four possible random starts: A. 1, 5, 9, 13...393, 397 B. 2, 6, 10, 14...394, 398 C. 3, 7, 11, 15...395, 399 D. 4, 8, 12, 16...396, 400 Cont. 194
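The systematic sampling steps can be sketched in code as follows. The function name `systematic_sample` and its optional `start` argument are illustrative; the sketch assumes the population size is an exact multiple of the sample size, as in the example with N = 400 and n = 100.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every K-th unit after a random start between 1 and K."""
    k = population_size // sample_size  # sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..K
    return list(range(start, population_size + 1, k))[:sample_size]

# Random start of 3, as in the example: units 3, 7, 11, ..., 395, 399.
sample = systematic_sample(400, 100, start=3)
print(sample[:5], sample[-2:])
```

Because only the start is random, there are only K possible samples, which is why a cyclic pattern in the frame can bias the result.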
  • 195. 3. Stratified random sampling • It is used when the population is known to be heterogeneous with regard to some factors, and those factors are used for stratification • Using stratified sampling, the population is divided into homogeneous, mutually exclusive groups called strata • A population can be stratified by any variable that is available for all units prior to sampling (e.g., age, sex, province of residence, income, etc.). • A separate sample is taken independently from each stratum.
  • 196. Why do we need to create strata? • Stratification can make the sampling strategy more efficient. • A larger sample is required to get a more accurate estimate if a characteristic varies greatly from one unit to another. • For example, if every person in a population had the same salary, then a sample of one individual would be enough to get a precise estimate of the average salary.
  • 197. • Equal allocation: – Allocate an equal sample size to each stratum • Proportionate allocation:
  $$n_j = \frac{n}{N}\, N_j, \quad j = 1, 2, \ldots, k$$
  where k is the number of strata and – n_j is the sample size of the jth stratum – N_j is the population size of the jth stratum – n = n_1 + n_2 + ... + n_k is the total sample size – N = N_1 + N_2 + ... + N_k is the total population size
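Proportionate allocation is a one-line calculation per stratum. In the sketch below the stratum names and sizes ("urban", "rural") are hypothetical numbers chosen only for illustration, and the function name `proportionate_allocation` is likewise an assumption.

```python
def proportionate_allocation(stratum_sizes, n):
    """Return n_j = (n / N) * N_j for each stratum j."""
    N = sum(stratum_sizes.values())  # total population size
    return {name: n * N_j / N for name, N_j in stratum_sizes.items()}

# Hypothetical population of 10,000 split into two strata; total sample of 500.
strata = {"urban": 6000, "rural": 4000}
allocation = proportionate_allocation(strata, n=500)
print(allocation)
```

When the results are not whole numbers they need to be rounded, so the stratum sample sizes may have to be adjusted slightly to add up to n.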
  • 198. 4. Cluster sampling • Sometimes it is too expensive to spread a sample across the population as a whole. • Travel costs can become expensive if interviewers have to survey people from one end of the country to the other. • To reduce costs, researchers may choose a cluster sampling technique • The clusters should be similar to one another, unlike stratified sampling, where the strata differ from one another and each stratum is internally homogeneous
  • 199. School of Public Health Steps in cluster sampling • Cluster sampling divides the population into groups or clusters. • A number of clusters are selected randomly to represent the total population, and then all units within selected clusters are included in the sample. • No units from non-selected clusters are included in the sample—they are represented by those from selected clusters. • This differs from stratified sampling, where some units are selected from each group. 199
  • 200. School of Public Health Example • In a school based study, we assume students of the same school are homogeneous. • We can select randomly sections and include all students of the selected sections only 200
  • 201. • Sometimes a list of all units in the population is not available, while a list of all clusters is either available or easy to create. • In most cases, the main drawback is a loss of efficiency when compared with SRS. • It is usually better to survey a large number of small clusters instead of a small number of large clusters. – This is because neighboring units tend to be more alike, resulting in a sample that does not represent the whole spectrum of opinions or situations present in the overall population. • Another drawback to cluster sampling is that you do not have total control over the final sample size. Cont. 201
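One-stage cluster sampling can be sketched as follows: clusters are chosen at random and every unit in a chosen cluster is kept. The `sections` data (section names and student IDs) are hypothetical, as is the function name `cluster_sample`.

```python
import random

# Hypothetical school: section name -> list of student IDs (illustrative data).
sections = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7],
    "C": [8, 9, 10, 11, 12],
    "D": [13, 14],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

chosen, sample = cluster_sample(sections, n_clusters=2, seed=1)
print(chosen, sample)
```

Because the sections differ in size, the final sample size depends on which clusters are drawn, which illustrates the lack of control over sample size noted above.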
  • 202. School of Public Health 5. Multi-stage sampling • Similar to the cluster sampling. • But it involves picking a sample from within each chosen cluster, rather than including all units in the cluster. • This type of sampling requires at least two stages. • In the first stage, large groups or clusters are identified and selected. • In the second stage, population units are picked from within the selected clusters (using any of the possible probability sampling methods) for a final sample. 202
  • 203. • If more than two stages are used, the process of choosing population units within clusters continues until there is a final sample. • Also, you do not need to have a list of all of the units in the population. All you need is a list of clusters and list of the units in the selected clusters. • Admittedly, more information is needed in this type of sample than what is required in cluster sampling. However, multi-stage sampling still saves a great amount of time and effort by not having to create a list of all the units in a population. Cont. 203
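A two-stage design differs from cluster sampling only in the second step, where units are sampled within each chosen cluster instead of all being taken. The sketch below uses simple random sampling at both stages; the `villages` data and the function name `two_stage_sample` are hypothetical.

```python
import random

# Hypothetical clusters: 8 villages with 20 households each (illustrative data).
villages = {
    f"village_{v}": [f"v{v}_h{h}" for h in range(1, 21)] for v in range(1, 9)
}

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1
    sample = []
    for name in chosen:                                # stage 2
        units = clusters[name]
        sample.extend(rng.sample(units, min(n_per_cluster, len(units))))
    return chosen, sample

chosen, households = two_stage_sample(villages, n_clusters=3, n_per_cluster=5, seed=7)
print(chosen, households)
```

Only the list of clusters and the unit lists of the selected clusters are needed, which matches the point made above about not requiring a full population list.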
  • 204. B. Non-probability sampling • The difference between probability and non-probability sampling has to do with a basic assumption about the nature of the population under study. • In probability sampling, every item has a known chance of being selected. • In non-probability sampling, there is an assumption that there is an even distribution of a characteristic of interest within the population.
  • 205. • In non-probability sampling, since elements are chosen arbitrarily, there is no way to estimate the probability of any one element being included in the sample. • Also, no assurance is given that each item has a chance of being included, making it impossible either to estimate sampling variability or to identify possible bias Cont. 205
  • 206. • Reliability cannot be measured in non-probability sampling; the only way to address data quality is to compare some of the survey results with available information about the population. • Still, there is no assurance that the estimates will meet an acceptable level of error. • Researchers are reluctant to use these methods because there is no way to measure the precision of the resulting sample. Cont. 206
  • 207. • Despite these drawbacks, non-probability sampling methods can be useful when descriptive comments about the sample itself are desired. • There are also circumstances, for example in some research studies, in which probability sampling is infeasible or impractical. Cont. 207
  • 208. School of Public Health The most common types of non-probability sampling 1. Convenience or haphazard sampling 2. Volunteer sampling 3. Judgment sampling 4. Quota sampling 5. Snowball sampling technique 208
  • 209. School of Public Health 1. Convenience or haphazard sampling • Convenience sampling is sometimes referred to as haphazard or accidental sampling. • It is not normally representative of the target population because sample units are only selected if they can be accessed easily and conveniently. 209
  • 210. • The obvious advantage is that the method is easy to use, but that advantage is greatly offset by the presence of bias. • Although useful applications of the technique are limited, it can deliver accurate results when the population is homogeneous. Cont. 210
  • 211. • For example, a scientist could use this method to determine whether a lake is polluted or not. • Assuming that the lake water is well-mixed, any sample would yield similar information. • A scientist could safely draw water anywhere on the lake without bothering about whether or not the sample is representative Cont. 211
  • 212. School of Public Health 2. Volunteer sampling • As the term implies, this type of sampling occurs when people volunteer to be involved in the study. • In psychological experiments or pharmaceutical trials (drug testing), for example, it would be difficult and unethical to enlist random participants from the general public. • In these instances, the sample is taken from a group of volunteers. 212
  • 213. • Sometimes, the researcher offers payment to attract respondents. • In exchange, the volunteers accept the possibility of a lengthy, demanding or sometimes unpleasant process. Cont. 213
  • 214. • Sampling voluntary participants as opposed to the general population may introduce strong biases. • Often in opinion polling, only the people who care strongly enough about the subject tend to respond. • The silent majority does not typically respond, resulting in large selection bias. Cont. 214
  • 215. School of Public Health 3. Judgment sampling • This approach is used when a sample is taken based on certain judgments about the overall population. • The underlying assumption is that the investigator will select units that are characteristic of the population. • The critical issue here is objectivity: how much can judgment be relied upon to arrive at a typical sample? 215
  • 216. • Judgment sampling is subject to the researcher's biases and is perhaps even more biased than haphazard sampling. • Since any preconceptions the researcher may have are reflected in the sample, large biases can be introduced if these preconceptions are inaccurate. Cont. 216
  • 217. • Researchers often use this method in exploratory studies like pre-testing of questionnaires and focus groups. • They also prefer to use this method in laboratory settings where the choice of experimental subjects (i.e., animal, human) reflects the investigator's pre-existing beliefs about the population. Cont. 217
  • 218. • One advantage of judgment sampling is the reduced cost and time involved in acquiring the sample. Cont. 218
  • 219. School of Public Health 4. Quota sampling • This is one of the most common forms of non-probability sampling. • Sampling is done until a specific number of units (quotas) for various sub-populations have been selected. 219
  • 220. • Since there are no rules as to how these quotas are to be filled, quota sampling is really a means for satisfying sample size objectives for certain sub-populations. Cont. 220
  • 221. • As with all other non-probability sampling methods, in order to make inferences about the population, it is necessary to assume that persons selected are similar to those not selected. • Such strong assumptions are rarely valid. Cont. 221
  • 222. • The main argument against quota sampling is that it does not meet the basic requirement of randomness. • Some units may have no chance of selection or the chance of selection may be unknown. • Therefore, the sample may be biased. Cont. 222
  • 223. • Quota sampling is generally less expensive than random sampling. • It is also easy to administer, especially considering that the tasks of listing the whole population, randomly selecting the sample and following up on non-respondents can be omitted from the procedure. Cont. 223
  • 224. • Quota sampling is an effective sampling method when information is urgently required, and it can be carried out without sampling frames. • In many cases where the population has no suitable frame, quota sampling may be the only appropriate sampling method. Cont. 224
  • 225. School of Public Health 5. Snowball sampling • A technique for selecting a research sample where existing study subjects recruit future subjects from among their acquaintances. • Thus the sample group appears to grow like a rolling snowball. 225
  • 226. • This sampling technique is often used in hidden populations which are difficult for researchers to access; example populations would be drug users or commercial sex workers. • Because sample members are not selected from a sampling frame, snowball samples are subject to numerous biases. For example, people who have many friends are more likely to be recruited into the sample. Cont. 226
  • 227. School of Public Health Estimation
  • 228. School of Public Health • Up until this point, we have assumed that the values of the parameters of a probability distribution are known. • In the real world, the values of these population parameters are usually not known • Instead, we must try to say something about the way in which a random variable is distributed using the information contained in a sample of observations 228
  • 229. School of Public Health • The process of drawing conclusions about an entire population based on the data in a sample is known as statistical inference. • Methods of inference usually fall into one of two broad categories: ** Estimation or Hypothesis testing ** • For now, we will focus on using the observations in a sample to estimate a population parameter 229
  • 230. School of Public Health Estimation • It is concerned with estimating the values of specific population parameters based on sample statistics. • It is about using information in a sample to make estimates of the characteristics (parameters) of the source population. 230
  • 231. School of Public Health Estimation, Estimator & Estimate ♣ Estimation is the computation of a statistic from sample data, often yielding a value that is an approximation (guess) of its target, an unknown true population parameter value. ♣ The statistic itself is called an estimator and can be of two types - point estimator or interval estimator. ♣ The value or values that the estimator assumes are called estimates. 231
  • 232. School of Public Health Point versus Interval Estimators • Point estimation involves the calculation of a single number to estimate the population parameter • Interval estimation specifies a range of reasonable values for the parameter • Thus, – a point estimate is of the form: [ value ], – whereas an interval estimate is of the form: [ lower limit, upper limit ] 232
  • 233. School of Public Health 1. Point Estimate • A single numerical value used to estimate the corresponding population parameter. • Sample statistics are estimators of population parameters:
    – Sample mean, $\bar{X}$, estimates $\mu$
    – Sample variance, $S^2$, estimates $\sigma^2$
    – Sample proportion, $\hat{p}$, estimates $P$ or $\pi$
    – Sample odds ratio, $\widehat{OR}$, estimates OR
    – Sample relative risk, $\widehat{RR}$, estimates RR
    – Sample correlation coefficient, $r$, estimates $\rho$ 233
  • 234. School of Public Health 2. Interval Estimation • Interval estimation specifies a range of reasonable values for the population parameter based on a point estimate. • A confidence interval (CI) is a particular type of interval estimator. It: – gives a plausible range of values that is likely to include the “true” (population) value with a given confidence level; – gives information about the precision of an estimate (wider CIs indicate less certainty); – can also help answer the question of whether or not an association exists. 234
  • 235. School of Public Health Confidence Level • Confidence level – the confidence with which the interval will contain the unknown population parameter • $P(L \le \text{parameter} \le U) = 1 - \alpha$, where L and U are the lower and upper limits of the interval 235
  • 236. School of Public Health Estimation for Single Population 236
  • 237. School of Public Health 1. CI for a Single Population Mean (normally distributed) A. Known variance (large sample size) • There are 3 elements to a CI: 1. Point estimate 2. SE of the point estimate 3. Confidence coefficient • Consider the task of computing a CI estimate of μ for a population distribution that is normal with σ known. • Available are data from a random sample of size = n. 237
  • 238. School of Public Health Assumptions • Population standard deviation ($\sigma$) is known • Population is normally distributed • A $100(1-\alpha)\%$ C.I. for $\mu$ is: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ • $\alpha$ is to be chosen by the researcher; the most common values of $\alpha$ are 0.05, 0.01 and 0.1. Cont. 238
  • 239. School of Public Health Margin of Error (Precision of the estimate) • $d = z_{\alpha/2}\,\sigma/\sqrt{n}$, i.e. half of the width of the CI 239
  • 240. School of Public Health Cont. • As n increases, the length of the CI decreases. • As $\sigma$ increases, the length of the CI increases. • As the confidence level increases (α decreases), the length of the CI increases. 240
  • 241. School of Public Health Example: 1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr². a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean. b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI. c. What effect does larger sample size have on the CI? 241
  • 242. Solution: a. $1.52 \pm 1.96\sqrt{2.25/20} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$ • We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs. • An incorrect interpretation is that there is 95% probability that this interval contains the true population mean. 242
  • 243. b. $1.52 \pm 1.96\sqrt{2.25/32} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$ c. A larger sample size makes the CI narrower (more precise). Cont. 243
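The two intervals in this example can be reproduced with a short Python sketch. It assumes the known-variance formula (mean ± z × √(variance/n)); the function name is illustrative, and the z value is taken from the standard library rather than rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95% confidence
    margin = z * sqrt(variance / n)           # margin of error
    return mean - margin, mean + margin

ci_20 = z_confidence_interval(1.52, 2.25, 20)   # part a: n = 20
ci_32 = z_confidence_interval(1.52, 2.25, 32)   # part b: n = 32
print(f"n = 20: ({ci_20[0]:.2f}, {ci_20[1]:.2f})")
print(f"n = 32: ({ci_32[0]:.2f}, {ci_32[1]:.2f})")
```

Without intermediate rounding the bounds come out at about (0.86, 2.18) and (1.00, 2.04); the small differences from the slide values arise because the slides round the standard error before multiplying. The second interval is narrower, as part c states.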
  • 244. School of Public Health Cont. B. Unknown variance (small sample size, n ≤ 30) • What if the $\sigma$ for the underlying population is unknown and the sample size is small? • As an alternative we use Student’s t distribution. 244
  • 245. School of Public Health Cont. 245
  • 246. School of Public Health Example • Standard error = • t-value at 90% CL at 19 df =1.729 246
  • 247. School of Public Health Cont. 247
  • 248. School of Public Health Exercise • Compute a 95% CI for the mean birth weight based on n = 10, sample mean = 116.9 oz and s = 21.70. • From the t table, $t_{9,\,0.975} = 2.262$ • Ans: (101.4, 132.4) 248
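The exercise answer can be checked with a minimal sketch. It assumes the usual small-sample interval, mean ± t × s/√n, and takes the critical value 2.262 directly from the t table quoted on the slide, because the Python standard library has no t distribution.

```python
from math import sqrt

def t_confidence_interval(mean, sd, n, t_crit):
    """CI for a mean with unknown variance; t_crit is read from a t table (n - 1 df)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = t_confidence_interval(116.9, 21.70, 10, t_crit=2.262)
print(f"95% CI: ({low:.1f}, {high:.1f})")   # prints "95% CI: (101.4, 132.4)"
```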
  • 249. School of Public Health 2. CIs for single population proportion, p • Is based on three elements of CI – Point estimate – SE of point estimate – Confidence coefficient 249
  • 250. School of Public Health Cont. 250
  • 251. School of Public Health Example 1 • A random sample of 100 people shows that 25 are left-handed. Form a 95% CI for the true proportion of left-handers. 251
  • 252. School of Public Health Interpretation 252
  • 253. School of Public Health Example 2 • Suppose that among 10,000 female operating-room nurses, 60 women have developed breast cancer over five years. Find the 95% CI for p based on the point estimate. • Point estimate = 60/10,000 = 0.006 • The 95% CI for p is given by the interval: $0.006 \pm 1.96\sqrt{0.006(0.994)/10{,}000}$ • The 95% CI for p is: (0.0045, 0.0075) 253
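Both proportion examples can be worked through with one small function. This sketch assumes the normal-approximation interval, p̂ ± z × √(p̂(1 − p̂)/n); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    p_hat = successes / n                                  # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)     # confidence coefficient
    margin = z * sqrt(p_hat * (1 - p_hat) / n)             # z times SE of the estimate
    return p_hat - margin, p_hat + margin

left_handed = proportion_ci(25, 100)      # Example 1
nurses = proportion_ci(60, 10_000)        # Example 2
print(f"Left-handers: ({left_handed[0]:.3f}, {left_handed[1]:.3f})")
print(f"Nurses: ({nurses[0]:.4f}, {nurses[1]:.4f})")
```

For Example 1 this gives roughly 0.165 to 0.335, and for Example 2 roughly 0.0045 to 0.0075.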
  • 254. School of Public Health Hypothesis Testing
  • 255. School of Public Health • The purpose of Hypothesis Testing is to aid the researcher in reaching a decision (conclusion) concerning a population by examining a sample from that population. 255
  • 256. School of Public Health Hypothesis • Is a statement about one or more populations • Is a claim (assumption) about a population parameter 256
  • 257. School of Public Health Examples of Research Hypotheses Population Mean • The average length of stay of patients admitted to the hospital is five days • The mean birth weight of babies delivered by mothers with low SES is lower than those from higher SES. • Etc 257
  • 258. School of Public Health Types of Hypothesis 1. The Null Hypothesis, Ho • Is a statement claiming that there is no difference between the hypothesized value and the population value (the effect of interest is zero = no difference) • States the assumption (hypothesis) to be tested • Ho is always about a population parameter, not about a sample statistic • Begin with the assumption that Ho is true – similar to the notion of innocent until proven guilty 258
  • 259. School of Public Health 2. The Alternative Hypothesis, HA • Is a statement of what we will believe is true if our sample data causes us to reject Ho. • Is generally the hypothesis that is believed (or needs to be supported) by the researcher. • Is a statement that disagrees (opposes) with Ho (The effect of interest is not zero) Cont. 259
  • 260. School of Public Health Steps in Hypothesis Testing 1. Formulate the appropriate statistical hypotheses clearly • Specify Ho and HA:
    – Two-tailed: H0: $\mu = \mu_0$ versus H1: $\mu \ne \mu_0$
    – One-tailed: H0: $\mu \le \mu_0$ versus H1: $\mu > \mu_0$
    – One-tailed: H0: $\mu \ge \mu_0$ versus H1: $\mu < \mu_0$
  2. State the assumptions necessary for computing probabilities • A distribution is approximately normal (Gaussian) • Variance is known or unknown 260
  • 261. School of Public Health 3. Select a sample and collect data • Categorical, continuous 4. Decide on the appropriate test statistic for the hypothesis. E.g., One population OR Cont. 261
  • 262. School of Public Health 5. Specify the desired level of significance for the statistical test (α = 0.05, 0.01, etc.) 6. Determine the critical value. – A value the test statistic must attain to be declared significant. [Figure: critical values of Z at α = 0.05: ±1.96 (two-tailed); 1.645 or -1.645 (one-tailed)] Cont. 262
  • 263. School of Public Health 7. Obtain sample evidence and compute the test statistic 8. Reach a decision and draw the conclusion • If Ho is rejected, we conclude that HA is true (or accepted). • If Ho is not rejected, we conclude that Ho may be true. 263
  • 264. School of Public Health Rules for Stating Statistical Hypotheses 1. One population • Indication of equality (either =, ≤ or ≥) must appear in Ho. Ho: μ = μo, HA: μ ≠ μo Ho: P = Po, HA: P ≠ Po • Can we conclude that a certain population mean is – not 50? Ho: μ = 50 and HA: μ ≠ 50 – greater than 50? Ho: μ ≤ 50 HA: μ > 50 264
  • 265. School of Public Health • Can we conclude that the proportion of patients with leukemia who survive more than six years is not 60%? Ho: P = 0.6 HA: P ≠ 0.6 Cont. 265
  • 266. School of Public Health Statistical Decision • Reject Ho if the value of the test statistic that we compute from our sample is one of the values in the rejection region • Don’t reject Ho if the computed value of the test statistic is one of the values in the non-rejection region. 266
  • 267. School of Public Health Another way to state conclusion • Reject Ho if P-value < α • Do not reject Ho if P-value ≥ α • The P-value is the probability of obtaining a test statistic as extreme as or more extreme than the one actually obtained, if Ho is true • The larger the absolute value of the test statistic, the smaller the P-value. Equivalently, the smaller the P-value, the stronger the evidence against Ho. 267
  • 268. School of Public Health Types of Errors in Hypothesis Tests • Whenever we reject or fail to reject Ho, we risk committing an error. • Two types of errors can be committed. – Type I Error – Type II Error 268
  • 269. School of Public Health Type I Error • The probability of a type I error is the probability of rejecting the Ho when it is true • The probability of type I error is α • Called level of significance of the test • Set by researcher in advance 269
  • 270. School of Public Health Type II Error • The error committed when a false Ho is not rejected • The probability of Type II Error is β • Usually unknown but larger than α 270
  • 271. Action (conclusion) versus reality:
    – Do not reject Ho when Ho is true: correct action (Prob. = 1-α)
    – Do not reject Ho when Ho is false: Type II error (Prob. = β = 1-Power)
    – Reject Ho when Ho is true: Type I error (Prob. = α = significance level)
    – Reject Ho when Ho is false: correct action (Prob. = Power = 1-β)
  Cont. 271
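The meaning of α as the probability of a Type I error can be illustrated by simulation. The sketch below is not part of the original slides: it repeatedly draws samples from a normal population for which Ho is actually true and counts how often a two-tailed Z test rejects it. The population values (mean 30, variance 20, n = 10) are illustrative defaults chosen for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu0=30.0, sigma=sqrt(20), n=10, alpha=0.05, trials=2000, seed=1):
    """Fraction of samples, drawn with Ho true, for which a two-tailed Z test rejects Ho."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]    # Ho is true by construction
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1                                   # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should be close to the chosen α of 0.05, with some simulation noise.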
  • 272. School of Public Health Type I & II Error Relationship 272
  • 273. School of Public Health Hypothesis Testing of a Single Mean (Normally Distributed) 273
  • 274. School of Public Health Known Variance 274
  • 275. School of Public Health Example: Two-Tailed Test 1. A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let α = 0.05. • Answer, "Yes we can, if we can reject the Ho that it is 30." A. Data n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05 B. Assumptions Simple random sample Normally distributed population 275
  • 276. School of Public Health C. Hypotheses Ho: µ = 30 HA: µ ≠ 30 D. Test statistic As the population variance is known, we use Z as the test statistic: $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ Cont. 276
  • 277. School of Public Health E. Decision Rule • Reject Ho if the Z value falls in the rejection region. • Don’t reject Ho if the Z value falls in the non-rejection region. • Because of the structure of Ho it is a two tail test. Therefore, reject Ho if Z ≤ -1.96 or Z ≥ 1.96. Cont. 277
  • 278. School of Public Health F. Calculation of test statistic $Z = (27 - 30)/\sqrt{20/10} = -3/1.4142 = -2.12$ G. Statistical decision We reject Ho because Z = -2.12 is in the rejection region. The value is significant at the 5% level. H. Conclusion We conclude that µ is not 30. P-value = 0.0340. A Z value of -2.12 corresponds to a tail area of 0.0170. Since there are two parts to the rejection region in a two tail test, the P-value is twice this, which is 0.0340. 278
  • 279. School of Public Health Hypothesis test using confidence interval • A problem like the above example can also be solved using a confidence interval. • A confidence interval will show that the hypothesized value of µ (30) does not fall within the boundaries of the interval. However, it will not give a probability (P-value). • Confidence interval: $27 \pm 1.96\sqrt{20/10} = 27 \pm 2.77 = (24.23,\ 29.77)$, which does not contain 30 279
  • 280. School of Public Health Example: One-Tailed Test • A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is less than 30? The variance is known to be 20. Let α = 0.05. • Data n = 10, sample mean = 27, $\sigma^2$ = 20, α = 0.05 • Hypotheses Ho: µ ≥ 30, HA: µ < 30 280
  • 281. School of Public Health • Test statistic: $Z = (27 - 30)/\sqrt{20/10} = -2.12$ • Rejection region (lower tail test) • With α = 0.05 and the inequality in HA, the entire rejection region is in the left tail. The critical value is Z = -1.645. Reject Ho if Z < -1.645. 281
  • 282. School of Public Health • Statistical decision – We reject the Ho because -2.12 < -1.645. • Conclusion – We conclude that µ < 30. – p = .0170 this time because it is only a one tail test and not a two tail test. Cont. 282
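The two-tailed and one-tailed age examples above can be sketched with a single function. The function name and the `alternative` argument are illustrative choices for this sketch; the calculation is the usual Z statistic for a mean with known variance, and the P-value is taken from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, variance, n, alternative="two-sided"):
    """Z test for a single mean with known population variance. Returns (z, p_value)."""
    z = (sample_mean - mu0) / sqrt(variance / n)
    lower_tail = NormalDist().cdf(z)
    if alternative == "two-sided":
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    elif alternative == "less":        # HA: mu < mu0 (lower tail test)
        p_value = lower_tail
    elif alternative == "greater":     # HA: mu > mu0 (upper tail test)
        p_value = 1 - lower_tail
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, p_value

z_two, p_two = z_test_mean(27, 30, 20, 10)                      # Ho: mu = 30
z_one, p_one = z_test_mean(27, 30, 20, 10, alternative="less")  # Ho: mu >= 30
print(f"two-tailed: Z = {z_two:.2f}, P = {p_two:.4f}")
print(f"one-tailed: Z = {z_one:.2f}, P = {p_one:.4f}")
```

Both P-values fall below α = 0.05, so Ho is rejected in each case. The values differ slightly in the fourth decimal from the table-based 0.0340 and 0.0170 because Z is not rounded to -2.12 before the lookup.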
  • 283. School of Public Health Unknown Variance • In most practical applications the standard deviation of the underlying population is not known • In this case, $\sigma$ can be estimated by the sample standard deviation s. • If the underlying population is normally distributed, then the test statistic is: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, with n - 1 degrees of freedom 283
  • 284. School of Public Health Example: Two-Tailed Test • A simple random sample of 14 people from a certain population gives a sample mean body mass index (BMI) of 30.5 and sd of 10.64. Can we conclude that the BMI is not 35 at α = 5%? • Ho: µ = 35, HA: µ ≠ 35 • Test statistic: $t = (30.5 - 35)/(10.64/\sqrt{14}) = -1.58$ • If the assumptions are correct and Ho is true, the test statistic follows Student's t distribution with 13 degrees of freedom. 284
  • 285. School of Public Health • Decision rule – We have a two tailed test. With α = 0.05 it means that each tail is 0.025. The critical t values with 13 df are -2.1604 and 2.1604. – We reject Ho if t ≤ -2.1604 or t ≥ 2.1604. • Do not reject Ho because -1.58 is not in the rejection region. Based on the data of the sample, it is possible that µ = 35. P-value = 0.1375 Cont. 285
  • 286. School of Public Health Sampling from a population that is not normally distributed • Here, we do not know if the population displays a normal distribution. • However, with a large sample size, we know from the Central Limit Theorem that the sampling distribution of the sample mean is approximately normal. 286
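The BMI example can be checked with a brief sketch. It assumes the one-sample t statistic, (sample mean − µ0)/(s/√n), and uses the critical value 2.1604 quoted on the slide, since the Python standard library has no t distribution from which to compute it.

```python
from math import sqrt

def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for a single mean with unknown variance (n - 1 df)."""
    return (sample_mean - mu0) / (sd / sqrt(n))

t_stat = one_sample_t(30.5, 35, 10.64, 14)
t_crit = 2.1604                       # two-tailed critical value, 13 df, from the slide
reject_h0 = abs(t_stat) >= t_crit
print(f"t = {t_stat:.2f}, reject Ho: {reject_h0}")   # prints "t = -1.58, reject Ho: False"
```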
  • 287. School of Public Health • With a large sample, we can use Z as the test statistic calculated using the sample sd. Cont. 287
  • 288. School of Public Health Hypothesis Tests for Proportions • Involves categorical values • Two possible outcomes – “Success” (possesses a certain characteristic) – “Failure” (does not possess that characteristic) • Fraction or proportion of population in the “success” category is denoted by p 288
  • 289. School of Public Health Proportions 289
  • 290. School of Public Health (Normal Approximation to Binomial Distribution) Hypothesis Testing about a Single Population Proportion 290
  • 291. School of Public Health 291
  • 292. School of Public Health Example • We are interested in the probability of developing asthma over a given one-year period for children 0 to 4 years of age whose mothers smoke in the home. In the general population of 0 to 4-year-olds, the annual incidence of asthma is 1.4%. If 10 cases of asthma are observed over a single year in a sample of 500 children whose mothers smoke, can we conclude that this is different from the underlying probability of p0 = 0.014? α = 5% H0: p = 0.014 HA: p ≠ 0.014 292
  • 293. School of Public Health • The test statistic is given by: $Z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n} = (0.02 - 0.014)/\sqrt{0.014(0.986)/500} = 1.14$ Cont. 293
  • 294. School of Public Health • The critical value of $Z_{\alpha/2}$ at α = 5% is ±1.96. • Don’t reject Ho since Z (= 1.14) is in the non-rejection region between ±1.96. • P-value = 0.2542 • We do not have sufficient evidence to conclude that the probability of developing asthma for children whose mothers smoke in the home is different from the probability in the general population. Cont. 294
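The asthma example can be reproduced with the following sketch, which assumes the normal approximation to the binomial with the standard error computed under Ho. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p0):
    """Two-tailed Z test for a single proportion (normal approximation to the binomial)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SE uses the hypothesized p0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z_asthma, p_asthma = z_test_proportion(10, 500, 0.014)
print(f"Z = {z_asthma:.2f}, P = {p_asthma:.3f}")
```

The statistic is about 1.14 and the two-tailed P-value about 0.25, well above α = 0.05, so Ho is not rejected.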
  • 295. School of Public Health Sample size determination 295
  • 296. School of Public Health Sample Size • Sample Size: The number of study subjects selected to represent a given study population. • In estimating a certain characteristic of a population, sample size calculations are important to ensure that estimates are obtained with required precision or confidence • Should be sufficient to represent the characteristics of interest of the study population. 296
  • 297. School of Public Health Sample size determination depends on the: – Objective of the study – Design of the study • Descriptive/Analytic – Accuracy of the measurements to be made – Degree of precision required for generalization – Plan for statistical analysis – Degree of confidence with which to conclude Cont. 297
  • 298. School of Public Health • The feasible sample size is also determined by the availability of resources: – time – manpower – transport – available facility, and – money Cont. 298
  • 299. School of Public Health Sample size for single sample A. Sample size for estimating a single population mean B. Sample size to estimate a single population proportion 299
  • 300. School of Public Health A. Sample size for estimating a single population mean • $n = z_{\alpha/2}^2\,\sigma^2/d^2$ • where d = margin of error = absolute precision = half of the width (w) of the CI • d is written as e in some textbooks 300
  • 301. School of Public Health Examples: 1. Find the minimum sample size needed to estimate the drop in heart rate (µ) for a new study using a higher dose of propranolol than the standard one. We require that the two-sided 95% CI for µ be no wider than 5 beats per minute and the sample sd for change in heart rate equals 10 beats per minute. $n = (1.96)^2 (10)^2/(2.5)^2 \approx 61.5$, rounded up to 62 patients 301
  • 302. 2. Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI of 5 years wide. If the population SD is 12 years, how large should our sample be? 302
  • 303. • Suppose d=1 • Then the sample size increases Cont. 303
  • 304. School of Public Health 3. A hospital director wishes to estimate the mean weight of babies born in the hospital. How large a sample of birth records should be taken if she/he wants a 95% CI that is 0.5 wide? Assume that a reasonable estimate of $\sigma$ is 2. Ans: 246 birth records. Cont. 304
  • 305. School of Public Health But the population $\sigma^2$ is usually unknown. As a result, it has to be estimated from: • Pilot or preliminary sample: – Select a pilot sample and estimate $\sigma^2$ with the sample variance, $s^2$ • Previous or similar studies 305
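The three sample-size examples can be computed with one function. This sketch assumes n = (z σ/d)², rounded up to the next whole subject, with d taken as half of the stated CI width; that reading reproduces the answers of 62 and 246 given on the slides. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, d, confidence=0.95):
    """Minimum n to estimate a mean to within +/- d, given a standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / d) ** 2)        # always round up

n_heart_rate = sample_size_mean(sigma=10, d=2.5)    # Example 1: CI 5 bpm wide
n_age = sample_size_mean(sigma=12, d=2.5)           # Example 2: CI 5 years wide
n_age_d1 = sample_size_mean(sigma=12, d=1)          # Example 2 with d = 1
n_birth_weight = sample_size_mean(sigma=2, d=0.25)  # Example 3: CI 0.5 wide
print(n_heart_rate, n_age, n_age_d1, n_birth_weight)
```

Under these assumptions Example 2 needs 89 patients, and tightening the precision to d = 1 raises this to 554, which shows how quickly the required sample grows as d shrinks.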
  • 306. School of Public Health B. Sample size to estimate a single population proportion • $n = z_{\alpha/2}^2\,p(1 - p)/d^2$ 306
  • 307. 1. Suppose that you are interested to know the proportion of infants who are breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3 percentage points with 95% confidence? Let p = 0.20, d = 0.03, α = 5%. $n = (1.96)^2 (0.20)(0.80)/(0.03)^2 \approx 683$ Cont. 307
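The breastfeeding example can be sketched in the same way, assuming n = z² p(1 − p)/d² rounded up to the next whole subject. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(p, d, confidence=0.95):
    """Minimum n to estimate a proportion to within +/- d, given an anticipated value p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)

n_breastfed = sample_size_proportion(p=0.20, d=0.03)
print(n_breastfed)   # prints 683
```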
  • 308. School of Public Health Sample Size: Two Samples A. Estimation of the difference between two population means B. Estimation of the difference between two population proportions 308
  • 309. School of Public Health A. Sample size for estimating a difference in two means 309
  • 310. School of Public Health B. Sample size for estimating a difference in two proportions 310
  • 311. School of Public Health Data Screening 311
  • 312. School of Public Health Data check entry • One of the first steps to proper data screening is to ensure the data is correct – Check out each person’s entry individually • Makes sense if small data set or proper data checking procedure • Can be too costly so… – range of data should be checked 312
  • 313. School of Public Health Normality • All of the continuous data we are covering need to follow a normal curve • Skewness (univariate) – this represents the asymmetry of the distribution of the data 313
  • 314. School of Public Health Cont. • The skewness statistic is output by SPSS, together with its standard error (SE skewness) • $Z_{skewness} = S_{skewness}/SE_{skewness}$ • $|Z_{skewness}| > 3.2$ indicates a violation of the skewness assumption 314
  • 315. School of Public Health Cont. • Kurtosis (univariate) – is how peaked the data are; the kurtosis statistic is output by SPSS, together with its standard error (SE kurtosis) • For most statistics the skewness assumption is more important than the kurtosis assumption • $Z_{kurtosis} = S_{kurtosis}/SE_{kurtosis}$ • $|Z_{kurtosis}| > 3.2$ indicates a violation of the kurtosis assumption 315
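The screening rule for skewness and kurtosis can be sketched as below. It assumes that the rule on the slides is to divide the statistic by its standard error and to flag a violation when the absolute value exceeds 3.2. The statistic and SE values are hypothetical numbers standing in for SPSS output, not results from any real data set.

```python
def z_from_statistic(statistic, standard_error):
    """Z value for a skewness or kurtosis statistic: statistic / SE."""
    return statistic / standard_error

def violates_assumption(statistic, standard_error, cutoff=3.2):
    """True when |Z| exceeds the cutoff (3.2 is the value quoted on the slides)."""
    return abs(z_from_statistic(statistic, standard_error)) > cutoff

# Hypothetical values, as if read from a descriptive statistics table.
skewness_flag = violates_assumption(statistic=1.10, standard_error=0.25)   # Z = 4.4
kurtosis_flag = violates_assumption(statistic=0.60, standard_error=0.50)   # Z = 1.2
print(skewness_flag, kurtosis_flag)
```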
  • 316. School of Public Health Outliers • Technically an outlier is a data point outside of your distribution; it is potentially detrimental because it may have an undue effect on the distribution 316
  • 317. School of Public Health Linearity • relationships among variables are linear in nature; assumption in most analyses 317
  • 318. School of Public Health Homoscedasticity • For grouped data this is the same as homogeneity of variance • For ungrouped data – variability for one variable is the same at all levels of another variable (no variance interaction) 318
  • 319. School of Public Health Multicollinearity/Singularity • If correlations between two variables are excessive (e.g. 0.95) then this represents multicollinearity • If correlation is 1 then you have singularity • Often Multicollinearity/Singularity occurs in data because one variable is a near duplicate of another 319
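A simple screen for multicollinearity is to compute pairwise correlations and flag any pair at or above a chosen threshold. The sketch below uses the 0.95 figure mentioned on the slide as that threshold; the variable names and data are hypothetical, with weight recorded in both kilograms and pounds to mimic one variable being a near duplicate of another.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def flag_collinear(columns, threshold=0.95):
    """Return (name_a, name_b, r) for every pair with |r| at or above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical data: weight_lb is a near duplicate of weight_kg.
data = {
    "weight_kg": [60, 72, 55, 80, 68],
    "weight_lb": [132.3, 158.7, 121.3, 176.4, 149.9],
    "height_cm": [170, 165, 172, 168, 180],
}
flagged = flag_collinear(data)
print(flagged)
```

Only the two weight variables are flagged; in practice one of such a pair would usually be dropped before analysis.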