Basic Biostatistics for MPH students
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.)
Arsi University, College of Health Science,
Department of Public Health
11/17/2018
Course content
Topics:
1. Introduction
2. Methods of data collection and presentation
3. Summary measures
4. Probability and probability distributions
5. Sampling methods and sample size determination
6. Statistical inference
Facilitator: Mr. Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.)
Email: terek7@gmail.com
Course description
This course covers descriptive statistics and some intermediate-level inferential statistics for public health. The descriptive part deals with frequency distributions and measures of central tendency and variability; the course then covers probability and probability distributions, sampling and sample size determination, statistical estimation and sampling distributions, and hypothesis testing.
Learning Objectives:
• At the end of the course we will be able to:
– Discuss the role of statistics in health science and explain the main uses of statistical methods in the broader field of health care;
– Describe methods of collecting and recording data and of presenting them in tables, graphs, etc.;
– Calculate measures of central tendency and dispersion;
– Apply different sampling techniques and methods of sample size determination;
– Explain the context and meaning of statistical estimation and hypothesis testing.
Evaluation
Evaluation criteria (percent):
– Assignments: 40%
– Final exam: 60%
NB: Grading will follow the grading scale of the university registrar.
Chapter one:
Introduction to Biostatistics
Objectives of the chapter
• After completing this chapter, we will be able to:
– Define statistics and biostatistics
– Enumerate the importance and limitations of statistics
– Define and identify the different types of variables and explain why we need to classify them
Objectives cont’d…
– Identify the different methods of organizing and presenting medical and biological data
– Identify the criteria for selecting a method to organize and present data
– Discuss data summarization methods
Statistics?
Statistics
• The science of assembling and interpreting numerical data (Bland, 2000)
• The discipline concerned with the treatment of numerical data derived from groups of individuals (Armitage et al., 2001).
• In general, the term statistics is used to mean either statistical data or statistical methods.
Statistics cont’d…
Statistical data: numerical descriptions of things. These descriptions may take the form of counts or measurements.
E.g. malaria statistics include the number of fever cases, the number of positive results obtained, and the sex and age distribution of positive cases.
Statistics cont’d…
• NB: Even though statistical data always denote figures (numerical descriptions), it must be remembered that not all numerical descriptions are statistical data.
Why?
Statistics cont’d…
• Statistical methods: the methods used for collecting, organizing, analyzing and interpreting numerical data in order to understand a phenomenon or to make sound decisions. In this sense statistics is a branch of the scientific method and helps us better understand the subject under study.
Biostatistics?
• Biostatistics: The tools of statistics are employed in many fields, such as business, education, psychology, agriculture and economics, to mention only a few.
• When the data being analyzed are derived from public health, the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.
Types of biostatistics?
Types of biostatistics
Biostatistics
– Descriptive statistics: collecting, organizing, summarizing and presenting data
– Inferential statistics: making inferences, hypothesis testing, determining relationships and making predictions
Types of Biostatistics
1. Descriptive (exploratory) statistics: the branch concerned with collecting, organizing, presenting and summarizing data.
These include techniques for the tabular and graphical presentation of data as well as methods used to summarize a body of data with one or two meaningful figures.
E.g. At our health centre, 50 patients were diagnosed with angina last year.
Descriptive statistics cont’d …
• Some statistical summaries that are especially common in descriptive analyses are:
– Measures of central tendency
– Measures of dispersion
– Cross-tabulation/contingency tables
– Histograms
– Quantiles, Q-Q plots
– Scatter plots
– Box plots
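As a minimal sketch of two of these summaries, the Python snippet below builds a one-way frequency table and a two-way cross-tabulation with `collections.Counter`. The records and the variable names `sex` and `blood_type` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical records (illustrative only): one dict per patient
records = [
    {"sex": "Female", "blood_type": "O"},
    {"sex": "Male", "blood_type": "O"},
    {"sex": "Male", "blood_type": "A"},
    {"sex": "Female", "blood_type": "A"},
    {"sex": "Female", "blood_type": "O"},
]

# One-way frequency distribution of a single categorical variable
freq = Counter(r["blood_type"] for r in records)

# Two-way cross-tabulation (contingency table): count of each (sex, blood type) pair
crosstab = Counter((r["sex"], r["blood_type"]) for r in records)

for sex in ("Female", "Male"):
    # One row of the contingency table: counts for blood types A and O
    print(sex, [crosstab[(sex, bt)] for bt in ("A", "O")])
# prints:
# Female [1, 2]
# Male [1, 1]
```

Each count in `crosstab` is one cell of a sex-by-blood-type contingency table; row or column totals are obtained by summing the relevant cells.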
2. Inferential Statistics:
• Consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions.
• This branch of statistics deals with techniques for drawing conclusions about a population.
• Inferences are drawn from particular properties of a sample to the corresponding properties of the population.
Inferential statistics builds upon descriptive statistics.
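To make the sample-to-population idea concrete, here is a hedged Python sketch. It simulates a hypothetical population of systolic blood pressures (the mean of 120 and SD of 15 mmHg are assumptions, not course data), draws one simple random sample of 100, and uses the sample mean with a normal-approximation 95% confidence interval (z = 1.96) to estimate the population mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical population of 10,000 systolic blood pressures (mmHg).
# In a real study the population values are unknown, which is why we need inference.
population = [random.gauss(120, 15) for _ in range(10_000)]

# Draw a simple random sample of 100 people and describe it
sample = random.sample(population, 100)
xbar = statistics.mean(sample)                      # sample mean (the estimate)
se = statistics.stdev(sample) / len(sample) ** 0.5  # standard error of the mean

# Inference: approximate 95% confidence interval for the population mean
ci_low, ci_high = xbar - 1.96 * se, xbar + 1.96 * se

print(f"sample mean = {xbar:.1f} mmHg, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")
print(f"population mean (known only because it was simulated) = {statistics.mean(population):.1f}")
```

The descriptive step (mean, SD of the sample) feeds directly into the inferential step (the interval), which mirrors the statement that inferential statistics builds upon descriptive statistics.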
Inferential Statistics cont’d...
NB: Inferential statistics encompasses a variety of procedures to ensure that inferences are sound and rational, even though they may not always be correct.
Inferential statistics cont’d…
• In short, inferential statistics enables us to make confident decisions in the face of uncertainty.
E.g. Antibiotics reduce the duration of viral throat infections by 1-2 days.
Five percent of women aged 30-49 consult their general practitioner (GP) each year with heavy menstrual bleeding.
Summary
Descriptive statistical methods
– Provide summary indices for a given set of data, e.g. arithmetic mean, median, standard deviation, coefficient of variation, etc.
Inductive (inferential) statistical methods
– Produce statistical inferences about a population based on information from a sample drawn from that population; they need to take variation into account
– Estimate population values from sample values (from sample to population)
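The descriptive indices listed above can be computed with Python's standard `statistics` module. The glucose values below are hypothetical and serve only as an illustration; the coefficient of variation is taken here as the sample standard deviation divided by the mean, expressed as a percentage.

```python
import statistics

# Hypothetical fasting blood glucose values (mg/dL), for illustration only
values = [82, 90, 95, 101, 110, 124]

mean = statistics.mean(values)      # arithmetic mean
median = statistics.median(values)  # middle value (average of the two middle values here)
sd = statistics.stdev(values)       # sample standard deviation (n - 1 denominator)
cv = sd / mean * 100                # coefficient of variation, in percent

print(f"mean={mean:.1f}, median={median}, SD={sd:.1f}, CV={cv:.1f}%")
```

Because the CV divides by the mean, it is unit-free, which is what makes it useful for comparing variability between measurements on different scales.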
Summary cont’d …
• E.g.
– At our health centre, 50 patients were diagnosed with angina last year. (descriptive)
– Antibiotics reduce the duration of viral throat infections by 1-2 days. (inferential)
– Five percent of women aged 30-49 consult their GP each year with heavy menstrual bleeding. (inferential)
• Why do we need biostatistics?
Why do we need biostatistics?
• Main reason: handling variation
o Biological variation
– Among individuals as well as within the same individual over time
» Example: height, weight, blood pressure, eye color ...
o Sampling variation:
Biomedical research projects are usually carried out on small numbers of study subjects, so results vary from one sample to another.
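Sampling variation can be illustrated with a small simulation. The sketch below assumes a hypothetical population of adult heights (mean 165 cm and SD 8 cm, both invented for illustration), repeatedly draws samples of size 5 and of size 100, and measures how much the sample means scatter. The function name `spread_of_sample_means` is our own.

```python
import random
import statistics

random.seed(42)  # fixed seed for a reproducible illustration

# Hypothetical population: 50,000 adult heights in cm (parameters are invented)
population = [random.gauss(165, 8) for _ in range(50_000)]


def spread_of_sample_means(n, replicates=500):
    """Draw `replicates` random samples of size n; return the SD of their means."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(replicates)]
    return statistics.stdev(means)


small = spread_of_sample_means(5)    # theory suggests roughly 8 / sqrt(5), about 3.6 cm
large = spread_of_sample_means(100)  # theory suggests roughly 8 / sqrt(100), about 0.8 cm
print(f"SD of sample means: n=5 -> {small:.2f} cm, n=100 -> {large:.2f} cm")
```

Small samples give noticeably more variable results, which is why studies on small numbers of subjects need statistical methods to separate real effects from chance.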
Why do we need to learn biostatistics? cont’d....
• It is essential for the scientific method of investigation:
– Formulate a hypothesis
– Design a study to objectively test the hypothesis
– Collect reliable and unbiased data
– Process and evaluate data rigorously
– Interpret the data and draw appropriate conclusions
Why do we need to learn biostatistics? cont’d....
• It is essential for understanding, appraising and critiquing the scientific literature.
• Public health and medicine are becoming increasingly quantitative.
Limitations of statistics:
• It deals only with subjects of inquiry that are capable of being quantitatively measured and numerically expressed.
• It deals with aggregates of facts and attaches no importance to individual items; it is suited only to situations where group characteristics are to be studied.
• Statistical data are only approximations and are not mathematically exact.
Variables
• Variable: a characteristic under study that assumes different values for different elements; in other words, a characteristic or attribute that can take different values.
Some examples of variables include:
• Diastolic blood pressure
• Heart rate
• Height
• Weight
• Stage of bladder cancer
Variables cont’d…
• Random variables: variables whose values are determined by chance.
• Data: the measurements or observations (values) of a variable.
• Data set: a collection of observations on one or more variables.
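Variables cont’d…
Each row below is a variable; the values in a row are the data for that variable; the whole table is the data set.
Variable | Mrs. Brown | Mr. Patel | Mr. Amanda
Age | 32 | 24 | 20
Sex | Female | Male | Male
Blood type | O | O | A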
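One way to hold the Mrs. Brown / Mr. Patel / Mr. Amanda table in code is as a list of records, one per person. In this Python sketch the key names (such as `blood_type`) are our own choice: each key is a variable, the values collected for one key are the data for that variable, and the whole list is the data set.

```python
# Each dictionary is one person from the table (one set of observations)
data_set = [
    {"name": "Mrs. Brown", "age": 32, "sex": "Female", "blood_type": "O"},
    {"name": "Mr. Patel", "age": 24, "sex": "Male", "blood_type": "O"},
    {"name": "Mr. Amanda", "age": 20, "sex": "Male", "blood_type": "A"},
]

# Variables: the characteristics recorded for everyone (the name is only an identifier)
variables = [key for key in data_set[0] if key != "name"]

# Data: the observed values of one variable across the data set
ages = [person["age"] for person in data_set]
blood_types = [person["blood_type"] for person in data_set]

print(variables)  # ['age', 'sex', 'blood_type']
print(ages)       # [32, 24, 20]
```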
Types of variables
• Depending on the characteristic of the measurement, a variable can be:
Qualitative (categorical) variable
A variable or characteristic that cannot be measured in quantitative form but can only be identified by name or category; that is, a variable that can be placed into distinct categories according to some characteristic or attribute.
• For instance: place of birth, ethnic group, type of drug, stage of breast cancer (I, II, III or IV), degree of pain (low, moderate, severe or unbearable).
Types of variables cont’d…
• The categories should be clear-cut (non-overlapping) and cover all possibilities. For example: sex (male or female), disease stage (depends on the disease), ever smoked (yes or no).
Types of variables cont’d…
Quantitative (numerical) variable:
• One that can be measured and expressed numerically.
• It can be of two types:
Discrete data
The values of a discrete variable are usually whole numbers, such as the number of episodes of diarrhoea in the first five years of life.
Observations can only take certain numerical values.
Numerical discrete data occur when the observations are integers that correspond to a count of some sort.
Discrete Data cont’d…
• Some common examples are:
– The number of bacterial colonies on a plate
– The number of cells within a prescribed area on microscopic examination
– The number of heart beats within a specified time interval
– A mother’s history of number of births (parity) and pregnancies (gravidity)
– The number of episodes of illness a patient experiences during some time period, etc.
Continuous Data
A continuous variable is a measurement on a continuous scale.
Each observation theoretically falls somewhere along a continuum.
One is not restricted, in principle, to particular values such as the integers of the discrete scale.
Most clinical measurements, such as blood pressure, serum cholesterol level, height, weight and age, are on a numerical continuous scale.
Continuous Data cont’d…
Continuous data are used to report a measurement of the individual that can take on any value within an acceptable range.
Classification of data:
Data
– Qualitative
– Quantitative
  – Discrete
  – Continuous
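The classification above can be mimicked by a rough heuristic that looks only at how values are stored. The function `classify_variable` below is hypothetical and purely illustrative: it treats text as qualitative, whole numbers as discrete and anything else as continuous. Its weakness is instructive: age recorded in completed years is labelled discrete even though age is a continuous variable, because the type of a variable depends on what is measured, not on how it happens to be rounded.

```python
from numbers import Real


def classify_variable(values):
    """Hypothetical heuristic: guess a variable's type from how its values are stored."""
    if all(isinstance(v, str) for v in values):
        return "qualitative (categorical)"
    if not all(isinstance(v, Real) for v in values):
        raise ValueError("mixed or unsupported value types")
    if all(float(v).is_integer() for v in values):
        return "quantitative, discrete"
    return "quantitative, continuous"


print(classify_variable(["O", "O", "A"]))     # qualitative (categorical)
print(classify_variable([0, 2, 1, 3]))        # quantitative, discrete
print(classify_variable([1.62, 1.75, 1.80]))  # quantitative, continuous
print(classify_variable([32, 24, 20]))        # quantitative, discrete (misleading: age is continuous)
```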
Scales of measurement
Data come in various sizes and shapes, and it is important to know about these so that the proper analysis can be used on the data.
There are four scales (levels) at which we measure:
Nominal scales of measurement
It may be thought of as the "naming" level. This level of measurement does not put subjects in any particular order. There is no logical basis for saying one category is higher or lower than another. In research activities, a YES/NO scale is nominal.
Nominal scales cont’d…
The simplest data consist of unordered, dichotomous, or "either/or" types of observations, i.e., either the patient lives or the patient dies, either he has some particular attribute or he does not.
• Examples are: blood group, gender, religious affiliation
Nominal scales cont’d…
• The nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.
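Because nominal categories have no order, the summaries that make sense are counts, proportions and the mode. A minimal Python sketch with hypothetical blood-group data (illustrative only):

```python
import statistics
from collections import Counter

# Hypothetical blood groups of eight patients (illustrative only)
blood_groups = ["O", "A", "O", "B", "AB", "O", "A", "O"]

counts = Counter(blood_groups)                                       # frequency of each category
proportions = {g: n / len(blood_groups) for g, n in counts.items()}  # relative frequencies
mode = statistics.mode(blood_groups)                                 # most frequent category

print(counts.most_common())  # [('O', 4), ('A', 2), ('B', 1), ('AB', 1)]
print(mode)                  # O
```

A median or mean of blood groups would be meaningless, since "A" is neither higher nor lower than "O".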
Ordinal Scales of Measurement
An ordinal scale is next up the list in terms of power of measurement. The simplest ordinal scale is a ranking. At this level we put subjects in order from lowest to highest.
It is important to know that ranks do not tell us by how much subjects differ.
There is no objective distance between any two points on the subjective scale.
Hence, an ordinal scale only lets you interpret gross order and not the relative distances between positions.
Ordinal Scales cont’d…
• E.g. If we are told that third-year students have better knowledge than first-year students, we still do not know by how much they are better.
To measure the amount of the difference between subjects, we need the next level of measurement.
Ordinal Scales cont’d…
Some examples of this scale of measurement include:
• Academic status, job satisfaction index, employment status, response to treatment (none, slow, moderate, fast)
• Likert scale:
1. strongly agree
2. agree
3. no opinion
4. disagree
5. strongly disagree
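For ordinal data such as the Likert scale above, the median (which uses only the ordering) is a meaningful summary, whereas the mean assumes the gaps between codes are equal, which an ordinal scale does not guarantee. A Python sketch with hypothetical responses coded 1 to 5 as listed above:

```python
import statistics

# Codes as listed above: 1 = strongly agree ... 5 = strongly disagree
LIKERT = {1: "strongly agree", 2: "agree", 3: "no opinion",
          4: "disagree", 5: "strongly disagree"}

# Hypothetical responses from seven people (illustrative only)
responses = [1, 2, 2, 3, 4, 2, 5]

# median_low always returns an observed code, so the result is a real category
median_code = statistics.median_low(responses)
print(median_code, LIKERT[median_code])  # 2 agree
```

`median_low` is used instead of `median` so that, with an even number of responses, the result is still one of the listed categories rather than a value halfway between two of them.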
Ordinal Scales cont’d…
• The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
Interval Scales of Measurement
• It is more powerful than the nominal and ordinal scales because it not only categorizes and orders observations but also shows the exact distances between them.
• On interval measurement scales, one unit on the scale represents the same magnitude of the trait or characteristic being measured across the whole range of the scale.
• Interval scales do not have a "true" zero point, however, so it is not possible to make statements about how many times higher one score is than another.
Interval Scales cont’d …
• A good example of an interval scale is the Fahrenheit scale for temperature.
• Equal differences on this scale represent equal differences in temperature, but the scale is not a ratio scale. Thus, a temperature of 30 degrees is not twice as warm as one of 15 degrees.
The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero.
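The Fahrenheit example can be checked numerically by converting to kelvin, a ratio scale whose zero is absolute zero. This Python sketch uses the standard conversion K = (°F − 32) × 5/9 + 273.15; the helper name `fahrenheit_to_kelvin` is our own.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvin (a ratio scale with a true zero)."""
    return (f - 32) * 5 / 9 + 273.15


naive_ratio = 30 / 15                                             # 2.0, but meaningless on an interval scale
true_ratio = fahrenheit_to_kelvin(30) / fahrenheit_to_kelvin(15)  # ratio of absolute temperatures
diff_k = fahrenheit_to_kelvin(30) - fahrenheit_to_kelvin(15)      # a 15 °F gap equals 15 * 5/9 K

print(f"naive ratio = {naive_ratio}, ratio in kelvin = {true_ratio:.3f}, difference = {diff_k:.2f} K")
```

On the absolute scale 30 °F is only about 3% higher than 15 °F, while the difference carries over unchanged apart from the unit factor 5/9, exactly as the definition of an interval scale implies.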
Ratio Scales of Measurement
• The highest level of measurement
• It has the properties of an interval scale together with a fixed origin or zero point.
• Examples of variables which are ratio scaled include weights, lengths and times.
Ratio Scales cont’d…
• Ratio scales permit the researcher to compare both differences in scores and the relative magnitude of scores.
– For instance, the difference between 5 and 10 minutes is the same as that between 10 and 15 minutes, and 10 minutes is twice as long as 5 minutes.
Ratio Scales cont’d…
• The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist between different values of the measure.
Summary
Depending on the characteristic of the measurement, a variable can be:
– Qualitative/categorical: cannot be measured in quantitative form
– Quantitative: can be measured and expressed numerically
  – Discrete: takes whole (integer) numbers
  – Continuous: a measurement on a continuous scale
Summary cont’d…
Based on the scales of measurement, variables can be:
– Nominal: categories only, no ranking
– Ordinal: categories + ranking, but no clear distance between them
– Interval: ranking + clear distance between categories, but no true zero
– Ratio: as interval, and a true zero exists
Summary table for the four scales of measurement (power increases from nominal to ratio)
– Ratio (highest power): equal intervals with an absolute zero
– Interval: equal intervals without an absolute zero
– Ordinal: ordering
– Nominal (lowest power): naming
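The cumulative nature of the four scales can be written down as a lookup: each scale allows the summaries of the scales below it plus its own. The mapping in this Python sketch follows the conventional (Stevens-style) view; the particular statistics listed and the names `ADDED_AT_LEVEL` and `permissible_statistics` are our illustrative choices, and some authors relax these rules (for example, by reporting means of Likert items).

```python
SCALES = ["nominal", "ordinal", "interval", "ratio"]  # lowest to highest power

# Statistics that become meaningful at each level (illustrative selection)
ADDED_AT_LEVEL = {
    "nominal": ["frequency", "mode"],
    "ordinal": ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio": ["coefficient of variation", "ratio of two values"],
}


def permissible_statistics(scale):
    """Return the summaries for `scale`, inheriting those of all lower scales."""
    level = SCALES.index(scale)  # raises ValueError for an unknown scale name
    return [stat for s in SCALES[: level + 1] for stat in ADDED_AT_LEVEL[s]]


print(permissible_statistics("ordinal"))  # ['frequency', 'mode', 'median', 'percentiles']
```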
Categorize the following variables into nominal, ordinal, interval or ratio:
• Gender
• Grade (A, B, C, D and F)
• Rating scale (poor, good, excellent)
• Eye colour
• Political affiliation
• Religious affiliation
• Ranking of tennis players
• Major field
• Nationality
• Height
• Weight
• Time
• Age
• IQ
• Temperature
• Salary
ASSIGNMENT 1
Exercise 1: Table 1.6 contains the characteristics of cases and controls from a case-control study into stressful life events and breast cancer in women (Protheroe et al. 1999). Categorize the variables in the table into nominal, ordinal, interval or ratio.
Exercise 2: Table 1.7 is from a cross-sectional study to determine the incidence of pregnancy-related venous thromboembolic events and their relationship to selected risk factors, such as maternal age, parity, smoking, and so on (Lindqvist et al. 1999). Categorize the variables in the table into nominal, ordinal, interval or ratio.
Exercise 3: Table 1.8 is from a study to compare two lotions, Malathion and d-phenothrin, in the treatment of head lice (Chosidow et al. 1994). Of 193 schoolchildren, 95 were given Malathion and 98 d-phenothrin. Categorize the variables in the table into nominal, ordinal, interval or ratio.
ASSIGNMENT 2
• Four migraine patients are asked to assess the severity of their migraine pain one hour after the first symptoms of an attack by marking a point on a horizontal line 100 mm long. The line is marked ‘No pain’ at the left-hand end and ‘Worst possible pain’ at the right-hand end. The distance of each patient’s mark from the left-hand end is subsequently measured with a millimetre ruler, and their scores are 25 mm, 44 mm, 68 mm and 85 mm. What sort of data is this? Can you calculate the average pain of these four patients? Note that this form of measurement (using a line and getting subjects to mark it) is known as a visual analogue scale (VAS).
• Response and Explanatory variables
• A variable can also be either a response (dependent, outcome) variable or an explanatory (independent, predictor) variable.
• Response (dependent, outcome) variables: variables that can be affected by explanatory variables; the response is the outcome of a study, the variable you would be interested in predicting or forecasting.
• Explanatory variables: any variables that explain the response variable.
Exercise 1:
In a study to determine whether surgery or chemotherapy results in higher survival rates for a certain type of cancer, which variable is the explanatory variable and which is the response variable?
• What is the importance of
variable classification?
• Source of Data?
Primary source of data
Collecting primary data requires the direct involvement of the researcher. Censuses and sample surveys are sources of primary data.
Experiments are another means of obtaining the data needed to answer a question.
Source of Data…
Secondary data
The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature.
In this case, data are obtained from previously collected sources such as newspapers, magazines, Demographic and Health Surveys (DHS), hospital records and other existing data, such as:
– Mortality reports
– Morbidity reports
– Epidemic reports
– Reports of laboratory utilization (including laboratory test results)
Data collection methods?
Data collection methods
• Before any statistical work can be done, data must be collected.
• Data collection is a crucial stage in the planning and implementation of a study.
• Data collection techniques allow us to systematically collect data about the objects of our study and about the settings in which they occur.
Data collection methods…
• The methods of collecting data may be broadly classified as:
– Self-administered questionnaires
– The use of documentary sources
– Observation
– Interviews
– Tape recording
– Filming
– Photography
– Focus group discussions
Data collection methods…
The choice of data collection methods is based on:
♣ The type of information to be collected from the source
♣ The accuracy of the information the methods will yield
♣ Practical considerations, such as the need for personnel, time, equipment and other facilities, in relation to what is available
Data collection methods…
• The method that provides more satisfactory information is often a more expensive or inconvenient one.
♣ Therefore, accuracy must be balanced against practical considerations (resources and other practical limitations).
1) Observation
• Observation is a technique that involves systematically selecting, watching and recording the behavior of people or other phenomena, and aspects of the setting in which they occur, for the purpose of gaining specific information.
• It includes all methods from simple visual observation to the use of sophisticated equipment, facilities and measurements, such as radiographic and biochemical equipment, X-ray machines, microscopes, clinical examinations, and microbiological examinations.
Observation…
• Advantages: gives relatively more accurate data on behavior and activities.
• Disadvantages: subject to the investigator’s or observer’s own biases, desires, etc., and needs more resources and skilled human power when high-level machines are used.
2) The use of documentary sources
Clinical records and other personal records, death certificates, published mortality statistics, census publications, etc.
Advantages
• Documents can provide ready-made information relatively easily.
• They are the best means of studying past events.
Disadvantages
• Problems of reliability and validity (because the information is collected by a number of different persons who may have used different definitions or methods of obtaining data).
3) Interviewing
It involves oral questioning of respondents, either individually or as a group.
Answers can be recorded by writing them down, by tape-recording the responses, or by a combination of both.
Interviews can be conducted with varying degrees of flexibility (a high vs. a low degree of flexibility).
Interviewing cont’d…
A) High degree of flexibility (unstructured interview):
• Usually used when the researcher has little understanding of the problem
• Frequently applied in exploratory studies
Interviewing cont’d…
B) Low degree of flexibility / highly structured
interview.
Useful when the researcher is relatively
knowledgeable about expected answers or when
the number of respondents being interviewed is
relatively large
Questionnaires may be used with a fixed list of
questions in a standard sequence, which have mainly
fixed or pre-categorized answers
Interviewing cont’d…
• Ways of interviewing participants:
– Face to face
– Telephone
Interviews cont’d…
Face-to-face interviews:
• A good interviewer can stimulate and maintain the respondent’s interest and encourage frank answers to questions.
• If anxiety is aroused (e.g., "Why am I being asked these questions?"), the interviewer can allay it.
• An interviewer can repeat questions that are not understood and give standardized explanations where necessary.
• An interviewer can make observations during the interview; i.e., note is taken not only of what the subject says but also of how he or she says it.
Interviews cont’d…
Telephone interviews
• Telephone interviews can be a very effective and economical way of collecting data for quantitative research.
• They may be useful when the respondents to be interviewed are widely distributed geographically.
NB: The questionnaire should be fairly short, and a prior appointment may enhance the response rate and the length of the interview.
Interviews cont’d…
While interviewing, precautions should be taken not to influence the responses; the interviewer should ask questions in a neutral manner, should not show agreement, disagreement, or surprise, and should record the respondent’s precise answers without altering or interpreting them.
4) Self-administered questionnaires
• Written questions are presented that are to be answered by the respondents in written form.
• The respondent reads the questions and fills in the answers by him/herself (sometimes in the presence of an interviewer who “stands by” to give assistance if necessary).
• The use of self-administered questionnaires is simpler and cheaper, and they can be administered to many persons simultaneously.
Self-administered questionnaires cont’d ….
A written questionnaire can be administered in different ways, such as by:
– Sending questionnaires by mail
– Gathering all or part of the respondents in one place at one time, giving oral or written instructions, and letting them fill out the questionnaires
The main problems with postal questionnaires are that response rates tend to be relatively low and that less literate subjects may be under-represented.
Self-administered questionnaires cont’d…
Advantages
It is less expensive; it permits anonymity and may result in more honest responses; it does not require research assistants; and it eliminates bias due to phrasing questions differently for different respondents.
Disadvantages
It cannot be used with illiterate respondents; the response rate is often low; and questions may be misunderstood.
Problems in gathering data?
Problems in gathering data
Common problems might include:
• Language barriers
• Lack of adequate time
• Expense
• Inadequately trained and experienced staff
• Invasion of privacy
• Suspicion (mistrust)
• Bias (any systematic error)
• Cultural norms (e.g. norms that may preclude men from interviewing women)
Types of Questions
• Depending on how questions are asked and recorded, we can distinguish two major types: open-ended questions and closed questions.
Open-ended questions
Open-ended questions permit free responses that should be recorded in the respondent’s own words. The respondent is not given any possible answers to choose from.
Such questions are useful for obtaining information on:
• Facts with which the researcher is not very familiar
• Opinions, attitudes, and suggestions of informants
Open-ended questions…
For example:
• Can you describe exactly what the traditional birth attendant did when your labor started?
• What do you think are the reasons for a high drop-out rate of village health committee members?
• What would you do if you noticed that your daughter (a schoolgirl) had a relationship with a teacher?
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.)
 Closed Questions
Closed questions offer a list of possible options or
answers from which the respondents must choose.
When designing closed questions one should try to:
 Offer a list of options that are exhaustive and
mutually exclusive
 Closed questions are useful if the range of possible
responses is known.
Closed Questions…
For example
What is your marital status?
1. Single
2. Married/living together
3. Separated
4. Divorced
5. Widowed
Have you ever gone to the local village health worker
for treatment?
1. Yes
2. No
Requirements of questions
 Must have validity: the question we design should give
an obviously valid and relevant measurement of the
variable.
 Must be clear and unambiguous: the way in which
questions are worded can 'make or break' a questionnaire.
They must be phrased in language that the respondents
are expected to understand, and that all respondents will
understand in the same way.
To ensure clarity, each question should contain only one
idea; 'double-barrelled' questions like:
'Do you take your child to a doctor when he has a cold or
has diarrhea?' are difficult to answer, and the answers are
difficult to interpret.
Requirements of questions …
 Must not be offensive – whenever possible it is wise
to avoid questions that may offend the respondent,
for example, those which may seem to expose the
respondent’s ignorance, and those requiring him to
give a socially unacceptable answer.
 The questions should be fair: they should not be
loaded (worded so as to push the respondent toward a particular answer).
Short questions are generally regarded as preferable
to long ones.
Requirements of questions …
 Sensitive questions: it may not be possible to avoid
asking 'sensitive' questions that may offend
respondents. In such situations the interviewer
(questioner) should ask them very carefully and tactfully.
Methods of data organization and presentation
 The data collected in a survey is called raw data. In most
cases, useful information is not immediately evident
from the mass of unsorted data.
 Collected data need to be organized so as to condense
the information they contain and show patterns of
variation clearly.
1. Frequency Distributions
 Quite often, data are presented in a meaningful way by
preparing a frequency distribution. Without one, the raw
data convey little meaning, and any patterns in them
may go undetected.
 A frequency distribution constructed from a set of scores
usually also includes proportions (P) or percentages.
Frequency Distributions cont’d …
 Frequency distribution determines the number of
units (e.g., people) which fall into a series of specified
categories.
 The frequency is the number of times that a particular
value, category or combination of categories occurs in a data set.
 The relative frequency is the frequency of the
event/value/category divided by the total number of
data points.
Frequency distribution can be grouped or ungrouped
Ungrouped Frequency Distribution
 It is used to present a categorical variable in a simple
and easily understandable way.
 This frequency table is constructed by listing all
possible categories of the variable and then counting
the number of observations falling in each category
as its frequency.
Example
The following data on the current age of women were
collected from 240 women (data 1).
Example: Consider the data collected on age at first
marriage of 240 women (data 1). One of the variables in
this dataset is the religion of the women. For this type
of variable, we can use an ungrouped frequency
distribution to summarize the data as follows:
Religion     Frequency   Relative frequency (%)
Orthodox     103         42.9
Muslim       33          13.8
Protestant   97          40.4
Others*      7           2.9
Total        240         100
*Catholic, no religion
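A minimal Python sketch of an ungrouped frequency distribution. It assumes the 240 individual religion records can be rebuilt from the counts in the table, since the raw data 1 records are not shown here; `frequency_table` is a hypothetical helper name. Each category is counted with `collections.Counter`, and the relative frequency is the count divided by n, in percent.

```python
from collections import Counter

# Data 1 religion records, rebuilt from the table counts (assumption:
# only the counts are available, not the raw records).
religion = (["Orthodox"] * 103 + ["Muslim"] * 33
            + ["Protestant"] * 97 + ["Others"] * 7)


def frequency_table(values):
    """Return (category, frequency, relative frequency in %) rows."""
    counts = Counter(values)          # frequency of each category
    n = len(values)                   # total number of observations
    return [(category, freq, round(100 * freq / n, 1))
            for category, freq in counts.items()]


table = frequency_table(religion)
for category, freq, rel in table:
    print(f"{category:<12}{freq:>5}{rel:>8.1f}")
print(f"{'Total':<12}{len(religion):>5}{100.0:>8.1f}")
```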
Grouped Frequency Distribution
Presenting data as a grouped frequency distribution
is not as simple as an ungrouped one: we first need
to compute some values. These values are given below:
Number of classes (K): the number of classes
the table will have.
The number of classes can be estimated using
Sturges' rule as:
$K = 1 + 3.322\log_{10}(n)$
Where:
K = number of classes
n = sample size.
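Sturges' rule as a short Python sketch. Rounding the result to the nearest whole number is an assumption made for illustration, since the rule is only a guide. With n = 240 (data 1) it gives K ≈ 8.91, i.e. about 9 classes.

```python
import math


def sturges_classes(n):
    """Sturges' rule: K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


k = sturges_classes(240)          # n = 240 women in data 1
print(f"K = {k:.2f}, about {round(k)} classes")
```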
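Grouped Frequency cont'd…
• Then the width of each class, W, can be computed
as:
$W = \frac{X_{max} - X_{min}}{K}$
where $X_{max}$ and $X_{min}$ are the largest and smallest observations.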
Grouped Frequency cont’d…
Class limit: the smallest and largest values that can go
into a class; each class has a lower and an upper class
limit.
Lower class limit: the smallest value that can go into
the class (for the first class, the smallest observation).
Upper class limit: the lower class limit plus the class
width minus one (for data recorded as whole numbers).
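The class-limit rules above can be sketched in Python for whole-number data. The helper `class_limits` is hypothetical; the inputs (smallest observation 15, width 4, K = 9) are read off the data 1 example, whose first class is 15-18.

```python
def class_limits(lowest, width, k):
    """Lower and upper limits of k classes for whole-number data.

    The first lower limit is the smallest observation; each upper limit is
    lower limit + width - 1, and the next class starts one unit higher.
    """
    limits = []
    lower = lowest
    for _ in range(k):
        upper = lower + width - 1
        limits.append((lower, upper))
        lower = upper + 1          # next class begins right after this one
    return limits


limits = class_limits(lowest=15, width=4, k=9)
print(limits)   # first class (15, 18), last class (47, 50)
```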
 When forming classes, always make sure that each item
(measurement or observation) goes into one and only
one class, i.e. classes should be mutually exclusive
(namely, that successive classes have no values in
common).
 To this end, we must also make sure that the smallest
and largest values fall within the classification and that
none of the values can fall into a gap between successive
classes.
Grouped Frequency cont’d…
 Note: Sturges' rule should not be regarded as final;
it should be considered a guide only. The number of
classes it gives may be increased or decreased for a
more convenient or clearer presentation.
 Class boundaries/true limits: limits determined
mathematically so that the intervals of a continuous
variable join up in both directions and no gap exists
between classes. They are obtained by subtracting 0.5
from the lower class limit and adding 0.5 to the upper
class limit:
 Lower class boundary = lower class limit - 0.5
Upper class boundary = upper class limit + 0.5
 Class mark/mid-point (Xc) of an interval: the value
of the interval that lies midway between the lower
true limit (LTL) and the upper true limit (UTL) of a
class.
It is calculated as the average of the lower and upper
class limits: $X_c = \frac{\text{lower class limit} + \text{upper class limit}}{2}$, which equals $\frac{LTL + UTL}{2}$.
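A sketch of the boundary and class-mark rules, assuming whole-number class limits so that half a unit (0.5) is subtracted and added; `boundaries_and_marks` and its `half_unit` parameter are illustrative names, not from the source. The first three classes of data 1 are used as input.

```python
def boundaries_and_marks(limits, half_unit=0.5):
    """Return ((lower boundary, upper boundary), class mark) per class.

    half_unit is half of the recording unit: 0.5 for whole numbers
    (an assumption that matches the data 1 ages).
    """
    rows = []
    for lower, upper in limits:
        boundary = (lower - half_unit, upper + half_unit)
        mark = (lower + upper) / 2        # average of the class limits
        rows.append((boundary, mark))
    return rows


rows = boundaries_and_marks([(15, 18), (19, 22), (23, 26)])
for boundary, mark in rows:
    print(boundary, mark)
```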
NB: The constructed grouped frequency distribution is
expected to satisfy the following:
– Class intervals should be continuous (for
continuous data), non-overlapping (mutually
exclusive) and exhaustive.
– Class intervals should generally be of the same
width.
– Open-ended class intervals should be avoided.
These are classes like "less than 10", "greater than
65", and so on.
Example for data 1
 The number of classes (K) can be computed using
Sturges' rule as:
$K = 1 + 3.322\log_{10}(240) \approx 8.9 \approx 9$
 Therefore, the width W of each class can be
computed as:
$W = \frac{X_{max} - X_{min}}{K}$
 Thus the width of each class can be taken as 4, and the
lower class limit of the first class will be the minimum
observation in the dataset.
Example for data 1
 Thus, the grouped frequency distribution of the current age of women can be
constructed as:
Class limit   Class boundary   Class mark   Frequency   RF (%)   CF
15-18         14.5-18.5        16.5         15          6.25     15
19-22         18.5-22.5        20.5         49          20.42    64
23-26         22.5-26.5        24.5         51          21.25    115
27-30         26.5-30.5        28.5         40          16.67    155
31-34         30.5-34.5        32.5         21          8.75     176
35-38         34.5-38.5        36.5         22          9.17     198
39-42         38.5-42.5        40.5         18          7.50     216
43-46         42.5-46.5        44.5         15          6.25     231
47-50         46.5-50.5        48.5         9           3.75     240
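The RF and CF columns can be recomputed from the frequencies alone. This Python sketch uses `itertools.accumulate` for the running total and rounds RF to two decimals (an assumed reporting precision).

```python
from itertools import accumulate

# Frequencies of the nine age classes (15-18, ..., 47-50) from the table.
freq = [15, 49, 51, 40, 21, 22, 18, 15, 9]
n = sum(freq)                                     # 240 women

rf = [round(100 * f / n, 2) for f in freq]        # relative frequency (%)
cf = list(accumulate(freq))                       # cumulative frequency

for f, r, c in zip(freq, rf, cf):
    print(f, r, c)
```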
Example for data 1 cont’d …
Where RF and CF are relative frequency and cumulative
frequency respectively.
 Note: the value added to or subtracted from the class
limits to get the class boundaries depends on the
number of decimal places in the data (0.5 for whole
numbers, 0.05 for one decimal place, and so on).
The width of a class is found from the true class limits by
subtracting the true lower limit from the true upper limit
of any class.
For example, taking the fourth class of the above
distribution, w = 30.5 - 26.5 = 4.
Statistical Tables
A statistical table is an orderly and systematic
presentation of data in rows and columns.
Rows: horizontal arrangements.
Columns: vertical arrangements.
 Based on the purpose for which the table is designed
and the complexity of the relationship, a table can
be either a simple frequency table or a cross
tabulation.
A simple frequency table is used when the
observations involve only a single
variable.
Cross tabulation is used to obtain the frequency
distribution of one variable by the subsets
(categories) of another variable.
Statistical tables cont’d…
Construction of tables
There are no hard and fast rules, but the following
general principles should be observed when
constructing tables.
Tables should be as simple as possible.
Tables should be self-explanatory:
 The title should be clear and to the point (a good
title answers: what? when? where? how
classified?) and it should be placed above the
table.
 Each row and column should be labeled.
Statistical tables cont’d …
 Numerical entries of zero should be written explicitly
rather than indicated by a dash. Dashes are
reserved for missing or unobserved data.
 If data are not original, their source should be
given in a footnote.
Tables cont'd…
One-variable/ Simple frequency table
– The most basic table is a simple frequency distribution of one
variable
Example:
Fig 3. Blood group of voluntary blood donors examined in Red Cross Blood bank,
within a day, May 2006 (n=548)
[Table not reproduced; the slide labels its title, rows and columns.]
Table 1:
Example 2: simple table cont'd...
Table 5. Clinical symptoms among 54 patients with S.
Typhimurium infection, Oslo, Norway, May 1998
                  Cases
Symptoms          n      %
Diarrhoea         54     100
Fever             35     65
Headache          12     22
Joint pain        4      7
Muscle pain       4      7
Table 2:
Two and three variable table
If two variables are cross-tabulated, the result is a
two-variable table.
If the tabulation is among three variables, it is a
three-variable table.
In cross-tabulated frequency distributions with row
and column totals, the choice of denominator depends
on the comparison of interest: to compare the
distribution of one variable across the subsets of the
other, use the totals of those subsets (row or column
totals) as denominators.
Two and three variable table cont'd…
Table 1. Distribution of variable 1 by variable 2,
population X (n=58), place Y, period Z
                          Variable 2
Variable 1    Value 1   Value 2   Value 3   Total
Value 1       2         4         7         13
Value 2       3         5         3         11
Value 3       4         5         4         13
Value 4       5         6         2         13
Unknown       3         2         3         8
Total         17        22        19        58
Explanation of acronyms, units used, …
Table 3:
Two and three variable table cont'd...
Table 1. Cases of Salmonella Typhimurium infection by age group and sex,
Herøy, Norway, 1999
Age group            Sex
(years)        Male    Female    Total
0-9            7       5         12
10-19          5       5         10
20-29          5       5         10
30-39          1       4         5
40-49          2       3         5
50-59          0       3         3
60-69          2       1         3
70+            2       4         6
Total          24      30        54
Table 1:
Two and three variable table cont'd...
Distribution of participants by age, sex and residence
Residence   Age     Male   Female   Total
Urban       15-24   34     76       110
            25-34   48     56       104
            35-44   65     54       119
Rural       15-24   56     58       114
            25-34   78     53       131
            35-44   46     47       93
Total               369    395      764
Common form of a two-by-two table
It is a special form of table popular among
epidemiologists.
It is used to examine whether there is a relationship
between two variables.
                Number of
Exposure        Cases   Controls   Total
Exposed         23      23         46
Non exposed     4       139        143
Total           27      162        189
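For this two-by-two table the comparison of interest is usually exposure among cases versus exposure among controls, so the column totals (27 and 162) serve as denominators. The sketch below assumes that choice; `column_percentages` is a hypothetical helper. About 85.2% of cases were exposed, compared with about 14.2% of controls.

```python
# Two-by-two table from the slide: rows = exposure, columns = cases/controls.
table = {
    "Exposed":     {"Cases": 23, "Controls": 23},
    "Non exposed": {"Cases": 4,  "Controls": 139},
}


def column_percentages(tbl):
    """Percent of each column total falling in each row."""
    columns = list(next(iter(tbl.values())).keys())
    totals = {c: sum(row[c] for row in tbl.values()) for c in columns}
    return {name: {c: round(100 * row[c] / totals[c], 1) for c in columns}
            for name, row in tbl.items()}


pct = column_percentages(table)
print(pct["Exposed"])   # share exposed among cases and among controls
```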
Composite/ Higher Order Table
It is a large table that combines several separate
variables or tables.
Age, sex and other demographic variables may be
combined to form a single table.
– Example of a composite table
Characteristics                 Number   Percent
Marital status
  Single                        50       67.6
  Married                       20       27.0
  Divorced/widowed              4        5.4
Current residence (n=73)
  Within the PA                 40       54.8
  Within the PA (H. Post)       25       34.2
  Within the nearest town       8        11.0
Residence of origin
  Within the PA                 4        5.4
  Outside the PA                24       32.4
  Outside the Woreda            46       62.2
Training TVETI
  Axum                          19       25.7
  Makele                        55       74.3
Totals                          74       100
Graphical Presentation
Graphs are often easier to interpret than tables,
perhaps at the expense of detail.
A variety of graphs are used depending on the type
of data.
To present categorical (qualitative) or discrete
quantitative data in a graph, pie charts and bar
charts are the appropriate choices; if the variable is
continuous quantitative, we can use a histogram,
frequency polygon, cumulative frequency curve, box
plot, etc.
Graphical Presentation cont’d…
There are, however, general rules that are commonly
accepted about construction of graphs.
Every graph should be self-explanatory and as
simple as possible.
Titles are usually placed below the graph, and they
should again answer the questions: what? where?
when? how classified?
Legends or keys should be used to differentiate
variables if more than one is shown.
The units into which the scale is divided should
be clearly indicated.
The numerical scale representing frequency must
start at zero or a break in the line should be
shown.
Examples of graphs:
Bar Chart
Bar diagrams are used to represent and compare the
frequency distributions of discrete variables and
attributes or categorical series. When data are
represented with a bar diagram, all bars must have equal
width and the gaps between bars must be equal.
Each category of the variable is represented by a bar.
The variable is categorical, or treated as qualitative.
Bars can be displayed horizontally or vertically.
Types of bar charts
There are different types of bar diagrams:
A. Simple bar chart: It is a one-dimensional
diagram in which the bar represents the
whole of the magnitude.
The height or length of each bar indicates
the size (frequency) of the figure
represented.
– one variable
– It can be displayed as horizontal or vertical
Types of bar charts cont'd…
Figure 1: Immunization status of children in Adami Tulu Wereda,
1995
Type of bar chart cont’d ...
B. Grouped bar chart
– Data from tables of two or more variables
– Distinct colours or shading are used to
differentiate the groups
– A legend is necessary
E.g. Grouped/joined bar chart (figure not reproduced; its annotations note that the meaning of each bar is shown in a legend and that cells are separated by a space)
Figure 2: TT immunization status by marital status of women 15-49 years,
Asendabo town, 1996.
Type of bar chart cont'd ...
C. Stacked bar chart
– It is used to show the same data as a grouped bar
chart using a single bar
– Different groups are differentiated by different
segments within a single bar
– The overall change is easier to see, but changes
between groups may be harder to see than with
grouped bars
E.g. Stacked bar chart (absolute values)
Figure 3: Cases of S. Typhimurium infection by age group and
sex, Herøy, Norway, 1999
[Figure not reproduced: number of cases (0-14) by age group (0-9 to 70+), each bar split into male and female segments.]
Type of bar chart cont'd ...
D. 100% component bar chart
– It is a variant of the stacked bar chart in which each bar
is stretched to 100% rather than showing its real value;
– It is helpful for comparing the contribution of
different subgroups within the categories of the
main variable
E.g. 100% component bar chart
Figure 4. Cases of S. Typhimurium infection by age group and sex, Herøy, Norway,
1999
[Figure not reproduced: proportional distribution by sex (0-100%, male vs female) for each age group, 0-9 to 70+.]
Pie charts
 A pie chart is a circle divided into sectors so that the areas
of the sectors are proportional to the frequencies.
 It is split into segments to show percentages or the
relative contributions of categories of data.
 It is a good method of representation if you wish to
compare a part of a group with the whole group.
 The number of categories should not be too large.
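Because sector areas are proportional to frequencies, each sector's angle is its share of 360 degrees. As an illustration only, this sketch uses the religion counts from data 1 (the Fig. 5 data are not reproduced here), giving 154.5, 49.5, 145.5 and 10.5 degrees.

```python
# Religion counts from data 1, used here only to illustrate the rule.
counts = {"Orthodox": 103, "Muslim": 33, "Protestant": 97, "Others": 7}
total = sum(counts.values())

# Sector angle = (frequency / total) * 360 degrees
angles = {religion: 360 * f / total for religion, f in counts.items()}
for religion, angle in angles.items():
    print(f"{religion:<12}{angle:6.1f} degrees")
```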
e.g. Pie chart
Fig.5. Distribution of religion of participants from Kunama ethnic group
among Eritrean Refugees in Shimelba Camp, July 2006
Quantitative continuous data
Histogram: the graph of the frequency distribution
of a continuous measurement variable.
It is constructed on the basis of the following
principles:
The horizontal axis is a continuous scale running
from one extreme end of the distribution to the
other. It should be labeled with the name of the
variable and the units of measurement.
Histograms cont’d …
For each class in the distribution, a vertical rectangle
is drawn with:
Its base on the horizontal axis, extending from the
lower class boundary to the upper class boundary of
the class, so there is never any gap between
adjacent rectangles.
The base of each rectangle equal to the width of its
class interval.
Histograms cont’d …
Area of each column is proportional to the number of
observations in that interval
When constructing a histogram:
– Use equal class intervals
– Do not use scale breaks
A second variable can be shown by shading
Figure 6: Age distribution of women in a reproductive age group
included in a study of violence against women in Butajira, 1984.
Frequency polygon
 If we join the midpoints of the tops of adjacent
rectangles of the histogram with line segments, a
frequency polygon is obtained.
Note: it is not essential to draw a histogram in order
to obtain a frequency polygon. It can be drawn
without erecting the rectangles of a histogram, as follows:
Frequency polygon cont’d…
 Mark the horizontal scale with the numerical values of
the midpoints of the class intervals.
 Erect an ordinate on the midpoint of each interval; the
height of the ordinate equals the frequency of the class
on whose midpoint it is erected.
 Join the tops of the ordinates and extend the connecting
lines down to the horizontal (size) scale.
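The construction steps can be sketched as a list of (mid-point, frequency) points for the data 1 age distribution. Adding an empty class of the same width at each end, so that the polygon closes on the horizontal axis, is a common convention assumed here.

```python
# Class marks and frequencies of the data 1 age distribution.
marks = [16.5, 20.5, 24.5, 28.5, 32.5, 36.5, 40.5, 44.5, 48.5]
freq = [15, 49, 51, 40, 21, 22, 18, 15, 9]
width = 4

# Assumed convention: add an empty class at each end so the polygon
# starts and finishes on the horizontal axis.
points = ([(marks[0] - width, 0)]
          + list(zip(marks, freq))
          + [(marks[-1] + width, 0)])
print(points)
```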
Construction of a frequency polygon from a histogram
[Figure not reproduced: 15 cases by date and time of onset in 6-hour intervals, 27-30 August; legend: 1 case patient, 1 case staff member.]
[Figure not reproduced; axis label: mid point/class mark.]
Ogive or cumulative frequency curve:
 To construct an Ogive curve:
Compute the cumulative frequency of the
distribution.
 Prepare a graph with the cumulative frequency on
the vertical axis and the true upper class limits (class
boundaries) of the intervals on the X-axis
(horizontal axis).
The true lower limit of the lowest class interval is
also included on the X-axis scale; it is the true upper
limit of the (empty) interval below it and is plotted
with a cumulative frequency of 0.
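A sketch of the ogive's plotting points for the data 1 age distribution: each true upper limit is paired with its cumulative frequency, and the true lower limit of the first class (14.5) starts the curve at 0.

```python
from itertools import accumulate

# True upper limits (class boundaries) and frequencies of data 1 ages.
upper_boundaries = [18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5, 50.5]
freq = [15, 49, 51, 40, 21, 22, 18, 15, 9]

# Start at the true lower limit of the first class with cumulative frequency 0.
points = [(14.5, 0)] + list(zip(upper_boundaries, accumulate(freq)))
print(points)
```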
Summarizing Data
 The first step in looking at data is to describe the
data at hand in some concise way.
 One type of measure useful for summarizing data
defines the center, or middle, of the sample.
Measures of Central Tendency/ Measures of Location
 Measures of central tendency: the various methods of
determining the value around which the data tend to
concentrate. Hence, a measure of central tendency is a
value that sums up or describes the mass of the data.
 The measures of central tendency include:
Mean,
Median and
Mode.
Arithmetic Mean/Simple Mean ($\bar{X}$)
Definition: the arithmetic mean is the sum of all
observations divided by the number of observations. It
is usually denoted by $\bar{X}$.
 Let X1, X2, ..., XN be the list of N measurements
obtained from N subjects. Then the mean of these
ungrouped measurements for the N subjects is defined as:
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
The mean for grouped data can be computed as
follows:
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_{ci}}{\sum_{i=1}^{k} f_i}$
 Where: k = the number of classes
Xci = class mark of the ith class and
fi = frequency of the ith class
Properties of the mean
Individual extreme values (also known as 'outliers')
can distort its ability to represent the typical value of
a variable; this is the main weakness of the mean.
It is unique for a given set of data.
The value of the arithmetic mean is determined by
every item in the series.
The sum of the deviations of the observations from the
mean is zero.
Example 1
 Consider the data on the birth weight (in kg) of 10 newborn
children at the University of Gondar Hospital:
2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88,
2.43.
Then the average birth weight can be computed as:
$\bar{X} = \frac{2.51 + 3.01 + \cdots + 2.43}{10} = \frac{25.72}{10} = 2.572$ kg
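The same computation in Python. It also checks the property that the deviations from the mean sum to zero (up to floating-point rounding).

```python
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

mean = sum(weights) / len(weights)          # sum of observations / N
deviation_sum = sum(x - mean for x in weights)

print(round(mean, 3))             # 2.572
print(abs(deviation_sum) < 1e-9)  # True: deviations about the mean sum to zero
```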
Example 2
 Compute the mean for the grouped frequency
distribution given below:
The grouped frequency distribution for the current
age of women
Class limit   Class boundary   Class mark   Frequency   RF (%)   CF
15-18         14.5-18.5        16.5         15          6.25     15
19-22         18.5-22.5        20.5         49          20.42    64
23-26         22.5-26.5        24.5         51          21.25    115
27-30         26.5-30.5        28.5         40          16.67    155
31-34         30.5-34.5        32.5         21          8.75     176
35-38         34.5-38.5        36.5         22          9.17     198
39-42         38.5-42.5        40.5         18          7.50     216
43-46         42.5-46.5        44.5         15          6.25     231
47-50         46.5-50.5        48.5         9           3.75     240
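Example 2 cont'd…
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_{ci}}{n} = \frac{15(16.5) + 49(20.5) + \cdots + 9(48.5)}{240} = \frac{6960}{240} = 29$ years
 where: fi = frequency of the ith class
Xci = class mark (mid-point) of the ith class
n = total sample size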
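A Python sketch of the grouped-mean formula, using the class marks and frequencies from the table above.

```python
marks = [16.5, 20.5, 24.5, 28.5, 32.5, 36.5, 40.5, 44.5, 48.5]   # Xci
freq = [15, 49, 51, 40, 21, 22, 18, 15, 9]                        # fi

n = sum(freq)
weighted_sum = sum(f * x for f, x in zip(freq, marks))   # sum of fi * Xci
grouped_mean = weighted_sum / n
print(weighted_sum, n, grouped_mean)   # 6960.0 240 29.0
```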
Median
 An alternative measure of central location, perhaps
second in popularity to the arithmetic mean.
 Suppose there are n observations in a sample. If these
observations are ordered from smallest to largest,
then the median is defined as follows:
The median is a value such that at least half of the
observations are less than or equal to it and at
least half of the observations are greater than or
equal to it.
The median is the midpoint of the data array.
Median cont’d …
 To find the median of a data set:
Arrange the data in ascending order.
Find the middle observation of this ordered
data.
Median cont’d…
 If the number of observations n is ODD, the median is
the middle observation:
Median = $X_{\left(\frac{n+1}{2}\right)}$, the ((n+1)/2)th ordered observation
 If n is EVEN, the median is the average of the two
middle observations:
Median = $\frac{X_{(n/2)} + X_{(n/2+1)}}{2}$
• Extreme values do NOT affect the median, making
the median a good alternative to the mean to
measure central tendencies when such values occur.
Example:
 Consider the data on the weight (in kg) of 10 newborn
children at the University of Gondar Hospital within a
month:
2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43.
– Find the median of the data.
 First arrange the data in ascending order:
1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01,
3.25.
 As n = 10 is even, the median is the average of the two
middle (5th and 6th) observations:
Median = (2.43 + 2.51)/2 = 2.47 kg.
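A minimal Python sketch of the odd/even median rule. The helper name `median` is illustrative; the standard library's `statistics.median` applies the same rule.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                       # ((n + 1)/2)th value
    return (data[mid - 1] + data[mid]) / 2     # average of (n/2)th and (n/2 + 1)th


weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]
print(round(median(weights), 2))   # 2.47
```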
Median cont'd…
Median for grouped data:
 The median for grouped data is defined by:
$\text{Median} = LCB + \left(\frac{n/2 - F_c}{f_c}\right) \times W$
where:
LCB = lower class boundary of the median class
Fc = cumulative frequency just before the median
class
fc = frequency of the median class
W = class width and n = number of observations.
Example median for grouped data 1
 Consider the example on the age of women presented
using a frequency distribution below. Compute the
median for the grouped data.
 To compute the median for grouped data, we first need
to find the median class. In this example, half of the
observations is n/2 = 240/2 = 120.
 Let us look at the distribution with the cumulative
frequency:
Example median for grouped data 1 cont’d…
 As the distribution shows, the first class whose
cumulative frequency reaches 120 is the class with
cumulative frequency 155 (the class before it reaches
only 115). So the median class is the 4th class (26.5-30.5).
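A sketch of the grouped-median formula in Python. `grouped_median` is a hypothetical helper that locates the median class as the first class whose cumulative frequency reaches n/2. With the data 1 age table it uses LCB = 26.5, Fc = 115, fc = 40 and W = 4, so Median = 26.5 + ((120 - 115)/40) × 4 = 27 years.

```python
def grouped_median(lower_boundaries, freq, width):
    """Median = LCB + ((n/2 - Fc) / fc) * W for a grouped distribution."""
    n = sum(freq)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    before = 0                                   # Fc: cumulative frequency so far
    for lcb, f in zip(lower_boundaries, freq):
        if before + f >= half:                   # first class reaching n/2
            return lcb + (half - before) / f * width
        before += f
    raise ValueError("lower_boundaries is shorter than freq")


lower_boundaries = [14.5, 18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5]
freq = [15, 49, 51, 40, 21, 22, 18, 15, 9]
print(grouped_median(lower_boundaries, freq, width=4))   # 27.0
```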
Mode
 Mode is the value appearing most frequently.
 It can be obtained by counting the number of times
each observation appears in the list.
 Important for summarising nominal/categorical types of data
 Disadvantages:
 With a small number of observations, there may be no mode.
 In addition, there may sometimes be more than one mode,
such as when dealing with a bimodal (two-peaked)
distribution.
 Example
a. 22, 66, 69, 70, 73. (no modal value)
b. 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal
value = 3.0 kg)
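A small Python sketch that returns every most-frequent value, so it can report no mode, one mode or several modes. Treating a list in which every value occurs only once as having no mode follows example (a); the helper name `modes` is hypothetical. The standard library's `statistics.multimode` is similar but returns all values in that case.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; empty list if every value occurs once."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]


print(modes([22, 66, 69, 70, 73]))                                 # []
print(modes([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))   # [3.0]
```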
NB: For grouped data, the mode is given as the modal class:
the class with the largest frequency.
Skewness:
 If extremely low or extremely high observations are present
in a distribution, then the mean tends to shift towards those
scores.
 Based on the type of skewness, distributions can be:
 Symmetrical distribution: when data values are
evenly distributed on both sides of the three
measures of central tendency (Mean, Median and
Mode).
 It is neither positively nor negatively skewed. A curve
is symmetrical if one half of the curve is the mirror
image of the other half.
 If the distribution is symmetric and has only one
mode, all three measures are the same; an example
is the normal distribution.
Positively skewed distribution: occurs when the
majority of scores are at the left end of the curve
and a few extremely large scores are scattered at
the right end.
 For positively skewed distributions (where the
upper, or right, tail of the distribution is longer
than the lower, or left, tail) the
measures are ordered as follows:
Mode < median < mean.
Negatively skewed distribution: occurs when the
majority of scores are at the right end of the curve
and a few small scores are scattered at the left
end.
For negatively skewed distributions (where the
left tail of the distribution is longer than the
right tail), the reverse ordering occurs:
Mean < median < mode.
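To see both orderings, this sketch uses a small hypothetical right-skewed dataset and its mirror image (each value subtracted from 20), which is left-skewed, and compares the mode, median and mean with Python's `statistics` module. The data are made up for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data with a long right tail (one large value, 15).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 6, 15]
# Mirror image (20 - x) has a long left tail.
left_skewed = [20 - x for x in right_skewed]

for label, data in [("positive", right_skewed), ("negative", left_skewed)]:
    print(label, "mode:", mode(data), "median:", median(data),
          "mean:", mean(data))
```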
Measures of Dispersion/ Variation
 Measures of dispersion or variability will give us
information about the spread of the scores in our
distribution.
 Without knowing something about how the data is
dispersed, measures of central tendency may be
misleading.
 The most common measures of dispersion include:
Range,
Inter-quartile range,
Variance,
Standard deviation and
Coefficient of variation.
 Consider the following three datasets, which all have a mean of 7
but very different spreads:
Dataset 1: 7, 7, 7, 7, 7, 7    mean = 7, s.d. = 0
Dataset 2: 6, 7, 7, 7, 7, 8    mean = 7, s.d. = 0.63
Dataset 3: 3, 2, 7, 8, 9, 13   mean = 7, s.d. = 4.05
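The standard deviations quoted above match the sample standard deviation (dividing by n - 1), which Python's `statistics.stdev` computes; that the slide uses the sample formula is an inference from the numbers.

```python
from statistics import mean, stdev

datasets = {
    "Dataset 1": [7, 7, 7, 7, 7, 7],
    "Dataset 2": [6, 7, 7, 7, 7, 8],
    "Dataset 3": [3, 2, 7, 8, 9, 13],
}

# stdev() is the sample standard deviation (divides by n - 1).
for name, values in datasets.items():
    print(name, "mean =", mean(values), "s.d. =", round(stdev(values), 2))
```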
Measures of Dispersion cont’d…
 RANGE: the difference between the largest and
smallest observations in the data.
EXAMPLE: Consider the data on the weight (in kg) of
10 newborn children at the University of Gondar Hospital
within a month:
2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88,
2.43.
The range for the dataset is computed by first
arranging all observations in ascending order:
1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25.
Range = Maximum-Minimum=3.25-1.98=1.27
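The same range computation in Python, rounded because floating-point subtraction is not exact.

```python
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

data_range = max(weights) - min(weights)   # largest minus smallest
print(round(data_range, 2))                # 1.27
```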
 Because it is based on the two extreme values of the
distribution, the range may change considerably if
either of them drops out, while removing any other
value would not affect it at all.
 It wastes information: it takes no account of the rest
of the data.
• The extreme values may be unreliable; they are the
observations most likely to be faulty.
• It is not suitable for the mathematical treatment
required in deriving the techniques of statistical
inference.
Quantiles
The pth percentile is the value Vp such that p percent
of the sample points are less than or equal to Vp.
The median, being the 50th percentile, is a special case
of a quantile.
As with the median, the definition of the pth
percentile depends on whether np/100 is an integer
or not.
The pth percentile is defined by:
1. The (k+1)th smallest sample point (counting in
ascending order) if np/100 is not an integer, where k
is the largest integer less than np/100.
2. The average of the (np/100)th and (np/100 + 1)th
smallest observations if np/100 is an integer.
Quantiles cont’d …
Example 1: Compute the 10th and 90th percentiles of
the birth weight data below.
Suppose the sample consists of the birth weights (in
grams) of all live-born infants born at a private
hospital in a city during a 1-week period (n = 20):
3265, 3323, 2581, 2759, 3260, 3649, 2841,
3248, 3245, 3200, 3609, 3314, 3484, 3031,
2838, 3101, 4146, 2069, 3541, 2834
Quantiles cont’d …
Sorting the data from smallest to largest:
2069 2581 2759 2834 2838 2841 3031 3101 3200 3245
3248 3260 3265 3314 3323 3484 3541 3609 3649 4146
Solution: Since np/100 = 20 × 0.1 = 2 and 20 × 0.9 = 18
are both integers, the 10th and 90th percentiles are
defined by:
10th percentile = the average of the 2nd and 3rd
values = (2581 + 2759)/2 = 2670 g
90th percentile = the average of the 18th and 19th
values = (3609 + 3649)/2 = 3629 g
We would therefore estimate that 80 percent of birth
weights fall between 2670 g and 3629 g, which gives an
overall feel for the spread of the distribution.
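The two-case percentile rule can be sketched in Python as follows. The function name `percentile` is hypothetical (not a library API); the sketch assumes an integer p with 0 < p < 100 and uses `fractions.Fraction` so that the "is np/100 an integer?" test is exact rather than subject to floating-point rounding.

```python
import math
from fractions import Fraction


def percentile(data, p):
    """p-th percentile (integer p, 0 < p < 100) using the two-case rule.

    Positions are 1-based in the ascending-sorted data, as on the slide.
    """
    x = sorted(data)
    n = len(x)
    pos = Fraction(n * p, 100)      # np/100, kept exact
    if pos.denominator == 1:
        k = int(pos)
        # np/100 is an integer: average the (np/100)th and (np/100 + 1)th values
        return (x[k - 1] + x[k]) / 2
    k = math.floor(pos)             # largest integer less than np/100
    return x[k]                     # the (k + 1)th value (0-based index k)


birth_weights = [3265, 3323, 2581, 2759, 3260, 3649, 2841,
                 3248, 3245, 3200, 3609, 3314, 3484, 3031,
                 2838, 3101, 4146, 2069, 3541, 2834]

p10 = percentile(birth_weights, 10)
p90 = percentile(birth_weights, 90)
print(f"10th percentile = {p10} g, 90th percentile = {p90} g")
```

Other software often uses different percentile conventions (for example, interpolation between neighbouring positions), so results from a statistics package may differ slightly for small samples.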
Quantiles cont’d …
• Quartiles are quantiles that divide the distribution
into four equal parts. The second quartile is the
median.
• The interquartile range (IQR) is the difference
between the third and the first quartiles: IQR = Q3 − Q1.
• To compute it, first sort the data in ascending order,
then find the value that marks off the lowest quarter
of the observations (the first quartile, Q1) and then
the value that marks off the highest quarter (the third
quartile, Q3).
Quantiles cont’d …
Example 2:
Given the following data set (ages of patients), find the
interquartile range.
18, 59, 24, 42, 21, 23, 24, 32
1. Sort the data from lowest to highest:
18 21 23 24 24 32 42 59
2. Find the first (bottom) and third (top) quartiles of the data.
3. Find the difference between the two quartiles (the
interquartile range).
Quantiles cont’d …
• 1st quartile = the {(n+1)/4}th observation = 2.25th
observation = 21 + (23 − 21) × 0.25 = 21.5
• 3rd quartile = the {3(n+1)/4}th observation = 6.75th
observation = 32 + (42 − 32) × 0.75 = 39.5
Hence, IQR = 39.5 − 21.5 = 18
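The (n + 1) position rule used in this example can be sketched as below. `quartile_n_plus_1` is a hypothetical helper, and clamping positions that fall outside the data to the minimum or maximum is an assumption the slides do not address. Note that this is a different convention from the percentile rule given earlier (which would give Q1 = 22 and Q3 = 37 for these ages), so for small samples the two methods can disagree.

```python
def quartile_n_plus_1(data, q):
    """Value at 1-based position q*(n + 1) of the sorted data, with linear
    interpolation between the neighbouring observations."""
    x = sorted(data)
    n = len(x)
    pos = q * (n + 1)
    whole = int(pos)           # e.g. 2 for position 2.25
    frac = pos - whole         # e.g. 0.25
    if whole < 1:              # before the first value: clamp (assumption)
        return x[0]
    if whole >= n:             # past the last value: clamp (assumption)
        return x[-1]
    return x[whole - 1] + (x[whole] - x[whole - 1]) * frac


ages = [18, 59, 24, 42, 21, 23, 24, 32]
q1 = quartile_n_plus_1(ages, 0.25)   # 2.25th value
q3 = quartile_n_plus_1(ages, 0.75)   # 6.75th value
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
```

For comparison, Python's `statistics.quantiles(ages, n=4)` (Python 3.8+, default `method="exclusive"`) uses the same (n + 1) positions and returns `[21.5, 24.0, 39.5]` for this data.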
• The interquartile range is preferable to the range
because it is less prone to distortion by a single very
large or very small value: outliers in the data have
little or no effect on the interquartile range. It can also
be computed when the distribution has open-ended classes.
Box and Whisker plot
• Box plots summarize data using a five-number summary:
the 25th percentile (first quartile), the median (second
quartile), the 75th percentile (third quartile), and the
minimum and maximum observed values that are not
statistical outliers.
• The heavy black line inside each box marks the 50th
percentile, or median, of the group distribution.
• The lower and upper hinges, or box boundaries, mark
the 25th (Q1) and 75th (Q3) percentiles respectively.
• Whiskers extend above and below the hinges. They
are vertical lines ending in horizontal lines at the largest
and smallest observed values that are not statistical
outliers.
Box and Whisker plot cont’d…
• Outliers are identified with an O. In the figure, the
outlier in group one is labeled 1, the outlier in group two
is labeled 17 and the outlier in group three is labeled 58.
• The labels 1, 3, 17 and 58 refer to the row numbers in
the Data Editor where those observations are found.
• Extreme values are marked with an asterisk (*). In
this case, the extreme value in the first group is
labeled *3.
[Figure: box plots of "score to survey question" (vertical axis, 10.00 to 30.00) for different groups having different training (group one, group two, group three); plotted points labeled 1, 3, 17 and 58.]
Information obtained from a box and whisker plot
– If the median is near the center of the box, the
distribution is approximately symmetric,
– If the median lies closer to the bottom of the box
(nearer Q1), the distribution is positively skewed.
– If the median lies closer to the top of the box
(nearer Q3), the distribution is negatively skewed.
– If the whiskers are about the same length, the
distribution is approximately symmetric,
– If the top whisker is longer than the bottom
whisker, the distribution is positively skewed.
– If the bottom whisker is longer than the top
whisker, the distribution is negatively skewed.
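The box-plot reading rules above can be turned into a rough, informal check. This sketch is illustrative only: `read_box_plot` is a hypothetical helper, it assumes there are no outliers (so the whiskers run to the minimum and maximum), it takes quartiles from `statistics.quantiles` (the same (n + 1) rule as the IQR example), and it treats lengths within 10% of the IQR as "about the same", which is an arbitrary threshold.

```python
from statistics import quantiles


def _direction(lower_len, upper_len, tol):
    """Compare lower and upper lengths; 'about the same' means within tol."""
    if abs(upper_len - lower_len) <= tol:
        return "approximately symmetric"
    return "positively skewed" if upper_len > lower_len else "negatively skewed"


def read_box_plot(data, rel_tol=0.1):
    """Informal skewness reading from box-plot geometry (hypothetical helper)."""
    q1, med, q3 = quantiles(data, n=4)   # (n + 1) rule, as in the IQR example
    tol = rel_tol * (q3 - q1)            # arbitrary threshold for "about the same"
    # Median nearer the bottom of the box (Q1) -> upper half longer -> positive skew
    by_median = _direction(med - q1, q3 - med, tol)
    # Assumes no outliers, so the whiskers reach the minimum and maximum
    by_whiskers = _direction(q1 - min(data), max(data) - q3, tol)
    return by_median, by_whiskers


print(read_box_plot([18, 59, 24, 42, 21, 23, 24, 32]))  # patient ages, Example 2
```

For the patient ages, both the median position and the whisker lengths point to positive skew, consistent with the long upper tail created by the age of 59.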
Outliers
An outlier is an observation that lies an abnormal
distance from the other values in a random sample from a
population.
Before abnormal observations can be singled out, it is
necessary to characterize normal observations.
Two activities are essential for characterizing a set of
data:
• Examination of the overall shape of the graphed data
for important features, including symmetry and
departures from assumptions.
• Examination of the data for unusual observations that
are far from the mass of the data. These points are often
referred to as outliers.
Outliers cont’d…
The following quantities (called fences) are needed for
identifying outliers and extreme values in the tails of the
distribution:
lower inner fence: Q1 − 1.5 × IQR
upper inner fence: Q3 + 1.5 × IQR
lower outer fence: Q1 − 3 × IQR
upper outer fence: Q3 + 3 × IQR
where Q1 = 1st quartile,
Q3 = 3rd quartile,
IQR = interquartile range.
A point beyond an inner fence on either side is considered
an outlier. A point beyond an outer fence is considered an
extreme outlier.
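A minimal sketch of the fence rule, assuming quartiles from `statistics.quantiles` with its default (n + 1) method; `classify_by_fences` is a hypothetical name. Applied to the patient ages from Example 2 (Q1 = 21.5, Q3 = 39.5, IQR = 18), the inner fences are −5.5 and 66.5 and the outer fences are −32.5 and 93.5, so none of those ages is flagged.

```python
from statistics import quantiles


def classify_by_fences(data):
    """Return inner fences, outer fences and (value, label) pairs for flagged points."""
    q1, _, q3 = quantiles(data, n=4)      # default "exclusive" method = (n + 1) rule
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    flagged = []
    for x in sorted(data):
        if x < outer[0] or x > outer[1]:
            flagged.append((x, "extreme outlier"))   # beyond an outer fence
        elif x < inner[0] or x > inner[1]:
            flagged.append((x, "outlier"))           # beyond an inner fence only
    return inner, outer, flagged


ages = [18, 59, 24, 42, 21, 23, 24, 32]  # patient ages from Example 2
inner, outer, flagged = classify_by_fences(ages)
print("inner fences:", inner)
print("outer fences:", outer)
print("flagged:", flagged)
```

Because the quartile convention affects the fences, a point close to a fence may be flagged by one package and not by another.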
Standard Deviation and Variance
• Variance:
While the inter-quartile range eliminates the
problem of outliers, it creates another problem: it
ignores half of the data (the lowest and highest quarters).
The solution to both problems is to measure
variability from the center of the distribution.
The variance measures how far, on average, scores
deviate from the mean: it is the average of the
squared distances of the values from the mean.
Mathematically, the population variance is defined as:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
The mathematical formula for the sample variance is
defined as:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
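The two definitions differ only in the centre (μ vs x̄) and the divisor (N vs n − 1). A minimal Python sketch using Dataset 2 from earlier (6, 7, 7, 7, 7, 8) as input; the function names are illustrative, and the standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
def population_variance(values):
    """sigma^2: average squared deviation from the population mean mu (divisor N)."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


def sample_variance(values):
    """s^2: squared deviations from the sample mean x-bar, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [6, 7, 7, 7, 7, 8]  # Dataset 2 from the earlier slide
print(population_variance(data), sample_variance(data))
```

With the same sum of squared deviations (2), dividing by N = 6 gives about 0.333, while dividing by n − 1 = 5 gives 0.4.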
Variance cont’d…
Shortcut formula for the sample variance:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
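To see that the shortcut formula gives the same result as the definition, the sketch below computes both for Dataset 3 from earlier (3, 2, 7, 8, 9, 13). The function names are hypothetical. One design caveat: in floating-point arithmetic the shortcut can lose precision when values are large relative to their spread, because it subtracts two large, nearly equal quantities; the two-pass definitional form avoids this.

```python
def sample_variance_shortcut(values):
    """Shortcut form: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


def sample_variance_definition(values):
    """Definitional form: sum of squared deviations from the mean, over n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [3, 2, 7, 8, 9, 13]  # Dataset 3 from the earlier slide
print(sample_variance_shortcut(data), sample_variance_definition(data))
```

Here Σx = 42 and Σx² = 376, so the shortcut gives (376 − 42²/6)/5 = 82/5 = 16.4, the same as the definition.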
Standard Deviation
• The sample and population standard deviations are
denoted (by convention) by s and σ, respectively.
• The standard deviation (S.D.) is the positive square
root of the variance.
• It expresses the same information as the variance,
but rescaled to be in the same units as the mean.
• Mathematically, the population standard deviation is:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Standard Deviation cont’d…
• The sample standard deviation is defined as:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
• Example 1: The areas of surfaces sprayable with DDT were
measured in a sample of 15 houses (in m²):
101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135,
136, 137, 140, 145
Example 1 cont’d …
• Find the variance and standard deviation of the
above distribution.
• Solution
The mean of the sample is x̄ = 1875/15 = 125 m².
Variance (sample) = s² = Σ(xᵢ − x̄)²/(n − 1)
= {(101 − 125)² + (105 − 125)² + … + (145 − 125)²}/(15 − 1)
= 2502/14
= 178.71 m⁴
Hence, the standard deviation is
s = √178.71 = 13.37 m².
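The same calculation in Python, as a minimal sketch reproducing the hand computation: the sum of squared deviations, then division by n − 1, then the square root.

```python
from math import sqrt

# Sprayable surface areas (m^2) for the 15 sampled houses
areas = [101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135,
         136, 137, 140, 145]

n = len(areas)
mean_area = sum(areas) / n                          # 125.0 m^2
ss = sum((x - mean_area) ** 2 for x in areas)       # sum of squared deviations
variance = ss / (n - 1)                             # units: m^4
sd = sqrt(variance)                                 # back in m^2
print(f"mean = {mean_area}, SS = {ss}, s^2 = {variance:.2f}, s = {sd:.2f}")
```

The squared units (m⁴) of the variance are one practical reason to report the standard deviation, which is back in the original units (m²).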
Variance for grouped frequency distribution
• In a grouped frequency distribution, the variance is
computed as:
$$s^2 = \frac{n\sum_{i=1}^{k} f_i\,x_{c_i}^2 - \left(\sum_{i=1}^{k} f_i\,x_{c_i}\right)^2}{n(n-1)}$$
• where
f_i = frequency of the ith class
x_ci = class mark (midpoint) of the ith class
k = number of classes
n = Σf_i = total number of observations in the sample
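Since the grouped formula is exact when every distinct value forms its own class, one way to sanity-check it is to treat each distinct DDT area from Example 1 as a "class" with its frequency (125 occurs twice); the result should equal the ungrouped sample variance, 2502/14 ≈ 178.71. `grouped_sample_variance` is a hypothetical helper name.

```python
def grouped_sample_variance(freqs, class_marks):
    """s^2 = [n*sum(f*x^2) - (sum(f*x))^2] / [n*(n - 1)], where n = sum(f)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, class_marks))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


# Distinct DDT areas (m^2) from Example 1, each treated as a "class" with its frequency
marks = [101, 105, 110, 114, 115, 124, 125, 130, 133, 135, 136, 137, 140, 145]
freqs = [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1]

s2 = grouped_sample_variance(freqs, marks)
print(f"s^2 = {s2:.2f}")
```

With real grouped data, the class marks stand in for every value in the class, so the result is only an approximation to the variance of the raw data.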
Example of variance for a grouped frequency
distribution
• Consider the following data on the time spent by college
students on leisure activities. Compute the standard
deviation.
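The standard deviation is the square root of the grouped variance:
$$s = \sqrt{\frac{n\sum_{i=1}^{k} f_i\,x_{c_i}^2 - \left(\sum_{i=1}^{k} f_i\,x_{c_i}\right)^2}{n(n-1)}}$$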
Coefficient of variation
• The standard deviation is an absolute measure of the spread
of observations around their mean and is expressed in the same
units as the data.
• Because of this, the standard deviation cannot be used directly
to compare variability between data sets measured in different
units or with very different means.
• The coefficient of variation is often used for this purpose.
• The coefficient of variation (CV) is defined by:
$$\mathrm{CV} = \frac{s}{\bar{x}}$$
(often multiplied by 100 and expressed as a percentage)
• The coefficient of variation is most useful for comparing the
variability of several samples, each with a different mean.
Coefficient of variation cont’d…
• CV is a relative measure, free of the unit of measurement.
Example:
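Weights of newborn elephants (kg):
929, 853, 878, 939, 895, 972, 937, 841, 801, 826
n = 10, x̄ = 887.1, s = 56.50, CV = 56.50/887.1 = 0.0637

Weights of newborn mice (kg):
0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89
n = 10, x̄ = 0.675, s = 0.255, CV = 0.255/0.675 = 0.378

Mice show greater relative birth-weight variation than
elephants (CV 0.378 vs 0.0637), even though their standard
deviation is far smaller in absolute terms.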
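A minimal sketch reproducing the CV comparison with Python's `statistics` module (sample standard deviation, divisor n − 1). It works at full precision, so the last digit can differ slightly from figures computed with a rounded mean; the function name is illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = s / mean (unit-free); multiply by 100 for a percentage."""
    return stdev(values) / mean(values)


elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]

for name, w in [("elephants", elephants), ("mice", mice)]:
    print(f"{name}: mean = {mean(w):.3f}, s = {stdev(w):.3f}, "
          f"CV = {coefficient_of_variation(w):.4f}")
```

Because the CV divides by the mean, it is only meaningful for data measured on a scale with a true zero and a positive mean, as with weights.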
When to use the coefficient of variation
• When the comparison groups have very different means
(CV is suitable because it expresses the standard deviation
relative to its corresponding mean)
• When different units of measurement are involved,
e.g. group 1 is measured in mm and group 2 in g (CV is
suitable for comparison because it is unit-free)
• In such cases, the standard deviation should not be used
for comparison.
11/17/2018
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.) 206
11/17/2018

More Related Content

Similar to MPH Biostatistics Course on Data Presentation

Factors That Impacted Effective Diabetes Management Within...
Factors That Impacted Effective Diabetes Management Within...Factors That Impacted Effective Diabetes Management Within...
Factors That Impacted Effective Diabetes Management Within...Susan Tullis
 
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...PEPGRA Healthcare
 
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical Research
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical ResearchBio-Statistics newer advances, Scope & Challenges in Bio-Medical Research
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical Researchkomalicarol
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatisticsshivamdixit57
 
16The Roles of Diversity in Research and Practices.docx
16The Roles of Diversity in Research and Practices.docx16The Roles of Diversity in Research and Practices.docx
16The Roles of Diversity in Research and Practices.docxdurantheseldine
 
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptintroductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptKvkExambranch
 
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptintroductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptPriyankaSharma89719
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfPubrica
 
Biostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaBiostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaPubrica
 
Biostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaBiostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaPubrica
 
An Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAn Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAmy Roman
 
Discussions on the growth and future of biostatistics
Discussions on the growth and future of biostatisticsDiscussions on the growth and future of biostatistics
Discussions on the growth and future of biostatisticsharamaya university
 
Application of Biostatistics
Application of BiostatisticsApplication of Biostatistics
Application of BiostatisticsJippy Jack
 
1 The Outcomes of Neural Stem Cell Transplantation and .docx
1  The Outcomes of Neural Stem Cell Transplantation and .docx1  The Outcomes of Neural Stem Cell Transplantation and .docx
1 The Outcomes of Neural Stem Cell Transplantation and .docxaulasnilda
 
1 The Outcomes of Neural Stem Cell Transplantation and .docx
1  The Outcomes of Neural Stem Cell Transplantation and .docx1  The Outcomes of Neural Stem Cell Transplantation and .docx
1 The Outcomes of Neural Stem Cell Transplantation and .docxjeremylockett77
 
To prepare for this Assignment· Review the article, Sources of
To prepare for this Assignment· Review the article, Sources ofTo prepare for this Assignment· Review the article, Sources of
To prepare for this Assignment· Review the article, Sources ofTakishaPeck109
 
Qualitative Research-Grounded Theory
Qualitative Research-Grounded TheoryQualitative Research-Grounded Theory
Qualitative Research-Grounded TheoryTina Jordan
 

Similar to MPH Biostatistics Course on Data Presentation (20)

Factors That Impacted Effective Diabetes Management Within...
Factors That Impacted Effective Diabetes Management Within...Factors That Impacted Effective Diabetes Management Within...
Factors That Impacted Effective Diabetes Management Within...
 
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...
Role of Biostatistician and Biostatistical Programming in Epidemiological Stu...
 
Statistic note
Statistic noteStatistic note
Statistic note
 
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical Research
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical ResearchBio-Statistics newer advances, Scope & Challenges in Bio-Medical Research
Bio-Statistics newer advances, Scope & Challenges in Bio-Medical Research
 
Introduction to biostatistics
Introduction to biostatisticsIntroduction to biostatistics
Introduction to biostatistics
 
16The Roles of Diversity in Research and Practices.docx
16The Roles of Diversity in Research and Practices.docx16The Roles of Diversity in Research and Practices.docx
16The Roles of Diversity in Research and Practices.docx
 
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptintroductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
 
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).pptintroductoin to Biostatistics ( 1st and 2nd lec ).ppt
introductoin to Biostatistics ( 1st and 2nd lec ).ppt
 
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdfEffective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
Effective strategies to monitor clinical risks using biostatistics - Pubrica.pdf
 
Naf ppt
Naf pptNaf ppt
Naf ppt
 
Biostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaBiostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubrica
 
Biostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubricaBiostatistics is a critical subject in current health data research – pubrica
Biostatistics is a critical subject in current health data research – pubrica
 
An Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAn Assignment On Advanced Biostatistics
An Assignment On Advanced Biostatistics
 
Discussions on the growth and future of biostatistics
Discussions on the growth and future of biostatisticsDiscussions on the growth and future of biostatistics
Discussions on the growth and future of biostatistics
 
Application of Biostatistics
Application of BiostatisticsApplication of Biostatistics
Application of Biostatistics
 
1 The Outcomes of Neural Stem Cell Transplantation and .docx
1  The Outcomes of Neural Stem Cell Transplantation and .docx1  The Outcomes of Neural Stem Cell Transplantation and .docx
1 The Outcomes of Neural Stem Cell Transplantation and .docx
 
1 The Outcomes of Neural Stem Cell Transplantation and .docx
1  The Outcomes of Neural Stem Cell Transplantation and .docx1  The Outcomes of Neural Stem Cell Transplantation and .docx
1 The Outcomes of Neural Stem Cell Transplantation and .docx
 
Status of Statistics
Status of StatisticsStatus of Statistics
Status of Statistics
 
To prepare for this Assignment· Review the article, Sources of
To prepare for this Assignment· Review the article, Sources ofTo prepare for this Assignment· Review the article, Sources of
To prepare for this Assignment· Review the article, Sources of
 
Qualitative Research-Grounded Theory
Qualitative Research-Grounded TheoryQualitative Research-Grounded Theory
Qualitative Research-Grounded Theory
 

More from MohammedKasim29

More from MohammedKasim29 (6)

CVSD -Nurse.ppt
CVSD -Nurse.pptCVSD -Nurse.ppt
CVSD -Nurse.ppt
 
pallative care .pptx
pallative care .pptxpallative care .pptx
pallative care .pptx
 
Chapter 2.pptx
Chapter 2.pptxChapter 2.pptx
Chapter 2.pptx
 
PPt of NNM.pptx
PPt of NNM.pptxPPt of NNM.pptx
PPt of NNM.pptx
 
1introductionofmsnconceptofhealth-181004172918 (1).pdf
1introductionofmsnconceptofhealth-181004172918 (1).pdf1introductionofmsnconceptofhealth-181004172918 (1).pdf
1introductionofmsnconceptofhealth-181004172918 (1).pdf
 
Final defense.pptx
Final defense.pptxFinal defense.pptx
Final defense.pptx
 

Recently uploaded

Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls JaipurCall Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipurparulsinha
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiAlinaDevecerski
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.MiadAlsulami
 
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoy
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night EnjoyCall Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoy
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoynarwatsonia7
 
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...Taniya Sharma
 
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...narwatsonia7
 
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Miss joya
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...astropune
 
Call Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night Enjoy
Call Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night EnjoyCall Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night Enjoy
Call Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night Enjoybabeytanya
 
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...
Call Girls Service Surat Samaira ❤️🍑 8250192130 👄 Independent Escort Service ...CALL GIRLS
 
VIP Call Girls Pune Sanjana 9907093804 Short 1500 Night 6000 Best call girls ...
VIP Call Girls Pune Sanjana 9907093804 Short 1500 Night 6000 Best call girls ...VIP Call Girls Pune Sanjana 9907093804 Short 1500 Night 6000 Best call girls ...
VIP Call Girls Pune Sanjana 9907093804 Short 1500 Night 6000 Best call girls ...Miss joya
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliRewAs ALI
 
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort ServicePremium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Service
Premium Call Girls Cottonpet Whatsapp 7001035870 Independent Escort Servicevidya singh
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...narwatsonia7
 
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore EscortsCall Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escorts
Call Girls Horamavu WhatsApp Number 7001035870 Meeting With Bangalore Escortsvidya singh
 
CALL ON ➥9907093804 🔝 Call Girls Baramati ( Pune) Girls Service
CALL ON ➥9907093804 🔝 Call Girls Baramati ( Pune)  Girls ServiceCALL ON ➥9907093804 🔝 Call Girls Baramati ( Pune)  Girls Service
CALL ON ➥9907093804 🔝 Call Girls Baramati ( Pune) Girls ServiceMiss joya
 
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...Call girls in Ahmedabad High profile
 
Call Girl Number in Vashi Mumbai📲 9833363713 💞 Full Night Enjoy
Call Girl Number in Vashi Mumbai📲 9833363713 💞 Full Night EnjoyCall Girl Number in Vashi Mumbai📲 9833363713 💞 Full Night Enjoy
Call Girl Number in Vashi Mumbai📲 9833363713 💞 Full Night Enjoybabeytanya
 
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service CoimbatoreCall Girl Coimbatore Prisha☎️  8250192130 Independent Escort Service Coimbatore
Call Girl Coimbatore Prisha☎️ 8250192130 Independent Escort Service Coimbatorenarwatsonia7
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...Garima Khatri
 

Recently uploaded (20)

Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls JaipurCall Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
Call Girls Service Jaipur Grishma WhatsApp ❤8445551418 VIP Call Girls Jaipur
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
 
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoy
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night EnjoyCall Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoy
Call Girls Yelahanka Bangalore 📲 9907093804 💞 Full Night Enjoy
 
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
 
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...Bangalore Call Girls Hebbal Kempapura Number 7001035870  Meetin With Bangalor...
Bangalore Call Girls Hebbal Kempapura Number 7001035870 Meetin With Bangalor...
 
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
Call Girls Service Pune Vaishnavi 9907093804 Short 1500 Night 6000 Best call ...
 
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
♛VVIP Hyderabad Call Girls Chintalkunta🖕7001035870🖕Riya Kappor Top Call Girl ...
 
Call Girl Number in Panvel Mumbai📲 9833363713 💞 Full Night Enjoy
MPH Biostatistics Course on Data Presentation

  • 1. Basic Biostatistics for MPH students. Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.), Arsi University, College of Health Science, Department of Public Health. 11/17/2018
  • 2. Course content. Topics: 1. Introduction; 2. Methods of data collection and presentation; 3. Summary measures; 4. Probability and probability distributions; 5. Sampling methods and sample size determination; 6. Statistical inference. Facilitator: Mr. Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.), email: terek7@gmail.com
  • 3. Course description: This course covers descriptive and some intermediate-level inferential statistics for public health. Topics include frequency distributions and measures of central tendency and variability (descriptive statistics); probability and probability distributions; sampling and sample size determination; and statistical estimation, sampling distributions, and hypothesis testing.
  • 4. Learning objectives: At the end of the course, we will be able to: – Discuss the role of statistics in health science and explain the main uses of statistical methods in the broader field of health care; – Describe methods of collecting, recording, and presenting data in the form of tables, graphs, etc.; – Calculate measures of central tendency and dispersion; – Apply different sample size determination and sampling techniques; – Explain the context and meaning of statistical estimation and hypothesis testing.
  • 5. Evaluation: Assignments 40%; Final exam 60%. NB: Grading will follow the grading scale of the university registrar.
  • 6. Chapter one: Introduction to Biostatistics. Objectives of the chapter: After completing this chapter, we will be able to: – Define statistics and biostatistics; – Enumerate the importance and limitations of statistics; – Define and identify the different types of variables and explain why we need to classify variables;
  • 7. Objectives cont’d… – Identify the different methods of organizing and presenting medical and biological data; – Identify the criteria for selecting a method to organize and present data; – Discuss data summarization methods.
  • 8. Statistics?
  • 9. Statistics: – The science of assembling and interpreting numerical data (Bland, 2000); – The discipline concerned with the treatment of numerical data derived from groups of individuals (Armitage et al., 2001). Generally, the term statistics is used to mean either statistical data or statistical methods.
  • 10. Statistics cont’d… Statistical data refer to numerical descriptions of things. These descriptions may take the form of counts or measurements. E.g., malaria statistics include the number of fever cases, the number of positive results obtained, and the sex and age distribution of positive cases.
  • 11. Statistics cont’d… NB: Even though statistical data always denote figures (numerical descriptions), not all numerical descriptions are statistical data. Why?
  • 12. Statistics cont’d… Statistical methods refer to the methods used for collecting, organizing, analyzing and interpreting numerical data in order to understand a phenomenon or make wise decisions. In this sense, statistics is a branch of the scientific method and helps us understand the object under study better.
  • 13. Biostatistics?
  • 14. Biostatistics: The tools of statistics are employed in many fields, such as business, education, psychology, agriculture, and economics. When the data being analyzed are derived from public health, the biological sciences, and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.
  • 15. Types of biostatistics?
  • 16. Types of biostatistics (diagram): Biostatistics is divided into descriptive statistics (collecting, organizing, summarizing, and presenting data) and inferential statistics (making inferences, hypothesis testing, determining relationships, and making predictions).
  • 17. Types of biostatistics: 1. Descriptive (exploratory) statistics is the aspect concerned with the collection, organization, presentation and summarization of data. It includes techniques for the tabular and graphical presentation of data as well as methods used to summarize a body of data with one or two meaningful figures. E.g., at our health centre, 50 patients were diagnosed with angina last year.
  • 18. Descriptive statistics cont’d… Summaries and displays that are especially common in descriptive analyses include: measures of central tendency; measures of dispersion; cross-tabulations (contingency tables); histograms; quantiles and Q-Q plots; scatter plots; box plots.
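As a minimal sketch of the first two summaries on slide 18 (and the coefficient of variation listed on slide 22), the example below uses Python's standard `statistics` module. The blood pressure readings are hypothetical values chosen only for illustration, not data from the course.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg); illustrative values only.
sbp = [118, 124, 130, 121, 135, 128, 142, 119, 126, 131]

mean = statistics.mean(sbp)      # arithmetic mean (central tendency)
median = statistics.median(sbp)  # middle value of the ordered data (central tendency)
sd = statistics.stdev(sbp)       # sample standard deviation, n - 1 denominator (dispersion)
cv = sd / mean * 100             # coefficient of variation, in percent (relative dispersion)

print(f"mean = {mean:.1f} mmHg, median = {median} mmHg, SD = {sd:.2f} mmHg, CV = {cv:.1f}%")
```

With ten readings, the median is the average of the 5th and 6th ordered values (126 and 128), so it equals 127.0 here.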
  • 19. 2. Inferential statistics consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables, and making predictions. This branch of statistics deals with techniques for drawing conclusions about a population. Inferences are drawn from properties of a sample to the corresponding properties of the population. Inferential statistics builds upon descriptive statistics.
  • 20. Inferential statistics cont’d… NB: It encompasses a variety of procedures to ensure that inferences are sound and rational, even though they may not always be correct.
  • 21. Statistical inference cont’d… In short, inferential statistics enables us to make confident decisions in the face of uncertainty. E.g., antibiotics reduce the duration of viral throat infections by 1-2 days; five percent of women aged 30-49 consult their GP each year with heavy menstrual bleeding.
  • 22. Summary: – Descriptive statistical methods provide summary indices for given data, e.g., arithmetic mean, median, standard deviation, coefficient of variation; – Inductive (inferential) statistical methods produce statistical inferences about a population based on information from a sample drawn from that population; they need to take variation into account, e.g., when estimating population values from sample values. [Figure: inference from a sample to the population]
  • 23. Summary cont’d… E.g., "At our health centre, 50 patients were diagnosed with angina last year" (descriptive); "Antibiotics reduce the duration of viral throat infections by 1-2 days" (inferential); "Five percent of women aged 30-49 consult their GP each year with heavy menstrual bleeding" (inferential).
  • 24. Why do we need biostatistics?
  • 25. Why do we need biostatistics? The main reason is to handle variation: – Biological variation: among individuals as well as within the same individual over time, e.g., height, weight, blood pressure, eye colour; – Sampling variation: biomedical research projects are usually carried out on small numbers of study subjects.
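To make the idea of sampling variation on slide 25 concrete, here is a small simulation sketch. It assumes a hypothetical population of 10,000 heights drawn from a normal distribution (mean 165 cm, SD 8 cm); these numbers are illustrative, not real data. Each small sample gives a different mean, which is exactly the variation inferential statistics must take into account.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10,000 adult heights (cm), simulated for illustration.
population = [random.gauss(165, 8) for _ in range(10_000)]
pop_mean = statistics.mean(population)

# Draw five small samples (n = 20 each, without replacement) and record their means.
sample_means = [statistics.mean(random.sample(population, 20)) for _ in range(5)]

print(f"population mean: {pop_mean:.1f} cm")
for i, m in enumerate(sample_means, start=1):
    # Each sample mean differs from the others and from the population mean.
    print(f"sample {i} mean: {m:.1f} cm")
```

Increasing the sample size from 20 reduces how much the sample means scatter around the population mean.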
  • 26. Why do we need to learn biostatistics? cont’d… It is essential for the scientific method of investigation: – Formulate a hypothesis; – Design a study to test the hypothesis objectively; – Collect reliable and unbiased data; – Process and evaluate data rigorously; – Interpret the data and draw appropriate conclusions.
  • 27. Why do we need to learn biostatistics? cont’d… It is essential for understanding, appraising and critiquing the scientific literature. Public health and medicine are becoming increasingly quantitative.
  • 28. Limitations of statistics: – It deals only with subjects of inquiry that can be measured quantitatively and expressed numerically; – It deals with aggregates of facts and attaches no importance to individual items, so it is suited only to studying group characteristics; – Statistical data are only approximations and are not mathematically exact.
  • 29. Variables: A variable is a characteristic under study that assumes different values for different elements, i.e., a characteristic or attribute that can take different values. Examples include diastolic blood pressure, heart rate, height, weight, and stage of bladder cancer.
  • 30. Variables cont’d… – Random variables are variables whose values are determined by chance; – Data are the measurements or observations (values) of a variable; – A data set is a collection of observations on one or more variables.
  • 31. Variables cont’d… Example data set (each column is a variable, each entry is a data value): Mrs. Brown: age 32, sex female, blood type O; Mr. Patel: age 24, sex male, blood type O; Mr. Amanda: age 20, sex male, blood type A.
  • 32. Types of variables: Depending on the characteristics of the measurement, a variable can be: Qualitative (categorical) variable: a variable or characteristic that cannot be measured in quantitative form but can only be identified by name or category, i.e., a variable that can be placed into distinct categories according to some characteristic or attribute. For instance: place of birth, ethnic group, type of drug, stage of breast cancer (I, II, III, or IV), degree of pain (low, moderate, severe or unbearable).
  • 33. Types of variables cont’d… The categories should be clear-cut (not overlapping) and cover all the possibilities. For example: sex (male or female), disease stage (depends on the disease), ever smoked (yes or no).
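The requirement on slide 33 that categories be clear-cut and cover all possibilities can be checked mechanically. The sketch below uses hypothetical age groups (not from the slides) written as half-open intervals [lower, upper), so no age falls into two groups, and an open-ended last group, so every non-negative age has a group. The helper name `age_group` is illustrative only.

```python
# Hypothetical age groups as half-open intervals [lower, upper):
# no age can match two groups (mutually exclusive), and the open-ended
# last group means every non-negative age matches one (exhaustive).
AGE_GROUPS = [
    (0, 15, "0-14"),
    (15, 50, "15-49"),
    (50, float("inf"), "50+"),
]


def age_group(age: float) -> str:
    """Return the single age-group label whose interval contains `age`."""
    matches = [label for lower, upper, label in AGE_GROUPS if lower <= age < upper]
    # Exactly one match is what "clear-cut and covering all possibilities" means.
    if len(matches) != 1:
        raise ValueError(f"age {age} matched {len(matches)} groups")
    return matches[0]


print([age_group(a) for a in [3, 14.9, 15, 49, 50, 87]])
# ['0-14', '0-14', '15-49', '15-49', '50+', '50+']
```

A scheme such as "15-50, 50-65" would fail this check for age 50, which matches two groups.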
  • 34. Types of variables cont’d… Quantitative (numerical) variable: one that can be measured and expressed numerically. It can be of two types. Discrete data: the values of a discrete variable are usually whole numbers, such as the number of episodes of diarrhoea in the first five years of life. Observations can take only certain numerical values; numerical discrete data occur when the observations are integers that correspond to a count of some sort.
  • 35. Discrete data cont’d… Some common examples are: the number of bacterial colonies on a plate; the number of cells within a prescribed area upon microscopic examination; the number of heartbeats within a specified time interval; a mother’s history of number of births (parity) and pregnancies (gravidity); the number of episodes of illness a patient experiences during some time period, etc.
  • 36. Continuous data: A continuous variable is a measurement on a continuous scale. Each observation theoretically falls somewhere along a continuum; one is not restricted, in principle, to particular values such as the integers of the discrete scale. Most clinical measurements, such as blood pressure, serum cholesterol level, height, weight, and age, are on a numerical continuous scale.
  • 37. Continuous data cont’d… Continuous data are used to report a measurement of the individual that can take any value within an acceptable range. [Diagram: data are either qualitative or quantitative; quantitative data are either discrete or continuous]
  • 38. Scales of measurement: Data come in various sizes and shapes, and it is important to know about these so that the proper analysis can be used. There are four scales at which we measure. Nominal scale of measurement: this may be thought of as the "naming" level. This level of measurement does not put subjects in any particular order; there is no logical basis for saying that one category is higher or lower than another. In research activities, a YES/NO scale is nominal.
  • 39. Nominal scale cont’d… The simplest data consist of unordered, dichotomous, or "either ... or" types of observations, i.e., either the patient lives or the patient dies, either he has some particular attribute or he does not. Examples are blood group, gender, and religious affiliation.
  • 40. Nominal scale cont’d… The nominal level of measurement classifies data into mutually exclusive (non-overlapping), exhaustive categories in which no order or ranking can be imposed on the data.
  • 41. Ordinal scale of measurement: An ordinal scale is next up the list in terms of power of measurement. The simplest ordinal scale is a ranking: at this level we put subjects in order from lowest to highest. It is important to know that ranks do not tell us by how much subjects differ. There is no objective distance between any two points on a subjective scale. Hence, an ordinal scale lets you interpret only gross order, not the relative distances between positions.
  • 42. Ordinal scale cont’d… E.g., if we are told that third-year students have better knowledge than first-year students, we do not know by how much they are better. To measure the amount of difference between subjects, we need the next level of measurement.
  • 43. Ordinal scale cont’d… Examples of this scale of measurement include: academic status, job satisfaction index, employment status, response to treatment (none, slow, moderate, fast), and the Likert scale: 1. strongly agree 2. agree 3. no opinion 4. disagree 5. strongly disagree.
  • 44. Ordinal scale cont’d… The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
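Because an ordinal scale carries order but not distance, any order-preserving set of codes is equally valid. The sketch below uses hypothetical answers to the five-point Likert item from slide 43 and recodes "strongly disagree" from 5 to 10 (order unchanged). The median category stays the same, while the mean changes from 22/7 (about 3.14) to 32/7 (about 4.57), which illustrates why the median is the usual summary for ordinal data.

```python
import statistics

# Hypothetical coded answers to the Likert item on slide 43:
# 1 = strongly agree, 2 = agree, 3 = no opinion, 4 = disagree, 5 = strongly disagree.
responses = [1, 2, 2, 3, 4, 5, 5]

# An alternative, equally order-preserving coding: only the gap between
# "disagree" and "strongly disagree" changes, which ordinal data cannot rule out.
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded = [recode[r] for r in responses]

print(statistics.median(responses), statistics.median(recoded))  # 3 3
print(statistics.mean(responses), statistics.mean(recoded))      # the means differ
```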
  • 45. Interval scale of measurement: It is more powerful than the nominal and ordinal scales, as it not only orders or categorizes but also shows exact distances between values. On interval scales, one unit on the scale represents the same magnitude of the trait or characteristic being measured across the whole range of the scale. Interval scales do not, however, have a "true" zero point, so it is not possible to state how many times higher one score is than another.
  • 46. Interval scale cont’d… A good example of an interval scale is the Fahrenheit temperature scale. Equal differences on this scale represent equal differences in temperature, but the scale is not a ratio scale; thus, a temperature of 30 degrees is not twice as warm as one of 15 degrees. The interval level of measurement ranks data, and precise differences between units of measure exist; however, there is no meaningful zero.
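The Fahrenheit example on slide 46 can be checked numerically. The sketch converts both temperatures to kelvin, a scale with a true zero (absolute zero), using the standard conversion K = (F - 32) × 5/9 + 273.15. The difference of 15 °F is meaningful, but the naive ratio of 2 is not: in kelvin the ratio is only about 1.03.

```python
def fahrenheit_to_kelvin(f: float) -> float:
    """Convert degrees Fahrenheit to kelvin, a scale with an absolute zero."""
    return (f - 32) * 5 / 9 + 273.15


t_warm, t_cool = 30.0, 15.0  # the two temperatures from slide 46, in degrees F

# Differences are meaningful on an interval scale.
print(t_warm - t_cool)  # 15.0

# A naive ratio on the Fahrenheit scale suggests "twice as warm"...
print(t_warm / t_cool)  # 2.0

# ...but on the kelvin scale the ratio of the same two temperatures is about 1.03.
ratio_k = fahrenheit_to_kelvin(t_warm) / fahrenheit_to_kelvin(t_cool)
print(round(ratio_k, 3))
```

The Fahrenheit ratio is misleading because its zero point is arbitrary; the conversion involves an offset (the "- 32" and "+ 273.15"), not just a multiplication.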
  • 47. Ratio scale of measurement: This is the highest level of measurement. It has the properties of an interval scale together with a fixed origin or zero point. Examples of ratio-scaled variables include weight, length and time.
  • 48. Ratio scale cont’d… Ratio scales permit the researcher to compare both differences in scores and the relative magnitudes of scores. For instance, the difference between 5 and 10 minutes is the same as that between 10 and 15 minutes, and 10 minutes is twice as long as 5 minutes.
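A short sketch of the time example on slide 48: on a ratio scale, changing units (here minutes to seconds) only multiplies every value by a constant and keeps zero at zero, so both the ratio "twice as long" and the equality of differences survive the change.

```python
# Durations from slide 48, measured in minutes.
short_min, long_min = 5, 10

# Converting minutes to seconds multiplies every value by the same constant (60)
# and keeps zero at zero, so ratios between values are unaffected.
short_s, long_s = short_min * 60, long_min * 60

print(long_min / short_min)  # 2.0
print(long_s / short_s)      # 2.0

# Equal differences also stay equal: (10 - 5) and (15 - 10) minutes, in seconds.
print(10 * 60 - 5 * 60 == 15 * 60 - 10 * 60)  # True
```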
  • 49. Ratio scale cont’d… The ratio level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist between different values of the measure.
  • 50. Summary: Depending on the characteristics of the measurement, a variable can be: – Qualitative/categorical: cannot be measured in quantitative form; – Quantitative: can be measured and expressed numerically, and is either discrete (takes whole/integer numbers) or continuous (a measurement on a continuous scale).
  • 51. Summary cont’d… Based on the scale of measurement, variables can be: – Nominal: categories only, no ranking; – Ordinal: categories plus ranking, but no clear distance between categories; – Interval: ranking plus clear distances between categories, but no true zero; – Ratio: a true zero exists.
  • 52. Summary table for the four scales of measurement (power increases from lowest to highest): Nominal (lowest): naming; Ordinal: ordering; Interval: equal intervals without an absolute zero; Ratio (highest): equal intervals with an absolute zero.
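The hierarchy on slide 52 can be encoded so that each higher scale inherits everything permitted at the lower ones. The sketch below is illustrative: the `PERMITTED` table follows a common textbook summary (mode for nominal, median for ordinal, mean and differences for interval, ratios for ratio data) and is an assumption added here rather than a table from the slides; the names `Scale` and `allowed` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Scales of measurement, ordered from lowest to highest power."""
    NOMINAL = 1   # naming
    ORDINAL = 2   # ordering
    INTERVAL = 3  # equal intervals without an absolute zero
    RATIO = 4     # equal intervals with an absolute zero


# Operations that first become meaningful at each level (a common textbook
# summary, used here for illustration).
PERMITTED = {
    Scale.NOMINAL: {"count categories", "mode"},
    Scale.ORDINAL: {"rank", "median"},
    Scale.INTERVAL: {"difference", "mean"},
    Scale.RATIO: {"ratio"},
}


def allowed(scale: Scale) -> set[str]:
    """Return every operation allowed at `scale`, including those inherited from lower scales."""
    return set().union(*(ops for s, ops in PERMITTED.items() if s <= scale))


print(sorted(allowed(Scale.ORDINAL)))  # ['count categories', 'median', 'mode', 'rank']
```

Ordering the scales with `IntEnum` lets the comparison `s <= scale` express "lower or equal in power" directly (requires Python 3.9+ for the `set[str]` annotation).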
  • 53. Categorize the following variables as nominal, ordinal, interval or ratio: gender; grade (A, B, C, D and F); rating scale (poor, good, excellent); eye colour; political affiliation; religious affiliation; ranking of tennis players; major field; nationality; height; weight; time; age; IQ; temperature; salary.
  • 54. ASSIGNMENT 1. Exercise 1: Table 1.6 contains the characteristics of cases and controls from a case-control study of stressful life events and breast cancer in women (Protheroe et al. 1999). Categorize the variables in the table as nominal, ordinal, interval or ratio. Exercise 2: Table 1.7 is from a cross-sectional study to determine the incidence of pregnancy-related venous thromboembolic events and their relationship to selected risk factors, such as maternal age, parity, and smoking (Lindqvist et al. 1999). Categorize the variables in the table as nominal, ordinal, interval or ratio. Exercise 3: Table 1.8 is from a study comparing two lotions, malathion and d-phenothrin, in the treatment of head lice (Chosidow et al. 1994). Of 193 schoolchildren, 95 were given malathion and 98 d-phenothrin. Categorize the variables in the table as nominal, ordinal, interval or ratio.
  • 58. ASSIGNMENT 2. Four migraine patients are asked to assess the severity of their migraine pain one hour after the first symptoms of an attack by marking a point on a horizontal line 100 mm long. The line is marked "No pain" at the left-hand end and "Worst possible pain" at the right-hand end. The distance of each patient’s mark from the left-hand end is then measured with a mm ruler, and their scores are 25 mm, 44 mm, 68 mm and 85 mm. What sort of data are these? Can you calculate the average pain of these four patients? Note that this form of measurement (using a line and getting subjects to mark it) is known as a visual analogue scale (VAS).
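For Assignment 2, the arithmetic itself is simple; the real question is whether it is meaningful. The sketch below computes both the mean and the median of the four VAS scores, under the stated assumption that the 0-100 mm line behaves like a continuous measurement. If the marks are regarded only as ordered judgements (ordinal data), the median is the safer summary.

```python
import statistics

# VAS scores (mm from the "No pain" end) for the four patients in Assignment 2.
vas_mm = [25, 44, 68, 85]

# Only meaningful if equal distances on the line represent equal differences
# in pain, which is the assumption the assignment asks you to examine.
mean_mm = statistics.mean(vas_mm)
# Requires only that the scores can be ordered.
median_mm = statistics.median(vas_mm)

print(mean_mm)    # 55.5
print(median_mm)  # 56.0
```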
  • 59. Response and explanatory variables: A variable can also be either a response (dependent, outcome) variable or an explanatory (independent, predictor) variable. Response variables are variables that can be affected by explanatory variables; a response variable is the outcome of a study, the variable you would be interested in predicting or forecasting. Explanatory variables are any variables that explain the response variable.
  • 60. Exercise 1: In a study to determine whether surgery or chemotherapy results in higher survival rates for a certain type of cancer, which variable is the explanatory variable and which is the response variable?
  • 61. What is the importance of variable classification?
  • 62. Sources of data?
  • 63. Primary sources of data: These require the involvement of the researcher himself or herself. Censuses and sample surveys are sources of primary data. Experiments are another means of obtaining the data needed to answer a question.
  • 64. Sources of data cont’d… Secondary data: The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature. In this case, data are obtained from already collected sources such as newspapers, magazines, DHS, and hospital records, and from existing data such as mortality reports, morbidity reports, epidemic reports, and reports of laboratory utilization (including laboratory test results).
  • 65. Data collection methods?
  • 66. Data collection methods: Before any statistical work can be done, data must be collected. Data collection is a crucial stage in the planning and implementation of a study. Data collection techniques allow us to collect data systematically about the objects of our study and about the settings in which they occur.
  • 67. Data collection methods cont’d… The methods of collecting data may be broadly classified as: self-administered questionnaires; the use of documentary sources; observation; interviews; tape recording; filming; photography; focus group discussion.
  • 68. Data collection methods cont’d… The choice of data collection method is based on: – the type of information to be collected from the source; – the accuracy of the information the method will yield; – practical considerations, such as the need for personnel, time, equipment and other facilities, in relation to what is available.
  • 69. Data collection methods cont’d… A method that provides more satisfactory information will often be more expensive or inconvenient. Therefore, accuracy must be balanced against practical considerations (resources and other practical limitations).
  • 70. 1) Observation: Observation is a technique that involves systematically selecting, watching and recording the behaviour of people or other phenomena, and aspects of the setting in which they occur, for the purpose of gaining specific information. It includes all methods from simple visual observation to measurements with sophisticated equipment or facilities, such as radiographic and biochemical equipment, X-ray machines, microscopes, clinical examinations, and microbiological examinations.
  • 71. Observation cont’d… Advantages: gives relatively more accurate data on behaviour and activities. Disadvantages: the investigator’s or observer’s own biases, desires, etc. may affect the data, and more resources and skilled personnel are needed when high-level machines are used.
  • 72. 2) The use of documentary sources: clinical records and other personal records, death certificates, published mortality statistics, census publications, etc. Advantages: documents can provide ready-made information relatively easily; they are the best means of studying past events. Disadvantages: problems of reliability and validity (because the information is collected by a number of different persons who may have used different definitions or methods of obtaining data).
  • 73. 3) Interviewing: This involves oral questioning of respondents, either individually or as a group. Answers can be recorded by writing them down, by tape-recording the responses, or by a combination of both. Interviews can be conducted with varying degrees of flexibility (high versus low).
  • 74. Interviewing cont’d… A) High degree of flexibility (unstructured interview): usually used when the researcher has little understanding of the problem; frequently applied in exploratory studies.
  • 75. Interviewing cont’d… B) Low degree of flexibility (highly structured interview): useful when the researcher is relatively knowledgeable about the expected answers or when the number of respondents being interviewed is relatively large. Questionnaires may be used, with a fixed list of questions in a standard sequence and mainly fixed or pre-categorized answers.
  • 76. Interviewing cont’d… Ways of interviewing participants: face to face; by telephone.
  • 77. Interviews cont’d… Face-to-face interviews: A good interviewer can stimulate and maintain the respondent’s interest and encourage frank answers. If anxiety is aroused (e.g., "Why am I being asked these questions?"), the interviewer can allay it. An interviewer can repeat questions that are not understood and give standardized explanations where necessary. An interviewer can also make observations during the interview, noting not only what the subject says but also how he or she says it.
  • 78. Interviews cont’d… Telephone interviews: can be a very effective and economical way of collecting data for quantitative research; may be useful when the respondents to be interviewed are spread over a wide geographical area. NB: The questionnaire should be fairly short, and a prior appointment may improve the response rate and the length of the interview.
  • 79. Interviews cont’d… While interviewing, precautions should be taken not to influence the responses; the interviewer should ask questions in a neutral manner. He or she should not show agreement, disagreement, or surprise, and should record the respondent’s precise answers without altering or interpreting them.
  • 80. 4) Self-administered questionnaires: Written questions are presented that are to be answered by the respondents in written form. The respondent reads the questions and fills in the answers by himself or herself (sometimes in the presence of an interviewer who "stands by" to give assistance if necessary). Self-administered questionnaires are simpler and cheaper to use, and can be administered to many persons simultaneously.
  • 81. Self-administered questionnaires cont’d… A written questionnaire can be administered in different ways, such as: – sending questionnaires by mail; – gathering all or part of the respondents in one place at one time, giving oral or written instructions, and letting them fill out the questionnaires. The main problems with postal questionnaires are that response rates tend to be relatively low and that less literate subjects may be under-represented.
  • 82. Self-administered questionnaires cont’d… Advantages: less expensive; permits anonymity and may result in more honest responses; does not require research assistants; eliminates bias due to phrasing questions differently for different respondents. Disadvantages: cannot be used with illiterate respondents; there is often a low response rate; questions may be misunderstood.
  • 83. Problems in gathering data?
  • 84. Problems in gathering data. Common problems might include:
     – Language barriers
     – Lack of adequate time
     – Expense
     – Inadequately trained and experienced staff
     – Invasion of privacy
     – Suspicion (mistrust)
     – Bias (any systematic error)
     – Cultural norms (e.g. norms that may preclude men from interviewing women)
  • 85. Types of questions: Depending on how questions are asked and recorded, we can distinguish two major types: open-ended questions and closed questions.
   Open-ended questions: Open-ended questions permit free responses that should be recorded in the respondent’s own words. The respondent is not given any possible answers to choose from. Such questions are useful for obtaining information on:
     – Facts with which the researcher is not very familiar
     – Opinions, attitudes, and suggestions of informants
  • 86. Open-ended questions cont’d… For example:
     – Can you describe exactly what the traditional birth attendant did when your labor started?
     – What do you think are the reasons for the high drop-out rate of village health committee members?
     – What would you do if you noticed that your daughter (a schoolgirl) had a relationship with a teacher?
  • 87. Closed questions: Closed questions offer a list of possible options or answers from which the respondents must choose. When designing closed questions, one should try to offer a list of options that is exhaustive and mutually exclusive. Closed questions are useful when the range of possible responses is known.
  • 88. Closed questions cont’d… For example:
     What is your marital status? 1. Single 2. Married/living together 3. Separated 4. Divorced 5. Widowed
     Have you ever gone to the local village health worker for treatment? 1. Yes 2. No
  • 89. Requirements of questions
     – Must have validity: the question we design should give an obviously valid and relevant measurement of the variable.
     – Must be clear and unambiguous: the way in which questions are worded can ‘make or break’ a questionnaire. They must be phrased in language that the respondent is believed to understand, and that all respondents will understand in the same way. To ensure clarity, each question should contain only one idea; ‘double-barrelled’ questions such as ‘Do you take your child to a doctor when he has a cold or has diarrhea?’ are difficult to answer, and the answers are difficult to interpret.
  • 90. Requirements of questions cont’d…
     – Must not be offensive: whenever possible, avoid questions that may offend the respondent, for example those that may seem to expose the respondent’s ignorance and those requiring him or her to give a socially unacceptable answer.
     – Must be fair: questions should not be loaded (worded so as to push the respondent toward a particular answer). Short questions are generally preferable to long ones.
  • 91. Requirements of questions cont’d…
     – Sensitive questions: it may not be possible to avoid asking ‘sensitive’ questions that may offend respondents. In such situations the interviewer should ask them carefully and tactfully.
  • 92. Methods of data organization and presentation
     – The data collected in a survey are called raw data. In most cases, useful information is not immediately evident from the mass of unsorted data.
     – Collected data need to be organized so as to condense the information they contain and show patterns of variation clearly.
  • 93. 1. Frequency distributions
     – Data are quite often presented in a meaningful way by preparing a frequency distribution. Without one, the raw data convey little meaning and any pattern in them may go undetected.
     – Constructing a frequency distribution from a set of scores usually includes the proportion (P) or percentage in each category.
  • 94. Frequency distributions cont’d…
     – A frequency distribution gives the number of units (e.g., people) that fall into each of a series of specified categories.
     – The frequency is the number of times a particular value or category occurs in a data set.
     – The relative frequency is the frequency of the value/category divided by the total number of observations.
   Frequency distributions can be grouped or ungrouped.
  • 95. Ungrouped frequency distribution
     – It is used to present a categorical variable in a simple and easily understandable way.
     – The table is constructed by listing all possible categories of the variable and then counting the number of observations falling in each category; that count is the frequency.
  • 96. Example: The following data are on the current age of women and were collected from 240 women (data 1).
  • 97. Example: Consider the data collected on age at first marriage of 240 women (data 1). One of the variables in this dataset is the religion followed by the women. For such a variable, we can use an ungrouped frequency distribution to summarize the data as follows:

     Religion      Frequency   Relative frequency (%)
     Orthodox         103            42.9
     Muslim            33            13.8
     Protestant        97            40.4
     Others*            7             2.9
     Total            240           100.0
     *Catholic, no religion
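As a minimal sketch (Python is our choice here, not part of the course material), an ungrouped frequency table is just a count per category divided by n. Because the raw data 1 file is not reproduced in these slides, the `responses` list below is a hypothetical stand-in rebuilt from the table counts.

```python
from collections import Counter

# Hypothetical stand-in for the raw religion variable of data 1,
# rebuilt from the frequencies in the table above (n = 240).
responses = (["Orthodox"] * 103 + ["Muslim"] * 33
             + ["Protestant"] * 97 + ["Others"] * 7)

def frequency_table(values):
    """Return (category, frequency, relative frequency in %) rows,
    in the order each category is first seen."""
    counts = Counter(values)
    n = len(values)
    return [(cat, f, round(100 * f / n, 1)) for cat, f in counts.items()]

table = frequency_table(responses)
for category, freq, rel in table:
    print(f"{category:<12}{freq:>5}{rel:>8}")
print(f"{'Total':<12}{len(responses):>5}{100:>8}")
```

With real data, `responses` would simply be the religion column read from the survey file.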
  • 98. Grouped frequency distribution: Presenting data as a grouped frequency distribution is not as simple as for an ungrouped one, because some values must first be computed:
     Number of classes (K): the number of categories the table will have. It can be estimated using Sturges’ rule:
     $K = 1 + 3.322\log_{10}(n)$
     where K = number of classes and n = sample size.
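A short Python sketch of Sturges’ rule (the function name is ours). Note that 3.322 is approximately 1/log10(2), so the rule is equivalent to 1 + log2(n); the result is rounded to a whole number of classes.

```python
import math

def sturges_classes(n: int) -> float:
    """Sturges' rule: K = 1 + 3.322 * log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)

k_exact = sturges_classes(240)   # sample size of data 1
k = round(k_exact)               # a table needs a whole number of classes
print(f"K = {k_exact:.2f}, rounded to {k} classes")
```

For n = 240 this gives K ≈ 8.91, i.e. 9 classes, which is then treated as a guide rather than a fixed answer.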
  • 99. Grouped frequency cont’d… Then the width of each class, W, can be computed as:
     $W = \frac{X_{\max} - X_{\min}}{K} = \frac{\text{Range}}{K}$
     where $X_{\max}$ and $X_{\min}$ are the largest and smallest observations.
  • 100. Grouped frequency cont’d… Class limits: the smallest and largest values that can go into a class; they are either lower or upper class limits.
     Lower class limit: the smallest observation of the class.
     Upper class limit: the lower class limit plus the class width minus one (for whole-number data).
  • 101. Grouped frequency cont’d…
     – When forming classes, always make sure that each item (measurement or observation) goes into one and only one class, i.e. classes should be mutually exclusive (successive classes have no values in common).
     – To this end, make sure that the smallest and largest values fall within the classification and that no value can fall into a gap between successive classes.
  • 102. Grouped frequency cont’d… Note that Sturges’ rule should not be regarded as final but as a guide only. The number of classes it gives may be increased or decreased for a convenient or clear presentation.
  • 103. Grouped frequency cont’d… Class boundaries/true limits: limits determined mathematically so that the intervals of a continuous variable are continuous in both directions and no gap exists between classes. For whole-number data they are obtained by subtracting 0.5 from the lower class limit and adding 0.5 to the upper class limit:
     Lower class boundary = lower class limit − 0.5
     Upper class boundary = upper class limit + 0.5
  • 104. Grouped frequency cont’d… Class mark/mid-point ($X_c$) of an interval: the value lying midway between the lower true limit (LTL) and the upper true limit (UTL) of a class. It is calculated as the average of the lower and upper class limits:
     $X_c = \frac{\text{lower class limit} + \text{upper class limit}}{2} = \frac{LTL + UTL}{2}$
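The definitions of class limits, boundaries and class marks can be sketched in Python as follows. The inputs (first lower limit 15, width 4, nine classes) are taken from the data 1 example in these slides; the function name and the whole-number-data assumption (hence the ±0.5) are ours.

```python
def build_classes(first_lower: int, width: int, k: int):
    """Class limits, true class boundaries and class marks for k classes.

    Assumes whole-number data, so boundaries are the limits -/+ 0.5.
    """
    rows = []
    for i in range(k):
        lower = first_lower + i * width        # lower class limit
        upper = lower + width - 1              # upper limit = lower + W - 1
        lcb, ucb = lower - 0.5, upper + 0.5    # class boundaries (true limits)
        mark = (lower + upper) / 2             # class mark / mid-point
        rows.append((lower, upper, lcb, ucb, mark))
    return rows

classes = build_classes(15, 4, 9)
for lower, upper, lcb, ucb, mark in classes:
    print(f"{lower}-{upper}  {lcb}-{ucb}  {mark}")
```

Because each upper boundary equals the next lower boundary (18.5, 22.5, …), the classes are exhaustive and leave no gaps.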
  • 105. NB: A constructed grouped frequency distribution is expected to have:
     – Class intervals that are continuous (for continuous data), non-overlapping (mutually exclusive) and exhaustive
     – Class intervals that are generally of the same width
     – No open-ended class intervals, i.e. classes such as “less than 10” or “greater than 65”
  • 106. Grouped frequency cont’d… Example for data 1
     – The number of classes (K) can be computed using Sturges’ rule as: $K = 1 + 3.322\log_{10}(240) = 1 + 3.322 \times 2.380 \approx 8.91 \approx 9$
     – Therefore, the width W of each class can be computed as: $W = \frac{X_{\max} - X_{\min}}{K}$
     – Thus the width of each class can be taken as 4, and the lower class limit of the first class will be the minimum observation in the dataset.
  • 107. Example for data 1: Thus, the grouped frequency distribution of current age of women can be constructed as:

     Class limit   Class boundary   Class mark   Frequency   RF (%)    CF
     15-18         14.5-18.5        16.5            15        6.25     15
     19-22         18.5-22.5        20.5            49       20.41     64
     23-26         22.5-26.5        24.5            51       21.25    115
     27-30         26.5-30.5        28.5            40       16.67    155
     31-34         30.5-34.5        32.5            21        8.75    176
     35-38         34.5-38.5        36.5            22        9.17    198
     39-42         38.5-42.5        40.5            18        7.50    216
     43-46         42.5-46.5        44.5            15        6.25    231
     47-50         46.5-50.5        48.5             9        3.75    240
     Total                                         240      100.00
  • 108. Example for data 1 cont’d… where RF and CF are relative frequency and cumulative frequency respectively.
     Note that the value added to or subtracted from the class limits to get the class boundaries depends on the number of decimal places in the dataset being summarized. The class width is found from the true class limits by subtracting the true lower limit from the true upper limit of any class. For example, for the fourth class of the distribution above, w = 30.5 − 26.5 = 4.
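The RF and CF columns of the data 1 table can be reproduced with a few lines of Python (a sketch; only the frequencies are taken from the table).

```python
from itertools import accumulate

freqs = [15, 49, 51, 40, 21, 22, 18, 15, 9]   # frequencies from the data 1 table
n = sum(freqs)

relative = [100 * f / n for f in freqs]        # RF (%) = frequency / n * 100
cumulative = list(accumulate(freqs))           # CF = running total of frequencies

print("n =", n)
for f, rf, cf in zip(freqs, relative, cumulative):
    print(f"{f:>3} {rf:6.2f} {cf:>4}")
```

The code prints 20.42 for the second class, which is 49/240 rounded to two decimals; the table’s 20.41 keeps the RF column total at exactly 100.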
  • 109. Statistical tables: A statistical table is an orderly and systematic presentation of data in rows and columns. Rows are horizontal arrangements; columns are vertical arrangements.
  • 110. Statistical tables cont’d… Depending on the purpose for which a table is designed and the complexity of the relationship, a table can be either a simple frequency table or a cross tabulation. A simple frequency table is used when the observations involve only a single variable. A cross tabulation is used to obtain the frequency distribution of one variable by the subsets of another variable.
  • 111. Statistical tables cont’d… Construction of tables: There are no hard and fast rules, but the following general principles should be addressed:
     – Tables should be as simple as possible.
     – Tables should be self-explanatory:
        – The title should be clear and to the point (a good title answers: what? when? where? how classified?) and should be placed above the table.
        – Each row and column should be labeled.
  • 112. Statistical tables cont’d…
     – Numerical entries of zero should be written explicitly rather than indicated by a dash; dashes are reserved for missing or unobserved data.
     – If data are not original, their source should be given in a footnote.
  • 113. Tables cont’d… One-variable/simple frequency table: the most basic table is a simple frequency distribution of one variable.
     Example, Table 1: Blood group of voluntary blood donors examined in the Red Cross blood bank within one day, May 2006 (n=548) [table not recoverable; the slide labels its title, rows and columns]
  • 114. Example 2: simple table cont’d… (Table 2)
     Table 5. Clinical symptoms among 54 patients with S. Typhimurium infection, Oslo, Norway, May 1998
     Symptoms       Cases (n)    %
     Diarrhoea         54       100
     Fever             35        65
     Headache          12        22
     Joint pain         4         7
     Muscle pain        4         7
  • 115. Two and three variable tables: If two variables are cross-tabulated, the result is a two-variable table; if three variables are cross-tabulated, it is a three-variable table. In cross-tabulated frequency distributions with row and column totals, the choice of denominator for percentages depends on which variable of interest is to be compared across the subsets of the other variable.
  • 116. Two and three variable tables cont’d… (Table 3)
     Table 1. Distribution of variable 1 by variable 2, population X (n=58), place Y, period Z
                          Variable 2
     Variable 1    Value 1   Value 2   Value 3   Total
     Value 1          2         4         7        13
     Value 2          3         5         3        11
     Value 3          4         5         4        13
     Value 4          5         6         2        13
     Unknown          3         2         3         8
     Total           17        22        19        58
     Footnote: explanation of acronyms, units used, …
  • 117. Two and three variable tables cont’d…
     Table 1. Cases of Salmonella Typhimurium infection by age group and sex, Herøy, Norway, 1999
     Age group              Sex
     (years)          Male    Female    Total
     0-9                7       5        12
     10-19              5       5        10
     20-29              5       5        10
     30-39              1       4         5
     40-49              2       3         5
     50-59              0       3         3
     60-69              2       1         3
     70+                2       4         6
     Total             24      30        54
  • 118. Two and three variable tables cont’d…
     Distribution of participants by age, sex and residence
     Residence   Age      Male   Female   Total
     Urban       15-24     34      76      110
                 25-34     48      56      104
                 35-44     65      54      119
     Rural       15-24     56      58      114
                 25-34     78      53      131
                 35-44     46      47       93
     Total                369     395      764
  • 119. Common form of a two-variable table: the 2 × 2 table. It is a special form of table favored by epidemiologists and is used to examine whether there is a relationship between two variables.
     Exposure        Cases   Controls   Total
     Exposed           23       23        46
     Non-exposed        4      139       143
     Total             27      162       189
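To illustrate the choice of denominator in a cross tabulation, here is a small Python sketch on the 2 × 2 table above. Assuming a case-control layout, the comparison of interest is exposure across cases and controls, so the column totals (27 and 162) serve as denominators; the dictionary layout and function name are ours.

```python
# Counts from the 2 x 2 table: rows = exposure, columns = cases/controls
table = {
    "Exposed":     {"Cases": 23, "Controls": 23},
    "Non-exposed": {"Cases": 4,  "Controls": 139},
}

def column_percentages(tab, column):
    """Percentage of each exposure group within one column (column total = 100%)."""
    total = sum(row[column] for row in tab.values())
    return {exp: 100 * row[column] / total for exp, row in tab.items()}

cases = column_percentages(table, "Cases")
controls = column_percentages(table, "Controls")
print(f"Exposed among cases:    {cases['Exposed']:.1f}%")
print(f"Exposed among controls: {controls['Exposed']:.1f}%")
```

About 85.2% of cases were exposed versus 14.2% of controls; using row totals instead would answer a different question.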
  • 120. Composite/higher-order table: a large table combining several separate variables/tables. Age, sex and other demographic variables may be combined to form a single table.
  • 121. Example of a composite table
     Characteristics                     Number   Percent
     Marital status
       Single                              50      67.6
       Married                             20      27.0
       Divorced/widowed                     4       5.4
     Current residence (n=73)
       Within the PA                       40      54.8
       Within the PA (H. Post)             25      34.2
       Within the nearest town              8      11.0
     Residence of origin
       Within the PA                        4       5.4
       Outside the PA                      24      32.4
       Outside the Woreda                  46      62.2
     Training TVETI
       Axum                                19      25.7
       Makele                              55      74.3
     Total                                 74     100
  • 122. Graphical presentation: Graphs are often easier to interpret than tables, perhaps at the expense of detail. A variety of graphs are used depending on the type of data. For categorical/qualitative or quantitative discrete data, pie charts and bar charts are appropriate; for quantitative continuous data, we can use a histogram, frequency polygon, cumulative frequency curve, box plot, etc.
  • 123. Graphical presentation cont’d… There are, however, commonly accepted general rules for constructing graphs:
     – Every graph should be self-explanatory and as simple as possible.
     – Titles are usually placed below the graph and should again answer: what? where? when? how classified?
     – Legends or keys should be used to differentiate variables if more than one is shown.
  • 124. Graphical presentation cont’d…
     – The units into which the scale is divided should be clearly indicated.
     – The numerical scale representing frequency must start at zero, or a break in the line should be shown.
  • 125. Examples of graphs: Bar chart. Bar diagrams are used to represent and compare the frequency distribution of discrete variables and attributes or categorical series. In a bar diagram, all bars must have equal width and the spaces between bars must be equal.
     – Each category of the variable is represented by a bar.
     – Variables are categorical, or treated as qualitative.
     – It can be displayed horizontally or vertically.
  • 126. Types of bar charts: There are different types of bar diagrams:
     A. Simple bar chart: a one-dimensional diagram in which each bar represents the whole of the magnitude. The height or length of each bar indicates the size (frequency) of the figure represented.
       – One variable
       – It can be displayed horizontally or vertically
  • 127. Types of bar charts cont’d… [Figure 1: Immunization status of children in Adami Tulu Wereda, 1995]
  • 128. Types of bar charts cont’d… B. Grouped bar chart
       – Data from two-variable or multi-variable tables
       – Distinct colours or shading are used to differentiate the groups
       – A legend is necessary
  • 129. E.g. grouped/joined bar chart [Figure 2: TT immunization status by marital status of women 15-49 years, Asendabo town, 1996. Slide annotations: the meaning of each bar is shown in a legend; the bars of one cell are adjacent; cells are separated by a space]
  • 130. Types of bar charts cont’d… C. Stacked bar chart
       – It is used to show the same data as a grouped bar chart, using a single bar per category
       – Different groups are differentiated by different segments within a single bar
       – The overall change is easier to see, but changes between groups may be harder to compare than with grouped bars
  • 131. E.g. stacked bar chart (absolute values) [Figure 3: Cases of S. Typhimurium infection by age group and sex, Herøy, Norway, 1999. X-axis: age group (0-9 to 70+ years); y-axis: number of cases (0-14); bar segments: male, female]
  • 132. Types of bar charts cont’d… D. 100% component bar chart
       – A variant of the stacked bar chart in which each bar is scaled to 100% rather than to its real value
       – It is helpful for comparing the contribution of different subgroups within the categories of the main variable
  • 133. E.g. 100% component bar chart [Figure 4: Cases of S. Typhimurium infection by age group and sex, Herøy, Norway, 1999. X-axis: age group (0-9 to 70+ years); y-axis: proportional distribution by sex (0-100%); bar segments: male, female]
  • 134. Pie charts
     – A pie chart is a circle divided into sectors whose areas are proportional to the frequencies.
     – It is split into segments to show percentages or the relative contributions of categories of data.
     – It is a good method of presentation if you wish to compare a part of a group with the whole group.
     – The number of categories should not be too large.
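Since sector areas are proportional to frequencies, each sector’s angle is its share of 360°. A minimal Python sketch, reusing the religion counts of data 1 purely as illustrative input (the pie chart on the next slide uses different data):

```python
# Religion counts from the data 1 frequency table (n = 240), used only as an example
counts = {"Orthodox": 103, "Muslim": 33, "Protestant": 97, "Others": 7}
total = sum(counts.values())

# Sector angle = frequency / total * 360 degrees
angles = {cat: 360 * f / total for cat, f in counts.items()}
for cat, angle in angles.items():
    print(f"{cat:<11}{angle:6.1f} degrees")
```

The angles (154.5°, 49.5°, 145.5°, 10.5°) add up to a full 360°.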
  • 135. E.g. pie chart [Figure 5: Distribution of religion of participants from the Kunama ethnic group among Eritrean refugees in Shimelba Camp, July 2006]
  • 136. Quantitative continuous data. Histograms: a histogram is the graph of the frequency distribution of a continuous measurement variable. It is constructed on the basis of the following principles:
     – The horizontal axis is a continuous scale running from one extreme of the distribution to the other. It should be labeled with the name of the variable and the units of measurement.
  • 137. Histograms cont’d…
     – For each class in the distribution, a vertical rectangle is drawn with its base on the horizontal axis extending from one class boundary of the class to the other, so there is never any gap between the histogram rectangles.
     – The bases of the rectangles are determined by the width of the class intervals.
  • 138. Histograms cont’d…
     – The area of each column is proportional to the number of observations in that interval.
     – In constructing a histogram, use equal class intervals and do not use scale breaks.
     – A second variable can be shown by shading.
  • 139. [Figure 6: Age distribution of women of reproductive age included in a study of violence against women in Butajira, 1984]
  • 140. Frequency polygon: If we join the midpoints of the tops of adjacent histogram rectangles with line segments, a frequency polygon is obtained. Note: it is not essential to draw a histogram in order to obtain a frequency polygon; it can be drawn without erecting histogram rectangles, as follows:
  • 141. Frequency polygon cont’d…
     – Mark the scale in the numerical values of the midpoints of the intervals.
     – Erect ordinates at the midpoints of the intervals; the length (altitude) of each ordinate represents the frequency of the class on whose midpoint it is erected.
     – Join the tops of the ordinates and extend the connecting lines down to the horizontal scale at each end.
  • 142. [Figure: Construction of a frequency polygon from a histogram; epidemic curve of 15 cases by date and time of onset in 6-hour intervals, 27-30 August; legend: 1 case patient, 1 case staff member]
  • 143. [Figure: frequency polygon with its vertices at the mid-points (class marks) of the intervals]
  • 144. Ogive or cumulative frequency curve: To construct an ogive:
     – Compute the cumulative frequencies of the distribution.
     – Prepare a graph with the cumulative frequency on the vertical axis and the true upper class limits (class boundaries) of the intervals scaled along the horizontal (X) axis. The true lower limit of the lowest class interval is also included on the X-axis scale; it is the true upper limit of the next lower (empty) interval, whose cumulative frequency is 0.
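The points of an ogive follow directly from these steps. A Python sketch using the data 1 grouped table (no plotting library is assumed; the code only computes the (boundary, cumulative frequency) pairs that would be plotted and joined):

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the data 1 grouped table
upper_boundaries = [18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5, 50.5]
freqs = [15, 49, 51, 40, 21, 22, 18, 15, 9]

# Start at the lower boundary of the first class (cumulative frequency 0),
# then pair each upper class boundary with its cumulative frequency.
points = [(14.5, 0)] + list(zip(upper_boundaries, accumulate(freqs)))
for x, cf in points:
    print(x, cf)
```

The curve rises from (14.5, 0) to (50.5, 240), the total number of women.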
  • 146. Summarizing data
     – The first step in looking at data is to describe them concisely.
     – One type of measure useful for summarizing data defines the center, or middle, of the sample.
  • 147. Measures of central tendency/measures of location
     – Measures of central tendency are methods of determining the value around which the data tend to concentrate; such a value sums up or describes the mass of the data.
     – These measures include the mean, median and mode.
  • 148. Arithmetic mean/simple mean ($\bar{X}$)
     Definition: the arithmetic mean is the sum of all observations divided by the number of observations. It is usually denoted by $\bar{X}$.
     Let $X_1, X_2, \dots, X_N$ be the N measurements obtained from N subjects. Then the mean of the ungrouped measurements is defined as:
     $\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$
  • 149. The mean for grouped data can be computed as:
     $\bar{X} = \frac{\sum_{i=1}^{k} f_i X_{ci}}{\sum_{i=1}^{k} f_i}$
     where k = the number of classes, $X_{ci}$ = the class mark of the ith class, and $f_i$ = the frequency of the ith class.
  • 150. Properties of the mean
     – Individual extreme values (also known as ‘outliers’) can distort its ability to represent the typical value of a variable; this is the main weakness of the mean.
     – It is unique for a given set of data.
     – Its value is determined by every item in the series.
     – The sum of the deviations about it is zero: $\sum_{i=1}^{N} (X_i - \bar{X}) = 0$.
  • 151. Example 1: Consider the birth weights (kg) of 10 newborn children at University of Gondar hospital: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43. The average birth weight is:
     $\bar{X} = \frac{2.51 + 3.01 + \dots + 2.43}{10} = \frac{25.72}{10} = 2.572$ kg
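The same calculation in Python, as a minimal check of the definition (sum of observations divided by their number):

```python
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]  # kg

mean = sum(weights) / len(weights)   # sum of observations / number of observations
print(round(mean, 3))
```

The standard-library `statistics.mean(weights)` would give the same result.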
  • 152. Example 2: Compute the mean for the grouped frequency distribution of current age of women given below:
  • 153.
     Class limit   Class boundary   Class mark   Frequency   RF (%)    CF
     15-18         14.5-18.5        16.5            15        6.25     15
     19-22         18.5-22.5        20.5            49       20.41     64
     23-26         22.5-26.5        24.5            51       21.25    115
     27-30         26.5-30.5        28.5            40       16.67    155
     31-34         30.5-34.5        32.5            21        8.75    176
     35-38         34.5-38.5        36.5            22        9.17    198
     39-42         38.5-42.5        40.5            18        7.50    216
     43-46         42.5-46.5        44.5            15        6.25    231
     47-50         46.5-50.5        48.5             9        3.75    240
  • 154. Example 2 cont’d…
     $\bar{X} = \frac{\sum f_i X_{ci}}{n} = \frac{15(16.5) + 49(20.5) + \dots + 9(48.5)}{240} = \frac{6960}{240} = 29.0$ years
     where $f_i$ = frequency of the ith class, $X_{ci}$ = class mark (mid-point) of the ith class, and n = total sample size.
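A Python sketch of the grouped mean, using the class marks and frequencies of the table above. Keep in mind that it treats every observation as sitting at its class mark, so it approximates the mean of the raw ages.

```python
marks = [16.5, 20.5, 24.5, 28.5, 32.5, 36.5, 40.5, 44.5, 48.5]   # class marks X_ci
freqs = [15, 49, 51, 40, 21, 22, 18, 15, 9]                      # frequencies f_i

n = sum(freqs)
grouped_mean = sum(f * x for f, x in zip(freqs, marks)) / n      # sum(f_i * X_ci) / n
print(n, grouped_mean)
```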
  • 155. Median: An alternative measure of central location, perhaps second in popularity to the arithmetic mean. Suppose there are n observations in a sample, ordered from smallest to largest. The median is a value such that at least half of the observations are less than or equal to it and at least half are greater than or equal to it. The median is the midpoint of the data array.
  • 156. Median cont’d… To find the median of a data set: arrange the data in ascending order, then find the middle observation of the ordered data.
  • 157. Median cont’d…
     – If the number of observations n is odd, the median is the middle observation: $\text{Median} = X_{\left(\frac{n+1}{2}\right)}$
     – If n is even, the median is the average of the two middle observations: $\text{Median} = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$
     where $X_{(i)}$ denotes the ith observation in ascending order.
  • 158. Median cont’d… Extreme values have little or no effect on the median, which makes it a good alternative to the mean as a measure of central tendency when such values occur.
  • 159. Example: Consider the weights (kg) of 10 newborn children at University of Gondar hospital within a month: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43. Find the median of the data.
  • 160. Example cont’d… First arrange the data in ascending order: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25. Since n = 10 is even, the median is the average of the two middle (5th and 6th) observations: $\text{Median} = \frac{2.43 + 2.51}{2} = 2.47$ kg.
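The odd/even rule for the median can be written as a short Python function (our own sketch; `statistics.median` in the standard library implements the same rule):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                   # odd n: the ((n+1)/2)-th value
    return (xs[mid - 1] + xs[mid]) / 2   # even n: average of the (n/2)-th and (n/2+1)-th values

weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]
print(round(median(weights), 2))
```

Python lists are 0-indexed, so `xs[mid - 1]` and `xs[mid]` are the 5th and 6th ordered values when n = 10.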
  • 161. Median cont’d… Median for grouped data: The median for grouped data is defined by:
     $\text{Median} = LCB + \left(\frac{\frac{n}{2} - F_c}{f_c}\right) W$
     where LCB = lower class boundary of the median class, $F_c$ = cumulative frequency just before the median class, $f_c$ = frequency of the median class, W = class width, and n = number of observations.
  • 162. Example: median for grouped data 1. Consider the grouped frequency distribution of the age of women presented above, and compute the median. To compute the median for grouped data, we first need to find the median class. In this example, half of the observations is n/2 = 240/2 = 120. Let us look at the distribution with its cumulative frequencies:
  • 163. Example median for grouped data 1 cont’d… [Table: grouped frequency distribution of current age of women with cumulative frequencies]
  • 164. As the distribution shows, the first class whose cumulative frequency reaches 120 is the class with cumulative frequency 155 (since 115 < 120 ≤ 155). So the median class is the 4th class (27-30), and
     $\text{Median} = 26.5 + \left(\frac{120 - 115}{40}\right) \times 4 = 27.0$ years.
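The grouped-median formula can be sketched in Python as a scan for the first class whose cumulative frequency reaches n/2, followed by linear interpolation inside that class. The function name is ours; the inputs are the data 1 lower class boundaries and frequencies.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = LCB + ((n/2 - Fc) / fc) * W for the first class whose CF reaches n/2."""
    n = sum(freqs)
    half = n / 2
    cum_before = 0                         # Fc: cumulative frequency before the class
    for lcb, f in zip(lower_boundaries, freqs):
        if cum_before + f >= half:         # this is the median class
            return lcb + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("empty distribution")

lcbs = [14.5, 18.5, 22.5, 26.5, 30.5, 34.5, 38.5, 42.5, 46.5]
freqs = [15, 49, 51, 40, 21, 22, 18, 15, 9]
print(grouped_median(lcbs, freqs, 4))
```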
  • 165. Mode
     – The mode is the value appearing most frequently.
     – It can be obtained by counting the number of times each observation appears in the list.
     – It is important for summarizing nominal/categorical data.
     – Disadvantages: with a small number of observations there may be no mode; in addition, there may sometimes be more than one mode, as in a bimodal (two-peaked) distribution.
     – Examples: a. 22, 66, 69, 70, 73 (no modal value); b. 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
  • 167. Mode cont’d… NB: For grouped data, the mode is represented by the modal class, which is the class with the largest frequency.
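A Python sketch covering both cases: `modes` returns every value sharing the highest count (an empty list when no value repeats, and several values for a multimodal list), and the modal class is picked from the data 1 frequencies. The function name is ours.

```python
from collections import Counter

def modes(values):
    """All values sharing the highest count; empty list if no value repeats.

    Raises ValueError for an empty input (max() of an empty sequence).
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                      # every value occurs once: no mode
    return [v for v, c in counts.items() if c == top]

print(modes([22, 66, 69, 70, 73]))                                # example a
print(modes([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))  # example b

# Grouped data: the modal class is the class with the largest frequency
classes = ["15-18", "19-22", "23-26", "27-30", "31-34",
           "35-38", "39-42", "43-46", "47-50"]
freqs = [15, 49, 51, 40, 21, 22, 18, 15, 9]
modal_class = classes[freqs.index(max(freqs))]
print(modal_class)
```

For data 1 the modal class is 23-26, with 51 women.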
  • 168. Skewness
     – If extremely low or extremely high observations are present in a distribution, the mean tends to shift toward those scores.
     – Based on the type of skewness, distributions can be classified as follows.
     – Symmetrical distribution: data values are evenly distributed on both sides of the centre, so the distribution is neither positively nor negatively skewed. A curve is symmetrical if one half of the curve is the mirror image of the other half.
     – If the distribution is symmetric and has only one mode, all three measures (mean, median and mode) are the same, an example being the normal distribution.
  • 170. Positively skewed distribution: occurs when the majority of scores are at the left end of the curve and a few extremely large scores are scattered at the right end. For positively skewed distributions (where the upper, or right, tail is longer than the lower, or left, tail), the measures are usually ordered as follows: mode < median < mean.
  • 172. Negatively skewed distribution: occurs when most scores are at the right end of the curve and a few small scores are scattered at the left end.
    – In a negatively skewed distribution the lower (left) tail is longer than the upper (right) tail, and the ordering is typically reversed: mean < median < mode.
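The ordering of the three measures can be checked numerically. This is a sketch using Python's standard `statistics` module on a small hypothetical sample (not data from the course) with a long right tail, and on its mirror image, which has a long left tail.

```python
from statistics import mean, median, mode

def centre_summary(values):
    """Return (mode, median, mean) for a list of numbers."""
    return mode(values), median(values), mean(values)

# Hypothetical right-skewed sample: a few large values stretch the right tail
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same sample (20 - x), giving a long left tail
left_skewed = [20 - x for x in right_skewed]

print(centre_summary(right_skewed))  # (2, 3.0, 4.6): mode < median < mean
print(centre_summary(left_skewed))   # (18, 17.0, 15.4): mean < median < mode
```

The mean is pulled furthest towards the long tail because every extreme value enters its sum, while the median and mode depend only on the middle and the most frequent values.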
  • 174. Measures of Dispersion/Variation
    – Measures of dispersion (variability) describe the spread of the scores in a distribution.
    – Without knowing how the data are dispersed, measures of central tendency may be misleading.
    – The most common measures of dispersion are the range, interquartile range, variance, standard deviation and coefficient of variation.
  • 175. Measures of Dispersion/Variation cont'd… Consider the following three datasets, which have the same mean but different spread:
    – Dataset 1: 7, 7, 7, 7, 7, 7; mean = 7, s.d. = 0
    – Dataset 2: 6, 7, 7, 7, 7, 8; mean = 7, s.d. = 0.63
    – Dataset 3: 3, 2, 7, 8, 9, 13; mean = 7, s.d. = 4.05
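A quick check of the three datasets with Python's standard `statistics` module. Here `stdev` is the sample standard deviation (divisor n - 1), which is the definition that reproduces the values on the slide.

```python
from statistics import mean, stdev

# The three datasets from the slide: same mean, increasing spread
datasets = {
    "Dataset 1": [7, 7, 7, 7, 7, 7],
    "Dataset 2": [6, 7, 7, 7, 7, 8],
    "Dataset 3": [3, 2, 7, 8, 9, 13],
}

for name, values in datasets.items():
    # stdev() uses the sample formula with divisor n - 1
    print(name, "mean =", mean(values), "s.d. =", round(stdev(values), 2))
```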
  • 176. Measures of Dispersion cont'd… RANGE: the difference between the largest and smallest observations in the data. EXAMPLE: Consider the weights (in kg) of 10 newborn children at the University of Gondar Hospital within a month: 2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43.
  • 177. Measures of Dispersion cont'd… The range is computed by first arranging all observations in ascending order: 1.98, 2.02, 2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25. Range = maximum - minimum = 3.25 - 1.98 = 1.27 kg. Disadvantages of the range:
    – It is based on only the two extreme values in the distribution, so it may change considerably if either extreme drops out, while removing any other value would not affect it at all.
    – It wastes information: it takes no account of the rest of the data.
  • 178. Measures of Dispersion cont'd…
    – The extreme values may be unreliable; they are the most likely to be faulty.
    – It is not suitable for the mathematical treatment required in deriving the techniques of statistical inference.
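A minimal sketch of the range for the newborn-weight example, also illustrating the first disadvantage above: dropping the maximum changes the range, while dropping a middle value does not. The helper name `data_range` is hypothetical.

```python
# Weights (kg) of the 10 newborns from the example
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]

def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

print(round(data_range(weights), 2))          # 1.27

# The range depends only on the two extremes
without_max = [w for w in weights if w != 3.25]     # drop the largest value
without_middle = [w for w in weights if w != 2.43]  # drop a middle value
print(round(data_range(without_max), 2))      # 1.03
print(round(data_range(without_middle), 2))   # 1.27
```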
  • 179. Quantiles: The pth percentile is the value Vp such that p percent of the sample points are less than or equal to Vp. The median, being the 50th percentile, is a special case of a quantile. As with the median, the definition of the pth percentile depends on whether np/100 is an integer.
  • 180. Quantiles cont'd… The pth percentile is defined as:
    1. the (k+1)th value in ascending order if np/100 is not an integer (where k is the largest integer less than np/100);
    2. the average of the (np/100)th and (np/100 + 1)th values in ascending order if np/100 is an integer.
  • 181. Quantiles cont'd… Example 1: Compute the 10th and 90th percentiles of the birth-weight data below. The sample consists of the birth weights (in grams) of all live-born infants born at a private hospital in a city during a 1-week period (n = 20): 3265, 3323, 2581, 2759, 3260, 3649, 2841, 3248, 3245, 3200, 3609, 3314, 3484, 3031, 2838, 3101, 4146, 2069, 3541, 2834
  • 182. Quantiles cont'd… Sorting the data from smallest to largest: 2069 2581 2759 2834 2838 2841 3031 3101 3200 3245 3248 3260 3265 3314 3323 3484 3541 3609 3649 4146. Solution: Since 20 × 0.1 = 2 and 20 × 0.9 = 18 are integers, the 10th and 90th percentiles are:
  • 183. Quantiles cont'd…
    – 10th percentile = the average of the 2nd and 3rd values = (2581 + 2759)/2 = 2670 g
    – 90th percentile = the average of the 18th and 19th values = (3609 + 3649)/2 = 3629 g
    We would therefore estimate that 80 percent of birth weights fall between 2670 g and 3629 g, which gives an overall feel for the spread of the distribution.
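The two-case percentile rule above can be sketched in Python as follows. The function name `percentile` is hypothetical, and the sketch assumes an integer p strictly between 0 and 100; `divmod` keeps the "is np/100 an integer?" test in exact integer arithmetic.

```python
def percentile(values, p):
    """pth percentile (integer p, 0 < p < 100) using the two-case rule.

    np/100 an integer     -> average of the (np/100)th and (np/100 + 1)th values
    np/100 not an integer -> the (k+1)th value, k = largest integer below np/100
    Positions count from 1 in ascending order.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    x = sorted(values)
    n = len(x)
    k, remainder = divmod(n * p, 100)    # k = floor(np/100)
    if remainder == 0:
        # x[k - 1] is the kth value and x[k] the (k+1)th value (0-based list)
        return (x[k - 1] + x[k]) / 2
    return x[k]                          # (k+1)th value in ascending order

birth_weights = [3265, 3323, 2581, 2759, 3260, 3649, 2841,
                 3248, 3245, 3200, 3609, 3314, 3484, 3031,
                 2838, 3101, 4146, 2069, 3541, 2834]

print(percentile(birth_weights, 10))   # 2670.0
print(percentile(birth_weights, 90))   # 3629.0
```

Software packages use several slightly different percentile definitions, so results from other tools may differ a little from this rule on small samples.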
  • 184. Quantiles cont'd…
    – Quartiles are the quantiles that divide the distribution into four equal parts; the second quartile is the median.
    – The interquartile range (IQR) is the difference between the third and first quartiles: IQR = Q3 - Q1.
    – To compute it, first sort the data in ascending order, then find the first quartile and the third quartile.
  • 185. Quantiles cont'd… Example 2: Find the interquartile range of the following data set (ages of patients): 18, 59, 24, 42, 21, 23, 24, 32
    1. Sort the data from lowest to highest: 18 21 23 24 24 32 42 59
    2. Find the first (bottom) and third (top) quartiles of the data.
    3. Find the difference between the two quartiles (the interquartile range).
  • 186. Quantiles cont'd…
    – 1st quartile = the {(n+1)/4}th observation = the 2.25th observation = 21 + (23 - 21) × 0.25 = 21.5
    – 3rd quartile = the {3(n+1)/4}th observation = the 6.75th observation = 32 + (42 - 32) × 0.75 = 39.5
    Hence, IQR = 39.5 - 21.5 = 18.
    – The interquartile range is preferable to the range because it is less prone to distortion by a single large or small value; outliers in the data have little effect on it. It can also be computed when the distribution has open-ended classes.
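A sketch of the (n+1)-position quartile method with linear interpolation used in Example 2. The function name `quartile_n_plus_1` is hypothetical. As a cross-check, Python's `statistics.quantiles` with its default "exclusive" method (Python 3.8+) places quartiles at the same (n+1)-based positions.

```python
import statistics

def quartile_n_plus_1(values, q):
    """qth quartile (q = 1, 2 or 3) at position q(n+1)/4, linearly interpolated."""
    x = sorted(values)
    n = len(x)
    pos = q * (n + 1) / 4        # e.g. 2.25 for Q1 when n = 8
    k = int(pos)                 # whole part: the kth value (1-based)
    frac = pos - k               # fractional part used for interpolation
    if k < 1:
        return x[0]              # position before the first value
    if k >= n:
        return x[-1]             # position at or beyond the last value
    return x[k - 1] + frac * (x[k] - x[k - 1])

ages = [18, 59, 24, 42, 21, 23, 24, 32]
q1 = quartile_n_plus_1(ages, 1)       # 21.5
q3 = quartile_n_plus_1(ages, 3)       # 39.5
print("IQR =", q3 - q1)               # IQR = 18.0

# Cross-check with the standard library's default (exclusive) method
print(statistics.quantiles(ages, n=4))   # [21.5, 24.0, 39.5]
```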
  • 187. Box and Whisker plot
    – Box plots summarise data using a five-number summary: the 25th percentile (first quartile), the median (second quartile), the 75th percentile (third quartile), and the minimum and maximum observed values that are not statistical outliers.
    – The heavy black line inside each box marks the 50th percentile, or median, of the group's distribution.
    – The lower and upper hinges (box boundaries) mark the 25th (Q1) and 75th (Q3) percentiles, respectively.
    – Whiskers extend above and below the hinges. They are vertical lines ending in horizontal lines at the largest and smallest observed values that are not statistical outliers.
  • 188. Box and Whisker plot cont'd…
    – Outliers are marked with an O. In the figure, the outliers are labelled 1 (first group), 17 (second group) and 58 (third group).
    – The labels 1, 3, 17 and 58 refer to the row numbers in the Data Editor where those observations are found.
    – Extreme values are marked with an asterisk (*); here, the extreme value labelled *3 belongs to the first group.
  • 189. [Figure: box plots of scores on a survey question for three groups with different training (group one, group two, group three); points labelled 1, 3, 17 and 58.]
  • 190. Information obtained from a box and whisker plot
    – If the median is near the centre of the box, the distribution is approximately symmetric.
    – If the median lies below the centre of the box (closer to Q1), the distribution is positively skewed.
    – If the median lies above the centre of the box (closer to Q3), the distribution is negatively skewed.
  • 191. Information obtained from a box and whisker plot cont'd…
    – If the whiskers are about the same length, the distribution is approximately symmetric.
    – If the top whisker is longer than the bottom whisker, the distribution is positively skewed.
    – If the bottom whisker is longer than the top whisker, the distribution is negatively skewed.
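A sketch of the numbers behind a box plot and the two skewness readings above, applied to the patient-age data from Example 2. The helper names are hypothetical. Two assumptions: quartiles come from `statistics.quantiles` (default (n+1)-position method), and whiskers stop at the most extreme observations inside the 1.5 × IQR inner fences. "About the same length" is a visual judgement, so the sketch simply reports which side is longer.

```python
import statistics

def compare(upper, lower):
    """Skewness reading from whether the upper or lower part is longer."""
    if upper > lower:
        return "positively skewed"
    if upper < lower:
        return "negatively skewed"
    return "symmetric"

def box_plot_summary(values):
    """Whisker ends, quartiles and two skewness readings for one group."""
    q1, med, q3 = statistics.quantiles(values, n=4)   # (n+1)-position quartiles
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations that are not outliers
    low = min(v for v in values if v >= lo_fence)
    high = max(v for v in values if v <= hi_fence)
    by_median = compare(q3 - med, med - q1)       # position of median in the box
    by_whiskers = compare(high - q3, q1 - low)    # relative whisker lengths
    return (low, q1, med, q3, high), by_median, by_whiskers

ages = [18, 59, 24, 42, 21, 23, 24, 32]
print(box_plot_summary(ages))
```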
  • 192. Outliers: An outlier is an observation that lies an abnormal distance from the other values in a random sample from a population. Before abnormal observations can be singled out, normal observations must be characterised. Two activities are essential for characterising a set of data:
    – Examining the overall shape of the graphed data for important features, including symmetry and departures from assumptions.
    – Examining the data for unusual observations that are far from the bulk of the data. These points are often referred to as outliers.
  • 193. Outliers cont'd… The following quantities (called fences) are used to identify outliers (extreme values in the tails of the distribution):
    – lower inner fence: Q1 - 1.5 × IQR
    – upper inner fence: Q3 + 1.5 × IQR
    – lower outer fence: Q1 - 3 × IQR
    – upper outer fence: Q3 + 3 × IQR
    where Q1 = first quartile, Q3 = third quartile and IQR = interquartile range. A point beyond an inner fence on either side is considered an outlier; a point beyond an outer fence is considered an extreme outlier.
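The fence rules can be sketched as two small functions, reusing the quartiles of the patient-age example (Q1 = 21.5, Q3 = 39.5). The function names are hypothetical, and the values 70 and 100 are made-up extra ages used only to show the "outlier" and "extreme outlier" cases.

```python
def fences(q1, q3):
    """Inner (1.5 x IQR) and outer (3 x IQR) fences."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr, "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3 * iqr,   "upper_outer": q3 + 3 * iqr,
    }

def classify(value, q1, q3):
    """Label a value using the inner and outer fences."""
    f = fences(q1, q3)
    if value < f["lower_outer"] or value > f["upper_outer"]:
        return "extreme outlier"
    if value < f["lower_inner"] or value > f["upper_inner"]:
        return "outlier"
    return "not an outlier"

# Quartiles of the patient-age example: Q1 = 21.5, Q3 = 39.5 (IQR = 18)
print(fences(21.5, 39.5))
for age in [18, 59, 70, 100]:       # 70 and 100 are hypothetical values
    print(age, classify(age, 21.5, 39.5))
```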
  • 194. Standard Deviation and Variance
    – Variance: the interquartile range reduces the influence of outliers, but it does so by ignoring half of the data (the values below Q1 and above Q3). Measuring variability around the centre of the distribution using every observation addresses both problems.
    – The variance measures how far, on average, the scores deviate from the mean: it is the average of the squared distances of each value from the mean (for a sample, the sum of squared deviations is divided by n - 1).
  • 195. Mathematically, the population variance is defined as: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}$
    The sample variance is defined as: $s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
  • 196. Variance cont'd… Shortcut formula for the sample variance: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
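A sketch of the three variance formulas in plain Python (the function names are hypothetical), applied to Dataset 3 from slide 175. The definitional and shortcut sample formulas give the same value, 16.4, whose square root is the s.d. of 4.05 quoted there.

```python
def population_variance(x):
    """sigma^2 = sum((x_i - mu)^2) / N"""
    mu = sum(x) / len(x)
    return sum((xi - mu) ** 2 for xi in x) / len(x)

def sample_variance(x):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)"""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

def sample_variance_shortcut(x):
    """s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)"""
    n = len(x)
    return (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)

data = [3, 2, 7, 8, 9, 13]     # Dataset 3 from slide 175
print(sample_variance(data), sample_variance_shortcut(data))   # 16.4 16.4
print(population_variance(data))
```

The shortcut is convenient by hand, but with large values in floating-point arithmetic the subtraction of two large, nearly equal sums can lose precision, so software usually uses the definitional form.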
  • 197. Standard Deviation
    – The sample and population standard deviations are denoted (by convention) by s and σ, respectively.
    – The standard deviation (SD) is the positive square root of the variance.
    – It expresses the same information as the variance, but in the same units as the original data (and the mean).
    – Mathematically, the population standard deviation is: $\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N}}$
  • 198. Standard Deviation cont'd…
    – The sample standard deviation is defined as: $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$
    – Example 1: The areas (in m²) of surfaces sprayable with DDT in a sample of 15 houses are: 101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145
  • 199. Example 1 cont'd…
    – Find the variance and standard deviation of the above distribution.
    – Solution: The sample mean is 125 m².
    Sample variance: $s^2 = \frac{\sum (x_i-\bar{x})^2}{n-1} = \frac{(101-125)^2 + (105-125)^2 + \dots + (145-125)^2}{15-1} = \frac{2502}{14} = 178.71\ \text{m}^4$
    Hence, the standard deviation is $s = \sqrt{178.71} = 13.37\ \text{m}^2$.
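The DDT example can be verified with Python's standard `statistics` module; `statistics.variance` is the sample variance (divisor n - 1). The sum of squared deviations is computed explicitly to match the 2502 in the worked solution.

```python
import math
import statistics

# Sprayable surface areas (m^2) of the 15 houses
areas = [101, 105, 110, 114, 115, 124, 125, 125, 130, 133, 135, 136, 137, 140, 145]

xbar = statistics.mean(areas)              # sample mean
ss = sum((a - xbar) ** 2 for a in areas)   # sum of squared deviations
s2 = statistics.variance(areas)            # sample variance, divisor n - 1 (m^4)
s = math.sqrt(s2)                          # standard deviation (m^2)

print(xbar, ss, round(s2, 2), round(s, 2))   # 125 2502 178.71 13.37
```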
  • 200. Variance for a grouped frequency distribution
    – For a grouped frequency distribution, the variance is computed as: $s^2 = \frac{n\sum_{i=1}^{k} f_i x_{c_i}^2 - \left(\sum_{i=1}^{k} f_i x_{c_i}\right)^2}{n(n-1)}$
    where f_i = frequency of the ith class, x_{c_i} = class mark of the ith class, k = number of classes, and n = total sample size (the sum of the frequencies).
  • 201. Example of variance for a grouped frequency distribution: Consider the following data on the time spent by college students on leisure activities. Compute the standard deviation.
  • 202. $s^2 = \frac{n\sum_{i=1}^{k} f_i x_{c_i}^2 - \left(\sum_{i=1}^{k} f_i x_{c_i}\right)^2}{n(n-1)}$
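A sketch of the grouped-data formula. The slide's own leisure-time table is not available in this text, so the class marks and frequencies below are hypothetical. The check at the end treats every observation as sitting at its class mark and confirms that the formula then matches the ordinary sample variance.

```python
import statistics

def grouped_sample_variance(class_marks, freqs):
    """s^2 = [n * sum(f x^2) - (sum(f x))^2] / [n (n - 1)], x = class mark."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, class_marks))
    sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, class_marks))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

# Hypothetical frequency table (not the slide's data)
marks = [2, 4, 6, 8]      # class marks
freqs = [3, 5, 4, 2]      # frequency of each class
s2 = grouped_sample_variance(marks, freqs)
print(round(s2, 3), round(s2 ** 0.5, 3))    # variance and standard deviation

# Check: each class mark repeated f times gives the same sample variance
expanded = [x for x, f in zip(marks, freqs) for _ in range(f)]
print(round(statistics.variance(expanded), 3))
```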
  • 203. Coefficient of variation
    – The standard deviation is an absolute measure of the spread of observations around their mean and is expressed in the same units as the data.
    – Because of this, it cannot be used directly to compare variability between data sets.
    – The coefficient of variation is often used for this purpose.
    – The coefficient of variation (CV) is defined as: $CV = \frac{s}{\bar{x}}$ (often multiplied by 100 and reported as a percentage).
    – The CV is most useful for comparing the variability of several samples that have different means.
  • 204. Coefficient of variation cont'd… The CV is a relative measure, free of the unit of measurement. Example:
    – Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, x̄ = 887.1, s = 56.50, CV = 0.0637
    – Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, x̄ = 0.675, s = 0.255, CV = 0.378
    Mice show greater birth-weight variation.
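A minimal sketch of the CV comparison using Python's standard `statistics` module (sample standard deviation divided by the mean). The helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)

elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]        # kg
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]   # kg

for name, w in [("elephants", elephants), ("mice", mice)]:
    print(name, round(statistics.mean(w), 3), round(statistics.stdev(w), 3),
          round(coefficient_of_variation(w), 4))
```

Although the elephants' standard deviation is over 200 times larger, their CV is much smaller, which is the point of comparing relative rather than absolute spread.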
  • 205. When to use the coefficient of variation
    – When the groups being compared have very different means (the CV expresses each standard deviation relative to its own mean).
    – When different units of measurement are involved, e.g. group 1 is measured in mm and group 2 in g (the CV is unit-free).
    – In such cases, the standard deviation should not be used for comparison.