Introduction to Biostatistics 1 & 2

By
Dr Babatunde, OA
MBBS, PgCertDPMIS, MPH, FWACP
Department of Community Medicine,
FMC, Ido-Ekiti





Definition (C-O-S-A-I-P)
Collection
Organization
Summarizing
Analyzing
Interpreting
Presenting

Applications of biostatistics

2/10/2014




A variable is any characteristic that can be
observed or measured

Information collected on a variable is usually
unrefined and is called data

The collection, analysis, interpretation and
use of data is called statistics

The application of statistics to health-related
fields is known as biostatistics [1]








Biostatistics = Medical statistics
Medical statistics is the scientific method of
collecting, organizing, summarizing,
analyzing, interpreting, and presenting
medical data [1]
Biostatistics is statistics applied to the
biological sciences and to medicine [2]












Biostatistics is all about "curiosity" [3]
Biostatistics is about asking medically
relevant questions and getting answers using
statistical methods
Which age group dies most? (Mortality rate)
What proportion of university students use
condoms during sexual intercourse?
Assignment 1: Each student should ask a
medically related question of personal
interest and submit it in the format below








Name:
Matriculation Number:
Medical question of personal interest
Submit it at the end of the lecture
Also document in your notebook because we
will always make reference to this question
throughout this class









Research is the scientific investigation of
facts and relationships to establish
dependable solutions to problems through
systematic collection, analysis, and
interpretation of data

Research is described as systematic in that it
involves an organized, formally structured
methodology to obtain new knowledge
Biostatistics is the basis for research [4]









Many students have little interest in statistics
Many see it as too abstract to conceptualize
However, it is the simplest of all the sciences,
practiced by literate and illiterate people alike
Grandmother statistics: a grandmother records
each birth with a big stroke and each death with
a small stroke (the origin of the tally sheet
used in immunization)










Biostatistics centers on data
So what is data?
Data is information collected on an individual
or group of individuals
When entered into a computer, it is called
a dataset
Assignment 2: List 5 examples of data you
can collect to answer your question in
assignment 1







Example: How many students in this class use
condoms during sexual intercourse?
5 data items:
1. Ever had sex
2. Age at first sexual intercourse
3. Number of acts of sexual intercourse in the
last 3 months
4. Number of times a condom was used
5. Number of sexual partners since sexual
initiation











Methods of data collection
Questionnaires
Observations (checklist)
Focus Group Discussion
Proforma
Records
Census
List other ways you can collect data





4 levels of measurement are involved in data
collection (N-O-I-R)
1. Nominal
2. Ordinal
3. Interval
4. Ratio












Nominal scale
Lowest level
Mutually exclusive, unordered categories
No notion of numerical magnitude
Any number assigned has no numerical value
other than to distinguish one category from
another.
Examples: Gender, Blood Group, Marital
status
Assignment 3: List 5 more examples of the
Nominal scale









Ordinal scale
Ability to rank or order phenomena
Has the nominal property in addition
It is defined by related, ordered categories
Examples: Patients' pain described as
Mild, Moderate, Severe
Assignment 4: List 5 more examples of the
Ordinal scale of measurement










Interval scale
Measurements are expressed in numbers
The starting point (zero) is arbitrary, depending
largely on the units of measurement
It is possible to attach physical meaning to
the difference between 2 measurements (intervals)
but not to their ratio
Examples: Temperature in degrees Centigrade or
Fahrenheit









Ratio scale
Measurement on this scale has the properties of
the 3 previously mentioned scales but in addition
has a true zero point
The ratio of any 2 measurements on the scale
is physically meaningful
Examples: Height in cm, Weight in kg, Age in
years.



Level: Nominal
Summary: Categories only. Data cannot be arranged in an ordering scheme
Example: Student's car: 1 Ford, 2 Toyota, 3 BMW

Level: Ordinal
Summary: Categories are ordered, but differences cannot be determined or they are meaningless
Example: Student's car: 1 Compact, 2 Mid-size, 3 Full size

Level: Interval
Summary: Differences between values can be found, but there may be no inherent starting point. Ratios are not meaningful
Example: Temperature: 45°, 80°, 90°

Level: Ratio
Summary: Like interval scale, but with an inherent starting point. Ratios are meaningful
Example: Weights of football players: 200 lbs, 300 lbs, 400 lbs



Theoretical interest is not the primary reason why
researchers and statisticians consider the level of
measurement of a variable.
Level of measurement is important because the kinds
of statistical procedures that can be appropriately
used depend on the level of measurement of the
variable studied.
Calculating the mean of a group of people's
telephone numbers would be possible but
meaningless, since a telephone number is a
nominal-level variable.
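
As a rough illustration of this point, the sketch below picks a summary statistic according to the level of measurement. The mapping used (mode for nominal, median for ordinal, mean for interval and ratio) is a common convention assumed here rather than something stated on the slide, and the function name `summarize` and all sample values are hypothetical.

```python
import statistics

def summarize(values, level):
    """Return a (statistic name, value) pair suited to the level of measurement."""
    if level == "nominal":
        # Categories only: the most frequent category is the sensible summary.
        return "mode", statistics.mode(values)
    if level == "ordinal":
        # Ordered codes: the middle rank is meaningful, the mean is not.
        return "median", statistics.median_low(values)
    if level in ("interval", "ratio"):
        return "mean", statistics.mean(values)
    raise ValueError(f"unknown level of measurement: {level}")

blood_groups = ["O", "A", "O", "B", "O", "AB"]  # nominal (hypothetical)
pain_codes = [1, 2, 2, 3, 1, 2, 3]              # ordinal: 1 Mild, 2 Moderate, 3 Severe
weights_kg = [60, 72, 65, 80, 58]               # ratio (hypothetical)

nominal_summary = summarize(blood_groups, "nominal")
ordinal_summary = summarize(pain_codes, "ordinal")
ratio_summary = summarize(weights_kg, "ratio")
```

Nothing stops a computer from averaging the pain codes or a list of telephone numbers; the level of measurement is what tells the analyst that the result would not mean anything.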







Raw data is usually not very useful on its own
It has to be organized to make sense of it
This brings us to the types of statistics:
◦ Descriptive: Frequency tables, Diagrams
◦ Inferential: Use of statistical tests







Primary data
Data that is obtained directly from
individuals, e.g. the 2006 Census
Secondary data
Data that is obtained from an outside source, e.g.
a study of hospital records [5]





A Special type of Discrete Variable is the
Binary Variable which takes on exactly 2
possible values
◦ Gender (M/F)
◦ Pregnant? (Y/N)
◦ Hypertensive? (Y/N)





Sometimes, discrete variables have a "natural
ordering" to them
◦ For example, names of consecutive days in a week
(Mon, Tue, Wed, Thu, Fri, Sat, Sun)

Other types of discrete variables do not have
a natural order and are called nominal variables
◦ Race (African American, Caucasian, Asian, Hispanic
etc.)









If in an experiment you measure a single
variable, it is called a Univariate experiment
If you measure 2 variables, it is called a
Bivariate experiment
And if you measure multiple variables, it is
called a Multivariate experiment










Descriptive statistics is concerned with summarizing
a series of measurements or observations
A] Measures of Central tendency
B] Measures of Variability/Dispersion
C] Measures of Relative standing





Now that we have displayed our data, we want to
be able to characterize it quantitatively
◦ Measures of Central Tendency: Mean, Median, Mode
◦ Measures of Variability: Range, Variance, Standard Deviation
◦ Measures of Relative Standing: Z-Scores, Percentiles, Quartiles
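
The measures of relative standing are only named above, so here is a minimal sketch of two of them using Python's standard `statistics` module. The scores are hypothetical, the z-score formula (value minus mean, divided by the standard deviation) is the standard definition rather than one given on the slides, and `statistics.quantiles` uses its default interpolation method, which is only one of several conventions for quartiles.

```python
import statistics

scores = [52, 60, 65, 70, 73, 80, 90]  # hypothetical test scores

mean = statistics.mean(scores)
sd = statistics.stdev(scores)          # sample standard deviation

# Z-score: how many standard deviations a value lies from the mean.
z_scores = [(x - mean) / sd for x in scores]

# Quartiles: the three cut points that split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
```

The second quartile coincides with the median, and the z-scores always sum to zero because they are deviations from the mean.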





Mean
◦ Arithmetic average of a sample of data

Median
◦ If you order the data from smallest to largest,
the median is the middle value, assuming an odd
number of data elements
◦ If you have an even number of elements, it is the
average of the 2 middle numbers.

Mode
◦ The most common value in a set of values
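
The three definitions above can be tried out with Python's standard `statistics` module. This is only a small sketch; the ages are hypothetical values chosen so that the odd and even cases of the median both appear.

```python
import statistics

ages = [23, 19, 21, 19, 25]            # odd number of values (hypothetical)
ages_even = [23, 19, 21, 19, 25, 30]   # even number of values (hypothetical)

mean_age = statistics.mean(ages)            # sum of values / number of values
median_odd = statistics.median(ages)        # middle value after sorting
median_even = statistics.median(ages_even)  # average of the 2 middle values
mode_age = statistics.mode(ages)            # most common value
```

Sorting `ages` gives 19, 19, 21, 23, 25, so the middle value is 21; with the extra value 30 the two middle values are 21 and 23, and their average is 22.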









i. Arithmetic mean: This is different from
other types of mean, such as the geometric mean
and the harmonic mean.
The arithmetic mean is simply the average,
denoted by the symbols μ (mu) and x̄ (x-bar).
These symbols represent the arithmetic mean of
a population [of size N] and of a sample
[of size n] respectively.









Median: Here the distribution is arrayed, that is,
arranged in ascending or descending order.
Then look for the value which cuts this distribution
into two equal parts.
That value in the array which divides it into two equal
parts is called the median.









Mode: This is the most frequently occurring
value in a distribution.
Some distributions are described as amodal
because they have no mode.
A distribution with one mode is unimodal
and one with two modes is called a bimodal
distribution.
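
A small sketch of how these labels could be assigned in code follows. The helper names `modes` and `describe` are hypothetical, and the sketch assumes one particular convention: data are treated as amodal only when every value occurs exactly once.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; an empty list means the data are amodal."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so no value is "most frequent".
        return []
    return sorted(v for v, c in counts.items() if c == highest)

def describe(values):
    """Label a distribution by its number of modes."""
    labels = {0: "amodal", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")
```

For example, `describe([2, 4, 6, 8])` gives "amodal", `describe([1, 2, 2, 3])` gives "unimodal" and `describe([1, 1, 2, 3, 3])` gives "bimodal".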



If you stop learning you are old, whether you are 20 or 80 years

Thank you



Range: This is one of the simplest measures
of variability.
It is simply the difference between
the highest and the lowest values:
R = X_H − X_L.
The range has the problem of looking at
the two extremes alone and ignoring all other
values.



Variance: In the distribution 9, 4, 2, 5, 10
[which has a mean of 6], the deviations from
the mean are 3, −2, −4, −1 and 4. The total
deviation from the mean is always zero.
Since the total (and hence the average) deviation
is always zero, it is useless as a measure of
spread, so something is done to get around
the problem.
Thus we square the deviations and sum
them up, which gives 46.
The average of the squared deviations is obtained
by dividing by the number of observations N
(46/5 = 9.2). This is the population variance [σ²].
For a sample, the sum is divided by n − 1 instead
(46/4 = 11.5). This is the sample variance [s²].
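
The arithmetic of this example can be checked step by step. The sketch below uses the slide's own data (9, 4, 2, 5, 10); the variable names are illustrative only, and the sample variance uses the conventional n − 1 divisor.

```python
import statistics

data = [9, 4, 2, 5, 10]
mean = sum(data) / len(data)                       # 6.0

deviations = [x - mean for x in data]              # 3, -2, -4, -1, 4
total_deviation = sum(deviations)                  # always zero
sum_of_squares = sum(d ** 2 for d in deviations)   # 46.0

population_variance = sum_of_squares / len(data)    # divide by N
sample_variance = sum_of_squares / (len(data) - 1)  # divide by n - 1

# The standard library gives the same two results.
assert abs(population_variance - statistics.pvariance(data)) < 1e-9
assert abs(sample_variance - statistics.variance(data)) < 1e-9
```

The standard deviation is the square root of the variance, which brings the measure back to the original units of the data.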


Methods of data presentation:
◦ Tables
◦ Charts
◦ Diagrams
◦ Graphs
◦ Pictures
◦ Special curves




Tables should be numbered, e.g. Table 1, Table 2, etc.
The title must be brief and self-explanatory
Headings of columns and rows should be clear
and concise
Data must be presented according to size or
importance, chronologically, alphabetically or
geographically
If percentages or averages are to be compared,
they must be placed as close together as possible
No table should be too large
Footnotes may be given where necessary






Charts and diagrams:
These methods of presentation have a powerful
impact on the imagination of people, so they are a
popular medium for presenting statistical data

a. Bar charts: these present a set of numbers by
the length of a bar, the length of each bar
being proportional to the magnitude to be
represented





Simple bar chart: bars may be vertical or horizontal and
are usually separated by appropriate spaces, with an eye on
neatness and clear presentation

Multiple bar chart: here two or more bars are grouped
together.
Component bar chart: here the bar is divided into two
or more parts. Each part represents a certain item and is
proportional to the magnitude of that item.





b. Histogram: this is a pictorial diagram of a
frequency distribution

It consists of a series of blocks

The class intervals are given along the horizontal
axis and the frequencies on the vertical axis

The area of each block or rectangle is proportional
to the frequency

The histogram is apt for representing continuous
variables.









i. It is like the simple bar chart except that
the bars of a histogram touch each other
ii. The height of each box is equal to the
frequency of the class it represents (i.e. for
equal class intervals)
iii. The interval with the highest box is
called the modal interval, i.e. the interval that
contains the mode.
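
Before a histogram can be drawn, the data have to be grouped into class intervals. The sketch below builds such a frequency distribution with equal class intervals and picks out the modal interval. The function name `frequency_distribution`, the choice of half-open intervals (lower limit included, upper limit excluded) and the ages are all assumptions made for illustration.

```python
def frequency_distribution(values, start, width, n_classes):
    """Count values in equal-width class intervals [lower, upper)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    intervals = [(start + i * width, start + (i + 1) * width)
                 for i in range(n_classes)]
    return list(zip(intervals, counts))

ages = [21, 34, 25, 47, 38, 29, 31, 52, 36, 33, 41, 27]  # hypothetical
table = frequency_distribution(ages, start=20, width=10, n_classes=4)

# The modal interval is the class with the highest frequency.
modal_interval, modal_frequency = max(table, key=lambda row: row[1])

# Crude text histogram: one '#' per observation in each class.
for (lower, upper), count in table:
    print(f"{lower}-{upper}: {'#' * count}")
```

With these ages the 30 to 40 class has the highest frequency, so it is the modal interval.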







c. Frequency polygon: a frequency distribution may also
be represented diagrammatically by the frequency polygon
It is obtained by joining the midpoints of the tops of
the histogram blocks.





d. Pie charts: Instead of comparing the lengths of bars,
the areas of segments of a circle are compared.
The area of each segment depends upon its angle. A
circle of any reasonably large size is divided into the
number of components that make up the total, such
that the area of each sector is proportional to the
component it represents.
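
Because each sector is proportional to its component, the angle of a sector is the component's share of the total multiplied by 360 degrees. A minimal sketch follows, with a hypothetical function name `sector_angles` and made-up blood group counts.

```python
def sector_angles(components):
    """Map each component to its sector angle in degrees (angles sum to 360)."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}

blood_groups = {"O": 50, "A": 25, "B": 20, "AB": 5}  # hypothetical counts
angles = sector_angles(blood_groups)
```

Here group O makes up half of the total, so its sector takes half of the circle (180 degrees).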




e. Graphs / scatter diagrams: these come in when
two different factors are involved, e.g. age and height.
If, after plotting, the points cannot be joined by any
line, then a graph will not apply and we have a
scatter diagram.







Inferential statistics refers to the application of
statistical tests to study results with a view to
ascertaining the presence of statistical significance
Suppose we find in a study on level of
physical activity that 40% of the men included in the
sample are physically active whereas only 30%
of the women qualify as active. How should one
interpret this result?



1. The observed difference of 10% might be a TRUE
DIFFERENCE, which also exists in the total population
from which the sample was drawn
2. This difference might also be DUE TO CHANCE, i.e.
in reality there is no difference between men and
women, but the sample of men just happened
to differ from the sample of women, probably due
to sampling variation
3. The observed difference of 10% might be due to a
defect in the study design (bias), i.e. with an
appropriate study design no such difference would
have occurred


• Statistical tests estimate the likelihood that such a
result would occur by chance if there were no true difference

• If this likelihood or probability is less than 5%, the
result is taken to reflect a true difference and the notion
of chance occurrence is rejected
• This level of 5% is known as the alpha level, while the
actual likelihood or probability calculated is known as
the P-value
• In statistical terms, the assumption that in the total
population no real difference exists between the
groups is called the NULL HYPOTHESIS








Once the alpha level has been set and the
statistical test applied to the results, the P-value
is obtained
If the P-value is lower than the alpha value, a
true difference is taken to exist: the null
hypothesis is rejected and the result is
said to be statistically significant
If the P-value is higher than the alpha value,
the null hypothesis is not rejected (commonly
described as "accepted"): the result is taken as
consistent with chance and considered not significant
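
To make the decision rule concrete, the sketch below applies it to the physical activity example (40% of men versus 30% of women). The slides do not name a test or give sample sizes, so two things are assumed here: a two-proportion z-test is used as one reasonable choice, and the group sizes of 100 and 200 are hypothetical. The function name `two_proportion_p_value` is also illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided P-value for H0: the two population proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05  # the 5% alpha level

for n in (100, 200):  # hypothetical number of men and of women sampled
    p_value = two_proportion_p_value(0.40, n, 0.30, n)
    decision = "reject" if p_value < ALPHA else "do not reject"
    print(f"n = {n} per group: P = {p_value:.3f}, {decision} the null hypothesis")
```

The same 10% difference is not significant with 100 people per group but is significant with 200 per group, which shows that the P-value depends on sample size as well as on the size of the difference.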






If the null hypothesis is rejected when it is
true, i.e. no true difference exists but the P-value
happens to fall below the alpha value, then a
type I error is committed
If the null hypothesis is accepted when a true
difference exists, i.e. the P-value is above the alpha
value despite a real difference, then a type II
error is committed



• Clinicians often have to evaluate and use new
information throughout their practice lives.
• The most important reasons for learning
biostatistics include the following:

1. Assessing medical literature: evidence-based
information is often made available in journals, and
clinicians must understand biostatistics to be able
to make sense of such information
2. Patient care: results of research work are often meant
for patient care, and clinicians want to know the best
diagnostic procedure, the optimal care and how treatment
regimens should be designed and implemented



3. Use of vital statistics: effective diagnosis and
treatment of patients requires an understanding of
how to make sense of vital statistics, which
result from the recording of vital events such
as births and deaths

4. Deploying diagnostic procedures: knowing the
appropriate diagnostic procedure to use in a given
patient is essential for effective care. Clinicians
should be conversant with the sensitivity,
specificity, and positive and negative predictive values
of a procedure


5. Assessing information on drugs and equipment:
companies present information on their products in
charts, graphs and clinical studies, and clinicians
need a good knowledge of biostatistics to make
sense of such presentations and information

6. Understanding epidemiologic problems: disease
prevalence, variation by season and by location,
and relationship to risk factors constitute
epidemiological parameters of utmost importance
to the clinician in practice.
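
Item 4 above mentions sensitivity, specificity and the predictive values without defining them. The sketch below computes them from the four cells of a 2x2 table of test result against true disease status, using the standard definitions. The function name `diagnostic_measures` and the counts are hypothetical.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Compute test accuracy measures from the four cells of a 2x2 table.

    tp: diseased, test positive      fp: disease-free, test positive
    fn: diseased, test negative      tn: disease-free, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased people who test positive
        "specificity": tn / (tn + fp),  # disease-free people who test negative
        "ppv": tp / (tp + fp),          # positive results that are truly diseased
        "npv": tn / (tn + fn),          # negative results that are truly disease-free
    }

# Hypothetical study: 100 diseased and 200 disease-free people tested.
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)
```

Sensitivity and specificity describe the test itself, while the two predictive values also depend on how common the disease is in the group tested.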











Applications of biostatistics
Public health (Epidemiology, Nutrition etc)
Clinical trials
Population genetics
Genomics analysis
Ecology/Ecological forecasting
Biological Sequence Analysis
Systems biology for gene network inference



1. Bamgboye EA. A companion of Medical statistics. Ibipress & Publishing Company, Ibadan, Nigeria. 1st Edition, 2006: 1-16.
2. Dunn OJ. Basic statistics: A primer for the Biomedical Sciences. John Wiley and Sons Publishers. 2nd Edition: 1-11.
3. Kolawole EB. Statistical methods. Bolabay Publications, Lagos, Nigeria. 1st Edition, 2006: 1-12.
4. Taofeek I. Research methodology and dissertation writing for allied professionals. Cress Global Link Limited, Abuja. 1st Edition, 2006: 1-24.
5. Park K. Park's textbook of Preventive Medicine and Social Medicine. M/s Banarsidas Bhanot Publishers. 18th Edition, 2004: 608-615.
6. Dawson B, Trapp R. Introduction to Medical Research. In: Basic and Clinical Biostatistics. 4th Edition. McGraw-Hill Companies Inc, USA, 2004: 1-6.
7. Prabhakara GN. Basics of Statistics. In: Biostatistics. JAYPEE, New Delhi, 2006: 11-16.
8. Dawson B, Trapp R. Summarising Data and Presenting Data in Tables and Graphs. In: Basic and Clinical Biostatistics. 4th Edition. McGraw-Hill Companies Inc, USA, 2004: 23-60.



What doesn't kill us makes us stronger
So see challenges as opportunities for personal growth

Thank you
