2. This session covers:
Background and need to know
Biostatistics
Origin and development of Biostatistics
Definition of Statistics and Biostatistics
Types of data
Graphical representation of data
Frequency distribution of data
3. “Statistics is the science which deals
with collection, classification and
tabulation of numerical facts as the
basis for explanation, description
and comparison of phenomenon”.
------ Lovitt
4. “BIOSTATISTICS”
(1) Statistics arising out of biological
sciences, particularly from the fields of
Medicine and public health.
(2) The methods used in dealing with
statistics in the fields of medicine, biology
and public health for planning,
conducting and analyzing data which
arise in investigations of these branches.
5. Origin and development of
statistics in Medical Research
In 1929, a lengthy paper by Dunn on the application of
statistical methods in physiology was published in
the journal Physiological Reviews.
In 1937, 15 articles on statistical methods
by Austin Bradford Hill were published in
book form.
In 1948, a randomized controlled trial (RCT) of
streptomycin for pulmonary tuberculosis was
published; Bradford Hill had a key influence on it.
The use of statistics in medicine then grew
rapidly, with an 8-fold increase from 1952 to 1982.
8. Sources of Medical
Uncertainties
1. Intrinsic variation due to biological,
environmental and sampling factors
2. Natural variation among methods,
observers, instruments etc.
3. Errors in measurement or assessment
or errors in knowledge
4. Incomplete knowledge
9. Intrinsic variation as a
source of medical
uncertainties
Biological: due to age, gender, heredity, parity, height,
weight, etc., and to variation in anatomical,
physiological and biochemical parameters
Environmental: due to nutrition, smoking, pollution,
water and sanitation facilities, road traffic, legislation,
stress and strain, etc.
Sampling fluctuations: the entire population can never
be studied, and future cases, at least, can never be
included
Chance variation: due to factors that are unknown or
too complex to comprehend
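A small simulation shows sampling fluctuation at work: repeated random samples drawn from the same population give different means. This is a minimal Python sketch under stated assumptions. The population of systolic BP values is simulated purely for illustration, and the sample size of 30 and the seeds are arbitrary choices.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=1):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(n_samples)]

# Hypothetical population: 10,000 systolic BP values (mmHg) simulated from a
# normal distribution with mean 120 and SD 15, for illustration only.
pop_rng = random.Random(42)
population = [pop_rng.gauss(120, 15) for _ in range(10_000)]

means = sample_means(population, sample_size=30, n_samples=5)
print(f"population mean: {statistics.mean(population):.1f}")
for i, m in enumerate(means, start=1):
    print(f"sample {i} mean: {m:.1f}")
```

Each sample mean differs from the population mean and from the other sample means, even though the population itself has not changed.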
10. Natural variation despite
best care as a source of
uncertainties
In the assessment of any medical parameter
Due to partial compliance by patients
Due to incomplete information in
conditions such as when the patient is in a coma
11. Medical Errors that cause
Uncertainties
Carelessness of providers such as physicians,
surgeons, nursing staff, radiographers and
pharmacists
Errors in methods, such as using an incorrect quantity or
quality of chemicals and reagents, misinterpreting an
ECG, using inappropriate diagnostic tools or
misrecording information
Instrument error due to use of non-standardized or
faulty instruments, or improper use of the right
instrument
Failure to collect full information
Inconsistent responses by patients or other subjects
under evaluation
12. Incomplete knowledge as a
source of Uncertainties
Diagnostic, therapeutic and prognostic
uncertainties due to lack of knowledge
Predictive uncertainties, such as in the
survival duration of a cancer patient
Other uncertainties such as how to
measure positive health
14. Reasons to know about
biostatistics:
Medicine is becoming increasingly
quantitative.
The planning, conduct and interpretation
of much of medical research are
becoming increasingly reliant on
statistical methodology.
Statistics pervades the medical literature.
15. CLINICAL MEDICINE
Documentation of medical history of
diseases.
Planning and conduct of clinical studies.
Evaluating the merits of different
procedures.
Providing methods for defining
“normal” and “abnormal”.
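A "normal" range is often defined as mean ± 1.96 SD, which covers about 95% of values when the measurement is roughly normally distributed. The Python sketch below applies that parametric approach to the 30 Hb values tabulated later in this session (slide 40). It is for illustration only. It assumes approximate normality, and a real reference interval would come from a large sample of healthy people, not from patients.

```python
import statistics

# Hb values (g/dl) of the 30 adult male patients in slide 40, used here only
# to illustrate the arithmetic.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def reference_interval(values, z=1.96):
    """Mean +/- z*SD; covers about 95% of values if they are roughly normal."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return mean - z * sd, mean + z * sd

low, high = reference_interval(hb)
outside = [x for x in hb if x < low or x > high]
print(f"'normal' range: {low:.2f} to {high:.2f} g/dl")
print("values outside the range:", outside)
```

Another common choice is a non-parametric interval (the 2.5th to 97.5th percentiles), which does not assume normality but needs a larger sample.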
16. Role of Biostatistics in
patient care
In increasing awareness regarding diagnostic,
therapeutic and prognostic uncertainties and
providing rules of probability to delineate those
uncertainties
In providing methods to integrate chances with value
judgments that could be most beneficial to the patient
In providing methods such as sensitivity, specificity
and predictive values that help choose valid tests for
patient assessment
In providing tools such as scoring systems and expert
systems that can help reduce epistemic uncertainties
17. PREVENTIVE MEDICINE
To measure the magnitude of any health
problem in the community.
To find out the basic factors underlying
ill-health.
To evaluate the health programs that
were introduced in the community
(success/failure).
To introduce and promote health
legislation.
18. Role of Biostatistics in Health
Planning and Evaluation
In carrying out a valid and reliable health
situation analysis, including proper
summarization and interpretation of data.
In proper evaluation of the achievements
and failures of a health programme
19. Role of Biostatistics in
Medical Research
In developing a research design that can
minimize the impact of uncertainties
In assessing reliability and validity of
tools and instruments to collect the
information
In proper analysis of data
20. Example: Evaluation of Penicillin (treatment
A) vs Penicillin & Chloramphenicol
(treatment B) for treating bacterial
pneumonia in children < 2 years.
What sample size is needed to demonstrate a significant difference
between the two groups?
Is treatment A better than treatment B, or vice versa?
If so, how much better?
What is the normal variation in clinical measurement (mild,
moderate and severe)?
How reliable and valid is the measurement (clinical and
radiological)?
What is the magnitude and effect of laboratory and technical
error?
How does one interpret abnormal values?
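The sample-size question can be answered with a standard normal-approximation formula for comparing two proportions (two-sided test). The Python sketch below assumes hypothetical cure rates of 70% with treatment A and 85% with treatment B, α = 0.05 and 80% power. Other formulas exist (for example, pooled-variance or continuity-corrected versions) and give somewhat different numbers.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per group to compare two proportions.

    Uses n = (z_{1-alpha/2} + z_{1-beta})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
    for a two-sided test, rounded up to a whole patient.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical cure rates: 70% with treatment A, 85% with treatment B.
print(n_per_group(0.70, 0.85))
```

A smaller expected difference between the treatments requires a larger sample, because (p1 − p2)² sits in the denominator.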
21. WHAT DOES STATISTICS
COVER?
Planning
Design
Execution (Data collection)
Data Processing
Data analysis
Presentation
Interpretation
Publication
22. BASIC CONCEPTS
Data: a set of values of one or more variables recorded
on one or more observational units
Categories of data
1. Primary data: observation, questionnaires, record forms,
interviews, surveys
2. Secondary data: census, medical records, registries
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External sources
23. TYPES OF DATA
QUALITATIVE DATA
DISCRETE QUANTITATIVE
CONTINUOUS QUANTITATIVE
26. QUANTITATIVE (DISCRETE)
Example: The no. of family members
The no. of heart beats
The no. of admissions in a day
QUANTITATIVE (CONTINUOUS)
Example: Height, Weight, Age, BP, Serum
Cholesterol and BMI
27. Discrete data -- gaps between possible values
(e.g. number of children)
Continuous data -- theoretically, no gaps between
possible values (e.g. Hb)
29.
Hospital length of stay     Number    Percent
1 – 3 days                    5891       43.3
4 – 7 days                    3489       25.6
2 weeks                       2449       18.0
3 weeks                        813        6.0
1 month                        417        3.1
More than 1 month              545        4.0
Total                        13604      100.0
Mean = 7.85, SE = 0.10
Table 1 Distribution of patients with blunt injury
according to hospital length of stay
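The percentage column of a frequency table is each category's count divided by the total, times 100. The Python sketch below recomputes it from the counts in Table 1. The six counts add up to 13,604, and that total reproduces every published percentage.

```python
# Counts from Table 1 (hospital length of stay of patients with blunt injury).
counts = {
    "1-3 days": 5891,
    "4-7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

def percent_distribution(freq):
    """Return the total and each category's share of it, to one decimal place."""
    total = sum(freq.values())
    return total, {k: round(100 * v / total, 1) for k, v in freq.items()}

total, pct = percent_distribution(counts)
print("Total:", total)
for category, p in pct.items():
    print(f"{category:<18} {p:5.1f}")
```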
31. Scale of measurement
Quantitative variable:
A numerical variable: discrete; continuous
Interval scale:
Data are placed in meaningful intervals and order. The units of
measurement are arbitrary, and there is no true zero.
- Temperature in °C: the difference between 37 °C and 36 °C equals that
between 38 °C and 37 °C, but there is no implication of ratio (30 °C is not twice as hot as 15 °C)
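The interval property can be checked numerically. A change of unit preserves equal differences, but a ratio computed in one unit does not hold in another, because each scale puts its zero in a different place. A minimal Python sketch using the standard Celsius, Fahrenheit and kelvin conversions:

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin (an absolute scale with a true zero)."""
    return c + 273.15

# Equal differences remain equal when the unit changes (interval property).
diff_high = celsius_to_fahrenheit(38) - celsius_to_fahrenheit(37)
diff_low = celsius_to_fahrenheit(37) - celsius_to_fahrenheit(36)
print(round(diff_high, 2), round(diff_low, 2))  # both 1.8 (degrees F)

# Ratios are not preserved: they depend on where each scale puts its zero.
ratio_c = 30 / 15
ratio_f = celsius_to_fahrenheit(30) / celsius_to_fahrenheit(15)  # 86 / 59
ratio_k = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print(ratio_c, round(ratio_f, 2), round(ratio_k, 2))
```

Only on the kelvin scale, which has a true zero, would a ratio of temperatures be physically meaningful. That makes kelvin a ratio scale.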
32. Ratio scale:
Data can be arranged in logical order (e.g. in a frequency
distribution), and there is a true zero, so a meaningful ratio exists.
- Age, weight, height, pulse rate
- A pulse rate of 120/min is twice as fast as one of 60/min
- A person weighing 80 kg is twice as heavy
as one weighing 40 kg.
33. Scales of Measure
Nominal – qualitative classification of
equal value: gender, race, color, city
Ordinal – qualitative classification
which can be rank ordered:
socioeconomic status of families
Interval – numerical or quantitative
data that can be rank ordered and whose differences
can be compared: temperature
Ratio – quantitative interval data with a true zero,
so ratios are meaningful: time, age.
34. CLINIMETRICS
Clinimetrics is the science in which
qualities are converted into meaningful
quantities by using scoring systems.
Examples: (1) The Apgar score, based on
appearance, pulse, grimace, activity and
respiration, is used for neonatal prognosis.
(2) Smoking index: number of cigarettes, duration,
whether filtered or not, whether pipe or cigar, etc.
(3) APACHE (Acute Physiology and Chronic
Health Evaluation) score: to quantify the
severity of a patient's condition.
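The Apgar score is a simple example of turning qualities into a quantity. Each of the five signs is scored 0, 1 or 2, and the scores are summed to give a total from 0 to 10. A minimal Python sketch follows. The keyword-argument interface is a hypothetical design choice, and the newborn's scores are made up.

```python
APGAR_SIGNS = ("appearance", "pulse", "grimace", "activity", "respiration")

def apgar_score(**signs):
    """Sum the five Apgar signs, each scored 0, 1 or 2, into a total of 0-10."""
    missing = set(APGAR_SIGNS) - set(signs)
    if missing:
        raise ValueError(f"missing signs: {sorted(missing)}")
    for name, value in signs.items():
        if name not in APGAR_SIGNS:
            raise ValueError(f"unknown sign: {name}")
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be scored 0, 1 or 2, got {value}")
    return sum(signs.values())

# Hypothetical newborn: body pink but extremities blue (1), pulse > 100/min (2),
# grimace only (1), some flexion (1), strong cry (2).
print(apgar_score(appearance=1, pulse=2, grimace=1, activity=1, respiration=2))
```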
39. Frequency Distributions
data distribution – pattern of
variability.
the center of a distribution
the ranges
the shapes
simple frequency distributions
grouped frequency distributions
midpoint
40. Tabulate the hemoglobin values of the 30 adult
male patients listed below.
Patient No  Hb (g/dl)   Patient No  Hb (g/dl)   Patient No  Hb (g/dl)
 1          12.0        11          11.2        21          14.9
 2          11.9        12          13.6        22          12.2
 3          11.5        13          10.8        23          12.2
 4          14.2        14          12.3        24          11.4
 5          12.3        15          12.3        25          10.7
 6          13.0        16          15.7        26          12.5
 7          10.5        17          12.6        27          11.8
 8          12.8        18           9.1        28          15.1
 9          13.2        19          12.9        29          13.4
10          11.2        20          14.6        30          13.1
41. Steps for making a
table
Step 1: Find the minimum (9.1) and maximum (15.7).
Step 2: Calculate the range: 15.7 – 9.1 = 6.6.
Step 3: Decide the number and width of
the classes (7 classes of width 1.0 g/dl): 9.0–9.9, 10.0–10.9, ...
Step 4: Prepare a dummy table with the columns
Hb (g/dl), Tally marks, No. of patients.
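The four steps can be sketched in Python for the 30 Hb values. This assumes classes of width 1.0 g/dl. Because the values are recorded to one decimal place, a value's class is simply its whole-number part. The tally marks are drawn as plain bars, which is a stand-in for hand-drawn tallies.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients in slide 40.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def grouped_frequency(values):
    """Follow Steps 1-4 with classes of width 1.0 g/dl (9.0-9.9, 10.0-10.9, ...)."""
    lo, hi = min(values), max(values)                    # Step 1: min and max
    value_range = round(hi - lo, 1)                      # Step 2: range
    classes = range(math.floor(lo), math.floor(hi) + 1)  # Step 3: class lower limits
    counts = Counter(math.floor(x) for x in values)      # Step 4: tally each value
    table = [(f"{c:.1f} - {c + 0.9:.1f}", counts[c]) for c in classes]
    return lo, hi, value_range, table

lo, hi, value_range, table = grouped_frequency(hb)
print(f"min = {lo}, max = {hi}, range = {value_range}")
for label, n in table:
    print(f"{label:<12} {'|' * n:<12} {n:>3}")
print(f"{'Total':<25} {sum(n for _, n in table):>3}")
```

The counts this produces (1, 3, 6, 10, 5, 3, 2; total 30) match the frequency distribution on the next slide.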
43.
Hb (g/dl)      No. of patients
9.0 – 9.9             1
10.0 – 10.9           3
11.0 – 11.9           6
12.0 – 12.9          10
13.0 – 13.9           5
14.0 – 14.9           3
15.0 – 15.9           2
Total                30
Table Frequency distribution of 30 adult male
patients by Hb
44. Table Frequency distribution of adult patients by
Hb and gender:
                    Gender
Hb (g/dl)       Male    Female    Total
<9.0              0        2         2
9.0 – 9.9         1        3         4
10.0 – 10.9       3        5         8
11.0 – 11.9       6        8        14
12.0 – 12.9      10        6        16
13.0 – 13.9       5        4         9
14.0 – 14.9       3        2         5
15.0 – 15.9       2        0         2
Total            30       30        60
45. Elements of a Table
An ideal table should have:
Number
Title
Column headings
Footnotes
Number – table number for identification in a report
Title – describes the body of the table: the variables, place
and time period (what, how classified, where and when)
Column headings – variable name, number, percentage (%), etc.
Footnote(s) – to explain some column/row headings,
special cells, the source, etc.
46.
Death rate (/1000 per annum)    No. of divisions
7.0 – 7.9                          4 (3.3)
8.0 – 8.9                         13 (10.8)
9.0 – 9.9                         20 (16.7)
10.0 – 10.9                       27 (22.5)
11.0 – 11.9                       18 (15.0)
12.0 – 12.9                       11 (9.2)
13.0 – 13.9                       11 (9.2)
14.0 – 14.9                        6 (5.0)
15.0 – 15.9                        2 (1.7)
16.0 – 16.9                        4 (3.3)
17.0 – 18.9                        3 (2.5)
19.0 +                             1 (0.8)
Total                            120 (100.0)
Table II. Distribution of 120 (Madras) Corporation divisions
according to annual death rate based on registered deaths in
1975 and 1976
Figures in parentheses indicate percentages
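Each entry in Table II is an annual death rate per 1000 population. The Python sketch below shows how one division's rate might be computed and placed in the table's classes. It assumes the population stayed roughly constant over 1975–1976, so the two years' deaths are divided by twice the population. The division's figures are hypothetical.

```python
import math

def annual_death_rate(deaths, population, years=1):
    """Crude death rate per 1000 population per annum, rounded to one decimal.

    Assumes the population stayed roughly constant over the period.
    """
    return round(deaths / (population * years) * 1000, 1)

def death_rate_class(rate):
    """Place a rate (one decimal place) into the class intervals of Table II."""
    if rate < 7.0:
        raise ValueError("Table II has no class below 7.0")
    if rate >= 19.0:
        return "19.0 +"
    if rate >= 17.0:
        return "17.0 - 18.9"  # the only two-unit-wide class in Table II
    lower = math.floor(rate)
    return f"{lower}.0 - {lower}.9"

# Hypothetical division: 1,200 registered deaths over 1975-1976, population 55,000.
rate = annual_death_rate(1200, 55_000, years=2)
print(rate, death_rate_class(rate))
```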
47. DIAGRAMS/GRAPHS
Discrete data
--- Bar charts (one or two groups)
Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
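A stem-and-leaf plot splits each value into a stem and a leaf, so the plot keeps every original value while showing the shape of the distribution. The Python sketch below uses the 30 Hb values from slide 40. It takes the whole g/dl as the stem and the first decimal digit as the leaf, which assumes that every value has exactly one decimal place. Stems with no values would simply not appear.

```python
from collections import defaultdict

# Hb values (g/dl) of the 30 adult male patients in slide 40.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def stem_and_leaf(values):
    """Stem = whole g/dl, leaf = first decimal digit, leaves in ascending order."""
    plot = defaultdict(list)
    for x in sorted(values):
        tenths = round(x * 10)  # e.g. 12.3 -> 123; rounding avoids float error
        plot[tenths // 10].append(tenths % 10)
    return {stem: "".join(str(d) for d in leaves)
            for stem, leaves in sorted(plot.items())}

for stem, leaves in stem_and_leaf(hb).items():
    print(f"{stem:>2} | {leaves}")
```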
54. Descriptive statistics report:
Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
positive skew: mean > median & high-score whisker is longer
negative skew: mean < median & low-score whisker is longer
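The boxplot summary can be computed in Python with the standard `statistics` module, shown here for the 30 Hb values from slide 40. Two assumptions apply. First, quartiles use the module's default (`exclusive`) method; other software may use a different quartile method and give slightly different values. Second, the whiskers are drawn to the minimum and maximum, as listed above.

```python
import statistics

# Hb values (g/dl) of the 30 adult male patients in slide 40.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def boxplot_summary(values):
    """Min, quartiles, median, max and mean, plus the skew shown by mean vs median."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    mean = statistics.mean(values)
    if mean > median:
        skew = "positive"
    elif mean < median:
        skew = "negative"
    else:
        skew = "none"
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3,
        "max": max(values), "mean": mean, "skew": skew,
        # whiskers drawn to the minimum and maximum
        "low_whisker": q1 - min(values), "high_whisker": max(values) - q3,
    }

for key, value in boxplot_summary(hb).items():
    print(f"{key:>12}: {value:.3f}" if isinstance(value, float) else f"{key:>12}: {value}")
```

For these data the mean (12.5) is above the median (12.3), and the high-score whisker is slightly longer than the low-score whisker. Both point to a mild positive skew.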
55. [Figure: Pie chart, the prevalence of different degrees of hypertension in the population; categories mild, moderate and severe; segments of 10%, 20% and 70%]
Pie Chart
• Circular diagram: the whole circle represents the total (100%)
• Divided into segments, each representing a category
• Decide the order of adjacent categories
• The size of each slice is proportional to the amount for that category
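Because each slice is proportional to its category's share, each 1% of the total gets 3.6° of the 360° circle. A minimal Python sketch follows. The category labels are hypothetical, and the percentages are the three segment sizes shown in the hypertension pie chart.

```python
def pie_angles(percentages):
    """Convert category percentages (summing to 100) into slice angles in degrees."""
    total = sum(percentages.values())
    if abs(total - 100) > 0.5:  # allow for rounding in published percentages
        raise ValueError(f"percentages sum to {total}, not 100")
    return {k: v * 360 / total for k, v in percentages.items()}

# Hypothetical labels; the sizes are the 10%, 20% and 70% segments of the figure.
print(pie_angles({"segment A": 10, "segment B": 20, "segment C": 70}))
```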
56. Bar Graphs
[Figure: Bar graph, the distribution of risk factors among cases with cardiovascular diseases. X axis: risk factor (smoking, alcohol, cholesterol, DM, HTN, no exercise, family history); Y axis: number]
The height of each bar indicates
frequency
Frequency on the Y axis
and categories of the variable
on the X axis
The bars should be of equal
width and should not touch
each other
57. Enrollment of HIV cases in
the USA by gender
[Figure: Bar chart, enrollment (hundreds) by year, 1986–1992, for men and women]
58. Enrollment of HIV cases
in the USA by gender
[Figure: Stacked bar chart, enrollment (thousands) by year, 1986–1992, for women and men]
59. Graphic Presentation of
Data
the histogram
(quantitative data)
the bar graph
(qualitative data)
the frequency polygon
(quantitative data)
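A histogram and a frequency polygon use the same class frequencies. The histogram draws them as adjacent bars. The frequency polygon joins points plotted at the class midpoints and adds an empty class at each end, so the line closes on the horizontal axis. The Python sketch below computes those points for the 30 Hb values from slide 40. It assumes classes 9.0–9.9, 10.0–10.9, and so on, each with midpoint (lower limit + upper limit) / 2. Some texts instead use class boundaries (e.g. 9.0 to 10.0, midpoint 9.5).

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients in slide 40.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

def frequency_polygon_points(values):
    """(class midpoint, frequency) pairs for classes 9.0-9.9, 10.0-10.9, ...

    The midpoint of class c.0-c.9 is (c.0 + c.9) / 2 = c + 0.45. An empty class
    is added at each end so the polygon closes on the horizontal axis.
    """
    counts = Counter(math.floor(x) for x in values)
    classes = range(min(counts) - 1, max(counts) + 2)
    return [(round(c + 0.45, 2), counts[c]) for c in classes]

for midpoint, freq in frequency_polygon_points(hb):
    print(f"{midpoint:5.2f}  {freq}")
```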
61. General rules for designing
graphs
A graph should have a self-explanatory
legend
A graph should help the reader to understand
the data
Axes should be labeled and units of measurement
indicated
Scales are important: start with zero (otherwise
indicate a break with //)
Avoid graphs with a three-dimensional
impression; it may be misleading (readers
visualize such graphs less easily)