3. Analysis and interpretation of data
Data Analytics: DOE, AQbD
Biostatistics: Statistical methods
Bioinformatics: Computational tools (Schrödinger, AutoDock, MOE)
4. Biostatistician
Role: Applying statistical methods and
mathematical principles to analyze and
evaluate research data in health sciences, then
providing explanations and recommendations
as a result of research.
They present their findings to policymakers,
health officials, and other government officials or
CEOs to assist them in making public health
decisions.
Ex: Analyze the effectiveness of new drugs and
forecast health outcomes (Stable, Effective,
ADME)
Using quantitative skills and statistical tools, they evaluate the data, analyze
trends, and predict outcomes.
Positions: Universities, hospitals, government organizations, research
facilities, instrumentation firms and pharmaceutical companies.
Salaries: 30,000 to 1 lakh
5. How to become a biostatistician
Bachelor's and Master's degree
SAS programming language
Skills
Analytical, scientific, Statistical, mathematical, computer,
management, communication
Software
SAS
Minitab Statistical Software
OriginPro
Prism
Python
R
WinNonlin
6. “Statistics is the science which
deals with collection, classification
and tabulation of numerical facts as
the basis for explanation,
description and comparison of
phenomenon”.
7. “BIOSTATISTICS”
The methods used in dealing with
statistics in the fields of medicine,
biology and public health for planning,
conducting and analyzing data which
arise in investigations of these branches.
10. How to use biostatistics
Interest in research
Generate hypothesis
Design protocol
Collect data/literature survey
Analyze the data
Descriptive statistics
Statistical inference
11. Descriptive statistics help to describe and organize known data using charts, bar
graphs and patterns within a group.
Inferential statistics aim at making inferences and generalizations about the
population from sample data.
Central tendency: the central position of the data
Dispersion: the spread of the data
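As a minimal sketch of central tendency and dispersion, the block below summarizes the 30 hemoglobin values listed on slide 53 with Python's standard `statistics` module. The variable names are illustrative choices, not part of the slides.

```python
import statistics

# Hb values (g/dl) of the 30 adult male patients listed on slide 53.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Central tendency: where the middle of the data lies.
mean_hb = statistics.mean(hb)
median_hb = statistics.median(hb)
mode_hb = statistics.mode(hb)

# Dispersion: how spread out the data are.
range_hb = max(hb) - min(hb)
sd_hb = statistics.stdev(hb)  # sample standard deviation (divides by n - 1)

print(f"mean={mean_hb:.2f} median={median_hb:.2f} mode={mode_hb:.2f}")
print(f"range={range_hb:.2f} sd={sd_hb:.2f}")
```

The mean, median and mode describe the center; the range and standard deviation describe the spread.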
15. Hypothesis
It is like a scientific guess to learn new things.
It’s an idea or prediction that scientists make
before they do experiments.
They use it to guess what might happen and
then test it to see if they are right or wrong.
A scientific hypothesis is then verified or rejected
through scientific experiment and research.
16. Hypothesis: Prediction
Hypothesis testing, sometimes called significance testing, is
an act in statistics whereby an analyst tests an assumption
regarding a population parameter. The methodology
employed by the analyst depends on the nature of the data
used and the reason for the analysis.
Null hypothesis (H0): no statistically significant relationship
Alternative hypothesis (H1): a statistically significant relationship
Each is a claim/assumption about a population or sample
17. Example: Paracetamol
Sample/Population: Tablet, Syrup
Measurement variables: Assay, Stability
1 Sample Vs 1 Variable (Tablet Vs Assay)
1 Sample Vs 2 Variables (Tablet Vs Assay, Stability)
2 samples Vs 1 Variable (Tab, Syrup Vs Assay)
2 samples Vs 2 Variables (Tab, Syrup Vs Assay, stability)
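The "1 sample vs 1 variable" case (tablet vs assay) can be sketched as a one-sample t-test. The ten assay values below are hypothetical, as is the claimed mean of 100% of label claim; H0 states that the true mean assay equals the claim. The critical value 2.262 is the standard two-sided 5% value of t with 9 degrees of freedom.

```python
import math
import statistics

# Hypothetical assay results (% of label claim) for 10 paracetamol tablets.
assay = [99.2, 100.4, 98.7, 101.1, 99.8, 100.9, 99.5, 100.2, 98.9, 100.3]
claimed_mean = 100.0  # H0: the population mean assay is 100%

n = len(assay)
sample_mean = statistics.mean(assay)
sample_sd = statistics.stdev(assay)  # sample SD (n - 1)

# t statistic: distance of the sample mean from the claim, in standard errors.
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))

T_CRITICAL = 2.262  # two-sided, alpha = 0.05, df = n - 1 = 9
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these made-up values the t statistic stays inside the critical limits, so H0 is not rejected. The two-sample cases would compare tablet and syrup means in the same spirit.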
21. Sources of Medical
Uncertainties
1. Intrinsic due to biological,
environmental and sampling factors
2. Natural variation among methods,
observers, instruments etc.
3. Errors in measurement or assessment
or errors in knowledge
4. Incomplete knowledge
22. Intrinsic variation as a
source of medical
uncertainties
Biological due to age, gender, heredity, parity, height,
weight, etc. Also due to variation in anatomical,
physiological and biochemical parameters
Environmental due to nutrition, smoking, pollution,
facilities of water and sanitation, road traffic, legislation,
stress and strain, etc.
Sampling fluctuations because the entire world cannot
be studied and, at the least, future cases can never be
included
Chance variation due to factors that are unknown or too
complex to comprehend
23. Natural variation despite
best care as a source of
uncertainties
In assessment of any medical parameter
Due to partial compliance by the patients
Due to incomplete information in
conditions such as a patient in a coma
24. Medical Errors that cause
Uncertainties
Carelessness of the providers such as physicians,
surgeons, nursing staff, radiographers and
pharmacists.
Errors in methods such as in using incorrect quantity or
quality of chemicals and reagents, misinterpretation of
ECG, using inappropriate diagnostic tools,
misrecording of information etc.
Instrument error due to use of a non-standardized or
faulty instrument, or improper use of the right
instrument.
Not collecting full information
Inconsistent response by the patients or other subjects
under evaluation
25. Incomplete knowledge as a
source of Uncertainties
Diagnostic, therapeutic and prognostic
uncertainties due to lack of knowledge
Predictive uncertainties such as in the
survival duration of a patient with cancer
Other uncertainties such as how to
measure positive health
27. Reasons to know about
biostatistics:
Medicine is becoming increasingly
quantitative.
The planning, conduct and interpretation
of much of medical research are
becoming increasingly reliant on
statistical methodology.
Statistics pervades the medical literature.
28. CLINICAL MEDICINE
Documentation of medical history of
diseases.
Planning and conduct of clinical studies.
Evaluating the merits of different
procedures.
In providing methods for definition of
“normal” and “abnormal”.
29. Role of Biostatistics in
patient care
In increasing awareness regarding diagnostic,
therapeutic and prognostic uncertainties and
providing rules of probability to delineate those
uncertainties
In providing methods to integrate chances with value
judgments that could be most beneficial to the patient
In providing methods such as sensitivity, specificity
and predictive values that help choose valid tests for
patient assessment
In providing tools such as scoring systems and expert
systems that can help reduce epistemic uncertainties
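Sensitivity, specificity and predictive values all come from a 2x2 table of test result against true disease status. A minimal sketch follows; the counts are hypothetical and `diagnostic_metrics` is an illustrative helper name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return accuracy measures of a diagnostic test from 2x2 table counts.

    tp: diseased, test positive      fp: healthy, test positive
    fn: diseased, test negative      tn: healthy, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # diseased correctly detected
        "specificity": tn / (tn + fp),  # healthy correctly cleared
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical study: 100 diseased and 200 healthy subjects.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are properties of the test itself, while the predictive values also depend on how common the disease is in the group tested.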
30. PREVENTIVE MEDICINE
To provide the magnitude of any health
problem in the community.
To find out the basic factors underlying
the ill-health.
To evaluate the health programs which
were introduced in the community
(success/failure).
To introduce and promote health
legislation.
31. Role of Biostatistics in Health
Planning and Evaluation
In carrying out a valid and reliable health
situation analysis, including in proper
summarization and interpretation of data.
In proper evaluation of the achievements
and failures of a health programme
32. Role of Biostatistics in
Medical Research
In developing a research design that can
minimize the impact of uncertainties
In assessing reliability and validity of
tools and instruments to collect the
information
In proper analysis of data
33. Example: Evaluation of Penicillin (treatment
A) vs Penicillin & Chloramphenicol
(treatment B) for treating bacterial
pneumonia in children < 2 years.
What sample size is needed to demonstrate a significant
difference between the two groups?
Is treatment A better than treatment B, or vice versa?
If so, how much better?
What is the normal variation in clinical measurement (mild,
moderate & severe)?
How reliable and valid is the measurement (clinical &
radiological)?
What is the magnitude and effect of laboratory and technical
error?
How does one interpret abnormal values?
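The sample size question can be illustrated with the common normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions, not the method of any particular trial: the cure rates of 70% and 85%, the 5% two-sided significance level and the 80% power are all hypothetical inputs, and `sample_size_two_proportions` is an illustrative name.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Patients needed per group (normal approximation, two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # e.g. about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole patients

# Hypothetical cure rates: 70% on treatment A, 85% on treatment B.
n_per_group = sample_size_two_proportions(0.70, 0.85)
print(n_per_group)
```

A smaller expected difference between the treatments drives the required sample size up sharply, because the difference is squared in the denominator.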
34. WHAT DOES STATISTICS
COVER?
Planning
Design
Execution (Data collection)
Data Processing
Data analysis
Presentation
Interpretation
Publication
35. BASIC CONCEPTS
Data : Set of values of one or more variables recorded
on one or more observational units
Categories of data
1. Primary data: observation, questionnaire, record form,
interviews, survey
2. Secondary data: census, medical record, registry
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
36. TYPES OF DATA
QUALITATIVE DATA
DISCRETE QUANTITATIVE
CONTINUOUS QUANTITATIVE
39. QUANTITATIVE (DISCRETE)
Example: The no. of family members
The no. of heart beats
The no. of admissions in a day
QUANTITATIVE (CONTINUOUS)
Example: Height, Weight, Age, BP, Serum
Cholesterol and BMI
40. Discrete data -- Gaps between possible values
(example: Number of Children)
Continuous data -- Theoretically,
no gaps between possible values
(example: Hb)
42. Table 1 Distribution of blunt injured patients
according to hospital length of stay
Hospital length of stay    Number    Percent
1 – 3 days                   5891       43.3
4 – 7 days                   3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0
Mean = 7.85 SE = 0.10
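As a quick check of how the percent column is obtained, the sketch below recomputes it from the six counts in Table 1. The counts sum to 13,604, and the percentages in the table agree with that total.

```python
# Counts from Table 1 (hospital length of stay of blunt injured patients).
counts = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}

total = sum(counts.values())

# Percent = category count / total count * 100, rounded to one decimal place.
percents = [round(100 * n / total, 1) for n in counts.values()]

for (label, n), pct in zip(counts.items(), percents):
    print(f"{label:<20}{n:>6}{pct:>8.1f}")
print(f"{'Total':<20}{total:>6}")
```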
44. Scale of measurement
Quantitative variable:
A numerical variable: discrete; continuous
Interval scale :
Data is placed in meaningful intervals and order. The units of
measurement are arbitrary.
- Temperature: equal differences are equal (37º C - 36º C and 38º C - 37º C are the same), but
there is no implication of ratio (30º C is not twice as hot as 15º C)
45. Ratio scale:
Data is presented in frequency distribution in
logical order. A meaningful ratio exists.
- Age, weight, height, pulse rate
- pulse rate of 120 is twice as fast as 60
- person with weight of 80kg is twice as heavy
as the one with weight of 40 kg.
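A short sketch of why ratios are meaningful on a ratio scale but not on an interval scale: the Celsius zero is arbitrary, so the ratio 30/15 changes once the same temperatures are expressed in kelvin, whereas a pulse-rate ratio has a true zero behind it. The helper name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Interval scale: the ratio depends on the arbitrary zero of the unit.
ratio_celsius = 30 / 15
ratio_kelvin = celsius_to_kelvin(30) / celsius_to_kelvin(15)

# Ratio scale: zero means "none", so the ratio is meaningful.
ratio_pulse = 120 / 60

print(f"Celsius ratio: {ratio_celsius:.3f}")
print(f"Kelvin ratio:  {ratio_kelvin:.3f}")
print(f"Pulse ratio:   {ratio_pulse:.3f}")
```

The temperature ratio drops from 2 to about 1.05 after the change of unit, which is why 30º C cannot be called twice as hot as 15º C.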
46. Scales of Measure
Nominal – qualitative classification of
equal value: gender, race, color, city
Ordinal - qualitative classification
which can be rank ordered:
socioeconomic status of families
Interval - Numerical or quantitative
data: can be rank ordered and sizes
compared : temperature
Ratio - Quantitative interval data along
with ratio: time, age.
47. CLINIMETRICS
Clinimetrics is a science in which
qualities are converted to meaningful
quantities by using a scoring system.
Examples: (1) Apgar score, based on
appearance, pulse, grimace, activity and
respiration, is used for neonatal prognosis.
(2) Smoking Index: no. of cigarettes, duration,
filter or not, whether pipe, cigar, etc.
(3) APACHE (Acute Physiology and Chronic
Health Evaluation) score: to quantify the
severity of the condition of a patient
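A scoring system of this kind can be sketched in a few lines. In the Apgar score each of the five signs is rated 0, 1 or 2 and the ratings are added to give a total from 0 to 10. The function name and the example ratings below are illustrative.

```python
def apgar_score(appearance, pulse, grimace, activity, respiration):
    """Add the five Apgar sign ratings (each 0, 1 or 2) into a 0-10 total."""
    signs = {
        "appearance": appearance,
        "pulse": pulse,
        "grimace": grimace,
        "activity": activity,
        "respiration": respiration,
    }
    for name, rating in signs.items():
        if rating not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1 or 2, got {rating!r}")
    return sum(signs.values())

# Hypothetical newborn: every sign rated 2 except grimace, rated 1.
score = apgar_score(appearance=2, pulse=2, grimace=1, activity=2, respiration=2)
print(score)
```

The qualitative observations become one number that can be compared across patients or tracked over time.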
52. Frequency Distributions
data distribution – pattern of
variability.
the center of a distribution
the ranges
the shapes
simple frequency distributions
grouped frequency distributions
midpoint
53. Tabulate the hemoglobin values of 30 adult
male patients listed below
Patient No   Hb (g/dl)    Patient No   Hb (g/dl)    Patient No   Hb (g/dl)
 1           12.0         11           11.2         21           14.9
 2           11.9         12           13.6         22           12.2
 3           11.5         13           10.8         23           12.2
 4           14.2         14           12.3         24           11.4
 5           12.3         15           12.3         25           10.7
 6           13.0         16           15.7         26           12.5
 7           10.5         17           12.6         27           11.8
 8           12.8         18            9.1         28           15.1
 9           13.2         19           12.9         29           13.4
10           11.2         20           14.6         30           13.1
54. Steps for making a
table
Step 1: Find the minimum (9.1) and maximum (15.7)
Step 2: Calculate the difference: 15.7 – 9.1 = 6.6
Step 3: Decide the number and width of
the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ...
Step 4: Prepare a dummy table with the columns
Hb (g/dl), Tally mark, No. of patients
56. Table Frequency distribution of 30 adult male
patients by Hb
Hb (g/dl)       No. of patients
9.0 – 9.9        1
10.0 – 10.9      3
11.0 – 11.9      6
12.0 – 12.9     10
13.0 – 13.9      5
14.0 – 14.9      3
15.0 – 15.9      2
Total           30
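The tabulation steps from slide 54 can be sketched as follows, using the 30 Hb values from slide 53 and classes 1 g/dl wide. The variable names are illustrative.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients listed on slide 53.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Steps 1 and 2: minimum, maximum and their difference.
lowest, highest = min(hb), max(hb)
spread = highest - lowest

# Step 3: classes of width 1 g/dl, identified by their lower limit.
first_class = math.floor(lowest)
last_class = math.floor(highest)

# Step 4: tally each value into its class.
tally = Counter(math.floor(value) for value in hb)
table = [(f"{lower}.0 - {lower}.9", tally.get(lower, 0))
         for lower in range(first_class, last_class + 1)]

for label, frequency in table:
    print(f"{label:<14}{frequency:>3}")
print(f"{'Total':<14}{sum(f for _, f in table):>3}")
```

The resulting frequencies match the table on slide 56.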
57. Table Frequency distribution of adult patients by
Hb and gender:
                    Gender
Hb (g/dl)       Male    Female    Total
<9.0               0         2        2
9.0 – 9.9          1         3        4
10.0 – 10.9        3         5        8
11.0 – 11.9        6         8       14
12.0 – 12.9       10         6       16
13.0 – 13.9        5         4        9
14.0 – 14.9        3         2        5
15.0 – 15.9        2         0        2
Total             30        30       60
58. Elements of a Table
An ideal table should have: Number, Title, Column headings, Foot-notes
Number – Table number for identification in a report
Title, place – Describe the body of the table, variables,
time period (what, how classified, where and when)
Column heading – Variable name, No., Percentages (%), etc.
Foot-note(s) – to describe some column/row headings,
special cells, source, etc.
59. Table II. Distribution of 120 (Madras) Corporation divisions
according to annual death rate based on registered deaths in
1975 and 1976
Death rate (/1000 per annum)    No. of divisions
7.0 - 7.9                         4 (3.3)
8.0 - 8.9                        13 (10.8)
9.0 - 9.9                        20 (16.7)
10.0 - 10.9                      27 (22.5)
11.0 - 11.9                      18 (15.0)
12.0 - 12.9                      11 (9.2)
13.0 - 13.9                      11 (9.2)
14.0 - 14.9                       6 (5.0)
15.0 - 15.9                       2 (1.7)
16.0 - 16.9                       4 (3.3)
17.0 - 18.9                       3 (2.5)
19.0 +                            1 (0.8)
Total                           120 (100.0)
Figures in parentheses indicate percentages
60. DIAGRAMS/GRAPHS
Discrete data
--- Bar charts (one or two groups)
Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
67. Descriptive statistics report:
Boxplot
- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean
- the skew of the distribution:
positive skew: mean > median & high-score whisker is longer
negative skew: mean < median & low-score whisker is longer
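The quantities a boxplot reports can be computed directly. The sketch below uses a small hypothetical set of hospital stays (in days) and Python's `statistics.quantiles` with the "inclusive" method; other quartile conventions give slightly different values. The direction of skew is judged here only by comparing the mean with the median, as on the slide.

```python
import statistics

# Hypothetical lengths of hospital stay (days) for 10 patients.
stay_days = [1, 2, 2, 3, 3, 4, 5, 7, 10, 21]

minimum, maximum = min(stay_days), max(stay_days)

# Lower quartile, median and upper quartile (inclusive method).
q1, median, q3 = statistics.quantiles(stay_days, n=4, method="inclusive")
mean = statistics.mean(stay_days)

# Skew judged from the mean-median comparison.
if mean > median:
    skew = "positive"
elif mean < median:
    skew = "negative"
else:
    skew = "none"

print(f"min={minimum} Q1={q1} median={median} Q3={q3} max={maximum}")
print(f"mean={mean} skew={skew}")
```

The single long stay of 21 days pulls the mean above the median, which is the signature of a positive skew and of a longer high-score whisker.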
68. Pie Chart
[Figure: pie chart. The prevalence of different degrees of hypertension in the population. Categories: Mild, Moderate, Severe. Slice values: 10%, 20%, 70%.]
•Circular diagram: the total is 100%
•Divided into segments, each
representing a category
•Decide adjacent category
•The size of each slice of the pie is
proportional to the amount for its category
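Because the whole circle stands for 100%, each slice's angle is its percentage share of 360 degrees. A minimal sketch with the three shares shown on the chart (10%, 20% and 70%); the helper name is illustrative.

```python
def slice_angle(percent):
    """Angle in degrees of a pie slice covering `percent` of the total."""
    return percent * 360 / 100

shares = [10, 20, 70]  # percentage shares shown on the chart
angles = [slice_angle(p) for p in shares]

for share, angle in zip(shares, angles):
    print(f"{share}% -> {angle} degrees")
```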
69. Bar Graphs
[Figure: bar graph. The distribution of risk factors among cases with cardiovascular diseases. X axis: risk factor (Smo, Alc, Chol, DM, HTN, No Exer, F-H). Y axis: number (0 to 25). Bar labels: 9, 12, 20, 16, 12, 8, 20.]
The height of each bar indicates
frequency
Frequency on the Y axis
and categories of the variable
on the X axis
The bars should be of equal
width and should not touch the
other bars
70. Bar chart
[Figure: bar chart. HIV cases enrollment in USA by gender. X axis: year (1986 to 1992). Y axis: enrollment in hundreds (0 to 12). Series: Men, Women.]
71. Stacked bar chart
[Figure: stacked bar chart. HIV cases enrollment in USA by gender. X axis: year (1986 to 1992). Y axis: enrollment in thousands (0 to 18). Series: Women, Men.]
72. Graphic Presentation of
Data
the histogram
(quantitative data)
the bar graph
(qualitative data)
the frequency polygon
(quantitative data)
74. General rules for designing
graphs
A graph should have a self-explanatory
legend
A graph should help the reader to understand
the data
Axes should be labeled and units of measurement
indicated
Scales are important. Start with zero (otherwise
use a // break)
Avoid graphs with a three-dimensional
impression; they may be misleading (the reader
visualizes them less easily)
76. Origin and development of
statistics in Medical Research
In 1929 a huge paper on the application of
statistics was published in Physiology
Journal by Dunn.
In 1937, 15 articles on statistical methods
by Austin Bradford Hill were published in
book form.
In 1948, an RCT of Streptomycin for
pulmonary TB was published, on which
Bradford Hill had a key influence.
Between 1952 and 1982, the use of statistics
in medicine grew 8-fold.