Bio-stats in Biomedical Research

Bio-statistics in Bio-medical Research
Dr. Shinjan Patra
D.M. (Endocrinology) Resident
All India Institute of Medical Sciences
Jodhpur

Introduction
• Collecting, analyzing, and interpreting data are
essential components of biomedical research
and require biostatistics.
• Doing various statistical tests has been made
easy by sophisticated computer software.

First learning with Bio-statistics

Evidence Based!!
• Evidence-based medicine (EBM) sits at the overlap
of clinical expertise, research evidence, and
patient preferences
• Clinical expertise: our clinical experience
alone may not improve our patients' lives
• Research evidence: a lot of research is
aimed at improving the lives of patients!
• Patient preferences: are our patients
always able to judge what is best for them?

Basic need for Bio-statistics
• To choose the right statistical test for the
computer to perform based on the nature of
data derived from one’s own research
• To understand if an analysis was performed
appropriately during review and interpretation
of others’ research

Process of Research Project
[Figure: Designing and implementing runs from the
research question, to the study plan (design), to
the actual study (implement). Drawing conclusions
runs in reverse: findings in the study are used to
infer the truth in the study, which is used to
infer the truth in the universe.]

Methods of Analysis
I. Descriptive Methods:
• Tables
• Diagrams
• Charts
II. Inferential Methods:
• Estimation
  - Point estimation: mean, proportion, etc.
  - Interval estimation: i.e. the confidence
    interval of the point estimate
• Hypothesis testing
  - Comparison between the treatments
  - Association

Overview
• Variables
• Distribution of Data
• Statistical Tests
• Hypothesis testing and Error
• Confounding/Bias
• Measures of Association
• Diagnostic Tests
• Regression analysis

Variables
• Any characteristic that can be observed,
measured, or categorized
• Divided into two types:
  - Categorical
  - Continuous

Categorical
• Not suitable for quantification; classified into
categories
• Nominal: named categories with no implied
value and no inherent order or superiority,
e.g. blood groups. Nominal data with only two
groups are referred to as dichotomous or
binary (e.g. male or female)
• Ordinal: named categories with an inherent
order or rank, e.g. grades of GO

Continuous
• Can take an infinite number of possible values
• Interval: equal intervals between values but no
meaningful zero point (e.g. body temperature
in °C)
• Ratio: equal intervals with a meaningful zero
point, so all mathematical operations are
valid (e.g. amount of secretions from a
diabetic wound)

Intervening Variable
• An intervening variable is on the causal pathway to the
outcome

Choice of Statistical test
• Parametric tests can be used with interval and
ratio data but not with nominal or ordinal
data
• Nonparametric tests can be used with any
type of variable, including nominal or ordinal
data

Central Tendency
• Estimates the “center” of the distribution
• Mean: representative of all data points and
the most efficient estimator of the middle of a
normal (Gaussian) distribution
  - Inappropriate as a measure of central
    tendency if data are skewed
  - Influenced by outlying values
  - Commonly used for interval and ratio data

• Median: the middle value of the ordered data
  - Not influenced by outlying values
  - More appropriate for skewed data
  - Commonly used for ordinal data
• Mode: the most frequent value
  - Particularly useful for describing data
    distributed in a bimodal pattern, when
    mean and median are not appropriate
  - Commonly used with nominal data
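
The effect of an outlying value on the three measures can be checked with a short sketch. The data below are hypothetical length-of-stay values invented for illustration; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical length-of-stay values in days; the last value is an outlier.
stay_days = [2, 3, 3, 4, 5, 6, 40]

mean_stay = statistics.mean(stay_days)      # pulled upward by the outlier
median_stay = statistics.median(stay_days)  # middle value, robust to the outlier
mode_stay = statistics.mode(stay_days)      # most frequent value

print("mean:", mean_stay, "median:", median_stay, "mode:", mode_stay)
```

Here the single value of 40 days moves the mean well above the median, which is why the median is the safer summary for skewed data.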

Normal Distribution
• Gaussian. Symmetric bell-shaped frequency
distribution, in which mean, median, and
mode all have the same value
• Appropriate statistical test would be a
parametric test such as a t-test or an analysis
of variance (ANOVA).
• Data are usually represented as mean (SD).

Skewed Distribution
• Positive (right) skew: long tail toward higher
values; negative (left) skew: long tail toward
lower values
• Appropriate statistical test would be a
nonparametric test, such as Wilcoxon test or
Mann-Whitney test.
• Data are usually represented as median (IQR)

Measures of Dispersion
• Range- difference between the highest and
the lowest values. Range can change
drastically when the study is repeated
• INTERQUARTILE RANGE (IQR) is the range
between the 25th and 75th percentiles or the
difference between the medians of the lower
half and upper half of the data and comprises
the middle 50% of the data

• VARIANCE is a measure of dispersion: the average
squared deviation from the mean
• STANDARD DEVIATION (SD) is the square root of
variance and is the most common measure of
dispersion used for normally distributed data
• STANDARD ERROR OF MEAN (SEM) is calculated
by dividing the SD by the square root of the
sample size (n).
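
The measures of dispersion above can be computed as in the following minimal sketch. The glucose values are hypothetical, and `statistics.quantiles` uses one of several quartile conventions, so other software may report slightly different quartiles for the same data.

```python
import math
import statistics

# Hypothetical fasting glucose values (mg/dL), invented for illustration.
glucose = [82, 85, 88, 90, 91, 94, 97, 101]

value_range = max(glucose) - min(glucose)        # highest minus lowest value
q1, q2, q3 = statistics.quantiles(glucose, n=4)  # quartile cut points
iqr = q3 - q1                                    # spread of the middle 50%
variance = statistics.variance(glucose)          # sample variance (divides by n - 1)
sd = statistics.stdev(glucose)                   # square root of the sample variance
sem = sd / math.sqrt(len(glucose))               # standard error of the mean

print(value_range, iqr, round(variance, 2), round(sd, 2), round(sem, 2))
```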

Statistical tests
• Parametric- assume the underlying population
to be normally distributed and are based on
means and SDs
• Non-parametric- No assumption of population
distribution

t Tests
• Student’s t test is a simple, commonly used
parametric test to compare the means of a
continuous variable between two groups
• Paired : Each patient/subject serves as his/her
own control before and after an intervention
• Unpaired: Two groups of patients/subjects are
compared with each other
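
As a rough sketch of what the software computes, the paired and unpaired t statistics can be written out by hand. The function names and data are hypothetical. The unpaired version assumes equal variances in the two groups (the classic Student form). Converting t to a p value needs the t distribution, which is left to a statistics package.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for a paired t test."""
    diffs = [a - b for a, b in zip(after, before)]  # within-subject changes
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

def unpaired_t(group1, group2):
    """Return (t, df) for Student's two-sample t test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(group1) - statistics.mean(group2)) / se
    return t, n1 + n2 - 2

# Hypothetical systolic pressures before and after an intervention.
t_paired, df_paired = paired_t([140, 152, 138, 147, 160], [132, 145, 135, 140, 150])
# Hypothetical scores in two independent groups.
t_unpaired, df_unpaired = unpaired_t([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(round(t_paired, 3), df_paired, round(t_unpaired, 3), df_unpaired)
```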

Analysis of Variance ( ANOVA)
• One-way ANOVA extends the two-sample t test
to comparisons of means across three or more
groups
• Other methods, such as planned or post hoc
comparisons, are conducted to examine
specific comparisons among individual means
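
The F statistic behind a one-way ANOVA is the ratio of between-group to within-group variability. The sketch below computes it by hand for hypothetical data; the function name is an illustration, not a library API, and the p value lookup from the F distribution is omitted.

```python
import statistics

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Three hypothetical treatment groups.
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(round(f_stat, 2), df_b, df_w)
```

A large F only says that at least one group mean differs; the planned or post hoc comparisons mentioned above are what identify which groups differ.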

Wilcoxon Rank-Sum test
• Mostly for skewed distributions and ordinal
data.
• Others: χ² test (chi-squared test), a common
test used to compare categorical data. Data
are first entered into a 2 × 2 contingency table
• If the numbers are small (expected value in
any cell is <5), an alternative test called
Fisher’s exact test is used.
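
For a 2 × 2 table both tests can be sketched with the standard library alone. The function names and counts are hypothetical. The chi-squared version applies no continuity correction. The Fisher version uses the common two-sided rule of summing all tables that are no more probable than the observed one.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic, p value (1 df), and smallest expected count."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p, min(expected)

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p value for a 2 x 2 table."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

chi2, p_chi2, min_expected = chi_squared_2x2(10, 20, 20, 10)  # large enough counts
p_fisher = fisher_exact_2x2(3, 1, 1, 3)                       # expected counts below 5
print(round(chi2, 3), round(p_chi2, 4), round(p_fisher, 4))
```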

Simplified form
• Nominal variable- Chi-squared / Fisher’s exact
• Ordinal Variable- Wilcoxon/ Mann-Whitney U
• Interval/Ratio- Normal distribution- t test/ANOVA
• Interval/Ratio- Skewed distribution- Wilcoxon/
Mann-Whitney U
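
The same decision table can be written as a small lookup. This is only a restatement of the slide; the function name and its arguments are hypothetical, and real test selection also depends on study design (paired or unpaired, number of groups).

```python
def suggest_test(variable_type, normally_distributed=False):
    """Map a variable type to the test family named in the simplified table."""
    variable_type = variable_type.lower()
    if variable_type == "nominal":
        return "Chi-squared / Fisher's exact"
    if variable_type == "ordinal":
        return "Wilcoxon / Mann-Whitney U"
    if variable_type in ("interval", "ratio"):
        if normally_distributed:
            return "t test / ANOVA"
        return "Wilcoxon / Mann-Whitney U"
    raise ValueError(f"unknown variable type: {variable_type!r}")

print(suggest_test("ratio", normally_distributed=True))
print(suggest_test("ordinal"))
```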

Hypothesis testing
• Null hypothesis refers to restating the
research hypothesis to one that proposes no
difference between groups being compared
• An alternative hypothesis proposes a difference
or association; it may be one-sided or two-sided

Errors
• TYPE I ERROR (false-positive, also known as a
rejection error) is rejection of a null
hypothesis that is actually true in the
population
• TYPE II ERROR (false-negative, also known as
an acceptance error) is failure to reject a null
hypothesis that is actually false

Interpretation of p Value
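• p VALUE is the probability of obtaining a result
at least as extreme as the one observed if the
null hypothesis is true, i.e. by chance alone.
• It is not itself the probability of committing a
type I error; that probability is set in advance
by the significance level (α).
• p value of 0.05 or less is commonly used to
denote significance
• This value informs the investigator that a
difference at least this large would arise less
than 5% of the time if the two samples came
from the same population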

• A lower p value (<0.01) indicates that the
observed result would be even less likely (<1%)
if the null hypothesis were true.
• A lower p value does not imply a higher strength
of association or clinical importance of an
association
• Factors that tend to decrease p value and
increase significance are increased sample size,
increased difference in control and experimental
means and less variance
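
The three factors listed above can be illustrated with a large-sample z approximation for a difference in means. This is a sketch under simplifying assumptions (equal group sizes, a common known SD); the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p value for a difference in means (z approximation)."""
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    z = mean_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

baseline = two_sided_p(mean_diff=5, sd=15, n_per_group=30)
larger_n = two_sided_p(mean_diff=5, sd=15, n_per_group=120)    # bigger sample
larger_diff = two_sided_p(mean_diff=10, sd=15, n_per_group=30)  # bigger difference
smaller_sd = two_sided_p(mean_diff=5, sd=7.5, n_per_group=30)   # less variance

print(round(baseline, 4), round(larger_n, 4), round(larger_diff, 4), round(smaller_sd, 4))
```

Each change on its own moves the same hypothetical comparison from non-significant to significant at the 0.05 level, without the underlying clinical question changing.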

Confidence interval
• Range of values expected, with a stated level of
confidence, to include the true population
value (e.g. the population mean)
• Typically 95% confidence intervals are used in
research
• Values at either extreme of this range are
called confidence limits

Sample Size and C.I
• The larger the sample size, the narrower the
C.I.
• p value and C.I together provide the best
information about the role of chance
• Sample size is an important determinant of
the power of the study to detect significant
differences
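
The narrowing of the confidence interval with sample size can be seen in a minimal sketch using the normal approximation. The function name and inputs are hypothetical; for small samples a t-based interval would be wider than the one shown here.

```python
import math
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * sd / math.sqrt(n)             # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Same hypothetical mean and SD, increasing sample size.
for n in (25, 100, 400):
    low, high = mean_ci(sample_mean=120, sd=20, n=n)
    print(n, round(low, 2), round(high, 2))
```

Because the margin scales with 1/√n, quadrupling the sample size halves the width of the interval.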

Confounding
• Distortion in a measure of effect that may arise
because we fail to control for other variables that
are previously known risk factors for the health
outcome being studied
• Can lead to the observation of apparent
differences between the study groups when they
do not truly exist, or conversely, the observation
of no difference when they do exist.

Confounding variable
• Independent risk factor (cause) of outcome
• Unevenly distributed among exposed and
unexposed
• Not on the causal pathway between exposure
and outcome

THE DIFFERENCE BETWEEN BIAS
AND CONFOUNDING
• Bias creates an association that is not true
• Confounding describes an association that is true,
but potentially misleading

Example
• Two groups, bottle-fed babies and breast-fed
babies, are compared for gastroenteritis (GE)
• Bias: finding less GE in bottle-fed babies
because they are brought to the doctor for
follow-up less often, so cases go unrecorded
• Confounding: better hygiene and less
crowding can prevent GE in either group; a
lower GE rate due to these factors does not
reflect a true protective effect of breast milk

Least square mean changes
• Relevant when there is study imbalance, such
that some blocks of experimental units are
under-represented relative to other blocks
• Least square means (estimated population
marginal means) provide an opportunity to
obtain an unbiased estimate of averages in
the face of this kind of study imbalance
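
A simplified sketch of the idea, assuming the simplest case in which every block has its own mean and no covariates are adjusted for: the raw mean weights each block by its size, whereas the least square mean gives each block equal weight. The data are hypothetical, and real least square means come from a fitted model in a statistics package.

```python
import statistics

# Hypothetical responses for ONE treatment arm recorded in two blocks
# (for example two study centres); block B is under-represented.
block_a = [10, 12, 11, 9, 10, 12, 11, 13]
block_b = [20, 22]

# Raw mean: dominated by block A because it contributes more observations.
raw_mean = statistics.mean(block_a + block_b)

# Least-square-style mean: average of the block means, equal weight per block.
ls_mean = statistics.mean([statistics.mean(block_a), statistics.mean(block_b)])

print("raw mean:", raw_mean, "least square mean:", ls_mean)
```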

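• In the usual 2 × 2 table, a and b are the exposed
subjects with and without the outcome; c and d
are the unexposed subjects with and without
the outcome
• Absolute risk = a/(a + b) in the exposed or
c/(c + d) in the unexposed
• Relative risk (risk ratio) = (a/[a + b]) / (c/[c + d])
• Hazard ratio is a measure of relative risk over
time in circumstances in which we are
interested not only in the total number of
events, but in their timing as well.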
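
The risk formulas above translate directly into code. The sketch assumes the conventional 2 × 2 layout (a, b = exposed with and without the outcome; c, d = unexposed with and without the outcome), which the slide does not spell out; the function name and counts are hypothetical.

```python
def risk_measures(a, b, c, d):
    """Absolute risks and relative risk from 2 x 2 counts.

    Assumed layout: a, b = exposed with / without the outcome;
                    c, d = unexposed with / without the outcome.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
    }

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the outcome.
result = risk_measures(a=20, b=80, c=10, d=90)
print(result)
```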

Key points
• Sensitivity and specificity are independent of
prevalence.
• Increased prevalence of disease increases PPV
• Reduced disease prevalence increases NPV
• Increasing the threshold for a positive test
reduces false-positives and increases specificity
• Reducing the threshold or cutoff value for a
positive test reduces false-negatives and
increases sensitivity
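
The dependence of PPV and NPV on prevalence follows from Bayes' theorem and can be sketched as below. Function names and numbers are hypothetical; `tp`, `fp`, `fn`, `tn` stand for true-positive, false-positive, false-negative and true-negative counts against the gold standard.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2 x 2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients who test positive
        "specificity": tn / (tn + fp),  # healthy patients who test negative
        "ppv": tp / (tp + fp),          # positive tests that are true disease
        "npv": tn / (tn + fn),          # negative tests that are truly healthy
    }

def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a given prevalence, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Same hypothetical test (90% sensitive, 90% specific) at two prevalences.
ppv_low, npv_low = predictive_values(0.9, 0.9, prevalence=0.10)
ppv_high, npv_high = predictive_values(0.9, 0.9, prevalence=0.50)
print(round(ppv_low, 3), round(npv_low, 3), round(ppv_high, 3), round(npv_high, 3))
```

With sensitivity and specificity held fixed, raising prevalence raises PPV and lowers NPV, as the key points state.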

Gold standard
• Unambiguous method of determining
whether or not a patient has a particular
disease or outcome

Regression Analysis
• Simple regression: many relationships between
two variables can be fit to a straight line,
e.g. a continuous response plotted against
duration of treatment
• Logistic regression: used when the response of
interest is dichotomous (binary) rather than
continuous, e.g. an outcome of dead or alive
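
A minimal sketch of both ideas: an ordinary least-squares line fitted by hand, and the logistic function that turns a linear predictor into a probability for a binary outcome. The data and function names are hypothetical. Fitting an actual logistic regression needs an iterative routine from a statistics package and is not shown.

```python
import math
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def logistic(linear_predictor):
    """Map a linear predictor to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-linear_predictor))

# Hypothetical response measured after 1 to 5 weeks of treatment.
weeks = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.0]
slope, intercept = fit_line(weeks, response)
print(round(slope, 2), round(intercept, 2))

# A linear predictor of 0 corresponds to a 50% probability of the event.
print(logistic(0))
```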

Summary
• Bio-statistics is an integral part of understanding
any bio-medical research and evidence-based
medicine.
• The concept of the various types of variables and
an understanding of the various statistical
tests are equally important
• Sample size and confidence interval concepts
hold the key

CAS-3 & CAS-4
• CAS-3: PPV 61%, NPV 60%, sensitivity 64%,
specificity 57%
• CAS-4: PPV 80%, NPV 64%, sensitivity 55%,
specificity 86%
• Source: M. Ph. Mourits et al.
