This document discusses key concepts in biostatistics used in biomedical research. It covers types of variables, measures of central tendency and dispersion, distributions of data, statistical tests for different situations, hypothesis testing and errors, measures of association, diagnostic tests, and regression analysis. Understanding biostatistics is important for evidence-based medicine and for improving patients' lives through rigorous research. Sample size, confidence intervals, and avoiding bias and confounding are important considerations in study design and interpretation.
2. Introduction
• Collecting, analyzing, and interpreting data are essential components of biomedical research and require biostatistics.
• Sophisticated computer software has made it easy to run a wide range of statistical tests.
5. Basic need for biostatistics
• To choose the right statistical test for the computer to perform, based on the nature of the data derived from one's own research
• To judge whether an analysis was performed appropriately when reviewing and interpreting others' research
7. Methods of Analysis
I. Descriptive methods:
  ◦ Tables
  ◦ Diagrams
  ◦ Charts
II. Inferential methods:
  ◦ Estimation
    ▪ Point estimation: mean, proportion, etc.
    ▪ Interval estimation: i.e., the confidence interval of a point estimate
  ◦ Hypothesis testing
    ▪ Comparison between treatments
    ▪ Association
8. Overview
• Variables
• Distribution of data
• Statistical tests
• Hypothesis testing and errors
• Confounding/bias
• Measures of association
• Diagnostic tests
• Regression analysis
9. Variables
• Any characteristic that can be observed, measured, or categorized
• Divided into two types:
  ◦ Categorical
  ◦ Continuous
10. Categorical
• Not suitable for quantification; classified into categories
  ◦ Nominal: named categories with no implied value and no inherent order or superiority, such as blood groups. Nominal data with only two groups are referred to as dichotomous or binary (e.g., male or female).
  ◦ Ordinal: named categories with an order or ranking, such as grades of GO
11. Continuous
• Can take an infinite number of possible values
  ◦ Interval: equal intervals between values but no meaningful zero point (e.g., body temperature in °C or °F)
  ◦ Ratio: equal intervals with a meaningful zero point, so all mathematical operations are valid (e.g., the amount of secretions from a diabetic wound)
14. Choice of Statistical test
• Parametric tests can be used with interval and ratio data, but not with nominal or ordinal data.
• Nonparametric tests can be used with any type of variable, including nominal or ordinal data.
16. Central Tendency
• Estimates the "center" of the distribution
  ◦ Mean: representative of all data points and the most efficient estimator of the middle of a normal (Gaussian) distribution
    ▪ Inappropriate as a measure of central tendency if the data are skewed
    ▪ Influenced by outlying values
    ▪ Commonly used for interval and ratio data
17. ◦ Median: not influenced by outlying values
    ▪ More appropriate for skewed data
    ▪ Commonly used for ordinal data
  ◦ Mode: particularly useful for describing data with a bimodal distribution, where the mean and median are not appropriate
    ▪ Commonly used with nominal data
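The effect of an outlying value on the three measures can be illustrated with a minimal sketch using Python's standard `statistics` module. The data (lengths of hospital stay) are hypothetical and chosen only so that one value is an obvious outlier.

```python
import statistics

# Hypothetical lengths of hospital stay in days; the last value is an outlier.
stay_days = [2, 3, 3, 4, 5, 6, 40]

mean_stay = statistics.mean(stay_days)      # pulled upward by the outlier
median_stay = statistics.median(stay_days)  # middle value, unaffected by the outlier
mode_stay = statistics.mode(stay_days)      # most frequent value

print(f"mean={mean_stay:.1f}, median={median_stay}, mode={mode_stay}")
```

Here the single long stay moves the mean well above most observations, while the median stays close to the bulk of the data, which is why the median is preferred for skewed data.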
18. Normal Distribution
• Gaussian: a symmetric, bell-shaped frequency distribution in which the mean, median, and mode all have the same value
• The appropriate statistical test is a parametric test, such as a t test or an analysis of variance (ANOVA).
• Data are usually reported as mean (SD).
20. Skewed Distribution
• Positive (right) skew or negative (left) skew
• The appropriate statistical test is a nonparametric test, such as the Wilcoxon test or the Mann-Whitney test.
• Data are usually reported as median (interquartile range, IQR).
22. Measures of Dispersion
• RANGE is the difference between the highest and the lowest values. The range can change drastically when the study is repeated.
• INTERQUARTILE RANGE (IQR) is the range between the 25th and 75th percentiles, equivalently the difference between the medians of the lower and upper halves of the data; it comprises the middle 50% of the data.
23. • VARIANCE is a measure of dispersion, calculated as the average squared deviation from the mean.
• STANDARD DEVIATION (SD) is the square root of the variance and is the most common measure of dispersion for normally distributed data.
• STANDARD ERROR OF THE MEAN (SEM) is calculated by dividing the SD by the square root of the sample size n.
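The dispersion measures above can be computed as in the following sketch, which uses only the Python standard library. The data are hypothetical. Two assumptions are worth noting: `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator, and `statistics.quantiles` uses its default "exclusive" method, so other percentile conventions may give slightly different quartiles.

```python
import math
import statistics

# Hypothetical measurements (any continuous variable).
values = [4, 8, 6, 5, 3, 7, 9, 6]

value_range = max(values) - min(values)         # highest minus lowest value
q1, q2, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1                                   # spread of the middle 50%
variance = statistics.variance(values)          # sample variance (n - 1 denominator)
sd = statistics.stdev(values)                   # square root of the variance
sem = sd / math.sqrt(len(values))               # SD divided by sqrt(n)

print(f"range={value_range}, IQR={iqr:.2f}, variance={variance:.2f}, "
      f"SD={sd:.2f}, SEM={sem:.3f}")
```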
24. Statistical tests
• Parametric: assume the underlying population is normally distributed and are based on means and SDs
• Nonparametric: make no assumption about the population distribution
25. t Tests
• Student's t test is a simple, commonly used parametric test for comparing the means of a continuous variable between two groups.
• Paired: each patient/subject serves as his or her own control, before and after an intervention
• Unpaired: two independent groups of patients/subjects are compared with each other
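The difference between the paired and unpaired forms can be sketched by computing the two t statistics by hand. This is an illustrative sketch, not a full test: the function names and blood pressure readings are hypothetical, the unpaired version assumes equal variances (pooled estimate), and converting t to a p value would still require the t distribution from tables or a statistics package.

```python
import math
import statistics

def unpaired_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, df

def paired_t(before, after):
    """Paired t statistic on within-subject differences; returns (t, df)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical systolic blood pressure readings (mmHg) for five subjects.
before = [150, 142, 160, 155, 148]
after = [140, 138, 150, 149, 141]

t_paired, df_paired = paired_t(before, after)
# Same numbers treated (incorrectly, for illustration) as two independent groups.
t_unpaired, df_unpaired = unpaired_t(before, after)

print(f"paired:   t={t_paired:.2f}, df={df_paired}")
print(f"unpaired: t={t_unpaired:.2f}, df={df_unpaired}")
```

With these numbers the paired statistic is much larger in magnitude, because pairing removes between-subject variability from the comparison.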
26. Analysis of Variance (ANOVA)
• One-way ANOVA extends the two-sample t test to three or more groups.
• Further methods, such as planned or post hoc comparisons, are used to examine specific comparisons among individual means.
27. Wilcoxon Rank-Sum test
• Used mostly for skewed distributions and ordinal data
• Others: the χ² (chi-squared) test is a common test for comparing categorical data. The data are first entered into a contingency table (2 × 2 in the simplest case).
• If the numbers are small (an expected value <5), an alternative, Fisher's exact test, is used.
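As a sketch of how the two tests for a 2 × 2 table work, the code below computes the Pearson chi-squared statistic (without continuity correction) and a two-sided Fisher's exact p value using only the standard library. The function names and the example counts are hypothetical. The chi-squared p value uses the fact that, with 1 degree of freedom, the upper tail equals `erfc(sqrt(chi2 / 2))`; the two-sided Fisher p value sums the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    # Expected count = row total * column total / grand total.
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p, expected

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p value for a 2x2 table."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum every table that is as probable as, or less probable than, the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical counts: large table for chi-squared, small table for Fisher's test.
chi2, p_chi2, expected = chi_squared_2x2(20, 30, 30, 20)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)

print(f"chi-squared={chi2:.2f}, p={p_chi2:.4f}")
print(f"Fisher's exact p={p_fisher:.4f}")
```

In the small table every expected count is 2, which is below 5, so it is the kind of table for which Fisher's exact test is the appropriate choice.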
29. Simplified form
• Nominal variable: chi-squared / Fisher's exact
• Ordinal variable: Wilcoxon / Mann-Whitney U
• Interval/ratio, normal distribution: t test / ANOVA
• Interval/ratio, skewed distribution: Wilcoxon / Mann-Whitney U
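The simplified scheme above amounts to a small decision rule, which can be written as a lookup function. This is only a sketch of the slide's summary; the function name `suggest_test` and its arguments are hypothetical, and a real choice of test also depends on study design (paired or unpaired, number of groups).

```python
def suggest_test(variable_type, normally_distributed=None):
    """Return the test family the simplified scheme lists for a variable type."""
    kind = variable_type.lower()
    if kind == "nominal":
        return "chi-squared / Fisher's exact"
    if kind == "ordinal":
        return "Wilcoxon / Mann-Whitney U"
    if kind in ("interval", "ratio"):
        if normally_distributed is None:
            raise ValueError("distribution shape is needed for interval/ratio data")
        if normally_distributed:
            return "t test / ANOVA"
        return "Wilcoxon / Mann-Whitney U"
    raise ValueError(f"unknown variable type: {variable_type!r}")

print(suggest_test("nominal"))
print(suggest_test("ratio", normally_distributed=False))
```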
30. Hypothesis testing
• The null hypothesis restates the research hypothesis as one that proposes no difference between the groups being compared.
• The alternative hypothesis proposes a difference or association and may be one-sided or two-sided.
31. Errors
• TYPE I ERROR (false positive, also known as a rejection error) is rejection of a null hypothesis that is actually true in the population.
• TYPE II ERROR (false negative, also known as an acceptance error) is failure to reject a null hypothesis that is actually false.
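33. Interpretation of p Value
• The p VALUE is the probability of obtaining a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true. It is not the probability that the null hypothesis is true.
• The probability of committing a type I error is set by the significance level (α), the threshold against which the p value is compared.
• A p value of 0.05 or less is commonly used to denote significance.
• A p value below 0.05 tells the investigator that, if the two samples came from the same population, a difference this large would be expected less than 5% of the time.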
34. • A lower p value (<0.01) means the observed result would be even less likely (less than 1%) to arise by chance alone if the null hypothesis were true.
• A lower p value does not imply a stronger association or greater clinical importance.
• Factors that tend to decrease the p value and increase significance are a larger sample size, a larger difference between the control and experimental means, and less variance.
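The three factors in the last point can be seen numerically in a minimal sketch. It assumes a two-sided z test for the difference between two group means with a known, equal SD in both groups, which is a simplification chosen for illustration (the slides do not specify a test here); the function name and the numbers are hypothetical.

```python
import math

def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p value from a z test comparing two group means (known, equal SD)."""
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    z = mean_diff / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Same difference (5 units) and SD (10 units), increasing sample size.
p_small = two_sided_p(5, 10, 10)
p_medium = two_sided_p(5, 10, 50)
p_large = two_sided_p(5, 10, 200)

# Same sample size, but a larger difference or a smaller SD.
p_bigger_diff = two_sided_p(10, 10, 10)
p_less_variance = two_sided_p(5, 5, 10)

print(f"n=10: p={p_small:.4f}, n=50: p={p_medium:.4f}, n=200: p={p_large:.2e}")
```

The underlying difference is identical in the first three calls, so the falling p value reflects only the larger sample, which is one reason a small p value does not by itself indicate a large or clinically important effect.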
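36. Confidence interval
• A range of values, calculated from the sample, that is expected to include the true population mean
• 95% confidence intervals are typically used in research: if the study were repeated many times, about 95% of the intervals so constructed would contain the true value.
• The values at either extreme of this range are called confidence limits.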
37. Sample Size and C.I.
• The larger the sample size, the narrower the C.I.
• The p value and C.I. together provide the best information about the role of chance.
• Sample size is an important determinant of the power of a study to detect significant differences.
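The link between sample size and the width of the confidence interval can be sketched as follows. The interval is computed as mean ± 1.96 × SEM, a normal approximation; for small samples a t critical value would normally replace 1.96. The function name is hypothetical, and the "large" sample is simply the small hypothetical sample repeated four times, used only to show the effect of n.

```python
import math
import statistics

def mean_ci95(sample):
    """Approximate 95% CI for the mean: mean +/- 1.96 * SEM (normal approximation)."""
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - 1.96 * sem, m + 1.96 * sem

small_sample = [4, 8, 6, 5, 3, 7, 9, 6]   # hypothetical, n = 8
large_sample = small_sample * 4           # same values repeated, n = 32

low_small, high_small = mean_ci95(small_sample)
low_large, high_large = mean_ci95(large_sample)

print(f"n=8:  95% CI {low_small:.2f} to {high_small:.2f}")
print(f"n=32: 95% CI {low_large:.2f} to {high_large:.2f}")
```

The two values returned for each sample are the confidence limits; the interval for the larger sample is roughly half as wide because the SEM shrinks with the square root of n.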
38. Confounding
• A distortion in a measure of effect that may arise when we fail to control for other variables that are known risk factors for the health outcome being studied
• Can lead to apparent differences between study groups when none truly exist or, conversely, to no observed difference when one does exist
39. Confounding variable
• An independent risk factor (cause) of the outcome
• Unevenly distributed between the exposed and unexposed groups
• Not on the causal pathway between exposure and outcome
40. THE DIFFERENCE BETWEEN BIAS AND CONFOUNDING
▪ Bias creates an association that is not true.
▪ Confounding describes an association that is true but potentially misleading.
41. Example
• Two groups, bottle-fed babies and breast-fed babies, are compared for gastroenteritis (GE).
• Bias: finding less GE in bottle-fed babies because they attend fewer follow-up visits with the doctor, so cases go unrecorded
• Confounding: better hygiene and less crowding can prevent GE in either group, so a lower GE rate linked to these factors does not by itself reflect a true protective effect of breast milk
42. Least squares mean changes
• Used when there is study imbalance, such that some blocks of experimental units are under-represented relative to other blocks
• Least squares means (estimated population marginal means) provide an unbiased estimate of averages in the face of this kind of imbalance
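A toy illustration of why imbalance matters: in the simplest case, with one treatment observed in several blocks and no covariates, the least squares mean reduces to the unweighted average of the block means, whereas the raw mean is weighted by block size. The sketch below assumes that simple case; in practice least squares means are obtained from a fitted linear model. The data and site names are hypothetical.

```python
import statistics

# Hypothetical outcome scores for one treatment, grouped by block (study site).
# site_B is under-represented relative to site_A.
scores_by_block = {
    "site_A": [10, 12, 11, 13, 12, 14],
    "site_B": [20, 22],
}

all_scores = [s for block in scores_by_block.values() for s in block]
raw_mean = statistics.mean(all_scores)    # dominated by the larger block

block_means = [statistics.mean(b) for b in scores_by_block.values()]
ls_mean = statistics.mean(block_means)    # each block weighted equally

print(f"raw mean={raw_mean:.2f}, least squares mean={ls_mean:.2f}")
```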
45. • Absolute risk = a/(a + b) or c/(c + d)
• Relative risk (risk ratio) = (a/[a + b]) / (c/[c + d])
• The hazard ratio is a measure of relative risk over time, used when we are interested not only in the total number of events but also in their timing.
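The risk formulas can be worked through with a short sketch. The cell labels are not defined in the text available here, so the code assumes the conventional 2 × 2 layout implied by the formulas: a = exposed with the outcome, b = exposed without it, c = unexposed with the outcome, d = unexposed without it. The function name and the counts are hypothetical.

```python
def risk_measures(a, b, c, d):
    """Absolute risks and relative risk from an (assumed) conventional 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    relative_risk = risk_exposed / risk_unexposed
    return risk_exposed, risk_unexposed, relative_risk

# Hypothetical cohort: 100 exposed and 100 unexposed subjects.
risk_exp, risk_unexp, rr = risk_measures(30, 70, 10, 90)
print(f"risk in exposed={risk_exp:.2f}, risk in unexposed={risk_unexp:.2f}, RR={rr:.2f}")
```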
49. Key points
• Sensitivity and specificity are independent of prevalence.
• Increased disease prevalence increases PPV (positive predictive value).
• Reduced disease prevalence increases NPV (negative predictive value).
• Raising the threshold for a positive test reduces false positives and increases specificity.
• Lowering the threshold or cutoff value for a positive test reduces false negatives and increases sensitivity.
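The dependence of the predictive values on prevalence follows from Bayes' rule and can be checked with a small sketch. The function name and the test characteristics (90% sensitivity and specificity) are hypothetical values chosen for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Same test (fixed sensitivity and specificity) at two prevalences.
ppv_low, npv_low = predictive_values(0.90, 0.90, 0.01)
ppv_high, npv_high = predictive_values(0.90, 0.90, 0.20)

print(f"prevalence 1%:  PPV={ppv_low:.3f}, NPV={npv_low:.3f}")
print(f"prevalence 20%: PPV={ppv_high:.3f}, NPV={npv_high:.3f}")
```

Sensitivity and specificity are inputs that stay fixed, while PPV rises and NPV falls as prevalence increases.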
52. Regression Analysis
• Simple regression: many relationships between variables can be fitted to a straight line, for example a relationship involving duration of treatment.
• Logistic regression: used when the response of interest is dichotomous (binary) rather than continuous, for example an outcome of dead or alive.
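The two model forms can be sketched briefly. The first function fits a straight line by ordinary least squares; the second only evaluates the logistic model for given coefficients, since fitting a logistic regression needs an iterative procedure normally left to statistical software. The function names, the data, and the coefficients are all hypothetical.

```python
import math
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def logistic_probability(intercept, slope, x):
    """Logistic model: probability of the binary outcome at predictor value x."""
    return 1 / (1 + math.exp(-(intercept + slope * x)))

# Hypothetical data: wound area (cm^2) against weeks of treatment.
weeks = [1, 2, 3, 4, 5]
wound_area = [10, 8, 6, 4, 2]

slope, intercept = fit_line(weeks, wound_area)
print(f"wound area = {intercept:.1f} + ({slope:.1f}) * weeks")

# Hypothetical logistic coefficients: the output is always between 0 and 1.
p_event = logistic_probability(intercept=-2.0, slope=0.5, x=4)
print(f"predicted probability of the outcome: {p_event:.2f}")
```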
53. Summary
• Biostatistics is an integral part of understanding any biomedical research and evidence-based medicine.
• Understanding the types of variables and the various statistical tests is equally important.
• The concepts of sample size and confidence limits hold the key.