Basic Statistics for application in Medical Assessment

Medical Statistics
Just The Basics

Percentages
Occurrence expressed as a proportion of 100.
Divide the number of items by the total
number and multiply by 100.
Usually used to report frequency.

Mean
Arithmetic mean or average.
Compare with the median to assess whether the data
are normally distributed.
In symmetrical data the mean and median are close;
in skewed data they differ.

Median
Mid-point of the data; use when the distribution is
not symmetrical.
Represents the central location.

Mode
Most frequently occurring event.
The most common occurrence.
A bi-modal distribution suggests that two populations
are merged together, so the arithmetic mean
may not be the best estimate.
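
The difference between mean, median and mode is easiest to see on skewed data. A minimal sketch using Python's standard `statistics` module; the `stay_days` values are made up for illustration:

```python
import statistics

# Hypothetical lengths of hospital stay in days; one long stay skews the data.
stay_days = [2, 3, 3, 4, 5, 6, 30]

mean_stay = statistics.mean(stay_days)      # pulled upwards by the single long stay
median_stay = statistics.median(stay_days)  # middle value of the sorted data
mode_stay = statistics.mode(stay_days)      # most frequently occurring value

print(mean_stay, median_stay, mode_stay)
```

Here the mean is well above the median, which hints that the data are not symmetrical.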

Standard deviation
Measure of the spread of data around the
average.
For normally distributed data:
(µ ± σ) contains 68.3% of the total data.
(µ ± 2σ) contains 95.4% of the total data.
(µ ± 3σ) contains 99.7% of the total data.
Used to check whether data are normally distributed and to
find outliers.
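
The coverage figures for a normal distribution can be checked with the error function. This is a sketch; `coverage_within` is a hypothetical helper name, not part of any library:

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * coverage_within(k), 1))
```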

Confidence intervals
Range in which the true value is likely to lie.
Related to the size of the sample.
A 99% CI means we are 99% sure that the true effect
is in the range x to y.
Larger studies have narrower CIs.
Meta-analysis is a technique for bringing together results
from a number of similar studies to
estimate the overall effect.
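
A rough illustration of how sample size affects the interval. It assumes the normal approximation (multiplier 1.96 for 95%); for very small samples a t-distribution multiplier would be more appropriate. The readings are invented:

```python
import math
import statistics

def mean_ci_95(sample):
    """Approximate 95% CI for the mean, using the normal multiplier 1.96."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error of the mean
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical systolic blood pressure readings (mmHg).
small_study = [118, 125, 131, 122, 140, 128]
large_study = small_study * 10  # same values repeated: ten times the sample size

low_s, high_s = mean_ci_95(small_study)
low_l, high_l = mean_ci_95(large_study)
print(high_s - low_s, high_l - low_l)  # interval widths
```

The interval for the larger sample is considerably narrower, although both are centred on the same mean.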

P-values
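Level of significance / test of hypothesis.
Probability value.
Probability that a result at least as extreme as the one observed
would occur by chance if the null hypothesis were true.
The lower the p-value, the lower the chance and hence
the higher the significance.
Null hypothesis: the assumption that there is no real difference or effect.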

Parametric tests
T-test
Compares two samples and tests the probability that the sample means are
the same.
Analysis of variance (ANOVA)
Group of techniques used to compare more than 2 samples to verify whether
they come from the same population.
Parametric tests should only be used when the data are normally distributed.
(The Kolmogorov-Smirnov test tests the hypothesis that the data are from a
normal distribution.)
Trick: skewed data can often be normalized by taking logarithms.
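
To show what a t-test computes, here is a sketch of the two-sample t statistic assuming equal variances in both groups (Student's version). The data and the helper name `pooled_t_statistic` are illustrative. The p-value would then be read from the t distribution with the stated degrees of freedom, for example with a statistics package.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical measurements in two groups.
treated = [5.1, 4.8, 5.6, 5.0, 4.7]
control = [5.9, 6.1, 5.7, 6.3, 6.0]

t_stat = pooled_t_statistic(treated, control)
degrees_of_freedom = len(treated) + len(control) - 2
print(t_stat, degrees_of_freedom)
```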

Non-parametric tests
Mann-Whitney U test
Wilcoxon signed-rank test
Kruskal-Wallis test
Friedman test

Chi-squared test
Measures the difference between actual (observed) and expected frequencies.
If the observed and expected frequencies are the same, χ² is zero.
The larger the difference, the higher χ².
It gives only an approximate p-value.
Fisher’s exact test gives an exact p-value and is preferred when the numbers are small.
Yates’ continuity correction is applied to improve the accuracy of the p-value.
The Mantel-Haenszel test is an extension that compares several 2-way tables.
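
A sketch of the chi-squared calculation for a single 2x2 table, with optional Yates' correction. The function name and the example counts are hypothetical. The p-value formula used here is valid only for 1 degree of freedom.

```python
import math

def chi_squared_2x2(a, b, c, d, yates=False):
    """Chi-squared statistic and p-value (1 degree of freedom) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected count in each cell = row total * column total / grand total.
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    chi2 = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)  # Yates' continuity correction
        chi2 += diff ** 2 / e
    # With 1 degree of freedom, chi2 is the square of a standard normal variable.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2_plain, p_plain = chi_squared_2x2(20, 10, 10, 20)
chi2_yates, p_yates = chi_squared_2x2(20, 10, 10, 20, yates=True)
print(chi2_plain, p_plain, chi2_yates, p_yates)
```

The corrected statistic is smaller, so the corrected p-value is more conservative.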

Risk ratio
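Also called relative risk.
Risk is the probability that an event will occur,
obtained by dividing the number of events by
the number of people at risk.
Risk ratio is the risk in one group divided by the risk in the comparison group.
If risk ratio > 1, the rate of the event is increased.
If risk ratio < 1, the rate of the event is decreased.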
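
A small worked sketch of risk and risk ratio, with invented counts for an exposed group and a control group:

```python
def risk(events, people_at_risk):
    """Risk = number of events / number of people at risk."""
    return events / people_at_risk

def risk_ratio(events_exposed, n_exposed, events_control, n_control):
    """Risk in the exposed group divided by risk in the control group."""
    return risk(events_exposed, n_exposed) / risk(events_control, n_control)

# Hypothetical study: 30 of 100 exposed and 15 of 100 controls had the event.
rr = risk_ratio(30, 100, 15, 100)
print(rr)  # above 1, so the event is more frequent in the exposed group
```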

Odds ratio
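Odds are the ratio of the number of times an event occurs to the number of
times it does not occur.
Odds ratio is the odds in one group divided by the odds in the comparison group.
Odds ratio > 1: rate of event is increased.
Odds ratio < 1: rate of event is decreased.
Odds ratio is usually given with a 95% CI.
If the 95% CI does not include 1, the odds ratio is statistically significant.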
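
One common way to attach a 95% CI to an odds ratio is the log (Woolf) method, sketched below as an assumption rather than the only option. The counts are made up. The result is usually judged significant when the interval excludes 1.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """a, b = events / non-events in the exposed group; c, d = the same in controls."""
    odds_ratio = (a / b) / (c / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high

or_value, ci_low, ci_high = odds_ratio_with_ci(30, 70, 15, 85)
significant = not (ci_low <= 1 <= ci_high)  # True when the CI excludes 1
print(or_value, ci_low, ci_high, significant)
```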

Risk reduction and number needed
to treat
Relative risk reduction (RRR) / absolute risk reduction (ARR).
Number needed to treat (NNT).
How often does the treatment work?
ARR = difference between the event rate in the intervention group and the event rate in the control group (as %).
NNT = number of patients who need to be treated for one to benefit = 100/ARR.
RRR = proportion by which the intervention reduces the event
rate = ARR/control event rate.
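
A worked sketch of the three quantities, taking the ARR as a positive difference in percentage points. The counts and the function name `risk_reduction` are hypothetical. It assumes the two event rates differ, otherwise the NNT is undefined.

```python
def risk_reduction(events_control, n_control, events_treated, n_treated):
    control_rate = 100 * events_control / n_control  # event rate as %
    treated_rate = 100 * events_treated / n_treated
    arr = abs(treated_rate - control_rate)           # absolute risk reduction, in % points
    nnt = 100 / arr                                  # number needed to treat
    rrr = arr / control_rate                         # relative risk reduction, as a proportion
    return arr, nnt, rrr

# Hypothetical trial: 10 of 100 controls and 6 of 100 treated patients had the event.
arr, nnt, rrr = risk_reduction(10, 100, 6, 100)
print(arr, nnt, rrr)
```

In this made-up trial, 25 patients would need to be treated for one to benefit.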

Correlation
Linear relationship/association between 2
variables.
Positive and negative correlation.
(0-0.2) very low and probably meaningless, (0.2-0.4) low,
(0.4-0.6) reasonable, (0.6-0.8) high and (0.8-1.0)
very high (worth rechecking).
Reported as r or r².
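
A sketch of the Pearson correlation coefficient computed from its definition. The paired values are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired measurements.
age = [30, 40, 50, 60, 70]
systolic_bp = [118, 124, 131, 135, 144]

r = pearson_r(age, systolic_bp)
r_squared = r ** 2  # proportion of variation in one variable explained by the other
print(r, r_squared)
```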

Regression
Fit a line through a set of points.
How one set of data relates to other.
Regression line/ coefficient. (y=ax+b)
Logistic regression: used when each subject can belong to only one of 2 groups.
Poisson regression: used to study counts of rare events over a period of time.
Correlation measures strength of association and regression quantifies the
association.
Regression only valid to predict inside the range of the data.
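
A least-squares sketch of fitting y = ax + b, with a guard against predicting outside the range of the data. The function names and the data are illustrative assumptions:

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    a = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    b = mean_y - a * mean_x
    return a, b

def predict(x_new, a, b, x_observed):
    """Predict y, refusing to extrapolate outside the observed x range."""
    if not min(x_observed) <= x_new <= max(x_observed):
        raise ValueError("x_new is outside the range of the data")
    return a * x_new + b

# Hypothetical paired measurements.
age = [30, 40, 50, 60, 70]
systolic_bp = [118, 124, 131, 135, 144]

a, b = fit_line(age, systolic_bp)
print(a, b, predict(45, a, b, age))
```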

Survival analysis: Life tables and
Kaplan-Meier plots
Visual analysis of what has happened to a population over time.
Used for the time until a single event occurs, when information is available only for
a limited duration (censored observations).
Life table: table of the proportion of patients surviving over time.
The Kaplan-Meier approach recalculates survival each time an end event occurs
in the dataset.
Represented as a survival plot, which allows survival to be compared between groups.
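
A minimal sketch of the Kaplan-Meier calculation: at each time where an end event occurs, survival is multiplied by the proportion of those still at risk who did not have the event. The follow-up data are made up. Patients censored at a given time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each time where an end event occurred.

    times  : follow-up time for each patient
    events : 1 if the end event occurred at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Hypothetical follow-up in months for 8 patients.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```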

Cox Regression Model
Cox proportional hazards survival model.
Hazard Ratio (HR).
Provides the estimate of the effect that different
factors have on the time until the end event.

Sensitivity, specificity and predictive
value
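Sensitivity = $\frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$
Specificity = $\frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$
Positive predictive value = $\frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$
Negative predictive value = $\frac{\text{True Negative}}{\text{True Negative} + \text{False Negative}}$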
Likelihood ratio (positive) = $\frac{\text{Sensitivity}}{1 - \text{Specificity}}$
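
These measures all come from the four cells of a 2x2 table of test result against true disease status. A sketch with invented counts; it assumes specificity is below 1, otherwise the positive likelihood ratio is undefined.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """tp/fp/fn/tn = true positive, false positive, false negative, true negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        "positive_lr": sensitivity / (1 - specificity),
    }

# Hypothetical screening results for 300 people.
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in metrics.items():
    print(name, round(value, 3))
```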

Level of agreement
Measurement of how well tests or observers agree.
Agreement of 0: no agreement beyond chance.
Agreement of 1: perfect agreement.

Incidence
Number of new cases of a condition over a given time, as a
percentage of the population.

Prevalence
Number of existing cases at a single point in
time, as a percentage of the population.

Power
Probability that the study will detect a
statistically significant difference if one truly exists.

Bayesian statistics
Based on the information already available.
A prior distribution is set up and the new dataset is
analyzed.
Combines prior belief with the new
dataset.
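
One simple way to illustrate combining a prior with new data is a Beta prior on a response rate, updated with observed successes and failures. The Beta-binomial model and all numbers here are assumptions chosen for illustration.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binary outcomes."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

prior = (2, 8)  # hypothetical prior belief: response rate around 20%
posterior = update_beta(*prior, successes=12, failures=8)  # new data: 12 of 20 respond

print(beta_mean(*prior), beta_mean(*posterior))
```

The posterior mean lies between the prior mean and the response rate seen in the new data.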

Alpha (𝛼)
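Significance level chosen before the study; the p-value is compared against it.
Probability of rejecting a null hypothesis that is actually true (type I error).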

Analysis of covariance (ANCOVA)
Extension of ANOVA that includes continuous variables (covariates) in the model.

Beta (𝛽)
Probability of accepting a null hypothesis that is
actually false (type II error).
Power of any study = 1-𝛽.

Binomial distribution
Distribution of data that can take only one of 2 values (for example, event or no event).

Bonferroni
Correction that adjusts the significance level when multiple comparisons are made on the same data.
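
A sketch of the Bonferroni correction: the significance level is divided by the number of comparisons, and each p-value is judged against that stricter threshold. The p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag which p-values remain significant after Bonferroni correction."""
    threshold = alpha / len(p_values)  # stricter per-comparison significance level
    return [p <= threshold for p in p_values]

p_values = [0.001, 0.012, 0.03, 0.2]  # hypothetical results of four comparisons
flags = bonferroni(p_values)
print(flags)
```

With four comparisons the threshold becomes 0.0125, so 0.03 no longer counts as significant.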

Descriptive statistics
Measures that describe data in a sample.
Mean, median, standard deviation, quartiles,
histogram.

Fisher’s exact test
Accurate test of association between
categorical variables.

Kappa (𝜅)
Measure of level of agreement between 2
categorical measures.
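
A sketch of Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The ratings are invented. It assumes chance agreement is below 1.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two lists of categorical ratings of the same subjects."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's proportion, summed over categories.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical readings of 10 scans by two observers.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```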

Log rank test
Non-parametric test that is used for
comparison of survival estimates.

Nominal data
Data that can be placed into named categories
without definite order.
Example: eye colour.

Receiver Operating Characteristic (ROC)
Changing the cut-off value of a test typically
increases sensitivity at the cost of a decrease in
specificity, or vice versa.
A curve showing the sensitivity and specificity at
different possible cut-off values.
Helps to choose the cut-off value.
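
A sketch of how the points of an ROC curve arise: sensitivity and specificity are recomputed for each candidate cut-off. The scores and disease labels are invented. A score at or above the cut-off is assumed to count as a positive test.

```python
def roc_points(scores, has_disease, cutoffs):
    """Return (cut-off, sensitivity, specificity) for each candidate cut-off."""
    points = []
    pairs = list(zip(scores, has_disease))
    for cutoff in cutoffs:
        tp = sum(1 for s, d in pairs if s >= cutoff and d)
        fn = sum(1 for s, d in pairs if s < cutoff and d)
        tn = sum(1 for s, d in pairs if s < cutoff and not d)
        fp = sum(1 for s, d in pairs if s >= cutoff and not d)
        points.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return points

# Hypothetical test scores and true disease status for 8 patients.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
has_disease = [False, False, False, True, False, True, True, True]

points = roc_points(scores, has_disease, cutoffs=[3, 5, 7])
for cutoff, sens, spec in points:
    print(cutoff, sens, spec)
```

In this example, raising the cut-off lowers sensitivity and raises specificity.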

Skewed data
Lack of symmetry in distribution of the data.
