2. Percentages
Occurrence in a proportion of 100.
Divide the number of items by the total
number and multiply by 100.
Usually applicable in calculating
frequency.
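As a minimal sketch of the calculation described above (the function name and the patient counts are hypothetical, chosen only for illustration):

```python
def percentage(count, total):
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total

# Hypothetical example: 12 of 48 patients reported a side effect.
print(percentage(12, 48))  # 25.0
```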
3. Mean
Arithmetic mean or average.
Compare with the median to assess whether the data
are normally distributed: the two are close when the
distribution is symmetrical.
Need to understand the difference between
mean and median.
4. Median
Mid-point to use when the data distribution is
not symmetrical.
Represents the average location.
5. Mode
Most frequently occurring event.
The most common occurrence.
Bi-modal distribution means that two populations
are merged together and the arithmetic mean
may not be the best estimate.
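The difference between mean, median and mode can be illustrated with a small sketch using Python's standard `statistics` module. The data are invented; one very long stay is included on purpose to skew the distribution.

```python
import statistics

# Hypothetical lengths of hospital stay in days; one long stay skews the data.
stays = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(stays)      # pulled upwards by the 40-day stay
median = statistics.median(stays)  # middle value, robust to the outlier
mode = statistics.mode(stays)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean sits well above the median, which is a hint that the distribution is not symmetrical and the median may describe a typical stay better.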
6. Standard deviation
Measure of spread of data around the
average.
For normally distributed data:
(µ ± σ) contains 68.3% of the total data.
(µ ± 2σ) contains 95.4% of the total data.
(µ ± 3σ) contains 99.7% of the total data.
Used to check if data are normally distributed and to
find outliers.
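A rough sketch of both uses, assuming a simple "more than 2 standard deviations from the mean" rule for flagging outliers (the threshold and the measurements are illustrative assumptions, not a rule from these notes):

```python
import statistics
from statistics import NormalDist

# Share of a normal distribution within 1, 2 and 3 standard deviations.
z = NormalDist()
coverage = [z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)]
print([round(100 * c, 1) for c in coverage])

# Hypothetical measurements; the last value is suspiciously high.
values = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.0, 9.0]
mu = statistics.mean(values)
sigma = statistics.stdev(values)  # sample standard deviation (n - 1)

# Flag values more than 2 standard deviations from the mean.
outliers = [v for v in values if abs(v - mu) > 2 * sigma]
print(outliers)
```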
7. Confidence intervals
Range in which the true value is likely to lie.
Related to the size of the sample.
99% CI means we are 99% sure that the true effect
is in the range x to y.
Larger studies have narrower CIs.
Meta-analysis is a technique for bringing results
together from a number of similar studies to
estimate the overall effect.
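The link between sample size and interval width can be sketched with a normal-approximation confidence interval for a mean. The helper `mean_ci` and the data are hypothetical; repeating the same values eight times is only a device to keep the spread fixed while increasing n, not a realistic sample.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of a sample."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)      # about 1.96 for a 95% CI
    return mean - z * sem, mean + z * sem

small = [5.1, 4.9, 5.6, 4.4, 5.0, 5.3, 4.7, 5.0]
large = small * 8  # illustrative only: same spread, eight times as many values

lo_s, hi_s = mean_ci(small)
lo_l, hi_l = mean_ci(large)
print(hi_s - lo_s, hi_l - lo_l)  # the larger sample gives the narrower interval
```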
8. P-values
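Level of significance / test of hypothesis.
Probability value.
Probability that the observed result (or a more extreme one)
occurred by chance, assuming the null hypothesis is true.
The lower the p-value, the lower the chance and hence
the higher the significance.
Null hypothesis: the assumption that there is no real
difference or effect.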
9. Parametric tests
T-test
Compares two samples and tests the probability that the sample means are
the same.
Analysis of variance (ANOVA)
Group of techniques used to compare more than 2 samples to verify if
they come from the same population.
Should only be used when data are normally distributed.
(The Kolmogorov-Smirnov test tests the hypothesis that the data are from
a normal distribution.)
Trick: skewed data can often be normalized by taking logarithms.
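As a sketch of what a two-sample t-test computes, the block below calculates the t statistic using Welch's variant (which does not assume equal variances; this choice is an assumption, as the notes do not say which form is meant). The readings are invented. Turning t into a p-value needs the t distribution, which the standard library does not provide, so that step is left out.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical blood pressure readings for two groups.
treated = [118, 122, 115, 120, 117, 121]
control = [128, 131, 125, 130, 127, 133]

t, df = welch_t(treated, control)
print(round(t, 2), round(df, 1))
```

A t statistic far from zero, relative to the degrees of freedom, suggests the two sample means are unlikely to be the same.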
11. Chi-squared test
Tests the difference between actual (observed) and expected frequencies.
If observed and expected frequencies are the same, χ² would be zero.
The larger the difference, the higher χ² would be. The test only gives an
approximate p-value.
Fisher’s exact test gives an exact p-value and is used for small samples.
Yates’ continuity correction is done to improve the accuracy of p-values.
Mantel-Haenszel test is its extension to compare several 2-way tables.
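A minimal sketch of the calculation for a 2x2 table, without Yates' correction. The function name and counts are hypothetical. The p-value line relies on the fact that, with 1 degree of freedom, the upper tail of the chi-squared distribution equals erfc(sqrt(χ²/2)); it does not generalise to larger tables.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (no Yates correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n  # expected count if rows and columns are unrelated
            chi2 += (observed - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # valid for 1 degree of freedom only
    return chi2, p

# Hypothetical counts: rows = treated / control, columns = improved / not improved.
chi2, p = chi_squared_2x2([[30, 20], [18, 32]])
print(round(chi2, 2), round(p, 3))
```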
12. Risk ratio
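Relative risk.
Risk is the probability that an event will occur,
obtained by dividing the number of events by the
number of people at risk.
Risk ratio is the risk in the exposed (or treated)
group divided by the risk in the control group.
If risk ratio > 1, the rate of the event is increased.
If risk ratio < 1, the rate of the event is decreased.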
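A short sketch of the arithmetic, with hypothetical function names and invented counts:

```python
def risk(events, at_risk):
    """Probability of the event: events divided by people at risk."""
    return events / at_risk

def risk_ratio(events_exposed, n_exposed, events_control, n_control):
    """Risk in the exposed group divided by risk in the control group."""
    return risk(events_exposed, n_exposed) / risk(events_control, n_control)

# Hypothetical trial: 25 of 100 treated patients and 50 of 100 controls had the event.
rr = risk_ratio(25, 100, 50, 100)
print(rr)  # 0.5, so the event rate is halved in the treated group
```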
13. Odds ratio
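Odds: ratio of the number of times an event occurs
to the number of times it does not occur.
Odds ratio: odds in one group divided by the odds
in the other group.
Odds ratio > 1, rate of event is increased.
Odds ratio < 1, rate of event is decreased.
Odds ratio is given with a 95% CI.
If the 95% CI does not include 1, the odds ratio is
statistically significant.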
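The role of the confidence interval can be sketched as follows. The interval uses the common log-odds (Woolf) approximation, which is an assumption here since the notes do not name a method; the counts are invented.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI.

    a, b = events / non-events in the exposed group
    c, d = events / non-events in the control group
    """
    odds_ratio = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(odds_ratio) - z * se)
    high = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, low, high

# Hypothetical study: 20/100 events in the exposed group, 10/100 in controls.
odds_ratio, low, high = odds_ratio_ci(20, 80, 10, 90)
significant = not (low <= 1 <= high)
print(round(odds_ratio, 2), round(low, 2), round(high, 2), significant)
```

In this example the odds ratio is well above 1, yet the interval still includes 1, so the result would not be called statistically significant at the 5% level.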
14. Risk reduction and numbers needed
to treat
Relative risk reduction (RRR) / absolute risk reduction (ARR).
Numbers needed to treat (NNT).
How often does the treatment work?
ARR = event rate in control group - event rate in intervention group (as %).
NNT = number of patients who need to be treated for one to benefit = 100/ARR.
RRR = proportion by which the intervention reduces the event
rate = ARR/control event rate.
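A worked sketch of the three quantities. It assumes ARR is taken as the control event rate minus the intervention event rate, and works in proportions rather than percentages, so NNT is 1/ARR. The function name and counts are hypothetical.

```python
def risk_reduction(control_events, control_n, treated_events, treated_n):
    """Return (ARR, RRR, NNT) with rates expressed as proportions."""
    cer = control_events / control_n  # control event rate
    eer = treated_events / treated_n  # event rate in the intervention group
    arr = cer - eer                   # absolute risk reduction
    rrr = arr / cer                   # relative risk reduction
    nnt = 1 / arr                     # same as 100 / ARR when ARR is a percentage
    return arr, rrr, nnt

# Hypothetical trial: 20% of controls and 15% of treated patients had the event.
arr, rrr, nnt = risk_reduction(20, 100, 15, 100)
print(round(arr, 3), round(rrr, 3), round(nnt, 1))
```

With these figures the treatment removes 5 events per 100 patients, a quarter of the control event rate, and about 20 patients need to be treated for one to benefit.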
15. Correlation
Linear relationship/association between 2
variables.
Positive and negative correlation.
(0-0.2) very low and meaningless, (0.2-0.4) low,
(0.4-0.6) reasonable, (0.6-0.8) high and (0.8-1.0)
very high (worth reconfirming).
Values reported as r or r².
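A small sketch of how r is computed, assuming the Pearson correlation coefficient is the one meant here. The paired data are invented.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: hours of exercise per week and a fitness score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_r(hours, score)
print(round(r, 3), round(r ** 2, 3))  # r and r squared
```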
16. Regression
Fit a line through a set of points.
How one set of data relates to other.
Regression line / coefficient (y = ax + b).
Logistic regression: only when the outcome can belong to one of 2 groups.
Poisson regression: to model counts of rare events (often used to study
waiting times or the time between rare events).
Correlation measures strength of association and regression quantifies the
association.
Regression is only valid for prediction inside the range of the data.
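Fitting the line y = ax + b by ordinary least squares can be sketched as below. The helper names and data are hypothetical, and the range check in `predict` is just one way to enforce the point about not extrapolating.

```python
def fit_line(x, y):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return a, b

def predict(x_new, a, b, x_min, x_max):
    """Predict y, refusing to extrapolate outside the observed x range."""
    if not x_min <= x_new <= x_max:
        raise ValueError("x_new is outside the range of the data")
    return a * x_new + b

# Hypothetical paired data: hours of exercise per week and a fitness score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

a, b = fit_line(hours, score)
print(round(a, 2), round(b, 2))
print(round(predict(3.5, a, b, min(hours), max(hours)), 2))
```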
17. Survival analysis: Life tables and
Kaplan-Meier plots
Visual analysis of what has happened to a population over time.
Used for the time until a single event occurs, and when the information is
available for a limited duration (censored observations).
Life table: table of the proportion of patients surviving over time.
Kaplan-Meier approach recalculates survival each time an end event occurs
in the dataset.
Represented as a survival plot. Compares survival between groups.
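The Kaplan-Meier recalculation can be sketched in a few lines. This is a bare-bones illustration with invented follow-up data, not a substitute for a survival analysis package; censored patients leave the at-risk group without changing the survival estimate.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with an observed event.

    times  : follow-up time for each patient
    events : True if the end event was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        same_time = [e for tt, e in data if tt == t]
        deaths = sum(same_time)  # observed end events at time t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= len(same_time)  # events and censored cases both leave the risk set
        i += len(same_time)
    return curve

# Hypothetical follow-up in months; False marks a censored patient.
times = [2, 3, 3, 5, 8, 8, 12]
events = [True, True, False, True, False, True, False]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```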
18. Cox Regression Model
Cox Proportional Hazard Survival Model.
Hazard Ratio (HR).
Provides an estimate of the effect that different
factors have on the time until the end event.
19. Sensitivity, specificity and predictive
value
Sensitivity = True Positive / (True Positive + False Negative)
Specificity = True Negative / (True Negative + False Positive)
Positive predictive value = True Positive / (True Positive + False Positive)
Negative predictive value = True Negative / (True Negative + False Negative)
Likelihood ratio (positive) = sensitivity / (1 - specificity)
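These measures all come from the four cells of a 2x2 table of test result against true status. A minimal sketch with hypothetical counts, taking the likelihood ratio to be the positive likelihood ratio, sensitivity / (1 - specificity):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic test measures from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
    }

# Hypothetical screening study: 100 people with the disease, 200 without.
m = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in m.items():
    print(name, round(value, 3))
```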
20. Level of agreement
Measurement of how well two tests (or raters) agree.
Agreement of 0: no agreement beyond chance.
Agreement of 1: perfect agreement.
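Assuming the agreement measure meant here is Cohen's kappa (the notes do not name it, but the 0 to 1 scale matches), a sketch of the calculation for two tests applied to the same cases might look like this. The counts are invented.

```python
def cohens_kappa(table):
    """Cohen's kappa; table[i][j] = cases rated category i by test A and j by test B."""
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        (sum(table[i]) / n) * (sum(row[i] for row in table) / n) for i in range(k)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical results: the two tests agree on 85 of 100 cases.
kappa = cohens_kappa([[40, 10], [5, 45]])
print(round(kappa, 2))
```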
24. Bayesian statistics
Based on the information already available.
A prior distribution is set up and the new dataset is
analyzed.
Uses a combination of prior belief and the new
dataset.
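One simple way to see prior belief and new data being combined is the Beta-binomial model for a proportion. The choice of model, the prior and the trial counts below are all illustrative assumptions.

```python
def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta(a, b) parameters after observing binomial data."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical prior belief: response rate around 20%, held weakly (Beta(2, 8)).
prior = (2, 8)
# Hypothetical new dataset: 12 responders out of 20 patients.
posterior = update_beta(*prior, successes=12, failures=8)

print(beta_mean(*prior), round(beta_mean(*posterior), 3))
```

The posterior mean lies between the prior mean (0.2) and the observed rate (0.6), pulled towards the data because the new dataset carries more weight than the weak prior.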
34. Nominal data
Data that can be placed into named categories
without definite order.
Example: eye colour.
35. Receiver Operating Characteristic (ROC)
Changing the cut-off value of a test to increase
sensitivity typically decreases specificity.
A curve showing sensitivity and specificity at
different possible cut-off values.
Helps to choose the cut-off value.
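The trade-off can be sketched by computing sensitivity and specificity at every possible cut-off. The scores and labels are invented, and picking the cut-off that maximises sensitivity + specificity - 1 (Youden's index) is one common rule used here as an assumption, not the only way to choose.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each distinct cut-off.

    A case is called positive when its score is >= cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l)
        tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and not l)
        points.append((cutoff, tp / positives, tn / negatives))
    return points

# Hypothetical test scores; label 1 = disease present, 0 = absent.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

points = roc_points(scores, labels)
best = max(points, key=lambda p: p[1] + p[2] - 1)  # Youden's index
for cutoff, sens, spec in points:
    print(cutoff, sens, spec)
print("chosen cut-off:", best[0])
```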