This document defines key concepts related to random variables, distributions, and statistical hypothesis testing. It covers random and dependent variables; normal, bimodal, skewed, and other distributions; measures of central tendency and variability such as the mean, median, range, variance, and standard deviation, and their uses. It also defines type 1 and type 2 errors, alpha and beta levels, confidence intervals, and null and alternative hypotheses, along with their roles in statistical hypothesis testing.
2. Random variable
A variable whose observed values may be considered outcomes
of an experiment
A variable whose values cannot be anticipated with certainty
before the experiment is conducted
4. Dependent Variable
Outcome of interest within a study.
In bioavailability and bioequivalence studies, examples
include the maximum concentration of the drug in the
circulation, the time to reach that maximum level, and
the area under the curve (AUC) of drug level-versus-time
curve
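The pharmacokinetic outcomes named above (maximum concentration, time to maximum, AUC) can all be read off or computed from concentration-time data. As a minimal sketch with made-up illustrative values, the AUC is commonly approximated with the linear trapezoidal rule:

```python
# Hypothetical concentration-time data after a single oral dose
times = [0, 1, 2, 4, 8]             # hours after dosing
concs = [0.0, 10.0, 8.0, 4.0, 1.0]  # drug concentration, e.g. mg/L

def trapezoidal_auc(t, c):
    """Approximate AUC with the linear trapezoidal rule."""
    return sum((c[i] + c[i + 1]) / 2 * (t[i + 1] - t[i])
               for i in range(len(t) - 1))

cmax = max(concs)                   # maximum concentration (Cmax)
tmax = times[concs.index(cmax)]     # time of maximum concentration (Tmax)
auc = trapezoidal_auc(times, concs) # area under the curve

print(cmax, tmax, auc)  # 10.0 1 36.0
```

The trapezoidal rule simply sums the area of each trapezoid between adjacent sampling times, which is why it needs no assumption about the shape of the curve.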
5. Normal distributions
Symmetrical on both sides of the mean
A bell-shaped curve, Gaussian curve, curve of error, or normal
probability curve
An example of normally distributed data is drug elimination
half-lives in a sample of men with normal renal and hepatic
function.
6. Bimodal distribution
Two peaks of cluster or areas of high frequency occur
For example, a medication that is acetylated at different
rates in humans would show a bimodal distribution,
indicating two populations: fast acetylators and slow
acetylators
7. Skewed distributions
Occur when data are not normally distributed and tail off to either
the high or the low end of measurement units
A positive skew occurs when data cluster on the low end of the x axis
(the tail extends toward the high end)
For example, the x axis could be the income of patients seen in an
inner-city Emergency Department (ED), the cost of generic medications,
or the number of prescribed medications in patients younger than 30
years of age.
8. Negative skew
A negative skew occurs when data cluster on the high end
of the x axis (the tail extends toward the low end)
For example, the x axis could be the income of patients
seen in the ED of an affluent area, the cost of brand-name
medications, or the number of prescribed medications in
patients older than 60 years of age.
9. Kurtosis
Occurs when data cluster at both ends of the x axis such
that the graph tails upward at both extremes
For example, the J-curve of hypertension treatment: with
the J-curve, mortality increases if blood pressure is either
too high or too low
10. Range
The interval between the lowest and highest values
Because the range considers only the extreme values, it is
affected by outliers
Descriptive only; not used to infer statistical significance
The interquartile range is the interval between the 25th and 75th
percentiles, so it is directly related to the median (the 50th
percentile)
It is not affected by outliers and, along with the median, is
used for ordinal-scale data
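The contrast between the range and the interquartile range can be sketched on a small made-up data set containing one outlier (`statistics.quantiles` with `method="inclusive"` interpolates quartiles the way most spreadsheet functions do):

```python
import statistics

# Hypothetical data with one high outlier (30)
data = [2, 4, 4, 5, 7, 9, 30]

value_range = max(data) - min(data)  # driven entirely by the extremes
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                        # resistant to the outlier

print(value_range)      # 28
print(q1, q2, q3, iqr)  # 4.0 5.0 8.0 4.0
```

Removing the outlier would shrink the range dramatically but leave the interquartile range essentially unchanged, which is why the IQR pairs naturally with the median for skewed or ordinal data.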
11. Variance
Variance is the mean squared deviation from the mean (computed
with n − 1 in the denominator for a sample), expressed in the
square of the units used
As sample size (n) increases, the sample variance becomes a
more precise estimate of the population variance
12. Standard deviation (SD)
The square root of variance
SD estimates the degree of data scatter around the sample
mean.
Approximately 68% of data lie within ±1 SD of the mean, and
approximately 95% of data lie within ±2 SD of the mean
SD is only meaningful when data are normally or near-
normally distributed
Sigma (σ) denotes the population SD, and s denotes the
sample SD, for parametric data
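The relationship between variance and SD can be verified on a small hypothetical sample; `statistics.variance` and `statistics.stdev` both use the n − 1 sample formulas:

```python
import math
import statistics

data = [10, 12, 14, 16, 18]  # hypothetical sample

mean = statistics.mean(data)     # 14
var = statistics.variance(data)  # sample variance, in squared units
sd = statistics.stdev(data)      # square root of the variance

# SD is, by definition, the square root of the variance
assert math.isclose(sd, math.sqrt(var))

# Fraction of this tiny sample falling within +/- 1 SD of the mean
within_1sd = sum(mean - sd <= x <= mean + sd for x in data) / len(data)
print(var, round(sd, 3), within_1sd)
```

With only five observations the within-1-SD fraction will not match the theoretical 68% closely; the 68%/95% figures hold for normally distributed populations, not tiny samples.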
13. The coefficient of variation
The SD expressed as a percentage of the mean
(CV = SD/mean × 100%)
Useful for comparing the relative difference in variability
between two or more samples, or for determining which group
has the largest relative variability of values around the mean
14. Standard error of the mean (SEM)
The SD divided by the square root of n
The larger n is, the smaller the SEM is
Quantifies the spread of the sample means that would be
obtained if a study were repeated multiple times
The SEM helps to estimate how well a sample represents
the population from which it was drawn
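The SEM formula, and the way it shrinks as n grows, can be sketched directly (the SD of 8.0 is an assumed value for illustration):

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

sd = 8.0  # assumed sample SD
for n in (4, 16, 64):
    print(n, sem(sd, n))  # SEM halves each time n quadruples
```

Because n sits under a square root, quadrupling the sample size only halves the SEM, which is why gains in precision become progressively more expensive.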
15. Confidence interval (CI)
A method of estimating the range of values likely to
include the true value of a population parameter
In medical literature, a 95% CI is most frequently used
The 95% CI is a range of values such that, “if the entire
population could be studied, 95% of the time the true
population value would fall within the CI estimated from
the sample”
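A minimal sketch of a large-sample 95% CI for a mean, built from the SEM with the conventional z multiplier of 1.96 (the mean, SD, and n below are assumed values; for small samples a t multiplier would be used instead):

```python
import math

def ci95(mean, sd, n):
    """Large-sample 95% CI for a mean: mean +/- 1.96 * SEM."""
    sem = sd / math.sqrt(n)
    return mean - 1.96 * sem, mean + 1.96 * sem

# Hypothetical sample: mean 100, SD 15, n = 36 (SEM = 2.5)
low, high = ci95(mean=100.0, sd=15.0, n=36)
print(round(low, 2), round(high, 2))  # 95.1 104.9
```

Note how the interval narrows as n grows (through the SEM) while the point estimate stays put: the CI quantifies precision, not the data's spread.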
16. Statistical hypothesis
For superiority trials, the null hypothesis (H0) is that no
difference exists between studied populations
For superiority trials, the alternative hypothesis (H1) is
that a difference does exist between studied populations
H0: There is no difference in the AUC for drug formulation
A relative to formulation B
H1 (aka Ha): There is a difference in AUC for drug
formulation A relative to formulation B
17. Type 1 error
Occurs if one rejects the H0 when, in fact, the H0 is true
For superiority trials this is when one concludes there is a
difference between treatment groups, when in fact, no
difference exists
18. Alpha (α) is defined as the probability of making a type 1
error
When the α level is set a priori (before the trial), the H0 is
rejected when p < α
By convention, an acceptable α is usually 0.05 (5%), which
means that 1 time out of 20, a type 1 error will be
committed.
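The "1 time out of 20" interpretation can be checked by simulation: draw two samples from the same population (so H0 is true), test for a difference, and count how often p < 0.05. This sketch uses a simple two-sided large-sample z test for illustration (not any particular trial's analysis); with 2,000 seeded replications the observed type 1 error rate lands near 5%:

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(1)
phi = NormalDist().cdf  # standard normal CDF

def p_value(a, b):
    """Two-sided large-sample z test for a difference in means."""
    n = len(a)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - phi(abs(z)))

reps, false_positives = 2000, 0
for _ in range(reps):
    # Both groups come from the same population, so H0 is true
    a = [random.gauss(0, 1) for _ in range(30)]
    b = [random.gauss(0, 1) for _ in range(30)]
    if p_value(a, b) < 0.05:
        false_positives += 1

print(false_positives / reps)  # observed type 1 error rate, close to 0.05
```

Every "significant" result in this loop is, by construction, a false positive: the two groups never actually differ.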
19. Type 2 error
A type 2 error occurs if one accepts (fails to reject) the
H0 when the H0 is false
For superiority trials this is when one concludes there is
no difference between treatment groups, when in fact, a
difference does exist
Beta (β) is the probability of making a type 2 error
By convention, an acceptable β is 0.2 (20%) or less;
statistical power, the probability of detecting a true
difference, equals 1 − β
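β can be estimated by the same kind of simulation, but with H0 made false: here the groups are assumed to truly differ by 0.5 SD, with 64 patients per group (a configuration that standard power calculations place near 80% power, ie β near 0.2). The large-sample z test below is again a simplification for illustration:

```python
import math
import random
import statistics
from statistics import NormalDist

random.seed(2)
phi = NormalDist().cdf  # standard normal CDF

def p_value(a, b):
    """Two-sided large-sample z test for a difference in means."""
    n = len(a)
    se = math.sqrt(statistics.variance(a) / n + statistics.variance(b) / n)
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - phi(abs(z)))

reps, misses = 2000, 0
for _ in range(reps):
    # H0 is false: group b truly differs from group a by 0.5 SD
    a = [random.gauss(0.0, 1) for _ in range(64)]
    b = [random.gauss(0.5, 1) for _ in range(64)]
    if p_value(a, b) >= 0.05:  # failed to detect the real difference
        misses += 1

beta = misses / reps
print(beta, 1 - beta)  # beta near 0.2, power near 0.8
```

Every non-significant result in this loop is a type 2 error, since a real difference always exists; increasing n or the effect size would shrink β and raise power.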