A COMPACT GUIDE TO
BIOSTATISTICS
Naira R. Matevosyan, MD,MSJ,PhD
Legal Clinic: nairarenault.wix.com/panther-law
Authored Books: nairarenault1.wix.com/nairamatevosyan
IN THIS ISSUE:
Main Strains in Biostatistics: Descriptive, Inferential, Euclid (3 -7)
Data, Variables, Vectors, Valence (8 - 16)
Matching and Manipulation: Mediator and Moderator (17)
Mode Merits and Demerits (18)
Confounding by Indication: Severity, Protopathy, Selection (19-20)
Confounding by Indication and Contraindication (21)
Collider, Residual Confounding, Reverse Causality (22 -23)
Prevalence, Incidence, Duration (24)
Reduction and Stratification (25 -26)
Diagnostic Tests (S.N.N.O.U.T., S.P.P.I.N.), Predictive Value (27 -29)
Reliability, Validity, Accuracy, Precision, Recall (30 -32)
Stratum Specific Hyper-prior Distributions (33)
Propensity Score, Matching, and Causal Pretzel (34 -35)
Level of Evidence: Causal Description v. Causal Explanation (36)
Measuring Risk (37 -38)
Types of Biases (39 -40)
Review Questions and Answers (41 -42)
MAIN STRAINS IN BIOSTATISTICS
Based on the degree of abstraction
DESCRIPTIVE
Characterizes a sample or data-sets by actual measurements.
INFERENTIAL
Assumes each replication in a condition as entirely
independent – to create countless challenges. Calculates a
test-statistic value, degrees of freedom, or rejection criteria –
through a particular formula (based on the study design or
specifics) to determine whether or not there are differences
between the treatment groups. Extrapolates and generalizes
the outcomes to make predictions.
EUCLID
Assumes a set of intuitively appealing axioms to assess a
sample through the two, three, four or n-dimensional
geometrical canon (detaching time from space) for causality,
casualty, and prediction. 3
SCENARIOS, EXAMPLES
●
Hypothesis: Copper (Cu), Iron (Fe), Manganese (Mn), and Zinc (Zn)
insufficiency contributes to suboptimal levels of luteinizing hormone (LH) in
infertile women of reproductive age.
●
Hypothetical Sample: Plasma levels of Cu, Fe, Mn, and Zn from 210 women
(age 19-49 y) with primary infertility, 36 hours prior to the proposed ovulation
day, over the four consecutive menstrual cycles.
●
Descriptive Statistics: Measures the means and standard deviations of plasma
trace elements, the covariance given by the diagonal and off-diagonal elements
of the covariance matrix, and the correlations between each pair of variables.
A correlation coefficient with |r| larger than 0.7 indicates a strong association.
●
Inferential Statistics: Stratifies women for ethnicity (for variations in duration
of the menstrual cycles), age (19-29, 39-40), BMI, preexisting medical
conditions, pelvic inflammatory disease, tubal TB, uterine anomalies, thyroid
disorders, anemia, etc. Models the stratified random samples for probability.
Measures the posterior kernel densities of parameters. Runs predictive
inferences in the fitted model.
●
Euclid Statistics: Assesses the data through holomorphic operations, each
domain as a complex-valued function of differentiable variables which are
manifolds in a spatial unit where the tangent sits with the n-root of differential
expression. Data triangulation and inferences at the infinitesimal points help
with predictions (check scholar.google.com for Matevosyan N.R. Articles, 2011-2021).
4
SOME OF MY BOOKS ON THIS SUBJECT
●
“Advanced Research in Comorbodity”
ISBN: 9781514787410 ISBN: 9781493553013
5
QUALITATIVE DATA:
6
QUANTITATIVE DATA:
7
DATA v. VARIABLES
●
Data include two sets of values: variables (qualitative,
quantitative), and observational units from samples or populations.
●
Data and variables are not synonymous. Variables are the data
modeled (measured, manipulated, linked, controlled, correlated,
compared, indexed) into a function.
●
Independent (experimental) variable is manipulated and its effect
on the dependent (outcome) variable is measured. The role of a
variable depends on a study design: a dependent variable may
become independent, a process variable may become a predictor.
For example, by changing the temporal order between two
variables in causal inference (rates of abortion and unipolar
depression in the same community), abortion can be modeled as an
independent variable to measure the depression rate (dependent
variable) or vice versa. Put simply, depression can be viewed as the
outcome of abortion, and abortion can be viewed as the outcome of
depression.
8
TYPES OF VARIABLES INCLUDED IN THIS
PRESENTATION
Independent (experimental) variable
Dependent (outcome) variable
Categorical variable
Numerical variable
Continuous variable
Predictor
Process (mediator, intervening) variable
Moderator (affector)
Latent variable
Omitted variable
Symbolic variable
Hidden variable
Hypothetical variable. 9
VARIABLES: MODERATOR V. MEDIATOR
MEDIATOR explains the relationship between independent and
dependent variables (predictor and outcome). For example, in a
sedative pill trial on women with anxiety disorder, women's body-
mass index (BMI) is modeled as a process variable (mediator) that
shows the relation between the independent variable (drug dosage)
and the dependent variable (symptoms of anxiety). In the case of
total mediation, the relationship between predictor and criterion is
reduced to zero, after controlling the mediator - criterion relation.
MODERATOR is a third variable (in a zero-order correlation) on
which the relationship between the other two depends. While the
mediator explains the causality chain, the moderator affects the
strength and direction of that chain. Mediator intervenes and
moderator interacts. Interaction can be categorical (qualitative) or
quantitative. In the same sedative pill study, a moderator is the heavy
caffeine intake that worsens the anxiety in women and interacts with
the study results. Moderator can also explain variations between the
studies. 10
11
12
MODERATOR V. CONFOUNDER
●
Confounder distorts the association between the
predictor and the outcome.
●
Moderator differentiates the association between the
predictor and the outcome.
●
Mediator explains the association between the predictor
and the outcome.
We typically “adjust” for confounders, and “report” the
different effects seen from the effect modifiers.
LATENT VARIABLES
Latent variables are inferred via mathematical models from other
variables that are observed (actually measured). Latent variables (LV)
are used in psychology, economics, medicine, artificial intelligence,
bioinformatics, speech science, management, or social sciences.
Examples are quality of life, confidence, morale, happiness, or liberty -
concepts that cannot be measured directly.
Sometimes, LV correspond to aspects of physical reality that could in
principle be measured but, for practical reasons, are not; these are termed
“hidden variables”. LV may also correspond to abstract concepts (categories,
behavioral clusters) and be modeled as “hypothetical variables”.
An advantage of using LV is that it reduces data dimensionality
(valence). Presenting a "shared variance" or the degree to which
variables "move together," the LV link observable (real) data to
symbolic (modeled) data. Variables that have no correlation cannot
result in a latent construct based on the common factor model [3].
(3) Tabachnick, B.G., Fidell, L.S. (2001). Using Multivariate Statistics. Boston: Allyn
and Bacon. 14
OMITTED VARIABLES
Omitted variables are values that can be both cause and result, or
independent and dependent variables in the same model. For example,
anxiety can be both the cause or the result of unemployment;
abortion can be both the cause and the result of depression.
Omitted variable bias (OVB) occurs when a model is created by
incorrectly leaving out one or more important causal factors, or
compensating for the missing factor by underestimating one of the
other important factors.
Two conditions must hold true for OVB to exist in linear regression:
the omitted variable must be:
●
a determinant of the dependent variable (i.e., its true regression
coefficient is not equal to zero), and
●
correlated with one or more of the included independent variables
(the covariance of the omitted variable and the independent
variable is not equal to zero).
15
VALENCE, VECTORS
●
Valence is the dimensionality of the data which can be reduced by the
latent variables (see slide 14).
●
Vectors of values are implied in regression models toward the matrix.
An example: in the modeled formula
$y_i = x_i \beta + z_i \delta + u_i$,
i = 1, ..., n; $x_i$ is a 1 × p row vector of values of p independent variables
observed at time i or for the ith study participant; β is a p × 1 column
vector of unobservable parameters to be estimated; $z_i$ is a scalar, the value
of another independent variable that is observed at time i or for the ith
study participant; δ is a scalar, an unobservable parameter (the response
coefficient of the dependent variable to $z_i$); $u_i$ is an unobservable error
occurring at time i or for the ith study participant; $u_i$ is an unobserved
realization of a random variable having expected value 0 (conditionally on
$x_i$ and $z_i$); $y_i$ is the observation of the dependent variable at time i or
for the ith study participant. If $z_i$ is omitted from the regression, the
estimated values of response parameters will be given by usual least squares,
$\hat{\beta} = (X'X)^{-1} X'Y$,
where the "prime" notation means the transpose of a matrix and the −1
superscript is matrix inversion. Substituting for Y based on the assumed
linear model,
$\hat{\beta} = (X'X)^{-1} X'(X\beta + Z\delta + U) = \beta + (X'X)^{-1} X'Z\delta + (X'X)^{-1} X'U$.
The omitted variable bias (OVB) is non-zero if z is correlated with any variable in the matrix X.
16
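The least-squares decomposition described on this slide can be checked numerically. The sketch below is only an illustration under stated assumptions: a single regressor (p = 1), no intercept, simulated Gaussian data, and arbitrary "true" values for β and δ. None of these numbers come from the slides.

```python
import random

random.seed(0)  # arbitrary seed, used only to make the run repeatable

n = 5000
beta, delta = 2.0, 1.5  # assumed "true" parameters for this simulation

x = [random.gauss(0, 1) for _ in range(n)]
# z is deliberately correlated with x (the 0.8 slope is an arbitrary choice)
z = [0.8 * xi + random.gauss(0, 1) for xi in x]
u = [random.gauss(0, 1) for _ in range(n)]
y = [xi * beta + zi * delta + ui for xi, zi, ui in zip(x, z, u)]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Regression of y on x alone (z omitted): (X'X)^-1 X'Y with p = 1
beta_hat = dot(x, y) / dot(x, x)

# The two extra terms obtained by substituting Y = X*beta + Z*delta + U
bias_term = dot(x, z) / dot(x, x) * delta   # (X'X)^-1 X'Z delta
noise_term = dot(x, u) / dot(x, x)          # (X'X)^-1 X'U

print(f"beta_hat           = {beta_hat:.3f}")
print(f"beta + bias + noise = {beta + bias_term + noise_term:.3f}")
print(f"bias from omitting z = {bias_term:.3f}")
```

Because z was built to move with x, the estimate of β absorbs part of the effect of z; setting the 0.8 slope to 0 should shrink the bias term toward zero.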
MATCHING & MANIPULATION: MEDIATION V. MODERATION
●
Matching is used to reduce bias, by evaluating the effect of treatment while
comparing the treated and non-treated units in an observational study or
quasi-experiment (without random assignment).
●
Experiments explore the effects of things, events, or behaviors that can be
manipulated (dose of a medicine, salary, treatment modality). It is harder to
measure non-manipulable causes (raw genetic material, age, gender). Those
are assessed indirectly in non-experimental studies, using whatever means are
available or fit. Finding manipulable agents helps ameliorate the problem. For
example, phenylketonuria (PKU) treatment wasn't discovered by first trying
different diets in children with intellectual disability. Initially, non-manipulable
variables were used to find the increased levels of phenylalanine in those children. Such findings
informed the scientific directions leading to the diet – with varying degrees of
reduction. Some were experimental, others were not.
●
Analogue experiments can be used on non-manipulable causes by
manipulating an agent that is similar to the cause of interest. We cannot change
a person's race but we can chemically alter the skin pigmentation. Further,
past events (which usually are nonmanipulable) may constitute a natural
experiment that once was even randomized. Stronger solutions to causality
can be achieved by the mediators.
●
Mediator v. Moderator: See details on slides 10-13. 17
MODE MERITS & DEMERITS
●
Mode: Unlike the mean or median of a series, the mode is the most
frequent value. It can't be determined from a series of individual
observations unless it is converted into a discrete or continuous series.
In a discrete series, the value of the variable against which the frequency
is the largest is the modal value. In a continuous (grouped) series, mode is measured by
$\text{Mode} = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$,
where i is the class interval, $l_1$ is the lower limit of the modal class, $\Delta_1$ is the
difference of frequencies between the modal class and the preceding class, and $\Delta_2$ is
the difference of frequencies between the modal class and the post-modal class.
●
Mode Merits: Mode is not affected by the values of extreme items. For
the determination of mode, all values in a series are not considered.
●
Mode Demerits: Mode is incapable of further mathematical
treatment. Because mode is not based on all observations of a series, it
is not rigidly defined. Mode may be unrepresentative in some cases as
it may not have a definite value, as in a set of observations two or
three or more modal values may occur. 18
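As a small worked illustration of the grouped-series mode (lower limit of the modal class plus the share Δ1/(Δ1 + Δ2) of the class interval), the sketch below uses a hypothetical frequency table. The function name and the numbers are assumptions made for the example only.

```python
def grouped_mode(lower_limit, class_interval, f_prev, f_modal, f_next):
    """Mode of a grouped (continuous) series.

    lower_limit    -- lower limit of the modal class
    class_interval -- width of the modal class
    f_prev/f_next  -- frequencies of the classes before and after the modal class
    """
    delta1 = f_modal - f_prev   # modal class vs. preceding class
    delta2 = f_modal - f_next   # modal class vs. post-modal class
    if delta1 + delta2 == 0:
        raise ValueError("mode is not defined: neighbouring classes tie with the modal class")
    return lower_limit + delta1 / (delta1 + delta2) * class_interval


# Hypothetical classes 10-20, 20-30, 30-40 with frequencies 5, 12, 8
mode = grouped_mode(lower_limit=20, class_interval=10, f_prev=5, f_modal=12, f_next=8)
print(round(mode, 2))  # 26.36
```

The explicit error for tied neighbouring classes reflects the demerit noted above: the mode may have no definite value.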
Confounding by Indication
●
As noted in my books, epidemiology is about mastering the
concept of confounding. Yet, confounding is not always the “elixir” of
causality.
●
A confounding variable (hidden or lurking variable) extraneously
correlates (directly or inversely) with both dependent and
independent variables. A perceived relationship between independent
and dependent variables that has been misestimated due to the
failure to adjust for confounders is termed a spurious relationship,
and the misestimation is known as an omitted variable bias.
●
How do we prove confounding? We compute the degree of
associations between independent and dependent variables before
and after adjusting for a possible confounder. If the difference
between the two degrees of association is >10%, confounding is
considered present.
●
Confounding by indication is when a variable itself is a risk factor
(in the non-exposed control group) associated with the exposure of
interest – without being an intermediate step in the causal pathway.
19
Types of Confounding by Indication
Confounding by Indication (CBI) is typical of the observational,
pharmaco-epidemiologic studies when exposure is associated with
outcome and the latter is caused by indication for what the exposure was
used, or by another factor associated with indication. Confusions about
CBI are mostly due to the three different situations:
(a) CBI as a protopathic bias
(b) CBI by severity
(c) CBI as a form of selection bias.
CBI matters when the severity (or stage) of a disease or a degree of
exposure to an agent act as independent variables at the random intercept
for each confounder. The degree of confounding depends on the
prevalence of putative confounding factors, levels of association with the
disease, and the exposure. Where the disease responsible for indication
acts as a categorical confounder irrespective of a symptom severity, CBI is
due to the protopathic or selection biases.
Solution: Including a range of different indications for the same exposure
enables the relationship between exposure and outcome to be triangulated,
with each individual indication analyzed separately.
20
Confounding by Indication & Contraindication
Confounding by contraindication (CBCI) is
a rarer bias, and concerns the non-
experimental (observational) studies that
examine predictable side effects.
Hypothesis: Hypochromic anemia is a side
effect of SSRI/SNRI antidepressants. The
SSRI/SNRI intake during pregnancy
contributes to intrauterine growth
retardation (IUGR). CBI Scenario: In women
with depression and singleton pregnancies
antidepressants are modeled as independent
variables, IUGR as the outcome, anemia as a
confounder. CBCI Scenario: The SSRI/SNRI
and IUGR relationship is distorted because
the index group of SSRI/SNRI users will
exclude women with prior IUGR. Ignoring
the CBCI will result in a reference group of
SSRI/SNRI non-users having false “higher
rates” of IUGR. CBCI bias can be addressed
by the exclusion of multi-gravida women. 21
Confounding v. Colliding
22
●
Confounding is when exposure and outcome have a
shared common cause that is not controlled by design.
●
Collider bias occurs when exposure and outcome (or
factors causing these) each concurrently influence a
common third variable and that variable (or collider) is
controlled by design.
Residual Confounding, Reverse Causality
Residual confounding is the distortion that remains after controlling for
confounding in a study design or analysis. There are three reasons for
residual confounding: (1) No efforts are made to consider, collect and
adjust for additional factors; (2) There are many errors in grouping the
subjects for a confounder analysis; (3) Control of confounding is not
vigorous enough. For example, in a randomized trial on women with
osteoporosis (where age is a confounder) the sample size is too small and
the confounding variable is imprecise (while matching or stratifying age
groups, the age distinction is not scored but scaled as “younger,” “young,”
“old,” “older”), resulting in residual confounding.
Reverse causality occurs when the outcome causally affects the exposure
being studied. Put simply, you may think that X causes Y, while in
reality Y causes X. For example, it is hard to prove
whether miscarriage was from depression, or depression resulted from a
miscarriage. To prevent confusion, the nine criteria must be followed: (1)
strength of the association, (2) consistency of findings, (3) specificity, (4)
temporal order, (5) exposure gradient, (6) plausible mechanisms, (7)
coherence between the observational, epidemiological and lab data, (8)
experimental evidence, (9) analogy. 23
Prevalence, Incidence, Duration
●
Prevalence: A cross-sectional measure of the total number of
people in a population (or subjects in a sample) affected by a
condition at one point in time. Cannot be used in a prediction analysis
(excluding meta-studies with logistic prediction, stochastic compartmental
modeling, or Euclid infinitesimal manifolds by Matevosyan 2013, 2015, 2021).
●
Incidence: A longitudinal measure showing the number of new
cases of a disease (or an event) in a population over a specific
period of time. Can be used in a prediction analysis (examples are the
S.I.R. model [4], or the reinfection proportion model [5]).
●
Duration: Relates incidence to prevalence. For example, upper
respiratory infections (URI) have a high incidence (seasonal) but a low
prevalence because most URI resolve fast. Multiple sclerosis (MS) has a
relatively low incidence but high prevalence because it is for life.
(4) Kermack,W.O., McKendrick, A.G. (1927). A contribution to the mathematical theory of
epidemics. Proceedings of the Royal Society of London; 115(772): 700-721
(5) Wang J.Y., Lee L, N., Lai H.C., et al. (2007). Prediction of the tuberculosis reinfection
proportion from the local incidence. The Journal of Infectious Diseases; 196(2): 281–288
24
Reduction
Reduction is the transformation of numerical data (empirical, trial, lab,
digital) into a corrected and simplified form for three reasons: (1) to reduce
the number of data records by eliminating invalid or dubious data, (2) to
produce summary or aggregate data for various applications, (3) to reduce
the occurrence and effect of confounders by comparative analysis.
Depending on a study design, reduction controls confounders differently:
●
Cross-section - assigns confounders to both (clinical, control) groups equally.
●
Cohort - creates (via over-exclusion) comparable cohorts with similar
features for possible confounders (age, gender, income, menarche, BMI, etc)
●
Double-blind - conceals the experiment group membership. By preventing
the participants from knowing if they are receiving treatment or not, the
placebo effect should be the same for the control and treatment groups. By
preventing the observers from knowing the participants' group membership, there should be
no treatment or interpreting bias by the researchers.
●
Randomized- the study sample (or population) is divided randomly in order
to mitigate the chances of self-selection (by participants) or bias (by
researchers). Prior to the trial, a random number generator is used to assign
participants to the intended groups (control, intervention, parallel).
25
Stratification
Stratification is about dividing the population into distinct groups
or subsets (strata) in each independent sample.
●
For example, protected sex may prevent prostate cancer and in
this equation, age is assumed to be a confounder. Therefore,
the sampled data are stratified by age groups to analyze the
degree of association between safe sex practices and prostate
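cancer. If the age-specific (stratum) risk ratios differ substantially
from the crude, unstratified risk ratio, age must be viewed as a
confounding variable; if they differ substantially from one another,
age acts as an effect modifier (moderator).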
There are statistical tools, among them Mantel–Haenszel iterates,
that control confounding effects by measuring the known
confounders and including them as covariates in multivariate
analyses. However, the multivariate analyses reveal much lesser
information about the strength of the confounding effects than do
stratification methods.
26
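A minimal sketch of stratification, assuming two hypothetical age strata and the standard Mantel–Haenszel pooled risk ratio. The cell counts are invented for illustration; a, b, c, d follow the usual 2 x 2 layout (a = exposed with the event, b = exposed without, c = unexposed with, d = unexposed without).

```python
def risk_ratio(a, b, c, d):
    """Risk in exposed divided by risk in unexposed for one 2 x 2 table."""
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_rr(tables):
    """Mantel-Haenszel pooled risk ratio over an iterable of (a, b, c, d) tables."""
    tables = list(tables)
    numerator = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical strata: exposure is more common in the older, higher-risk group
strata = {
    "younger": (10, 90, 20, 380),
    "older": (160, 240, 20, 80),
}

# Crude table: add the strata cell by cell, ignoring age
crude_cells = [sum(cells) for cells in zip(*strata.values())]
crude_rr = risk_ratio(*crude_cells)
adjusted_rr = mantel_haenszel_rr(strata.values())
relative_change = abs(crude_rr - adjusted_rr) / adjusted_rr

for name, cells in strata.items():
    print(f"{name}: RR = {risk_ratio(*cells):.2f}")
print(f"crude RR = {crude_rr:.2f}, Mantel-Haenszel RR = {adjusted_rr:.2f}")
print(f"change after adjustment = {relative_change:.0%}")
```

In this invented data set both strata give the same risk ratio while the crude ratio is about twice as large, so the change after adjustment far exceeds the 10% threshold mentioned earlier in the deck.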
Diagnostic Tests
●
True positive (Tp): Disease is present and diagnostic test is positive
(a correct result).
●
True negative (Tn): Disease is absent and diagnostic test is negative
(a correct result).
●
False positive (Fp): Disease is absent and diagnostic test is positive
(an incorrect result).
●
False negative (Fn): Disease is present and diagnostic test is
negative (an incorrect result). It is also known as type-2 error.
●
PREVALENCE: The proportion of affected persons in the total sample (or
population) = (Tp + Fn)/(Tp + Tn +Fp + Fn)
●
SENSITIVITY: Assuming the disease is present, the probability that the
test will be positive: Tp/(Tp + Fn). Used in imaging or screening tests that
have few false negatives. A highly sensitive test, when negative, rules out the
disease: SNNOUT (sensitive, negative result rules out a disease).
●
SPECIFICITY: Assuming the disease is absent, the probability that the
test will be negative: Tn/(Tn + Fp). Used in confirming clinical
diagnoses as there are few false positives. A highly specific test, when positive,
rules in the disease: SPPIN (specific, positive result rules in a disease). 27
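The three ratios on this slide can be computed directly from the four cell counts. The sketch below assumes a hypothetical screening sample of 1,000 people; the function name and the counts are illustrative only.

```python
def diagnostic_summary(tp, tn, fp, fn):
    """Prevalence, sensitivity and specificity from the four diagnostic outcomes."""
    total = tp + tn + fp + fn
    return {
        "prevalence": (tp + fn) / total,   # (Tp + Fn)/(Tp + Tn + Fp + Fn)
        "sensitivity": tp / (tp + fn),     # Tp/(Tp + Fn)
        "specificity": tn / (tn + fp),     # Tn/(Tn + Fp)
    }


# Hypothetical counts: 100 diseased (90 detected), 900 healthy (850 cleared)
summary = diagnostic_summary(tp=90, tn=850, fp=50, fn=10)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```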
Sensitivity v. Specificity (continued)
There is a tradeoff between sensitivity and specificity. Changing the cutoff value for the
serum psychosine (to < 15 ng/mL) will change the test's ability to detect the affected
newborns with Krabbe disease. Likewise, if the serum copper cutoff for diagnosing
Wilson's disease were moved from 20 μg/dL to 15 μg/dL, the test would be very specific
because any child with a Cu level below 15 μg/dL would almost certainly have Wilson's
disease (with very few Fp results). However, the test would be insensitive because affected
patients with serum Cu readings between 15 and 20 μg/dL would have Fn results (when the
normal is > 20 μg/dL).
28
Predictive Value
●
A reminder: Sensitivity = Tp/(Tp + Fn); Specificity = Tn/(Tn +Fp).
●
Positive Predictive Value (PPV): Given the test is positive, it is
the probability that a disease is present.
PPV = Tp/(Tp + Fp)
If MRI has a 95% PPV for a spinal cord tumor, then given positive
findings, the patient will truly have the tumor 95% of the time.
●
Negative Predictive Value (NPV): Given the test is negative, it is
the probability that a disease is absent.
NPV = Tn/(Tn + Fn)
If Epstein Barr Virus (EBV) test has a 99% NPV, then given a negative
test, the patient will truly be EBV-negative 99% of the time.
●
Note: PPV and NPV vary depending on disease prevalence in a population. Yet,
sensitivity will not be affected because Tp/(total number of people with
disease) ratio will not change for a given test but the Tp/(total number of
positive tests) will vary because the area with a higher prevalence will have a
higher number of positive tests.
29
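The dependence of PPV and NPV on prevalence can be illustrated with a short calculation. The sketch assumes a hypothetical test with 90% sensitivity and 95% specificity applied in two populations; it expresses Tp, Fp, Tn and Fn as population fractions and then applies the PPV and NPV definitions given above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence               # diseased, test positive
    fn = (1 - sensitivity) * prevalence         # diseased, test negative
    tn = specificity * (1 - prevalence)         # healthy, test negative
    fp = (1 - specificity) * (1 - prevalence)   # healthy, test positive
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test, low-prevalence vs. high-prevalence setting
ppv_low, npv_low = predictive_values(0.90, 0.95, prevalence=0.01)
ppv_high, npv_high = predictive_values(0.90, 0.95, prevalence=0.20)

print(f"prevalence 1%:  PPV = {ppv_low:.3f}, NPV = {npv_low:.4f}")
print(f"prevalence 20%: PPV = {ppv_high:.3f}, NPV = {npv_high:.4f}")
```

Sensitivity and specificity are held fixed here, yet the PPV rises sharply and the NPV falls slightly as prevalence increases.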
Reliability, Validity, Accuracy, Precision
●
RELIABILITY – the measure of consistency of a test; the likelihood that
upon repetition the test will deliver the same results in the same
situation.
●
VALIDITY – the ability of a test to measure what it intends to measure.
●
A test may be reliable but not valid. It may reliably measure the serum
level of selenium; yet, this doesn't inherently mean that the reliable level
of selenium is a valid predictor of Graves' disease.
●
ACCURACY - is analogous to validity, relates to constant error, and
measures a test's ability to obtain true results. In a binary case,
Ac = (Tp +Tn)/(Tp +Tn + Fp+ Fn).
●
PRECISION - is analogous to reliability, relates to variable error, and
measures a test's ability to replicate results. In a binary model,
Pr = Tp/(Tp +Fp).
●
Accuracy is the degree of closeness to the true value. Precision is the
degree to which repeated measurements under unchanged conditions
show the same results. The precision value lies between 0 and 1.
30
Precision & Recall
●
A measurement system is considered valid if it is both
accurate and precise.
●
RECALL – Measures a test's accuracy in a binary model,
i.e. out of the total positive what percentage is predicted
positively? It is the same as TPR (true positive rate):
R = Tp /(Tp + Fn)
●
F1 SCORE: The harmonic mean of precision and recall
that takes both false positives (Fp) and false negatives
(Fn) into account. It performs well on imbalanced datasets by
giving the same weight to recall (Rc) and precision (Pr):
$F_1 = \frac{2}{1/Pr + 1/Rc} = \frac{2 \times Pr \times Rc}{Pr + Rc} = \frac{Tp}{Tp + Fp/2 + Fn/2}$.
●
Different problems give different weights to recall or precision. The
weighted F score reflects this:
$F_\beta = (1 + \beta^2) \times \frac{Pr \times Rc}{(\beta^2 \times Pr) + Rc}$, where β represents the
number of times recall is more important than precision.
31
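A short sketch of precision, recall, and the weighted F score, using hypothetical counts. With β = 1 the function returns the F1 score; the function name and the numbers are assumptions for illustration.

```python
def precision_recall_f(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from true/false positive and false negative counts."""
    precision = tp / (tp + fp)   # Pr = Tp/(Tp + Fp)
    recall = tp / (tp + fn)      # Rc = Tp/(Tp + Fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_beta


# Hypothetical counts
pr, rc, f1 = precision_recall_f(tp=90, fp=50, fn=10)
_, _, f2 = precision_recall_f(tp=90, fp=50, fn=10, beta=2)

print(f"precision = {pr:.3f}, recall = {rc:.3f}")
print(f"F1 = {f1:.3f}")   # equals Tp/(Tp + Fp/2 + Fn/2) = 90/120
print(f"F2 = {f2:.3f}")   # recall weighted twice as heavily as precision
```

Because recall (0.9) is higher than precision here, weighting recall more heavily (β = 2) raises the score above F1.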
Stratum Specific Hyper-prior Distributions
●
Problem Definition: Bias models in biostatistics are often used for a sensitivity
analysis where bias is a function (although, occasionally it becomes part of Bayesian
analysis). Conventional analysis of observational data looks like a stratum-specific
process that only quantifies random errors, leaving the scholars to rely on informal
judgments as to the bias effects.
●
Conventional Solutions: Assessment of uncertainty is an essential part of inference
and requires a model with parameters that measure departures from the dubious
assumptions. The most notable models are the confidence profile method which
incorporates bias models into the likelihood function, or Monte Carlo sensitivity
analysis (MCSA) which samples bias parameters, then inverts the bias model to provide
a distribution of ‘bias-corrected’ estimates.
●
Hyperprior Approximation: Bayesian and MCSA outputs depend entirely on the
prior-distributions p(η) that reintroduce the problem of basic sensitivity analysis.
Given the limitless possibilities for p(η), a thorough sensitivity analysis would only
illustrate how various conclusions can be reached. A conclusion about the target would
require constraints on the p(η). These limits would constitute a subjective prior on
priors (a hyperprior); incorporating them into the analysis would produce a subjective
average of results over the hyperprior. This result would itself be subjected to concerns
about sensitivity to the hyperprior, which would continue on into an infinite regress
which is impractical. There is nothing spurious about the quantification if the hyperprior
approximates the views of the analyst, as then the output gives the analyst an idea of
what his/ her posterior bets about the value of target should be. 33
Propensity Score
Propensity score (PS) is a probability of treatment assignment
conditional on observed baseline characteristics. It allows one to design
and analyze an observational (non-randomized) study so that it
mimics some of the characteristics of a randomized controlled trial. It
is a balancing score where the distribution of observed baseline
covariates is similar between treated and untreated subjects.
ei = Pr(Zi = 1|Xi)
where ei is the PS, Zi denotes the binary treatment condition (Zi=1, if
patient i is in the treatment group and Zi=0, if patient i is in the control
group), Pr - the conditional probability of treatment, Xi vector of
covariates. There are four different applications of PS:
●
matching on the propensity score
●
stratification on the propensity score
●
inverse probability of treatment weighting using the propensity score
●
covariate adjustment using the propensity score.
(continued) 34
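To make e_i = Pr(Z_i = 1 | X_i) concrete, the sketch below assumes a single binary covariate, so the propensity score can be estimated as the treated fraction within each covariate level (in practice a logistic regression on several covariates is more common). It then applies inverse probability of treatment weighting, one of the four applications listed above. The records are hypothetical.

```python
from collections import defaultdict

# Hypothetical records (x, z): x = binary baseline covariate, z = 1 if treated
records = [(1, 1)] * 30 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 30

# Estimate e(x) = Pr(Z = 1 | X = x) as the treated share within each level of x
counts = defaultdict(lambda: [0, 0])  # x -> [number treated, number total]
for x, z in records:
    counts[x][0] += z
    counts[x][1] += 1
propensity = {x: treated / total for x, (treated, total) in counts.items()}


def iptw_weight(x, z):
    """Weight 1/e for treated subjects and 1/(1 - e) for controls."""
    e = propensity[x]
    return 1 / e if z == 1 else 1 / (1 - e)


def weighted_share_x1(group):
    """Weighted proportion of subjects with x = 1 in the treated (1) or control (0) group."""
    weights = [(iptw_weight(x, z), x) for x, z in records if z == group]
    return sum(w * x for w, x in weights) / sum(w for w, _ in weights)


print("propensity scores:", propensity)
print("weighted share of x=1, treated:", round(weighted_share_x1(1), 3))
print("weighted share of x=1, control:", round(weighted_share_x1(0), 3))
```

Before weighting, 75% of treated but only 25% of control subjects have x = 1; after weighting, the covariate distribution is the same in both groups, which is the balancing property described above.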
Matching & Causal Pretzel
There are several methods of forming matched pairs in treated and
compared subjects for the propensity score:
(1) Matching with and without replacement
(2) Greedy matching – where the first treated subject is selected randomly
(3) Caliper matching - using a proportion of standard deviations of a logit
for propensity score.
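The matching options listed above can be combined in one small sketch: greedy nearest-neighbour matching without replacement, treated subjects visited in random order, within a caliper on the logit of the propensity score. The caliper width of 0.2 standard deviations is an assumption (a commonly quoted choice, not a value from the slides), and the subject IDs and scores are hypothetical.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1 - p))


def greedy_caliper_match(treated, controls, caliper_sd=0.2, seed=0):
    """Match each treated subject to the nearest unused control within the caliper.

    treated, controls -- dicts mapping subject id -> propensity score
    Returns a list of (treated_id, control_id) pairs.
    """
    all_logits = [logit(p) for p in list(treated.values()) + list(controls.values())]
    caliper = caliper_sd * statistics.stdev(all_logits)

    order = list(treated)
    random.Random(seed).shuffle(order)  # treated subjects are visited in random order
    available = dict(controls)
    pairs = []
    for t in order:
        if not available:
            break
        target = logit(treated[t])
        best = min(available, key=lambda c: abs(logit(available[c]) - target))
        if abs(logit(available[best]) - target) <= caliper:
            pairs.append((t, best))
            del available[best]  # without replacement: each control is used once
    return pairs


# Hypothetical propensity scores
treated = {"t1": 0.60, "t2": 0.30, "t3": 0.90}
controls = {"c1": 0.58, "c2": 0.32, "c3": 0.10, "c4": 0.50}
pairs = greedy_caliper_match(treated, controls)
print(sorted(pairs))
```

In this example the treated subject with a score of 0.90 stays unmatched because no control falls within the caliper; matching with replacement would amount to skipping the `del available[best]` step.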
Causal Pretzel: Experiments test the influence of descriptive causes or INUS
conditions (insufficient but non-redundant parts of an unnecessary but sufficient
condition). They do not completely explain a phenomenon; rather they aim to
identify whether a variable (or a set) makes a marginal difference in an
outcome - among other factors affecting that outcome. Many costly scientific
studies (including randomized trials) do not necessarily bring home results. In
part, to limit the cost of contingencies, researchers undergo extensive training
to be able to make smart inclusions and matching. Even then, substantial
judgment is still required as the exact choice may depend on the diagnosis, lab
results, insurance resources, ethics constraints, and the cost of such
arrangements still remains high. In this aspect, meta-studies are a great asset,
for measuring moderators (that once were experimental) for propensity scores,
for further testing invariance or reduction. This framework resembles a pretzel.
35
Level of Evidence:
Causal Description v. Causal Explanation
The level of evidence outlined by Sackett (2000) [6]:
1A = Systematic review of randomized controlled trials (RCT)
1B = RCT with narrow confidence interval
1C = All or none case series
2A = Systematic review of cohort studies
2B = Cohort study
2C = Outcomes research
3A = Systematic review of case-controlled studies
3B = Case-controlled study
4 = Case series, poor-quality cohort and case-controlled studies
5 = Expert opinion.
(6) Sackett D.L., Straus S.E., Richardson W.S., et al (2000).
Evidence-Based Medicine: How to Practice and Teach EBM.
Philadelphia (PA): Churchill-Livingstone
The strength of an
experiment or observation is
in describing outcomes
attributable to varying
treatments (causal
description). Yet, trials or
observations do less when
clarifying causal chains or
confounding (causal
explanation). Meta-studies
pool the results of similar
studies to increase statistical
power (rejecting Fn). This
depends on how a meta-
study preserves or changes
the provided causal
description into causal
explanation.
36
Measuring Risk
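EXPOSURE: From the 2 x 2 table (a = exposed with the event, b = exposed
without the event, c = unexposed with the event, d = unexposed without the
event), the probability that the event will occur in the exposed group is
given by risk in exposed = a/(a+b), and in the unexposed (control) group it
is given by risk in unexposed = c/(c+d).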
RISK DIFFERENCE (RD) = risk in exposed - risk in unexposed,
or vice versa. There are several ways to express RD.
– Absolute Risk Reduction (ARR): The reduction of incidence
associated with treatment. ARR = risk in the control group –
risk in the treatment group.
– Attributable Risk (AR): Increase in disease incidence associated
with an exposure. AR = risk in exposed - risk in unexposed.
– Number Needed to Treat (NNT): Number of patients required
to receive an intervention before an adverse outcome is
prevented. NNT = 1/ARR.
– Number Needed to Harm (NNH): Used for interventions or
exposures that may be detrimental. NNH = 1/AR.
37
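A worked example of the absolute measures, assuming a hypothetical trial in which the "exposed" row of the 2 x 2 table is the treatment group and the "unexposed" row is the control group. The counts are invented for illustration.

```python
import math


def risks(a, b, c, d):
    """Return (risk in exposed, risk in unexposed) from a 2 x 2 table."""
    return a / (a + b), c / (c + d)


# Hypothetical trial: 15/100 events under treatment, 30/100 events under control
risk_treated, risk_control = risks(a=15, b=85, c=30, d=70)

arr = risk_control - risk_treated   # absolute risk reduction
nnt = 1 / arr                       # number needed to treat

print(f"risk (treatment) = {risk_treated:.2f}, risk (control) = {risk_control:.2f}")
print(f"ARR = {arr:.2f}")
print(f"NNT = {nnt:.2f}, reported as {math.ceil(nnt)} patients")
```

NNT is conventionally rounded up to the next whole patient, which is why the last line uses `math.ceil`. The same arithmetic with AR in place of ARR gives NNH for a harmful exposure.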
Measuring Risk (continued)
●
Relative Risk or Risk Ratio (RR): The ratio of incidence in two
groups. RR =risk in exposed / risk in unexposed. RR > 1 indicates
harm, RR < 1 indicates a protective (treatment) effect, RR = 1 indicates null effect.
●
Relative Risk Reduction (RRR): The percentage of a disease
prevented by treatment. RRR = (risk in unexposed – risk in
exposed)/risk in unexposed = ARR/baseline risk.
●
Excess Relative Risk (ERR): For harmful exposure, ERR =(risk
in exposed – risk in unexposed)/risk in unexposed.
●
Odds: The ratio of the probability of an outcome to the probability of
not having the outcome. Odds =p/(1-p).
●
Odds Ratio (OR): The odds of an event in the exposed group
divided by the odds of the event in the unexposed group. In a 2 x 2
table, OR = (a/b)/(c/d) = ad/bc. In case-control studies, OR is used
instead of RR because RR can't be calculated from the study data owing
to purposeful oversampling of cases in the study design. OR
approximates RR if the outcome is rare.
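The ratio measures above can be sketched the same way. The function name `ratio_measures` and both sets of counts are hypothetical, and the sketch assumes no cell of the table is zero. The two examples share the same RR, and they suggest why OR is a good stand-in for RR only when the outcome is rare.

```python
def ratio_measures(a, b, c, d):
    """Ratio-based measures from a 2 x 2 table (all cells assumed non-zero).

    a, b: exposed subjects with / without the event
    c, d: unexposed subjects with / without the event
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    return {
        "RR": rr,
        "RRR": 1 - rr,            # (unexposed - exposed) / unexposed
        "ERR": rr - 1,            # (exposed - unexposed) / unexposed
        "odds_exposed": a / b,    # equals p / (1 - p) with p = risk_exposed
        "odds_unexposed": c / d,
        "OR": (a * d) / (b * c),
    }

# Hypothetical rare outcome: 2 per 1000 exposed vs 1 per 1000 unexposed.
rare = ratio_measures(2, 998, 1, 999)
# Hypothetical common outcome: 40 per 100 exposed vs 20 per 100 unexposed.
common = ratio_measures(40, 60, 20, 80)

print(f"rare:   RR = {rare['RR']:.3f}, OR = {rare['OR']:.3f}")
print(f"common: RR = {common['RR']:.3f}, OR = {common['OR']:.3f}")
```

RRR is meaningful when RR < 1 and ERR when RR > 1; the sketch returns both and leaves the choice to the reader.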
Types of Biases
●
Confounding: A third variable relates to both exposure and
outcome and distorts the association of interest. Solution: Matching.
●
Selection Bias: Groups are assigned non-randomly and differ at baseline.
– Sampling (Ascertainment) Bias: The sample doesn't accurately
represent the population of interest. These studies have internal
validity but lack external validity (generalizability). Solution:
Random sampling.
– Susceptibility Bias: Sicker patients are selected for more
invasive treatment. Solution: Randomization.
– Attrition Bias: If loss to follow-up is uneven between the
groups, it makes an intervention group seem more effective than
it is. Solution: Gathering as much data as possible from dropouts.
●
Measurement Bias (Hawthorne Effect): During the study,
participants change their behaviors because they know they are being
observed. Solution: Placebo group.
●
Recall Bias: The memory of exposure may be affected by the
patient's knowledge of the current disorder. Solution: Prospective
study or data triangulation with confirmatory and objective sources.
Types of Biases (continued)
●
Lead-time Bias: Early detection of a disease may be misinterpreted as
improving survival. Solution: Adjusting survival rates according to the
severity of disease, not from the detection date.
●
Late-look Bias: Data are collected too late for useful conclusions
because subjects with terminal diseases are either dead or unable to
respond in time. Solution: Stratify by severity.
●
Omission Bias: Removal or absence of certain variables leaves the
model unfit for regression analysis. Solution: Reiterative
truncated projected least squares (BP-RTPLS).
●
Procedural Bias: Subjects are treated differently depending on the
arm of the study. Solution: Double-blind study.
●
Experimenter Expectancy Bias (Pygmalion Effect): The
researchers' ambitions influence the outcome of the study. Solution:
A double-blind study prevents researchers and subjects from knowing
to which arm of the study the subjects are assigned.
●
Funding (Sponsorship) Bias: The tendency to skew study results to
support the sponsor's goal or mission. Solution: Independent audit.
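Of the biases listed above, lead-time bias lends itself to a small numeric illustration. The sketch below is a hypothetical two-patient example, not data from the slides: both patients have the same disease course and die in the same year, but one is detected earlier by screening. The `Patient` class, the helper functions, and the assumption that the true onset year is known are all illustrative; in practice onset is rarely observable, which is why the slide suggests adjusting by severity instead.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    onset_year: float      # biological onset (assumed known for illustration)
    detection_year: float  # year of diagnosis
    death_year: float

def survival_from_detection(p: Patient) -> float:
    """Survival as usually reported: counted from the date of diagnosis."""
    return p.death_year - p.detection_year

def survival_from_onset(p: Patient) -> float:
    """Survival counted from disease onset, unaffected by when it was found."""
    return p.death_year - p.onset_year

# Same disease course; screening only moves the diagnosis earlier.
symptomatic = Patient(onset_year=0.0, detection_year=5.0, death_year=8.0)
screened = Patient(onset_year=0.0, detection_year=2.0, death_year=8.0)

lead_time = symptomatic.detection_year - screened.detection_year
print("apparent gain from detection date:",
      survival_from_detection(screened) - survival_from_detection(symptomatic))
print("actual gain from onset:",
      survival_from_onset(screened) - survival_from_onset(symptomatic))
print("lead time:", lead_time)
```

The apparent survival gain equals the lead time, while survival measured from onset is unchanged.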
Review Questions
1) A randomized controlled trial studied the benefits of a new lupus
nephritis medication. Of 80 subjects on medication, only 10 had
hematuria. Twenty-five (25) participants out of 80 in the
control group developed hematuria. Make a 2 x 2 table to
calculate the incidence in the exposed and unexposed groups as
well as ARR, NNT, RR, and RRR for medication.
2) A case-control study examines risk factors for oral cancer.
Sixteen (16) subjects with oral cancer are sampled for the
case group and 16 participants are selected as controls.
Ten subjects with oral cancer are heavy smokers, and four subjects
without oral cancer also smoke. Construct a 2 x 2 table to
calculate the odds ratio (OR). Given the data above, can we
compute the prevalence of oral cancer? Why should we
calculate OR and not RR?
Review Answers
1) Risk in exposed = A/(A+B) = 10/80 = 0.125 = 12.5%
Risk in unexposed = C/(C+D) = 25/80 = 0.313 = 31.3%
ARR = 31.3% - 12.5% = 18.8%
NNT = 1/ARR = 1/0.188 = 5.3
RR = Risk in exposed/Risk in unexposed = 12.5%/31.3% = 0.4 = 40%
RRR = (Risk in unexposed – Risk in exposed)/Risk in unexposed = (31.3%
- 12.5%)/31.3% = 0.6 = 60%
2) OR = (A/B)/(C/D) = AD/BC = (10 x 12)/(4 x 6) = 120/24 = 5
The odds of having oral cancer in smokers are 5 times those of
non-smokers. We can't calculate prevalence because we sampled two
equal-size groups (cases, controls) rather than a population. In a
case-control study like this, OR is measured, not RR.
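The arithmetic in both answers can be checked with a short script. The variable names are illustrative. The cell counts come directly from the two review questions, with the non-event cells obtained by subtraction (80 - 10 = 70, 80 - 25 = 55, 16 - 10 = 6, 16 - 4 = 12).

```python
# Question 1: randomized trial, hematuria as the event.
a, b = 10, 70   # medication group: with / without hematuria
c, d = 25, 55   # control group: with / without hematuria

risk_treated = a / (a + b)
risk_control = c / (c + d)
arr = risk_control - risk_treated
nnt = 1 / arr
rr = risk_treated / risk_control
rrr = arr / risk_control

print(f"risk treated = {risk_treated:.3f}, risk control = {risk_control:.3f}")
print(f"ARR = {arr:.3f}, NNT = {nnt:.1f}, RR = {rr:.2f}, RRR = {rrr:.2f}")

# Question 2: case-control study, smoking as the exposure.
smokers_cases, smokers_controls = 10, 4
nonsmokers_cases, nonsmokers_controls = 6, 12

odds_ratio = (smokers_cases * nonsmokers_controls) / (smokers_controls * nonsmokers_cases)
print(f"OR = {odds_ratio:.1f}")
```

Without intermediate rounding, ARR is 18.75% and NNT is about 5.33, which match the rounded 18.8% and 5.3 given in the answers.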

A Compact Guide to Biostatistics

  • 1. A COMPACT GUIDE TO BIOSTATISTICS Naira R. Matevosyan, MD,MSJ,PhD Legal Clinic: nairarenault.wix.com/panther-law Authored Books: nairarenault1.wix.com/nairamatevosyan
  • 2. IN THIS ISSUE: Main Strains in Biostatistics: Descriptive, Inferential, Euclid (3 -7) Data, Variables, Vectors, Valence (8 - 16) Matching and Manipulation: Mediator and Moderator (17) Mode Merits and Demerits (18) Confounding by Indication: Severity, Protopathy, Selection (19-20) Confounding by Indication and Contraindication (21) Collider, Residual Confounding, Reverse Causality (22 -23) Prevalence, Incidence, Duration (24) Reduction and Stratification (25 -26) Diagnostic Tests (S.N.N.O.U.T., S.P.P.I.N.), Predictive Value (27 -29) Reliability, Validity, Accuracy, Precision, Recall (30 -32) Stratum Specific Hyper-prior Distributions (33) Propensity Score, Matching, and Causal Pretzel (34 -35) Level of Evidence: Causal Description v. Causal Explanation (36) Measuring Risk (37 -38) Types of Biases (39 -40) Review Questions and Answers (41 -42)
  • 3. MAIN STRAINS IN BIOSTATISTICS Based on the degree of abstraction DESCRIPTIVE Characterizes a sample or data-sets by actual measurements. INFERENTIAL Assumes each replication in a condition as entirely independent – to create countless challenges. Calculates a test-statistic value, degree of freedom, or rejection criteria – through a particular formula (based on the study design or specifics) to determine whether or not there are differences between the treatment groups. Extrapolates and generalizes the outcomes to make predictions. EUCLID Assumes a set of intuitively appealing axioms to assess a sample through the two, three, four or n-dimensional geometrical canon (detaching time from space) for causality, casualty, and prediction. 3
  • 4. SCENARIOS, EXAMPLES ● Hypothesis: Copper (Cu), Iron (Fe), Manganese (Mn), and Zinc (Zn) insufficiency contributes to suboptimal levels of luteinization hormone (LH) in infertile women of reproductive age. ● Hypothetical Sample: Plasma levels of Cu, Fe, Mn, and Zn from 210 women (age 19-49 y) with primary infertility, 36 hours prior to the proposed ovulation day, over the four consecutive menstrual cycles. ● Descriptive Statistics: Measures the means, standard deviations of plasma trace elements, covariance given by the diagonal and off-diagonal elements, correlations between each and paired variables. A correlation coefficient (r) larger than 0.7 indicates a strong association. ● Inferential Statistics: Stratifies women for ethnicity (for variations in duration of the menstrual cycles), age (19-29, 39-40), BMI, preexisting medical conditions, pelvic inflammatory disease, tubal TB, uterine anomalies, thyroid disorders, anemia, etc. Models the stratified random samples for probability. Measures the posterior kernel densities of parameters. Runs predictive inferences in the fitted model. ● Euclid Statistics: Assesses the data through holomorphic operations, each domain as a complex-valued function of differentiable variables which are manifolds in a spatial unit where the tangent sits with the n-root of differential expression. Data triangulation and inferences at the infinitesimal points help with predictions (check scholar.google.com for Matevosyan N.R. Articles, 2011-2021). 4
  • 5. SOME OF MY BOOKS ON THIS SUBJECT ● “Advanced Research in Comorbodity” ISBN: 9781514787410 ISBN: 9781493553013 5
  • 8. DATA v. VARIABLES ● Data include two sets of values: variables (qualitative, quantitative), and observational units from samples or populations. ● Data and variables are not synonymous. Variables are the data modeled (measured, manipulated, linked, controlled, correlated, compared, indexed) into a function. ● Independent (experimental) variable is manipulated and its effect on the dependent (outcome) variable is measured. The role of a variable depends on a study design: a dependent variable may become independent, a process variable may become a predictor. For example, by changing the temporal order between two variables in causal inference (rates of abortion and unipolar depression in the same community), abortion can be modeled as an independent variable to measure the depression rate (dependent variable) or vice versa. Put simply, depression can be viewed as the outcome of abortion, and abortion can be viewed as the outcome of depression. 8
  • 9. TYPES OF VARIABLES INCLUDED IN THIS PRESENTATION Independent (experimental) variable Dependent (outcome) variable Categorical variable Numerical variable Continuous variable Predictor Process (mediator, intervening) variable Moderator (affector) Latent variable Omitted variable Symbolic variable Hidden variable Hypothetical variable. 9
  • 10. VARIABLES: MODERATOR V. MEDIATOR MEDIATOR explains the relationship between independent and dependent variables (predictor and outcome). For example, in a sedative pill trial on women with anxiety disorder, women's body-mass index (BMI) is modeled as a process variable (mediator) that shows the relation between the independent variable (drug dosage) and the dependent variable (symptoms of anxiety). In the case of total mediation, the relationship between predictor and criterion is reduced to zero after controlling for the mediator-criterion relation. MODERATOR is a third variable (in a zero-order correlation) on which the relationship between the other two depends. While the mediator explains the causality chain, the moderator affects the strength and direction of that chain. The mediator intervenes and the moderator interacts. Interaction can be categorical (qualitative) or quantitative. In the same sedative pill study, a moderator is heavy caffeine intake, which worsens the anxiety in women and interacts with the study results. A moderator can also explain variations between studies.
  • 13. MODERATOR V. CONFOUNDER ● Confounder distorts the association between the predictor and the outcome. ● Moderator differentiates the association between the predictor and the outcome. ● Mediator explains the association between the predictor and the outcome. We typically “adjust” for confounders, and “report” the different effects seen from the effect modifiers.
  • 14. LATENT VARIABLES Latent variables are inferred via mathematical models from other variables that are observed (actually measured). Latent variables (LV) are used in psychology, economics, medicine, artificial intelligence, bioinformatics, speech science, management, and the social sciences. Examples are quality of life, confidence, morale, happiness, or liberty: concepts that cannot be measured directly. Sometimes, LV correspond to aspects of physical reality that could in principle be measured but are not, for practical reasons; these are termed “hidden variables”. LV may also correspond to abstract concepts (categories, behavioral clusters) and be modeled as “hypothetical variables”. An advantage of using LV is that they reduce data dimensionality (valence). By presenting a "shared variance", or the degree to which variables "move together," the LV link observable (real) data to symbolic (modeled) data. Variables that have no correlation cannot result in a latent construct based on the common factor model [3]. (3) Tabachnick, B.G., Fidell, L.S. (2001). Using Multivariate Analysis. Boston: Allyn and Bacon.
  • 15. OMITTED VARIABLES Omitted variables are values that can be both cause and result, or independent and dependent variables in the same model. For example, anxiety can be both the cause and the result of unemployment; abortion can be both the cause and the result of depression. Omitted variable bias (OVB) occurs when a model is created by incorrectly leaving out one or more important causal factors, so that the model compensates for the missing factor by over- or underestimating the effect of one of the other factors. Two conditions must hold true for OVB to exist in linear regression: the omitted variable must be: ● a determinant of the dependent variable (i.e., its true regression coefficient is not equal to zero), and ● correlated with one or more of the included independent variables (the covariance of the omitted variable and the independent variable is not equal to zero).
  • 16. VALENCE, VECTORS ● Valence is the dimensionality of the data, which can be reduced by the latent variables (see slide 14). ● Vectors of values enter regression models through the design matrix. An example: in the modeled formula $y_i = x_i \beta + z_i \delta + u_i$, with $i = 1, \dots, n$: $x_i$ is a 1 × p row vector of values of p independent variables observed at time i or for the ith study participant; $\beta$ is a p × 1 column vector of unobservable parameters to be estimated; $z_i$ is a scalar, the value of another independent variable that is observed at time i or for the ith study participant; $\delta$ is a scalar, an unobservable parameter (the response coefficient of the dependent variable to $z_i$); $u_i$ is an unobservable error occurring at time i or for the ith study participant, an unobserved realization of a random variable having expected value 0 (conditionally on $x_i$ and $z_i$); $y_i$ is the observation of the dependent variable at time i or for the ith study participant. If $z_i$ is omitted from the regression, the estimated values of the response parameters will be given by the usual least squares, $\hat{\beta} = (X'X)^{-1} X'Y$, where the "prime" notation means the transpose of a matrix and the $-1$ superscript is matrix inversion. Substituting for Y based on the assumed linear model, $\hat{\beta} = \beta + (X'X)^{-1} X'Z\delta + (X'X)^{-1} X'U$. The omitted variable bias, $(X'X)^{-1} X'Z\delta$, is non-zero if z is correlated with any variable in the matrix X.
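The omitted-variable result on this slide can be checked numerically. The following is a minimal sketch for the single-regressor case, with made-up data and the error term set to zero so the identity holds exactly; the values of `beta`, `delta`, `x`, and `z` are illustrative assumptions, not data from the deck.

```python
# Sketch: omitted variable bias with one included regressor x and one omitted z.
# All numbers are hypothetical; u_i is set to 0 so the algebra is exact.

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Population covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)

beta, delta = 2.0, 3.0                    # true coefficients (assumed)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # included variable
z = [0.5, 1.5, 1.0, 2.5, 2.0, 3.5]        # omitted variable, correlated with x
y = [beta * xi + delta * zi for xi, zi in zip(x, z)]

short_slope = ols_slope(x, y)             # regression that leaves z out
bias = delta * cov(x, z) / cov(x, x)      # single-regressor form of (X'X)^-1 X'Z delta

print(f"true beta = {beta}, estimated without z = {short_slope:.4f}")
print(f"predicted bias = {bias:.4f}")
```

Because z is positively correlated with x and δ > 0 here, the slope estimated without z overstates β by exactly the predicted bias.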
  • 17. MATCHING & MANIPULATION: MEDIATION V. MODERATION ● Matching is used to reduce bias by evaluating the effect of treatment while comparing the treated and non-treated units in an observational study or quasi-experiment (without random assignment). ● Experiments explore the effects of things, events, or behaviors that can be manipulated (dose of a medicine, salary, treatment modality). It is harder to measure non-manipulable causes (raw genetic material, age, gender). Those are assessed indirectly in non-experimental studies, using whatever means are available or fit. Finding manipulable agents helps ameliorate the problem. For example, phenylketonuria (PKU) treatment wasn't discovered by first trying different diets in children with intellectual disability. Initially, non-manipulable variables were used to find the increased levels of phenylalanine in those children. Such findings informed the scientific directions leading to the diet, with varying degrees of reduction. Some were experimental, others were not. ● Analogue experiments can be used on non-manipulable causes by manipulating an agent that is similar to the cause of interest. We cannot change a person's race but we can chemically alter the skin pigmentation. Further, past events (which usually are non-manipulable) may constitute a natural experiment that once was even randomized. Stronger solutions to causality can be achieved by the mediators. ● Mediator v. Moderator: See details on slides 10-13.
  • 18. MODE MERITS & DEMERITS ● Mode: Alongside the mean and median of a series, the mode is the most frequent value. It can't be determined from a series of individual observations unless it is converted into a discrete or continuous series. In a discrete series, the value of the variable against which the frequency is the largest is the modal value. In a continuous (grouped) series, the mode is measured by $\text{Mode} = l_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$, where $i$ is the class interval, $l_1$ is the lower limit of the modal class, $\Delta_1$ is the difference of frequencies between the modal class and the preceding class, and $\Delta_2$ is the difference of frequencies between the modal class and the post-modal class. ● Mode Merits: Mode is not affected by the values of extreme items. For the determination of mode, not all values in a series need to be considered. ● Mode Demerits: Mode is incapable of further mathematical treatment. Because mode is not based on all observations of a series, it is not rigidly defined. Mode may be unrepresentative in some cases as it may not have a definite value: a set of observations may have two, three, or more modal values.
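The grouped-series mode described on this slide can be sketched in a few lines. This assumes equal-width classes and the standard interpolation formula (lower limit of the modal class plus Δ1/(Δ1 + Δ2) times the class interval); the function name and the frequency table are hypothetical.

```python
# Sketch: mode of a grouped (continuous) series with equal class widths.
# Class limits and frequencies below are invented for illustration.

def grouped_mode(lower_limits, class_width, frequencies):
    """Return the interpolated mode of a grouped frequency distribution."""
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(frequencies)), key=frequencies.__getitem__)
    before = frequencies[m - 1] if m > 0 else 0
    after = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    delta1 = frequencies[m] - before   # modal class vs preceding class
    delta2 = frequencies[m] - after    # modal class vs post-modal class
    return lower_limits[m] + delta1 / (delta1 + delta2) * class_width

# Classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 7.
mode_value = grouped_mode([0, 10, 20, 30], 10, [5, 8, 12, 7])
print(round(mode_value, 2))
```

With these counts the modal class is 20-30, Δ1 = 4 and Δ2 = 5, so the mode falls a little below the middle of that class.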
  • 19. Confounding by Indication ● As noted in my books, epidemiology is about mastering the concept of confounding. Yet, confounding is not always the “elixir” of causality. ● A confounding variable (hidden or lurking variable) extraneously correlates (directly or inversely) with both dependent and independent variables. A perceived relationship between independent and dependent variables that has been misestimated due to the failure to adjust for confounders is termed a spurious relationship, and the misestimation is known as an omitted variable bias. ● How do we prove confounding? We compute the degree of associations between independent and dependent variables before and after adjusting for a possible confounder. If the difference between the two degrees of association is >10%, a confounding is present and the effect is modified. ● Confounding by indication is when a variable itself is a risk factor (in the non-exposed control group) associated with the exposure of interest – without being an intermediate step in the causal pathway. 19
  • 20. Types of Confounding by Indication Confounding by Indication (CBI) is typical of observational, pharmaco-epidemiologic studies when exposure is associated with outcome and the latter is caused by the indication for which the exposure was used, or by another factor associated with the indication. Confusions about CBI are mostly due to three different situations: (a) CBI as a protopathic bias (b) CBI by severity (c) CBI as a form of selection bias. CBI matters when the severity (or stage) of a disease or the degree of exposure to an agent acts as an independent variable at the random intercept for each confounder. The degree of confounding depends on the prevalence of putative confounding factors, their levels of association with the disease, and the exposure. Where the disease responsible for the indication acts as a categorical confounder irrespective of symptom severity, CBI is due to the protopathic or selection biases. Solution: Including a range of different indications for the same exposure enables the relationship between exposure and outcome to be triangulated across the individual indications, each analyzed separately.
  • 21. Confounding by Indication & Contraindication Confounding by contraindication (CBCI) is a rarer bias, and concerns the non-experimental (observational) studies that examine predictable side effects. Hypothesis: Hypochromic anemia is a side effect of SSRI/SNRI antidepressants. The SSRI/SNRI intake during pregnancy contributes to intrauterine growth retardation (IUGR). CBI Scenario: In women with depression and singleton pregnancies, antidepressants are modeled as independent variables, IUGR as the outcome, and anemia as a confounder. CBCI Scenario: The SSRI/SNRI and IUGR relationship is distorted because the index group of SSRI/SNRI users will exclude women with prior IUGR. Ignoring the CBCI will result in a reference group of SSRI/SNRI non-users having false “higher rates” of IUGR. CBCI bias can be addressed by the exclusion of multi-gravida women.
  • 22. Confounding v. Colliding ● Confounding is when exposure and outcome have a shared common cause that is not controlled by design. ● Collider bias occurs when exposure and outcome (or factors causing these) each concurrently influence a common third variable and that variable (or collider) is controlled by design.
  • 23. Residual Confounding, Reverse Causality Residual confounding is the distortion that remains after controlling for confounding in a study design or analysis. There are three reasons for residual confounding: (1) No efforts are made to consider, collect, and adjust for additional factors; (2) There are many errors in grouping the subjects for a confounder analysis; (3) Control of confounding is not vigorous enough. For example, in a randomized trial on women with osteoporosis (where age is a confounder) the sample size is too small and the confounding variable is imprecise (while matching or stratifying age groups, the age distinction is not scored but scaled as “younger,” “young,” “old,” “older”), resulting in residual confounding. Reverse causality occurs when the outcome causally influences the exposure being studied. Put simply, you may think that X causes Y, while in reality Y causes X. For example, it is hard to prove whether a miscarriage resulted from depression or depression resulted from a miscarriage. To prevent confusion, the nine criteria must be followed: (1) strength of the association, (2) consistency of findings, (3) specificity, (4) temporal order, (5) exposure gradient, (6) plausible mechanisms, (7) coherence between the observational, epidemiological and lab data, (8) experimental evidence, (9) analogy.
  • 24. Prevalence, Incidence, Duration ● Prevalence: A cross-sectional measure of the total number of people in a population (or subjects in a sample) affected by a condition at one point in time. Cannot be used in a prediction analysis (excluding meta-studies with logistic prediction, stochastic compartmental modeling, or Euclid infinitesimal manifolds by Matevosyan 2013, 2015, 2021). ● Incidence: A longitudinal measure showing the number of new cases of a disease (or an event) in a population over a specific period of time. Can be used in a prediction analysis (examples are the S.I.R. model [4], or the reinfection proportion model [5]). ● Duration: Relates incidence to prevalence. For example, upper respiratory infections (URI) have a high incidence (seasonal) but a low prevalence because most URI resolve fast. multiple sclerosis (MS) has a relatively low incidence but high prevalence because it is for life. (4) Kermack,W.O., McKendrick, A.G. (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London; 115(772): 700-721 (5) Wang J.Y., Lee L, N., Lai H.C., et al. (2007). Prediction of the tuberculosis reinfection proportion from the local incidence. The Journal of Infectious Diseases; 196(2): 281–288 24
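The way duration links incidence to prevalence can be sketched with the standard steady-state relation, prevalence odds = incidence rate × mean duration. The sketch assumes a stable population in steady state, and the incidence and duration figures are invented purely to mirror the URI and MS contrast on this slide; they are not epidemiological estimates.

```python
# Sketch: steady-state link between incidence, duration, and prevalence.
# Assumes a stable population; all rates and durations are hypothetical.

def prevalence_from_incidence(incidence_rate, mean_duration):
    """Point prevalence from incidence (per person-year) and duration (years)."""
    odds = incidence_rate * mean_duration      # P / (1 - P) = I x D
    return odds / (1 + odds)

# Short illness with many new cases per year (URI-like, assumed values).
uri = prevalence_from_incidence(incidence_rate=2.0, mean_duration=7 / 365)
# Lifelong illness with few new cases per year (MS-like, assumed values).
ms = prevalence_from_incidence(incidence_rate=5e-5, mean_duration=30)

print(f"URI-like: prevalence {uri:.4f}, prevalence/incidence {uri / 2.0:.4f}")
print(f"MS-like:  prevalence {ms:.5f}, prevalence/incidence {ms / 5e-5:.1f}")
```

The prevalence-to-incidence ratio is what duration controls: it is a small fraction of a year for the short illness and close to the full duration for the lifelong one.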
  • 25. Reduction Reduction is the transformation of numerical data (empirical, trial, lab, digital) into a corrected and simplified form for three reasons: (1) to reduce the number of data records by eliminating invalid or dubious data, (2) to produce summary or aggregate data for various applications, (3) to reduce the occurrence and effect of confounders by comparative analysis. Depending on a study design, reduction controls confounders differently: ● Cross-section: assigns confounders to both (clinical, control) groups equally. ● Cohort: creates (via over-exclusion) comparable cohorts with similar features for possible confounders (age, gender, income, menarche, BMI, etc.). ● Double-blind: conceals the experiment group membership. By preventing the participants from knowing if they are receiving treatment or not, the placebo effect should be the same for the control and treatment groups. By preventing the observers from knowing the participants' group membership, there should be no treatment or interpretation bias by the researchers. ● Randomized: the study sample (or population) is divided randomly in order to mitigate the chances of self-selection (by participants) or bias (by researchers). Prior to the trial, a random number generator is used to assign participants to the intended groups (control, intervention, parallel).
  • 26. Stratification Stratification is about dividing the population into distinct groups or subsets (strata) in each independent sample. ● For example, protected sex may prevent prostate cancer and in this equation, age is assumed to be a confounder. Therefore, the sampled data are stratified by age groups to analyze the degree of association between safe sex practices and prostate cancer. If different age groups (strata) yield substantially diverse risk ratios, age must be viewed as a confounding variable. There are statistical tools, among them Mantel–Haenszel iterates, that control confounding effects by measuring the known confounders and including them as covariates in multivariate analyses. However, the multivariate analyses reveal much lesser information about the strength of the confounding effects than do stratification methods. 26
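As a rough illustration of stratifying by a suspected confounder, the sketch below compares a crude odds ratio with a Mantel-Haenszel pooled odds ratio across two age strata, then applies the 10% change-in-estimate rule from slide 19. The counts are invented, the 2 x 2 layout (a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without) is an assumption, and measuring the change relative to the adjusted estimate is one common convention among several.

```python
# Sketch: crude vs Mantel-Haenszel (stratum-adjusted) odds ratio.
# Each stratum is (a, b, c, d); all counts are hypothetical.

def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Pooled OR = sum(a*d/n) / sum(b*c/n) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

strata = [
    (5, 95, 10, 190),   # younger stratum: low risk, stratum OR = 1
    (40, 60, 20, 30),   # older stratum: high risk, stratum OR = 1
]

# Collapse the strata into one table to get the crude (unadjusted) OR.
crude_table = [sum(col) for col in zip(*strata)]
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)
relative_change = abs(crude_or - adjusted_or) / adjusted_or

print(f"crude OR = {crude_or:.2f}, MH-adjusted OR = {adjusted_or:.2f}")
print(f"change = {relative_change:.0%} -> confounding suspected: {relative_change > 0.10}")
```

Here both strata show no association, yet the collapsed table suggests a doubling of the odds, which is the signature of confounding by the stratifying variable.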
  • 27. Diagnostic Tests ● True positive (Tp): Disease is present and diagnostic test is positive (a correct result). ● True negative (Tn): Disease is absent and diagnostic test is negative (a correct result). ● False positive (Fp): Disease is absent and diagnostic test is positive (an incorrect result). ● False negative (Fn): Disease is present and diagnostic test is negative (an incorrect result). It is also known as type-2 error. ● PREVALENCE: The number of affected persons out of the total sample (or population) = (Tp + Fn)/(Tp + Tn + Fp + Fn). ● SENSITIVITY: Assuming the disease is present, the probability that the test will be positive = Tp/(Tp + Fn). Used in imaging or screening, where there should be few false negatives. A highly sensitive test rules out the disease: SNNOUT (sensitive, negative result rules out a disease). ● SPECIFICITY: Assuming the disease is absent, the probability that the test will be negative = Tn/(Tn + Fp). Used in confirming clinical diagnoses, as there are few false positives. A highly specific test rules in the disease: SPPIN (specific, positive result rules in a disease).
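The three formulas on this slide translate directly into code. A minimal sketch, with a hypothetical function name and invented counts for a screening test applied to 1,000 people:

```python
# Sketch: prevalence, sensitivity, and specificity from the four test outcomes.
# The counts passed in below are made up for illustration.

def diagnostic_summary(tp, fn, fp, tn):
    total = tp + fn + fp + tn
    return {
        "prevalence": (tp + fn) / total,   # (Tp + Fn) / everyone tested
        "sensitivity": tp / (tp + fn),     # positives among the diseased
        "specificity": tn / (tn + fp),     # negatives among the healthy
    }

summary = diagnostic_summary(tp=90, fn=10, fp=30, tn=870)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```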
  • 28. Sensitivity v. Specificity (continued) There is a tradeoff between sensitivity and specificity. Changing the cutoff value for the serum psychosine (to < 15 ng/mL) will change the test's ability to detect the affected newborns with Krabbe disease. Likewise, if the serum copper cutoff for diagnosing Wilson's disease were moved from 20 μg/dL to 15 μg/dL, the test would be very specific because any child with Cu level of 15 μg/dL would certainly have Wilson's disease (with very few Fp results). However, the results would be insensitive because patients with serum Cu reading of 15 μg/dL would have Fn results (when the normal is > 20 μg/dL).
  • 29. Predictive Value ● A reminder: Sensitivity = Tp/(Tp + Fn); Specificity = Tn/(Tn + Fp). ● Positive Predictive Value (PPV): Given the test is positive, it is the probability that a disease is present. PPV = Tp/(Tp + Fp). If MRI has a 95% PPV for a spinal cord tumor, then given positive findings, the patient will truly have the tumor 95% of the time. ● Negative Predictive Value (NPV): Given the test is negative, it is the probability that a disease is absent. NPV = Tn/(Tn + Fn). If an Epstein-Barr Virus (EBV) test has a 99% NPV, then given a negative test, the patient will truly be EBV-negative 99% of the time. ● Note: PPV and NPV vary depending on disease prevalence in a population. Sensitivity is not affected, because the ratio Tp/(total number of people with disease) does not change for a given test. The ratio Tp/(total number of positive tests), however, will vary, because an area with a higher prevalence will have a higher proportion of true positives among its positive tests.
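The note that PPV and NPV shift with prevalence while sensitivity and specificity stay fixed can be illustrated with Bayes' rule. This is a sketch under assumed test characteristics (90% sensitivity and specificity); the function name and the prevalence values are illustrative.

```python
# Sketch: predictive values as a function of prevalence (Bayes' rule).
# Sensitivity, specificity, and prevalences are assumed values.

def predictive_values(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence                # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)    # false-positive fraction
    tn = specificity * (1 - prevalence)          # true-negative fraction
    fn = (1 - sensitivity) * prevalence          # false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)        # (PPV, NPV)

for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the same test, PPV rises and NPV falls as the condition becomes more common.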
  • 30. Reliability, Validity, Accuracy, Precision ● RELIABILITY: the measure of consistency of a test; the likelihood that upon repetition the test will deliver the same results in the same situation. ● VALIDITY: the ability of a test to measure what it intends to measure. ● A test may be reliable but not valid. It may reliably measure the serum level of selenium; yet, this doesn't inherently mean that the reliably measured level of selenium is a valid predictor of Graves' disease. ● ACCURACY: is analogous to validity, relates to constant error, and measures a test's ability to obtain true results. In a binary case, Ac = (Tp + Tn)/(Tp + Tn + Fp + Fn). ● PRECISION: is analogous to reliability, relates to variable error, and measures a test's ability to replicate results. In a binary model, Pr = Tp/(Tp + Fp). ● Accuracy is the degree of closeness to the true value. Precision is the degree to which repeated measurements under unchanged conditions show the same results. The precision value lies between 0 and 1.
  • 31. Precision & Recall ● A measurement system is considered valid if it is both accurate and precise. ● RECALL: Measures a test's accuracy in a binary model, i.e. out of the total actual positives, what percentage is predicted as positive? It is the same as TPR (true positive rate): Rc = Tp/(Tp + Fn). ● F1 SCORE: The harmonic mean of precision and recall that takes both false positives (Fp) and false negatives (Fn) into account. It performs well on imbalanced datasets by giving the same weight to recall (Rc) and precision (Pr): F1 score = 2/(1/Pr + 1/Rc) = (2 x Pr x Rc)/(Pr + Rc) = Tp/(Tp + Fp/2 + Fn/2). ● Different problems give different weights to recall or precision. The weighted F score expresses this: Fβ = (1 + β²) x (Pr x Rc)/([β² x Pr] + Rc), where β represents the number of times recall is more important than precision.
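The precision, recall, and Fβ definitions can be sketched as follows. The counts are hypothetical; with β = 1 the function reduces to the F1 score, and the result can be compared with the count-based form Tp/(Tp + Fp/2 + Fn/2).

```python
# Sketch: precision, recall, and the F-beta score from raw counts.
# Tp, Fp, Fn below are invented for illustration.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_beta(tp, fp, fn, beta=1.0):
    """F-beta = (1 + beta^2) * Pr * Rc / (beta^2 * Pr + Rc)."""
    pr, rc = precision(tp, fp), recall(tp, fn)
    return (1 + beta ** 2) * pr * rc / (beta ** 2 * pr + rc)

tp, fp, fn = 8, 2, 4
f1 = f_beta(tp, fp, fn)            # equal weight to precision and recall
f2 = f_beta(tp, fp, fn, beta=2)    # recall weighted more heavily

print(f"Pr = {precision(tp, fp):.3f}, Rc = {recall(tp, fn):.3f}")
print(f"F1 = {f1:.3f}, F2 = {f2:.3f}")
```

Because recall is the weaker of the two metrics in this example, weighting it more heavily (β = 2) pulls the score below F1.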
  • 33. Stratum Specific Hyper-prior Distributions ● Problem Definition: Bias models in biostatistics are often used for a sensitivity analysis where bias is a function (although, occasionally it becomes part of Bayesian analysis). Conventional analysis of observational data looks like a stratum-specific process that only quantifies random errors, leaving the scholars to rely on informal judgments as to the bias effects. ● Conventional Solutions: Assessment of uncertainty is an essential part of inference and requires a model with parameters that measure departures from the dubious assumptions. The most notable models are the confidence profile method which incorporates bias models into the likelihood function, or Monte Carlo sensitivity analysis (MCSA) which samples bias parameters, then inverts the bias model to provide a distribution of ‘bias-corrected’ estimates. ● Hyperprior Approximation: Bayesian and MCSA outputs depend entirely on the prior-distributions p(η) that reintroduce the problem of basic sensitivity analysis. Given the limitless possibilities for p(η), a thorough sensitivity analysis would only illustrate how various conclusions can be reached. A conclusion about the target would require constraints on the p(η). These limits would constitute a subjective prior on priors (a hyperprior); incorporating them into the analysis would produce a subjective average of results over the hyperprior. This result would itself be subjected to concerns about sensitivity to the hyperprior, which would continue on into an infinite regress which is impractical. There is nothing spurious about the quantification if the hyperprior approximates the views of the analyst, as then the output gives the analyst an idea of what his/ her posterior bets about the value of target should be. 33
  • 34. Propensity Score Propensity score (PS) is the probability of treatment assignment conditional on observed baseline characteristics. It allows one to design and analyze an observational (non-randomized) study so that it mimics some of the characteristics of a randomized controlled trial. It is a balancing score: conditional on the PS, the distribution of observed baseline covariates is similar between treated and untreated subjects. ei = Pr(Zi = 1|Xi), where ei is the PS, Zi denotes the binary treatment condition (Zi = 1 if patient i is in the treatment group and Zi = 0 if patient i is in the control group), Pr is the conditional probability of treatment, and Xi is the vector of covariates. There are four different applications of PS: ● matching on the propensity score ● stratification on the propensity score ● inverse probability of treatment weighting using the propensity score ● covariate adjustment using the propensity score. (continued)
  • 35. Matching & Causal Pretzel There are several methods of forming matched pairs of treated and comparison subjects on the propensity score: (1) Matching with and without replacement (2) Greedy matching, where the first treated subject is selected randomly (3) Caliper matching, using a proportion of the standard deviation of the logit of the propensity score. Causal Pretzel: Experiments test the influence of descriptive causes or INUS conditions. They do not completely explain a phenomenon; rather they aim to identify whether a variable (or a set) makes a marginal difference in an outcome, among other factors affecting that outcome. Many costly scientific studies (including randomized trials) do not necessarily bring home results. In part, to limit the cost of contingencies, researchers undergo extensive training to be able to make smart inclusions and matching. Even then, substantial judgment is still required, as the exact choice may depend on the diagnosis, lab results, insurance resources, and ethics constraints, and the cost of such arrangements still remains high. In this respect, meta-studies are a great asset for measuring moderators (that once were experimental) for propensity scores, and for further testing invariance or reduction. This framework resembles a pretzel.
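The greedy and caliper matching methods listed on this slide can be combined in a short sketch: greedy 1:1 nearest-neighbour matching without replacement on the logit of the propensity score. Several choices here are assumptions rather than anything stated in the deck: the caliper of 0.2 standard deviations (a commonly cited choice), the use of the standard deviation over all subjects pooled, processing treated subjects in the order given (shuffle the input first to start from a random subject), and the propensity scores themselves, which are invented.

```python
import math
import statistics

# Sketch: greedy 1:1 caliper matching without replacement on logit(PS).
# Subject IDs and propensity scores are hypothetical.

def logit(p):
    return math.log(p / (1 - p))

def greedy_caliper_match(treated, controls, caliper_sd=0.2):
    """treated, controls: dicts mapping subject id -> propensity score."""
    all_logits = [logit(p) for p in list(treated.values()) + list(controls.values())]
    caliper = caliper_sd * statistics.pstdev(all_logits)
    available = dict(controls)          # controls not yet used
    pairs = []
    for t_id, t_ps in treated.items():
        best_id, best_dist = None, None
        for c_id, c_ps in available.items():
            dist = abs(logit(t_ps) - logit(c_ps))
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = c_id, dist
        if best_id is not None:         # treated subjects with no control
            pairs.append((t_id, best_id))   # inside the caliper stay unmatched
            del available[best_id]
    return pairs

treated = {"T1": 0.60, "T2": 0.30, "T3": 0.95}
controls = {"C1": 0.58, "C2": 0.32, "C3": 0.10, "C4": 0.61}
matched = greedy_caliper_match(treated, controls)
print(matched)
```

In this toy data T3 has no control within the caliper and is dropped, which is the usual price of caliper matching: better balance at the cost of a smaller matched sample.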
  • 36. Level of Evidence: Causal Description v. Causal Explanation The level of evidence outlined by Sackett (2000) [5]: 1A = Systematic review of randomized controlled trials (RCT) 1B = RCT with narrow confidence interval 1C = All or none case series 2A = Systematic review of cohort studies 2B = Cohort study 2C = Outcomes research 3A = Systematic review of case-controlled studies 3B = Case-controlled study 4 = Case series, poor cohort case-controlled 5 = Expert opinion. (5) Sackett D.L., Strauss S.E., Richardson W.S., et al (2000). Evidence-Based Medicine: How to Practice and Teach EBM. Philadelphia (PA): Churchill-Livingstone The strength of an experiment or observation is in describing outcomes attributable to varying treatments (causal description). Yet, trials or observations do less when clarifying causal chains or confounding (causal explanation). Meta-studies pool the results of similar studies to increase statistical power (rejecting Fn). This depends on how a meta- study preserves or changes the provided causal description into causal explanation. 36
  • 37. Measuring Risk EXPOSURE: From the 2 x 2 table, the probability that the event will occur in the exposed group is given by risk in exposed = a/(a+b), and in the unexposed (control) group it is given by risk in unexposed = c/(c+d). RISK DIFFERENCE (RD) = risk in exposed - risk in unexposed, or vice versa. There are several ways to express RD. ● Absolute Risk Reduction (ARR): The reduction of incidence associated with treatment. ARR = risk in the control group - risk in the treatment group. ● Attributable Risk (AR): Increase in disease incidence associated with an exposure. AR = risk in exposed - risk in unexposed. ● Number Needed to Treat (NNT): Number of patients required to receive an intervention before an adverse outcome is prevented. NNT = 1/ARR. ● Number Needed to Harm (NNH): Used for interventions or exposures that may be detrimental. NNH = 1/AR.
  • 38. Measuring Risk (continued) ● Relative Risk or Risk Ratio (RR): The ratio of incidence in two groups. RR = risk in exposed / risk in unexposed. RR > 1 indicates harm, RR < 1 indicates a protective (treatment) effect, RR = 1 indicates null effect. ● Relative Risk Reduction (RRR): The percentage of a disease prevented by treatment. RRR = (risk in unexposed - risk in exposed)/risk in unexposed = ARR/baseline risk. ● Excess Relative Risk (ERR): For harmful exposure, ERR = (risk in exposed - risk in unexposed)/risk in unexposed. ● Odds: The ratio of the probability of an outcome to the probability of not having the outcome. Odds = p/(1-p). ● Odds Ratio (OR): The odds of an event in the exposed group divided by the odds of the event in an unexposed group. In a 2 x 2 table, OR = (a/b)/(c/d) = ad/bc. In case-control studies, OR is used instead of RR because RR can't be calculated from the study data owing to purposeful oversampling of cases in the study design. OR approximates RR if the outcome is rare.
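The closing remark that OR approximates RR when the outcome is rare can be checked with two invented 2 x 2 tables that share the same risk ratio. The function name and the counts are hypothetical; the layout follows slides 37-38 (a, b = exposed with and without the event; c, d = unexposed with and without the event).

```python
# Sketch: RR and OR from a 2 x 2 table, for a rare and a common outcome.
# All counts are made up for illustration.

def risk_ratio_and_odds_ratio(a, b, c, d):
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio

rare_rr, rare_or = risk_ratio_and_odds_ratio(10, 990, 5, 995)     # risks 1% vs 0.5%
common_rr, common_or = risk_ratio_and_odds_ratio(60, 40, 30, 70)  # risks 60% vs 30%

print(f"rare outcome:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"common outcome: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

Both tables have RR = 2, but the OR stays close to 2 only in the rare-outcome table and drifts to 3.5 when the outcome is common.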
  • 39. Types of Biases ● Confounding: A third variable relates to both exposure and outcome and distorts the association of interest. Solution: Matching. ● Selection Bias: Non-randomly assigned, dissimilar baseline groups. ● Sampling (Ascertainment) Bias: The sample doesn't accurately represent the population of interest. These studies have internal validity but lack external validity (generalization). Solution: Random sampling. ● Susceptibility Bias: Sicker patients are selected for more invasive treatment. Solution: Randomization. ● Attrition Bias: If loss to follow-up is uneven between the groups, it can make an intervention group seem more effective than it is. Solution: Gathering as much data as possible from dropouts. ● Measurement Bias (Hawthorne Effect): During the study, participants change their behaviors. Solution: Placebo group. ● Recall Bias: The memory of exposure may be affected by the patient's knowledge of the current disorder. Solution: Prospective study or data triangulation with confirmatory and objective sources.
  • 40. Types of Biases (continued) ● Lead-time Bias: Early detection of a disease may be misinterpreted as improving survival. Solution: Adjusting survival rates according to the severity of disease, not from the detection date. ● Late-look Bias: Data are collected too late for useful conclusions because subjects with terminal diseases are either dead or incapable of timely responding. Solution: Stratify by severity. ● Omission Bias: Removing or absence of certain variables resultant in unfitness of the model for regression analysis. Solution: Reiterative truncated projected least squares (BP-RTPLS). ● Procedural Bias: Subjects are treated differently depending on the arm of the study. Solution: Double-blind study. ● Experimenter Expectancy Bias (Pygmalion Effect): The researchers' ambitions influence the outcome of the study. Solution: Double-blind study will prevent researchers and subjects from knowing to which arm of the study the subjects are assigned. ● Funding (Sponsorship) Bias: The tendency to skew study results to support the sponsor's goal or mission. Solution: Independent audit. 40
  • 41. Review Questions 1) A randomized controlled trial studied the benefits of a new Lupus Nephritis medication. Of 80 subjects on medication, only 10 had hematuria. Twenty-five (25) participants out of 80 in the control group developed hematuria. Make a 2 x 2 table to calculate the incidence in the exposed and unexposed groups as well as ARR, NNT, RR, and RRR for medication. 2) A case-control study examines risk factors for oral cancer. Sixteen (16) subjects with oral cancer are sampled for the case group and 16 participants are selected as controls. Ten subjects with oral cancer are heavy smokers and four without oral cancer smoke too. Construct a 2 x 2 table to calculate the odds ratio (OR). Given the data above, can we compute the prevalence of oral cancer? Why should we calculate OR and not RR?
  • 42. Review Answers 1) Risk in exposed = A/(A + B) = 10/80 = 0.125 = 12.5%. Risk in unexposed = C/(C + D) = 25/80 = 0.313 = 31.3%. ARR = 31.3% - 12.5% = 18.8%. NNT = 1/ARR = 1/0.188 = 5.3. RR = Risk in exposed/Risk in unexposed = 12.5%/31.3% = 0.4 = 40%. RRR = (Risk in unexposed - Risk in exposed)/Risk in unexposed = (31.3% - 12.5%)/31.3% = 0.6 = 60%. 2) OR = (A/B)/(C/D) = AD/BC = 120/24 = 5. The odds of having oral cancer in smokers are 5 times those of non-smokers. We can't calculate prevalence as we sampled two equal-size groups (cases, controls). In a case-control study like this, OR is measured, not RR.
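The arithmetic in both answers can be reproduced directly from the counts given in the review questions. The variable names are illustrative; the 2 x 2 cells for question 2 (6 non-smoking cases, 12 non-smoking controls) follow from 16 subjects per group.

```python
# Sketch: re-deriving the review answers from the counts in the questions.

# Question 1: 10 of 80 treated and 25 of 80 controls developed hematuria.
risk_treated = 10 / 80
risk_control = 25 / 80
arr = risk_control - risk_treated      # absolute risk reduction
nnt = 1 / arr                          # number needed to treat
rr = risk_treated / risk_control       # relative risk
rrr = arr / risk_control               # relative risk reduction

# Question 2: smokers among 16 cases = 10, among 16 controls = 4.
a, b = 10, 4                           # exposed: cases, controls
c, d = 16 - 10, 16 - 4                 # unexposed: cases, controls
odds_ratio = (a * d) / (b * c)

print(f"ARR = {arr:.4f}, NNT = {nnt:.2f}, RR = {rr:.2f}, RRR = {rrr:.2f}")
print(f"OR = {odds_ratio:.1f}")
```

Without intermediate rounding, ARR is 18.75% and NNT is about 5.33; in practice NNT is usually rounded up to the next whole patient.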