24/09/2015
1
To explain or to predict
Kathrine Frey Frøslie, statistician, PhD
Norwegian advisory unit for women’s health, Oslo university hospital
Oslo Centre for Biostatistics and Epidemiology, University of Oslo
Overview
Epidemiology
The research process
Aim vs statistical methods
Regression analysis
To explain or to predict: Explain
Mechanisms
Causality
DAGs
Exposure & outcome
Confounder, Mediator, Collider
Direct and indirect effects
To explain or to predict: Predict
Diagnostic tests, Forecasting
Personalized medicine
Statistical learning, big data, black box
Prediction error: Bias vs variance
Constructing a predictor
Thom R. Prédire N’est Pas Expliquer (1991)
Shmueli G. To explain or to predict (2010)
Abdelnoor M, Sandven I: Etiologisk versus prognostisk strategi i klinisk
epidemiologisk forskning (2006)
EPIDEMIOLOGY
The study of the occurrence and distribution of health-related states or events in
specified populations, including the study of DETERMINANTS influencing such states,
and the application of this knowledge to control the health problems.
Study includes surveillance, observation, hypothesis testing, analytic research, and
experiments. Distribution refers to analysis by time, place, classes or subgroups of
persons affected in a population or in a society. Determinants are all the physical,
biological, social, cultural, economic and behavioral factors that influence health.
Health-related states and events include diseases, causes of death, behaviors,
reactions to preventive programs, and provision and use of health services. Specified
populations are those with common identifiable characteristics. Application… to
control… makes explicit the aim of epidemiology – to promote, protect, and restore
health.
The primary “knowledge object” of epidemiology as a scientific discipline are
causes of health-related events in populations. In the last 70 years, the definition
has broadened from concern with communicable disease epidemics to take in all
processes and phenomena related to health in populations. Therefore epidemiology
is much more than a branch of medicine treating epidemics.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
John Snow, 1854
“To summarize, it is not reasonable, in our view, to
attribute the results to any special selection of cases or to
bias in recording. In other words, it must be concluded
that there is a real association between carcinoma of the
lung and smoking.”
“...it is concluded that smoking is an important factor in
the cause of carcinoma of the lung.”
DOLL R, HILL AB. Smoking and carcinoma of the lung.
BMJ 1950;4682:739-748.
Ill: Åshild Irgens
Marit B Veierød, 2015: Melanoma incidence on the rise again
APPLIED EPIDEMIOLOGY
The application and evaluation of epidemiological knowledge and methods (e.g. in
public health or in health care). It includes applications of etiological research,
priority setting and evaluation of health care programs, policies, technologies, and
services. It is epidemiological practice aimed at protecting and/or improving the
health of a defined population. It usually involves identifying and investigating health
problems, MONITORING changes in health status, and/or evaluating the outcomes
of interventions. It is generally conducted in a time frame determined by the need to
protect the health of an exposed population and an administrative context that
results in public health action. See also FIELD EPIDEMIOLOGY*; HOSPITAL
EPIDEMIOLOGY**.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
Problem of interest
Aim
Design
Experimental: RCT
Observational: Case-Control, Cohort
Main outcome
Categorical (e.g. lung cancer)
Continuous (e.g. blood glucose)
Censored (e.g. life time)
Statistical analysis
Regression analyses
REGRESSION ANALYSIS
Given data on a regressand (dependent variable) y and one or more regressors
(covariates or independent variables) x1, x2, etc., regression analysis involves finding
a mathematical model (within some restricted class of models) that adequately
describes y as a function of the x’s, or that predicts y from the x’s. The most
common form of model for an unbounded continuous y is a linear model; the logistic
and proportional hazards models are the most common forms used when y is binary
or a survival time, respectively.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
Names for y: Regressand, Response, Dependent variable, Outcome, Output, Endogenous variable, …
Names for x: Regressor, Explanatory variable, Independent variable, Covariate, Predictor, Determinant, Exogenous variable, …
Response: Common regression model
Continuous (Ex Birth weight): Linear
Categorical: Logistic
  Ex Lung cancer/no lung cancer: Binary logistic
  Ex 3 different types of lung cancer: Nominal logistic
  Ex 3 different stages of lung cancer: Ordinal logistic
Survival (Censored): Cox
$y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3 + B_4 x_4 + \varepsilon$
The price of a used car: y
depends on
Age, Mileage, Size, Brand, Colour, Rust,…
Linear function
$y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3^2 + B_4 x_4 \ln x_5 + \varepsilon$
Generalised linear function
The price of a used car: y
depends on
Age, Mileage, Size, Brand, Colour, Rust,…
$g(y) = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3^2 + B_4 x_4 \ln x_5 + \varepsilon$
Linear function
WHAT IS YOUR RESEARCH QUESTION? (AIM?)
Write your aim in one sentence.
WHAT IS YOUR DESIGN?
WHAT IS YOUR OUTCOME?
HOW WOULD YOU DO A REGRESSION ANALYSIS ON YOUR DATA?
To explain
Understanding
Mechanisms
Expert knowledge
What is the best estimate for the association
between the main exposure and the main
outcome?
Etiology
Causality
CAUSALITY
The relating of causes to the effects they produce. The property of being causal. The
presence of cause. Ideas about the nature of the relations of cause and effect. The
potential for changing an outcome (the effect) by changing the antecedent (the
cause).
Most of clinical, epidemiological, and public health research concerns causality. In
the health and life sciences, causality is often established by integration of biological,
clinical, epidemiological, and social evidence, as appropriate to the hypothesis at
stake.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
“Necessary” and/or “sufficient” causes
Examples
Measles virus is necessary and sufficient to cause measles in an
unimmunized individual.
Mycobacterium tuberculosis is a necessary cause of tuberculosis, but often
is not a sufficient cause without poverty, poor nutrition, etc.
Smoking is sufficient to cause lung cancer, but not necessary, as lung cancer
can also be caused by radon gas, asbestos fibres etc.
Giving a lecture is not necessary to make me stressed, as many other things
stress me. Also, it is not sufficient, as I usually enjoy giving lectures. However,
sometimes giving a lecture causes stress.
Counterfactual theory
This is the story of what didn’t happen
What a pity!
Hernán MA, Robins JM (2016). Causal Inference.
‘Any cause of disease will itself have causes. And each of these causes will
have causes, in a theoretically infinite causal chain. Acute myocardial
infarction is caused by (among other things) atherosclerosis of the coronary
arteries, which is caused by (among other things) high plasma cholesterol
concentrations, which is caused by (among other things) a high dietary
intake of fat, which is caused by (among other things) living in 21st century
western society, and so on. In a sense, all diseases have a potentially
infinite set of causes and so can never be completely understood.’
Coggon D, Martyn C. Time and chance: the stochastic nature of disease causation.
Lancet 2005;365:1434-1437.
The infinite causal chain
Causality
What is causality?
The Counterfactual concept
Interventions and causality
Consequences of actions
Philosophic
background
Ultimate goal:
Action!
Implantation of blastocysts from RAG-deficient mice into pseudo-pregnant mothers
results in the development of viable mice that lack mature B and T cells. However, if
normal embryonic stem (ES) cells are injected into RAG-deficient blastocysts, somatic
chimaeras are formed, which develop mature B and T cells.
Based on expert knowledge of the topic under investigation, we want to
estimate the association between an exposure and an outcome as
unbiasedly as possible.
Knowledge of the topic makes it plausible that the estimated association
can be interpreted as an effect, i.e. as causal, i.e. as a quantification of
mechanisms.
The expert knowledge about the topic is formalised in a graph of the
variables studied, a Directed Acyclic Graph (DAG).
In a DAG, one variable is defined as the main outcome, and one variable
is defined as the main exposure.
Expert knowledge is used to define other variables as either
a confounder, a mediator, or a collider.
DIRECTED ACYCLIC GRAPH (DAG)
See CAUSAL DIAGRAM.
CAUSAL DIAGRAM
(Syn: causal graph, path diagram) A graphical display of causal relations among
variables, in which each variable is assigned a fixed location on the graph (called a
node) and in which each direct causal effect of one variable on another is
represented by an arrow with its tail at the cause and its head at the effect. Direct
noncausal associations are usually represented by lines without arrowheads.
Graphs with only directed arrows (in which all direct associations are causal) are
called directed graphs.
Graphs in which no variable can affect itself (no feedback loop) are called acyclic.
Algorithms have been developed to determine from causal diagrams which sets of
variables are sufficient to control for confounding, and for when control of variables
leads to bias.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
Exposure and response
[Figure: DAG with exposure E and response R]
Confounder, mediator, collider
[Figure: three DAGs of E and R with a third variable, labelled Confounder, Mediator, and Collider]
Note: The presence of any of these may affect the association of interest.
Hence, including any of these in regression analyses may change effect
estimates.
Confounding
Confounding is bias of the estimated effect of an exposure on an outcome
due to the presence of a common cause of the exposure and the outcome.
Important in observational designs. May lead to underestimation,
overestimation, or even change the sign of the estimated effect.
A confounder is a variable that is associated with the outcome (either as a
cause or a proxy for a cause, but not as an effect of the disease),
associated with the exposure, and not an effect of the exposure. The
definition of confounding may also include bias due to baseline differences
between exposure groups in risk factors for the outcome, although this may be
considered as selection bias.
Confounding can be reduced by proper adjustment. Exploring data is not
sufficient to identify whether a variable is a confounder, and such
evaluation of confounding may lead to bias. Other evidence like
pathophysiological and clinical knowledge and external data is needed.
DAGs are useful tools when considering confounding variables.
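The shopping-time example can be imitated with a small simulation. The sketch below is only an illustration under stated assumptions, not the data behind the slides: sex is made a common cause of estradiol level and shopping time, estradiol is given no effect at all on shopping time, and every coefficient and the 0/1 coding are invented.

```python
import random

def slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000
sex = [rng.randint(0, 1) for _ in range(n)]  # hypothetical coding: 1 = female, 0 = male

# Assumed data-generating model: sex affects both variables,
# estradiol has no effect on shopping time.
estradiol = [1.0 + 3.0 * s + rng.gauss(0, 1) for s in sex]
shopping = [2.0 + 4.0 * s + rng.gauss(0, 1) for s in sex]

# Crude association: ignores the confounder.
crude = slope(estradiol, shopping)

# Adjusted association: estimated separately within each sex (stratification).
adjusted = {}
for group in (0, 1):
    e = [v for v, s in zip(estradiol, sex) if s == group]
    h = [v for v, s in zip(shopping, sex) if s == group]
    adjusted[group] = slope(e, h)

print(f"crude slope: {crude:.2f}")
print(f"slope among males: {adjusted[0]:.2f}, among females: {adjusted[1]:.2f}")
```

Under these assumptions the crude slope is clearly positive, while the slopes within each sex are close to zero. Stratification is used here only because it needs no extra library; including sex as a covariate in a regression model would serve the same purpose.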
Ex: Shopping time vs estradiol level. Simulated data.
[Figure: scatter plots of shopping time against estradiol level, overall and with females and males marked]
[Figure: DAG with Estradiol levels (biomarker), Shopping (hours, frequency, intensity), and Sex as confounder: sex or gender?]
Mediator
In contrast to the confounder, a mediator represents a step
in the causal pathway between the exposure and the
outcome.
Such a variable will also be associated with both the
exposure and the outcome.
Example: Mediators between maternal BMI and birth weight
Path diagram showing a decomposition of the hypothesized effect of early
pregnancy BMI on birth weight. The indirect pathways between BMI and
birth weight were hypothesized to be mediated by fasting glucose (nutrient
availability), the interleukins IL-1Ra or IL-6 or a combination of these.
Friis et al. The interleukins IL-6 and IL-1Ra: a mediating role in the associations
between BMI and birth weight?
Direct effects and indirect effects
The distinction depends on the level of explanation. When available knowledge
allows for it, a direct effect may be split into components, i.e.
indirect effects via mediators.
Decomposition of the total effect of maternal BMI on birth weight.
The total effect is the sum of all arrows, that is, the direct and indirect
effects.
The arrow widths represent the relative proportions of the total effect
through a specific pathway.
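For linear models fitted by least squares, the statement that the total effect is the sum of the direct and indirect effects can be checked numerically. The sketch below assumes one mediator, no interaction, and no unmeasured confounding; the variable names mirror the BMI example, but the data and path coefficients (0.5, 0.8, 0.3) are made up for illustration.

```python
import random

def cov(u, v):
    """Sample covariance (divisor n; the divisor cancels in the ratios below)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

rng = random.Random(2)
n = 5000
# Assumed linear model on made-up, standardised scales:
# BMI -> glucose (0.5), glucose -> birth weight (0.8), BMI -> birth weight (0.3).
bmi = [rng.gauss(0, 1) for _ in range(n)]
glucose = [0.5 * x + rng.gauss(0, 1) for x in bmi]
birth_weight = [0.3 * x + 0.8 * m + rng.gauss(0, 1) for x, m in zip(bmi, glucose)]

sxx, smm = cov(bmi, bmi), cov(glucose, glucose)
sxm = cov(bmi, glucose)
sxy, smy = cov(bmi, birth_weight), cov(glucose, birth_weight)

# Total effect: regress birth weight on BMI only.
total = sxy / sxx
# Path exposure -> mediator: regress glucose on BMI.
a = sxm / sxx
# Regress birth weight on BMI and glucose together (two-predictor least squares).
det = sxx * smm - sxm ** 2
direct = (smm * sxy - sxm * smy) / det  # coefficient of BMI
b = (sxx * smy - sxm * sxy) / det       # coefficient of glucose
indirect = a * b                        # effect transmitted through the mediator

print(f"total {total:.3f} = direct {direct:.3f} + indirect {indirect:.3f}")
```

With least squares the identity total = direct + indirect holds exactly in the sample, whatever the data. Whether the three numbers deserve a causal reading still depends on the DAG being right.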
COLLIDER
A variable directly affected by two or more other variables in the causal diagram.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
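Why adjusting for a collider is harmful can be seen in a small simulation. The sketch below is a hypothetical setup, not the data behind the slides: an exposure and an outcome are generated independently, both lengthen the hospital stay, and the cut-off at zero merely stands in for "> 2 weeks".

```python
import random

def slope(x, y):
    """Least-squares slope from a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(3)
n = 4000
# Assumed model: exposure and outcome are independent of each other,
# and both affect the length of hospital stay (the collider).
exposure = [rng.gauss(0, 1) for _ in range(n)]
outcome = [rng.gauss(0, 1) for _ in range(n)]
stay = [x + y + rng.gauss(0, 0.5) for x, y in zip(exposure, outcome)]
long_stay = [s > 0 for s in stay]  # hypothetical cut-off for a long stay

# Crude association: close to zero, as it should be.
crude = slope(exposure, outcome)

# Association within groups defined by the collider.
within = {}
for group in (True, False):
    x = [v for v, g in zip(exposure, long_stay) if g == group]
    y = [v for v, g in zip(outcome, long_stay) if g == group]
    within[group] = slope(x, y)

print(f"crude slope: {crude:.2f}")
print(f"long stay: {within[True]:.2f}, short stay: {within[False]:.2f}")
```

In this setup the crude slope is near zero, but a negative association appears within both groups. Grouping by (or adjusting for) the collider creates an association that is not there in the data-generating model.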
Ex: Post-traumatic stress after terror attack. Simulated data.
[Figure: scatter plot, grouped by length of hospital stay: > 2 weeks, < 2 weeks]
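Only expert knowledge can tell us what to do
[Figure: two versions of the scatter plot, with groups marked by sex (females, males) and by length of hospital stay (> 2 weeks, < 2 weeks). One panel is labelled "Confounder. Correct to adjust", the other "Collider. Wrong to adjust".]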
Regression analysis recipe (kind of)
When the ultimate goal is to understand mechanisms, and to estimate
associations between an exposure and an outcome as unbiasedly as
possible:
Use expert knowledge to identify exposure & outcome, confounders,
colliders & mediators. Use DAGs to clarify and communicate.
Find the crude association between exposure and outcome
Measured variables:
Adjust for confounders
Do not adjust for colliders
Sometimes adjust for mediators
Unmeasured variables:
Sensitivity analysis
Alas, the real world may not be so simple:
Expert knowledge may be lacking or inconclusive, regarding which
variables and arrows to include or leave out in the DAG.
Additional variables may come into consideration as confounders
e.g. for indirect effects, resulting in a very complex DAG.
It may be hard to tell (based on present knowledge) the direction
of a causal effect, e.g. whether a variable is a confounder or a
mediator, or a mediator or a collider.
Feed-back may be of concern.
Time-dependent covariates may exist.
Topics not covered
Inverse probability weighting
Dynamic path analysis
Time-dependent confounders
Marginal structural models
Latent variables
Structural equation modelling
Discussion exercises
Background:
There is an association between giving birth to a large baby, and
subsequent maternal diabetes.
Draw a DAG that can clarify potential mechanisms behind this
association.
Background
It is controversial whether maternal hyperglycemia less severe than that in diabetes
mellitus is associated with increased risks of adverse pregnancy outcomes.
Hyperglycemia and Adverse Pregnancy Outcomes
The HAPO Study Cooperative Research Group
N Engl J Med 2008;358:1991-2002
Methods
A total of 25,505 pregnant women underwent 75-g oral glucose-tolerance testing at
24 to 32 weeks of gestation. Data remained blinded if the fasting plasma glucose
level was <5.8 mmol/l and the 2-hour plasma glucose level was <11.1 mmol/l.
Primary outcomes: Birth weight above the 90th percentile for gestational age,
primary cesarean delivery, clinically diagnosed neonatal hypoglycemia, and cord-
blood serum C-peptide level above the 90th percentile.
For continuous-variable analyses, odds ratios were calculated for a 1-SD increase
in fasting, 1-hour, and 2-hour plasma glucose levels. As prespecified, to assess
whether the log of the odds of each outcome was linearly related to glucose
level, we added squared terms for glucose level for each adverse pregnancy
outcome to assess whether there were significant quadratic associations.
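One way to write the continuous-variable model described in this paragraph is shown below. The notation is assumed here and not taken from the HAPO report: $z$ is a glucose measure standardised by its standard deviation, $\mathbf{c}$ is the vector of adjustment variables, and $\beta_2$ is the coefficient of the squared term used to check linearity on the log-odds scale.

```latex
\[
\operatorname{logit} P(Y = 1 \mid z, \mathbf{c})
  = \beta_0 + \beta_1 z + \beta_2 z^2 + \boldsymbol{\gamma}^{\top}\mathbf{c},
\qquad
z = \frac{\text{glucose} - \text{mean}}{\text{SD}}
\]
\[
\text{If } \beta_2 = 0: \quad \mathrm{OR}_{\text{per 1 SD increase}} = e^{\beta_1}
\]
```

A quadratic coefficient $\beta_2$ that differs significantly from zero would indicate that the log odds are not linear in glucose, in which case a single odds ratio per 1 SD does not summarise the association.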
For each outcome, two logistic models were fit.
Model I included adjustment for center or the variables used in estimating the
90th percentile for birth weight for gestational age (infant's sex, race or ethnic
group, center, and parity).
For associations of glycemia with primary outcomes, each glucose measurement
was considered as both a categorical and a continuous variable in multiple logistic-
regression analyses. For categorical analyses, each measure of glycemia was divided
into seven categories, such that the 1- and 2-hour plasma glucose measures
reflected data for approximately the same number of women in each category as
did the fasting plasma glucose measure.
Model II included adjustment for multiple potential prespecified
confounders, including age, body-mass index (BMI), smoking status, alcohol
use, presence or absence of a family history of diabetes, gestational age at
the oral glucose-tolerance test, sex of the infant, parity (0, 1, or ≥2, except
for primary cesarean deliveries), mean arterial pressure and presence or
absence of hospitalization before delivery (except for preeclampsia), and
presence or absence of a family history of hypertension and maternal
urinary tract infection (for analysis of preeclampsia only). Height was also
included as a potential confounder, on the basis of post hoc findings of an
association with birth weight greater than the 90th percentile, and two
prespecified confounders (maternal urinary tract infection and previous
prenatal death) were excluded from primary and secondary outcome
analyses when neither was found to be related to any primary outcome or to
affect primary outcome–glucose associations.
Only the fully adjusted model results are presented in this report.
Please discuss this approach with your nearest neighbours.
The HAPO Study Cooperative Research Group. N Engl J Med 2008;358:1991-2002.
Frequency of Primary Outcomes across the Glucose
Categories.
Background:
Physical activity is beneficial to a variety of physiological
processes, including the metabolism of nutrients.
Might there also be positive effects on bone metabolism,
as measured by bone mineral density?
Aim:
Estimate the association between physical activity and
bone mineral density.
Measured variables:
Physical activity
BMD
Age
Draw the DAG.
Are there unmeasured variables that should be taken into
consideration? If so, include them in the DAG.
When studying the association between milk consumption and
bone mineral density, should one adjust for calcium
supplements?
Draw the DAG.
To predict
Does not have to explain/understand mechanisms, as
long as it predicts well.
Diagnostic tests
Ex Diabetes diagnosis
Melanoma screening based on picture and
blood samples
Expected development in disease (e.g. prognosis after
sepsis)
Identify high-risk groups/profiles, compute risk scores
Personalised medicine
Forecasting
[slide source: Birgitte Freiesleben deBlasio, Oslo Center for Biostatistics and Epidemiology]
Precision medicine
Aim:
Tailoring of medical treatment to the
individual characteristics, needs, and
preferences of a patient during all stages
of care, including prevention, diagnosis, treatment, and follow-up.
Provide the right patient with the right drug at the right dose at the right time.
Methods:
Diagnostic tests to determine which medical treatments will work best for each
patient.
Personalized Health Care is based on systems biology and uses
predictive tools to evaluate health risks and to design personalized health plans.
Ex Predict a person’s risk for a particular disease, based on one or several genes.
Initiate preventative treatment before the disease presents itself in the patient.
Personalized medicine
Statistical learning
Refers to a vast set of tools for modelling and understanding data, i.e. to
extract important patterns and trends in complex data sets.
These tools can be classified as supervised or unsupervised.
Supervised statistical learning involves building a statistical model for
predicting, or estimating, an output based on one or more inputs.
Statistical learning is a recently developed area in statistics and blends
with parallel developments in computer science and, in particular, machine
learning. The field encompasses many methods such as the lasso and sparse
regression, classification and regression trees, and boosting and support
vector machines.
With the explosion of “Big Data” problems, statistical learning has become
a very hot field in many scientific areas as well as marketing, finance,
and other business disciplines.
What is inside the black box?
“BLACK BOX”
1. A method of reasoning or studying a problem in which the methods and
procedures are not described, explained, or perhaps even understood. Nothing
is stated or inferred about the method; discussion and conclusions relate solely
to the empirical relationships observed.
2. A method of formally relating an input (e.g., quantity of a drug administered,
exposure to a putative causal factor) to an output or an observed effect (e.g.,
amount of the drug eliminated, disease), without making detailed assumptions
about the MECHANISMS that have contributed to the transformation of input to
output within the organism (the “black box”).
“BLACK-BOX EPIDEMIOLOGY”
A common epidemiological approach, used both in research and in public health
practice, in which the focus is on assessing putative causes and clinical effects
(beneficial or adverse) rather than the underlying biological MECHANISMS. It is not a
formal branch or speciality of epidemiology, nor is it an epidemiological method or
philosophy. Loosely speaking, it is an opposite of MECHANISTIC EPIDEMIOLOGY.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
MECHANISM
In epidemiology and in other health, life and social sciences, the way in which a
particular health-related event or outcome occurs, often described in terms of the
agents and steps involved. Whereas the focus is often on biological mechanisms,
environmental, social, and cultural mechanisms are also relevant to epidemiology,
public health, medicine and related disciplines.
MECHANISTIC EPIDEMIOLOGY
Epidemiological research that focuses on MECHANISMS underlying and explaining
associations between DETERMINANTS and health-related events or states. It is not a
formal branch or speciality of epidemiology, nor is it an epidemiological method or
philosophy. Loosely, the opposite of “BLACK-BOX EPIDEMIOLOGY”. See also APPLIED
EPIDEMIOLOGY.
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
The big issue in prediction models:
Prediction error
[Figure: total population in billions. Source: www.scientificamerican.com]
[slide source: Birgitte Freiesleben deBlasio, Oslo Center for Biostatistics and Epidemiology]
http://www.minitab.com/en-us/Published-Articles/Weather-Forecasts--Just-How-Reliable-Are-They-/
The best model = the optimal predictor is the
one which minimises the prediction error,
i.e. the model that predicts new values (or
classifies undiagnosed patients) as well as possible.
Optimal predictor = the best model
1) Define prediction error.
2) Consider model complexity.
3) «Old», registered data (data used to develop prediction model)
vs new (yet unknown) data
Model complexity: # variables, polynomial terms, interactions, functional form etc.
[Figure: prediction error against model complexity, from simple (e.g. overall mean) to complex, with a curve for new data. Underfitting: too simple model. Overfitting: too complex model.]
Prediction error is a trade-off between bias and variance.
[Figure: bias and variance curves, together with the prediction error for new data]
Complex
Precise fit to old data, but overfits
and performs badly on new data
Simple
Potentially high prediction error in
both old and new data
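The contrast between old and new data can be made concrete with polynomial fits of increasing degree. The sketch below is an illustration on made-up data (a sine curve plus noise, chosen only for convenience); `solve`, `fit_poly` and `mse` are small helper functions written for this example, with degree 0 corresponding to the overall mean.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_poly(x, y, degree):
    """Least-squares polynomial coefficients via the normal equations."""
    p = degree + 1
    xtx = [[sum(v ** (i + j) for v in x) for j in range(p)] for i in range(p)]
    xty = [sum((v ** i) * w for v, w in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

def mse(coef, x, y):
    """Mean squared prediction error of the fitted polynomial on (x, y)."""
    pred = [sum(c * v ** k for k, c in enumerate(coef)) for v in x]
    return sum((p - w) ** 2 for p, w in zip(pred, y)) / len(y)

rng = random.Random(4)

def sample(n):
    """Made-up data: a smooth curve plus measurement error."""
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [math.sin(3 * v) + rng.gauss(0, 0.3) for v in x]
    return x, y

x_old, y_old = sample(20)    # data used to develop the predictor
x_new, y_new = sample(2000)  # new data, not used in the fitting

results = {}
for degree in (0, 1, 3, 9):
    coef = fit_poly(x_old, y_old, degree)
    results[degree] = (mse(coef, x_old, y_old), mse(coef, x_new, y_new))
    print(f"degree {degree}: old data {results[degree][0]:.3f}, new data {results[degree][1]:.3f}")
```

The error on the old data can only go down as the degree increases, because each model contains the simpler ones. The error on the new data typically falls at first and then rises again once the polynomial starts to follow the noise.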
Optimal predictor = the best model
The “best model” depends on how we define the error
criterion, and how we weight/penalize bias and variance.
Variable selection?
No DAG to identify the roles of the variables (exposure,
confounder, mediator etc.).
Model selection rather than variable selection.
It is not necessary to assume linearity merely for convenient
interpretation of effect estimates, as prediction of future
values is more important than interpretation of parameters.
Constructing a predictor/classifier, two simple examples
Ex 1
Prediction of a continuous outcome y
based on one continuous predictor x
I.e. What response do we expect to get
for a new observation of x?
Ex: What do we expect the blood glucose value to be 30 mins after a meal?
[Figure: response y against predictor x, with a new observation of x marked]
Ex 2
Classification into two classes
(toxic/non-toxic) based on two
continuous predictors x1 and x2
Ex 1 Prediction of a continuous outcome y based on one continuous predictor x
Blood glucose y measured x minutes after a meal
[Figure: response y against predictor x, with the fitted curve for each model]
First shot: Simple linear model, $y = a + bx + e$
A linear model is a simple model.
Stable
Not flexible
Here: High prediction error
Here: Bad!
Slightly more complex: $y = a + bx + cx^2 + e$
Increasing complexity: $y = a + bx + cx^2 + dx^3 + e$
Increasing complexity:
A sum of third-degree polynomials
More flexible
Increasing complexity:
Choose a set of basis functions, e.g. third-degree polynomials (B-splines), and
formulate the predictor f(x) as a linear combination of the basis functions.
The best linear combination of the basis functions is found by
minimising the least squares expression (SSE).
Best model, according to SSE, fits data perfectly.
Overfitted (too curvy) if measurement error.
We want to restrict the total amount of curvature, and minimise a penalised
least squares expression instead.
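A common way to write the predictor and the two criteria is given below. The notation is assumed here: $\phi_k$ are the basis functions, $c_k$ their coefficients, and $\lambda \ge 0$ the smoothing parameter. The integrated squared second derivative is the usual measure of curvature in spline smoothing; other penalties are possible.

```latex
\[
f(x) = \sum_{k=1}^{K} c_k \, \phi_k(x)
\]
\[
\mathrm{SSE}(f) = \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
\]
\[
\mathrm{PENSSE}_{\lambda}(f) = \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
  + \lambda \int \bigl( f''(x) \bigr)^2 \, dx
\]
```

With $\lambda = 0$ the penalised criterion reduces to the SSE and the curve may pass through every observation. As $\lambda$ grows, curvature becomes more expensive and the fit approaches a straight line.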
Better model: Smoothed curve. Less curvy, overall better fit
How much should we smooth/penalize?
Choose the model with the λ that gives the lowest
cross-validation score.
Cross-validation (leave-one-out): leave one observation out,
then predict it based on the other observations.
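Leave-one-out cross-validation for choosing the amount of smoothing can be sketched in a few lines. To keep the example free of spline machinery, it swaps in a Gaussian kernel smoother, whose bandwidth `h` plays the role that λ plays for the penalised spline. The glucose-like data, the candidate bandwidths, and the function names are all invented for illustration.

```python
import math
import random

def smooth(x_train, y_train, x0, h):
    """Gaussian kernel smoother: weighted mean of y, weights decay with distance from x0."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in x_train]
    return sum(wi * yi for wi, yi in zip(w, y_train)) / sum(w)

def loocv_score(x, y, h):
    """Mean squared error when each observation is predicted from all the others."""
    errors = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        errors.append((y[i] - smooth(x_rest, y_rest, x[i], h)) ** 2)
    return sum(errors) / len(errors)

rng = random.Random(5)
# Made-up glucose-like curve: rises after the meal, then returns towards baseline.
minutes = [float(t) for t in range(0, 121, 4)]
glucose = [5.0 + 3.0 * math.exp(-((t - 40.0) / 25.0) ** 2) + rng.gauss(0, 0.3)
           for t in minutes]

# Small bandwidth = wiggly curve, large bandwidth = close to the overall mean.
bandwidths = [2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
scores = {h: loocv_score(minutes, glucose, h) for h in bandwidths}
best_h = min(scores, key=scores.get)

for h in bandwidths:
    print(f"h = {h:5.1f}: cross-validation score {scores[h]:.3f}")
print("chosen bandwidth:", best_h)
```

The same loop works for any smoother: replace `smooth` by a function that fits a penalised spline with a given λ on the remaining observations and predicts the one left out.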
Sample  Observed category  Predicted Linear  Predicted 1-NN
1       1
2       1
3       1
4       1
5       1
6       0
7       0
8       1
9       1
10      0
11      0
12      0
13      0
14      1
15      0
16      0
[Artificial data by Manuela Zucknick, Oslo Center for Biostatistics and Epidemiology]
Ex 2 Classifier: Predict correct class of
a categorical outcome y, based on
two continuous predictors x1 and x2
[Artificial data by Manuela Zucknick/Kathrine Frey Frøslie]
[Figure: plot of fetal size measured by ultrasound against BMI (18.5 to 40), with points marked CS (=1) or Not CS (=0)]
Sample  Observed category  Predicted Linear  Predicted 1-NN
1       1                  1
2       1                  1
3       1                  1
4       1                  1
5       1                  1
6       0                  0
7       0                  0
8       1                  1
9       1                  1
10      0                  0
11      0                  0
12      0                  0
13      0                  0
14      1                  1
15      0                  0
16      0                  1
Linear classifier
A linear model is a simple model.
Stable
Not flexible
Here: Low prediction error.
Here: Good!
Predict/classify an observation according
to its nearest neighbour (1-NN)
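A nearest-neighbour classifier needs very little code. The sketch below uses eight made-up observations (BMI and a fetal size score, with 1 = CS and 0 = not CS), not the 16 samples from the slides, and a hypothetical helper `nearest_label`. Distances are taken on the raw scales; in practice the predictors would usually be standardised first.

```python
import math

# Made-up training data: (BMI, fetal size score) and class (1 = CS, 0 = not CS).
points = [(21.0, 0.2), (23.5, 0.5), (26.0, 0.4), (29.0, 1.1),
          (31.0, 1.4), (33.5, 0.9), (36.0, 1.8), (38.0, 1.6)]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

def nearest_label(train_x, train_y, query):
    """Return the label of the training point closest to query (Euclidean distance)."""
    distances = [math.dist(p, query) for p in train_x]
    return train_y[distances.index(min(distances))]

# Error on the old data: every point is its own nearest neighbour.
train_errors = sum(nearest_label(points, labels, p) != y
                   for p, y in zip(points, labels))

# Leave-one-out error: classify each point from the remaining ones.
loo_errors = 0
for i, (p, y) in enumerate(zip(points, labels)):
    rest_x = points[:i] + points[i + 1:]
    rest_y = labels[:i] + labels[i + 1:]
    loo_errors += nearest_label(rest_x, rest_y, p) != y

print("misclassified on old data:", train_errors)
print("misclassified when left out:", loo_errors)
```

1-NN is a very flexible classifier: it reproduces the old data without error, so only an estimate based on held-out observations says anything about its prediction error.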
How to obtain prediction error based on new data?
Divide the data set into training set (1/3) and test set (2/3)
Leave-one-out cross-validation
K-fold cross-validation
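K-fold cross-validation can be sketched as follows. The data are made up, the two competing predictors (overall mean and simple linear regression) are chosen only because they are easy to fit by hand, and the helper names are hypothetical.

```python
import random

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]

def fit_mean(x, y):
    """Simplest predictor: the overall mean of y, ignoring x."""
    m = sum(y) / len(y)
    return lambda x0: m

def fit_line(x, y):
    """Simple linear regression fitted by least squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sum((u - mx) ** 2 for u in x)
    return lambda x0: my + b * (x0 - mx)

def cv_error(x, y, fit, folds):
    """Average squared prediction error over observations held out fold by fold."""
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(x)) if i not in held_out]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        total += sum((y[i] - predict(x[i])) ** 2 for i in test_idx)
    return total / len(x)

rng = random.Random(6)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [1.0 + 0.5 * v + rng.gauss(0, 1) for v in x]  # made-up linear relationship

folds = k_fold_indices(len(x), 5, rng)
cv_mean = cv_error(x, y, fit_mean, folds)
cv_line = cv_error(x, y, fit_line, folds)
print(f"5-fold CV error, overall mean: {cv_mean:.2f}")
print(f"5-fold CV error, linear model: {cv_line:.2f}")
```

Setting the number of folds equal to the number of observations gives leave-one-out cross-validation. Using the same folds for all candidate models keeps the comparison fair.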
A brilliant tutorial is found at
http://www.autonlab.org/tutorials/overfit10.pdf
Cross-validation for detecting and preventing overfitting
Andrew W. Moore
“Best” model?
Variable selection
(Forward/Backward/stepwise)
Akaike’s information criterion (AIC)
Bayes’ information criterion (BIC)
Focused information criterion (FIC)
Variable shrinkage
PCA
Ridge regression (cf. penalization in the curve fitting)
Lasso
Functional forms of variables?
Nonlinear models?
“There are two cultures in the use of statistical modeling to reach
conclusions from data. One assumes that the data are generated by
a given stochastic data model. The other uses algorithmic models
and treats the data mechanism as unknown. The statistical
community has been committed to the almost exclusive use of data
models. This commitment has led to irrelevant theory, questionable
conclusions, and has kept statisticians from working on a large
range of interesting current problems. Algorithmic modeling, both
in theory and practice, has developed rapidly in fields outside
statistics. It can be used both on large complex data sets and as a
more accurate and informative alternative to data modeling on
smaller data sets. If our goal as a field is to use data to solve
problems, then we need to move away from exclusive dependence
on data models and adopt a more diverse set of tools.”
Breiman L. Statistical modeling: The two cultures. Statist. Sci.
2001;16:199-231.
What is inside the black box?
Regression analysis comes with a huge toolbox
and add-ons.
Different regression tools and different approaches to (regression)
modelling must be chosen for different scientific questions, even
though the data set in the study «contains several variables to be
included in the analysis».
Topics not covered
Model selection
(Forward/Backward/stepwise)
Akaike’s information criterion (AIC)
Bayes’ information criterion (BIC)
Focused information criterion (FIC)
Variable shrinkage
PCA
Ridge regression
Lasso
Cross-validation
Discussion exercises
What are the main differences between
a regression analysis for the estimation of associations, and
a regression analysis for prediction purposes?
To explain and to predict
Public health perspective
Using prediction models to construct/improve hypotheses
and theories
“Learning causal structures”
Using expert knowledge to improve prediction by
incorporating knowledge into prediction models
Take action to improve health
OCBE partner in new SFI
The University of Oslo celebrates BIG INSIGHT, our new centre for research
based innovation, awarded by the Research Council of Norway.
Leaders of the new SFI centre Big
Insight: Arnoldo Frigessi, Lars Holden,
Kjersti Aas and André Teigland.
UiO headline, November 2014
BIG INSIGHT will produce innovative solutions
for key challenges facing the consortium of
private, public and research partners, by
developing original statistical and machine
learning methodologies. Fulfilling the promise
of the big data revolution, we shall invent deep
analytical tools to extract knowledge from
complex data and deliver BIG INSIGHT.
We shall focus on two central innovation
themes: deeply novel personalised solutions
and sharper predictions of transient behaviours.
References 1/2
• Abdelnoor M, Sandven I: Etiologisk versus prognostisk strategi i klinisk epidemiologisk
forskning. Norsk epidemiologi, 2006;16:77-80.
• Benner A, Zucknick M, Hielscher T, Ittrich C, Mansmann U. High-Dimensional Cox
Models: The Choice of Penalty as Part of the Model Building Process. Biometrical
Journal 2010;52:50-69.
• Breiman L. Statistical modeling: The two cultures. Statist. Sci. 2001;16:199-231.
• Fattori E, Cappelletti M, Costa P, et al. Defective inflammatory response in interleukin
6-deficient mice. J Exp Med. 1994; 180:1243–1250.
• Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research.
Epidemiology 1999;10:37-48.
• Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, 2nd ed.
Springer, 2009.
• Hernán MA, Hernández-Díaz S, Werler MM, Mitchell AA. Causal knowledge as a
prerequisite for confounding evaluation: An application to birth defects epidemiology.
Am J Epidemiol 2002;155:176-84.
• Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC,
forthcoming 2016.
http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
• James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with
Applications in R. Springer, 2014.
http://www-bcf.usc.edu/~gareth/ISL/index.html
References 2/2
• Janszky I, Ahlbom A, Svensson AC. The Janus face of statistical adjustment: confounder
versus collider. Eur J Epidemiol 2010;25:361-363.
• Porta M. A dictionary of epidemiology. 5th ed. New York: Oxford University Press;
2008.
• Rothman KJ, Greenland S, Lash TL. Modern Epidemiology, 3rd ed. Lippincott Williams &
Wilkins; 2008.
• Shmueli G. To explain or to predict. Statist. Sci. 2010;25:289-310.
• Somm E, Cettour-Rose P, Asensio C, et al. Interleukin-1 receptor antagonist is
upregulated during diet-induced obesity and regulates insulin sensitivity in rodents.
Diabetologia. 2006;49:387–393.
• Steyerberg EW. Clinical Prediction Models. A practical approach to development,
validation and updating. Springer, 2009.
• Thom R. Prédire N’est Pas Expliquer, 1991. Translated by Roy Lisker:
http://www.fermentmagazine.org/Stories/Thom/Predire.pdf
• VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and
Interaction, Oxford University Press, 2015.
• Veierød M, Lydersen S, Laake P (eds.). Medical statistics in clinical and epidemiological
research. Gyldendal Akademisk; 2012.
• Aalen OO, Røysland K, Gran JM, Ledergerber B. Causality, mediation and time: a
dynamic viewpoint. Journal of the Royal Statistical Society: Series A, 2012;175:831-861.
Thanks to the sweet and smart colleagues at OCBE,
who generously shared their slides and knowledge!
Thanks to Valeria Vitelli, Odd Aalen,
https://www.youtube.com/watch?v=e0vj-0imOLw

Epinor presentation 24.09.2015.

  • 1. 24/09/2015 1 To explain or to predict Kathrine Frey Frøslie, statistician, PhD Norwegian advisory unit for women’s health, Oslo university hospital Oslo Centre for Biostatistics and Epidemiology, University of Oslo Overview Epidemiology The research process Aim vs statistical methods Regression analysis To explain or to predict: Explain Mechanisms Causality DAGs Exposure & outcome Confounder, Mediator, Collider Direct and indirect effects To explain or to predict: Predict Diagnostic tests, Forecasting Personalized medicine Statistical learning, big data, black box Prediction error: Bias vs variance Constructing a predictor Thom R. Prédire N’est Pas Expliquer (1991) Shmueli G. To explain or to predict (2010) Abdelnoor M, Sandven I: Etiologisk versus prognostisk strategi i klinisk epidemiologisk forskning (2006) EPIDEMIOLOGY The study of the occurrence and distribution of health-related states or events in specified populations, including the study of DETERMINANTS influencing such states, and the application of this knowledge to control the health problems. Study includes surveillance, observation, hypothesis testing, analytic research, and experiments. Distribution refers to analysis by time, place, classes or subgroups of persons affected in a population or in a society. Determinants are all the physical, biological, social, cultural, economic and behavioral factors that influence health. Health-related states and events include diseases, causes of death, behaviors, reactions to preventive programs, and provision and use of health services. Specified populations are those with common identifiable characteristics. Application… to control… makes explicit the aim of epidemiology – to promote, protect, and restore health. 
The primary “knowledge object” of epidemiology as a scientific discipline are causes of health-related events in populations. In the last 70 years, the definition has broadened from concern with communicable disease epidemics to take in all processes and phenomena related to health in populations. Therefore epidemiology is much more than a branch of medicine treating epidemics. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
John Snow, 1854
“To summarize, it is not reasonable, in our view, to attribute the results to any special selection of cases or to bias in recording. In other words, it must be concluded that there is a real association between carcinoma of the lung and smoking.” “...it is concluded that smoking is an important factor in the cause of carcinoma of the lung.” DOLL R, HILL AB. Smoking and carcinoma of the lung. BMJ 1950;4682:739-748.
  • 2. Ill: Åshild Irgens
Marit B Veierød, 2015: Melanoma incidence on the rise again
APPLIED EPIDEMIOLOGY The application and evaluation of epidemiological knowledge and methods (e.g. in public health or in health care). It includes applications of etiological research, priority setting and evaluation of health care programs, policies, technologies, and services. It is epidemiological practice aimed at protecting and/or improving the health of a defined population. It usually involves identifying and investigating health problems, MONITORING changes in health status, and/or evaluating the outcomes of interventions. It is generally conducted in a time frame determined by the need to protect the health of an exposed population and an administrative context that results in public health action. See also FIELD EPIDEMIOLOGY*; HOSPITAL EPIDEMIOLOGY**. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
Problem of interest; Aim; Main outcome; Design; Statistical analysis.
Main outcome: Categorical (lung cancer), Continuous (blood glucose), Censored (life time).
Design: Experimental (RCT) or Observational (Case-Control, Cohort).
Statistical analysis: Regression analyses.
REGRESSION ANALYSIS Given data on a regressand (dependent variable) y and one or more regressors (covariates or independent variables) x1, x2, etc., regression analysis involves finding a mathematical model (within some restricted class of models) that adequately describes y as a function of the x’s, or that predicts y from the x’s. The most common form of model for an unbounded continuous y is a linear model; the logistic and proportional hazards models are the most common forms used when y is binary or a survival time, respectively. 
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
Terms for y (regressand): Response, Dependent variable, Outcome, Output, Endogenous variable, ⁞
Terms for x (regressor): Explanatory variable, Independent variable, Covariate, Predictor, Determinant, Exogenous variable, ⁞
Response and common regression model:
Continuous (Ex: Birth weight): Linear
Categorical: Logistic
  Ex: Lung cancer/no lung cancer: Binary logistic
  Ex: 3 different types of lung cancer: Nominal logistic
  Ex: 3 different stages of lung cancer: Ordinal logistic
Survival (Censored): Cox
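To make the continuous-response row concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. The data are simulated and hypothetical (gestational age and birth weight, with made-up coefficients); the function name `fit_simple_linear` is an illustration and not part of the presentation.

```python
import random


def fit_simple_linear(x, y):
    """Ordinary least squares estimates (b0, b1) for y = b0 + b1*x + error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


rng = random.Random(1)
# Simulated, hypothetical data: gestational age in weeks, birth weight in grams.
gest_age = [rng.uniform(36.0, 42.0) for _ in range(200)]
birth_weight = [-4000.0 + 190.0 * g + rng.gauss(0.0, 300.0) for g in gest_age]

b0, b1 = fit_simple_linear(gest_age, birth_weight)
print(f"estimated slope: {b1:.1f} g per week (data simulated with 190)")
```

The logistic and Cox models in the other rows are fitted by maximum likelihood rather than by this closed-form expression, so they would normally be left to a statistics package.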
  • 3. The price of a used car: y depends on Age, Mileage, Size, Brand, Colour, Rust,…
Linear function: $y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3 + B_4 x_4 + \varepsilon$
Linear function: $y = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3^2 + B_4 x_4 \ln x_5 + \varepsilon$
Generalised linear function: $g(y) = B_0 + B_1 x_1 + B_2 x_2 + B_3 x_3^2 + B_4 x_4 \ln x_5 + \varepsilon$
WHAT IS YOUR RESEARCH QUESTION? (AIM?) Write your aim in one sentence. WHAT IS YOUR DESIGN? WHAT IS YOUR OUTCOME? HOW WOULD YOU DO A REGRESSION ANALYSIS ON YOUR DATA?
To explain: Understanding. Mechanisms. Expert knowledge. What is the best estimate for the association between the main exposure and the main outcome? Etiology. Causality.
CAUSALITY The relating of causes to the effects they produce. The property of being causal. The presence of cause. Ideas about the nature of the relations of cause and effect. The potential for changing an outcome (the effect) by changing the antecedent (the cause). Most of clinical, epidemiological, and public health research concerns causality. In the health and life sciences, causality is often established by integration of biological, clinical, epidemiological, and social evidence, as appropriate to the hypothesis at stake. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
  • 4. “Necessary” and/or “sufficient” causes
Examples: Measles virus is necessary and sufficient to cause measles in an unimmunized individual. Mycobacterium tuberculosis is the necessary cause of tuberculosis, but often is not a sufficient cause without poverty, poor nutrition etc. Smoking is sufficient to cause lung cancer, but not necessary, as lung cancer can also be caused by radon gas, asbestos fibres etc. Giving a lecture is not necessary to make me stressed, as many other things stress me. Also, it is not sufficient, as I usually enjoy giving lectures. However, sometimes giving a lecture causes stress.
Counterfactual theory: This is the story of what didn’t happen. What a pity! Hernán MA, Robins JM (2016). Causal Inference.
The infinite causal chain
‘Any cause of disease will itself have causes. And each of these causes will have causes, in a theoretically infinite causal chain. Acute myocardial infarction is caused by (among other things) atherosclerosis of the coronary arteries, which is caused by (among other things) high plasma cholesterol concentrations, which is caused by (among other things) a high dietary intake of fat, which is caused by (among other things) living in 21st century western society, and so on. In a sense, all diseases have a potentially infinite set of causes and so can never be completely understood.’ Coggon D, Martyn C. Time and chance: the stochastic nature of disease causation. Lancet 2005;365:1434-1437.
Causality: What is causality? The Counterfactual concept. Interventions and causality. Consequences of actions. Philosophic background. Ultimate goal: Action!
  • 5. Implantation of blastocysts from RAG-deficient mice into pseudo-pregnant mothers results in the development of viable mice that lack mature B and T cells. However, if normal embryonic stem (ES) cells are injected into RAG-deficient blastocysts, somatic chimaeras are formed, which develop mature B and T cells.
  • 6. Problem of interest; Aim; Main outcome; Design; Statistical analysis.
Main outcome: Categorical (lung cancer), Continuous (blood glucose), Censored (life time).
Design: Experimental (RCT) or Observational (Case-Control, Cohort).
Statistical analysis: Regression analyses.
Based on expert knowledge of the topic under investigation, we want to estimate the association between an exposure and an outcome as unbiasedly as possible. Knowledge of the topic makes it plausible that the estimated association can be interpreted as an effect, i.e. as causal, i.e. as a quantification of mechanisms. The expert knowledge about the topic is formalised in a graph of the variables studied, a Directed Acyclic Graph (DAG). In a DAG, one variable is defined as the main outcome, and one variable is defined as the main exposure. Expert knowledge is used to define other variables as either a confounder, a mediator, or a collider.
DIRECTED ACYCLIC GRAPH (DAG) See CAUSAL DIAGRAM.
CAUSAL DIAGRAM (Syn: causal graph, path diagram) A graphical display of causal relations among variables, in which each variable is assigned a fixed location on the graph (called a node) and in which each direct causal effect of one variable on another is represented by an arrow with its tail at the cause and its head at the effect. Direct noncausal associations are usually represented by lines without arrowheads. Graphs with only directed arrows (in which all direct associations are causal) are called directed graphs. Graphs in which no variable can affect itself (no feedback loop) are called acyclic. Algorithms have been developed to determine from causal diagrams which sets of variables are sufficient to control for confounding, and for when control of variables leads to bias. 
Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
[Figures: DAGs with nodes E (exposure) and R (response). Slide titles: Exposure and response; Confounder, mediator, collider]
  • 7. Confounder, mediator, collider
[Figure: three DAGs with nodes E and R, one each for Confounder, Mediator and Collider]
Note: The presence of any of these may affect the association of interest. Hence, including any of these in regression analyses may change effect estimates.
Confounding
Confounding is bias of the estimated effect of an exposure on an outcome due to the presence of a common cause of the exposure and the outcome. Important in observational designs. May lead to underestimation, overestimation, or even change the sign of the estimated effect. A confounder is a variable that is associated with the outcome (either as a cause or a proxy for a cause, but not as an effect of the disease), associated with the exposure, and not an effect of the exposure. The definition of confounding may also include bias due to baseline differences between exposure groups in risk factors for the outcome, although this may be considered as selection bias. Confounding can be reduced by proper adjustment. Exploring data is not sufficient to identify whether a variable is a confounder, and such evaluation of confounding may lead to bias. Other evidence like pathophysiological and clinical knowledge and external data is needed. DAGs are useful tools when considering confounding variables.
Ex: Shopping time vs estradiol level. Simulated data.
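The shopping time vs estradiol example can be imitated with a small simulation. This is only a sketch under stated assumptions: the numbers are invented, shopping time is generated so that it does not depend on estradiol at all, and sex is the common cause of both. Stratifying by sex stands in for "proper adjustment".

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


rng = random.Random(2015)
rows = []
for _ in range(2000):
    female = rng.random() < 0.5
    # Hypothetical values: sex shifts both variables, estradiol has no effect on shopping.
    estradiol = (100.0 if female else 30.0) + rng.gauss(0.0, 10.0)
    shopping_hours = (5.0 if female else 2.0) + rng.gauss(0.0, 1.0)
    rows.append((female, estradiol, shopping_hours))

# Crude association: ignores sex.
crude = slope([r[1] for r in rows], [r[2] for r in rows])

# Adjusted association: the same slope estimated within each sex.
within = {}
for sex in (True, False):
    group = [r for r in rows if r[0] == sex]
    within[sex] = slope([r[1] for r in group], [r[2] for r in group])

print(f"crude slope:     {crude:.4f}")
print(f"within females:  {within[True]:.4f}")
print(f"within males:    {within[False]:.4f}")
```

The crude slope is clearly positive even though no effect was simulated, while the within-sex slopes are close to zero. The data alone cannot tell us that sex is a confounder here; that comes from how the simulation (the "expert knowledge") was set up.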
  • 8. Ex: Shopping time vs estradiol level. Simulated data.
[Figures: scatter plots of the simulated data, overall and marked by sex (Females, Males)]
[Figure: DAG with nodes Estradiol levels, Shopping and Sex; annotations: Biomarker; Hours, Frequency, Intensity] Confounder: Sex or gender?
Mediator
In contrast to the confounder, a mediator represents a step in the causal pathway between the exposure and the outcome. Such a variable will also be associated with both the exposure and the outcome.
Example: Mediators between maternal BMI and birth weight
Path diagram showing a decomposition of the hypothesized effect of early pregnancy BMI on birth weight. The indirect pathways between BMI and birth weight were hypothesized to be mediated by fasting glucose (nutrient availability), the interleukins IL-1Ra or IL-6 or a combination of these. Friis et al. The interleukins IL-6 and IL-1Ra: a mediating role in the associations between BMI and birth weight?
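For linear models with a single mediator, the split into a direct and an indirect effect can be sketched as follows. The data are simulated with invented coefficients and are not the data from Friis et al.; only one mediator (fasting glucose) is included, and no confounders are modelled.

```python
import random


def centred_sum(u, v):
    """Sum of cross-products of u and v around their means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


rng = random.Random(7)
n = 1000
# Simulated, hypothetical values.
bmi = [rng.gauss(25.0, 4.0) for _ in range(n)]                    # exposure
glucose = [4.0 + 0.05 * x + rng.gauss(0.0, 0.4) for x in bmi]     # mediator
weight = [2500.0 + 15.0 * x + 100.0 * m + rng.gauss(0.0, 400.0)
          for x, m in zip(bmi, glucose)]                           # outcome

sxx, smm = centred_sum(bmi, bmi), centred_sum(glucose, glucose)
sxm = centred_sum(bmi, glucose)
sxy, smy = centred_sum(bmi, weight), centred_sum(glucose, weight)

total = sxy / sxx                        # birth weight regressed on BMI only
a = sxm / sxx                            # arrow BMI -> glucose
det = sxx * smm - sxm ** 2
direct = (sxy * smm - smy * sxm) / det   # BMI coefficient, adjusted for glucose
b = (smy * sxx - sxy * sxm) / det        # arrow glucose -> birth weight
indirect = a * b                         # effect of BMI that passes through glucose

print(f"total {total:.1f} = direct {direct:.1f} + indirect {indirect:.1f}")
```

In least-squares fits of linear models the total effect equals the direct plus the indirect effect exactly; with nonlinear models or interactions this simple product rule no longer applies.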
  • 9. 24/09/2015 9 Direct effects and indirect effects Depends on the level of explanation. When available knowledge allows for it, a direct effect may be split into components, i.e. indirect effects via mediators. Example: Mediators between maternal BMI and birth weight Path diagram showing a decomposition of the hypothesized effect of early pregnancy BMI on birth weight. The indirect pathways between BMI and birth weight were hypothesized to be mediated by fasting glucose (nutrient availability), the interleukins IL-1Ra or IL-6 or a combination of these. Friis et al. The interleukins IL-6 and IL-1Ra: a mediating role in the associations between BMI and birth weight? Decomposition of the total effect of maternal BMI on birth weight. The total effect is the sum of all arrows, that is, the direct and indirect effects. The arrow widths represent the relative proportions of the total effect through a specific pathway. COLLIDER A variable directly affected by two or more other variables in the causal diagram. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008 Ex: Post-traumatic stress after terror attack. Simulated data. Grouped after length of hospital stay: > 2 weeks < 2 weeks Ex: Post-traumatic stress after terror attack. Simulated data.
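The effect of grouping on a collider can be imitated with a simulation. This is a sketch under loud assumptions: the transcript does not preserve which variables were plotted, so `exposure` and `stress` below are hypothetical standardised variables generated independently of each other, and length of hospital stay is generated as depending on both.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


rng = random.Random(22)
rows = []
for _ in range(4000):
    exposure = rng.gauss(0.0, 1.0)   # hypothetical exposure
    stress = rng.gauss(0.0, 1.0)     # hypothetical stress score, independent of exposure
    # Collider: length of stay is affected by both variables.
    stay_days = 14.0 + 3.0 * (exposure + stress) + rng.gauss(0.0, 1.5)
    rows.append((exposure, stress, stay_days))

crude = slope([r[0] for r in rows], [r[1] for r in rows])

long_stay = [r for r in rows if r[2] > 14.0]     # > 2 weeks
short_stay = [r for r in rows if r[2] <= 14.0]   # <= 2 weeks
slope_long = slope([r[0] for r in long_stay], [r[1] for r in long_stay])
slope_short = slope([r[0] for r in short_stay], [r[1] for r in short_stay])

print(f"crude slope:            {crude:.3f}")
print(f"stay > 2 weeks:         {slope_long:.3f}")
print(f"stay 2 weeks or less:   {slope_short:.3f}")
```

The crude slope is near zero, as simulated, but within each length-of-stay group a negative association appears. Here adjusting for the grouping variable creates bias instead of removing it.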
  • 10. Ex: Post-traumatic stress after terror attack. Simulated data. Grouped after length of hospital stay: > 2 weeks, < 2 weeks.
Confounder, mediator, collider
[Figure: three DAGs with nodes E and R, one each for Confounder, Mediator and Collider]
Note: The presence of any of these may affect the association of interest. Hence, including any of these in regression analyses may change effect estimates.
[Figures: the two simulated examples side by side, grouped by Females/Males and by > 2 weeks/< 2 weeks] Confounder: Correct to adjust. Collider: Wrong to adjust. Only expert knowledge can tell us what to do.
Regression analysis recipe (kind of)
When the ultimate goal is to understand mechanisms, and to estimate associations between an exposure and an outcome as unbiasedly as possible:
Use expert knowledge to identify exposure & outcome, confounders, colliders & mediators.
Use DAGs to clarify and communicate.
Find the crude association between exposure and outcome.
Measured variables: Adjust for confounders. Do not adjust for colliders. Sometimes adjust for mediators.
Unmeasured variables: Sensitivity analysis.
Alas, real world may not be so simple:
Expert knowledge may be lacking or inconclusive, regarding which variables and arrows to include or leave out in the DAG.
Additional variables may come into consideration as confounders e.g. for indirect effects, resulting in a very complex DAG.
It may be hard to tell (based on present knowledge) the direction of a causal effect, e.g. whether a variable is a confounder or a mediator, or a mediator or a collider.
Feed-back may be of concern.
Time-dependent covariates may exist.
  • 11. Topics not covered: Inverse probability weighting. Dynamic path analysis. Time-dependent confounders. Marginal structural models. Latent variables. Structural equation modelling.
Discussion exercises
Background: There is an association between giving birth to a large baby, and subsequent maternal diabetes. Draw a DAG that can clarify potential mechanisms behind this association.
Hyperglycemia and Adverse Pregnancy Outcomes. The HAPO Study Cooperative Research Group. N Engl J Med 2008;358:1991-2002
Background: It is controversial whether maternal hyperglycemia less severe than that in diabetes mellitus is associated with increased risks of adverse pregnancy outcomes.
Methods: A total of 25,505 pregnant women underwent 75-g oral glucose-tolerance testing at 24 to 32 weeks of gestation. Data remained blinded if the fasting plasma glucose level was <5.8 mmol/l and the 2-hour plasma glucose level was <11.1 mmol/l. Primary outcomes: Birth weight above the 90th percentile for gestational age, primary cesarean delivery, clinically diagnosed neonatal hypoglycemia, and cord-blood serum C-peptide level above the 90th percentile. For continuous-variable analyses, odds ratios were calculated for a 1-SD increase in fasting, 1-hour, and 2-hour plasma glucose levels. As prespecified, to assess whether the log of the odds of each outcome was linearly related to glucose level, we added squared terms for glucose level for each adverse pregnancy outcome to assess whether there were significant quadratic associations. For each outcome, two logistic models were fit. Model I included adjustment for center or the variables used in estimating the 90th percentile for birth weight for gestational age (infant's sex, race or ethnic group, center, and parity). For associations of glycemia with primary outcomes, each glucose measurement was considered as both a categorical and a continuous variable in multiple logistic-regression analyses. 
For categorical analyses, each measure of glycemia was divided into seven categories, such that the 1- and 2-hour plasma glucose measures reflected data for approximately the same number of women in each category as did the fasting plasma glucose measure. Model II included adjustment for multiple potential prespecified confounders, including age, body-mass index (BMI), smoking status, alcohol use, presence or absence of a family history of diabetes, gestational age at the oral glucose-tolerance test, sex of the infant, parity (0, 1, or ≥2, except for primary cesarean deliveries), mean arterial pressure and presence or absence of hospitalization before delivery (except for preeclampsia), and presence or absence of a family history of hypertension and maternal urinary tract infection (for analysis of preeclampsia only). Height was also included as a potential confounder, on the basis of post hoc findings of an association with birth weight greater than the 90th percentile, and two prespecified confounders (maternal urinary tract infection and previous prenatal death) were excluded from primary and secondary outcome analyses when neither was found to be related to any primary outcome or to affect primary outcome–glucose associations. Only the fully adjusted model results are presented in this report. Please discuss this approach with your nearest neighbours. The HAPO Study Cooperative Research Group. N Engl J Med 2008;358:1991-2002. Frequency of Primary Outcomes across the Glucose Categories.
  • 12. Background: Physical activity is beneficial to a variety of physiological processes, including the metabolism of nutrients. Might there also be positive effects on bone metabolism, as measured by bone mineral density (BMD)? Aim: Estimate the association between physical activity and bone mineral density. Measured variables: Physical activity, BMD, Age. Draw the DAG. Are there unmeasured variables that should be taken into consideration? If so, include them in the DAG.
When studying the association between milk consumption and bone mineral density, should one adjust for calcium supplements? Draw the DAG.
To predict
Does not have to explain/understand mechanisms, as long as it predicts well.
Diagnostic tests. Ex: Diabetes diagnosis. Melanoma screening based on picture and blood samples.
Expected development in disease (e.g. prognosis after sepsis).
Identify high-risk groups/profiles, compute risk scores.
Personalised medicine.
Diagnostic tests
  • 13. Forecasting [slide source: Birgitte Freiesleben deBlasio, Oslo Center for Biostatistics and Epidemiology]
Precision medicine
Aim: Tailoring of medical treatment to the individual characteristics, needs, and preferences of a patient during all stages of care, including prevention, diagnosis, treatment, and follow-up. Provide the right patient with the right drug at the right dose at the right time.
Methods: Diagnostic tests to determine which medical treatments will work best for each patient.
Personalized medicine
Personalized Health Care is based on systems biology and uses predictive tools to evaluate health risks and to design personalized health plans. Ex: Predict a person’s risk for a particular disease, based on one or several genes. Initiate preventative treatment before the disease presents itself in the patient.
Statistical learning
Statistical learning refers to a vast set of tools for modelling and understanding data, i.e. to extract important patterns and trends in complex data sets. These tools can be classified as supervised or unsupervised. Supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs. Statistical learning is a recently developed area in statistics and blends with parallel developments in computer science and, in particular, machine learning. The field encompasses many methods such as the lasso and sparse regression, classification and regression trees, and boosting and support vector machines. With the explosion of “Big Data” problems, statistical learning has become a very hot field in many scientific areas as well as marketing, finance, and other business disciplines.
  • 14. What is inside the black box?
“BLACK BOX” 1. A method of reasoning or studying a problem in which the methods and procedures are not described, explained, or perhaps even understood. Nothing is stated or inferred about the method; discussion and conclusions relate solely to the empirical relationships observed. 2. A method of formally relating an input (e.g., quantity of a drug administered, exposure to a putative causal factor) to an output or an observed effect (e.g., amount of the drug eliminated, disease), without making detailed assumptions about the MECHANISMS that have contributed to the transformation of input to output within the organism (the “black box”).
“BLACK-BOX EPIDEMIOLOGY” A common epidemiological approach, used both in research and in public health practice, in which the focus is on assessing putative causes and clinical effects (beneficial or adverse) rather than the underlying biological MECHANISMS. It is not a formal branch or speciality of epidemiology, nor is it an epidemiological method or philosophy. Loosely speaking, it is an opposite of MECHANISTIC EPIDEMIOLOGY. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
MECHANISM In epidemiology and in other health, life and social sciences, the way in which a particular health-related event or outcome occurs, often described in terms of the agents and steps involved. Whereas the focus is often on biological mechanisms, environmental, social, and cultural mechanisms are also relevant to epidemiology, public health, medicine and related disciplines.
MECHANISTIC EPIDEMIOLOGY Epidemiological research that focuses on MECHANISMS underlying and explaining associations between DETERMINANTS and health-related events or states. It is not a formal branch or speciality of epidemiology, nor is it an epidemiological method or philosophy. Loosely, the opposite of “BLACK-BOX EPIDEMIOLOGY”. See also APPLIED EPIDEMIOLOGY. Porta M: A Dictionary of Epidemiology, Fifth Edition, 2008
  • 15. The big issue in prediction models: Prediction error
[Figure: total population in billions. Source: www.scientificamerican.com] [slide source: Birgitte Freiesleben deBlasio, Oslo Center for Biostatistics and Epidemiology]
[Figure source: http://www.minitab.com/en-us/Published-Articles/Weather-Forecasts--Just-How-Reliable-Are-They-/]
The best model = the optimal predictor is the one which minimises the prediction error, i.e. the model that predicts new values (or classifies undiagnosed patients) as well as possible.
  • 16. Optimal predictor = the best model
1) Define prediction error. 2) Consider model complexity. 3) «Old», registered data (data used to develop prediction model) vs new (yet unknown) data.
Model complexity: # variables, polynomial terms, interactions, functional form etc.
[Figure: prediction error plotted against model complexity, from Simple (e.g. overall mean) to Complex, with a separate curve for new data]
Underfitting: Too simple model. Overfitting: Too complex model.
Prediction error is a trade-off between bias and variance.
Complex: Precise fit to old data, but overfits and performs badly on new data.
Simple: Potentially high prediction error in both old and new data.
“Best model” depends on how we define the error criterion, and how we weight/penalize bias and variance.
Variable selection? No DAG to identify the roles of the variables; exposure, confounder, mediator etc. Model selection rather than variable selection. Not necessary to assume linearity by convenience for the interpretation of effect estimates, as prediction of future values is more important than interpretation of parameters.
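The contrast between prediction error on old and new data can be sketched with a small simulation. As an assumption for illustration, a k-nearest-neighbour average is swapped in as the predictor because its complexity is set by a single number: k equal to the sample size gives the overall mean (simple), k = 1 reproduces the old data exactly (complex). The slides do not prescribe this predictor, and the sine-shaped truth and noise level are invented.

```python
import math
import random


def knn_predict(train_x, train_y, x0, k):
    """Average of the k training responses whose x is closest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared prediction error on (xs, ys) for a predictor built on the training data."""
    return sum((y - knn_predict(train_x, train_y, x, k)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def simulate(rng, n):
    """Hypothetical truth: a sine curve observed with noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2.0 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(3)
old_x, old_y = simulate(rng, 100)   # data used to develop the predictor
new_x, new_y = simulate(rng, 500)   # new data

errors = {}
for k in (100, 25, 7, 1):           # from simple (overall mean) to complex
    errors[k] = (mse(old_x, old_y, old_x, old_y, k),
                 mse(old_x, old_y, new_x, new_y, k))
    print(f"k={k:3d}  error on old data {errors[k][0]:.3f}  on new data {errors[k][1]:.3f}")
```

Error on the old data keeps falling as complexity increases and reaches zero at k = 1, while error on new data is lowest at an intermediate complexity.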
  • 17. Constructing a predictor/classifier, two simple examples
Ex 1 Prediction of a continuous outcome y based on one continuous predictor x
Ex 2 Classification into two classes (toxic/non-toxic) based on two continuous predictors x1 and x2
Ex 1 Prediction of a continuous outcome y based on one continuous predictor x. I.e. What response do we expect to get for a new observation of x? Ex: What do we expect the blood glucose value to be 30 mins after a meal?
[Figures: response y plotted against predictor x, with a new observation of x marked, and the fitted curve for each model below]
Blood glucose y measured x minutes after a meal.
First shot: Simple linear model, $y = a + bx + e$. A linear model is a simple model: Stable. Not flexible. Here: High prediction error. Here: Bad!
Slightly more complex: $y = a + bx + cx^2 + e$
Increasing complexity: $y = a + bx + cx^2 + dx^3 + e$
Increasing complexity: A sum of third-degree polynomials. More flexible.
  • 18. Ex 1 Prediction of a continuous outcome y based on one continuous predictor x. Blood glucose y measured x minutes after a meal.
Increasing complexity: Choose a set of basis functions $\phi_k$, e.g. third-degree polynomials (= B-splines), and formulate the predictor as a linear combination of the basis functions: $f(x) = \sum_{k=1}^{K} c_k \phi_k(x)$
The best linear combination of the basis functions is found by minimising the least squares expression $\mathrm{SSE} = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$
Best model, according to SSE, fits data perfectly. Overfitted (too curvy) if measurement error.
We want to restrict the total amount of curvature, and minimize these least squares instead: $\sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int \left[ f''(x) \right]^2 dx$
Better model: Smoothed curve. Less curvy, overall better fit.
How much should we smooth/penalize? Choose the model with the λ that gives the lowest cross-validation score. Cross-validation: Leaves one observation out, then predicts it based on the other observations. 
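Choosing λ by leave-one-out cross-validation can be sketched in a few dozen lines. As a loudly stated simplification, the code does not use B-splines: it fits one value per time point and penalises squared second differences, a discrete stand-in for the curvature penalty. The glucose curve is simulated with an invented shape, and the function names are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def smooth(y, weights, lam):
    """Minimise sum w_i (y_i - f_i)^2 + lam * sum (f_{i-1} - 2 f_i + f_{i+1})^2 over f."""
    n = len(y)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = weights[i]
    stencil = (1.0, -2.0, 1.0)
    for i in range(n - 2):            # add lam * D^T D, one second difference at a time
        for p in range(3):
            for q in range(3):
                a[i + p][i + q] += lam * stencil[p] * stencil[q]
    return solve(a, [w * v for w, v in zip(weights, y)])


def loo_score(y, lam):
    """Mean squared error when each observation is left out and then predicted."""
    total = 0.0
    for i in range(len(y)):
        w = [1.0] * len(y)
        w[i] = 0.0                    # weight zero = observation i is left out
        total += (y[i] - smooth(y, w, lam)[i]) ** 2
    return total / len(y)


rng = random.Random(11)
minutes = list(range(0, 125, 5))      # 25 equally spaced time points
# Simulated, hypothetical glucose curve (mmol/l) with measurement error.
glucose = [5.0 + 3.0 * math.exp(-((t - 40.0) / 25.0) ** 2) + rng.gauss(0.0, 0.3)
           for t in minutes]

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 1e4, 1e5]
scores = {lam: loo_score(glucose, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print("lambda with lowest cross-validation score:", best_lam)
```

A very large λ forces the curve towards a straight line and misses the peak; a very small λ follows the measurement error. The cross-validation score is what picks the compromise.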
Ex 1 Prediction of a continuous outcome y based on one continuous predictor x. Blood glucose y measured x minutes after a meal.
Ex 2 Classifier: Predict correct class of a categorical outcome y, based on two continuous predictors x1 and x2
[Artificial data by Manuela Zucknick/Kathrine Frey Frøslie, Oslo Center for Biostatistics and Epidemiology]
[Figure: fetal size measured by ultrasound plotted against BMI (axis ticks 18.5, 25, 30, 35, 40); classes CS (=1) and Not CS (=0)]
Linear classifier
Sample  Observed category  Predicted Linear
1       1                  1
2       1                  1
3       1                  1
4       1                  1
5       1                  1
6       0                  0
7       0                  0
8       1                  1
9       1                  1
10      0                  0
11      0                  0
12      0                  0
13      0                  0
14      1                  1
15      0                  0
16      0                  1
A linear model is a simple model: Stable. Not flexible. Here: Low prediction error. Here: Good!
  • 19. 24/09/2015 19 Sample Observed category Predicted Linear Predicted 1-NN 1 1 1 2 1 1 3 1 1 4 1 1 5 1 1 6 0 0 7 0 0 8 1 1 9 1 1 10 0 0 11 0 0 12 0 0 13 0 0 14 1 1 15 0 0 16 0 1 Predict/classify an observation according to its nearest neighbour (1-NN) CS (=1) Not CS (=0) BMI 18.5 25 30 35 40 Fetalsizemeasuredbyultrasound Divide the data set into training set (1/3) and test set (2/3) Leave-one-out cross-validation K-fold cross-validation How to obtain prediction error based on new data? A brilliant tutorial is found at http://www.autonlab.org/tutorials/overfit10.pdf Cross-validation for detecting and preventing overfitting Andrew W. Moore “Best” model? Variable selection (Forward/Backward/stepwise) Akaike’s information criterion (AIC) Bayes’ information criterion (BIC) Focused information criterion (FIC) Variable shrinkage PCA Ridge regression cf. Penalization in the curve fitting Lasso Functional forms of variables? Nonlinear models? “There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.” Breiman L. Statistical modeling: The two cultures. Statist. Sci. 2001;16:199-231. What is inside the black box? 
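The nearest-neighbour classifier and leave-one-out cross-validation mentioned on this slide can be sketched together. The two-class data below are simulated with invented class centres and are not the artificial CS data from the slides; `nearest_label` is a hypothetical helper name.

```python
import random


def nearest_label(points, labels, query, skip=None):
    """Label of the training point closest to query (Euclidean distance).

    skip is the index of a training point to leave out, if any.
    """
    best_i, best_d = None, float("inf")
    for i, (p1, p2) in enumerate(points):
        if i == skip:
            continue
        d = (p1 - query[0]) ** 2 + (p2 - query[1]) ** 2
        if d < best_d:
            best_i, best_d = i, d
    return labels[best_i]


rng = random.Random(5)
points, labels = [], []
for _ in range(200):
    y = 1 if rng.random() < 0.5 else 0
    centre = (1.5, 1.0) if y == 1 else (0.0, 0.0)   # hypothetical class centres
    points.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
    labels.append(y)

n = len(points)
# Error on the data used to build the classifier: each point is its own nearest neighbour.
train_error = sum(nearest_label(points, labels, points[i]) != labels[i]
                  for i in range(n)) / n
# Leave-one-out: classify each point from all the others.
loo_error = sum(nearest_label(points, labels, points[i], skip=i) != labels[i]
                for i in range(n)) / n

print(f"error on old data: {train_error:.2f}, leave-one-out error: {loo_error:.2f}")
```

1-NN always classifies the old data perfectly, so only an estimate based on held-out observations says anything about how it would perform on new data.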
Regression analysis comes with a huge toolbox and add-ons. Different regression tools and different approaches to (regression) modelling must be chosen for different scientific questions, even though the data set in the study «contains several variables to be included in the analysis».
  • 20. Topics not covered
Model selection (Forward/Backward/stepwise)
Akaike’s information criterion (AIC)
Bayes’ information criterion (BIC)
Focused information criterion (FIC)
Variable shrinkage
PCA
Ridge regression
Lasso
Cross-validation

Discussion exercises
What are the main differences between a regression analysis for the estimation of associations, and a regression analysis for prediction purposes?

To explain and to predict
Public health perspective
Using prediction models to construct/improve hypotheses and theories: “Learning causal structures”
Using expert knowledge to improve prediction by incorporating knowledge into prediction models
Take action to improve health

OCBE partner in new SFI
The University of Oslo celebrates BIG INSIGHT, our new centre for research based innovation, awarded by the Research Council of Norway. (UiO headline, November 2014)
Leaders of the new SFI centre Big Insight: Arnoldo Frigessi, Lars Holden, Kjersti Aas and André Teigland.
BIG INSIGHT will produce innovative solutions for key challenges facing the consortium of private, public and research partners, by developing original statistical and machine learning methodologies. Fulfilling the promise of the big data revolution, we shall invent deep analytical tools to extract knowledge from complex data and deliver BIG INSIGHT. We shall focus on two central innovation themes: deeply novel personalised solutions and sharper predictions of transient behaviours.
  • 21. References 1/2
• Abdelnoor M, Sandven I. Etiologisk versus prognostisk strategi i klinisk epidemiologisk forskning. Norsk epidemiologi 2006;16:77-80.
• Benner A, Zucknick M, Hielscher T, Ittrich C, Mansmann U. High-Dimensional Cox Models: The Choice of Penalty as Part of the Model Building Process. Biometrical Journal 2010;52:50-69.
• Breiman L. Statistical modeling: The two cultures. Statist. Sci. 2001;16:199-231.
• Fattori E, Cappelletti M, Costa P, et al. Defective inflammatory response in interleukin 6-deficient mice. J Exp Med 1994;180:1243-1250.
• Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48.
• Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, 2nd ed. Springer; 2009.
• Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: An application to birth defects epidemiology. Am J Epidemiol 2002;155:176-84.
• Hernán MA, Robins JM. Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming 2016. http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
• James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with Applications in R. Springer; 2014. http://www-bcf.usc.edu/~gareth/ISL/index.html

References 2/2
• Janszky I, Ahlbom A, Svensson AC. The Janus face of statistical adjustment: confounder versus collider. Eur J Epidemiol 2010;25:361-363.
• Porta M. A dictionary of epidemiology. 5th ed. New York: Oxford University Press; 2008.
• Rothman KJ, Greenland S, Lash TL. Modern Epidemiology, 3rd ed. Lippincott Williams & Wilkins; 2008.
• Shmueli G. To explain or to predict. Statist. Sci. 2010;25:289-310.
• Somm E, Cettour-Rose P, Asensio C, et al. Interleukin-1 receptor antagonist is upregulated during diet-induced obesity and regulates insulin sensitivity in rodents. Diabetologia 2006;49:387-393.
• Steyerberg EW. Clinical Prediction Models. A practical approach to development, validation and updating. Springer; 2009.
• Thom R. Prédire N’est Pas Expliquer, 1991. Translated by Roy Lisker: http://www.fermentmagazine.org/Stories/Thom/Predire.pdf
• VanderWeele TJ. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press; 2015.
• Veierød M, Lydersen S, Laake (eds.). Medical statistics in clinical and epidemiological research. Gyldendal Akademisk; 2012.
• Aalen OO, Røysland K, Gran JM, Ledergerber B. Causality, mediation and time: a dynamic viewpoint. Journal of the Royal Statistical Society: Series A 2012;175:831-861.

Thanks to the sweet and smart colleagues at OCBE, who generously shared their slides and knowledge!
Thanks to Valeria Vitelli, Odd Aalen, https://www.youtube.com/watch?v=e0vj-0imOLw