Regression models 1
Choosing regression models
An elementary introduction
Stephen Senn
Explanation
• I am not presenting these things because I
think you don’t know them
• I am presenting them because the people
you work with don’t know them
• And you need to explain these things to
them
Outline
• Basic considerations in modelling
• Choosing predictors
• Transformation of the predictor(s)
• Transformation of the outcome
• Advice
Basic considerations
Thinking before you model
Some Modelling Tasks
• Choose a generally suitable probability model
• Choose a set of suitable predictors
• Consider whether these need to be transformed
• Consider whether the outcome needs to be transformed
• Choose a technique for fitting the model
• Fit the model
• Assess goodness of fit of model
• Make causal inferences
• Issue predictions
Factors Affecting Choice of
Model
• Purpose of model
– Causal, predictive, classification
• Design of study
– Designed experiment, observational study, survey
• Temporal sequence
• Prior knowledge
• Type of data
– Continuous measurements, binary, ordinal, counts, censored lifetimes
• Case ascertainment
• Results of model fitting
Preliminaries
• Choosing good regression models is not a question
of throwing some data at a stepwise selection
algorithm
• Two things are important
– Being clear about the purpose
– Insight (which in turn is based on)
• Experience
• Understanding
• Logic
Two Extremes
Causal analysis
• The putative causal factor(s) must
be in the model
• Other factors are in the model
because they help us understand
the causal factor(s)
• They are of no interest in
themselves
• We pay particular attention to the
significance of the putative causal
factor(s)
Predictive modelling
• We are trying to find predictors of
some outcome
• It is their joint value as predictors
that is important
• We simply want the most
predictive model
• We compare entire models to
judge which is best
Example
• Modelling the effect of treatment in a clinical trial
• Treatment must be in any model whether or not it
is significant
• Other factors will be in the model to help me
improve my estimate of the effect of treatment
– They are of little interest in themselves
– They are nearly always predetermined
Does Smoking Cause Lung Cancer?
A Tale of Two Statisticians
Works in public health
• I wish to establish whether it
is causal
• If so I can warn smokers to
quit and this will benefit
their health
• It is important for me to rule
out possible confounding
factors
Works in life insurance
• I don’t care if it is causal or
not
• The data show that smokers
are much more likely to get
lung cancer
• That’s enough for me to
take account of it in setting
the premiums
Warning
• Regression models are there to help you use
your insight, experience and prior
knowledge to understand your datasets
• They are not a substitute for scientific
understanding
Choosing predictors
It’s not just a matter of significance
An Example
• Multicentre trial of asthma comparing formoterol, salbutamol and
placebo for their effects on forced expiratory volume in one second
(FEV1).
• Randomisation stratified by steroid use (yes/no) and centre
• Sex, age, height of patient and baseline FEV1 also measured
• Definitely in the model
– Blocking factors: centre & steroid use
– Treatment factor (3 levels: formoterol, salbutamol, placebo)
• Possibly in the model
– Covariates: sex, age, height of patient and baseline FEV1
– NB sex, age and height are also very predictive of baseline FEV1; therefore, if
you put them all in the model, none may be significant
– This does not matter
Temporal Sequence I
• If we are interested in causal inferences it is
usually inappropriate to include in a model
variables that were measured later than the
putative causal variables.
• The later variables cannot have caused the
earlier variables and so should not be
included.
Example
• It is desired to study whether the type of school attended
(private or state school) affects students’ chances of
success in final degree examinations at university
• Data are obtained for a large group of students
• In addition to information on degree results and type of
school attended, information is obtained on
– sex of student,
– high school results
– parents’ income
• Which of these factors is it inappropriate to include in the
model and why?
Temporal Sequence II
• The same does not apply if the purpose of
the model is simply classification
• It may then be helpful to have factors in the
model even if they are measured after the
“outcome variable of interest”
• Indeed they can be included even if they
have been “caused” by the variable of
interest
Example
• We wish to develop a model for classifying
patients who present with abdominal pain as
either suffering from appendicitis or non-specific abdominal pain
• We use location of pain, degree of pain,
absence/presence of nausea, body
temperature as “predictor” variables
– Even though these are consequences of rather
than causes of appendicitis
Prior Knowledge
• Frequently when fitting models we already have strong opinions about
the effect of some factors even if we are ignorant about others.
– For example we may be examining the effect of a previously
unstudied environmental exposure on health
– we know, however, that age is an important determinant of health
• We will tend to put factors we believe are important in the model
irrespective of their significance according to the current data set.
• Similarly, implicitly, there will always be a host of factors we believe
are irrelevant.
• We will not put these in the model on prior grounds
Type of Data and Choice of Basic Model
Type of data              Possible basic model
Continuous measurement    General linear model (Normal outcomes)
Count data                Poisson regression
Binary data               Logistic regression
Ordered categorical       Proportional odds
Censored lifetimes        Proportional hazards
Multinomial               Log-linear
Case Ascertainment
• The way in which data are obtained (ascertained) can
affect the way that we build a model
• For example in a case-control study we sample by outcome
(cases and controls) and then measure how these two differ
by exposure
– Example
• Case: lung cancer, Control: other cancer
• Exposure: smoker versus non-smoker
• We cannot model relative risk using such data
• We can only model (log) odds ratios
• For a cohort study where we sample by exposure we could
model either
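The point about odds ratios can be checked with a little arithmetic. The sketch below uses a purely hypothetical population (the counts are invented for illustration and come from no study) and mimics case-control sampling by keeping every case but only one in ten of the non-cases. Under these assumptions the odds ratio is unchanged by the sampling, whereas the naive "relative risk" computed from the sampled table is not.

```python
from fractions import Fraction

# Hypothetical full population (illustrative counts, not from any study).
population = {
    "exposed":   {"cases": 100, "non_cases": 900},
    "unexposed": {"cases": 20,  "non_cases": 980},
}


def relative_risk(table):
    """Risk of being a case in the exposed divided by risk in the unexposed."""
    risk = {g: Fraction(t["cases"], t["cases"] + t["non_cases"])
            for g, t in table.items()}
    return risk["exposed"] / risk["unexposed"]


def odds_ratio(table):
    """Odds of being a case in the exposed divided by odds in the unexposed."""
    odds = {g: Fraction(t["cases"], t["non_cases"]) for g, t in table.items()}
    return odds["exposed"] / odds["unexposed"]


# Case-control sampling: keep every case but only 1 in 10 of the non-cases.
case_control = {
    g: {"cases": t["cases"], "non_cases": t["non_cases"] // 10}
    for g, t in population.items()
}

rr_pop, rr_cc = relative_risk(population), relative_risk(case_control)
or_pop, or_cc = odds_ratio(population), odds_ratio(case_control)

print(f"relative risk: population {float(rr_pop):.2f}, case-control {float(rr_cc):.2f}")
print(f"odds ratio:    population {float(or_pop):.2f}, case-control {float(or_cc):.2f}")
```

Sampling by outcome multiplies the non-case counts in both exposure groups by the same fraction, which cancels in the odds ratio but not in the ratio of risks.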
Social Status: Longer life expectancy for Oscar winners
A study of actors and actresses found that Oscar winners lived, on average,
almost four years longer than nominees who went home empty-handed, reports
the March issue of the Harvard Health Letter. Actors aren’t the only people who
reap benefits. Dr. Donald Redelmeier of Toronto’s Sunnybrook and Women’s
College Health Sciences Centre found that Oscar-winning directors live longer
than non-winners, and male directors live 4.5 years longer on average than
actors. These findings add to a large body of evidence delineating connections
between social status and health and longevity, reports the Harvard Health
Letter. Redelmeier theorizes that an Oscar on the mantel moves the winner up
the Hollywood pecking order. Winners find it easier to get work, and when they
do, they’re better appreciated and better paid.
Not Harvard Health Publications
A study has shown that getting a telegram from The Queen can add 20 years to
your life
An extensive study of individuals who have received telegrams from The
Queen has shown that an astonishing proportion of them have lived to be 100.
Age at death of a control group of non-recipients was typically 20 years less.
Researchers have postulated that esteem is an important determinant of health
Joked lead researcher, Prof Morton Gullible, ‘our advice to her Majesty is send
yourself a telegram, Ma'am’
Results of Model Fitting
• Statisticians have developed a number of
techniques for assessing the adequacy of various
models using the data in hand
– Standard errors, significance tests on coefficients
– Analysis of variance/ deviance on factors
– Goodness of fit generally
– Residual plots
– AIC, BIC
• These are important tools but are by no means the
only tools for assessing the adequacy of a model
Transforming predictors
The X Files
Luxembourg Temperature Example
Data on temperatures in Luxembourg
Month        Normal temperature (deg C)
January       0.6
February      1.4
March         4.7
April         7.7
May          12.4
June         15.1
July         17.5
August       17.3
September    13.5
October       8.9
November      4.0
December      1.8
Modelling the temperature
Note that in the yearly rhythm, January follows December even
though January is point 1 and December point 12.
The data are periodic and we need a model that reflects this.
The simplest periodic pattern is a sine wave.
$Y = \alpha + \beta \sin(X + \theta)$
$\alpha$ = level (the average temperature)
$\beta$ = amplitude (the difference max to average)
$\theta$ = phase (governs point at which maximum is reached)
Fitting a sine wave
A sine wave model can be fitted by using the fact that
$\sin(X + \theta) = \cos\theta \sin X + \sin\theta \cos X$
This is linear in $\sin X$, $\cos X$. Hence by regressing Y on two
variables $\sin X$, $\cos X$ we can obtain a periodic fit.
Note that X must be transformed from linear to angular
measure. So we can write
$X = 360 \times \mathrm{month} / 12$
if we measure in degrees or
$X = 2\pi \times \mathrm{month} / 12$
if we measure in radians.
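As a minimal sketch of this fit, the code below regresses the twelve Luxembourg temperatures on sin X and cos X by ordinary least squares, using only the Python standard library. The helper `solve` and all variable names are illustrative choices, not part of the original slides; in practice a routine such as `numpy.linalg.lstsq` would normally do this job. The amplitude and phase are recovered from the two slopes, since the coefficient of sin X is β cos θ and the coefficient of cos X is β sin θ.

```python
import math

# Normal monthly temperatures (deg C) for Luxembourg, January to December.
temps = [0.6, 1.4, 4.7, 7.7, 12.4, 15.1, 17.5, 17.3, 13.5, 8.9, 4.0, 1.8]
months = range(1, 13)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Design matrix: intercept, sin X, cos X, with X = 2*pi*month/12 (radians).
design = [[1.0,
           math.sin(2 * math.pi * mth / 12),
           math.cos(2 * math.pi * mth / 12)] for mth in months]

# Normal equations (X'X) b = X'y for ordinary least squares.
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(design, temps)) for i in range(3)]
level, b_sin, b_cos = solve(xtx, xty)

# Recover amplitude and phase: b_sin = beta*cos(theta), b_cos = beta*sin(theta).
amplitude = math.hypot(b_sin, b_cos)
phase = math.atan2(b_cos, b_sin)

# Goodness of fit of the three-parameter model to the twelve points.
fitted = [level + b_sin * row[1] + b_cos * row[2] for row in design]
mean_temp = sum(temps) / len(temps)
rss = sum((y - f) ** 2 for y, f in zip(temps, fitted))
tss = sum((y - mean_temp) ** 2 for y in temps)
r_squared = 1 - rss / tss

print(f"level={level:.2f} amplitude={amplitude:.2f} "
      f"phase={phase:.2f} rad R^2={r_squared:.3f}")
```

Because the twelve months are equally spaced over one full cycle, the sine and cosine columns sum to zero, so the fitted level coincides with the plain average of the twelve temperatures.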
3 parameters
fit 12 points
rather well
Transforming the outcome
Being wise about Ys
An Example of a One-way Layout
• Four experimental p38 kinase inhibitors
• Vehicle and marketed product as controls
• Thromboxane B2 (TXB2) is used as a
marker of COX-1 activity
• Six rats per group were treated for a total of
36 rats
• At the end of the study rats are sacrificed
and TXB2 is measured.
GenStat® ANOVA
(Original data)
Analysis of variance
Variate: TXB2
Source of variation    d.f.    s.s.       m.s.      v.r.    F pr.
Treatment                 5    184596.    36919.    6.31    <.001
Residual                 30    175439.     5848.
Total                    35    360035.
[Slide annotations: $\sigma_1^2$, $\sigma_2^2$]
A2WAY [TREATMENTS=Treatment] TXB2
GenStat plot of
residuals
GenStat ANOVA
(log transformed)
A2WAY [TREATMENTS=Treatment] logTXB2
Analysis of variance
Variate: logTXB2
Source of variation    d.f.    s.s.       m.s.       v.r.
Treatment                 5    62.6760    12.5352    40.09
Residual                 30     9.3800     0.3127
Total                    35    72.0559
Signal to noise ratio is
now much higher
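The variance ratios in the two ANOVA tables can be reproduced from the sums of squares and degrees of freedom alone. The short sketch below does that arithmetic; the function name `variance_ratio` is an illustrative choice, and the inputs are the s.s. and d.f. values printed in the tables for the original and log-transformed data.

```python
def variance_ratio(ss_treatment, df_treatment, ss_residual, df_residual):
    """Return (treatment m.s., residual m.s., v.r.) for a one-way layout."""
    ms_treatment = ss_treatment / df_treatment
    ms_residual = ss_residual / df_residual
    return ms_treatment, ms_residual, ms_treatment / ms_residual


# Sums of squares and degrees of freedom as reported in the GenStat output.
raw = variance_ratio(184596.0, 5, 175439.0, 30)   # original TXB2 scale
log = variance_ratio(62.6760, 5, 9.3800, 30)      # log-transformed TXB2

for label, (ms_t, ms_r, vr) in (("original", raw), ("log", log)):
    print(f"{label:>8}: treatment m.s.={ms_t:.4f} residual m.s.={ms_r:.4f} v.r.={vr:.2f}")
```

The variance ratio rises from about 6.3 on the original scale to about 40.1 on the log scale, which is the sense in which the signal to noise ratio is higher after transformation.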
GenStat plot
of residuals
Homogeneity of Variances
(Bartlett's Test: GenStat)
Untransformed
*** Bartlett's Test for homogeneity of variances ***
Chi-square 50.87 on 5 degrees of freedom: probability < 0.001
Log-transformed
*** Bartlett's Test for homogeneity of variances ***
Chi-square 8.95 on 5 degrees of freedom: probability 0.111
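The probabilities quoted for Bartlett's test are upper-tail areas of a chi-square distribution on 5 degrees of freedom. As a rough check, the sketch below evaluates that tail area with the closed form available for odd degrees of freedom, using only the standard library. The function `chi2_sf_odd` is an illustrative helper, not a GenStat routine, and it does not recompute the test statistics themselves, which would need the raw TXB2 data.

```python
import math


def chi2_sf_odd(x, df):
    """Upper-tail probability of a chi-square variate with odd df (closed form)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form only covers odd degrees of freedom")
    # Sum of x^j / (1*3*5*...*(2j+1)) for j = 0 .. (df-3)/2.
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        total += term
        term *= x / (2 * j + 3)
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total)


p_untransformed = chi2_sf_odd(50.87, 5)   # statistic reported for original data
p_log = chi2_sf_odd(8.95, 5)              # statistic reported for log data

print(f"untransformed: p = {p_untransformed:.2e}")
print(f"log-transformed: p = {p_log:.3f}")
```

On the original scale the evidence against equal variances is overwhelming, whereas after the log transformation the tail probability is about 0.11, in line with the reported value.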
Data-filtering examples
or find the flaw
• A 20 year follow-up study of women in an English village
found higher survival amongst smokers than non-smokers
• Transplant receivers on highest doses of cyclosporine had
higher probability of graft rejection than on lower doses
• Left-handers observed to die younger on average than
right-handers
• Obese infarct survivors have better prognosis than non-obese
Advice
Statistics is a way of improving your thinking, not a substitute for it
Advice
• Think before you model
• Purpose is key
– Causal
– Predictive
– Classification
• Think about time
• Think about case ascertainment
• Testing is a small part of discerning
• Don’t use stepwise regression as a substitute for
understanding

More Related Content

What's hot

Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptx
StephenSenn2
 
ML and AI: a blessing and curse for statisticians and medical doctors
ML and AI: a blessing and curse forstatisticians and medical doctorsML and AI: a blessing and curse forstatisticians and medical doctors
ML and AI: a blessing and curse for statisticians and medical doctors
Maarten van Smeden
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
StephenSenn2
 
Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)
Ahmed Negida
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Ewout Steyerberg
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
Maarten van Smeden
 
Correcting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingCorrecting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confounding
Maarten van Smeden
 
Machine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctorsMachine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctors
Maarten van Smeden
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
Maarten van Smeden
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
Maarten van Smeden
 
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Maarten van Smeden
 
On p-values
On p-valuesOn p-values
On p-values
Maarten van Smeden
 
Sample size for survival analysis - a guide to planning successful clinical t...
Sample size for survival analysis - a guide to planning successful clinical t...Sample size for survival analysis - a guide to planning successful clinical t...
Sample size for survival analysis - a guide to planning successful clinical t...
nQuery
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
Cloudera, Inc.
 
Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatistician
Laure Wynants
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
Derek Kane
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
Maarten van Smeden
 
Statistical Models Explored and Explained
Statistical Models Explored and ExplainedStatistical Models Explored and Explained
Statistical Models Explored and Explained
Optimizely
 
Data Analytics in Healthcare
Data Analytics in HealthcareData Analytics in Healthcare
Data Analytics in Healthcare
Mark Gall
 
Structural Equation Modelling (SEM) Part 3
Structural Equation Modelling (SEM) Part 3Structural Equation Modelling (SEM) Part 3
Structural Equation Modelling (SEM) Part 3
COSTARCH Analytical Consulting (P) Ltd.
 

What's hot (20)

Clinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptxClinical trials are about comparability not generalisability V2.pptx
Clinical trials are about comparability not generalisability V2.pptx
 
ML and AI: a blessing and curse for statisticians and medical doctors
ML and AI: a blessing and curse forstatisticians and medical doctorsML and AI: a blessing and curse forstatisticians and medical doctors
ML and AI: a blessing and curse for statisticians and medical doctors
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)Introduction to meta-analysis (1612_MA_workshop)
Introduction to meta-analysis (1612_MA_workshop)
 
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
Prediction, Big Data, and AI: Steyerberg, Basel Nov 1, 2019
 
Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?Is it causal, is it prediction or is it neither?
Is it causal, is it prediction or is it neither?
 
Correcting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confoundingCorrecting for missing data, measurement error and confounding
Correcting for missing data, measurement error and confounding
 
Machine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctorsMachine learning versus traditional statistical modeling and medical doctors
Machine learning versus traditional statistical modeling and medical doctors
 
Clinical prediction models
Clinical prediction modelsClinical prediction models
Clinical prediction models
 
Introduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part IIntroduction to prediction modelling - Berlin 2018 - Part I
Introduction to prediction modelling - Berlin 2018 - Part I
 
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
Shrinkage in medical prediction: the poor man’s solution for an inadequate sa...
 
On p-values
On p-valuesOn p-values
On p-values
 
Sample size for survival analysis - a guide to planning successful clinical t...
Sample size for survival analysis - a guide to planning successful clinical t...Sample size for survival analysis - a guide to planning successful clinical t...
Sample size for survival analysis - a guide to planning successful clinical t...
 
Precision Medicine in the Big Data World
Precision Medicine in the Big Data WorldPrecision Medicine in the Big Data World
Precision Medicine in the Big Data World
 
Dichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatisticianDichotomania and other challenges for the collaborating biostatistician
Dichotomania and other challenges for the collaborating biostatistician
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Development and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutionsDevelopment and evaluation of prediction models: pitfalls and solutions
Development and evaluation of prediction models: pitfalls and solutions
 
Statistical Models Explored and Explained
Statistical Models Explored and ExplainedStatistical Models Explored and Explained
Statistical Models Explored and Explained
 
Data Analytics in Healthcare
Data Analytics in HealthcareData Analytics in Healthcare
Data Analytics in Healthcare
 
Structural Equation Modelling (SEM) Part 3
Structural Equation Modelling (SEM) Part 3Structural Equation Modelling (SEM) Part 3
Structural Equation Modelling (SEM) Part 3
 

Similar to Choosing Regression Models

Data collection
Data collectionData collection
Data collection
manayer otb
 
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptxSAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
ssuserd509321
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative research
Alan Fricker
 
9. Selecting a sample
9. Selecting a sample9. Selecting a sample
9. Selecting a sample
Razif Shahril
 
Questionnaire-based Research Workshop.pdf
Questionnaire-based Research Workshop.pdfQuestionnaire-based Research Workshop.pdf
Questionnaire-based Research Workshop.pdf
AimanAlwadi1
 
Tests of significance Periodontology
Tests of significance PeriodontologyTests of significance Periodontology
Tests of significance Periodontology
SaiLakshmi128
 
Chapter 2Study DesignsLearning Objectives•.docx
Chapter 2Study DesignsLearning Objectives•.docxChapter 2Study DesignsLearning Objectives•.docx
Chapter 2Study DesignsLearning Objectives•.docx
keturahhazelhurst
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
Bioinformatics and Computational Biosciences Branch
 
final.pptx
final.pptxfinal.pptx
final.pptx
Ritasman Baisya
 
Study design2 6_07
Study design2 6_07Study design2 6_07
Study design2 6_07Dan Fisher
 
Chapter 10Data Interpretation IssuesLearning Objec.docx
Chapter 10Data Interpretation IssuesLearning Objec.docxChapter 10Data Interpretation IssuesLearning Objec.docx
Chapter 10Data Interpretation IssuesLearning Objec.docx
keturahhazelhurst
 
Epidemiological study designs
Epidemiological study designsEpidemiological study designs
Epidemiological study designs
jarati
 
Study designs in oncology
Study designs in oncologyStudy designs in oncology
Study designs in oncology
Dr.Ram Madhavan
 
Research Methods 2 for Midwifery students .pptx
Research Methods 2 for Midwifery students .pptxResearch Methods 2 for Midwifery students .pptx
Research Methods 2 for Midwifery students .pptx
Endex Tam
 
UPDATED-Quantitative-Methods for Prelims
UPDATED-Quantitative-Methods for PrelimsUPDATED-Quantitative-Methods for Prelims
UPDATED-Quantitative-Methods for Prelims
Marvin158667
 
Case control studies
Case control studiesCase control studies
Case control studies
Ram Arya
 
Hnc research methodology
Hnc research methodologyHnc research methodology
Hnc research methodology
Margaret Templeton
 
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghd
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghdBiostatistics.pptxhgjfhgfthfujkolikhgjhcghd
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghd
madanshresthanepal
 
Case_Control_Study.pptx
Case_Control_Study.pptxCase_Control_Study.pptx
Case_Control_Study.pptx
Royal Dental College Library
 

Similar to Choosing Regression Models (20)

Data collection
Data collectionData collection
Data collection
 
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptxSAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
SAMPLE SIZE CALCULATION IN DIFFERENT STUDY DESIGNS AT.pptx
 
Quick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative researchQuick introduction to critical appraisal of quantitative research
Quick introduction to critical appraisal of quantitative research
 
9. Selecting a sample
9. Selecting a sample9. Selecting a sample
9. Selecting a sample
 
Questionnaire-based Research Workshop.pdf
Questionnaire-based Research Workshop.pdfQuestionnaire-based Research Workshop.pdf
Questionnaire-based Research Workshop.pdf
 
Tests of significance Periodontology
Tests of significance PeriodontologyTests of significance Periodontology
Tests of significance Periodontology
 
Chapter 2Study DesignsLearning Objectives•.docx
Chapter 2Study DesignsLearning Objectives•.docxChapter 2Study DesignsLearning Objectives•.docx
Chapter 2Study DesignsLearning Objectives•.docx
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
 
final.pptx
final.pptxfinal.pptx
final.pptx
 
Study design2 6_07
Study design2 6_07Study design2 6_07
Study design2 6_07
 
Chapter 10Data Interpretation IssuesLearning Objec.docx
Chapter 10Data Interpretation IssuesLearning Objec.docxChapter 10Data Interpretation IssuesLearning Objec.docx
Chapter 10Data Interpretation IssuesLearning Objec.docx
 
Epidemiological study designs
Epidemiological study designsEpidemiological study designs
Epidemiological study designs
 
Study designs in oncology
Study designs in oncologyStudy designs in oncology
Study designs in oncology
 
Research methodology by hw
 Research methodology by hw Research methodology by hw
Research methodology by hw
 
Research Methods 2 for Midwifery students .pptx
Research Methods 2 for Midwifery students .pptxResearch Methods 2 for Midwifery students .pptx
Research Methods 2 for Midwifery students .pptx
 
UPDATED-Quantitative-Methods for Prelims
UPDATED-Quantitative-Methods for PrelimsUPDATED-Quantitative-Methods for Prelims
UPDATED-Quantitative-Methods for Prelims
 
Case control studies
Case control studiesCase control studies
Case control studies
 
Hnc research methodology
Hnc research methodologyHnc research methodology
Hnc research methodology
 
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghd
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghdBiostatistics.pptxhgjfhgfthfujkolikhgjhcghd
Biostatistics.pptxhgjfhgfthfujkolikhgjhcghd
 
Case_Control_Study.pptx
Case_Control_Study.pptxCase_Control_Study.pptx
Case_Control_Study.pptx
 

More from Stephen Senn

What is your question
What is your questionWhat is your question
What is your question
Stephen Senn
 
Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19Vaccine trials in the age of COVID-19
Vaccine trials in the age of COVID-19
Stephen Senn
 
To infinity and beyond v2
To infinity and beyond v2To infinity and beyond v2
To infinity and beyond v2
Stephen Senn
 
Approximate ANCOVA
Approximate ANCOVAApproximate ANCOVA
Approximate ANCOVA
Stephen Senn
 
The Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansThe Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective Statisticians
Stephen Senn
 
Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2
Stephen Senn
 
Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?Clinical trials: quo vadis in the age of covid?
Clinical trials: quo vadis in the age of covid?
Stephen Senn
 
A century of t tests
A century of t testsA century of t tests
A century of t tests
Stephen Senn
 
Is ignorance bliss
Is ignorance blissIs ignorance bliss
Is ignorance bliss
Stephen Senn
 
What should we expect from reproducibiliry
What should we expect from reproducibiliryWhat should we expect from reproducibiliry
What should we expect from reproducibiliry
Stephen Senn
 
Personalised medicine a sceptical view
Personalised medicine a sceptical viewPersonalised medicine a sceptical view
Personalised medicine a sceptical view
Stephen Senn
 
In search of the lost loss function
In search of the lost loss function In search of the lost loss function
In search of the lost loss function
Stephen Senn
 
To infinity and beyond
To infinity and beyond To infinity and beyond
To infinity and beyond
Stephen Senn
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
Stephen Senn
 
Understanding randomisation
Understanding randomisationUnderstanding randomisation
Understanding randomisation
Stephen Senn
 
In Search of Lost Infinities: What is the “n” in big data?
In Search of Lost Infinities: What is the “n” in big data?In Search of Lost Infinities: What is the “n” in big data?
In Search of Lost Infinities: What is the “n” in big data?
Stephen Senn
 
NNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresNNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measures
Stephen Senn
 
Seventy years of RCTs
Seventy years of RCTsSeventy years of RCTs
Seventy years of RCTs
Stephen Senn
 
The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
Stephen Senn
 
The revenge of RA Fisher
The revenge of RA Fisher The revenge of RA Fisher
The revenge of RA Fisher
Stephen Senn
 

Choosing Regression Models

  • 1. Regression models 1 Choosing regression models An elementary introduction Stephen Senn
  • 2. Explanation • I am not presenting these things because I think you don’t know them • I am presenting them because the people you work with don’t know them • And you need to explain these things to them
  • 3. Outline • Basic considerations in modelling • Choosing predictors • Transformation of the predictor(s) • Transformation of the outcome • Advice
  • 4. Basic considerations: Thinking before you model
  • 5. Some Modelling Tasks • Choose a generally suitable probability model • Choose a set of suitable predictors • Consider whether these need to be transformed • Consider whether the outcome needs to be transformed • Choose a technique for fitting the model • Fit the model • Assess goodness of fit of model • Make causal inferences • Issue predictions
  • 6. Factors Affecting Choice of Model • Purpose of model – Causal, predictive, classification • Design of study – Designed experiment, observational study, survey • Temporal sequence • Prior knowledge • Type of data – Continuous measurements, binary, ordinal, counts, censored lifetimes • Case ascertainment • Results of model fitting
  • 7. Preliminaries • Choosing good regression models is not a question of throwing some data at a stepwise selection algorithm • Two things are important – Being clear about the purpose – Insight, which in turn is based on experience, understanding and logic
  • 8. Two Extremes. Causal analysis: • The putative causal factor(s) must be in the model • Other factors are in the model because they help us understand the causal factor(s) • They are of no interest in themselves • We pay particular attention to the significance of the putative causal factor(s). Predictive modelling: • We are trying to find predictors of some outcome • It is their joint value as predictors that is important • We simply want the most predictive model • We compare entire models to judge which is best
  • 9. Example • Modelling the effect of treatment in a clinical trial • Treatment must be in any model whether or not it is significant • Other factors will be in the model to help me improve my estimate of the effect of treatment – They are of little interest in themselves – They are nearly always predetermined
  • 10. Does Smoking Cause Lung Cancer? A Tale of Two Statisticians. Works in public health: • I wish to establish whether it is causal • If so, I can warn smokers to quit and this will benefit their health • It is important for me to rule out possible confounding factors. Works in life insurance: • I don’t care if it is causal or not • The data show that smokers are much more likely to get lung cancer • That’s enough for me to take account of it in setting the premiums
  • 11. Warning • Regression models are there to help you use your insight, experience and prior knowledge to understand your datasets • They are not a substitute for scientific understanding
  • 12. Choosing predictors: It’s not just a matter of significance
  • 13. An Example • Multicentre trial in asthma comparing formoterol, salbutamol and placebo for their effects on forced expiratory volume in one second (FEV1) • Randomisation stratified by steroid use (yes/no) and centre • Sex, age, height of patient and baseline FEV1 also measured • Definitely in the model – Blocking factors: centre & steroid use – Treatment factor (3 levels: formoterol, salbutamol, placebo) • Possibly in the model – Covariates: sex, age, height of patient and baseline FEV1 – NB sex, age and height are very predictive of baseline FEV1, so if you put them all in the model none may be significant – This does not matter
  • 14. Temporal Sequence I • If we are interested in causal inferences, it is usually inappropriate to include in the model variables that were measured later than the putative causal variables • The later variables cannot have caused the earlier variables and so should not be included
  • 15. Example • It is desired to study whether the type of school attended (private or state school) affects students’ chances of success in final degree examinations at university • Data are obtained for a large group of students • In addition to information on degree results and type of school attended, information is obtained on – sex of student – high school results – parents’ income • Which of these factors is it inappropriate to include in the model and why?
  • 16. Temporal Sequence II • The same does not apply if the purpose of the model is simply classification • It may then be helpful to have factors in the model even if they are measured after the “outcome variable of interest” • Indeed they can be included even if they have been “caused” by the variable of interest
  • 17. Example • We wish to develop a model for classifying patients who present with abdominal pain as either suffering from appendicitis or non-specific abdominal pain • We use location of pain, degree of pain, absence/presence of nausea and body temperature as “predictor” variables – Even though these are consequences of rather than causes of appendicitis
  • 18. Prior Knowledge • Frequently when fitting models we already have strong opinions about the effect of some factors even if we are ignorant about others – For example we may be examining the effect of a previously unstudied environmental exposure on health – We know, however, that age is an important determinant of health • We will tend to put factors we believe are important in the model irrespective of their significance according to the current data set • Similarly, implicitly, there will always be a host of factors we believe are irrelevant • We will not put these in the model on prior grounds
  • 19. Type of Data and Choice of Basic Model (type of data → possible basic model) • Continuous measurement → General linear model (Normal outcomes) • Count data → Poisson regression • Binary data → Logistic regression • Ordered categorical → Proportional odds • Censored lifetimes → Proportional hazards • Multinomial → Log-linear
  • 20. Case Ascertainment • The way in which data are obtained (ascertained) can affect the way that we build a model • For example, in a case-control study we sample by outcome (cases and controls) and then measure how these two differ by exposure – Example: case: lung cancer; control: other cancer; exposure: smoker versus non-smoker • We cannot model relative risk using such data • We can only model (log) odds ratios • For a cohort study, where we sample by exposure, we could model either
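The point on slide 20 can be checked with a little arithmetic. The sketch below uses made-up 2×2 counts (they are assumptions for illustration, not data from the slides) and hypothetical helper names `odds_ratio` and `risk_ratio`. It keeps every case but only one in ten non-cases, as a case-control design might, and shows that the odds ratio survives this sampling while a naively computed relative risk does not.

```python
# Hypothetical counts, invented for illustration only.
# Order of each tuple: exposed cases, exposed non-cases,
#                      unexposed cases, unexposed non-cases.

def odds_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Cross-product ratio of a 2x2 table."""
    return (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)

def risk_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Ratio of the proportion of cases among exposed to that among unexposed."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    return risk_exposed / risk_unexposed

# Full cohort: sampling is by exposure, so both measures are meaningful.
cohort = (100, 900, 50, 1950)

# Case-control sample from the same population: all cases are kept,
# but only one in ten non-cases is sampled as a control.
case_control = (100, 90, 50, 195)

or_cohort = odds_ratio(*cohort)
or_case_control = odds_ratio(*case_control)
rr_cohort = risk_ratio(*cohort)
rr_case_control = risk_ratio(*case_control)  # not a valid relative risk

print(f"odds ratio:  cohort {or_cohort:.3f}, case-control {or_case_control:.3f}")
print(f"risk ratio:  cohort {rr_cohort:.3f}, case-control {rr_case_control:.3f}")
```

The sampling fraction for controls cancels in the cross-product ratio, which is why the odds ratio is unchanged; it does not cancel in the risk ratio, whose value depends on how many controls happened to be sampled.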
  • 21. Social Status: Longer life expectancy for Oscar winners. A study of actors and actresses found that Oscar winners lived, on average, almost four years longer than nominees who went home empty-handed, reports the March issue of the Harvard Health Letter. Actors aren’t the only people who reap benefits. Dr. Donald Redelmeier of Toronto’s Sunnybrook and Women’s College Health Sciences Centre found that Oscar-winning directors live longer than non-winners, and male directors live 4.5 years longer on average than actors. These findings add to a large body of evidence delineating connections between social status and health and longevity, reports the Harvard Health Letter. Redelmeier theorizes that an Oscar on the mantel moves the winner up the Hollywood pecking order. Winners find it easier to get work, and when they do, they’re better appreciated and better paid.
  • 22. Not Harvard Health Publications. A study has shown that getting a telegram from The Queen can add 20 years to your life. An extensive study of individuals who have received telegrams from The Queen has shown that an astonishing proportion of them have lived to be 100. Age at death of a control group of non-recipients was typically 20 years less. Researchers have postulated that esteem is an important determinant of health. Joked lead researcher, Prof Morton Gullible, ‘our advice to Her Majesty is send yourself a telegram, Ma'am’.
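The flaw in the telegram "study" on slide 22 is that recipients are selected by the outcome itself. A minimal simulation makes this concrete. Everything here is an assumption for illustration: ages at death are drawn from a Normal distribution with mean 80 and standard deviation 10, and the telegram is simply handed to anyone who reaches 100, with no effect on survival at all.

```python
import random
from statistics import mean

random.seed(2024)  # fixed seed so the sketch is reproducible

# Assumed (hypothetical) distribution of age at death in years.
ages_at_death = [random.gauss(80, 10) for _ in range(100_000)]

# The telegram is received only by those who live to 100; it causes nothing.
recipients = [age for age in ages_at_death if age >= 100]
non_recipients = [age for age in ages_at_death if age < 100]

gap = mean(recipients) - mean(non_recipients)
print(f"recipients: {len(recipients)}, mean age at death {mean(recipients):.1f}")
print(f"non-recipients: mean age at death {mean(non_recipients):.1f}")
print(f"apparent 'benefit' of the telegram: {gap:.1f} years")
```

A large gap appears even though the simulated telegram has no causal effect, which is the same kind of selection problem one should look for in the Oscar example on slide 21.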
  • 23. Results of Model Fitting • Statisticians have developed a number of techniques for assessing the adequacy of various models using the data in hand – Standard errors, significance tests on coefficients – Analysis of variance/deviance on factors – Goodness of fit generally – Residual plots – AIC, BIC • These are important tools but are by no means the only tools for assessing the adequacy of a model
  • 24. Transforming predictors: The X Files
  • 25. Luxembourg Temperature Example. Data on temperatures in Luxembourg (month: normal temperature in deg C). January: 0.6; February: 1.4; March: 4.7; April: 7.7; May: 12.4; June: 15.1; July: 17.5; August: 17.3; September: 13.5; October: 8.9; November: 4.0; December: 1.8
  • 26. Modelling the temperature. Note that in the yearly rhythm, January follows December even though January is point 1 and December point 12. The data are periodic and we need a model that reflects this. The simplest periodic pattern is a sine wave: $Y = \alpha + \beta \sin(X + \theta)$, where $\alpha$ = level (the average temperature), $\beta$ = amplitude (the difference max to average), $\theta$ = phase (governs point at which maximum is reached)
  • 27. Fitting a sine wave. A sine wave model can be fitted by using the fact that $\sin(X + \theta) = \cos\theta \sin X + \sin\theta \cos X$. This is linear in $\sin X$, $\cos X$. Hence by regressing Y on two variables $\sin X$, $\cos X$ we can obtain a periodic fit. Note that X must be transformed from linear to angular measure. So we can write $X = 360 \times \text{month} / 12$ if we measure in degrees or $X = 2\pi \times \text{month} / 12$ if we measure in radians
  • 28. 3 parameters fit 12 points rather well
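The fit described on slides 26 to 28 can be sketched in a few lines using the Luxembourg temperatures from slide 25. This is not the author's code. It relies on one simplifying fact, stated here as an assumption of the sketch: with 12 equally spaced months covering exactly one cycle, the constant, $\sin X$ and $\cos X$ are mutually orthogonal, so the least-squares coefficients reduce to simple sums rather than requiring a general regression routine. The variable names are illustrative.

```python
import math

# Normal monthly temperatures (deg C) for Luxembourg, January to December.
temps = [0.6, 1.4, 4.7, 7.7, 12.4, 15.1, 17.5, 17.3, 13.5, 8.9, 4.0, 1.8]
n = len(temps)

# Transform month number to angular measure in radians: X = 2*pi*month/12.
x = [2 * math.pi * month / 12 for month in range(1, n + 1)]

# Least-squares coefficients of Y = alpha + b_sin*sin(X) + b_cos*cos(X).
# Orthogonality over one full cycle gives sum(sin^2) = sum(cos^2) = n/2.
alpha = sum(temps) / n
b_sin = (2 / n) * sum(y * math.sin(a) for y, a in zip(temps, x))
b_cos = (2 / n) * sum(y * math.cos(a) for y, a in zip(temps, x))

# Convert back to amplitude and phase, using
# beta*sin(X + theta) = beta*cos(theta)*sin(X) + beta*sin(theta)*cos(X).
beta = math.hypot(b_sin, b_cos)
theta = math.atan2(b_cos, b_sin)

fitted = [alpha + beta * math.sin(a + theta) for a in x]
max_abs_residual = max(abs(y - f) for y, f in zip(temps, fitted))

print(f"level {alpha:.2f}, amplitude {beta:.2f}, phase {theta:.2f} rad")
print(f"largest absolute residual: {max_abs_residual:.2f} deg C")
```

For unequally spaced or incomplete cycles the orthogonality shortcut no longer holds, and an ordinary multiple regression of Y on $\sin X$ and $\cos X$ would be needed instead.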
  • 29. Transforming the outcome: Being wise about Ys
  • 30. An Example of a One-way Layout • Four experimental p38 kinase inhibitors • Vehicle and marketed product as controls • Thromboxane B2 (TXB2) is used as a marker of COX-1 activity • Six rats per group were treated for a total of 36 rats • At the end of the study rats are sacrificed and TXB2 is measured
  • 32. GenStat® ANOVA (original data). Command: A2WAY [TREATMENTS=Treatment] TXB2. Analysis of variance, variate: TXB2
      Source of variation   d.f.   s.s.      m.s.     v.r.   F pr.
      Treatment             5      184596.   36919.   6.31   <.001
      Residual              30     175439.   5848.
      Total                 35     360035.
  • 33. GenStat plot of residuals
  • 35. GenStat ANOVA (log transformed). Command: A2WAY [TREATMENTS=Treatment] logTXB2. Analysis of variance, variate: logTXB2
      Source of variation   d.f.   s.s.      m.s.      v.r.
      Treatment             5      62.6760   12.5352   40.09
      Residual              30     9.3800    0.3127
      Total                 35     72.0559
    Signal-to-noise ratio is now much higher
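The variance ratios (v.r.) in the two GenStat tables on slides 32 and 35 can be reproduced from the sums of squares and degrees of freedom alone. The following is a small arithmetic check, not GenStat output; the function name `variance_ratio` is a hypothetical helper introduced for this sketch.

```python
def variance_ratio(ss_treatment, df_treatment, ss_residual, df_residual):
    """Return (treatment mean square, residual mean square, variance ratio)."""
    ms_treatment = ss_treatment / df_treatment
    ms_residual = ss_residual / df_residual
    return ms_treatment, ms_residual, ms_treatment / ms_residual

# Sums of squares and degrees of freedom as printed on the slides.
original = variance_ratio(184596.0, 5, 175439.0, 30)   # TXB2
log_scale = variance_ratio(62.6760, 5, 9.3800, 30)     # logTXB2

for label, (ms_t, ms_r, vr) in [("original", original), ("log", log_scale)]:
    print(f"{label:>8}: m.s. treatment {ms_t:.4f}, "
          f"m.s. residual {ms_r:.4f}, v.r. {vr:.2f}")
```

The variance ratio is the treatment mean square divided by the residual mean square, so the jump between the two tables reflects how much the log transformation shrinks the residual variation relative to the treatment differences.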
  • 36. GenStat plot of residuals
  • 37. Homogeneity of Variances (Bartlett’s Test: GenStat). Untransformed: *** Bartlett's Test for homogeneity of variances *** Chi-square 50.87 on 5 degrees of freedom: probability < 0.001. Log-transformed: *** Bartlett's Test for homogeneity of variances *** Chi-square 8.95 on 5 degrees of freedom: probability 0.111
  • 38. Data-filtering examples, or find the flaw • A 20-year follow-up study of women in an English village found higher survival amongst smokers than non-smokers • Transplant recipients on the highest doses of cyclosporine had a higher probability of graft rejection than those on lower doses • Left-handers observed to die younger on average than right-handers • Obese infarct survivors have better prognosis than non-obese
  • 39. Advice: Statistics is a way of improving your thinking, not a substitute for it
  • 40. Advice • Think before you model • Purpose is key – Causal – Predictive – Classification • Think about time • Think about case ascertainment • Testing is a small part of discerning • Don’t use stepwise regression as a substitute for understanding