SlideShare a Scribd company logo
1 of 41
Download to read offline
Measurement error and missing data
Killing two birds with one stone
Ruth Keogh
Department of Medical Statistics
London School of Hygiene & Tropical Medicine
On Behalf of TG4 Measurement Error and Misclassification
Measurement error and misclassification
Laurence Freedman
Victor Kipnis
Hendriek Bozhuizen
Raymond Carroll
Veronika Deffner
Kevin Dodd
Paul Gustafson
Ruth Keogh
Helmut Kuechenhoff
Pamela Shaw
Anne Thiebaut
Janet Tooze
Michael Wallace
Literature survey
We surveyed the literature in four
areas :
• Nutritional intake cohort studies
• Physical activity cohort studies
• Air pollution cohort studies
• Dietary intake distributions
What are people saying and doing about
measurement error?
Literature survey: N=81
What percentage of studies mentioned
measurement error as a potential problem?
What percentage of studies used methods to
mitigate the impact of measurement error?
What percentage of studies categorized their
main exposure?
Literature survey: N=81
What percentage of studies mentioned
measurement error as a potential problem?
What percentage of studies used methods to
mitigate the impact of measurement error?
80% (N=65)
What percentage of studies categorized their
main exposure?
Literature survey: N=81
What percentage of studies mentioned
measurement error as a potential problem?
What percentage of studies used methods to
mitigate the impact of measurement error?
80% (N=65)
6% (N=5)
What percentage of studies categorized their
main exposure?
Literature survey: N=81
What percentage of studies mentioned
measurement error as a potential problem?
What percentage of studies used methods to
mitigate the impact of measurement error?
80% (N=65)
6% (N=5)
What percentage of studies categorized their
main exposure?
88% (N=71)
Literature survey: observations
• Most of those who mentioned error as a problem made an
incomplete/incorrect claim
– Many stated that their estimates could only be attenuated by measurement error
– Some claimed no bias in associations but for spurious reasons
Literature survey: observations
• Most of those who mentioned error as a problem made an
incomplete/incorrect claim
– Many stated that their estimates could only be attenuated by measurement error
– Some claimed no bias in associations but for spurious reasons
• Most studies categorized the continuous exposures
– Common belief: categorization will reduce impact of measurement error
– Categorizing can actually make things worse
References
Epidemiologic analyses with error-prone exposures: review of current
practice and recommendations.
Shaw et al. Annals of Epidemiology 2018; 28: 821-828.
Measurement error is often neglected in medical literature: a systematic
review.
Brackenhoff et al. Journal of Clinical Epidemiology 2018; 98: 89-97.
Five myths about measurement error in epidemiologic research.
van Smeden, Lash, Groenwold. DOI 10.17605/OSF.IO/MSX8D.
https://osf.io/msx8d/
Forthcoming guidance papers
STRATOS guidance document on measurement error and
misclassification of variables in observational epidemiology
Part 1 – basic theory and simple methods of adjustment
Ruth H Keogh, Pamela A Shaw, Paul Gustafson, Raymond J Carroll, Veronika Deffner,
Kevin W Dodd, Helmut Küchenhoff, Janet A Tooze, Michael P Wallace, Victor Kipnis,
Laurence S Freedman
Part 2 –more complex methods of adjustment and advanced topics
Pamela A Shaw, Paul Gustafson, Raymond J Carroll, Veronika Deffner, Kevin W Dodd,
Ruth H Keogh, Victor Kipnis, Janet A Tooze, Michael P Wallace, Helmut Küchenhoff,
Laurence S Freedman
Measurement error and missing data
Killing two birds with one stone
Ruth Keogh
Department of Medical Statistics
London School of Hygiene & Tropical Medicine
On Behalf of TG4 Measurement Error and Misclassification
Jonathan Bartlett, University of Bath
Christen Gray, IQVIA
Notation and set-up
True
outcome
𝑌
True
exposure
𝑋
Confounder
𝑍
Notation and set-up
True
outcome
𝑌
True
exposure
𝑋
Observed
exposure
𝑋∗
Error 𝑋∗
= 𝑋 + 𝜖
Confounder
𝑍
Notation and set-up
True outcome model: Using 𝑋
𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝑒
True
outcome
𝑌
True
exposure
𝑋
Observed
exposure
𝑋∗
Error 𝑋∗
= 𝑋 + 𝜖
Confounder
𝑍
Notation and set-up
True outcome model: Using 𝑋
𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝑒
Naive outcome model: Using 𝑋∗
𝑌 = 𝛽0
∗
+ 𝛽 𝑋
∗
𝑋∗
+ 𝛽 𝑍
∗
𝑍 + 𝑒
True
outcome
𝑌
True
exposure
𝑋
Observed
exposure
𝑋∗
Error 𝑋∗
= 𝑋 + 𝜖
Confounder
𝑍
Ancillary studies
𝑋
Validation study
𝑋2
∗
Replicates study
𝑋∗ 𝑋1
∗
To do something about the impact of measurement error in our analysis,
we need to know the form and extent of the error
𝑋1
∗∗
, 𝑋2
∗∗
Calibration study
𝑋∗
Motivating example: NHANES data
• Association between systolic blood pressure (SBP) and deaths due to
cardiovascular disease (CVD)
• Adjusted for sex, age, smoking status, diabetes
• Analysis method: Cox regression
Motivating example: NHANES data
• Association between systolic blood pressure (SBP) and deaths due to
cardiovascular disease (CVD)
• Adjusted for sex, age, smoking status, diabetes
• Analysis method: Cox regression
Challenges
• SBP is error-prone
• Missing data in smoking status
Motivating example: NHANES data
• Association between systolic blood pressure (SBP) and deaths due to
cardiovascular disease (CVD)
• Adjusted for sex, age, smoking status, diabetes
• Analysis method: Cox regression
Challenges
• SBP is error-prone
• Missing data in smoking status
N=6519 Smoking
observed
N=2667
5%
Replicate SBP measurement
Regression calibration
Obtain an estimate of 𝐸(𝑋|𝑋∗, 𝑍) using the ancillary study and use in the
outcome regression model:
𝑌 = 𝛽0 + 𝛽 𝑋 𝐸 𝑋 𝑋∗
, 𝑍 +𝛽 𝑍 𝑍 + 𝑒
Regression calibration
Limitations
• Requires non-differential error assumption
• Requires an approximation for non-linear outcome models
• How do we accommodate missing data as well?
Obtain an estimate of 𝐸(𝑋|𝑋∗, 𝑍) using the ancillary study and use in the
outcome regression model:
𝑌 = 𝛽0 + 𝛽 𝑋 𝐸 𝑋 𝑋∗
, 𝑍 +𝛽 𝑍 𝑍 + 𝑒
Multiple imputation (MI)
• Very popular method for handling missing data
• Measurement error can be viewed as a missing data problem – the
‘truth’ is missing
Multiple imputation (MI)
• Very popular method for handling missing data
• Measurement error can be viewed as a missing data problem – the
‘truth’ is missing
𝑋
Validation study
𝑋2
∗
Replicates study
𝑋∗ 𝑋1
∗
…for some people
…for everyone!
𝑋1
∗∗
, 𝑋2
∗∗
Calibration study
𝑋∗
Multiple imputation (MI)
Cole SR, Chu H and Greenland S. Multiple-
imputation for measurement-error correction.
Int J Epidemiol 2006; 35: 1074–1081.
1. For individuals with 𝑋 missing, draw a value 𝑋
from 𝑋|𝑋∗, 𝑍, 𝑌
2. This gives a complete imputed data set
3. Fit the outcome model using the imputed data
4. Repeat for M imputed data sets
5. Pool the results using Rubin’s Rules
𝑋
Validation study
𝑋∗
Multiple imputation (MI)
𝑋
Validation study
𝑋∗
In the validation situation we benefit from
the huge missing data literature on MI.
Carpenter & Kenward. Multiple imputation
and its application. New York: Wiley. 2013
Sterne et al. Multiple imputation for missing
data in epidemiological and clinical
research: potential and pitfalls. BMJ 2009;
338: b2393
Multiple imputation (MI)
𝑋
Validation study
𝑋∗
Software
R: mice, smcfcs
Stata: mi impute, smcfcs
SAS: PROC MI
In the validation situation we benefit from
the huge missing data literature on MI.
Carpenter & Kenward. Multiple imputation
and its application. New York: Wiley. 2013
Sterne et al. Multiple imputation for missing
data in epidemiological and clinical
research: potential and pitfalls. BMJ 2009;
338: b2393
Freedman et al. A comparison of regression
calibration, moment reconstruction and imputation
for adjusting for covariate measurement error in
regression. Stat Med 2008; 27: 5195–5216.
Keogh & White. A toolkit for measurement error
correction, with a focus on nutritional epidemiology.
Stat Med 2014; 33: 2137-2155.
Multiple imputation (MI)
𝑋1
∗∗
, 𝑋2
∗∗
Calibration study
𝑋∗
𝑋2
∗
Replicates study
𝑋1
∗
• We need to e.g. assume a multivariate normal
distribution for 𝑋, 𝑋1
∗
, 𝑋2
∗
|𝑍
• This gives form of 𝑝(𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌)
𝑋2
∗
Replicates study
𝑋1
∗The difficult step of MI
1. For individuals with 𝑋 missing, draw a value 𝑋
from 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
Multiple imputation (MI)
• We need to assume a distribution for
𝑋, 𝑋1
∗
, 𝑋2
∗
|𝑍, e.g. multivariate normal
• This gives form of 𝑝 𝑋 𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
𝑋2
∗
Replicates study
𝑋1
∗
Multiple imputation (MI)
• This approach is not very flexible
• There is no software and it is not very easy to implement
The difficult step of MI
1. For individuals with 𝑋 missing, draw a value 𝑋
from 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
In general it is difficult to know what is the form of 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
- There are non-linear terms in the model
𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝛽 𝑋2 𝑋2
+ 𝑒
- The outcome model is not a linear regression
ℎ(𝑡|𝑋, 𝑍) = ℎ0 𝑡 𝑒 𝛽0+𝛽 𝑋 𝑋+𝛽 𝑍 𝑍
A more flexible MI approach
A more flexible MI approach
Meng. Multiple-imputation inferences with uncongenial sources of input. Statistical
Science 1994; 9: 538-558.
Bartlett et al. Multiple imputation of covariates by fully conditional specification:
Accommodating the substantive model. Stat Meth Med Res 2015; 24: 462-487.
In general it is difficult to know what is the form of 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
- There are non-linear terms in the model
𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝛽 𝑋2 𝑋2
+ 𝑒
- The outcome model is not a linear regression
ℎ(𝑡|𝑋, 𝑍) = ℎ0 𝑡 𝑒 𝛽0+𝛽 𝑋 𝑋+𝛽 𝑍 𝑍
Instead of trying to specify 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌,
…we specify 𝑌|𝑋, 𝑍 and 𝑋|𝑍 and
the measurement error model
Basic idea
1. Propose a potential imputed value for 𝑋 from 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍
2. Use a rejection sampling procedure to accept or reject the value as
being from the target distribution 𝑋|𝑋1
∗
, 𝑋2
∗
, 𝑍, 𝑌
3. The acceptance/rejection rule is a function of the outcome model
Substantive model compatible full conditional specification
(SMCFCS)
A more flexible MI approach
A more flexible MI approach
Application for measurement error correction
• Validation study: we can use it directly
• Replicates: we extended the method to the setting of replicates
Keogh & Bartlett. Measurement error as a missing data problem. Handbook of
Measurement Error and Variable Selection. 2019. Forthcoming.
Bartlett & Keogh. smcfcs: Multiple imputation of
covariates by substantive model compatible fully conditional specification. 2019.
https://github.com/ruthkeogh/meas_error_handbook
Motivating example: NHANES data
• Association between systolic blood pressure (SBP) and deaths due to
cardiovascular disease (CVD)
• Adjusted for sex, age, smoking status, diabetes
• Analysis method: Cox regression
Challenges
• SBP is error-prone
• Missing data in smoking status
N=6519 Smoking
observed
N=2667
5%
Replicate SBP measurement
Motivating example: NHANES data
First ignoring missing data…..
Covariate Naïve analysis Regression
calibration
Multiple imputation
SBP 0.085 (0.014, 0.157) 0.114 (0.011, 0.222) 0.120 (0.020, 0.219)
Male 0.49 (0.30, 0.67) 0.49 (0.32, 0.68) 0.49 (0.30, 0.67)
Age 0.88 (0.77, 0.99) 0.87 (0.76, 0.99) 0.88 (0.77, 0.99)
Smoker 0.26 (0.07, 0.46) 0.26 (0.07, 0.45) 0.26 (0.07, 0.46)
Diabetes 0.50 (0.29, 0.72) 0.50 (0.28, 0.72) 0.50 (0.29, 0.72)
Motivating example: NHANES data
Covariate Naïve analysis Regression calibration Multiple imputation Multiple imputation 2
SBP 0.085 (0.014, 0.157) 0.114 (0.011, 0.222) 0.120 (0.020, 0.219) 0.104 (0.035, 0.173)
Male 0.49 (0.30, 0.67) 0.49 (0.32, 0.68) 0.49 (0.30, 0.67) 0.46 (0.35, 0.56)
Age 0.88 (0.77, 0.99) 0.87 (0.76, 0.99) 0.88 (0.77, 0.99) 1.04 (0.97, 1.11)
Smoker 0.26 (0.07, 0.46) 0.26 (0.07, 0.45) 0.26 (0.07, 0.46) 0.26 (0.09, 0.43)
Diabetes 0.50 (0.29, 0.72) 0.50 (0.28, 0.72) 0.50 (0.29, 0.72) 0.69 (0.56, 0.83)
Accounting for missing data as well…
N=2667 N=6519
Summary
• We commonly face more than one ‘data quality’ challenge at the same
time
• Multiple imputation (and fully Bayesian approaches) enable us to
‘easily’ tackle measurement error and missing data together
• The smcfcs package in R facilitates this
Summary
• We commonly face more than one ‘data quality’ challenge at the same
time
• Multiple imputation (and fully Bayesian approaches) enable us to
‘easily’ tackle measurement error and missing data together
• The smcfcs package in R facilitates this
Bartlett & Keogh. Bayesian correction for covariate measurement error: A
frequentist evaluation and comparison with regression calibration.
Stat Meth Med Res 2016; 27: 1695-1708.
Measurement error and misclassification
Laurence Freedman
Victor Kipnis
Hendriek Bozhuizen
Raymond Carroll
Veronika Deffner
Kevin Dodd
Paul Gustafson
Ruth Keogh
Helmut Kuechenhoff
Pamela Shaw
Anne Thiebaut
Janet Tooze
Michael Wallace
ISCB 2020 Short Course
Measurement error:
Understanding impact on statistical analyses and
how to address it
Pam Shaw & Ruth Keogh

More Related Content

What's hot

Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)
Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)
Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)Vaggelis Vergoulas
 
Alternatives to t test
Alternatives to t testAlternatives to t test
Alternatives to t testLONDIWE SHANGE
 
Meta analysis
Meta analysisMeta analysis
Meta analysisSethu S
 
Statistical methods for the life sciences lb
Statistical methods for the life sciences lbStatistical methods for the life sciences lb
Statistical methods for the life sciences lbpriyaupm
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altmanrgveroniki
 
Sample size calculation - Animal experimentation
Sample size calculation - Animal experimentationSample size calculation - Animal experimentation
Sample size calculation - Animal experimentationDr Anudeep Reddy
 
Lesson 6 Nonparametric Test 2009 Ta
Lesson 6 Nonparametric Test 2009 TaLesson 6 Nonparametric Test 2009 Ta
Lesson 6 Nonparametric Test 2009 TaSumit Prajapati
 
Comparing data accuracy between structured abstracts and full-text journal ar...
Comparing data accuracy between structured abstracts and full-text journal ar...Comparing data accuracy between structured abstracts and full-text journal ar...
Comparing data accuracy between structured abstracts and full-text journal ar...University of the Philippines Manila
 
Combination of informative biomarkers in small pilot studies and estimation ...
Combination of informative  biomarkers in small pilot studies and estimation ...Combination of informative  biomarkers in small pilot studies and estimation ...
Combination of informative biomarkers in small pilot studies and estimation ...LEGATO project
 
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisday1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisrgveroniki
 
2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sternergveroniki
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...David Pratap
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsMaarten van Smeden
 
Statistics tests and Probablity
Statistics tests and ProbablityStatistics tests and Probablity
Statistics tests and ProbablityAbdul Wasay Baloch
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test SelectionVaggelis Vergoulas
 

What's hot (20)

Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)
Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)
Vergoulas Choosing the appropriate statistical test (2019 Hippokratia journal)
 
Alternatives to t test
Alternatives to t testAlternatives to t test
Alternatives to t test
 
Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)Overview of statistical tests: Data handling and data quality (Part II)
Overview of statistical tests: Data handling and data quality (Part II)
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Statistical methods for the life sciences lb
Statistical methods for the life sciences lbStatistical methods for the life sciences lb
Statistical methods for the life sciences lb
 
2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman2010 smg training_cardiff_day2_session3_dwan_altman
2010 smg training_cardiff_day2_session3_dwan_altman
 
Sample size calculation - Animal experimentation
Sample size calculation - Animal experimentationSample size calculation - Animal experimentation
Sample size calculation - Animal experimentation
 
Lesson 6 Nonparametric Test 2009 Ta
Lesson 6 Nonparametric Test 2009 TaLesson 6 Nonparametric Test 2009 Ta
Lesson 6 Nonparametric Test 2009 Ta
 
Comparing data accuracy between structured abstracts and full-text journal ar...
Comparing data accuracy between structured abstracts and full-text journal ar...Comparing data accuracy between structured abstracts and full-text journal ar...
Comparing data accuracy between structured abstracts and full-text journal ar...
 
Meta analysis
Meta analysisMeta analysis
Meta analysis
 
Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)Overview of statistics: Statistical testing (Part I)
Overview of statistics: Statistical testing (Part I)
 
Non Parametric Tests
Non Parametric TestsNon Parametric Tests
Non Parametric Tests
 
Combination of informative biomarkers in small pilot studies and estimation ...
Combination of informative  biomarkers in small pilot studies and estimation ...Combination of informative  biomarkers in small pilot studies and estimation ...
Combination of informative biomarkers in small pilot studies and estimation ...
 
Categorical models
Categorical modelsCategorical models
Categorical models
 
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewisday1(2010 smg training_cardiff)_session2b (1of 2) lewis
day1(2010 smg training_cardiff)_session2b (1of 2) lewis
 
2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne2010 smg training_cardiff_day2_session4_sterne
2010 smg training_cardiff_day2_session4_sterne
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...
 
Regression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questionsRegression shrinkage: better answers to causal questions
Regression shrinkage: better answers to causal questions
 
Statistics tests and Probablity
Statistics tests and ProbablityStatistics tests and Probablity
Statistics tests and Probablity
 
Sample Size Estimation and Statistical Test Selection
Sample Size Estimation  and Statistical Test SelectionSample Size Estimation  and Statistical Test Selection
Sample Size Estimation and Statistical Test Selection
 

Similar to STRATOS ISCB 2019: Ruth Keogh

Common statistical errors in medical publications
Common statistical errors in medical publicationsCommon statistical errors in medical publications
Common statistical errors in medical publicationsARDC
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxMrs S Sen
 
Statistics pres 3.31.2014
Statistics pres 3.31.2014Statistics pres 3.31.2014
Statistics pres 3.31.2014tjcarter
 
COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020Ellie Murray
 
Sample size and power calculations
Sample size and power calculationsSample size and power calculations
Sample size and power calculationsRamachandra Barik
 
Undergraduate Research work
Undergraduate Research workUndergraduate Research work
Undergraduate Research workPeter M Addo
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMin-hyung Kim
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabotjcarter
 
Advice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation ResearchAdvice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation ResearchNancy Ideker
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxHome
 
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGY
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGYBIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGY
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGYGauravBoruah
 
Nursing Research Ratio1.pptx
Nursing Research Ratio1.pptxNursing Research Ratio1.pptx
Nursing Research Ratio1.pptxYaseenNor
 
Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015CADTH Symposium
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Evangelos Kritsotakis
 
Choosing statistical tests
Choosing statistical testsChoosing statistical tests
Choosing statistical testsAkiode Noah
 
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docx
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docxPUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docx
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docxdenneymargareta
 

Similar to STRATOS ISCB 2019: Ruth Keogh (20)

Common statistical errors in medical publications
Common statistical errors in medical publicationsCommon statistical errors in medical publications
Common statistical errors in medical publications
 
Critical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptxCritical Appraisal - Quantitative SS.pptx
Critical Appraisal - Quantitative SS.pptx
 
Metaanalysis copy
Metaanalysis    copyMetaanalysis    copy
Metaanalysis copy
 
Copenhagen 23.10.2008
Copenhagen 23.10.2008Copenhagen 23.10.2008
Copenhagen 23.10.2008
 
Statistics pres 3.31.2014
Statistics pres 3.31.2014Statistics pres 3.31.2014
Statistics pres 3.31.2014
 
COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020COVID and Causal Inference -- NAS CATS 6/2020
COVID and Causal Inference -- NAS CATS 6/2020
 
Hypo
HypoHypo
Hypo
 
Sample size and power calculations
Sample size and power calculationsSample size and power calculations
Sample size and power calculations
 
Undergraduate Research work
Undergraduate Research workUndergraduate Research work
Undergraduate Research work
 
1.1 statistical and critical thinking
1.1 statistical and critical thinking1.1 statistical and critical thinking
1.1 statistical and critical thinking
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -clean
 
Statistics pres 10 27 2015 roy sabo
Statistics pres 10 27 2015   roy saboStatistics pres 10 27 2015   roy sabo
Statistics pres 10 27 2015 roy sabo
 
Advice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation ResearchAdvice On Statistical Analysis For Circulation Research
Advice On Statistical Analysis For Circulation Research
 
Test of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptxTest of Significance of Large Samples for Mean = µ.pptx
Test of Significance of Large Samples for Mean = µ.pptx
 
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGY
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGYBIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGY
BIOSTATISTICS FUNDAMENTALS FOR BIOTECHNOLOGY
 
Nursing Research Ratio1.pptx
Nursing Research Ratio1.pptxNursing Research Ratio1.pptx
Nursing Research Ratio1.pptx
 
Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015Cadth 2015 c2 tt eincea_cadth_042015
Cadth 2015 c2 tt eincea_cadth_042015
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
Choosing statistical tests
Choosing statistical testsChoosing statistical tests
Choosing statistical tests
 
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docx
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docxPUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docx
PUH 5302, Applied Biostatistics 1 Course Learning Outcomes.docx
 

Recently uploaded

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 

STRATOS ISCB 2019: Ruth Keogh

  • 1. Measurement error and missing data Killing two birds with one stone Ruth Keogh Department of Medical Statistics London School of Hygiene & Tropical Medicine On Behalf of TG4 Measurement Error and Misclassification
  • 2. Measurement error and misclassification Laurence Freedman Victor Kipnis Hendriek Bozhuizen Raymond Carroll Veronika Deffner Kevin Dodd Paul Gustafson Ruth Keogh Helmut Kuechenhoff Pamela Shaw Anne Thiebaut Janet Tooze Michael Wallace
  • 3. Literature survey We surveyed the literature in four areas : • Nutritional intake cohort studies • Physical activity cohort studies • Air pollution cohort studies • Dietary intake distributions What are people saying and doing about measurement error?
  • 4. Literature survey: N=81 What percentage of studies mentioned measurement error as a potential problem? What percentage of studies used methods to mitigate the impact of measurement error? What percentage of studies categorized their main exposure?
  • 5. Literature survey: N=81 What percentage of studies mentioned measurement error as a potential problem? What percentage of studies used methods to mitigate the impact of measurement error? 80% (N=65) What percentage of studies categorized their main exposure?
  • 6. Literature survey: N=81 What percentage of studies mentioned measurement error as a potential problem? What percentage of studies used methods to mitigate the impact of measurement error? 80% (N=65) 6% (N=5) What percentage of studies categorized their main exposure?
  • 7. Literature survey: N=81 What percentage of studies mentioned measurement error as a potential problem? What percentage of studies used methods to mitigate the impact of measurement error? 80% (N=65) 6% (N=5) What percentage of studies categorized their main exposure? 88% (N=71)
  • 8. Literature survey: observations • Most of those who mentioned error as a problem made an incomplete/incorrect claim – Many stated that their estimates could only be attenuated by measurement error – Some claimed no bias in associations but for spurious reasons
  • 9. Literature survey: observations • Most of those who mentioned error as a problem made an incomplete/incorrect claim – Many stated that their estimates could only be attenuated by measurement error – Some claimed no bias in associations but for spurious reasons • Most studies categorized the continuous exposures – Common belief: categorization will reduce impact of measurement error – Categorizing can actually make things worse
  • 10. References Epidemiologic analyses with error-prone exposures: review of current practice and recommendations. Shaw et al. Annals of Epidemiology 2018; 28: 821-828. Measurement error is often neglected in medical literature: a systematic review. Brackenhoff et al. Journal of Clinical Epidemiology 2018; 98: 89-97. Five myths about measurement error in epidemiologic research. van Smeden, Lash, Groenwold. DOI 10.17605/OSF.IO/MSX8D. https://osf.io/msx8d/
  • 11. Forthcoming guidance papers STRATOS guidance document on measurement error and misclassification of variables in observational epidemiology Part 1 – basic theory and simple methods of adjustment Ruth H Keogh, Pamela A Shaw, Paul Gustafson, Raymond J Carroll, Veronika Deffner, Kevin W Dodd, Helmut Küchenhoff, Janet A Tooze, Michael P Wallace, Victor Kipnis, Laurence S Freedman Part 2 –more complex methods of adjustment and advanced topics Pamela A Shaw, Paul Gustafson, Raymond J Carroll, Veronika Deffner, Kevin W Dodd, Ruth H Keogh, Victor Kipnis, Janet A Tooze, Michael P Wallace, Helmut Küchenhoff, Laurence S Freedman
  • 12. Measurement error and missing data Killing two birds with one stone Ruth Keogh Department of Medical Statistics London School of Hygiene & Tropical Medicine On Behalf of TG4 Measurement Error and Misclassification
  • 13. Jonathan Bartlett, University of Bath Christen Gray, IQVIA
  • 16. Notation and set-up True outcome model: Using 𝑋 𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝑒 True outcome 𝑌 True exposure 𝑋 Observed exposure 𝑋∗ Error 𝑋∗ = 𝑋 + 𝜖 Confounder 𝑍
  • 17. Notation and set-up True outcome model: Using 𝑋 𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝑒 Naive outcome model: Using 𝑋∗ 𝑌 = 𝛽0 ∗ + 𝛽 𝑋 ∗ 𝑋∗ + 𝛽 𝑍 ∗ 𝑍 + 𝑒 True outcome 𝑌 True exposure 𝑋 Observed exposure 𝑋∗ Error 𝑋∗ = 𝑋 + 𝜖 Confounder 𝑍
  • 18. Ancillary studies 𝑋 Validation study 𝑋2 ∗ Replicates study 𝑋∗ 𝑋1 ∗ To do something about the impact of measurement error in our analysis, we need to know the form and extent of the error 𝑋1 ∗∗ , 𝑋2 ∗∗ Calibration study 𝑋∗
  • 19. Motivating example: NHANES data • Association between systolic blood pressure (SBP) and deaths due to cardiovascular disease (CVD) • Adjusted for sex, age, smoking status, diabetes • Analysis method: Cox regression
  • 20. Motivating example: NHANES data • Association between systolic blood pressure (SBP) and deaths due to cardiovascular disease (CVD) • Adjusted for sex, age, smoking status, diabetes • Analysis method: Cox regression Challenges • SBP is error-prone • Missing data in smoking status
  • 21. Motivating example: NHANES data • Association between systolic blood pressure (SBP) and deaths due to cardiovascular disease (CVD) • Adjusted for sex, age, smoking status, diabetes • Analysis method: Cox regression Challenges • SBP is error-prone • Missing data in smoking status N=6519 Smoking observed N=2667 5% Replicate SBP measurement
  • 22. Regression calibration Obtain an estimate of 𝐸(𝑋|𝑋∗, 𝑍) using the ancillary study and use in the outcome regression model: 𝑌 = 𝛽0 + 𝛽 𝑋 𝐸 𝑋 𝑋∗ , 𝑍 +𝛽 𝑍 𝑍 + 𝑒
  • 23. Regression calibration Limitations • Requires non-differential error assumption • Requires an approximation for non-linear outcome models • How do we accommodate missing data as well? Obtain an estimate of 𝐸(𝑋|𝑋∗, 𝑍) using the ancillary study and use in the outcome regression model: 𝑌 = 𝛽0 + 𝛽 𝑋 𝐸 𝑋 𝑋∗ , 𝑍 +𝛽 𝑍 𝑍 + 𝑒
  • 24. Multiple imputation (MI) • Very popular method for handling missing data • Measurement error can be viewed as a missing data problem – the ‘truth’ is missing
  • 25. Multiple imputation (MI) • Very popular method for handling missing data • Measurement error can be viewed as a missing data problem – the ‘truth’ is missing 𝑋 Validation study 𝑋2 ∗ Replicates study 𝑋∗ 𝑋1 ∗ …for some people …for everyone! 𝑋1 ∗∗ , 𝑋2 ∗∗ Calibration study 𝑋∗
  • 26. Multiple imputation (MI) Cole SR, Chu H and Greenland S. Multiple- imputation for measurement-error correction. Int J Epidemiol 2006; 35: 1074–1081. 1. For individuals with 𝑋 missing, draw a value 𝑋 from 𝑋|𝑋∗, 𝑍, 𝑌 2. This gives a complete imputed data set 3. Fit the outcome model using the imputed data 4. Repeat for M imputed data sets 5. Pool the results using Rubin’s Rules 𝑋 Validation study 𝑋∗
  • 27. Multiple imputation (MI) 𝑋 Validation study 𝑋∗ In the validation situation we benefit from the huge missing data literature on MI. Carpenter & Kenward. Multiple imputation and its application. New York: Wiley. 2013 Sterne et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009; 338: b2393
  • 28. Multiple imputation (MI) 𝑋 Validation study 𝑋∗ Software R: mice, smcfcs Stata: mi impute, smcfcs SAS: PROC MI In the validation situation we benefit from the huge missing data literature on MI. Carpenter & Kenward. Multiple imputation and its application. New York: Wiley. 2013 Sterne et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009; 338: b2393
  • 29. Freedman et al. A comparison of regression calibration, moment reconstruction and imputation for adjusting for covariate measurement error in regression. Stat Med 2008; 27: 5195–5216. Keogh & White. A toolkit for measurement error correction, with a focus on nutritional epidemiology. Stat Med 2014; 33: 2137-2155. Multiple imputation (MI) 𝑋1 ∗∗ , 𝑋2 ∗∗ Calibration study 𝑋∗ 𝑋2 ∗ Replicates study 𝑋1 ∗
  • 30. • We need to e.g. assume a multivariate normal distribution for 𝑋, 𝑋1 ∗ , 𝑋2 ∗ |𝑍 • This gives form of 𝑝(𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌) 𝑋2 ∗ Replicates study 𝑋1 ∗The difficult step of MI 1. For individuals with 𝑋 missing, draw a value 𝑋 from 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌 Multiple imputation (MI)
  • 31. • We need to assume a distribution for 𝑋, 𝑋1 ∗ , 𝑋2 ∗ |𝑍, e.g. multivariate normal • This gives form of 𝑝 𝑋 𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌 𝑋2 ∗ Replicates study 𝑋1 ∗ Multiple imputation (MI) • This approach is not very flexible • There is no software and it is not very easy to implement The difficult step of MI 1. For individuals with 𝑋 missing, draw a value 𝑋 from 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌
  • 32. In general it is difficult to know what is the form of 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌 - There are non-linear terms in the model 𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝛽 𝑋2 𝑋2 + 𝑒 - The outcome model is not a linear regression ℎ(𝑡|𝑋, 𝑍) = ℎ0 𝑡 𝑒 𝛽0+𝛽 𝑋 𝑋+𝛽 𝑍 𝑍 A more flexible MI approach
  • 33. A more flexible MI approach Meng. Multiple-imputation inferences with uncongenial sources of input. Statistical Science 1994; 9: 538-558. Bartlett et al. Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Stat Meth Med Res 2015; 24: 462-487. In general it is difficult to know what is the form of 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌 - There are non-linear terms in the model 𝑌 = 𝛽0 + 𝛽 𝑋 𝑋 + 𝛽 𝑍 𝑍 + 𝛽 𝑋2 𝑋2 + 𝑒 - The outcome model is not a linear regression ℎ(𝑡|𝑋, 𝑍) = ℎ0 𝑡 𝑒 𝛽0+𝛽 𝑋 𝑋+𝛽 𝑍 𝑍
  • 34. Instead of trying to specify 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌, …we specify 𝑌|𝑋, 𝑍 and 𝑋|𝑍 and the measurement error model Basic idea 1. Propose a potential imputed value for 𝑋 from 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍 2. Use a rejection sampling procedure to accept or reject the value as being from the target distribution 𝑋|𝑋1 ∗ , 𝑋2 ∗ , 𝑍, 𝑌 3. The acceptance/rejection rule is a function of the outcome model Substantive model compatible full conditional specification (SMCFCS) A more flexible MI approach
  • 35. A more flexible MI approach Application for measurement error correction • Validation study: we can use it directly • Replicates: we extended the method to the setting of replicates Keogh & Bartlett. Measurement error as a missing data problem. Handbook of Measurement Error and Variable Selection. 2019. Forthcoming. Bartlett & Keogh. smcfcs: Multiple imputation of covariates by substantive model compatible fully conditional specification. 2019. https://github.com/ruthkeogh/meas_error_handbook
  • 36. Motivating example: NHANES data • Association between systolic blood pressure (SBP) and deaths due to cardiovascular disease (CVD) • Adjusted for sex, age, smoking status, diabetes • Analysis method: Cox regression Challenges • SBP is error-prone • Missing data in smoking status N=6519 Smoking observed N=2667 5% Replicate SBP measurement
  • 37. Motivating example: NHANES data First ignoring missing data….. Covariate Naïve analysis Regression calibration Multiple imputation SBP 0.085 (0.014, 0.157) 0.114 (0.011, 0.222) 0.120 (0.020, 0.219) Male 0.49 (0.30, 0.67) 0.49 (0.32, 0.68) 0.49 (0.30, 0.67) Age 0.88 (0.77, 0.99) 0.87 (0.76, 0.99) 0.88 (0.77, 0.99) Smoker 0.26 (0.07, 0.46) 0.26 (0.07, 0.45) 0.26 (0.07, 0.46) Diabetes 0.50 (0.29, 0.72) 0.50 (0.28, 0.72) 0.50 (0.29, 0.72)
  • 38. Motivating example: NHANES data Covariate Naïve analysis Regression calibration Multiple imputation Multiple imputation 2 SBP 0.085 (0.014, 0.157) 0.114 (0.011, 0.222) 0.120 (0.020, 0.219) 0.104 (0.035, 0.173) Male 0.49 (0.30, 0.67) 0.49 (0.32, 0.68) 0.49 (0.30, 0.67) 0.46 (0.35, 0.56) Age 0.88 (0.77, 0.99) 0.87 (0.76, 0.99) 0.88 (0.77, 0.99) 1.04 (0.97, 1.11) Smoker 0.26 (0.07, 0.46) 0.26 (0.07, 0.45) 0.26 (0.07, 0.46) 0.26 (0.09, 0.43) Diabetes 0.50 (0.29, 0.72) 0.50 (0.28, 0.72) 0.50 (0.29, 0.72) 0.69 (0.56, 0.83) Accounting for missing data as well… N=2667 N=6519
  • 39. Summary • We commonly face more than one ‘data quality’ challenge at the same time • Multiple imputation (and fully Bayesian approaches) enable us to ‘easily’ tackle measurement error and missing data together • The smcfcs package in R facilitates this
  • 40. Summary • We commonly face more than one ‘data quality’ challenge at the same time • Multiple imputation (and fully Bayesian approaches) enable us to ‘easily’ tackle measurement error and missing data together • The smcfcs package in R facilitates this Bartlett & Keogh. Bayesian correction for covariate measurement error: A frequentist evaluation and comparison with regression calibration. Stat Meth Med Res 2016; 27: 1695-1708.
  • 41. Measurement error and misclassification Laurence Freedman Victor Kipnis Hendriek Bozhuizen Raymond Carroll Veronika Deffner Kevin Dodd Paul Gustafson Ruth Keogh Helmut Kuechenhoff Pamela Shaw Anne Thiebaut Janet Tooze Michael Wallace ISCB 2020 Short Course Measurement error: Understanding impact on statistical analyses and how to address it Pam Shaw & Ruth Keogh