Correcting for missing data, measurement error and confounding

Maarten van Smeden, PhD
University Medical Center Utrecht
Julius Center for Health Sciences and Primary Care
The Netherlands
Twitter: @MvanSmeden
Email: M.vanSmeden@umcutrecht.nl
30 November 2020
Methods meeting
Epidemiology methods group, UMC Utrecht
I have no conflicts of interest to declare


Rationale
• Confounding -> correlation is not causation
• Measurement error & missing data -> correlation is not always correlation
• In causal epidemiologic research we often see all three
... but we rarely try to “correct” for all three

There is no shortage of methods
Confounding                     | Missing data                         | Measurement error
Multivariable adjustments       | Multiple imputation                  | Regression calibration
Weighting                       | Weighting                            | Weighting
Matching                        | Full information maximum likelihood  | Multiple imputation for ME
Instrumental variable analysis  | Last observation carried forward     | SIMEX
RANDOMIZATION (!)               | Missing indicator methods            | Full information maximum likelihood
“Bayesian approaches”           | “Bayesian approaches”                | “Bayesian approaches”
A non-exhaustive list of statistical correction strategies

Outline
• Confounding (1 slide)
• Missing data (2 slides)
• Measurement error (many slides)
• How to “solve” all three? (a couple more slides)
• What about prediction? (if I have time left)

Confounding
• A: treatment (Tx, 1 for treated; 0 for not treated)
• Y: outcome (1 for death; 0 for survival)
• Potential outcomes
Y^(a=1): outcome under Tx; Y^(a=0): outcome under no Tx
usually only one of Y^(a=0) and Y^(a=1) is observed for an individual
• Randomized trials: Y^a ⊥ A (unconditional exchangeability)
• Observational studies aim: Y^a ⊥ A | L (conditional exchangeability)
L: confounding variables -> no unmeasured confounding
• (Additional causal assumptions: positivity, consistency, SUTVA, …)
More info: Hernán & Robins, Causal Inference: What If
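
To make conditional exchangeability concrete, here is a minimal simulation sketch. All numbers are made up for illustration: a single binary confounder L drives both treatment A and outcome Y, and the true causal risk difference is set to zero. The crude comparison then shows an "effect", while standardizing over L removes it.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 200_000

# Hypothetical data-generating mechanism (not from the slides)
L = rng.binomial(1, 0.5, n)                        # binary confounder
A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))    # treatment depends on L
Y = rng.binomial(1, np.where(L == 1, 0.4, 0.1))    # death depends on L only: true effect of A is 0

def risk_difference(y, a):
    """Risk in the treated minus risk in the untreated."""
    return y[a == 1].mean() - y[a == 0].mean()

crude_rd = risk_difference(Y, A)

# Standardization: weight the stratum-specific risk differences by P(L = l)
adjusted_rd = sum(
    risk_difference(Y[L == l], A[L == l]) * np.mean(L == l) for l in (0, 1)
)

print(f"crude RD:    {crude_rd:.3f}")
print(f"adjusted RD: {adjusted_rd:.3f}")
```

The adjustment only works here because L is the sole confounder and is measured without error, which is exactly the "no unmeasured confounding" assumption.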

Missing data
• Missing values are observations/records which were:
– never collected (either by design or not)
– lost accidentally
– wrongly collected and so deleted (measurement error?)
• Usually distinguish between three types of missing data
– MCAR: the probability that data are missing does NOT
depend on the values of observed or missing data
– MAR: the probability that data are missing depends on the
values of the observed data, but does NOT depend on the
values of the missing data
– MNAR: the probability that data are missing depends on the
values of the missing data
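
The three mechanisms can be sketched in a small simulation. The variable names, coefficients and missingness models below are assumptions chosen for illustration only. The true mean of y is 0; the complete-case mean stays close to 0 under MCAR but drifts under MAR and MNAR.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

x = rng.normal(0, 1, n)               # fully observed variable
y = 0.7 * x + rng.normal(0, 1, n)     # variable that will get missing values (true mean 0)

def expit(z):
    return 1 / (1 + np.exp(-z))

# True where y is missing
missing = {
    "MCAR": rng.random(n) < 0.3,             # same probability for everyone
    "MAR": rng.random(n) < expit(-1 + x),    # depends on the observed x only
    "MNAR": rng.random(n) < expit(-1 + y),   # depends on the value of y itself
}

complete_case_mean = {name: y[~m].mean() for name, m in missing.items()}
for name, value in complete_case_mean.items():
    print(f"{name}: complete-case mean of y = {value:.3f}")
```

Under MAR the bias can in principle be removed by conditioning on x (for example via multiple imputation); under MNAR the observed data alone do not identify the correction.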

Beware of mindless imputation
Source: Hughes et al. IJE, 2019, doi:10.1093/ije/dyz032

Personal observations (I may be biased)
Causal inference epidemiology
• Confounding on center stage in analyses and discussion
• Missing data often cannot be ignored (especially for higher %):
performing multiple imputation becoming mainstream?
• Measurement error the elephant in the room: belongs to the
discussion section (not methods), lots of misconceptions!
• (Note: not independent, e.g. measurement error can result in
problems with confounding)

Measurement error
“Errors in reading, calculating or recording a numerical value. The difference between
observed values of a variable recorded under similar conditions and some fixed true value.”
The Cambridge Dictionary of Statistics (4th ed), ISBN: 9780521766999

img: https://bit.ly/2T9UnRt

Measurement of systolic blood pressure
Measurement error due to:
• White coat effect [1]
• Non-adherence to measurement protocol [2]
• Fallibility of measurement instruments [3]
• …
Measurement error varies:
• Number of BP measurements taken [4]
• Gender [4]
• Circadian rhythm
• …
doi: [1] 10.1370/afm.1211; [2] 10.3399/096016407782604965; [3] 10.2147/MDER.S141599; [4] 10.3109/08037051.2014.986952

Example: circadian rhythm
doi: 10.1111/j.1552-6909.2000.tb02771.x

Imprecision of medical measurements
doi: 10.1136/bmj.m149

Measurement error: a long list
• Blood pressure
• Dietary intake
• Smoking status
• Air pollution
• BMI
• Physical activity
• Vaccination status
• Social class
• Carotid intima media thickness
• Thyroid hormone levels
• Glucose levels
• Cholesterol levels
• Income
• Family history
• Mental health history
• Education level
• “Intelligence”
• Respiratory rates
• Medication use
• Sedentary hours
• Vitamin use
• Immigration status
• Age at first intercourse
• Age at menopause
• ICD coding
• Symptoms
• Date of symptom onset
• Medication use
• Visceral adipose tissue
• Angina class
• Heart rate
• Grip and pinch strength
• Cough frequency
• Infant height
• Gestational age
• Disease specific mortality
• ….

Measurement error mentioned
Journals of epidemiology
Jurek et al. 2006 [1]: 61% (N = 35)
Brakenhoff et al. 2018 [2]: 56% (N = 198)
Shaw et al. 2019 [3]: 80% (N = 65)
doi: [1] 10.1007/s10654-006-9083-0; [2] 10.1016/j.jclinepi.2018.02.02; [3] 10.1016/j.annepidem.2018.09.001

Measurement error mentioned
Journals of general medicine
Brakenhoff et al. 2018 [2]: 25% (N = 57)
doi: [2] 10.1016/j.jclinepi.2018.02.02

Measurement error “corrections” applied
Journals of epidemiology
Jurek et al. 2006 [1]: 2% (N = 1)
Shaw et al. 2019 [3]: 6% (N = 5)
doi: [1] 10.1007/s10654-006-9083-0; [2] 10.1016/j.jclinepi.2018.02.02; [3] 10.1016/j.annepidem.2018.09.001

Measurement error “corrections” applied
Journals of general medicine
doi: [2] 10.1016/j.jclinepi.2018.02.02

• Myth 1: measurement error can be compensated for by large
numbers of observations
• Myth 2: the exposure effect is underestimated when variables
are measured with error
• Myth 3: exposure measurement error is nondifferential if
measurements are taken without knowledge of the outcome
• Myth 4: measurement error can be prevented but not mitigated
in epidemiological data analyses
• Myth 5: certain types of epidemiological research are
unaffected by measurement error
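
Myths 1 and 2 can be checked against the simplest textbook case. The derivation below is a sketch under stated assumptions (one exposure A, no covariates, classical additive error U); the symbols \sigma_A^2, \sigma_U^2 and \lambda are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
A^{*} &= A + U, \qquad U \perp (A, \varepsilon), \quad \operatorname{Var}(U) = \sigma_U^2, \\
Y &= \alpha_0 + \alpha_1 A + \varepsilon, \\
\hat{\alpha}_1^{\text{naive}} &\xrightarrow{\;p\;}
  \frac{\operatorname{Cov}(A^{*}, Y)}{\operatorname{Var}(A^{*})}
  = \alpha_1 \, \frac{\sigma_A^2}{\sigma_A^2 + \sigma_U^2}
  = \lambda \, \alpha_1, \qquad 0 < \lambda \le 1 .
\end{aligned}
```

The factor \lambda does not depend on the sample size, so more observations only give a more precise estimate of the wrong quantity (Myth 1). The attenuation toward zero is specific to this simple setting: with additional covariates, or with systematic or differential error, the bias can go in either direction (Myth 2).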

Types of measurement error
• Classical (random) measurement error: measurements fluctuate around their true value
• Systematic measurement error: measurements are consistently wrong in a particular direction
• Differential measurement error: measurements are consistently wrong in a particular direction, varying per group
Courtesy: Linda Nab
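
As a rough illustration of the three error types, the sketch below generates each of them for a hypothetical blood pressure variable. The error sizes, the shift and scale, and the `group` variable are assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

group = rng.binomial(1, 0.5, n)     # e.g. study arm or outcome status (hypothetical)
x_true = rng.normal(120, 15, n)     # hypothetical true systolic blood pressure (mmHg)

# Classical: random scatter around the true value
x_classical = x_true + rng.normal(0, 10, n)
# Systematic: shifted and scaled in the same way for everyone
x_systematic = 5 + 1.1 * x_true + rng.normal(0, 10, n)
# Differential: the shift depends on group membership
x_differential = x_true + np.where(group == 1, 8, 0) + rng.normal(0, 10, n)

mean_err_classical = np.mean(x_classical - x_true)
mean_err_systematic = np.mean(x_systematic - x_true)
err_diff = x_differential - x_true
gap_between_groups = err_diff[group == 1].mean() - err_diff[group == 0].mean()

print(f"classical, mean error:       {mean_err_classical:.2f}")
print(f"systematic, mean error:      {mean_err_systematic:.2f}")
print(f"differential, group 1 vs 0:  {gap_between_groups:.2f}")
```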

Classical measurement error

Triple whammy of measurement error
• Bias
• Increased imprecision
• Masked functional relations

• Bias
Always weaker effects?

Example: classical measurement error
doi: 10.1371/journal.pone.0192298

Second Manifestations of ARTerial disease (SMART) cohort
[Diagram: effect of interest; confounder measured with error; outcome]

% bias in hazard ratio for SBP (multivariable Cox regression model)
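
Why error in a confounder can bias the effect of interest away from the null can be sketched with a linear model. This is a simplified stand-in, not the Cox analysis of the SMART data: the coefficients, error variance and the linear outcome model are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

L = rng.normal(0, 1, n)                        # true confounder
A = 0.6 * L + rng.normal(0, 0.8, n)            # exposure, correlated with L
Y = 1.0 * A + 2.0 * L + rng.normal(0, 1, n)    # true effect of A is 1.0
L_star = L + rng.normal(0, 1, n)               # confounder with classical error

def ols(y, *covariates):
    """OLS coefficients of y on the covariates (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]

effect_true_L = ols(Y, A, L)[1]        # adjusted for the true confounder
effect_noisy_L = ols(Y, A, L_star)[1]  # adjusted for the error-prone confounder

print(f"effect of A, adjusting for L:  {effect_true_L:.2f}")
print(f"effect of A, adjusting for L*: {effect_noisy_L:.2f}")
```

With these assumed values the second estimate is roughly 1.7 in large samples: adjustment for L* leaves residual confounding, so the exposure effect is overestimated rather than attenuated.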

Randomized trials unaffected?
excerpt from: 10.1186/s13063-018-2954-3

Randomized controlled trials
doi: 10.1002/sim.8359

Classical (random) measurement error
• Unbiased Tx effect estimator
• Increased Type-II error
• Nominal Type-I error
Systematic measurement error
• Possibly biased Tx effect estimator
• Type-II error affected
• Type-I generally nominal
Differential measurement error
• Possibly biased Tx effect estimator
• Type-II error affected
• Type-I not nominal
doi: 10.1002/sim.8359
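
The pattern above can be reproduced in a small simulation, assuming the error sits in a continuous trial endpoint Y. The effect size, error variances, and the forms of systematic and differential error are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000

tx = rng.binomial(1, 0.5, n)            # randomized treatment
y = 2.0 * tx + rng.normal(0, 1, n)      # true endpoint; true Tx effect = 2

y_classical = y + rng.normal(0, 1, n)                  # pure noise
y_systematic = 1.0 + 0.8 * y + rng.normal(0, 1, n)     # same shift and scale in both arms
y_differential = y + 0.5 * tx + rng.normal(0, 1, n)    # extra shift in the treated arm only

def tx_effect(endpoint):
    """Difference in mean endpoint between arms."""
    return endpoint[tx == 1].mean() - endpoint[tx == 0].mean()

est = {
    "classical": tx_effect(y_classical),
    "systematic": tx_effect(y_systematic),
    "differential": tx_effect(y_differential),
}
for name, value in est.items():
    print(f"{name}: estimated Tx effect = {value:.2f}")
```

Classical error leaves the estimate centred on 2 but adds noise (hence lower power). The systematic version rescales the effect by the assumed factor 0.8. The differential version adds 0.5 even if the true effect were zero, which is why the Type-I error is no longer nominal.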

• Bias
Usually the target for measurement error “corrections”

Measurement error corrections
Replicates study
[Diagram: study sample; Y* from standard measurements, replicated]
External validation set
[Diagram: study sample with standard measurements (Y*); separate external validation set with standard + validated measurements]

Internal validation set
[Diagram: study sample with standard measurements (Y*); internal validation subset with standard + validated measurements]
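
With an internal validation subset, regression calibration is one commonly used correction. The sketch below assumes an error-prone exposure A* with the validated A available in a subset, a single covariate L, and made-up parameter values; it only shows the point estimate.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_val = 20_000, 4_000

L = rng.normal(0, 1, n)
A = 0.5 * L + rng.normal(0, 1, n)            # validated exposure
A_star = A + rng.normal(0, 1, n)             # standard (error-prone) measurement
Y = 10 * A + 3 * L + rng.normal(0, 5, n)     # true effect of A = 10
in_val = np.arange(n) < n_val                # A is treated as observed only here

def ols(y, X):
    """OLS coefficients (intercept first) of y on the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0]

covs = np.column_stack([A_star, L])

# Calibration model fitted in the validation subset: A ~ A* + L
gamma = ols(A[in_val], covs[in_val])
A_rc = gamma[0] + covs @ gamma[1:]           # predicted A for everyone

naive_a1 = ols(Y, covs)[1]                           # uses A* directly
rc_a1 = ols(Y, np.column_stack([A_rc, L]))[1]        # uses calibrated exposure

print(f"naive estimate:      {naive_a1:.2f}")
print(f"calibrated estimate: {rc_a1:.2f}")
```

Standard errors from the second regression ignore the uncertainty in the calibration model, so in practice they need adjusting, for example by bootstrapping both steps together.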

Simulation study
OLS regression
Y = a0 + a1 A + b1 L1 + … + bp Lp + e,  e ~ N(0, s)
a1: effect of primary interest
A, L ~ multivariate normal with mean vector 0 and a correlation matrix
with equal pairwise correlations
Random measurement error on A, generating a new A*
Missing data (MAR) on L1
True values: a0 = 0, a1 = 10, and b1 = b2 = … = bp chosen based on the total
confounding effect (crude minus adjusted)
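
A single dataset from a setup like this could be generated roughly as follows. Only a0 = 0 and a1 = 10 come from the slide; the function name, the defaults for p, rho, b, the error standard deviations and the MAR model (missingness in L1 depending on the observed Y) are assumptions for illustration.

```python
import numpy as np

def simulate_dataset(n=100, p=3, rho=0.3, a1=10.0, b=2.0,
                     sd_e=5.0, sd_u=1.0, rng=None):
    """Generate one dataset with error in A and MAR missingness in L1 (sketch)."""
    rng = np.random.default_rng() if rng is None else rng

    # A and L1..Lp: multivariate normal, mean 0, equal pairwise correlations
    corr = np.full((p + 1, p + 1), rho)
    np.fill_diagonal(corr, 1.0)
    AL = rng.multivariate_normal(np.zeros(p + 1), corr, size=n)
    A, L = AL[:, 0], AL[:, 1:]

    # Outcome model with a0 = 0 and equal confounder coefficients
    Y = a1 * A + L @ np.full(p, b) + rng.normal(0, sd_e, n)

    # Random (classical) measurement error on A
    A_star = A + rng.normal(0, sd_u, n)

    # MAR: probability that L1 is missing depends only on the observed Y
    p_miss = 1 / (1 + np.exp(-(-1.0 + 0.1 * Y)))
    L_obs = L.copy()
    L_obs[rng.random(n) < p_miss, 0] = np.nan

    return {"Y": Y, "A": A, "A_star": A_star, "L": L, "L_obs": L_obs}

data = simulate_dataset(n=1000, rng=np.random.default_rng(0))
print("share of L1 missing:", np.isnan(data["L_obs"][:, 0]).mean())
```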

Simulation factors
100,000 generated datasets by random draws from simulation factors

Models

Sequential models
• MIME: Multiple imputation for measurement error
Multiply impute both A (only observed in a subset) and the missing values
in L1 using full conditional specification (Y, A, A*, L), followed by OLS
with A and L as covariates (pooled with Rubin’s rules)
• MIRC: Multiple imputation and regression calibration
1. Impute missing values in L1
2. In the subset: OLS for A given A*, L
3. For the entire set: A_rc = E(A | A*, L)
4. For each multiply imputed set: OLS using A_rc and L as
covariates, and adjust standard errors (RC)
5. Combine using Rubin’s rules
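
The pooling step shared by both approaches can be sketched as a small helper. The function name `pool_rubin` and the example numbers are hypothetical; the formulas are the standard Rubin's rules for a scalar estimate.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)

    pooled = estimates.mean()                  # pooled point estimate
    within = variances.mean()                  # average within-imputation variance
    between = estimates.var(ddof=1)            # between-imputation variance
    total = within + (1 + 1 / m) * between     # total variance
    return pooled, np.sqrt(total)

# Made-up estimates of a1 from three imputed datasets
est, se = pool_rubin([9.8, 10.1, 10.4], [0.25, 0.25, 0.25])
print(f"pooled estimate: {est:.2f}, pooled SE: {se:.3f}")
```

For MIRC, the variances passed in would be the regression-calibration-adjusted ones from step 4, not the naive OLS variances.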

Simultaneous models
Conditional submodels
• Y | A, L (primary analysis model)
• A*| Y, A, L
• A | L
• L1 | L2, …, Lp
Estimated simultaneously
• MCMC: Bayes (uninformative priors)
• Full information maximum likelihood: FIML (structural equation
model)
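
One way to read the list of submodels is as a factorization of the joint distribution, written here in the slide's symbols with L = (L_1, \dots, L_p). This is a sketch of the general idea; the exact likelihood used in the simulation is not given in the slides.

```latex
p(Y, A^{*}, A, L_1 \mid L_2, \dots, L_p)
  = \underbrace{p(Y \mid A, L)}_{\text{analysis model}}
    \; \underbrace{p(A^{*} \mid Y, A, L)}_{\text{measurement error model}}
    \; \underbrace{p(A \mid L)}_{\text{exposure model}}
    \; \underbrace{p(L_1 \mid L_2, \dots, L_p)}_{\text{missing data model}}
```

For a subject without a validated A or with missing L_1, the contribution to the likelihood is this product integrated over the unobserved quantity. FIML maximizes the resulting observed-data likelihood, while MCMC samples the unobserved values together with the parameters.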

Results

What does this mean?
• Simple setting (OLS, 1 covariate with missing data, 1 covariate
with measurement error, internal validation): “full adjustment”
approaches work really well, even in samples as small as N = 100.
• Differences mainly in rMSE; nearly no bias
• The Bayesian approach seems most promising (for its
frequentist properties!): least bias, easy to extend to other link
functions, multivariate missing data and measurement error

Measurement error models are not new
doi: 10.2307/1422689

Exceptions?
• Measurement error in prognostic factors in an RCT
– Same argument about missing data (e.g. see White and
Thompson, Stat Med 2005)
• Special case of measurement error in a confounder
– e.g. confounding by indication, where indication was based
on the confounder with error

Nab et al. Epidemiology, 2020, doi: 10.1097/EDE.0000000000001239

Sensitivity analysis tool
https://lindanab.shinyapps.io/SensitivityAnalysis/
Preprint: https://arxiv.org/abs/1912.05800

Exceptions?
• Prediction models BUT…..

Measurement heterogeneity

Measurement: are labels the new oil?
https://twitter.com/DrHughHarvey/status/1230218991026819077
