MICE
Multivariate Imputation by Chained Equations
Richard Jacques
21st July 2015
Multiple Imputation (MI)
MI is a statistical technique for handling missing data.
The key concept of MI is to use the distribution of the observed data
to estimate a set of plausible values for the missing data.
Random components are incorporated into these estimated values to
reflect their uncertainty.
Multiple datasets are created and then analyzed individually but
identically to obtain a set of parameter estimates.
The estimates are then combined to obtain a single pooled set of parameter estimates.
IR White et al. Multiple imputation using chained equations: Issues and guidance for
practice. Statist. Med. 2011; 30:377-399.
Example Data
NHANES (National Health and Nutrition Examination Survey)
Four variables: age (age group), bmi (body mass index), hyp
(hypertension status), chl (cholesterol level)
> library(mice)
> nhanes[1:5,]
  age  bmi hyp chl
1   1   NA  NA  NA
2   2 22.7   1 187
3   1   NA   1 187
4   3   NA  NA  NA
5   1 20.4   1 113
Inspecting Missing Data
> md.pattern(nhanes)
   age hyp bmi chl
13   1   1   1   1  0
 1   1   1   0   1  1
 3   1   1   1   0  1
 1   1   0   0   1  2
 7   1   0   0   0  3
     0   8   9  10 27
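A matrix in which each row corresponds to a missing data pattern
(1=observed, 0=missing). The first column gives the number of rows with
that pattern, the last column the number of variables missing in the
pattern, and the bottom row the number of missing values per variable.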
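The same counts can be reproduced by hand in base R. The sketch below is illustrative only: the data frame toy is a hypothetical name holding just the five nhanes rows printed earlier, not the full data set, and the pattern strings follow the column order age, bmi, hyp, chl.

```r
# First five rows of nhanes, typed in by hand (assumption: illustration only)
toy <- data.frame(
  age = c(1, 2, 1, 3, 1),
  bmi = c(NA, 22.7, NA, NA, 20.4),
  hyp = c(NA, 1, 1, NA, 1),
  chl = c(NA, 187, 187, NA, 113)
)

# Missing values per variable (analogous to the bottom row of md.pattern)
n_missing <- colSums(is.na(toy))
print(n_missing)

# One string per row: 1 = observed, 0 = missing
pattern <- apply(!is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of rows sharing each pattern (analogous to the first column)
pattern_counts <- table(pattern)
print(pattern_counts)
```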
Multiple Imputation: Main Steps
Imputation Step     R Function   R Object Class
Incomplete Data                  data frame
                    mice()
Imputed Data                     mids
                    with()
Analysis Results                 mira
                    pool()
Pooled Results                   mipo
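The three steps above can be chained in a short script. This is a minimal sketch, assuming the mice package is installed; the seed value, the object names imp, fit and pooled, and the use of the default imputation methods are illustrative assumptions rather than choices made in the slides.

```r
# Sketch of the mice() -> with() -> pool() workflow.
# Guarded so the script is skipped cleanly when mice is not installed.
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)

  # Step 1: data frame -> mids (m = 5 imputed data sets; seed is arbitrary)
  imp <- mice(nhanes, m = 5, seed = 2015, printFlag = FALSE)

  # Step 2: mids -> mira (the same linear model fitted to each imputed data set)
  fit <- with(imp, lm(chl ~ age + bmi))

  # Step 3: mira -> mipo (estimates combined across the imputations)
  pooled <- pool(fit)

  print(class(imp))     # includes "mids"
  print(class(fit))     # includes "mira"
  print(class(pooled))  # includes "mipo"
}
```

Setting a seed makes the imputations reproducible from one run to the next.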
Generating multiple imputations: mice()
> mice(data,m,method,predictorMatrix)
data: A data frame or matrix containing the incomplete data. Missing
values coded as NA.
m: Number of imputations (default = 5)
method: A single string, or a vector of strings, specifying the
imputation method used for each column in the data.
predictorMatrix: A square matrix specifying the set of predictors to be
used for each column.
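A predictor matrix can be built by hand in base R. In the sketch below each row is a variable to be imputed and each column a candidate predictor, with 1 meaning "use this column when imputing this row". Dropping hyp as a predictor of bmi is a hypothetical choice made only to show the mechanics.

```r
vars <- c("age", "bmi", "hyp", "chl")

# Start with every variable predicting every other variable
pred <- matrix(1, nrow = 4, ncol = 4, dimnames = list(vars, vars))
diag(pred) <- 0          # a variable never predicts itself
pred["age", ] <- 0       # age is complete, so it needs no imputation model
pred["bmi", "hyp"] <- 0  # hypothetical: do not use hyp when imputing bmi

print(pred)

# With mice loaded, the matrix would be passed as:
#   mice(nhanes, predictorMatrix = pred)
```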
Built-in imputation methods
Method     Description                           Scale Type
pmm        Predictive mean matching              numeric
norm       Bayesian linear regression            numeric
norm.nob   Linear regression, non-Bayesian       numeric
mean       Unconditional mean imputation         numeric
2l.norm    Two-level linear model                numeric
logreg     Logistic regression                   factor, 2 levels
polyreg    Polytomous (unordered) regression     factor, >2 levels
lda        Linear discriminant analysis          factor
sample     Random sample from observed data      any
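To give a feel for what predictive mean matching does, here is a deliberately simplified base R sketch. It is not the mice implementation: the function pmm_sketch, the toy vectors x and y, and the donor count k are all hypothetical, and the sketch skips the random draw of regression parameters that a proper imputation method would include. It only shows the core idea that each missing value is replaced by an observed value from a case with a similar predicted mean.

```r
set.seed(1)

# Toy data (assumption): y has two missing entries, x is complete
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, NA, 8.2, 9.8, NA, 14.1, 16.0)

pmm_sketch <- function(y, x, k = 3) {
  obs <- !is.na(y)
  k <- min(k, sum(obs))
  fit <- lm(y ~ x)                                  # rows with missing y are dropped
  pred <- predict(fit, newdata = data.frame(x = x)) # predicted mean for every row
  for (i in which(!obs)) {
    gap <- abs(pred[obs] - pred[i])                 # distance in predicted means
    donors <- y[obs][order(gap)[seq_len(k)]]        # k closest observed cases
    y[i] <- donors[sample.int(k, 1)]                # borrow one donor's observed value
  }
  y
}

imputed <- pmm_sketch(y, x)
print(imputed)
```

Because the imputed values are borrowed from observed cases, they always stay within the range of values that actually occur in the data.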
Example
> nhanes_mice<-mice(nhanes,m=5,method=c("","norm","pmm","mean"))
> nhanes_mice
Multiply imputed data set
Call:
mice(data = nhanes, m = 5, method = c("", "norm", "pmm", "mean"))
Number of multiple imputations: 5
Missing cells per column:
age bmi hyp chl
  0   9   8  10
Imputation methods:
   age    bmi    hyp    chl
    "" "norm"  "pmm" "mean"
VisitSequence:
bmi hyp chl
  2   3   4
PredictorMatrix:
    age bmi hyp chl
age   0   0   0   0
bmi   1   0   1   1
hyp   1   1   0   1
chl   1   1   1   0
Random generator seed value: NA
Diagnostics
Check plausibility of imputations for individual variables:
> nhanes_mice$imp$bmi
         1        2        3        4        5
1 25.10264 34.96051 23.63793 27.37651 30.08139
3 28.80055 29.08077 31.07668 31.37782 29.65301
4 20.64118 24.85547 25.44350 25.44396 25.42621
Examine complete data combined with imputed data:
> complete(nhanes_mice,1)
  age      bmi hyp   chl
1   1 25.10264   1 191.4
2   2 22.70000   1 187.0
3   1 28.80055   1 187.0
4   3 20.64118   1 191.4
5   1 20.40000   1 113.0
Data Analysis
with.mids() is used to perform the desired analysis for each imputed copy
of the data.
> fit<-with(nhanes_mice,lm(chl~age+bmi))
> summary(fit)
## summary of imputation 1 :
Call:
lm(formula = chl ~ age + bmi)
Residuals:
    Min      1Q  Median      3Q     Max
-43.225 -10.881  -2.835   9.934  65.137
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -22.546     48.078  -0.469 0.643721
age           31.660      7.436   4.258 0.000322 ***
bmi            6.004      1.496   4.012 0.000585 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.43 on 22 degrees of freedom
Multiple R-squared: 0.5028, Adjusted R-squared: 0.4576
F-statistic: 11.12 on 2 and 22 DF, p-value: 0.0004593
## summary of imputation 2 :
Pooling Results
pool() takes the results from with.mids() and combines the separate
estimates and standard errors from each of the m imputed data sets to
give an overall estimate and standard error.
> est<-pool(fit)
> summary(est)
                  est        se           t       df    Pr(>|t|)       lo 95      hi 95 nmis
(Intercept) -2.063050 56.538439 -0.03648934 12.54558 0.971466388 -124.658189 120.532089   NA
age         28.054106  8.827146  3.17816263 11.35829 0.008466749    8.700200  47.408013    0
bmi          5.404212  1.736748  3.11168532 13.92380 0.007695105    1.677345   9.131079    9
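The combination step is usually described by Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The base R sketch below applies those rules to a single coefficient. The function name pool_rubin and the five estimates and standard errors are made-up illustration values, not output from the analysis above, and the sketch omits the degrees-of-freedom calculation that pool() also reports.

```r
# Rubin's rules for one parameter (hypothetical helper, not part of mice)
pool_rubin <- function(est, se) {
  m     <- length(est)
  qbar  <- mean(est)                 # pooled point estimate
  ubar  <- mean(se^2)                # average within-imputation variance
  b     <- var(est)                  # between-imputation variance
  total <- ubar + (1 + 1 / m) * b    # total variance
  c(estimate = qbar, se = sqrt(total))
}

# Made-up estimates of one coefficient from m = 5 imputed data sets
est <- c(5.8, 6.1, 5.2, 4.9, 5.0)
se  <- c(1.5, 1.6, 1.4, 1.7, 1.5)

print(pool_rubin(est, se))
```

The pooled standard error is never smaller than the root mean square of the individual standard errors, because the between-imputation term adds the extra uncertainty due to the missing data.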
Models
pool() can be used with any object having both coef() and vcov()
methods. The function will abort if an appropriate method is not
found.
pool() can also be used with results obtained with lme() and lmer(),
but only with the fixed part of the model.
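A quick way to see what these two methods supply is to call them on an ordinary fitted model. The sketch below uses lm() on the built-in cars data set purely as a stand-in (an assumption for illustration): coef() returns the estimates and the square root of the diagonal of vcov() returns their standard errors, which are the two ingredients needed for pooling.

```r
# Any model class with coef() and vcov() methods exposes these two pieces
model <- lm(dist ~ speed, data = cars)

estimates  <- coef(model)               # point estimates
std_errors <- sqrt(diag(vcov(model)))   # standard errors from the covariance matrix

print(estimates)
print(std_errors)
```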
References
S van Buuren, K Groothuis-Oudshoorn. MICE: Multivariate
Imputation by Chained Equations in R. Journal of Statistical Software
2011; 45(3).
IR White, P Royston, AM Wood. Multiple imputation using chained
equations: Issues and guidance for practice. Statistics in Medicine
2011; 30(4): 377-399.
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
AI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website CreatorAI Genie Review: World’s First Open AI WordPress Website Creator
AI Genie Review: World’s First Open AI WordPress Website Creator
 
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)GOING AOT WITH GRAALVM FOR  SPRING BOOT (SPRING IO)
GOING AOT WITH GRAALVM FOR SPRING BOOT (SPRING IO)
 
A Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of PassageA Sighting of filterA in Typelevel Rite of Passage
A Sighting of filterA in Typelevel Rite of Passage
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 

SheffieldR July Meeting - Multiple Imputation with Chained Equations (MICE) package

  • 1. MICE: Multivariate Imputation by Chained Equations. Richard Jacques, 21st July 2015.
  • 2. Multiple Imputation (MI)
      MI is a statistical technique for handling missing data.
      The key concept of MI is to use the distribution of the observed data to estimate a set of plausible values for the missing data.
      Random components are incorporated into these estimated values to reflect their uncertainty.
      Multiple datasets are created and then analyzed individually but identically, giving one set of parameter estimates per dataset.
      These estimates are then combined into a single set of pooled parameter estimates.
      IR White et al. Multiple imputation using chained equations: Issues and guidance for practice. Statist. Med. 2011; 30:377-399.
  • 3. Example Data
      NHANES (National Health and Nutrition Examination Survey)
      Four variables: age (age group), bmi (body mass index), hyp (hypertension status), chl (cholesterol level)
      > library(mice)
      > nhanes[1:5,]
        age  bmi hyp chl
      1   1   NA  NA  NA
      2   2 22.7   1 187
      3   1   NA   1 187
      4   3   NA  NA  NA
      5   1 20.4   1 113
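  • 4. Inspecting Missing Data
      > md.pattern(nhanes)
         age hyp bmi chl
      13   1   1   1   1  0
       1   1   1   0   1  1
       3   1   1   1   0  1
       1   1   0   0   1  2
       7   1   0   0   0  3
           0   8   9  10 27
      A matrix in which each row corresponds to a missing data pattern (1=observed, 0=missing).
      The row names give the number of cases with each pattern, the last column gives the number of missing variables in that pattern, and the bottom row gives the number of missing values per variable.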
  • 5. Multiple Imputation: Main Steps
      Imputation Steps    R Function    R Object Class
      Incomplete Data                   data frame
                          mice( )
      Imputed Data                      mids
                          with( )
      Analysis Results                  mira
                          pool( )
      Pooled Results                    mipo
  • 6. Generating multiple imputations: mice()
      > mice(data,m,method,predictorMatrix)
      data: A data frame or matrix containing the incomplete data. Missing values are coded as NA.
      m: Number of imputations (default = 5).
      method: A single string, or a vector of strings, specifying the imputation method used for each column in the data.
      predictorMatrix: A square matrix specifying the set of predictors to be used for each column.
  • 7. Built-in imputation methods
      Method     Description                          Scale Type
      pmm        Predictive mean matching             numeric
      norm       Bayesian linear regression           numeric
      norm.nob   Linear regression, non-Bayesian      numeric
      mean       Unconditional mean imputation        numeric
      2l.norm    Two-level linear model               numeric
      logreg     Logistic regression                  factor, 2 levels
      polyreg    Polytomous (unordered) regression    factor, >2 levels
      lda        Linear discriminant analysis         factor
      sample     Random sample from observed data     any
  • 8. Example
      > nhanes_mice<-mice(nhanes,m=5,method=c("","norm","pmm","mean"))
      > nhanes_mice
      Multiply imputed data set
      Call:
      mice(data = nhanes, m = 5, method = c("", "norm", "pmm", "mean"))
      Number of multiple imputations: 5
      Missing cells per column:
      age bmi hyp chl
        0   9   8  10
      Imputation methods:
         age    bmi    hyp    chl
          "" "norm"  "pmm" "mean"
      VisitSequence:
      bmi hyp chl
        2   3   4
      PredictorMatrix:
          age bmi hyp chl
      age   0   0   0   0
      bmi   1   0   1   1
      hyp   1   1   0   1
      chl   1   1   1   0
      Random generator seed value: NA
  • 9. Diagnostics
      Check plausibility of imputations for individual variables:
      > nhanes_mice$imp$bmi
               1        2        3        4        5
      1 25.10264 34.96051 23.63793 27.37651 30.08139
      3 28.80055 29.08077 31.07668 31.37782 29.65301
      4 20.64118 24.85547 25.44350 25.44396 25.42621
      Examine complete data combined with imputed data:
      > complete(nhanes_mice,1)
        age      bmi hyp   chl
      1   1 25.10264   1 191.4
      2   2 22.70000   1 187.0
      3   1 28.80055   1 187.0
      4   3 20.64118   1 191.4
      5   1 20.40000   1 113.0
  • 10. Data Analysis
      with.mids() is used to perform the desired analysis for each imputed copy of the data.
      > fit<-with(nhanes_mice,lm(chl~age+bmi))
      > summary(fit)
      ## summary of imputation 1 :
      Call:
      lm(formula = chl ~ age + bmi)
      Residuals:
          Min      1Q  Median      3Q     Max
      -43.225 -10.881  -2.835   9.934  65.137
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  -22.546     48.078  -0.469 0.643721
      age           31.660      7.436   4.258 0.000322 ***
      bmi            6.004      1.496   4.012 0.000585 ***
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
      Residual standard error: 25.43 on 22 degrees of freedom
      Multiple R-squared: 0.5028, Adjusted R-squared: 0.4576
      F-statistic: 11.12 on 2 and 22 DF, p-value: 0.0004593
      ## summary of imputation 2 :
  • 11. Pooling Results
      pool() takes the results from with.mids() and combines the separate estimates and standard errors from each of the m imputed data sets to give an overall estimate and standard error.
      > est<-pool(fit)
      > summary(est)
                        est        se           t       df    Pr(>|t|)       lo 95
      (Intercept) -2.063050 56.538439 -0.03648934 12.54558 0.971466388 -124.658189
      age         28.054106  8.827146  3.17816263 11.35829 0.008466749    8.700200
      bmi          5.404212  1.736748  3.11168532 13.92380 0.007695105    1.677345
                       hi 95 nmis
      (Intercept) 120.532089   NA
      age          47.408013    0
      bmi           9.131079    9
  • 12. Models
      pool() can be used with any object having both coef() and vcov() methods. The function will abort if an appropriate method is not found.
      pool() can also be used with results obtained with lme() and lmer(), but only with the fixed part of the model.
  • 13. References
      S van Buuren, K Groothuis-Oudshoorn. MICE: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software 2011; 45(3).
      IR White, P Royston, AM Wood. Multiple imputation using chained equations: Issues and guidance for practice. Statistics in Medicine 2011; 30(4): 377-399.
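
The pooling step described on slide 11 can be illustrated with a small base-R sketch of Rubin's rules for a single coefficient. This is not the code inside pool(), only the arithmetic usually given for combining m estimates and their standard errors. The function name pool_rubin and the numbers in q and u are hypothetical and are not taken from the slides. The sketch leaves out the degrees of freedom that pool() also reports.

```r
# Rubin's rules for one coefficient, base R only (illustrative sketch).
# q: point estimates of the coefficient from the m imputed data sets
# u: squared standard errors (variances) of those estimates
pool_rubin <- function(q, u) {
  stopifnot(length(q) == length(u), length(q) >= 2)
  m <- length(q)
  qbar  <- mean(q)                  # pooled point estimate
  ubar  <- mean(u)                  # within-imputation variance
  b     <- var(q)                   # between-imputation variance
  total <- ubar + (1 + 1 / m) * b   # total variance
  list(est = qbar, se = sqrt(total), within = ubar, between = b)
}

# Hypothetical estimates and standard errors from m = 5 analyses
q <- c(31.7, 27.9, 25.4, 29.8, 25.2)
u <- c(7.4, 8.1, 7.9, 8.6, 7.7)^2
res <- pool_rubin(q, u)
print(res)
```

The pooled standard error is never smaller than the square root of the average within-imputation variance, because the between-imputation term adds the extra uncertainty caused by the missing data.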