The methodology for handling missing data during development of predictive model
Published in: Health & Medicine, Technology
Transcript

  • 1. The methodology for handling missing data during development of predictive model
    Xiao-Ou Ping (1), Ja-Der Liang (3), Yi-Ju Tseng (2), Pei-Ming Yang (3), Guan-Tarn Huang (3), Feipei Lai (1, 2)
    (1) Department of Computer Science and Information Engineering, National Taiwan University
    (2) Graduate Institute of Biomedical Electronic and Bioinformatics, National Taiwan University
    (3) Department of Internal Medicine, National Taiwan University Hospital and National Taiwan University College of Medicine
  • 2. Introduction
    In medical research, missing data occur frequently
      A review of 100 articles from seven cancer journals found evidence of missing data in 81 of them
    Our study on developing a predictive model of liver cancer recurrence also has missing data
      Methods for dealing with missing data are therefore necessary
    The aims of this study are:
      To evaluate the stability and accuracy of imputation methods
      To show how different missing data handling methods affect the results of the recurrence predictive model
      To determine whether clinical features with missing values can still help build a more accurate predictive model
  • 3. Materials and methods
    Predictive models are developed from incomplete clinical data using three missing data handling methods
      Complete case (CC) analysis, complete variable (CV) analysis, and imputation methods (IMs)
    92 liver cancer patients were included in the study
      7.6% of values are missing
      Analyzed features include age, gender, laboratory tests, tumor size, tumor number, and cancer staging
    In the simulation experiment
      Observed values are randomly masked as missing, and the IMs are used to impute the masked entries
      After imputation, the masked true values are compared with the imputed values estimated by the IMs
      The normalized root mean squared error (NRMSE) is used to evaluate the imputation accuracy of each IM
      The sum of the first quartile, the median, and the third quartile is the IM selection criterion for comparing imputation performance in terms of stability and accuracy
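The simulation experiment described above can be sketched roughly as follows. This is a minimal illustration and not the authors' code. Several points are assumptions: NRMSE is normalized here by the standard deviation of the masked true values (other normalizations exist), the quartiles are taken over the NRMSEs of repeated masking runs, the toy data are random numbers, and `column_mean_impute` is a simple stand-in imputer that is not one of the eight IMs compared in the study. All function names are hypothetical.

```python
import math
import random
import statistics


def nrmse(true_vals, imputed_vals):
    # Assumed normalization: RMSE divided by the population standard
    # deviation of the masked true values.
    squared = [(i - t) ** 2 for t, i in zip(true_vals, imputed_vals)]
    return math.sqrt(statistics.fmean(squared)) / statistics.pstdev(true_vals)


def column_mean_impute(rows):
    # Stand-in imputer (NOT one of the study's IMs): replace each None
    # with the mean of the observed values in the same column.
    means = []
    for c in range(len(rows[0])):
        observed = [row[c] for row in rows if row[c] is not None]
        means.append(statistics.fmean(observed))
    return [[means[c] if v is None else v for c, v in enumerate(row)]
            for row in rows]


def selection_score(errors):
    # Selection criterion: first quartile + median + third quartile.
    q1, median, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    return q1 + median + q3


def simulate(complete, impute, missing_rate=0.076, runs=50, seed=0):
    # Repeatedly mask observed values at random, impute them, and record
    # the NRMSE between the masked true values and the imputed values.
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        masked = [list(row) for row in complete]
        hidden = []  # (row, column) positions masked in this run
        for r, row in enumerate(complete):
            for c in range(len(row)):
                if rng.random() < missing_rate:
                    masked[r][c] = None
                    hidden.append((r, c))
        imputed = impute(masked)
        truth = [complete[r][c] for r, c in hidden]
        guess = [imputed[r][c] for r, c in hidden]
        errors.append(nrmse(truth, guess))
    return errors


# Toy complete data set: 92 rows, 10 numeric features (illustrative only).
data_rng = random.Random(1)
toy = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in range(92)]
errors = simulate(toy, column_mean_impute)
score = selection_score(errors)
print(score)
```

Under these assumptions a smaller score would indicate an imputer that is both more accurate (low median) and more stable (low quartiles). Comparing several imputers amounts to calling `simulate` once per imputer and ranking the resulting scores.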
  • 4. Evaluation of missing data handling methods
    Complete case (CC) analysis: analyzing only the data of patients without missing data
    Complete variable (CV) analysis: dropping the variables with missing data and analyzing only the variables without missing data
    Imputation method (IM): estimating the missing values (MVs) based on different methods
    8 IMs are compared in this study:
      6 single IMs
        1. “SVDImpute”
        2. “LLSImpute”
        3. “PPCA”
        4. “BPCA”
        5. “NLPCA”
        6. “Nipals PCA”
      2 multiple IMs
        1. “MICE”
        2. “mi”
  • 5. Results and Conclusion
    The sensitivity and specificity of CC are 100% and 83%
    The best sensitivity and specificity of CV are 86% and 79%
    We designed an IM selection criterion score for selecting the more appropriate IMs
    The best three IMs achieve better predictive accuracy than CV (the same sensitivity, 86%, and a better specificity, 88%)
    The best IM, “BPCA”, needs just four features
      Including the feature with missing values
    IMs applied to data with missing values still show a comparable benefit for the recurrence predictive model
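The CC and CV strategies and the sensitivity and specificity figures reported above can be illustrated with a small sketch. It is not taken from the study: the feature names, the toy patient records, and the labels are all made up, and missing values are represented as `None` purely for illustration.

```python
def complete_cases(rows):
    # CC analysis: keep only patients (rows) with no missing value.
    return [row for row in rows if all(v is not None for v in row)]


def complete_variables(rows, names):
    # CV analysis: keep only features (columns) observed for every patient.
    keep = [c for c in range(len(names))
            if all(row[c] is not None for row in rows)]
    kept_names = [names[c] for c in keep]
    return kept_names, [[row[c] for c in keep] for row in rows]


def sensitivity_specificity(actual, predicted):
    # Labels: 1 = recurrence, 0 = no recurrence.
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical feature names and toy records (None marks a missing value).
names = ["age", "tumor_size", "lab_test"]
patients = [
    [63, 2.1, 12.0],
    [58, None, 30.5],
    [71, 3.4, None],
    [49, 1.2, 8.0],
]

cc_rows = complete_cases(patients)                      # fewer patients, all features
cv_names, cv_rows = complete_variables(patients, names)  # all patients, fewer features

# Made-up labels and predictions, only to show how the metrics are computed.
actual = [1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(actual, predicted)
print(len(cc_rows), cv_names, sens, spec)
```

The sketch makes the trade-off visible: CC discards patients, CV discards features, and an IM keeps both at the cost of estimating the missing entries.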
