A survey on missing information strategies and imputation methods in healthcare

Presented By
Saroj Kumar Pandey
Department of Information Technology
National Institute of Technology, Raipur (C.G.)

Outline
Introduction
Strategies of missing data
Techniques for managing the missing information
Supporting tools
Conclusion
References

Introduction
Missing data is a common problem in many existing research data sets and can
significantly affect the results.
In healthcare, the problem arises when values are missing for one or more risk factors.
Missing data causes several problems:
It reduces statistical power.
It can bias the estimation of parameters.
It reduces the representativeness of the samples.
It complicates the analysis of the study.

Strategies of missing data
Missing completely at random (MCAR)
No-No condition: missingness depends on neither the observed nor the unobserved data.
Example: A blood pressure measurement is missing because the automatic
sphygmomanometer broke down.
Missing at random (MAR)
No-Yes condition: missingness does not depend on the unobserved values but does
depend on observed data.
Example: Missing blood pressure measurements may be lower than the measured
ones because younger people are more likely to have a missing blood pressure
measurement.
Missing not at random (MNAR)
Yes-Yes condition: missingness depends on the unobserved values themselves, even
after accounting for the observed data.
Example: If the study treatment is not effective in reducing blood pressure,
subjects may drop out.
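
The three mechanisms above can be illustrated with a small simulation. This is only a sketch under assumed numbers: the age range, the linear relation between age and blood pressure, and the missingness probabilities are all hypothetical and chosen to mirror the blood pressure examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population: blood pressure rises with age (assumed relation).
age = rng.uniform(20, 80, n)
bp = 90 + 0.6 * age + rng.normal(0, 10, n)

# MCAR: every reading has the same 30% chance of being lost (device failure).
mcar = rng.random(n) < 0.3

# MAR: missingness depends only on the observed variable age;
# younger people are more likely to have a missing reading.
p_mar = np.clip(0.6 - 0.008 * (age - 20), 0.05, 0.95)
mar = rng.random(n) < p_mar

# MNAR: missingness depends on the unobserved blood pressure itself;
# subjects with high readings are more likely to drop out.
p_mnar = 0.6 / (1 + np.exp(-(bp - bp.mean()) / 10))
mnar = rng.random(n) < p_mnar


def observed_mean(mask):
    """Mean blood pressure among the readings that were not lost."""
    return bp[~mask].mean()


print("true mean      :", round(bp.mean(), 2))
print("MCAR, observed :", round(observed_mean(mcar), 2))
print("MAR,  observed :", round(observed_mean(mar), 2))
print("MNAR, observed :", round(observed_mean(mnar), 2))
```

Under MCAR the observed mean stays close to the true mean. Under MAR and MNAR it drifts, which is why the mechanism matters when choosing a handling technique.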

Techniques for managing the missing information
List-wise deletion
o Decreases statistical power.
o May introduce bias in parameter estimates.
o Default option in many statistical packages.
Pair-wise deletion
o Preserves more information than list-wise deletion.
o Interpretation becomes difficult.
o May lead to mathematically inconsistent correlations.
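
The two deletion strategies above can be compared on a small example. The sketch below uses pandas and made-up values for three hypothetical variables (age, bp, chol). `DataFrame.dropna` gives list-wise deletion, while `DataFrame.corr` already works pair-wise, using every row in which both variables of a pair are present.

```python
import numpy as np
import pandas as pd

# Hypothetical patient records with scattered missing values.
df = pd.DataFrame({
    "age":  [25, 37, np.nan, 52, 61, 48, 33, 70],
    "bp":   [118, np.nan, 130, 141, np.nan, 135, 122, 150],
    "chol": [180, 195, 210, np.nan, 240, 220, 190, np.nan],
})

# List-wise deletion: keep only rows with no missing value at all.
listwise = df.dropna()
listwise_corr = listwise.corr()

# Pair-wise deletion: each correlation uses all rows where both
# variables of that pair are observed (pandas does this by default).
pairwise_corr = df.corr()

# Number of rows that contribute to each pair-wise correlation.
present = df.notna().astype(int)
pair_counts = present.T @ present

print("rows kept by list-wise deletion:", len(listwise))
print(pair_counts)
```

Each pair-wise correlation rests on a different subset of rows, which is the source of the interpretation problems and possible inconsistencies noted above.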

Cont…
Mean imputation
o Replaces each missing value with the sample mean of that variable:
$\bar{x} = \frac{\sum f x}{N}$
(f: frequency of the observed value x; N: number of observed values)
o One of the oldest and most widely used methods.
Regression imputation
o Estimates missing values from the other variables in the data set.
o Generally better than list-wise and pair-wise deletion.

Cont…
Last observation carried forward
o Replaces every missing value with the last observed value from the same subject.
o Easy to understand and to communicate between statisticians and clinicians, or
between a sponsor and the researcher.
Maximum likelihood imputation
o Assumes that the observed data are a sample drawn from a multivariate normal
distribution, an assumption that is relatively easy to understand.
o Parameters are first estimated from the available data; the missing values are then
estimated from those parameter estimates.
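
Last observation carried forward, as described above, can be sketched with pandas. The table layout (one row per subject and visit) and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical longitudinal data: one row per subject and visit.
visits = pd.DataFrame({
    "subject": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "visit":   [1, 2, 3, 4, 1, 2, 3, 4],
    "bp":      [150, 144, np.nan, np.nan, np.nan, 138, 135, np.nan],
})

# Order the visits within each subject, then forward-fill inside each
# subject so that values are never carried across subjects.
visits = visits.sort_values(["subject", "visit"])
visits["bp_locf"] = visits.groupby("subject")["bp"].ffill()

print(visits)
```

Subject B's first visit stays missing because there is no earlier observation to carry forward.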

Cont…
Expectation maximization
o A type of maximum likelihood method that can be used to create a new data set
in which all missing values are imputed with maximum likelihood estimates.
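
A minimal sketch of expectation maximization for imputation follows. It assumes a deliberately simple setting that is not taken from the slides: two jointly normal variables, x fully observed and y partly missing. The function name `em_bivariate_normal` and the simulated data are hypothetical.

```python
import numpy as np


def em_bivariate_normal(x, y, n_iter=100):
    """EM for a bivariate normal model where only y has missing values (NaN)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)

    # x is complete, so its mean and variance are fixed at their ML estimates.
    mu_x, s_xx = x.mean(), x.var()
    # Starting values for the remaining parameters come from complete cases.
    mu_y = y[~miss].mean()
    s_yy = y[~miss].var()
    s_xy = np.cov(x[~miss], y[~miss], bias=True)[0, 1]

    for _ in range(n_iter):
        # E-step: expected y and y^2 for the missing entries given x.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey = np.where(miss, mu_y + beta * (x - mu_x), y)
        ey2 = np.where(miss, ey ** 2 + resid_var, y ** 2)
        # M-step: re-estimate the parameters from the expected statistics.
        mu_y = ey.mean()
        s_yy = ey2.mean() - mu_y ** 2
        s_xy = (x * ey).mean() - mu_x * mu_y

    # Impute each missing y with its conditional mean under the final fit.
    y_imputed = np.where(miss, mu_y + (s_xy / s_xx) * (x - mu_x), y)
    params = {"mu_x": mu_x, "mu_y": mu_y, "s_xx": s_xx, "s_xy": s_xy, "s_yy": s_yy}
    return y_imputed, params


# Simulated example: y is missing whenever x is large (a MAR pattern).
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
y_full = 1 + 2 * x + rng.normal(0, 0.5, 500)
y = np.where(x > 0.5, np.nan, y_full)

y_imputed, params = em_bivariate_normal(x, y)
print("complete-case mean of y:", round(np.nanmean(y), 3))
print("EM estimate of mean of y:", round(params["mu_y"], 3))
```

In this simulation the true mean of y is 1. The complete-case mean is biased downward because large values are the ones missing, while the EM estimate uses the relation with x to correct for that.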
Multiple Imputations(MI)
o Multiple imputation replaces each missing value with several plausible values,
producing several completed data sets.
o Each imputed data set is analyzed in the same way with standard methods,
and the results are combined using simple arithmetic.
[Figure: EM cycle. Expectation step (update variable) → Maximization step (update hypothesis) → repeat]
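
The impute, analyze, and combine workflow of multiple imputation can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the slides: the simulated data, the bootstrap-plus-noise regression used as the imputation model, the number of imputations m = 20, and the mean of blood pressure as the quantity of interest. The combining step uses Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Hypothetical data: blood pressure is missing more often for younger people.
age = rng.uniform(20, 80, n)
bp = 90 + 0.6 * age + rng.normal(0, 10, n)
p_missing = np.clip(0.6 - 0.008 * (age - 20), 0.05, 0.95)
bp_obs = np.where(rng.random(n) < p_missing, np.nan, bp)


def impute_once(age, bp_obs, rng):
    """One stochastic imputation: bootstrap the complete cases, fit a
    regression of bp on age, and fill the gaps with prediction plus noise."""
    obs = ~np.isnan(bp_obs)
    idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
    slope, intercept = np.polyfit(age[idx], bp_obs[idx], deg=1)
    resid_sd = np.std(bp_obs[idx] - (intercept + slope * age[idx]), ddof=2)
    draws = intercept + slope * age + rng.normal(0, resid_sd, age.size)
    return np.where(obs, bp_obs, draws)


# Impute m times and run the same analysis (here: the mean) on each data set.
m = 20
estimates, variances = [], []
for _ in range(m):
    completed = impute_once(age, bp_obs, rng)
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / n)

# Combine with Rubin's rules.
q_bar = np.mean(estimates)            # pooled estimate
within = np.mean(variances)           # average within-imputation variance
between = np.var(estimates, ddof=1)   # between-imputation variance
total_var = within + (1 + 1 / m) * between

print("pooled mean:", round(q_bar, 2), "standard error:", round(total_var ** 0.5, 2))
```

The between-imputation term is what separates multiple imputation from single imputation: it adds the uncertainty caused by not knowing the missing values to the standard error.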

Supporting tools
R (RStudio): Supports numerous packages such as "norm", "cat", "mix", and "pan"
for imputing data under multivariate normal models, log-linear models, general
location models, and linear mixed models, respectively.
MATLAB: When missing data are present in the data set, you can fill the missing
values (for example with fillmissing) using one of the following methods:
'constant', 'previous', 'next', 'nearest', 'linear', 'spline', 'pchip'.
SAS: PROC MI applies regression methods and propensity scores for imputation.
IVEware: Imputation and Variance Estimation software for SRMI.
MICE: Multiple imputation tool using chained equations, available as a library in
both S-Plus and R.

Conclusion
This survey focuses on missing data mechanisms (problems) and on the various
practices and tools for handling missing data.
Distinguishing what should and should not be imputed is usually not possible
when a single code is used for every type of missing value.
It is difficult to know whether multiple imputation or full maximum likelihood
estimation is better, but both are superior to the traditional approaches. Both
techniques work best with large samples.

References
1. J. Luengo, S. García, and F. Herrera, On the choice of the best imputation methods for missing values
considering three groups of classification methods, vol. 32, no. 1. 2012.
2. J. Luengo, J. A. Sáez, and F. Herrera, “Missing data imputation for fuzzy rule-based classification systems,”
Soft Comput., vol. 16, no. 5, pp. 863–881, 2012.
3. R. T. O’Neill and R. Temple, “The prevention and treatment of missing data in clinical trials: an FDA
perspective on the importance of dealing with it.,” Clin. Pharmacol. Ther., vol. 91, no. 3, pp. 550–4, 2012.
4. P. D. Allison, “Missing Data,” vol. 17, no. 4, pp. 372–411, 2008.
5. D. B. Rubin, “Inference and missing data,” Biometrika, vol. 63, no. 3. pp. 581–592, 1976.
6. G. E. A. P. A. Batista and M. C. Monard, “An analysis of four missing data treatment methods for supervised
learning,” Appl. Artif. Intell., vol. 17, no. 5–6, pp. 519–533, 2003.
7. J. D. Dziura, L. A. Post, Q. Zhao, Z. Fu, and P. Peduzzi, “Strategies for dealing with missing data in clinical
trials: from design to analysis.,” Yale J. Biol. Med., vol. 86, no. 3, pp. 343–58, 2013.
8. H. Daniell, “NIH Public Access,” vol. 76, no. October 2009, pp. 211–220, 2012.
9. H. Kang, “The prevention and handling of the missing data,” vol. 64, no. 5, pp. 402–406, 2013.
10. M. Soley-bori, “Dealing with missing data: Key assumptions and methods for applied analysis,” PM931 Dir.
Study Heal. Policy Manag., no. 4, p. 20, 2013.
11. J. Figueredo, P. E. McKnight, K. M. McKnight, and S. Sidani, “Multivariate modeling of missing data
within and across assessment waves.,” Addiction, vol. 95 Suppl 3, no. February, pp. S361–S380, 2000.
12. X. P. Zhu, “Comparison of Four Methods for Handing Missing Data in Longitudinal Data Analysis through a
Simulation Study,” Open J. Stat., vol. 4, no. 4, pp. 933–944, 2014.

Cont…
13. A. N. Baraldi and C. K. Enders, “An introduction to modern missing data analyses,” J. Sch. Psychol., vol. 48,
no. 1, pp. 5–37, 2010.
14. H. Xu, “LOCF Method and Application in Clinical Data Analysis,” Sugi, no. 2, pp. 1–5, 2009.
15. R. M. Hamer and P. M. Simpson, “Last observation carried forward versus mixed models in the analysis of
psychiatric clinical trials (American Journal of Psychiatry (2009) 166, (639-641)),” Am. J. Psychiatry, vol.
166, no. 8, p. 942, 2009.
16. P. D. Allison, “Handling Missing Data by Maximum Likelihood,” SAS Glob. Forum 2012 Stat. Data Anal.,
pp. 1–21, 2012.
17. Y. Dong and C.-Y. J. Peng, “Principled missing data methods for researchers.,” Springerplus, vol. 2, no. 1, p.
222, 2013.
18. A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM
algorithm,” J. R. Stat. Soc. Ser. B, vol. 39, no. 1, pp. 1–38, 1977.
19. L. M. Collins, J. L. Schafer, and C. M. Kam, “A comparison of inclusive and restrictive strategies in modern
missing data procedures.,” Psychol. Methods, vol. 6, no. 4, pp. 330–51, 2001.
20. E.-L. Silva-Ramírez, R. Pino-Mejías, M. López-Coello, and M.-D. Cubiles-de-la-Vega, “Missing value
imputation on missing completely at random data using multilayer perceptrons.,” Neural networks, vol. 24,
no. 1, pp. 121–129, 2011.
21. Rosato, Rosalba, et al. "Missing data imputation in longitudinal trial of endometrial cancer patients."
Quality of Life Research, vol. 25. Springer, 2016.
22. Beaulieu-Jones, Brett K., and Jason H. Moore. "Missing data imputation in the electronic health record using
deeply learned autoencoders." Pacific Symposium on Biocomputing, 2017.
23. Zeng, Yan, et al. "A Study of Missing Data Imputation in Predictive Modeling of a Wood-Composite
Manufacturing Process." Journal of Quality Technology 48.3 (2016): 284.
