1. Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks
Joonyoung Yi, Juhyuk Lee, Kwang Joon Kim, Sung Ju Hwang, Eunho Yang
ICLR 2020
Machine Learning & Intelligence Laboratory
2. The Value and Problem of Zero Imputation
● Missing data is widespread in machine learning (e.g., recommendation, electronic medical records, IoT sensor datasets).
● Zero Imputation: the simplest and most intuitive way to handle missing data (a minimal sketch follows this list).
● However, many previous studies have reported that zero imputation has an
adverse effect on model performance [Hazan et al., 2015; Luo et al., 2018; Smieja
et al., 2018].
● We identify the Variable Sparsity Problem (VSP), which explains the performance degradation caused by zero imputation.
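As a concrete illustration, here is a minimal NumPy sketch of zero imputation (the function name is ours, not from the paper): every missing entry, marked as NaN, is simply replaced with 0 before the data is fed to the network.

import numpy as np

def zero_impute(x):
    """Replace missing (NaN) entries of a feature vector with 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), 0.0, x)

x = np.array([0.7, np.nan, 1.2, np.nan, np.nan])
print(zero_impute(x))  # -> [0.7 0.  1.2 0.  0. ]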
3. Variable Sparsity Problem & Sparsity Normalization
● Variable Sparsity Problem (VSP): the output of a neural network varies greatly with the number of missing entries in the input (Figure (a), lower-left figure).
● Sparsity Normalization (SN): makes the expected output independent of the input sparsity level (Figure (b), lower-right figure; a sketch follows this list).
● With SN, as more features are observed for a particular instance, the variance of the prediction for that instance decreases.
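The sparsity-independence claim can be checked with a small sketch. Assuming SN rescales the zero-imputed input in inverse proportion to its number of observed entries (we use the input dimension d as the scaling constant; the paper's exact constant may differ, but any fixed constant yields the same independence), a first-layer pre-activation w · x stays roughly constant across sparsity levels, whereas under plain zero imputation it grows with the number of observed entries.

import numpy as np

def sparsity_normalize(x, mask):
    """Zero-impute x, then rescale by d / (# observed entries).

    The constant d (input dimension) is our assumption; any fixed
    constant removes the dependence on the sparsity level."""
    x = np.where(mask, x, 0.0)                      # zero imputation
    n_observed = np.maximum(mask.sum(axis=-1, keepdims=True), 1)
    return x * x.shape[-1] / n_observed

# Toy check: with positive weights, w @ x under plain zero imputation
# scales with the fraction of observed entries; after SN it stays stable.
rng = np.random.default_rng(0)
d = 1000
w = rng.uniform(0.5, 1.5, d)
x_full = rng.normal(1.0, 0.1, d)
for keep in (0.2, 0.5, 1.0):
    mask = rng.random(d) < keep
    x_zero = np.where(mask, x_full, 0.0)
    x_sn = sparsity_normalize(x_full, mask)
    print(f"keep={keep:.1f}  zero-imputed: {w @ x_zero:7.1f}  SN: {w @ x_sn:7.1f}")

The three printed SN values stay close to one another while the zero-imputed ones scale with the keep rate, mirroring the slide's point that the expected output no longer depends on the sparsity level.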
4. Experimental Results
● Collaborative filtering datasets.
○ SN achieves state-of-the-art performance among neural-network-based collaborative filtering models.
● National Health Insurance Service (NHIS) dataset.
○ Despite its simplicity, SN performs better than or comparably to other, more complex techniques.