Andy Bosyi: Data Imputation using Reverse ML
Data Science Online Camp 2023 Winter
Website: https://dscamp.org
YouTube: https://www.youtube.com/channel/UCeHtPZ_ZLZ-nHFMUCXY81RQ
FB: https://www.facebook.com/people/Data-Science-Camp/100064240830422/
1. mindcraft.ai
Data Imputation and Restoration using Reverse ML
Data imputation heals spoiled data
Dataset models the world only partially
Input, Transformation, Interpretation
Difference between 0 and NULL
(no item, no info, not available, no input)
Impute or Remove
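A minimal sketch of why 0 and NULL must not be mixed up. The sales figures are hypothetical; `None` stands for "no input" while 0 stands for "nothing sold".

```python
from statistics import mean

# Hypothetical daily sales; None means "no input", 0 means "nothing sold".
sales = [12, 0, None, 8, None, 10]

# Treating NULL as 0 silently changes the statistic.
as_zero = [0 if v is None else v for v in sales]
# Removing NULL keeps only what was actually observed.
observed = [v for v in sales if v is not None]

mean_as_zero = mean(as_zero)    # 30 / 6 = 5
mean_observed = mean(observed)  # 30 / 4 = 7.5

print(mean_as_zero, mean_observed)
```

The genuine 0 on day two stays in both versions; only the two unknown days are treated differently.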
2. mindcraft.ai
Types of Item Non-Response
Missing at Random (MAR)
Missing Completely at Random (MCAR)
Missing not at Random (MNAR)
Deletion for MCAR (and, with care, MAR) only; not for MNAR
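The three mechanisms can be simulated to see what row deletion does under each. This is a sketch under stated assumptions: the `age` and `income` variables, the 30% rate, and the logistic missingness rules are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: age is always observed, income depends on age.
age = rng.uniform(20, 60, size=n)
income = 1000 + 50 * age + rng.normal(0, 300, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# MCAR: every income value has the same 30% chance of being missing.
mcar_mask = rng.random(n) < 0.3
# MAR: missingness depends only on the observed variable (age).
mar_mask = rng.random(n) < sigmoid((age - 40) / 5)
# MNAR: missingness depends on the hidden value itself (high incomes hide).
mnar_mask = rng.random(n) < sigmoid((income - income.mean()) / 300)

# Shift of the income mean after deleting the rows with missing income.
full_mean = income.mean()
bias = {
    name: income[~mask].mean() - full_mean
    for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]
}

# Under MAR the income-vs-age slope fitted on the remaining rows is still valid.
mar_slope = np.polyfit(age[~mar_mask], income[~mar_mask], deg=1)[0]

print(bias, mar_slope)
```

In this setup deletion leaves the mean nearly untouched under MCAR and shifts it strongly under MNAR. Under MAR the overall mean also moves, but the relation to the observed variable (the slope, true value 50) survives, which is what model-based imputation relies on.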
3. mindcraft.ai
Types of Imputation
Univariate imputation: Impute values using only the target variable itself (Mean).
Multivariate imputation: Impute values based on other variables (LR).
Single imputation: Impute any missing values within the dataset only once to create a single imputed dataset.
Multiple imputation: Impute the same missing values within the dataset multiple times (MICE).
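The univariate vs. multivariate distinction, sketched on a toy dataset. The data (an exact line y = 2x + 1 with the last y missing) is an assumption chosen so the difference is easy to see; both variants here are single imputations.

```python
import numpy as np

# Toy data following y = 2x + 1 exactly; the last y is missing.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, np.nan])
missing = np.isnan(y)

# Univariate: use only y itself (mean of the observed y values).
y_mean = y.copy()
y_mean[missing] = np.nanmean(y)

# Multivariate: use the other variable x via a least-squares line (LR).
slope, intercept = np.polyfit(x[~missing], y[~missing], deg=1)
y_lr = y.copy()
y_lr[missing] = slope * x[missing] + intercept

print(y_mean[-1], y_lr[-1])  # mean gives 6.0, regression gives about 11.0
```

The mean ignores that y grows with x, so it lands far from the value the linear relation implies.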
4. mindcraft.ai
Imputation methods - Simple and Out of Box
Remove Data
- multivariate missing?
Deductive Investigation
Zero, Constant
Random (uniform, normal)
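The zero, constant, and random fills can be sketched in a few lines. The column values, the `-1.0` sentinel, and the helper name `impute` are hypothetical; the random fills draw from a uniform range and a normal distribution estimated from the observed values.

```python
import random
from statistics import mean, stdev

values = [4.0, None, 6.0, 5.0, None, 7.0]  # hypothetical column
observed = [v for v in values if v is not None]

def impute(column, fill):
    """Replace every None with the next value produced by fill()."""
    return [fill() if v is None else v for v in column]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
lo, hi = min(observed), max(observed)
mu, sigma = mean(observed), stdev(observed)

zero_filled = impute(values, lambda: 0.0)
const_filled = impute(values, lambda: -1.0)  # sentinel constant, an assumption
uniform_filled = impute(values, lambda: rng.uniform(lo, hi))
normal_filled = impute(values, lambda: rng.gauss(mu, sigma))

print(zero_filled)
print(uniform_filled)
```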
5. mindcraft.ai
Imputation methods - Basic
Mean, Median, Mode:
- reduces variance
- ignores correlation between variables
- NULL as a separate category
LR or any other regression using NN
- problem in multivariate
KNN, Fuzzy Clustering
- sensitive to outliers
- heavy computation
References: https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779
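A minimal KNN imputation sketch in plain NumPy, illustrating both drawbacks listed above. The function `knn_impute` and the toy matrix are assumptions for illustration, not a library API: each missing cell is filled with the mean of that column over the k nearest rows, and every fill scans all rows, which is where the heavy computation comes from.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs in X with the mean of the k nearest rows that have the value.

    Distance is the mean squared difference over the columns both rows observe.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    for i, j in zip(*np.where(np.isnan(X))):
        candidates = []
        for r in range(X.shape[0]):
            if r == i or np.isnan(X[r, j]):
                continue
            shared = ~np.isnan(X[i]) & ~np.isnan(X[r])
            if not shared.any():
                continue
            dist = np.mean((X[i, shared] - X[r, shared]) ** 2)
            candidates.append((dist, X[r, j]))
        candidates.sort(key=lambda t: t[0])
        neighbours = [v for _, v in candidates[:k]]
        if neighbours:  # leave NaN if no usable neighbour exists
            filled[i, j] = np.mean(neighbours)
    return filled

X = np.array([
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, np.nan],
    [4.0, 40.0],
    [100.0, 1000.0],  # outlier row
])
print(knn_impute(X, k=2)[2, 1])  # 30.0: neighbours are the rows with x=2 and x=4
print(knn_impute(X, k=4)[2, 1])  # 267.5: the outlier row now enters the average
```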
6. mindcraft.ai
Imputation methods - MICE
Multivariate Imputation by Chained Equation
Multiple Regressions
Predictive Mean Matching
Generate values from predictive distributions
Uncertainty and MCMC
References: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3074241/
https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4
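The chained-equations idea can be sketched as follows. This is a deliberately simplified illustration, not a full MICE implementation: it uses plain least squares per column and adds Gaussian noise with the residual spread, whereas real MICE also draws the regression parameters and may use predictive mean matching. The function name `chained_imputation` and the synthetic data are assumptions.

```python
import numpy as np

def chained_imputation(X, n_iter=10, rng=None):
    """One stochastic run of chained regressions over the columns of X."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            # Regress column j on all other columns (plus an intercept).
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            coef, *_ = np.linalg.lstsq(A[~miss], filled[~miss, j], rcond=None)
            sigma = (filled[~miss, j] - A[~miss] @ coef).std()
            # Draw from an approximate predictive distribution, not just the mean.
            filled[miss, j] = A[miss] @ coef + rng.normal(0, sigma, miss.sum())
    return filled

# Synthetic example: column b is roughly 2 * a, with about 20% of b hidden.
rng = np.random.default_rng(0)
n = 200
a = rng.normal(size=n)
b = 2 * a + rng.normal(scale=0.1, size=n)
X = np.column_stack([a, b])
X_missing = X.copy()
X_missing[rng.random(n) < 0.2, 1] = np.nan

# Multiple imputation: m completed datasets, then pool the estimate of interest.
m = 5
completed = [chained_imputation(X_missing, rng=np.random.default_rng(s)) for s in range(m)]
pooled_mean_b = np.mean([c[:, 1].mean() for c in completed])
print(pooled_mean_b)
```

The spread between the m completed datasets is what carries the uncertainty about the imputed values.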
7. mindcraft.ai
Imputation methods - Time Series
Last Observation Carried Forward (LOCF)
Next Observation Carried Backward (NOCB)
Interpolation (Linear, RNN)
Seasonal Adjustment + Interpolation
Interpolation -> Extrapolation
-> Predictive Models
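LOCF, NOCB, and linear interpolation, sketched in plain Python. The function names and the toy series are assumptions; `None` marks a missing observation.

```python
def locf(series):
    """Last Observation Carried Forward; leading gaps stay None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(series):
    """Next Observation Carried Backward; trailing gaps stay None."""
    return locf(series[::-1])[::-1]

def linear_interpolate(series):
    """Fill interior gaps on a straight line between neighbouring observations."""
    out = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = series[left] + step * (i - left)
    return out

ts = [None, 10.0, None, None, 16.0, None]
print(locf(ts))                # [None, 10.0, 10.0, 10.0, 16.0, 16.0]
print(nocb(ts))                # [10.0, 10.0, 16.0, 16.0, 16.0, None]
print(linear_interpolate(ts))  # [None, 10.0, 12.0, 14.0, 16.0, None]
```

Interpolation leaves the gaps at both ends unfilled: filling those is extrapolation, which is where predictive models come in.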
8. mindcraft.ai
Imputation methods - Cleaning
AutoEncoder
Limited amount of missing data
Reference: https://github.com/andy-bosyi/articles/blob/master/AutoEncoder-MNIST-clean.ipynb
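A rough sketch of the cleaning idea, using a linear stand-in rather than the neural autoencoder from the referenced notebook. Assumptions: synthetic 8-feature rows built from a level and a trend, a 2-unit bottleneck, and tied encoder/decoder weights obtained by SVD (the optimal linear autoencoder spans the top principal directions). Missing cells are filled by repeatedly reconstructing the row and keeping the observed cells fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: each row is a level plus a trend over 8 features.
latent = rng.normal(size=(500, 2))
mixing = np.vstack([np.ones(8), np.linspace(-1, 1, 8)])
clean = latent @ mixing + 0.01 * rng.normal(size=(500, 8))

# "Training": SVD gives the top-2 principal directions directly.
mean = clean.mean(axis=0)
_, _, vt = np.linalg.svd(clean - mean, full_matrices=False)
W = vt[:2]  # encoder weights; the decoder is W.T (tied weights)

def reconstruct(x):
    """Encode to the 2-unit bottleneck and decode back."""
    return (x - mean) @ W.T @ W + mean

def clean_row(x, n_iter=50):
    """Fill NaNs by repeatedly reconstructing and keeping observed entries."""
    miss = np.isnan(x)
    filled = np.where(miss, mean, x)
    for _ in range(n_iter):
        filled = np.where(miss, reconstruct(filled), x)
    return filled

row = clean[0].copy()
damaged = row.copy()
damaged[[1, 5]] = np.nan  # a limited amount of missing data
restored = clean_row(damaged)
error = np.abs(restored - row).max()
print(error)
```

This only works while enough cells remain observed to pin down the bottleneck code, which matches the "limited amount of missing data" caveat.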
15. mindcraft.ai
Reverse ML - Results and Conclusion
AE
Acc = 90.56%
RTAE
Acc = 96.22%
Better accuracy than classical methods
Requires more computational resources
More stable than generative models
Scalability
Reference: https://github.com/andy-bosyi/articles/blob/master/ReverseTrainedAutoEncoder-MNIST.ipynb
16. mindcraft.ai
This is MindCraft
Decision-making Engines for Data-driven Businesses, especially:
- Document and Web pages Classification, Capturing (NLP, CNN, CV, NER)
- Price Prediction (DNN, Regression, Prognosis)
- Command Centers for IoT systems (RNN, Time Series, Anomaly Detection)
- Computer Vision and Object Detection
- Data Analysis and Generation