Finding the
right ML technique
for
Predictive Modeling
SK Reddy
skreddy99
Basics of anomaly detection
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
Novelty vs Anomaly
Credit Card transactions
Contextual anomaly
Fraud detection
across various
application domains
http://tylervigen.com/spurious-correlations
Strange correlations
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
SVM
KNN
Classical ML Classifiers
Naïve Bayes
Predictive analysis algorithms
https://d.dam.sap.com/a/xOXXb/50764_GB_46588_en.pdf
Predicting the real-time availability of 200 million grocery items -
Instacart
https://tech.instacart.com/predicting-real-time-availability-of-200-million-grocery-items-in-us-canada-stores-61f43a16eafe
The problem:
understanding “not founds”
Routes followed by shoppers in SF, Austin, Boston and Miami
https://tech.instacart.com/space-time-and-groceries-a315925acf3a
https://tech.instacart.com/predicting-real-time-availability-of-200-million-grocery-items-in-us-canada-stores-61f43a16eafe
Feature engineering: item level features, time-based features, and categorical features
https://tech.instacart.com/predicting-real-time-availability-of-200-million-grocery-items-in-us-canada-stores-61f43a16eafe
https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2
Leveraging Elastic Demand for Forecasting
https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf
Goal: shifting the right amount of elastic demand to minimize the difference between the new
demand series
Input: historical demand series, and the amount of elastic demand
Output: shifted demand series
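A rough way to read the goal/input/output contract above is a greedy sketch that moves one unit of demand at a time from the busiest period to the quietest one. This is an illustration only, not the optimization from the cited paper: the function names, the unit-by-unit rule and the sample numbers are assumptions, and the sketch ignores any limit on which periods demand can move between.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def shift_elastic_demand(demand, elastic_units):
    """Move up to `elastic_units` whole units from peaks to troughs (greedy)."""
    shifted = list(demand)
    for _ in range(elastic_units):
        hi = max(range(len(shifted)), key=shifted.__getitem__)
        lo = min(range(len(shifted)), key=shifted.__getitem__)
        if shifted[hi] - shifted[lo] <= 1:
            break  # already as flat as whole-unit moves allow
        shifted[hi] -= 1
        shifted[lo] += 1
    return shifted


# Hypothetical historical demand per period (made-up numbers).
history = [80, 120, 150, 90, 60, 100]
shifted = shift_elastic_demand(history, elastic_units=60)
print(shifted, variance(history), variance(shifted))
```

Total demand is unchanged; only its distribution over periods changes. In this toy rule the earliest units moved close the widest gaps, so each additional unit buys a smaller variance reduction than the one before.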
Demand Variance
A larger amount of elastic demand leads to
a smaller variance. However, the first 10%
elastic demand produces the largest
variance reduction.
https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf
Predicting creditworthiness in retail banking with limited scoring data
https://www.sciencedirect.com/science/article/pii/S0950705116300156?via%3Dihub
Credit Card Fraud Detection
http://isyou.info/inpra/papers/inpra-v5n4-02.pdf
Characteristic of the mobile payment dataset
F-measure after classification
“Nowcasting” Recession
https://arxiv.org/pdf/1903.03202.pdf
SVM:
• Linear (classes separable with a linear
hyperplane)
• Non-linear
Features
1. Monthly log difference in nonfarm payrolls
2. Log difference in average monthly price of
the S&P 500
3. Production index from Manufacturing ISM
Report (info about the goods market)
4. 10-year Treasury yield minus the federal
funds rate
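The four features listed above can be assembled as follows. This is a minimal sketch with made-up monthly numbers; the variable names and values are hypothetical, and the ISM index is used as a level because the slide does not say otherwise. The resulting rows are the kind of input a linear or non-linear SVM would be trained on.

```python
import math


def log_diff(series):
    """Month-over-month log difference: log(x_t) - log(x_{t-1})."""
    return [math.log(curr) - math.log(prev) for prev, curr in zip(series, series[1:])]


# Hypothetical monthly observations (illustrative values only).
payrolls = [150000.0, 150200.0, 150100.0, 149800.0]   # nonfarm payrolls
sp500 = [2800.0, 2850.0, 2700.0, 2600.0]              # average monthly S&P 500 price
ism = [55.0, 54.0, 51.0, 48.5]                        # Manufacturing ISM production index
ten_year = [2.7, 2.6, 2.4, 2.2]                       # 10-year Treasury yield, percent
fed_funds = [2.4, 2.4, 2.4, 2.4]                      # federal funds rate, percent

spread = [y - f for y, f in zip(ten_year, fed_funds)]

# Log differences lose the first month, so align the level series to months 1..n-1.
rows = list(zip(log_diff(payrolls), log_diff(sp500), ism[1:], spread[1:]))
for row in rows:
    print(row)
```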
SVM Dual Parameter and NBER Recessions
https://arxiv.org/pdf/1804.10796.pdf
Handling Uncertainty in Social Lending
Credit Risk Prediction
The results of combining the three classifiers through a Choquet fuzzy integral
approach compared to the performance of each base classifiers alone
Deep Autoencoders
https://arxiv.org/pdf/1903.06580.pdf
Variational Autoencoder (VAE)
A standard Autoencoder
A variational Autoencoder
https://arxiv.org/pdf/1903.06580.pdf
Latent representation of bank customers
Learning Latent Representations of Bank Customers
Predicting bankruptcy –
evaluating the performance of various methods
https://towardsdatascience.com/predicting-bankruptcy-f4611afe8d2c
Classifiers tried
1. Logistic Regression
2. Perceptron as a classifier
3. Deep Neural Network Classifiers (with different size and
depth)
4. Fisher Linear Discriminant Analysis
5. K Nearest Neighbor Classifier (with different values of k)
6. Naive Bayes Classifier
7. Decision Tree (with different bucket size thresholds)
8. Bagged Decision Trees
9. Random Forest (with different tree sizes)
10. Gradient Boosting
11. Support Vector Machines (with different kernels)
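A comparison like the one listed above can be sketched with scikit-learn. This is not the cited article's code or data: the data set is synthetic (about 10% positives, to mimic rare bankruptcies), only a subset of the listed classifiers is included, and the hyperparameters and the ROC AUC metric are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for a bankruptcy data set.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, weights=[0.9, 0.1], random_state=0
)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "fisher_lda": LinearDiscriminantAnalysis(),
    "knn_k5": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Mean ROC AUC over 5 stratified folds for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
for name, auc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {auc:.3f}")
```

Distance- and margin-based models are wrapped in a scaling pipeline so that the scaler is fit inside each fold rather than on the full data.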
Random Forest
kNN
https://towardsdatascience.com/predicting-bankruptcy-f4611afe8d2c
Model comparison
https://arxiv.org/pdf/1812.10389.pdf
Production forecasting
Schematic diagram for multi-step-ahead predictions
https://arxiv.org/pdf/1809.00542v2.pdf
Predicting thermal power
consumption of the Mars Express Spacecraft
Distribution of different feature groups in the feature rankings produced by (a) G-RF, (b) L-RF and (c) XGB
ensembles, for the 33 power lines
https://arxiv.org/pdf/1901.03407v1.pdf
Deep Learning based Anomaly detection
Anomaly detection approach within the context of an X-ray security screening problem
GANomaly: Semi-Supervised Anomaly Detection via
Adversarial Training; 2018
https://arxiv.org/pdf/1805.06725.pdf
AAD: Adaptive Anomaly Detection through traffic surveillance videos
Pixel movement across the frame after ∆t
PASCAL Visual Object Classes
https://arxiv.org/pdf/1808.10044.pdf
Video Anomaly Detection
https://arxiv.org/pdf/1805.11223.pdf
Variational autoencoder
RADS: Real-time Anomaly Detection System for Cloud Data Centers
https://arxiv.org/pdf/1811.04481.pdf
15 Million Battery Voltages and Current
Example: Anomaly Detection of Sensor Data Using
Distance-Based Failure Analysis
Sensor data from the same 15m Batteries
Can you find the anomaly?
Time series
http://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html
Time series ensemble
• Bass Diffusion Model
• Theta Model,
• Logistic models,
• Bayesian Structural Time Series
• STL (Seasonal-Trend Decomposition Procedure Based on Loess)
• Holt-Winters and other Exponential Smoothing models,
• Seasonal and other ARIMA-based models,
• Year-over-Year growth models,
• custom models, and
• more
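The idea of an ensemble of forecasters can be sketched with three deliberately simple models whose forecasts are combined step by step. The three models, the function names and the use of the median as the combiner are assumptions for illustration; they stand in for the richer models listed above and are not the cited post's procedure.

```python
from statistics import mean, median


def seasonal_naive(history, horizon, period):
    """Repeat the last observed season."""
    start = len(history) - period
    return [history[start + (h % period)] for h in range(horizon)]


def mean_forecast(history, horizon, period):
    """Forecast the historical mean at every step."""
    return [mean(history)] * horizon


def drift_forecast(history, horizon, period):
    """Extend the straight line through the first and last observations."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def ensemble_forecast(history, horizon, period, models):
    """Combine the models' forecasts with a per-step median."""
    per_model = [model(history, horizon, period) for model in models]
    return [median(step_values) for step_values in zip(*per_model)]


history = [10, 20, 30, 12, 22, 32, 14, 24, 34]  # made-up series with period 3
models = [seasonal_naive, mean_forecast, drift_forecast]
forecast = ensemble_forecast(history, horizon=3, period=3, models=models)
print(forecast)
```

A median is used here because a single badly wrong model cannot drag the combined forecast far, which is the main reason to ensemble in the first place.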
The demand data over the 2010-2015 timeframe
Combining Multiple Methods To Improve Time Series Prediction
Step 1
Step 2
Step 3
The estimated trend (Hodrick-Prescott Filter)
• Trend (the increase or decrease in the
series over a period of time),
• Seasonality (the fluctuation that occurs
within the series over each week, each
month, etc.)
• Residuals (the data point that falls outside
of the expected data range)
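The three components above can be separated with a classical additive decomposition. Note the swap: this sketch estimates the trend with a centered moving average and the seasonality with per-position averages, not with the Hodrick-Prescott filter or Loess method named on the slides. The function name and test series are hypothetical, and the series is assumed to span at least two full periods.

```python
def decompose_additive(series, period):
    """Split a series into trend, seasonal and residual parts (additive model)."""
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight both ends so the weights sum to `period`.
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values at each position within the period.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    seasonal = [sum(b) / len(b) for b in buckets]
    offset = sum(seasonal) / period
    seasonal = [s - offset for s in seasonal]  # centre so the pattern sums to zero

    # Whatever trend and seasonality do not explain is the residual.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t % period]
        for t in range(n)
    ]
    return trend, seasonal, residual


pattern = [2.0, -1.0, 0.0, -1.0]                      # made-up seasonal pattern
series = [t + pattern[t % 4] for t in range(12)]      # linear trend plus pattern
trend, seasonal, residual = decompose_additive(series, period=4)
print(trend, seasonal, residual)
```

Large residuals are the natural candidates for anomalies, since they are the points the trend and seasonal components fail to account for.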
Multi-seasonality (Loess method)
Step 4
Step 5
(after Elastic Net Regression and Fourier
transformation)
https://labs.eleks.com/2016/10/combined-different-methods-create-advanced-time-series-prediction.html
Turkish Electricity data
http://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html
Ensemble
Modeling and Forecasting
Vehicle Fleet Maintenance
https://arxiv.org/pdf/1710.06839.pdf
Vehicles
Maintenance
(a) 3-mode data tensor; (b) the same tensor as a stacked series of frontal slices, or
arrays; (c) an example single frontal slice of a vehicle data tensor used in this analysis
(each entry corresponds to the count of a specific job type for a vehicle at a fixed time)
PARAFAC 3-way plot of absolute-time analysis. High factor weights in
the top panel are for 2014 Terrastar Horton vehicles, an ambulance.
The bottom two panels show systems (Body, Cab/Sheet Metal,
Engine and Motor, and Preventive Maintenance Service) and time
frames where this maintenance most often occurs.
https://arxiv.org/pdf/1710.06839.pdf
PARAFAC 3-way plot of vehicle lifetime analysis revealing a simple
pattern common to almost all vehicles, as demonstrated by the
consistent loading across the vehicle factor (top panel):
tires/tubes/valves/liners replacement during the second year of
lifetime, with few repairs to this system either before or after.
https://arxiv.org/pdf/1710.06839.pdf
PARAFAC 3-way plot of vehicle lifetime analysis showing the
2012 Freightliner M2112V, a Department of Solid Waste
garbage truck. This plot reveals a strong pattern of increased
maintenance in years 2-4 after purchase, focusing on a variety
of technical systems: hydraulics, lighting, gauges and warning
devices, and cooling systems.
PARAFAC 3-way plot of absolute-time analysis. This plot demonstrates
strong and specific maintenance patterns for the 2015 Smeal SST
Pumper fire truck. It shows extensive and specific repair to the engine
systems with little other maintenance, from late 2015 through 2016.
• Local Outlier Factor (LOF)
• Connectivity-Based Outlier Factor
(COF)
• Influenced Outlierness (INFLO)
• Local Outlier Probability (LoOP)
• Local Correlation Integral (LOCI)
• Approximate Local Correlation
Integral (aLOCI)
• Cluster-Based Local Outlier Factor
(CBLOF/ uCBLOF)
• Local Density Cluster-based
Outlier Factor (LDCOF)
• Clustering-based Multivariate
Gaussian Outlier Score (CMGOS)
• Histogram-based Outlier Score
(HBOS)
• One-Class Support Vector
Machine
• Robust Principal Component
Analysis (rPCA)
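As an example of the first entry in the list above, the sketch below runs scikit-learn's LocalOutlierFactor on synthetic data. The two clusters, the planted point and n_neighbors=20 are assumptions chosen to show a local anomaly: the planted point is far from the dense cluster relative to that cluster's spacing, even though the same gap would be unremarkable inside the sparse cluster.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
dense = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(150, 2))   # tight cluster
sparse = rng.normal(loc=[8.0, 8.0], scale=1.0, size=(150, 2))  # loose cluster
planted = np.array([[0.0, 3.0]])                               # local outlier near the tight cluster
X = np.vstack([dense, sparse, planted])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 marks outliers, 1 marks inliers
scores = lof.negative_outlier_factor_   # more negative means more anomalous

print("planted point label:", labels[-1])
print("planted point score:", scores[-1])
```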
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
Nearest-Neighbor based algorithms on the breast-cancer dataset
Clustering-based anomaly detection algorithms
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
Recommendations for technique selection
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
https://arxiv.org/pdf/1901.03407v1.pdf
Deep Anomaly Detection techniques
*DAD = Deep Anomaly Detection
Autoencoder architectures for anomaly detection
Other DAD Anomaly detection models
• Transfer Learning based anomaly
detection
• Zero Shot learning based anomaly
detection
• Ensemble based anomaly detection
• Clustering based anomaly detection
• Deep Reinforcement Learning (DRL)
based anomaly detection
https://arxiv.org/pdf/1901.03407v1.pdf
SDAE: Stacked Denoising Autoencoder, DAE : Denoising Autoencoders
GRU: Gated Recurrent Unit, CNN: Convolutional Neural Networks
LSTM: Long Short Term Memory, AE: Autoencoders
CAE: Convolutional Autoencoders
• Classification - predicts whether a failure will occur in the next n steps.
• Logistic Regression
• Perceptron as a classifier
• Deep Neural Network Classifiers (with different size
and depth)
• Fisher Linear Discriminant Analysis
• K Nearest Neighbor Classifier (with different values
of k)
• Naive Bayes Classifier
• Decision Tree (with different bucket size
thresholds)
• Bagged Decision Trees
• Random Forest (with different tree sizes)
• Gradient Boosting
• Support Vector Machines (with different kernels)
• Regression - predicts how much time is left before the
next failure, called Remaining Useful Life (RUL).
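Both framings start from the same failure log; only the target differs. The sketch below derives the regression target (steps until the next failure) and the classification target (failure within n steps) from a per-step failure flag. The function names and the sample log are hypothetical, and steps after the last recorded failure get no label because their next failure is unknown.

```python
def remaining_useful_life(failures):
    """Steps until the next failure at each time step (None if none is recorded)."""
    rul = [None] * len(failures)
    steps_to_failure = None
    # Walk backwards so each step can see the next failure after it.
    for t in range(len(failures) - 1, -1, -1):
        if failures[t]:
            steps_to_failure = 0
        elif steps_to_failure is not None:
            steps_to_failure += 1
        rul[t] = steps_to_failure
    return rul


def failure_within(rul, n):
    """Binary label: 1 if a failure occurs now or within the next n steps."""
    return [None if r is None else int(r <= n) for r in rul]


failures = [0, 0, 0, 1, 0, 0, 1, 0, 0]   # made-up log: 1 marks a failure
rul = remaining_useful_life(failures)
labels = failure_within(rul, n=1)
print(rul)
print(labels)
```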
ML Techniques for Predictive Maintenance
• Supervised Anomaly Detection: fully labeled
training and test data sets.
• Comments: Decision trees like C4.5
cannot deal well with unbalanced data,
whereas SVM or ANNs should perform
better.
• Semi-supervised Anomaly Detection: training
data only consists of normal data without any
anomalies.
• Comments: typical choices are one-class SVMs and
autoencoders, or density models such as
Gaussian Mixture Models (many variants
exist) and Kernel Density Estimation.
• Unsupervised Anomaly Detection: no labels.
• Comments: distances or densities are
used to estimate what is normal
and what is an outlier.
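The semi-supervised setting can be sketched with scikit-learn's OneClassSVM: the model is fit on normal data only and then asked whether new points look like that data. The synthetic training set, the two query points and the nu and gamma settings are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Training data contains normal observations only (no anomalies, no labels needed).
normal_train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))

model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal_train)

# One point in the middle of the normal region, one far outside it.
new_points = np.array([[0.0, 0.0], [6.0, 6.0]])
predictions = model.predict(new_points)   # 1 = looks normal, -1 = anomaly
print(predictions)
```

The nu parameter bounds the fraction of training points treated as outliers, so it acts as a rough tolerance for how tightly the boundary wraps the normal data.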
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173#pone.0152173.ref033
https://towardsdatascience.com/predicting-bankruptcy-f4611afe8d2c
https://arxiv.org/pdf/1901.03407v1.pdf
Deep learning methods for Intrusion Detection
Linear regression:
• Relies on normality, homoscedasticity and other assumptions,
• does not capture highly non-linear, chaotic patterns.
• Prone to over-fitting.
• Parameters difficult to interpret.
• Very unstable when independent variables are highly correlated.
• Fixes: variable reduction, apply a transformation to your variables,
use constrained regression (e.g. ridge or Lasso regression)
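The instability under correlated predictors, and the effect of the ridge fix, can be seen by refitting on bootstrap resamples. This is a sketch on synthetic data: the two nearly identical predictors, the penalty alpha=1.0 and the omission of an intercept (the data are roughly centered) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)      # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Ordinary least squares coefficients (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def fit_ridge(X, y, alpha):
    """Closed-form ridge coefficients: (X'X + alpha*I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


ols_coefs, ridge_coefs = [], []
for _ in range(200):
    idx = rng.integers(0, n, size=n)          # bootstrap resample of the rows
    ols_coefs.append(fit_ols(X[idx], y[idx]))
    ridge_coefs.append(fit_ridge(X[idx], y[idx], alpha=1.0))

ols_std = np.std(ols_coefs, axis=0)
ridge_std = np.std(ridge_coefs, axis=0)
print("OLS coefficient spread:  ", ols_std)
print("ridge coefficient spread:", ridge_std)
```

The unconstrained fit can trade weight between the two predictors almost freely, so its coefficients swing from resample to resample; the penalty removes that freedom at the cost of some shrinkage.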
Decision trees:
• Very large decision trees are very unstable and impossible to
interpret, and
• prone to over-fitting.
• Fix: combine multiple small decision trees together instead of using
a large decision tree.
Naive Bayes:
• Used e.g. in fraud and spam detection, and for scoring. Assumes
that variables are independent; if not, it will fail miserably. In the
context of fraud or spam detection, variables (sometimes called
rules) are highly correlated.
• Fix: group variables into independent clusters of variables (in each
cluster, variables are highly correlated), then apply naive Bayes to
the clusters. Or use data reduction techniques.
K-means clustering:
• Used for clustering, tends to produce circular clusters.
• Does not work well with data points that are not a mixture of
Gaussian distributions.
Neural networks:
• Difficult to interpret, unstable, subject to over-fitting.
Maximum Likelihood estimation:
• Requires your data to fit with a prespecified probabilistic
distribution. Not data-driven. In many cases the pre-specified
Gaussian distribution is a terrible fit for your data.
Density estimation in high dimensions:
• Subject to what is referred to as the curse of dimensionality.
Fix: use (non parametric) kernel density estimators with
adaptive bandwidths.
Linear discriminant analysis (LDA):
• Used for supervised classification. Bad technique because it
assumes that classes do not overlap, and are well
separated by hyper-planes. In practice, they never are. Use
density estimation techniques instead.
Critique of predictive techniques
https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-8-worst-predictive-modeling-techniques
Random Forests
Pros
• One of the most accurate learning algorithms available. For
many data sets, it produces a highly accurate classifier.
• Runs efficiently on large databases.
• Handles thousands of input variables without variable deletion.
• Gives estimates of what variables are important in the
classification.
• Generates an internal unbiased estimate of the generalization
error as the forest building progresses.
• Has an effective method for estimating missing data and
maintains accuracy when a large proportion of the data are
missing.
• Has methods for balancing error in class population unbalanced
data sets.
• Prototypes are computed that give information about the relation
between the variables and the classification.
• The capabilities of the above can be extended to unlabeled data,
leading to unsupervised clustering, data views and outlier
detection.
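Two of the points above, the internal generalization estimate and the variable importance scores, map directly onto scikit-learn's RandomForestClassifier. The sketch uses synthetic data in which only the first two of six features carry signal; the data set and hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# With shuffle=False the informative features come first (columns 0 and 1);
# the remaining four columns are pure noise.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)

rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)

# Out-of-bag accuracy: each tree is scored on the rows it did not see in training.
print("OOB accuracy:", rf.oob_score_)
# Impurity-based importances, one per feature, summing to 1.
print("importances:", rf.feature_importances_)
```

The out-of-bag score comes for free from the bootstrap sampling, so no separate validation split is needed to get a first estimate of generalization error.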
Random Forests
Cons
• Random forests have been observed to overfit for
some datasets with noisy classification/regression
tasks.
• Unlike decision trees, the classifications made by
random forests are difficult for humans to interpret.
• For data including categorical variables with different
number of levels, random forests are biased in favor
of those attributes with more levels. Therefore, the
variable importance scores from random forest are
not reliable for this type of data. Methods such as
partial permutations were used to solve the problem.
• If the data contain groups of correlated features of
similar relevance for the output, then smaller groups
are favored over larger groups.
• The right technique is dependent on the use case and data
• k-NN performs best in many global use cases (for high-dimensional problems with > 400
dimensions, the best k value is < 5)
• LOF is preferred for many local use cases
• Don’t use local anomaly detection algorithms, such as LOF, COF, INFLO and LoOP on datasets
containing global anomalies (Note: Global anomaly detection algos perform OK for local
anomalies)
• Nearest-neighbor based algorithms perform better in most cases when compared to clustering
algorithms (but pick the right k value) (pick these if computing time is not an issue)
• clustering-based algorithms have a lower computation time (hence use for real-time or near real-
time use cases)
• uCBLOF algorithm is better among clustering-based algorithms
• ARIMA performed better for a long time, until ANNs arrived
• When not sure, use ANN
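The global k-NN approach recommended above reduces to a single score per point: the distance to its k-th nearest neighbor. Below is a minimal brute-force sketch with NumPy; the function name and the toy data are hypothetical, and the pairwise distance matrix makes it suitable only for small data sets.

```python
import numpy as np


def knn_anomaly_scores(X, k):
    """Distance from each point to its k-th nearest neighbor (larger = more anomalous)."""
    X = np.asarray(X, dtype=float)
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)                 # column 0 is each point's distance to itself
    return dists[:, k]


# Four points on a unit square plus one point far away (made-up data).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = knn_anomaly_scores(X, k=2)
print(scores)
print("most anomalous index:", int(scores.argmax()))
```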
Recommendations
Thank you
SK Reddy
skreddy99

More Related Content

Similar to Finding the right Machine Learning method for predictive modeling

This article appeared in a journal published by Elsevier. The .docx
This article appeared in a journal published by Elsevier. The .docxThis article appeared in a journal published by Elsevier. The .docx
This article appeared in a journal published by Elsevier. The .docxhowardh5
 
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...Power System Operation
 
Real-Time Simulation for MBSE of Synchrophasor Systems
Real-Time Simulation for MBSE of Synchrophasor SystemsReal-Time Simulation for MBSE of Synchrophasor Systems
Real-Time Simulation for MBSE of Synchrophasor SystemsLuigi Vanfretti
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...IRJET Journal
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RIRJET Journal
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...Lokukaluge Prasad Perera
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDatamining Tools
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDataminingTools Inc
 
"How to document your decisions", Dmytro Ovcharenko
"How to document your decisions", Dmytro Ovcharenko "How to document your decisions", Dmytro Ovcharenko
"How to document your decisions", Dmytro Ovcharenko Fwdays
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsQuantUniversity
 
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...Porfirio Tramontana
 
Scale Container Operations with AIOps
Scale Container Operations with AIOpsScale Container Operations with AIOps
Scale Container Operations with AIOpsTimothy Chen
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Martin Pinzger
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...IEEEFINALSEMSTUDENTPROJECTS
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEEGLOBALSOFTSTUDENTPROJECTS
 
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...IJSRP Journal
 

Similar to Finding the right Machine Learning method for predictive modeling (20)

This article appeared in a journal published by Elsevier. The .docx
This article appeared in a journal published by Elsevier. The .docxThis article appeared in a journal published by Elsevier. The .docx
This article appeared in a journal published by Elsevier. The .docx
 
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...
Using Machine Learning to Quantify the Impact of Heterogeneous Data on Transf...
 
Real-Time Simulation for MBSE of Synchrophasor Systems
Real-Time Simulation for MBSE of Synchrophasor SystemsReal-Time Simulation for MBSE of Synchrophasor Systems
Real-Time Simulation for MBSE of Synchrophasor Systems
 
Supply chain network design
Supply chain network designSupply chain network design
Supply chain network design
 
Supply chain network modelling
Supply chain network modellingSupply chain network modelling
Supply chain network modelling
 
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
A Host Selection Algorithm for Dynamic Container Consolidation in Cloud Data ...
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
 
F1083644
F1083644F1083644
F1083644
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...
Industrial IoT to Predictive Analytics: A Reverse Engineering Approach from S...
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
"How to document your decisions", Dmytro Ovcharenko
"How to document your decisions", Dmytro Ovcharenko "How to document your decisions", Dmytro Ovcharenko
"How to document your decisions", Dmytro Ovcharenko
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
 
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
Towards a Better Comprehensibility of Web Applications: Lessons Learned from ...
 
Scale Container Operations with AIOps
Scale Container Operations with AIOpsScale Container Operations with AIOps
Scale Container Operations with AIOps
 
Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)Populating a Release History Database (ICSM 2013 MIP)
Populating a Release History Database (ICSM 2013 MIP)
 
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
 
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
 
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...
Application of Lotka-Volterra model to analyse Cloud behavior and optimise re...
 

More from SK Reddy

AI to open more doors in Personal Finance Management (PFM)
AI to open more doors in Personal Finance Management (PFM)AI to open more doors in Personal Finance Management (PFM)
AI to open more doors in Personal Finance Management (PFM)SK Reddy
 
Making sense from 3D Point Clouds
Making sense from 3D Point Clouds Making sense from 3D Point Clouds
Making sense from 3D Point Clouds SK Reddy
 
The wonders of big data processing using Deep Learning in Brazil Live 15 sep ...
The wonders of big data processing using Deep Learning in Brazil Live 15 sep ...The wonders of big data processing using Deep Learning in Brazil Live 15 sep ...
The wonders of big data processing using Deep Learning in Brazil Live 15 sep ...SK Reddy
 
How organizations can get ready for ai
How organizations can get ready for aiHow organizations can get ready for ai
How organizations can get ready for aiSK Reddy
 
Practical implementation of AI solutions for Smart Cities
Practical implementation of AI solutions for Smart Cities Practical implementation of AI solutions for Smart Cities
Practical implementation of AI solutions for Smart Cities SK Reddy
 
Recommender systems
Recommender systems Recommender systems
Recommender systems SK Reddy
 
How recommender systems work
How recommender systems work How recommender systems work
How recommender systems work SK Reddy
 
In search of better deep Recommender Systems
In search of better deep Recommender Systems In search of better deep Recommender Systems
In search of better deep Recommender Systems SK Reddy
 
Deep Learning (DL) Solutions for Smart City use cases
Deep Learning (DL) Solutions for Smart City use casesDeep Learning (DL) Solutions for Smart City use cases
Deep Learning (DL) Solutions for Smart City use casesSK Reddy
 
AI driven innovation
AI driven innovation AI driven innovation
AI driven innovation SK Reddy
 
How AI is revolutionizing the world
How AI is revolutionizing the worldHow AI is revolutionizing the world
How AI is revolutionizing the worldSK Reddy
 
How NLP is revolutionizing marketing and communications
How NLP is revolutionizing marketing and communications How NLP is revolutionizing marketing and communications
How NLP is revolutionizing marketing and communications SK Reddy
 
AI in Smart Cities
AI in Smart Cities AI in Smart Cities
AI in Smart Cities SK Reddy
 
SF ACM Bay chapter meetup on NLP will revolutionize the world
SF ACM Bay chapter meetup on NLP will revolutionize the world SF ACM Bay chapter meetup on NLP will revolutionize the world
SF ACM Bay chapter meetup on NLP will revolutionize the world SK Reddy
 
The Magic of Image processing using Neural Networks
The Magic of Image processing using Neural Networks The Magic of Image processing using Neural Networks
The Magic of Image processing using Neural Networks SK Reddy
 
The Magic of Text Summarization using Deep Networks
The Magic of Text Summarization using Deep NetworksThe Magic of Text Summarization using Deep Networks
The Magic of Text Summarization using Deep NetworksSK Reddy
 
Natural Language Processing Tech workshop
Natural Language Processing Tech workshop Natural Language Processing Tech workshop
Natural Language Processing Tech workshop SK Reddy
 
The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017The magic of machine translation 20 july 2017
The magic of machine translation 20 july 2017SK Reddy
 
Summarization and Abstraction using deep learning
Summarization and Abstraction using deep learningSummarization and Abstraction using deep learning
Summarization and Abstraction using deep learningSK Reddy
 
Question Answering in NLP on Mahabharata 24 may 2017
Finding the right Machine Learning method for predictive modeling

  • 1. Finding the right ML technique for Predictive Modeling. SK Reddy (skreddy99)
  • 2. Basics of anomaly detection https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173 Novelty vs Anomaly
  • 3. Credit Card transactions Contextual anomaly Fraud detection across various application domains
  • 12. Predicting the real-time availability of 200 million grocery items - Instacart https://tech.instacart.com/predicting-real-time-availability-of-200-million-grocery-items-in-us-canada-stores-61f43a16eafe The problem: understanding “not founds” Routes followed by shoppers in SF, Austin, Boston and Miami https://tech.instacart.com/space-time-and-groceries-a315925acf3a
  • 14. Predicting the real-time availability of 200 million grocery items - Instacart. Feature engineering: item-level features, time-based features, and categorical features https://tech.instacart.com/predicting-real-time-availability-of-200-million-grocery-items-in-us-canada-stores-61f43a16eafe https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2
  • 15. Leveraging Elastic Demand for Forecasting https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf
  • 16. Leveraging Elastic Demand for Forecasting https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf Goal: shifting the right amount of elastic demand to minimize the difference between the new demand series. Input: historical demand series, and the amount of elastic demand. Output: shifted demand series.
  • 17. Leveraging Elastic Demand for Forecasting https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf Demand Variance
  • 18. Leveraging Elastic Demand for Forecasting https://tech.instacart.com/leveraging-elastic-demand-for-forecasting-6278b45f805f; https://arxiv.org/pdf/1809.03018.pdf A larger amount of elastic demand leads to a smaller variance. However, the first 10% of elastic demand produces the largest variance reduction.
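The idea on slides 15 to 18 (shift some elastic demand, get a flatter series) can be illustrated with a toy sketch. This is not the algorithm from the linked Instacart post or paper: the greedy peak-to-trough rule, the function name `shift_elastic_demand`, the `step` parameter, and the demand numbers are all assumptions made for illustration.

```python
from statistics import pvariance


def shift_elastic_demand(demand, elastic_fraction, step=1.0):
    """Greedily move elastic demand from the busiest period to the quietest.

    demand: historical demand per period.
    elastic_fraction: share of total demand that is allowed to move (0..1).
    step: units moved per iteration (hypothetical tuning knob).
    """
    shifted = list(demand)
    budget = elastic_fraction * sum(demand)
    while budget >= step:
        hi = max(range(len(shifted)), key=shifted.__getitem__)
        lo = min(range(len(shifted)), key=shifted.__getitem__)
        # Moving a step only lowers the variance while the gap exceeds the step.
        if shifted[hi] - shifted[lo] <= step:
            break
        shifted[hi] -= step
        shifted[lo] += step
        budget -= step
    return shifted


history = [120, 80, 60, 140, 100]  # made-up demand series
for fraction in (0.0, 0.1, 0.2, 0.3):
    new_series = shift_elastic_demand(history, fraction)
    print(fraction, new_series, round(pvariance(new_series), 1))
```

Total demand is preserved; only its placement across periods changes, so the printed variance can be compared directly across elastic fractions.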
  • 19. Predicting creditworthiness in retail banking with limited scoring data https://www.sciencedirect.com/science/article/pii/S0950705116300156?via%3Dihub
  • 20. Credit Card Fraud Detection http://isyou.info/inpra/papers/inpra-v5n4-02.pdf Figures: characteristics of the mobile payment dataset; F-measure after classification
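Slide 20 reports results as an F-measure. A minimal sketch of how that metric is computed from confusion-matrix counts follows; the function name `f_measure` and the transaction counts are made up for illustration and are not taken from the cited paper.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score for the positive (fraud) class from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical example: 10,000 transactions, 50 of them fraudulent.
total, frauds = 10_000, 50

# Classifier A flags nothing: high accuracy, but useless for fraud.
accuracy_a = (total - frauds) / total
f1_a = f_measure(tp=0, fp=0, fn=frauds)

# Classifier B catches 30 frauds and raises 20 false alarms.
accuracy_b = (total - 20 - 20) / total
f1_b = f_measure(tp=30, fp=20, fn=20)

print(accuracy_a, f1_a)
print(accuracy_b, f1_b)
```

With heavily unbalanced classes, accuracy barely separates the two classifiers, while the F-measure does, which is one reason fraud papers tend to report it.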
  • 21. “Nowcasting” Recession https://arxiv.org/pdf/1903.03202.pdf SVM: • Linear (classes separable with a linear hyperplane) • Non-linear. Features: 1. Monthly log difference in nonfarm payrolls 2. Log difference in average monthly price of the S&P 500 3. Production index from Manufacturing ISM Report (info about the goods market) 4. 10-year Treasury yield minus the federal funds rate. Figure: SVM Dual Parameter and NBER Recessions
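The four features on slide 21 are simple transformations of monthly series. The sketch below shows how such a feature table could be assembled; the helper names and every number are invented for illustration, and the SVM training step itself is omitted.

```python
import math


def log_diff(series):
    """Month-over-month log difference: ln(x_t) - ln(x_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(series, series[1:])]


def term_spread(ten_year_yield, fed_funds_rate):
    """10-year Treasury yield minus the federal funds rate, per month."""
    return [y - f for y, f in zip(ten_year_yield, fed_funds_rate)]


# Made-up monthly values, oldest first.
payrolls = [150_000, 150_200, 150_100, 149_700]
sp500 = [2800.0, 2850.0, 2790.0, 2650.0]
ism_production = [54.1, 53.0, 51.2, 48.9]
ten_year = [2.7, 2.6, 2.4, 2.1]
fed_funds = [2.4, 2.4, 2.4, 2.4]

# The log differences lose the first month, so the other columns are trimmed to match.
rows = list(zip(
    log_diff(payrolls),
    log_diff(sp500),
    ism_production[1:],
    term_spread(ten_year, fed_funds)[1:],
))
for row in rows:
    print([round(v, 4) for v in row])
```

Each row is one month's feature vector; a linear or non-linear SVM would then be fit on these rows against recession labels.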
  • 22. Handling Uncertainty in Social Lending Credit Risk Prediction https://arxiv.org/pdf/1804.10796.pdf The results of combining the three classifiers through a Choquet fuzzy integral approach, compared to the performance of each base classifier alone
  • 23. Deep Autoencoders https://arxiv.org/pdf/1903.06580.pdf Variational Autoencoder (VAE). Figures: a standard autoencoder; a variational autoencoder
  • 24. Learning Latent Representations of Bank Customers https://arxiv.org/pdf/1903.06580.pdf Figure: latent representation of bank customers
  • 25. Predicting bankruptcy – evaluating the performance of various methods https://towardsdatascience.com/predicting-bankruptcy-f4611afe8d2c Classifiers tried: 1. Logistic Regression 2. Perceptron as a classifier 3. Deep Neural Network Classifiers (with different size and depth) 4. Fisher Linear Discriminant Analysis 5. K Nearest Neighbor Classifier (with different values of k) 6. Naive Bayes Classifier 7. Decision Tree (with different bucket size thresholds) 8. Bagged Decision Trees 9. Random Forest (with different tree sizes) 10. Gradient Boosting 11. Support Vector Machines (with different kernels). Figure panels: Random Forest, kNN
  • 29. https://arxiv.org/pdf/1809.00542v2.pdf Predicting thermal power consumption of the Mars Express Spacecraft Distribution of different feature groups in the feature rankings produced by (a) G-RF, (b) L-RF and (c) XGB ensembles, for the 33 power lines
  • 31. Anomaly detection approach within the context of an X-ray security screening problem GANomaly: Semi-Supervised Anomaly Detection via Adversarial Training; 2018 https://arxiv.org/pdf/1805.06725.pdf
  • 32. AAD: Adaptive Anomaly Detection through traffic surveillance videos Pixel movement across the frame after ∆t PASCAL Visual Object Classes https://arxiv.org/pdf/1808.10044.pdf
  • 34. RADS: Real-time Anomaly Detection System for Cloud Data Centers https://arxiv.org/pdf/1811.04481.pdf
  • 35. Example: Anomaly Detection of Sensor Data Using Distance-Based Failure Analysis. 15 Million Battery Voltages and Current. Sensor data from the same 15m Batteries. Can you find the anomaly?
  • 36. Time series http://www.unofficialgoogledatascience.com/2017/04/our-quest-for-robust-time-series.html Time series ensemble • Bass Diffusion Model • Theta Model • Logistic models • Bayesian Structural Time Series • STL (Seasonal-Trend Decomposition Procedure Based on Loess) • Holt-Winters and other Exponential Smoothing models • Seasonal and other ARIMA-based models • Year-over-Year growth models • custom models, and • more
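To make the "time series ensemble" idea concrete, here is a small sketch that combines three very simple forecasters by taking the median at each step. The three base models and the median rule are assumptions chosen for brevity; they stand in for the much richer model list on the slide and are not necessarily the combination rule used in the linked post.

```python
from statistics import median


def naive(history, horizon):
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history, horizon, period):
    """Repeat the last full season."""
    return [history[-period + (h % period)] for h in range(horizon)]


def drift(history, horizon):
    """Extend the straight line from the first to the last observation."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (h + 1) for h in range(horizon)]


def ensemble_forecast(history, horizon, period):
    """Median of the base forecasts at each future step."""
    forecasts = [
        naive(history, horizon),
        seasonal_naive(history, horizon, period),
        drift(history, horizon),
    ]
    return [median(step) for step in zip(*forecasts)]


history = [10, 20, 30, 12, 22, 32, 14, 24, 34]  # made-up series with period 3
print(ensemble_forecast(history, horizon=3, period=3))
```

The median is used because one badly wrong base model does not drag the combined forecast with it, which is the robustness argument for ensembling in the first place.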
  • 37. Combining Multiple Methods To Improve Time Series Prediction https://labs.eleks.com/2016/10/combined-different-methods-create-advanced-time-series-prediction.html Components: • Trend (the increase or decrease in the series over a period of time) • Seasonality (the fluctuation that occurs within the series over each week, each month, etc.) • Residuals (the data point that falls outside of the expected data range). Figure panels (Step 1 to Step 5): the demand data over the 2010-2015 timeframe; the estimated trend (Hodrick-Prescott Filter); multi-seasonality (Loess method); Step 5 (after Elastic Net Regression and Fourier transformation)
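The trend / seasonality / residual split on slide 37 can be sketched with a classical additive decomposition. Note the substitution: this sketch uses a centered moving average for the trend and per-phase averages for seasonality, not the Hodrick-Prescott filter or Loess named on the slide. The function name `decompose` and the synthetic series are illustrative assumptions.

```python
from statistics import mean


def decompose(series, period):
    """Classical additive decomposition; `period` must be odd (e.g. 7 for weekly)."""
    half = period // 2
    n = len(series)

    # Trend: centered moving average over one full season.
    trend = [None] * n
    for t in range(half, n - half):
        trend[t] = mean(series[t - half:t + half + 1])

    # Seasonality: average detrended value for each phase, centered on zero.
    detrended = [(t, series[t] - trend[t]) for t in range(n) if trend[t] is not None]
    raw = [mean(v for t, v in detrended if t % period == p) for p in range(period)]
    offset = mean(raw)
    seasonal = [raw[t % period] - offset for t in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


weekly_pattern = [3, -1, -2, 0, 1, -3, 2]  # made-up, sums to zero
series = [0.5 * t + weekly_pattern[t % 7] for t in range(28)]
trend, seasonal, residual = decompose(series, period=7)
print([round(s, 2) for s in seasonal[:7]])
```

The first and last `period // 2` points have no trend estimate in this scheme, so their residuals are left as `None`.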
  • 39. Modeling and Forecasting Vehicle Fleet Maintenance https://arxiv.org/pdf/1710.06839.pdf Vehicles Maintenance (a) 3-mode data tensor; (b) the same tensor as a stacked series of frontal slices, or arrays; (c) an example single frontal slice of a vehicle data tensor used in this analysis (each entry corresponds to the count of a specific job type for a vehicle at a fixed time)
  • 40. Vehicle Fleet Maintenance PARAFAC 3-way plot of absolute-time analysis. High factor weights in the top panel are for 2014 Terrastar Horton vehicles, an ambulance. The bottom two panels show systems (Body, Cab/Sheet Metal, Engine and Motor, and Preventive Maintenance Service) and time frames where this maintenance most often occurs. https://arxiv.org/pdf/1710.06839.pdf PARAFAC 3-way plot of vehicle lifetime analysis revealing a simple pattern common to almost all vehicles, as demonstrated by the consistent loading across the vehicle factor (top panel): tires/tubes/valves/liners replacement during the second year of lifetime, with few repairs to this system either before or after.
  • 41. Vehicle Fleet Maintenance https://arxiv.org/pdf/1710.06839.pdf PARAFAC 3-way plot of vehicle lifetime analysis showing the 2012 Freightliner M2112V, a Department of Solid Waste garbage truck. This plot reveals a strong pattern of increased maintenance in years 2-4 after purchase, focusing on a variety of technical systems: hydraulics, lighting, gauges and warning devices, and cooling systems. PARAFAC 3-way plot of absolute-time analysis. This plot demonstrates strong and specific maintenance patterns for the 2015 Smeal SST Pumper fire truck. It shows extensive and specific repair to the engine systems with little other maintenance, from late 2015 through 2016.
  • 42. • Local Outlier Factor (LOF) • Connectivity-Based Outlier Factor (COF) • Influenced Outlierness (INFLO) • Local Outlier Probability (LoOP) • Local Correlation Integral (LOCI) • Approximate Local Correlation Integral (aLOCI) • Cluster-Based Local Outlier Factor (CBLOF/ uCBLOF) • Local Density Cluster-based Outlier Factor (LDCOF) • Clustering-based Multivariate Gaussian Outlier Score (CMGOS) • Histogram-based Outlier Score (HBOS) • One-Class Support Vector Machine • Robust Principal Component Analysis (rPCA) https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
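As an illustration of the first algorithm in the list above, here is a compact Local Outlier Factor sketch in plain Python. It is simplified on purpose: it uses exactly k neighbors even when distances tie, has quadratic cost, and would divide by zero on duplicate points. The function name `lof_scores` and the sample points are assumptions for illustration.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor per point; values well above 1 suggest an outlier."""
    n = len(points)

    # k nearest neighbors of every point as (distance, index), excluding itself.
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append(dists[:k])
    k_distance = [nb[-1][0] for nb in neighbors]

    def local_reachability_density(i):
        # reach-dist(i, j) = max(k-distance(j), d(i, j))
        reach = [max(k_distance[j], d) for d, j in neighbors[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]

    # LOF: average neighbor density relative to the point's own density.
    return [
        sum(density[j] for _, j in neighbors[i]) / (k * density[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = lof_scores(points, k=3)
for point, score in zip(points, scores):
    print(point, round(score, 2))
```

Points inside the cluster score close to 1, while the isolated point scores far higher because its neighbors are much denser than it is.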
  • 43. Nearest-Neighbor based algorithms on the breast-cancer dataset Clustering-based anomaly detection algorithms https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
  • 44. Recommendations for technique selection https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
  • 45. Deep Anomaly Detection (DAD) techniques https://arxiv.org/pdf/1901.03407v1.pdf
  • 46. Autoencoder architectures for anomaly detection https://arxiv.org/pdf/1901.03407v1.pdf Other DAD anomaly detection models: • Transfer Learning based anomaly detection • Zero Shot learning based anomaly detection • Ensemble based anomaly detection • Clustering based anomaly detection • Deep Reinforcement Learning (DRL) based anomaly detection. Abbreviations: SDAE: Stacked Denoising Autoencoder; DAE: Denoising Autoencoders; GRU: Gated Recurrent Unit; CNN: Convolutional Neural Networks; LSTM: Long Short Term Memory; AE: Autoencoders; CAE: Convolutional Autoencoders
  • 47. ML Techniques for Predictive Maintenance
      • Classification: predicts a failure in the next n steps. Candidate classifiers: Logistic Regression; Perceptron as a classifier; Deep Neural Network Classifiers (with different size and depth); Fisher Linear Discriminant Analysis; K Nearest Neighbor Classifier (with different values of k); Naive Bayes Classifier; Decision Tree (with different bucket size thresholds); Bagged Decision Trees; Random Forest (with different tree sizes); Gradient Boosting; Support Vector Machines (with different kernels).
      • Regression: predicts how much time is left before the next failure, called Remaining Useful Life.
      • Supervised Anomaly Detection: fully labeled training and test data sets. Comments: decision trees like C4.5 cannot deal well with unbalanced data, whereas SVMs or ANNs should perform better.
      • Semi-supervised Anomaly Detection: training data consists only of normal data without any anomalies. Comments: one-class SVMs and autoencoders; density models such as Gaussian Mixture Models (many variants exist) and Kernel Density Estimation.
      • Unsupervised Anomaly Detection: no labels. Comments: distances or densities are used to estimate what is normal and what is an outlier.
      https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173#pone.0152173.ref033 https://towardsdatascience.com/predicting-bankruptcy-f4611afe8d2c
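The two predictive-maintenance framings on slide 47 differ mainly in how the training target is built. The sketch below derives both targets from a per-step failure log; the function name `maintenance_labels`, the convention that a failure step itself counts as "fails soon", and the sample log are assumptions for illustration.

```python
def maintenance_labels(failure_flags, n_steps):
    """Build classification and regression targets from a failure log.

    failure_flags[t] is 1 if the unit failed at time step t, else 0.
    Returns (fails_soon, rul):
      fails_soon[t] = 1 if a failure occurs at t or within n_steps after t
      rul[t]        = steps remaining until the next failure
    Steps after the last observed failure get None (outcome unknown).
    """
    rul = []
    next_failure = None
    # Walk backwards so the nearest upcoming failure is always known.
    for t in range(len(failure_flags) - 1, -1, -1):
        if failure_flags[t]:
            next_failure = t
        rul.append(None if next_failure is None else next_failure - t)
    rul.reverse()
    fails_soon = [None if r is None else int(r <= n_steps) for r in rul]
    return fails_soon, rul


log = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]  # made-up failure log
fails_soon, rul = maintenance_labels(log, n_steps=2)
print(fails_soon)
print(rul)
```

A classifier from the list above would be trained on `fails_soon`, while a regression model would be trained on `rul` as the Remaining Useful Life target.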
  • 49. Critique of predictive techniques
      Linear regression: • Relies on normality, homoscedasticity and other assumptions. • Does not capture highly non-linear, chaotic patterns. • Prone to over-fitting. • Parameters difficult to interpret. • Very unstable when independent variables are highly correlated. • Fixes: variable reduction, apply a transformation to your variables, use constrained regression (e.g. ridge or Lasso regression).
      Decision trees: • Very large decision trees are very unstable, impossible to interpret, and prone to over-fitting. • Fix: combine multiple small decision trees together instead of using a large decision tree.
      Naive Bayes: • Used e.g. in fraud and spam detection, and for scoring. • Assumes that variables are independent; if they are not, it will fail miserably. In the context of fraud or spam detection, variables (sometimes called rules) are highly correlated. • Fix: group variables into independent clusters of variables (in each cluster, variables are highly correlated) and apply naive Bayes to the clusters, or use data reduction techniques.
      K-means clustering: • Used for clustering; tends to produce circular clusters. • Does not work well with data points that are not a mixture of Gaussian distributions.
      Neural networks: • Difficult to interpret, unstable, subject to over-fitting.
      Maximum Likelihood estimation: • Requires your data to fit a prespecified probabilistic distribution. Not data-driven. In many cases the prespecified Gaussian distribution is a terrible fit for your data.
      Density estimation in high dimensions: • Subject to what is referred to as the curse of dimensionality. • Fix: use (non-parametric) kernel density estimators with adaptive bandwidths.
      Linear discriminant analysis (LDA): • Used for supervised clustering. Bad technique because it assumes that clusters do not overlap and are well separated by hyperplanes. In practice, they never are. Use density estimation techniques instead.
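The claim that linear regression becomes unstable with highly correlated variables, and that ridge regression is a fix, can be seen in a two-feature toy case. The sketch solves the ridge normal equations directly; the function name `ridge_two_features` and the data are made up for illustration, and there is no intercept term.

```python
def ridge_two_features(x1, x2, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for two features, no intercept."""
    a = sum(v * v for v in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    p = sum(u * v for u, v in zip(x1, y))
    q = sum(u * v for u, v in zip(x2, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("singular system: collinear features and lam = 0")
    # Cramer's rule for the 2x2 system.
    return (p * d - b * q) / det, (a * q - b * p) / det


# Two perfectly correlated features; y = 2 * x1.
x1 = [1, 2, 3, 4]
x2 = [1, 2, 3, 4]
y = [2, 4, 6, 8]

try:
    ridge_two_features(x1, x2, y, lam=0)  # ordinary least squares
except ValueError as err:
    print("OLS:", err)

print("ridge:", ridge_two_features(x1, x2, y, lam=0.1))
```

With `lam = 0` the system has no unique solution, because any split of the weight between the two identical features fits equally well. A small positive `lam` makes the system solvable and splits the weight evenly.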
  • 50. Critique of predictive techniques https://www.analyticbridge.datasciencecentral.com/profiles/blogs/the-8-worst-predictive-modeling-techniques
      Random Forests Pros
      • One of the most accurate learning algorithms available. For many data sets, it produces a highly accurate classifier.
      • Runs efficiently on large databases.
      • Handles thousands of input variables without variable deletion.
      • Gives estimates of what variables are important in the classification.
      • Generates an internal unbiased estimate of the generalization error as the forest building progresses.
      • Has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
      • Has methods for balancing error in class population unbalanced data sets.
      • Prototypes are computed that give information about the relation between the variables and the classification.
      • The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.
      Random Forests Cons
      • Random forests have been observed to overfit for some datasets with noisy classification/regression tasks.
      • Unlike decision trees, the classifications made by random forests are difficult for humans to interpret.
      • For data including categorical variables with different numbers of levels, random forests are biased in favor of those attributes with more levels. Therefore, the variable importance scores from random forests are not reliable for this type of data. Methods such as partial permutations were used to solve the problem.
      • If the data contain groups of correlated features of similar relevance for the output, then smaller groups are favored over larger groups.
  • 51. Recommendations
      • The right technique depends on the use case and the data.
      • k-NN is the best choice in many global use cases (especially for high-dimensional problems of > 400 dimensions; the best k value is < 5).
      • LOF is preferred for many local use cases.
      • Don’t use local anomaly detection algorithms, such as LOF, COF, INFLO and LoOP, on datasets containing global anomalies. (Note: global anomaly detection algorithms perform OK for local anomalies.)
      • Nearest-neighbor based algorithms perform better than clustering algorithms in most cases, provided the right k value is picked. Choose them if computing time is not an issue.
      • Clustering-based algorithms have a lower computation time, so use them for real-time or near real-time use cases.
      • uCBLOF is the better choice among clustering-based algorithms.
      • ARIMA performed better for a long time, until ANNs arrived.
      • When not sure, use an ANN.
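For the global use cases mentioned in the recommendations, a k-NN anomaly score is short enough to write out. This sketch scores each point by its mean distance to its k nearest neighbors, which is one common variant (another uses only the distance to the k-th neighbor). The function name `knn_anomaly_scores` and the sample points are assumptions for illustration.

```python
import math


def knn_anomaly_scores(points, k):
    """Score each point by the mean distance to its k nearest neighbors.

    Larger scores mean the point sits farther from the rest of the data.
    Cost is quadratic in the number of points.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = knn_anomaly_scores(points, k=2)
ranked = sorted(zip(scores, points), reverse=True)
for score, point in ranked:
    print(point, round(score, 2))
```

Unlike LOF, the score is an absolute distance rather than a ratio to neighboring densities, which is why it suits global anomalies and why the choice of k matters.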