Time series analysis and
prediction in the deep
learning era
Alberto Arrigoni, PhD
February 2019
Time series: analysis and prediction
What will the future hold?
Past   Now   Future
Time series applications + context
Time series prediction: e.g.
demand/sales forecasting...
Use prediction for anomaly
detection: e.g. manufacturing
settings...
Counterfactual prediction:
e.g. marketing campaigns...
Show ads
Counterfactual
Time series prediction methods
(non-comprehensive list)
Classical autoregressive models Bayesian AR models
General machine learning
approaches
Deep learning
t+3
Time series prediction and sales forecasting: issues
E.g. retail businesses
(1) Number of time series (~ thousands) [the SCALE problem]
(2) Time series are often highly erratic, intermittent or bursty (...and on highly different scales)
[Figure: Product A, Product B, ...; scales of ~ 10 items and 2 items]
(3) Time series belong to a hierarchy of products/categories
E.g. online retailer selling clothes: Clothes (total sales), T-shirts total sales, Nike t-shirts, Adidas t-shirts
[Figure labels: Now, ~ 100, ~ 1000]
(4) For new products historical data is missing (the cold-start problem)
Classical autoregressive models
Estimate model order (AIC, BIC)
Fit model parameters
(maximum likelihood)
Autoregressive component
Moving average component
Test residuals for
randomness
De-trending by differencing
Variance stabilization by log
or Box-Cox transformation
Workflow
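The workflow above can be sketched end to end in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the talk: the series is synthetic, the model order is fixed to AR(1) instead of being selected with AIC/BIC, and the parameters are fitted by least squares (which coincides with conditional maximum likelihood under Gaussian noise). All function names are hypothetical.

```python
import math
import random

def difference(xs):
    """First-order differencing: removes a (stochastic) trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

def fit_ar1(xs):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t."""
    prev, nxt = xs[:-1], xs[1:]
    mean_prev = sum(prev) / len(prev)
    mean_next = sum(nxt) / len(nxt)
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    var = sum((a - mean_prev) ** 2 for a in prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

def lag1_autocorr(xs):
    """Lag-1 autocorrelation, used here as a crude randomness check."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den

# Synthetic series (assumption): growth rates follow an AR(1) with phi = 0.6,
# and the observed series is the exponential of their cumulative sum.
rng = random.Random(0)
growth, log_level, series = 0.0, 0.0, []
for _ in range(2000):
    growth = 0.001 + 0.6 * growth + rng.gauss(0.0, 0.01)
    log_level += growth
    series.append(math.exp(log_level))

stabilized = [math.log(y) for y in series]   # variance stabilization by log
stationary = difference(stabilized)          # de-trending by differencing
c, phi = fit_ar1(stationary)                 # fit model parameters
residuals = [b - (c + phi * a) for a, b in zip(stationary, stationary[1:])]
residual_autocorr = lag1_autocorr(residuals) # test residuals for randomness
```

If the model is adequate, `phi` should land close to the generating value and `residual_autocorr` close to zero.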
Classical autoregressive models
THE PROS:
- Good explainability
- Solid theoretical background
- Very explicit model
- A lot of control as it is a manual process
THE CONS:
- Data is seldom stationary: trend,
seasonality, cycles need to be modeled as
well
- Computationally intensive (one model for
each time series)
- No information sharing across time series
(apart from Hyndman’s hts approach) *
- Historical data are essential for
forecasting (no cold-start)
* https://robjhyndman.com/publications/hierarchical/
Tech stack and packages
- Rob Hyndman’s online text:
https://otexts.com/fpp2/
- Infamous auto.arima (from R's forecast
package), plus ets, tbats, garch,
stl...
- Python’s Pyramid (pyramid-arima, since renamed pmdarima)
- Aggregate histograms over time scales
- Transform into Fourier space
- Add low/high pass filters as variables
General machine learning approach for ts prediction
Past Yt
t
Autoregressive component
- Can use any number of methods (linear, trees,
neural networks...)
- Turn the time series prediction problem into a
supervised learning problem
- Easily extendable to support multiple input
variables
- Covariates can be easily handled and
transformed through feature engineering
Covariates
E.g. feature engineering
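Turning a series into a supervised learning problem amounts to building a table of lagged values (the autoregressive component) next to the value to predict. A minimal sketch, assuming a fixed number of lags and optional per-time-step covariates; the function name and argument layout are hypothetical.

```python
def make_supervised(series, n_lags, covariates=None):
    """Build (features, target) pairs from a univariate series.

    Each feature row holds the n_lags previous values, optionally followed
    by the covariates known at the time step being predicted.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        row = list(series[t - n_lags:t])
        if covariates is not None:
            row += list(covariates[t])
        rows.append(row)
        targets.append(series[t])
    return rows, targets

sales = [10, 12, 13, 15, 18]
promo = [[0], [0], [1], [0], [1]]  # hypothetical binary covariate
X, y = make_supervised(sales, n_lags=2)
X_cov, y_cov = make_supervised(sales, n_lags=2, covariates=promo)
```

The resulting rows can be fed to any regressor (linear model, trees, neural network).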
THE PROS:
- Can model non-linear relationships
- Can model the “hierarchical structure” of the
time series through categorical variables
- Support for covariates (predictors) + feature
engineering
- One model is shared among multiple time
series
- Cold-start predictions are possible by
iteratively feeding the predictions back to the
feature space
THE CONS:
- Feature engineering takes time
- Long-term relationships between data points
need to be explicitly modeled
(autoregressive features)
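The feedback loop mentioned in the pros (feeding predictions back into the feature space) can be sketched as a recursive multi-step forecast. This is an illustration only: `predict_one_step` stands in for any fitted one-step model, and for a brand-new product the initial window would have to be seeded somehow (for example with category-level averages), which is an assumption beyond the slides.

```python
def recursive_forecast(history, predict_one_step, n_lags, horizon):
    """Forecast `horizon` steps by feeding each prediction back as a lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one_step(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # slide the lag window forward
    return forecasts

def mean_of_lags(lags):
    """Hypothetical stand-in for a trained model."""
    return sum(lags) / len(lags)

forecasts = recursive_forecast([2.0, 4.0], mean_of_lags, n_lags=2, horizon=3)
```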
General machine learning approach for ts prediction
Tech stack and packages
- Sklearn, PySpark for feature
engineering, data reduction
Bayesian AR models (Facebook Prophet)
Prophet is a Bayesian GAM (Generalized Additive Model)
Linear trend with
changepoints
Seasonal
component
Holiday-specific
component
Sales
1) Detect changepoints in the time
series
2) Fit linear trend parameters (k and
delta)
(piecewise) linear
trends
Growth rate (k), growth rate adjustment (delta) **
** An additional ‘offset’ term has been omitted from the formula
* Implemented using Stan
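A minimal sketch of the piecewise linear trend, following the form given in the Prophet paper: the growth rate starts at k and is adjusted by delta_j at each changepoint s_j. Unlike the slide formula, this sketch includes the offset term (base offset m plus a correction of -s_j * delta_j per changepoint) so that the trend stays continuous. Changepoint locations are taken as given here; the function name is hypothetical.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend value at time t with rate adjustments at each changepoint."""
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta        # growth rate adjustment
            offset -= s * delta  # offset correction keeps the line continuous
    return rate * t + offset

before = piecewise_linear_trend(5.0, k=1.0, m=0.0, changepoints=[10.0], deltas=[0.5])
at_change = piecewise_linear_trend(10.0, k=1.0, m=0.0, changepoints=[10.0], deltas=[0.5])
after = piecewise_linear_trend(20.0, k=1.0, m=0.0, changepoints=[10.0], deltas=[0.5])
```

With slope 1 up to t = 10 and slope 1.5 afterwards, the trend reaches 10 at the changepoint and 25 at t = 20.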
Bayesian AR models (Facebook Prophet)
E.g. P = 365 for yearly data
Need to estimate 2N parameters (a_n and b_n) using MCMC!
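The seasonal component is a truncated Fourier series: for n = 1..N, a cosine term weighted by a_n and a sine term weighted by b_n, each with frequency n / P. A minimal sketch of how the 2N features and the resulting seasonal value could be computed; the function names are hypothetical and the coefficients below are made up for illustration (in Prophet they are inferred from data).

```python
import math

def fourier_features(t, period, order):
    """Return [cos_1, sin_1, ..., cos_N, sin_N] evaluated at time t."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.append(math.cos(angle))
        feats.append(math.sin(angle))
    return feats

def seasonal_component(t, period, coeffs):
    """coeffs = [a_1, b_1, ..., a_N, b_N]; returns the seasonal value at t."""
    order = len(coeffs) // 2
    feats = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coeffs, feats))

coeffs = [0.5, -0.2, 0.1, 0.3]  # N = 2, so 2N = 4 parameters (illustrative)
s_day3 = seasonal_component(3.0, period=365.0, coeffs=coeffs)
s_next_year = seasonal_component(3.0 + 365.0, period=365.0, coeffs=coeffs)
```

By construction the component repeats every P time units, so `s_day3` and `s_next_year` agree up to floating-point error.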
THE PROS:
- Uncertainty estimation
- Bayesian changepoint detection
- User-in-the-loop paradigm (Prophet)
- Black-box variational inference is
revolutionizing Bayesian inference
THE CONS:
- Bayesian inference takes time (the “scale”
issue)
- One model for each time series
- No information sharing among series
(unless you specify a hierarchical Bayesian
model with shared parameters, but still...)
- Historical data are needed for prediction!
- Performance is often on par* with
autoregressive models
Tech stack and packages
- Python/R clients for Prophet *
- R package for Bayesian structural
time series models: bsts
Bayesian AR models
* Taylor et al., Forecasting at scale
* This may open endless discussions. Bottom line: depends on your data :)
Interlude: uncertainty estimation with deep learning
- Uncertainty estimation is a prerogative of Bayesian methods.
- Black box variational inference (ADVI) has sparked renewed interest in Bayesian
neural networks, but we are not there yet in terms of performance
- A DeepMind paper from NIPS 2017 introduces a simple yet effective way to estimate
predictive uncertainty using Deep Ensembles
For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html
“Engineering Uncertainty
Estimation in Neural Networks for
Time Series Prediction at Uber”
https://eng.uber.com/neural-networks-uncertainty-estimation/
1) 2)
Interlude: Deep Ensembles
Train a deep learning model using a custom
final layer which parametrizes a Gaussian
distribution
Sample x from the Gaussian
distribution using fitted
parameters
Calculate loss to backpropagate the
error (using Gaussian likelihood)
(1)
(3)
(2)
Network output
What the network is learning: different
regions of the x space have different
variances
Generate a synthetic
dataset with different
variances
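The two ingredients of this slide, a synthetic dataset whose noise level depends on x and a Gaussian likelihood used as the loss, can be sketched without any deep learning framework. This is an illustration under assumptions: the data-generating process (noise 0.1 for x < 0 and 0.5 otherwise) is made up, and no network is trained; the snippet only shows that the loss rewards predicting the right variance in each region.

```python
import math
import random

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2)."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)

def true_sigma(x):
    """Assumed noise profile: low variance for x < 0, high otherwise."""
    return 0.1 if x < 0 else 0.5

def make_heteroscedastic_data(n, rng):
    """Points (x, y) with y = 2x + noise whose scale depends on x."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, rng.gauss(2.0 * x, true_sigma(x))))
    return data

data = make_heteroscedastic_data(1000, random.Random(0))
# Average loss when sigma follows the true, region-dependent profile...
nll_true = sum(gaussian_nll(y, 2.0 * x, true_sigma(x)) for x, y in data) / len(data)
# ...versus a single constant sigma for every x.
nll_const = sum(gaussian_nll(y, 2.0 * x, 0.3) for x, y in data) / len(data)
```

The region-aware variance yields the lower loss, which is the signal a network with a Gaussian output layer can exploit.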
Interlude: Deep Ensembles
PREDICTION ON
TRAINING DATASET
SYNTHETIC TRAINING
DATASET
Use the network from previous
slide to predict on the training
set to see if it actually detects
variance reduction
Interlude: Deep Ensembles
The authors suggest training several NNs on the
same data (the whole training set) with random
initialization
Ensemble networks (improve generalization power)
Uniformly weighted mixture model
Predictions for regions outside of
the training dataset show
increasing variance (due to
ensembling)
In addition to ‘distribution’ modeling
and ensembling, the authors suggest
using the fast gradient sign method * to
produce adversarial training examples
(Not shown here)
* Goodfellow et al., 2014
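The uniformly weighted mixture can be summarized by a single mean and variance, as done in the Deep Ensembles paper: the ensemble mean is the average of the member means, and the ensemble variance is the average of (sigma_m^2 + mu_m^2) minus the squared ensemble mean. A minimal sketch; the function name is hypothetical.

```python
import math

def combine_ensemble(predictions):
    """Collapse a uniform mixture of Gaussians into one (mu, sigma).

    predictions: list of (mu, sigma) pairs, one per ensemble member.
    """
    m = len(predictions)
    mu = sum(mu_i for mu_i, _ in predictions) / m
    second_moment = sum(sigma_i ** 2 + mu_i ** 2 for mu_i, sigma_i in predictions) / m
    variance = second_moment - mu ** 2
    return mu, math.sqrt(variance)

# Members agree: the ensemble variance equals the member variance.
mu_agree, sigma_agree = combine_ensemble([(1.0, 1.0), (1.0, 1.0)])
# Members disagree on the mean: the ensemble variance grows.
mu_split, sigma_split = combine_ensemble([(0.0, 1.0), (2.0, 1.0)])
```

Disagreement between member means adds to the combined variance, which is why predictions far from the training data become more uncertain.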
Interlude: Deep Ensembles
Custom GaussianLayer
Let’s just do some extra work and define a
custom layer
Interlude: Deep Ensembles
Custom layer returns both
mu and sigma
Build 2 weight matrices + 2
bias terms
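A framework-free sketch of what such a layer computes in its forward pass: one set of weights and a bias for mu, another for sigma. Assumptions: a single output unit (so the weight matrices reduce to vectors), random initialization, and a softplus transform to keep sigma positive. The class is hypothetical and is not the TensorFlow layer from the talk.

```python
import math
import random

def softplus(x):
    """Numerically stable log(1 + exp(x)); always non-negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

class GaussianHead:
    """Maps a feature vector to the parameters (mu, sigma) of a Gaussian."""

    def __init__(self, n_inputs, rng):
        # Two weight vectors and two bias terms, one pair per parameter.
        self.w_mu = [rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
        self.w_sigma = [rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
        self.b_mu = 0.0
        self.b_sigma = 0.0

    def __call__(self, features):
        mu = sum(w * x for w, x in zip(self.w_mu, features)) + self.b_mu
        raw = sum(w * x for w, x in zip(self.w_sigma, features)) + self.b_sigma
        sigma = softplus(raw) + 1e-6  # small constant avoids sigma == 0
        return mu, sigma

head = GaussianHead(n_inputs=3, rng=random.Random(0))
mu, sigma = head([0.5, -1.0, 2.0])
```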
DeepAR (Amazon)
Instead of fitting separate models for each time series we create a global model from related time
series to handle widely-varying scales through rescaling and velocity-based sampling.
Different scales
Probabilities
~1000 time series
Past Future
Covariates
Flunkert et al., 2017
DeepAR (Amazon)
h_{t-1}   h_t   h_{t+1}
- Use LSTMs to model temporal interactions in the time series
- As seen with the Deep Ensemble
architecture, we can predict parameters of
distributions at each time point (theta
vector)
- Time series need to be scaled for the
network to learn time-varying dynamics
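A minimal sketch of the scaling step, following the heuristic described in the DeepAR paper: each series is divided by a factor nu = 1 + mean of its history, and the predicted parameters are mapped back to the original scale. Assumptions: for a Gaussian likelihood both mu and sigma are multiplied by nu, and training series are sampled with probability proportional to nu; function names are hypothetical.

```python
import random

def scale_factor(history):
    """Item-dependent scale: 1 + average of the observed values."""
    return 1.0 + sum(history) / len(history)

def scale_series(history):
    """Return the rescaled series together with its scale factor."""
    nu = scale_factor(history)
    return [z / nu for z in history], nu

def rescale_gaussian_params(mu, sigma, nu):
    """Map network outputs back to the original scale (assumed Gaussian)."""
    return mu * nu, sigma * nu

def sample_series_indices(scale_factors, batch_size, rng):
    """Weighted sampling: larger-scale series are drawn more often."""
    return rng.choices(range(len(scale_factors)), weights=scale_factors, k=batch_size)

scaled, nu = scale_series([1.0, 2.0, 3.0])
batch = sample_series_indices([1.0, 1000.0], batch_size=200, rng=random.Random(0))
```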
DeepAR (Amazon)
* Likelihood/loss is customizable: Gaussian/negative binomial for count data + overdispersion
Training Prediction
*
For a commentary + code review: https://arrigonialberto86.github.io/funtime/deepar.html
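For count data, the negative binomial likelihood mentioned in the footnote can be written with a mean mu and a shape parameter alpha that controls overdispersion (variance = mu + alpha * mu^2), which is the parametrization used in the DeepAR paper. A minimal sketch of its log-probability, with a numerical sanity check; the function name is hypothetical.

```python
import math

def negative_binomial_logpmf(z, mu, alpha):
    """Log-probability of count z with mean mu and overdispersion alpha."""
    r = 1.0 / alpha
    return (
        math.lgamma(z + r) - math.lgamma(z + 1.0) - math.lgamma(r)
        + r * math.log(1.0 / (1.0 + alpha * mu))
        + z * math.log(alpha * mu / (1.0 + alpha * mu))
    )

# Sanity check on a truncated support: probabilities sum to ~1 and the
# first two moments match mu and mu + alpha * mu^2.
mu, alpha = 3.0, 0.5
pmf = [math.exp(negative_binomial_logpmf(z, mu, alpha)) for z in range(200)]
total = sum(pmf)
mean = sum(z * p for z, p in enumerate(pmf))
variance = sum((z - mean) ** 2 * p for z, p in enumerate(pmf))
```

During training the loss would be the negative of this log-probability, summed over time steps and series.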
DeepAR (Amazon)
The mandatory ‘AirPassengers’ prediction example (results shown on training set)
Granted, this is not the use case Amazon had in mind...
DeepAR (Amazon)
- Long-term relationships are handled by
design using LSTMs
- One model is fitted for all the time series
- The hierarchical ts structure and
inter-dependencies are captured by
using covariates (even holidays,
recurrent events etc...)
- The model can be used for cold-start
predictions (using categorical covariates
with ‘descriptive’ product information)
- Hassle-free uncertainty estimation
DeepAR and the AWS ecosystem
AWS SageMaker
Deep State Space (NIPS 2018)*
A state space model or SSM is just like a Hidden Markov Model, except the hidden states are
continuous
Observation (z_t) update
Latent state (l_t) update
In normal settings we would need to fit these parameters for each time series
z_{t-1}   z_t   z_{t+1}   ???
* Rangapuram et al, 2018, Deep State Space Models for Time Series Forecasting
Deep State Space (NIPS 2018)
Training: compute the negative log-likelihood, derive the time-varying SS parameters using backpropagation
Prediction: use Kalman filtering to estimate l_t, then recursively apply the transition equation and the observation model to generate prediction samples
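The prediction step can be sketched on the simplest possible state space model. Assumptions: a scalar local-level model (l_t = l_{t-1} + g * noise, z_t = l_t + sigma * noise) with fixed g and sigma, whereas in the Deep State Space approach these parameters would be produced by the network at each time step. Function names are hypothetical.

```python
import math
import random

def kalman_filter_level(observations, g, sigma, prior_mean=0.0, prior_var=1e6):
    """Filter a local-level model; returns the posterior mean/variance of l_t."""
    mean, var = prior_mean, prior_var
    for z in observations:
        var = var + g ** 2                 # predict: transition adds noise
        gain = var / (var + sigma ** 2)    # Kalman gain
        mean = mean + gain * (z - mean)    # update with the new observation
        var = (1.0 - gain) * var
    return mean, var

def sample_forecast(mean, var, g, sigma, horizon, rng):
    """Draw one future path by applying transition and observation models."""
    level = rng.gauss(mean, math.sqrt(var))
    path = []
    for _ in range(horizon):
        level = level + g * rng.gauss(0.0, 1.0)            # transition equation
        path.append(level + sigma * rng.gauss(0.0, 1.0))   # observation model
    return path

mean, var = kalman_filter_level([5.0] * 50, g=0.1, sigma=1.0)
path = sample_forecast(mean, var, g=0.1, sigma=1.0, horizon=7, rng=random.Random(0))
```

Repeating `sample_forecast` many times gives a set of sample paths from which quantiles can be read off.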
- Long-term relationships are handled by
design using LSTMs
- One model is fitted for all the time
series
- The hierarchical ts structure and
inter-dependencies are captured by
ad-hoc design and components of the SS
model (even holidays, recurrent events
etc...)
- The model can be used for cold-start
predictions (using categorical covariates
with ‘descriptive’ product information)
Deep State Space (NIPS 2018)
Going forward: Deep factors with GPs *
* Maddix et al., “Deep Factors with Gaussian Processes for Forecasting”, NIPS 2018
The combination of probabilistic graphical models with deep neural networks has been an active
research area recently
Global DNN backbone and local Gaussian Process (GP). The main idea is to represent each
time series as a combination of a global time series and a corresponding local model.
[Diagram labels: RNN (+ covariates), global factors g_t, observations z_{it}]
Backpropagation to find RNN parameters to produce global factors (g_t) + GP hyperparameters
M4 forecasting competition winner algo (Uber, 2018)
The winning idea is often the simplest!
Hybrid Exponential Smoothing-Recurrent Neural Networks (ES-RNN) method. It
mixes hand-coded parts like ES formulas with a black-box recurrent neural network
(RNN) forecasting engine.
y_{t-1}   y_t   y_{t+1}
Deseasonalized and normalized vector of covariates + previous state
RNN results are now part of a parametric model
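The hybrid idea can be illustrated with a heavily simplified sketch: a hand-coded exponential smoothing formula tracks the level, the input window is normalized by that level, and the output of the black-box engine is multiplied back by the level. Assumptions: no seasonality, a fixed smoothing coefficient alpha (in ES-RNN the smoothing coefficients are learned jointly with the network), and a trivial stand-in function in place of the RNN. All names are hypothetical.

```python
def smooth_levels(series, alpha):
    """Simple exponential smoothing of the level."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
        levels.append(level)
    return levels

def hybrid_forecast(series, alpha, window, network):
    """Normalize the last window by the level, predict, then rescale."""
    last_level = smooth_levels(series, alpha)[-1]
    normalized = [y / last_level for y in series[-window:]]
    return network(normalized) * last_level

def stand_in_network(normalized_window):
    """Placeholder for the RNN: returns the mean of the normalized inputs."""
    return sum(normalized_window) / len(normalized_window)

levels = smooth_levels([10.0, 10.0, 10.0, 10.0], alpha=0.5)
forecast = hybrid_forecast([10.0, 10.0, 10.0, 10.0], alpha=0.5, window=3, network=stand_in_network)
```

Because the network only sees normalized inputs, one set of weights can serve series on very different scales.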
Summary of performance
Methods compared: Classical autoregressive models | Bayesian models (GAM/structural) | Classical machine learning | Deep learning approaches (e.g. DeepAR, Deep Factors)
Criteria: Scalability | Info sharing across ts | Cold-start predictions | Uncertainty estimation | Unevenly spaced time series *
* Chen et al., Neural ordinary differential equations, 2018 / Futoma et al., 2017, Multitask GP + RNN
BACKUP SLIDES
Deep State Space (Amazon)
Level-trend model parametrization:
DeepAR (Amazon)
Step 1 Step 2 Step 3
Training procedure:
- Predict parameters (e.g. mu,
sigma)
- Compute likelihood of the
prediction (can be Gaussian as we
have seen with Deep Ensembles)
*
- Sample next point
* Likelihood/loss is customizable: Gaussian/negative
binomial for count data + overdispersion
Training
Prediction (~ Monte Carlo)
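The prediction loop (predict parameters, sample the next point, feed it back) can be sketched as ancestral sampling. Assumptions: a Gaussian likelihood, and a toy function in place of the trained network that maps the observed window to (mu, sigma); the names are hypothetical.

```python
import random

def sample_paths(history, predict_params, horizon, n_paths, rng):
    """Monte Carlo forecasting: each path feeds its own samples back in."""
    paths = []
    for _ in range(n_paths):
        window = list(history)
        path = []
        for _ in range(horizon):
            mu, sigma = predict_params(window)  # predict parameters
            z = rng.gauss(mu, sigma)            # sample next point
            path.append(z)
            window.append(z)                    # condition on the sample
        paths.append(path)
    return paths

def toy_params(window):
    """Hypothetical stand-in for the network: decay towards zero."""
    return 0.9 * window[-1], 0.1

paths = sample_paths([1.0], toy_params, horizon=5, n_paths=200, rng=random.Random(0))
first_step_mean = sum(p[0] for p in paths) / len(paths)
```

Quantiles computed across `paths` at each horizon step give the predictive intervals.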

Time series deep learning

  • 1. Time series analysis and prediction in the deep learning era Alberto Arrigoni, PhD February 2019
  • 2. Time series: analysis and prediction What will the future hold? (figure labels: Past, Now, Future)
  • 3. Time series applications + context Time series prediction: e.g. demand/sales forecasting... Use prediction for anomaly detection: e.g. manufacturing settings... Counterfactual prediction: e.g. marketing campaigns... Show ads Counterfactual
  • 5. Time series prediction methods (non-comprehensive list) Classical autoregressive models Bayesian AR models General machine learning approaches Deep learning t+3
  • 6. Time series prediction and sales forecasting: issues E.g. retail businesses (1) Number of time series (~ thousands) [the SCALE problem] (2) Time series are often highly erratic, intermittent or bursty (...and on highly different scales) (figure labels: ~ 10 items, 2 items, Product A, Product B, ...)
  • 7. Time series prediction and sales forecasting: issues (3) Time series belong to a hierarchy of products/categories. E.g. online retailer selling clothes: Clothes (total sales), T-shirts total sales, Nike t-shirts, Adidas t-shirts (figure labels: ~ 100, ~ 1000, Now) (4) For new products historical data is missing (the cold-start problem)
  • 8. Classical autoregressive models Workflow: - Variance stabilization by log or Box-Cox transformation - De-trending by differencing - Estimate model order (AIC, BIC) - Fit model parameters (maximum likelihood) for the autoregressive component and the moving average component - Test residuals for randomness
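The workflow on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the deck's code: the function names (`log_diff`, `fit_ar1`, `residuals`) are hypothetical, the model order is fixed at AR(1) instead of being selected with AIC/BIC, and the fit uses least squares, which for Gaussian noise matches the conditional maximum-likelihood estimate.

```python
import math


def log_diff(series):
    """Log-transform (variance stabilization), then first-difference (de-trending)."""
    logged = [math.log(v) for v in series]
    return [b - a for a, b in zip(logged, logged[1:])]


def fit_ar1(x):
    """Least-squares estimate of (c, phi) in x_t = c + phi * x_{t-1} + noise."""
    prev, curr = x[:-1], x[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def residuals(x, c, phi):
    """One-step-ahead errors; these should look like white noise if the model fits."""
    return [curr - (c + phi * prev) for prev, curr in zip(x[:-1], x[1:])]
```

In practice the residuals would then go through a randomness test (for example a Ljung-Box test), which is left out here to keep the sketch dependency-free.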
  • 9. Classical autoregressive models THE PROS: - Good explainability - Solid theoretical background - Very explicit model - A lot of control as it is a manual process THE CONS: - Data is seldom stationary: trend, seasonality and cycles need to be modeled as well - Computationally intensive (one model for each time series) - No information sharing across time series (apart from Hyndman’s hts approach *) - Historical data are essential for forecasting (no cold-start) * https://robjhyndman.com/publications/hierarchical/ Tech stack and packages - Rob Hyndman’s online text: https://otexts.com/fpp2/ - Infamous auto.arima (R forecast package), ets, tbats, garch, stl... - Python’s Pyramid
  • 10. General machine learning approach for ts prediction - Turn the time series prediction problem into a supervised learning problem - Can use any number of methods (linear, trees, neural networks...) - Easily extendable to support multiple input variables - Covariates can be easily handled and transformed through feature engineering E.g. feature engineering: - Aggregate histograms over time scales - Transform into Fourier space - Add low/high pass filters as variables (figure labels: Past, Yt, t, Autoregressive component, Covariates)
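Turning a series into a supervised learning problem usually means building a table of lagged values. A minimal sketch, assuming a plain Python list as input; `make_supervised` and its parameters are hypothetical names chosen for illustration:

```python
def make_supervised(series, n_lags, covariates=None):
    """Return (X, y) for one-step-ahead prediction.

    Each row of X holds the n_lags previous values (oldest first), optionally
    followed by the covariates observed at the target time step.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        row = list(series[t - n_lags:t])      # autoregressive features
        if covariates is not None:
            row.extend(covariates[t])          # e.g. holiday flag, price
        X.append(row)
        y.append(series[t])                    # value to predict
    return X, y
```

The resulting `X` and `y` can be passed to any regressor (linear model, tree ensemble, neural network); engineered features such as rolling aggregates would be appended to each row in the same way as the covariates.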
  • 11. General machine learning approach for ts prediction THE PROS: - Can model non-linear relationships - Can model the “hierarchical structure” of the time series through categorical variables - Support for covariates (predictors) + feature engineering - One model is shared among multiple time series - Cold-start predictions are possible by iteratively feeding the predictions back to the feature space THE CONS: - Feature engineering takes time - Long-term relationships between data points need to be explicitly modeled (autoregressive features) Tech stack and packages - Sklearn, PySpark for feature engineering, data reduction
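Feeding predictions back into the feature space can be sketched as a recursive forecast loop. This is an illustration only: `predict_one` stands in for any fitted one-step model and is a hypothetical callable that maps a window of lagged values to the next value.

```python
def recursive_forecast(predict_one, history, n_lags, horizon):
    """Multi-step forecast: each prediction becomes a lag feature for the next step."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]   # drop the oldest lag, append the prediction
    return forecasts
```

Errors accumulate along the horizon with this scheme, since later steps rely on earlier predictions rather than on observed values.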
  • 12. Bayesian AR models (Facebook Prophet) Prophet is a Bayesian GAM (Generalized Additive Model) * with three components: linear trend with changepoints, seasonal component, holiday-specific component (figure axes: t, Sales) 1) Detect changepoints in the time series 2) Fit linear trend parameters (k and delta) for the (piecewise) linear trends: k is the growth rate, delta the growth rate adjustment ** * Implemented using Stan ** An additional ‘offset’ term has been omitted from the formula
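A piecewise linear trend with changepoints can be written directly. The sketch below assumes the usual formulation in which the growth rate k is adjusted by delta_j after each changepoint s_j and the offset m is corrected by -s_j * delta_j so that the segments join continuously; the function name and the explicit offset argument `m` are assumptions made for illustration (the slide omits the offset term).

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend value at time t for base rate k, offset m and rate adjustments deltas."""
    rate, offset = k, m
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            rate += delta_j             # growth rate adjustment after changepoint
            offset -= s_j * delta_j     # keeps the trend continuous at s_j
    return rate * t + offset
```

For example, with k = 1, m = 0 and a single changepoint at t = 10 with delta = 1, the slope is 1 before the changepoint and 2 after it, with no jump at t = 10.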
  • 13. Bayesian AR models (Facebook Prophet) Prophet is a Bayesian GAM (Generalized Additive Model): linear trend with changepoints, seasonal component, holiday-specific component (figure axes: t, Sales) Seasonal component: e.g. P = 365 for yearly data. Need to estimate 2N parameters (a_n and b_n) using MCMC!
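The seasonal component with 2N parameters is a truncated Fourier series with period P. A minimal sketch, assuming the standard form s(t) = sum over n of a_n cos(2 pi n t / P) + b_n sin(2 pi n t / P); the function names and the interleaved coefficient layout are illustrative choices, not Prophet's internal API.

```python
import math


def fourier_features(t, period, order):
    """Return [cos_1, sin_1, ..., cos_N, sin_N] evaluated at time t (2N values)."""
    feats = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t / period
        feats.append(math.cos(angle))
        feats.append(math.sin(angle))
    return feats


def seasonal_component(t, period, coeffs):
    """Seasonality at time t for coeffs = [a_1, b_1, ..., a_N, b_N]."""
    order = len(coeffs) // 2
    return sum(c * f for c, f in zip(coeffs, fourier_features(t, period, order)))
```

With the features fixed, the seasonal term is linear in the coefficients, so fitting it amounts to estimating the 2N values a_n and b_n.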
  • 14. Bayesian AR models THE PROS: - Uncertainty estimation - Bayesian changepoint detection - User-in-the-loop paradigm (Prophet) - Black-box variational inference is revolutionizing Bayesian inference THE CONS: - Bayesian inference takes time (the “scale” issue) - One model for each time series - No information sharing among series (unless you specify a hierarchical Bayesian model with shared parameters, but still...) - Historical data are needed for prediction! - Performance is often on par ** with autoregressive models Tech stack and packages - Python/R clients for Prophet * - R package for Bayesian structural time series models: bsts * Taylor et al., Forecasting at scale ** This may open endless discussions. Bottom line: depends on your data :)
  • 15. Interlude: uncertainty estimation with deep learning - Uncertainty estimation is a prerogative of Bayesian methods. - Black box variational inference (ADVI) has sparked renewed interest in Bayesian neural networks, but we are not there yet in terms of performance - A DeepMind paper from NIPS 2017 introduces a simple yet effective way to estimate predictive uncertainty using Deep Ensembles 1) For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html 2) “Engineering Uncertainty Estimation in Neural Networks for Time Series Prediction at Uber” https://eng.uber.com/neural-networks-uncertainty-estimation/
  • 16. Interlude: Deep Ensembles - Train a deep learning model using a custom final layer which parametrizes a Gaussian distribution - Sample x from the Gaussian distribution using fitted parameters - Calculate loss to backpropagate the error (using Gaussian likelihood) (figure label: Network output)
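The Gaussian likelihood loss mentioned on this slide is the negative log-density of the observed value under the predicted mean and standard deviation. A minimal sketch for a single observation (the function name is hypothetical; a real training loop would average this over a batch inside the deep learning framework):

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under a Normal(mu, sigma^2) distribution."""
    return 0.5 * math.log(2.0 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2.0 * sigma ** 2)
```

Unlike a plain squared error, this loss penalizes both overconfident predictions (sigma too small for the observed error) and needlessly wide ones (sigma too large), which is what lets the network learn input-dependent variance.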
  • 17. Interlude: Deep Ensembles What the network is learning: different regions of the x space have different variances. Generate a synthetic dataset with different variances, then use the network from the previous slide to predict on the training set to see if it actually detects variance reduction (figure panels: SYNTHETIC TRAINING DATASET, PREDICTION ON TRAINING DATASET)
  • 18. Interlude: Deep Ensembles The authors suggest training different NNs on the same data (the whole training set) with random initialization. Ensemble networks (improve generalization power) are combined as a uniformly weighted mixture model. Predictions for regions outside of the training dataset show increasing variance (due to ensembling). In addition to ‘distribution’ modeling and ensembling, the authors suggest using the fast gradient sign method * to produce adversarial training examples (not shown here) * Goodfellow et al., 2014
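A uniformly weighted mixture of Gaussians can be summarized by a single mean and variance. The sketch below assumes each ensemble member returns a (mu, sigma) pair for the same input and matches the mixture's first two moments; `combine_ensemble` is a hypothetical helper name.

```python
import math


def combine_ensemble(mus, sigmas):
    """Mean and standard deviation of a uniformly weighted Gaussian mixture."""
    m = len(mus)
    mu_star = sum(mus) / m
    # E[y^2] of the mixture is the average of (sigma_i^2 + mu_i^2)
    second_moment = sum(s ** 2 + mu ** 2 for mu, s in zip(mus, sigmas)) / m
    var_star = max(second_moment - mu_star ** 2, 0.0)   # guard against round-off
    return mu_star, math.sqrt(var_star)
```

When the members disagree on the mean, the combined variance grows beyond the average member variance, which is consistent with the wider uncertainty outside the training region described on the slide.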
  • 19. Interlude: Deep Ensembles Custom GaussianLayer Let’s just do some extra work and define a custom layer For a TensorFlow implementation of this paper: https://arrigonialberto86.github.io/funtime/deep_ensembles.html
  • 20. Interlude: Deep Ensembles Custom layer returns both mu and sigma. Build 2 weight matrices + 2 bias terms
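The forward pass of such a layer can be sketched without any framework. This is not the TensorFlow layer from the linked post: it is a plain-Python illustration for a scalar output, with hypothetical names, and it assumes a softplus transform plus a small constant to keep sigma strictly positive.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z)); always non-negative."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gaussian_layer(x, w_mu, b_mu, w_sigma, b_sigma):
    """Map features x to (mu, sigma) using two weight vectors and two bias terms."""
    mu = sum(xi * wi for xi, wi in zip(x, w_mu)) + b_mu
    raw = sum(xi * wi for xi, wi in zip(x, w_sigma)) + b_sigma
    sigma = softplus(raw) + 1e-6        # strictly positive standard deviation
    return mu, sigma
```

In a framework implementation the two weight sets and biases would be trainable variables, and the pair (mu, sigma) would feed a Gaussian likelihood loss.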
  • 21. DeepAR (Amazon) Instead of fitting separate models for each time series we create a global model from related time series to handle widely-varying scales through rescaling and velocity-based sampling. (figure labels: Different scales, Probabilities, ~1000 time series, Past, Future, Covariates) Flunkert et al., 2017
  • 22. DeepAR (Amazon) - Use LSTMs to model interactions in the time series (hidden states h_{t-1}, h_t, h_{t+1}) - As seen with the Deep Ensemble architecture, we can predict parameters of distributions at each time point (theta vector) - Time series need to be scaled for the network to learn time-varying dynamics
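The scaling step can be sketched as dividing each series by an item-specific factor. The sketch assumes a mean-based factor of one plus the average of the conditioning range, which is how the scale is commonly described for DeepAR; treat both the formula and the helper names as assumptions rather than the reference implementation.

```python
def scale_factor(history):
    """Item-level scale: 1 + mean of the observed history (never zero for counts)."""
    return 1.0 + sum(history) / len(history)


def rescale(history):
    """Return the scaled history together with the factor needed to undo the scaling."""
    nu = scale_factor(history)
    return [z / nu for z in history], nu
```

Network outputs are then mapped back to the original scale, for instance by multiplying a predicted mean by the same factor.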
  • 23. DeepAR (Amazon) (figure panels: Training, Prediction) * Likelihood/loss is customizable: Gaussian, or negative binomial for count data with overdispersion
  • 24. DeepAR (Amazon) The mandatory ‘AirPassengers’ prediction example (results shown on training set). Admittedly, this is not the use case Amazon had in mind... For a commentary + code review: https://arrigonialberto86.github.io/funtime/deepar.html
  • 25. DeepAR (Amazon) - Long-term relationships are handled by design using LSTMs - One model is fitted for all the time series - The hierarchical ts structure and inter-dependencies are captured by using covariates (even holidays, recurrent events etc...) - The model can be used for cold-start predictions (using categorical covariates with ‘descriptive’ product information) - Hassle-free uncertainty estimation DeepAR and the AWS ecosystem AWS SageMaker
  • 26. Deep State Space (NIPS 2018) * A state space model or SSM is just like a Hidden Markov Model, except the hidden states are continuous. It consists of a latent state (l_t) update and an observation (z_t) update. In normal settings we would need to fit these parameters for each time series (figure labels: z_{t-1}, z_t, z_{t+1}, ???) * Rangapuram et al., 2018, Deep State Space Models for Time Series Forecasting
  • 27. Deep State Space (NIPS 2018) Training: compute the negative log-likelihood, derive the time-varying SS parameters using backpropagation. Prediction: use Kalman filtering to estimate l_t, then recursively apply the transition equation and the observation model to generate prediction samples
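Kalman filtering is easiest to see on the simplest state space model. The sketch below is a scalar local-level filter with fixed noise variances `q` (state) and `r` (observation); in the deep state space setting these parameters would vary over time and come from the network, which is not modeled here. All names are illustrative.

```python
def kalman_local_level(observations, q, r, l0=0.0, p0=1e6):
    """Filter a local-level model: l_t = l_{t-1} + w_t,  z_t = l_t + v_t.

    Returns a list of (filtered level, filtered variance), one per observation.
    """
    level, var = l0, p0
    filtered = []
    for z in observations:
        var_pred = var + q                      # predict: uncertainty grows by q
        gain = var_pred / (var_pred + r)        # Kalman gain
        level = level + gain * (z - level)      # update with the innovation
        var = (1.0 - gain) * var_pred
        filtered.append((level, var))
    return filtered
```

Forecast samples can then be drawn by starting from the last filtered state and repeatedly applying the transition and observation equations with fresh noise.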
  • 28. Deep State Space (NIPS 2018) - Long-term relationships are handled by design using LSTMs - One model is fitted for all the time series - The hierarchical ts structure and inter-dependencies are captured by ad-hoc design and components of the SS model (even holidays, recurrent events etc...) - The model can be used for cold-start predictions (using categorical covariates with ‘descriptive’ product information)
  • 29. Going forward: Deep factors with GPs * The combination of probabilistic graphical models with deep neural networks has been an active research area recently. Global DNN backbone and local Gaussian Process (GP): the main idea is to represent each time series as a combination of a global time series and a corresponding local model. Backpropagation is used to find the RNN parameters that produce the global factors (g_t) + GP hyperparameters (figure labels: g_t, RNN, z_it + covariates) * Maddix et al., “Deep Factors with Gaussian Processes for Forecasting”, NIPS 2018
  • 30. M4 forecasting competition winner algo (Uber, 2018) The winning idea is often the simplest! Hybrid Exponential Smoothing-Recurrent Neural Networks (ES-RNN) method. It mixes hand-coded parts like ES formulas with a black-box recurrent neural network (RNN) forecasting engine. yt-1 yt yt+1 Deseasonalized and normalized vector of covariates + previous state RNN results are now part of a parametric model
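The hand-coded exponential smoothing part can be sketched as a preprocessing step that produces deseasonalized, normalized values for the network. This is a simplified illustration under stated assumptions (multiplicative seasonality, no trend, fixed smoothing coefficients `alpha` and `gamma`, known initial seasonal factors), not the competition code, and every name is hypothetical.

```python
def es_preprocess(y, seasonality, alpha, gamma):
    """Exponentially smoothed level and seasonality, plus normalized inputs.

    seasonality: initial multiplicative factors, one per position in the cycle.
    Returns (levels, seasonal factors, y_t / (level_t * seasonal_t)).
    """
    seas = list(seasonality)
    level = y[0] / seas[0]
    levels, normalized = [], []
    for t, y_t in enumerate(y):
        if t > 0:
            level = alpha * y_t / seas[t] + (1.0 - alpha) * level
        # factor for the same position in the next cycle
        seas.append(gamma * y_t / level + (1.0 - gamma) * seas[t])
        levels.append(level)
        normalized.append(y_t / (level * seas[t]))
    return levels, seas, normalized
```

In the hybrid method the smoothing coefficients are learned jointly with the network weights, and the network's output is multiplied back by level and seasonality to obtain the forecast.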
  • 31. Summary of performance (table) Columns: Classical autoregressive models, Bayesian models (GAM/structural), Classical machine learning, Deep learning approaches. Rows: Scalability, Info sharing across ts, Cold-start predictions, Uncertainty estimation, Unevenly spaced time series *. (labels: DeepAR, Deep Factors) * Chen et al., Neural ordinary differential equations, 2018 / Futoma et al., 2017, Multitask GP + RNN
  • 33. Deep State Space (Amazon) Level-trend model parametrization:
  • 34. DeepAR (Amazon) Training procedure: - Step 1: predict parameters (e.g. mu, sigma) - Step 2: compute likelihood of the prediction (can be Gaussian as we have seen with Deep Ensembles) * - Step 3: sample next point (figure panels: Training, Prediction (~ Monte Carlo)) * Likelihood/loss is customizable: Gaussian, or negative binomial for count data with overdispersion
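The Monte Carlo prediction step can be sketched as ancestral sampling: predict distribution parameters, draw the next point, append it to the input, and repeat. In the sketch `predict_params` is a hypothetical stand-in for the trained network (it maps the values seen so far to a Gaussian mean and standard deviation); the helper names and the Gaussian choice are assumptions.

```python
import random


def sample_paths(predict_params, history, horizon, n_paths, seed=0):
    """Draw n_paths future trajectories by feeding each sample back as input."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        window = list(history)
        path = []
        for _ in range(horizon):
            mu, sigma = predict_params(window)   # step 1: predict parameters
            z = rng.gauss(mu, sigma)             # step 3: sample the next point
            path.append(z)
            window.append(z)                     # the sample becomes an input
        paths.append(path)
    return paths


def quantile(values, q):
    """Empirical quantile by sorting (no interpolation)."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]
```

Prediction intervals at each horizon step are read off the sampled paths, for example by taking the 0.1 and 0.9 quantiles of the values at that step across all paths.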