© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cyrus Vahid - Principal Architect – AWS Deep Learning
Amazon Web Services
Multivariate Time Series
Autoregressive Models
• Hyndman [1] defines autoregressive models as:
"In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself."
• AR(p) model:
$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$
Autoregressive Models
$y_t = 18 - 0.8 y_{t-1} + \varepsilon_t$ (AR(1))    $y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + \varepsilon_t$ (AR(2))
• Autoregressive models are remarkably flexible at handling a wide range of different time series patterns.
Figures 1, 2: Hyndman [1]
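A minimal numpy sketch simulating the two example processes above (function name and seed are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def simulate_ar(c, phis, n=200, sigma=1.0):
    """Simulate y_t = c + sum_k phis[k] * y_{t-1-k} + eps_t."""
    p = len(phis)
    y = np.zeros(n + p)
    for t in range(p, n + p):
        y[t] = c + sum(phi * y[t - 1 - k] for k, phi in enumerate(phis)) + rng.normal(0.0, sigma)
    return y[p:]

ar1 = simulate_ar(c=18.0, phis=[-0.8])      # y_t = 18 - 0.8 y_{t-1} + eps_t
ar2 = simulate_ar(c=8.0, phis=[1.3, -0.7])  # y_t = 8 + 1.3 y_{t-1} - 0.7 y_{t-2} + eps_t
print(ar1.mean(), ar2.mean())               # near 18/1.8 = 10 and 8/0.4 = 20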
Challenges faced by existing models
• Most methods are designed to forecast individual series or small groups. A new set of problems has emerged:
• Forecasting a large number of individual or grouped time series.
• Learning a single global model while dealing with the widely different scales of time series that are otherwise related.
• Many older models cannot account for environmental inputs (covariates).
• The cold-start problem: forecasting new items with little or no history.
Goal
• The ability to learn and generalize from similar series allows us to fit more complex models without overfitting.
DeepAR
Solution
• DeepAR is a forecasting model based on autoregressive RNNs, which learns a single global model from the historical data of all time series in the dataset. [2]
DeepAR Advantages
• Minimal manual feature engineering.
• Provides forecasts for series with little or no history.
• Can incorporate a wide range of likelihood models.
• Provides consistent estimates for subgroups.
DeepAR Model
• Goal: given the observed values $z_{i,1:t_0-1}$ of series $i$ and covariates $x_{i,1:T}$, estimate the probability distribution of the next steps $z_{i,t_0:T}$; formally, model the conditional distribution $P(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T})$.
• This distribution is parameterized by the output of an autoregressive RNN:
$Q_\Theta(z_{i,t_0:T} \mid z_{i,1:t_0-1}, x_{i,1:T}) = \prod_{t=t_0}^{T} Q_\Theta(z_{i,t} \mid z_{i,1:t-1}, x_{i,1:T}) = \prod_{t=t_0}^{T} \ell(z_{i,t} \mid \theta(h_{i,t}, \Theta))$
$h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, x_{i,t}, \Theta)$
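A minimal numpy sketch of this factorization; rnn_step, theta_fn, and log_l are hypothetical stand-ins for the network cell, the parameter projection, and the chosen likelihood, not DeepAR's actual internals:

import numpy as np

def rnn_step(h, z_prev, x, W):
    # h_{i,t} = h(h_{i,t-1}, z_{i,t-1}, x_{i,t}, Theta)
    return np.tanh(W["hh"] @ h + W["zh"] * z_prev + W["xh"] @ x)

def log_likelihood(z, x, W, theta_fn, log_l):
    # log prod_t l(z_t | theta(h_t)) = sum_t log l(z_t | theta(h_t))
    h = np.zeros(W["hh"].shape[0])
    total, z_prev = 0.0, 0.0
    for t in range(len(z)):
        h = rnn_step(h, z_prev, x[t], W)     # state carries the history
        total += log_l(z[t], *theta_fn(h))   # e.g. theta(h_t) = (mu, sigma)
        z_prev = z[t]                        # condition on the observed value
    return total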
DeepAR Architecture
• DeepAR is an encoder-decoder architecture: it takes a number of input steps and covariates, feeds the encoder output to the decoder, and predicts the number of steps given as the horizon.
Likelihood Model – Gaussian
• Gaussian likelihood for real-valued data:
$\ell_G(z \mid \mu, \sigma) = (2\pi\sigma^2)^{-1/2} \exp\!\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$
$\mu(h_{i,t}) = w_\mu^\top h_{i,t} + b_\mu$ (affine function of the network output)
$\sigma(h_{i,t}) = \log\!\left(1 + e^{w_\sigma^\top h_{i,t} + b_\sigma}\right)$ (softplus activation keeps $\sigma$ positive)
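A sketch of the Gaussian output layer under these definitions: $\mu$ is an affine function of the network output h, and $\sigma$ passes through a softplus to stay positive (names are illustrative):

import numpy as np

def softplus(a):
    return np.log1p(np.exp(a))

def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    mu = w_mu @ h + b_mu
    sigma = softplus(w_sigma @ h + b_sigma)   # log(1 + exp(.)) > 0
    return mu, sigma

def gaussian_log_lik(z, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu)**2 / (2 * sigma**2)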
Likelihood Model – Negative Binomial
• Negative-binomial likelihood for positive count data. The negative binomial distribution underlies the stochasticity in over-dispersed count data. [3]
$\ell_{NB}(z \mid \mu, \alpha) = \frac{\Gamma(z + 1/\alpha)}{\Gamma(z+1)\,\Gamma(1/\alpha)} \left(\frac{1}{1+\alpha\mu}\right)^{1/\alpha} \left(\frac{\alpha\mu}{1+\alpha\mu}\right)^{z}$
$\mu(h_{i,t}) = \log\!\left(1 + e^{w_\mu^\top h_{i,t} + b_\mu}\right)$
$\alpha(h_{i,t}) = \log\!\left(1 + e^{w_\alpha^\top h_{i,t} + b_\alpha}\right)$
• $\mu$ and $\alpha$ are both outputs of a dense layer with softplus activation.
• $\alpha$ scales the variance relative to the mean.
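A sketch of this likelihood in log space, using scipy's gammaln for numerical stability (an implementation choice, not from the paper):

import numpy as np
from scipy.special import gammaln

def neg_binomial_log_lik(z, mu, alpha):
    """log l_NB(z | mu, alpha) for a non-negative integer count z."""
    inv_a = 1.0 / alpha
    return (gammaln(z + inv_a) - gammaln(z + 1) - gammaln(inv_a)
            + inv_a * np.log(1.0 / (1.0 + alpha * mu))     # (1/(1+alpha*mu))^(1/alpha)
            + z * np.log(alpha * mu / (1.0 + alpha * mu))) # (alpha*mu/(1+alpha*mu))^z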
Scaling
• Non-linearity results in loss of scale information.
• Solution:
• Divide the autoregressive inputs by an item-dependent scale factor.
• Multiply the scale-dependent likelihood parameters by the same factor.
• $\nu_i = 1 + \frac{1}{t_0} \sum_{t=1}^{t_0} z_{i,t}$
Comparison
Code
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/deepar_electricity/DeepAR-Electricity.ipynb
LSTNet
Challenge
• Autoregressive models may fail to capture a mixture of long- and short-term patterns.
Solution – LSTNet[4]
• The Long- and Short-term Time-series Network (LSTNet) is designed to capture a mix of long- and short-term patterns in multivariate time-series data.
Concept
• A CNN discovers local dependencies.
• RNNs capture long-term dependencies.
• An autoregressive model handles scale.
Problem Formulation
• Given $Y = \{y_1, y_2, \dots, y_T\}$ where $y_t \in \mathbb{R}^n$ and $n$ is the variable dimension, the aim is to predict $y_{T+h}$, where $h$ is the horizon.
• Similarly, given $\{y_1, y_2, \dots, y_{T+1}\}$, we want to predict $y_{T+1+h}$.
• The input matrix is denoted $X = [y_1, y_2, \dots, y_T] \in \mathbb{R}^{n \times T}$.
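A toy sketch of this setup, shapes only (names are illustrative):

import numpy as np

n, T, h = 4, 100, 24              # variables, history length, horizon
Y = np.random.randn(n, T + h)     # toy multivariate series y_1 ... y_{T+h}
X = Y[:, :T]                      # input matrix [y_1, ..., y_T] in R^{n x T}
target = Y[:, T + h - 1]          # y_{T+h}, the vector to be predicted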
Architecture
Convolutional Component
• Extracts short-term patterns in the time dimension as well as local dependencies between variables.
• Multiple filters of width $\omega$ and height $n = \mathit{num\_var}$ (the full variable dimension).
• $h_k = \mathrm{ReLU}(W_k * X + b_k)$
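A numpy sketch of a single full-height filter, written as an explicit loop for clarity rather than efficiency:

import numpy as np

def conv_full_height(X, W, b):
    """h[t] = ReLU(sum(W * X[:, t:t+omega]) + b); X is n x T, W is n x omega."""
    n, T = X.shape
    omega = W.shape[1]
    out = np.empty(T - omega + 1)
    for t in range(T - omega + 1):
        out[t] = np.maximum(np.sum(W * X[:, t:t + omega]) + b, 0.0)
    return out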
Recurrent Component
• The output of the convolutional layer is fed simultaneously to the Recurrent and Recurrent-skip layers (next slide).
• The RNN component is a GRU layer with ReLU activation.*
$r_t = \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r)$
$u_t = \sigma(x_t W_{xu} + h_{t-1} W_{hu} + b_u)$
$c_t = \mathrm{ReLU}(x_t W_{xc} + r_t \odot (h_{t-1} W_{hc}) + b_c)$
$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot c_t$
* The paper's implementation uses tanh, but the authors claim that ReLU performs better.
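A numpy sketch of this update rule; W and b are dictionaries of the weight matrices and biases (an illustrative layout):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(h_prev, x, W, b):
    r = sigmoid(x @ W["xr"] + h_prev @ W["hr"] + b["r"])               # reset gate
    u = sigmoid(x @ W["xu"] + h_prev @ W["hu"] + b["u"])               # update gate
    c = np.maximum(x @ W["xc"] + r * (h_prev @ W["hc"]) + b["c"], 0.0)  # ReLU candidate
    return (1 - u) * h_prev + u * c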
Recurrent-skip Component
• The Recurrent-skip component is a recurrent layer that captures long-term dependencies at the appropriate lag $p$; for instance, hourly electricity consumption has a lag of 24 time steps.
$r_t = \sigma(x_t W_{xr} + h_{t-p} W_{hr} + b_r)$
$u_t = \sigma(x_t W_{xu} + h_{t-p} W_{hu} + b_u)$
$c_t = \mathrm{ReLU}(x_t W_{xc} + r_t \odot (h_{t-p} W_{hc}) + b_c)$
$h_t = (1 - u_t) \odot h_{t-p} + u_t \odot c_t$
Combining Recurrent and Recurrent-skip Outputs
• A dense layer combines the outputs of the Recurrent and Recurrent-skip layers.
Temporal Attention Layer
• For non-seasonal data, a fixed skip step $p$ is not useful.
• In such cases an attention mechanism is used, which learns a weighted combination of the hidden representations at each window position of the input matrix.
$\alpha_t = \mathrm{AttnScore}(H_t^R, h_{t-1}^R)$, $\alpha_t \in \mathbb{R}^q$: attention weights
$H_t^R = [h_{t-q}^R, \dots, h_{t-1}^R]$: matrix stacking the hidden states column-wise
$c_t = H_t \alpha_t$: context vector
$h_t^D = W[c_t; h_{t-1}^R] + b$: the output is the concatenation of the context vector and the last window hidden representation
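A numpy sketch of the attention step; dot-product scoring stands in for the unspecified AttnScore function (an assumption), and Wo, bo are the output projection:

import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def temporal_attention(H, h_last, Wo, bo):
    """H: m x q matrix of the q window hidden states; h_last: h_{t-1}^R (m-dim)."""
    alpha = softmax(H.T @ h_last)                 # attention weights, in R^q
    c = H @ alpha                                 # context vector
    return Wo @ np.concatenate([c, h_last]) + bo  # W[c_t; h_{t-1}^R] + b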
Autoregressive Component
• The autoregressive component overcomes the loss of scale caused by the non-linearity of the DNN.
• It is a linear AR model.
Final Output
• The final output is obtained by integrating the AR and DNN outputs:
$\hat{Y}_t = h_t^D + h_t^L$
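A sketch of the combination; the AR window length q and the variable names are illustrative assumptions:

import numpy as np

def lstnet_output(X, h_dnn, w_ar, b_ar):
    """X: n x T input; h_dnn: n-dim DNN head output; w_ar: length-q AR weights."""
    q = len(w_ar)
    h_ar = X[:, -q:] @ w_ar + b_ar   # linear AR on each variable's recent values
    return h_dnn + h_ar              # Y_hat_t = h_t^D + h_t^L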
Objective Function
• The paper suggests using either an L1 or an L2 loss function, e.g. the squared-error objective $\min_\Theta \sum_{t \in \Omega_{\mathrm{train}}} \lVert Y_t - \hat{Y}_{t-h} \rVert_F^2$.
$F$: Frobenius norm, $\lVert A \rVert_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2}$
$h$: horizon
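Both losses in numpy, with Y and Y_hat as n × T matrices of targets and predictions:

import numpy as np

def l2_loss(Y, Y_hat):
    return np.sum((Y - Y_hat) ** 2)   # squared Frobenius norm ||Y - Y_hat||_F^2

def l1_loss(Y, Y_hat):
    return np.sum(np.abs(Y - Y_hat))  # element-wise absolute error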
Metrics
• Root Relative Squared Error (RSE): lower is better.
• Empirical Correlation Coefficient (CORR): higher is better.
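Minimal numpy sketches of both metrics, following the definitions in [4]; Y and Y_hat are n × T matrices of targets and predictions:

import numpy as np

def rse(Y, Y_hat):
    """Root relative squared error: RMSE scaled by the target's spread."""
    return np.sqrt(np.sum((Y - Y_hat) ** 2)) / np.sqrt(np.sum((Y - Y.mean()) ** 2))

def corr(Y, Y_hat):
    """Mean over variables of the per-variable correlation across time."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    Pc = Y_hat - Y_hat.mean(axis=1, keepdims=True)
    num = (Yc * Pc).sum(axis=1)
    den = np.sqrt((Yc ** 2).sum(axis=1) * (Pc ** 2).sum(axis=1))
    return np.mean(num / den)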
Comparison
Code
https://github.com/safrooze/LSTNet-Gluon
References
1. Rob J. Hyndman, George Athanasopoulos. Forecasting: Principles and Practice. https://www.otexts.org/fpp/8/3
2. Valentin Flunkert, David Salinas, Jan Gasthaus. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. https://arxiv.org/abs/1704.04110
3. Sherry Towers. Negative Binomial Likelihood. http://sherrytowers.com/2014/07/11/negative-binomial-likelihood/
4. Guokun Lai et al. Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks. https://arxiv.org/pdf/1703.07015.pdf