Data Analysis:
Statistical Modeling and Computation in Applications
Time Series Analysis: Model Selection, Linear Processes, LLR
Stefanie Jegelka (and Caroline Uhler)
Overview
Determining model order for AR models
Partial Autocorrelation
Akaike Information Criterion
Cross-validation
Linear Processes
Subseasonal weather forecasting and Local linear regression
Recap: Time series models
White Noise: Xt = Wt;
Wt independent, mean zero, same variance σw²
Autoregressive AR(p):
Xt = φ1Xt−1 + φ2Xt−2 + … + φpXt−p + Wt
Moving Average MA(q):
Xt = Wt + θ1Wt−1 + θ2Wt−2 + … + θqWt−q
ARMA / ARIMA:
Xt = φ1Xt−1 + … + φpXt−p + Wt + θ1Wt−1 + … + θqWt−q
(ARIMA: ARMA after differencing)
Fitting a time series: Overview (Part I)
1 transform the series to make it stationary
log-transform
remove trends / seasonality
difference successively
2 check for white noise (ACF)
3 if stationary: plot the autocorrelation. If it cuts off at a finite lag, fit MA (the ACF gives the order); otherwise fit AR.
Fitting AR(p):
1 compute the PACF to get the order
2 estimate the coefficients φk and the noise variance σw² via the Yule-Walker equations (sketch below)
3 compute the residuals, test for white noise
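As a concrete illustration of step 2, here is a minimal sketch assuming Python with numpy and statsmodels; the simulated AR(2) series and its parameter values are illustrative, not from the lecture:

```python
import numpy as np
from statsmodels.regression.linear_model import yule_walker

# Simulate an AR(2) series: X_t = 1.5 X_{t-1} - 0.75 X_{t-2} + W_t
rng = np.random.default_rng(0)
n, phi = 5000, np.array([1.5, -0.75])
x = np.zeros(n)
w = rng.normal(scale=1.0, size=n)
for t in range(2, n):
    x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + w[t]

# Yule-Walker estimates of the AR coefficients and the noise scale
phi_hat, sigma_hat = yule_walker(x, order=2, method="mle")
print(phi_hat)   # should be close to [1.5, -0.75]
print(sigma_hat) # should be close to 1.0 (noise standard deviation)
```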
Model order for AR(p)
MA(q) model: the autocorrelation reveals the order q.
But not for an AR(p) model!
Example AR(1): Xt = φXt−1 + Wt
cov(Xt, Xt−2) = φ²γ(0), so corr(Xt, Xt−2) = φ² ≠ 0.
The correlation propagates via Xt−1.
Idea: if we remove the linear effect of Xt−1, then Xt and Xt−2 should be uncorrelated.
Partial Autocorrelation (PACF)
Partial correlation of X, Y given Z:
regress X on Z; regress Y on Z
ρXY|Z = corr(X − X̂, Y − Ŷ)
For time series: partial autocorrelation of Xt, Xt−h.
Remove the effect of all variables “in between”, Xt−1, …, Xt−h+1.
Equivalent computation:
fit an AR(h) model to obtain φ̂h1, …, φ̂hh
the (estimated) partial autocorrelation is the last coefficient φ̂hh
For AR(p): φhh = 0 for h > p.
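A quick way to see this cutoff in practice, as a sketch assuming Python with statsmodels (the simulated AR(2) series is illustrative): the sample PACF of an AR(p) series should be large for lags up to p and near zero beyond.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

# Simulate the same illustrative AR(2) process as above
rng = np.random.default_rng(1)
n = 5000
x = np.zeros(n)
w = rng.normal(size=n)
for t in range(2, n):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + w[t]

# Sample PACF: large at lags 1 and 2, roughly zero for h > 2
alpha = pacf(x, nlags=10)
for h, a in enumerate(alpha):
    print(f"lag {h:2d}: {a:+.3f}")
# Values beyond lag 2 should fall inside the ~±2/sqrt(n) noise band.
```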
Autocorrelation (ACF) and PACF for AR(2)
[Figure: the ACF and PACF (lags 1–20) of an AR(2) model with φ1 = 1.5 and φ2 = −0.75 (Shumway & Stoffer, Fig. 3.4). The ACF tails off, while the PACF is nonzero at lags 1 and 2 and cuts off for h > 2. For h ≤ p, φ11, …, φpp need not be zero, and in fact φpp = φp.]
Partial ACF as a diagnostic tool: example from last lecture
Example: Xt = Tt + Yt, the sum of
a linear trend (Tt = 50 + 3t) and an AR(1) process (Yt = 0.8Yt−1 + Wt, σW = 20).
[Figure: top: the series; bottom: autocorrelation and partial autocorrelation of the residuals after fitting only a linear model.]
Overview: ACF, PACF and model order

            ACF              PACF
AR(p)       decays           zero for h > p
MA(q)       zero for h > q   decays
ARMA(p,q)   decays           decays
Akaike Information Criterion (AIC)
Main idea: trade off data fit against model complexity.
For a model with k parameters:
AIC(k) = −2 · log-likelihood (model fit) + 2k (complexity)
Pick the order whose fitted model minimizes the AIC (sketch below).
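A minimal order-selection sketch, assuming Python with statsmodels' AutoReg; the candidate orders and the simulated data are illustrative:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Illustrative AR(2) data
rng = np.random.default_rng(2)
n = 2000
x = np.zeros(n)
w = rng.normal(size=n)
for t in range(2, n):
    x[t] = 1.5 * x[t - 1] - 0.75 * x[t - 2] + w[t]

# Fit AR(k) for a range of candidate orders and compare AIC
for k in range(1, 7):
    res = AutoReg(x, lags=k, trend="n").fit()
    print(f"AR({k}): AIC = {res.aic:.1f}")
# The smallest AIC should occur at (or near) the true order k = 2.
```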
Cross-validation for time series
“usual” cross-validation: randomly partition the data into k folds;
repeatedly train on k − 1 folds, test on the remaining one.
random folds ignore temporal order; still, cross-validation works in special cases for AR models (Bergmeir, Hyndman, Koo 2015)
alternative: evaluation on a rolling forecasting origin — train on the past only, forecast forward, then roll the origin ahead (sketch below)
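A rolling-forecasting-origin evaluation, as a minimal sketch assuming Python with statsmodels; the one-step-ahead horizon, the starting origin t0, the error metric, and the helper name `rolling_origin_mse` are all illustrative choices:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def rolling_origin_mse(x, order, t0):
    """One-step-ahead forecasts: for each t >= t0, fit AR(order)
    on x[:t] and predict x[t]; return the mean squared error."""
    errors = []
    for t in range(t0, len(x)):
        res = AutoReg(x[:t], lags=order, trend="n").fit()
        pred = res.predict(start=t, end=t)[0]  # forecast the next point
        errors.append((x[t] - pred) ** 2)
    return np.mean(errors)

# Illustrative AR(1) data
rng = np.random.default_rng(3)
x = np.zeros(500)
w = rng.normal(size=500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + w[t]

for p in (1, 2, 3):
    print(f"AR({p}): rolling MSE = {rolling_origin_mse(x, p, t0=400):.3f}")
```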
Relating MA and AR?
Autoregressive AR(p):
Xt = φ1Xt−1 + φ2Xt−2 + … + φpXt−p + Wt
Moving Average MA(q):
Xt = Wt + θ1Wt−1 + θ2Wt−2 + … + θqWt−q
Relating MA and AR: Linear Process
Xt = Σ_{j=−∞}^{∞} ψj Wt−j,   with Σ_j |ψj| < ∞
causal: ψj = 0 whenever j < 0 (Xt is a function of the past only)
E[Xt] = 0
γX(t, t + h) = σw² Σ_{i=−∞}^{∞} ψi ψi+h = γX(h)   (depends only on the lag h)
A linear process is stationary!
MA(q) is a causal linear process.
What about AR?
AR(1) as Linear Process
Xt = Wt + φ1Xt−1 = Σ_{j=0}^{∞} φ1^j Wt−j   (by repeated substitution)
The series converges if |φ1| < 1. Then AR(1) is causal and stationary (numeric check below).
General result: AR(p) is stationary & causal if its linear-process representation converges.
Similarly, MA(1) can be written as an infinite AR process (under analogous convergence conditions) in the form Wt = Xt − φ1Xt−1 − φ2Xt−2 − … If this converges, the process is called invertible.
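To make the AR(1) = MA(∞) identity concrete, a small numeric check, assuming Python/numpy; the truncation level J is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, n, J = 0.8, 300, 200  # |phi| < 1; series length; truncation level

w = rng.normal(size=n + J)  # extra noise so every term has a long history

# AR(1) recursion: X_t = phi * X_{t-1} + W_t
x_ar = np.zeros(n + J)
for t in range(1, n + J):
    x_ar[t] = phi * x_ar[t - 1] + w[t]

# Truncated linear-process form: X_t ≈ sum_{j=0}^{J} phi^j W_{t-j}
coeffs = phi ** np.arange(J + 1)
x_lp = np.array([coeffs @ w[t - J:t + 1][::-1] for t in range(J, n + J)])

# The two constructions agree up to a geometrically small truncation error
print(np.max(np.abs(x_ar[J:] - x_lp)))  # on the order of phi^J, negligible
```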
Subseasonal weather forecasting and Local linear regression
Based on: J. Hwang, P. Orenstein, K. Pfeiffer, J. Cohen, L. Mackey. Improving Subseasonal Forecasting in the Western U.S. with Machine Learning. KDD 2019.
Subseasonal Rodeo
[Images: Lester Mackey]
Subseasonal Rodeo: measurements
multiple time series Yt,g, indexed by grid points g:
temperature
precipitation
sea surface temperature & sea ice concentration
ENSO index (pressure, wind, SST, temperature, cloudiness)
Madden-Julian oscillation
relative humidity / pressure
North American Multi-Model Ensemble
work with anomalies, the deviations from the climatological mean for that month and day:
at = yt − c_monthday(t)
accuracy measure (“skill”): the cosine similarity between predicted and observed anomalies,
skill(ât, at) = cos(ât, at) = ⟨ât, at⟩ / (‖ât‖ ‖at‖)
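The skill measure is just a cosine similarity over the grid; a short sketch assuming Python/numpy, with made-up anomaly vectors:

```python
import numpy as np

def skill(a_hat, a):
    """Cosine similarity between predicted and observed anomaly vectors."""
    return (a_hat @ a) / (np.linalg.norm(a_hat) * np.linalg.norm(a))

a_hat = np.array([0.5, -1.2, 0.3])  # predicted anomalies at 3 grid points
a = np.array([0.4, -1.0, 0.8])      # observed anomalies
print(skill(a_hat, a))  # 1 = perfect, 0 = uncorrelated, -1 = opposite
```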
Questions
What model should we fit?
Which variables should we use for prediction?
Local Linear Regression (no time series)
the data come from a nonlinear function, but we don’t know its exact structure
locally, it is well approximated by a linear function
Idea: to predict at x0, fit a linear model locally around x0, using only the points
D = {xi : ‖xi − x0‖ ≤ h}
Obtain one β for each x0:
β̂x0 = arg min_β Σ_{xi∈D} wi (yi − β⊤(xi − x0))²
larger h: smoother fitted function
weighting: can use a kernel, wi = K((xi − x0)/h)
(K(u) = 0 for |u| > 1, K(u) = K(−u), K(u) > 0 for |u| < 1)
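A self-contained sketch of local linear regression in one dimension, assuming Python/numpy; the Epanechnikov kernel, the bandwidth, and the test function are illustrative, and the sketch adds an explicit intercept (a standard formulation the slide's objective leaves implicit), so the prediction at x0 is the fitted constant:

```python
import numpy as np

def epanechnikov(u):
    """Kernel: K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def llr_predict(x0, x, y, h):
    """Weighted least squares on [1, x - x0] around x0;
    the prediction at x0 is the fitted intercept."""
    w = epanechnikov((x - x0) / h)
    X = np.column_stack([np.ones_like(x), x - x0])
    # Solve the weighted normal equations (X^T W X) beta = X^T W y
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]

# Noisy samples from a nonlinear function
rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

for x0 in np.linspace(0.5, 2 * np.pi - 0.5, 5):
    print(f"x0 = {x0:.2f}: fit = {llr_predict(x0, x, y, h=0.8):.3f}, "
          f"true = {np.sin(x0):.3f}")
```

Increasing the bandwidth h here smooths the fit, exactly as the slide states: more points enter D, so each local line averages over a wider neighborhood.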
Local Linear Regression for Time Series
one model for each day of the year (shared across years) and each grid point g
D = ±56-day span around the target day of the year
for each day of the year:
β̂g = arg min_β Σ_{t∈D} wt,g (yt,g − bt,g − β⊤ xt,g)²
weighting: e.g. wt,g = 1 or wt,g = 1/var(at)
offsets: bt,g = 0 or bt,g = c_monthday(t)
Model 1: external regressors with multitask feature selection
Candidate features: measurements from each data source at different lags, plus a constant feature
allow different features for different days of the year,
but the same features across all grid points for a given day
Backward regression: start with all features and prune them one by one (sketch below):
drop the feature whose removal reduces prediction accuracy the least, as long as the loss stays below a tolerance threshold
Accuracy: leave-one-year-out cross-validation with the “skill” measure
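A generic backward-elimination loop, as a sketch assuming Python; `backward_select` is a hypothetical helper, and `fit_and_score` is a hypothetical stand-in for fitting the local linear model and computing leave-one-year-out skill, which the paper specifies in full:

```python
def backward_select(features, fit_and_score, tol=1e-3):
    """Greedy backward feature selection.

    features: list of candidate feature names
    fit_and_score(feats): callback that fits the model using `feats`
        and returns its cross-validated skill (higher is better)
    tol: maximum skill loss tolerated when dropping a feature
    """
    selected = list(features)
    score = fit_and_score(selected)
    while len(selected) > 1:
        # Score the model with each single feature removed
        trials = [(fit_and_score([f for f in selected if f != g]), g)
                  for g in selected]
        best_score, worst_feature = max(trials)
        if score - best_score > tol:
            break  # every removal costs too much skill; stop pruning
        selected.remove(worst_feature)
        score = best_score
    return selected
```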
Selected features (example)
[Image source: Hwang et al. 2019]
Summary
Determining model order for AR models:
Partial Autocorrelation
Akaike Information Criterion
Cross-validation
Relating AR and MA models: Linear Processes
Nonlinear model: Local linear regression and an application example
References and Further Reading
Paul S.P. Cowpertwait and Andrew V. Metcalfe. Introductory Time Series with R. Springer, 2009. Chapters 4.5, 4.6.
R.H. Shumway, D.S. Stoffer. Time Series Analysis and its Applications, with Examples in R. Springer, 2011. Chapters 3.1, 3.3.
R. Carmona. Statistical Analysis of Financial Data in R. Springer, 2014. Chapter 6.
J. Hwang, P. Orenstein, K. Pfeiffer, J. Cohen, L. Mackey. Improving Subseasonal Forecasting in the Western U.S. with Machine Learning. KDD 2019.