Multidimensional time series appear in many fields of application, and Principal Component Analysis (PCA) is often used to reduce their dimensionality. However, formal inference procedures for principal components (PC) rely on independence, a condition that time series do not satisfy. Therefore, several PC-like techniques, such as Singular Spectrum Analysis (SSA), are used to attain this reduction by decomposing the original series into a sum of a small number of interpretable components. Here, SSA and its multichannel extension (MSSA) are described and applied to real data.
1. Isabel Silva Some contributions to PCA for time series
Some contributions to PCA for time series
Isabel Silva
Departamento de Engenharia Civil, Faculdade de Engenharia da Universidade do Porto
Unidade de Investigação Matemática e Aplicações (UIMA), Universidade de Aveiro
JOCLAD 2010
Outline
Motivation
Principal Component Analysis for time series
◮ Classic PCA
◮ Singular Spectrum Analysis (SSA) / Multi-Channel Singular Spectrum Analysis (MSSA)
Illustration
Final remarks
Motivation
Multidimensional time and space-time series
Number of observations (T) > Number of series (n)
Dimensionality reduction
Principal Components Analysis (PCA)
p original variables → (linear transformation) → M uncorrelated variables: Principal Components (PC)
With M ≪ p, the PCs retain most of the variation present in the dataset [Jolliffe (2002)]
Motivation
Time series can be considered as variables or as measurements
↓
observation times are the variables
Formal inference based on PC relies on independence (and multivariate normality)
↓
condition not satisfied for time series
To take into account the correlation in time (and space):
Dynamic Principal Component Analysis [Brillinger (2001)]: PCA for stationary time series at each frequency −→ uncorrelated principal component series
Singular Spectrum Analysis (SSA): carry out a PCA on a suitably chosen lagged version of the original time series
Classic Principal Component Analysis
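n measurements on T VARIABLES: $\{Y_1, Y_2, \ldots, Y_T\}$, $Y_j \in \mathbb{R}^n$, $j = 1, \ldots, T$
n time series, each one with T OBSERVATIONS: $\{y_1, y_2, \ldots, y_n\}$, $y_i \in \mathbb{R}^T$, $i = 1, \ldots, n$
$$x_{ij} = y_{ij} - \bar{Y}_j = y_{ij} - \frac{1}{n}\sum_{i=1}^{n} y_{ij}, \quad i = 1, \ldots, n;\ j = 1, \ldots, T$$
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} X_1 & X_2 & \cdots & X_T \end{bmatrix} = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nT} \end{bmatrix}$$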
Classic Principal Component Analysis
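Sample variance-covariance matrix (T × T) of X: $S = \frac{1}{n} X^{\top} X$
Diagonalizing S gives the eigenvalues and unit-norm eigenvectors:
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_T > 0, \qquad \|\boldsymbol{\upsilon}_j\| = 1, \quad j = 1, \ldots, T$$
jth Principal Component:
$$Z_j = X\boldsymbol{\upsilon}_j = \upsilon_{j1} X_1 + \upsilon_{j2} X_2 + \cdots + \upsilon_{jT} X_T, \quad j = 1, \ldots, T$$
$$\mathrm{Var}(Z_j) = \lambda_j, \quad j = 1, \ldots, T$$
Proportion of variance due to $Z_j$:
$$\frac{\lambda_j}{\lambda_1 + \cdots + \lambda_T}, \quad j = 1, \ldots, T$$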
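The steps of classic PCA listed here (column centering, the matrix S, its diagonalization, and the proportion of variance) can be sketched with NumPy. This is only an illustrative sketch: the function name `classic_pca` and the random toy data are assumptions, not part of the talk, and the toy data has more rows than columns so that all eigenvalues are strictly positive.

```python
import numpy as np

def classic_pca(Y):
    """PCA of an n x T array Y: rows are measurements, columns are variables."""
    n, T = Y.shape
    X = Y - Y.mean(axis=0)                  # x_ij = y_ij - mean of column j
    S = (X.T @ X) / n                       # T x T sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder: lambda_1 >= lambda_2 >= ...
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]             # column j is the unit eigenvector v_j
    Z = X @ eigvecs                         # column j is the jth principal component
    explained = eigvals / eigvals.sum()     # proportion of variance due to Z_j
    return eigvals, eigvecs, Z, explained

# Toy data (assumption): 50 measurements on 4 variables.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 4))
lam, V, Z, explained = classic_pca(Y)
```

With the 1/n convention used for S, the variance of each column of `Z` (also computed with 1/n) equals the corresponding eigenvalue, which is the statement Var(Z_j) = λ_j.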
Singular Spectrum Analysis (SSA)
Decompose the original series into a small number of independent and interpretable components that can be considered as trend, oscillatory components and structureless noise
No stationarity assumptions for the time series are needed
Basic SSA [Golyandina, Nekrutkin and Zhigljavsky (2001)]
Decomposition stage
◮ Embedding
◮ Singular Value Decomposition (SVD)
Reconstruction stage
◮ Grouping
◮ Diagonal averaging
Singular Spectrum Analysis (SSA)
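Embedding
Time series: $y = \{y_0, y_1, \ldots, y_{n-1}\}$; L: window length (1 < L < n)
Trajectory matrix (K × L, K = n − L + 1):
$$X = \begin{bmatrix} X_1 & X_2 & X_3 & \cdots & X_L \end{bmatrix} = \begin{bmatrix} y_0 & y_1 & y_2 & \cdots & y_{L-1} \\ y_1 & y_2 & y_3 & \cdots & y_L \\ y_2 & y_3 & y_4 & \cdots & y_{L+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{K-1} & y_K & y_{K+1} & \cdots & y_{n-1} \end{bmatrix}$$
SVD
$S = X^{\top} X$ −→ eigenvalues: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L$ and eigenvectors: $U_1, U_2, \ldots, U_L$
$$d = \mathrm{rank}(X) = \max\{i : \lambda_i > 0\} \le L, \qquad V_i = X U_i / \sqrt{\lambda_i}, \quad i = 1, \ldots, d$$
$$X = X_1 + X_2 + \cdots + X_d, \qquad X_i = \sqrt{\lambda_i}\, V_i U_i^{\top}$$
$(\lambda_i, U_i, V_i)$: eigentriples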
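The decomposition stage (embedding followed by SVD) can be sketched in NumPy as follows. The function names `trajectory_matrix` and `ssa_decompose`, the tolerance `tol` used for the numerical rank, and the short toy series are assumptions made for illustration. The sketch calls `np.linalg.svd` on X directly instead of diagonalizing $X^{\top}X$; the squared singular values are the eigenvalues $\lambda_i$, the right singular vectors play the role of $U_i$ and the left singular vectors the role of $V_i$.

```python
import numpy as np

def trajectory_matrix(y, L):
    """K x L trajectory matrix whose row i is (y_i, ..., y_{i+L-1}), K = n - L + 1."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if not 1 < L < n:
        raise ValueError("window length must satisfy 1 < L < n")
    K = n - L + 1
    return np.array([y[i:i + L] for i in range(K)])

def ssa_decompose(y, L, tol=1e-10):
    """Return eigenvalues lambda_i, vectors U_i and V_i, and elementary matrices X_i."""
    X = trajectory_matrix(y, L)
    V, s, Ut = np.linalg.svd(X, full_matrices=False)   # X = V diag(s) Ut
    d = int(np.sum(s > tol * s[0]))                    # numerical rank of X
    lam = s[:d] ** 2                                   # eigenvalues of X^T X
    U = Ut[:d].T                                       # L x d, column i is U_i
    V = V[:, :d]                                       # K x d, column i is V_i
    elementary = [s[i] * np.outer(V[:, i], U[:, i]) for i in range(d)]
    return lam, U, V, elementary

# Toy series (assumption): linear trend plus a sine, n = 10, L = 4, so K = 7.
y = np.arange(10.0) + np.sin(np.arange(10.0))
X = trajectory_matrix(y, 4)
lam, U, V, elementary = ssa_decompose(y, L=4)
```

Summing all entries of `elementary` recovers the trajectory matrix up to rounding error, which is the identity X = X_1 + ... + X_d.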
Singular Spectrum Analysis (SSA)
Grouping
M: number of PC → Partition of $\{1, \ldots, d\}$ into M disjoint subsets $I_1, \ldots, I_M$, where $I_k = \{i_{k1}, \ldots, i_{kp}\}$
Construct the corresponding resultant matrix: $X_{I_k} = X_{i_{k1}} + \cdots + X_{i_{kp}}$
$$X \approx X_{I_1} + \cdots + X_{I_M}$$
The contribution of the component $X_{I_k}$:
$$\frac{\sum_{i \in I_k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$$
The choice of grouping depends on the objective of the study
Inspection of the singular values ($\sqrt{\lambda_i}$) and singular vectors ($U_i, V_i$)
Use supplementary information for the parameter choice [Hassani (2007)]:
◮ Periodicity in the dataset, periodogram analysis, pairwise scatterplots of singular vectors, ...
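The grouping step and the contribution ratio can be sketched as follows. The helper name `group_components`, the 0-based index lists and the small Hankel matrix are illustrative assumptions; the elementary matrices are rebuilt here from an SVD so that the block runs on its own.

```python
import numpy as np

def group_components(elementary, lam, groups):
    """Sum elementary matrices over each index set and compute each contribution.

    groups holds 0-based index lists, e.g. [[0], [1, 2]] for I_1 = {1}, I_2 = {2, 3}.
    """
    lam = np.asarray(lam, dtype=float)
    grouped = [sum(elementary[i] for i in idx) for idx in groups]
    contributions = [lam[idx].sum() / lam.sum() for idx in groups]
    return grouped, contributions

# Toy trajectory matrix (assumption) of the series 1, 2, 3, 5, 4, 6 with L = 3.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 3.0, 5.0],
              [3.0, 5.0, 4.0],
              [5.0, 4.0, 6.0]])
V, s, Ut = np.linalg.svd(X, full_matrices=False)
elementary = [s[i] * np.outer(V[:, i], Ut[i]) for i in range(len(s))]
lam = s ** 2

grouped, contributions = group_components(elementary, lam, [[0], [1, 2]])
```

Because the two groups in this toy example cover every index, the grouped matrices add up to X exactly and the contributions add up to 1; when some indices are left out as noise, the sum is only an approximation of X.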
Singular Spectrum Analysis (SSA)
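Diagonal Averaging
Transform $X_{I_k} = \left(x_{ij}^{(k)}\right)_{i,j=1}^{L,K}$, $k = 1, \ldots, M$, into a new series $\tilde{X}_{I_k} = \{\tilde{y}_0^{(k)}, \ldots, \tilde{y}_{n-1}^{(k)}\}$
↓
$\tilde{y}_t^{(k)}$ is obtained by averaging $x_{ij}^{(k)}$ over all $i, j$ such that $i + j = t + 2$, $t = 0, \ldots, n-1$
$L^* = \min\{L, K\}$; $K^* = \max\{L, K\}$; $x_{ij}^{*(k)} = x_{ij}^{(k)}$ if $L < K$; $x_{ij}^{*(k)} = x_{ji}^{(k)}$ if $L \ge K$
$$\tilde{y}_t^{(k)} = \begin{cases} \dfrac{1}{t+1} \displaystyle\sum_{p=1}^{t+1} x_{p,\,t-p+2}^{*(k)}, & 0 \le t < L^* - 1 \\[2ex] \dfrac{1}{L^*} \displaystyle\sum_{p=1}^{L^*} x_{p,\,t-p+2}^{*(k)}, & L^* - 1 \le t < K^* \\[2ex] \dfrac{1}{n-t} \displaystyle\sum_{p=t-K^*+2}^{n-K^*+1} x_{p,\,t-p+2}^{*(k)}, & K^* \le t < n \end{cases}$$
$$y \approx \tilde{X}_{I_1} + \cdots + \tilde{X}_{I_M} \iff y_t \approx \sum_{k=1}^{M} \tilde{y}_t^{(k)}, \quad t = 0, \ldots, n-1$$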
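Diagonal averaging amounts to averaging a matrix along its antidiagonals. A minimal sketch, assuming NumPy and a hypothetical helper name `diagonal_average`, is given here. It uses 0-based indices, so the condition i + j = t + 2 of the 1-based formula becomes i + j = t, and it accumulates sums and counts instead of writing out the three cases; the result is the same whether the matrix is stored as K × L or as L × K.

```python
import numpy as np

def diagonal_average(A):
    """Average a K x L matrix over its antidiagonals; returns a series of length K + L - 1."""
    A = np.asarray(A, dtype=float)
    K, L = A.shape
    n = K + L - 1
    totals = np.zeros(n)
    counts = np.zeros(n)
    for i in range(K):
        # With 0-based indices, entry (i, j) belongs to time t = i + j,
        # so row i contributes to t = i, ..., i + L - 1.
        totals[i:i + L] += A[i]
        counts[i:i + L] += 1
    return totals / counts

# A trajectory (Hankel) matrix is constant on antidiagonals, so averaging recovers the series.
y = np.array([1.0, 2.0, 3.0, 5.0, 4.0, 6.0])
H = np.array([y[i:i + 3] for i in range(4)])    # K = 4, L = 3
recovered = diagonal_average(H)

# A matrix that is not Hankel is replaced by its antidiagonal means.
small = diagonal_average([[1.0, 2.0], [3.0, 4.0]])
```

For the 2 × 2 example the middle value is the mean of 2 and 3, so `small` is `[1.0, 2.5, 4.0]`.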
Singular Spectrum Analysis (SSA)
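Multichannel SSA [Golyandina and Stepanov (2005)]
Extension of SSA to p time series of length n: $\{y_1, \ldots, y_p\}$ where $y_i = \{y_{i,0}, y_{i,1}, \ldots, y_{i,n-1}\}$, $i = 1, \ldots, p$
Apply SSA to a large trajectory matrix (K × Lp):
$$X = \begin{bmatrix} y_{1,0} & \cdots & y_{1,L-1} & y_{2,0} & \cdots & y_{2,L-1} & \cdots & y_{p,0} & \cdots & y_{p,L-1} \\ y_{1,1} & \cdots & y_{1,L} & y_{2,1} & \cdots & y_{2,L} & \cdots & y_{p,1} & \cdots & y_{p,L} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ y_{1,K-1} & \cdots & y_{1,n-1} & y_{2,K-1} & \cdots & y_{2,n-1} & \cdots & y_{p,K-1} & \cdots & y_{p,n-1} \end{bmatrix}$$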
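The large trajectory matrix of MSSA is the side-by-side concatenation of the individual K × L trajectory matrices. A short sketch under that reading, with a hypothetical function name `mssa_trajectory_matrix` and two made-up series of length 5:

```python
import numpy as np

def mssa_trajectory_matrix(series, L):
    """Stack the K x L trajectory matrices of p series side by side (K x Lp)."""
    series = np.asarray(series, dtype=float)    # shape (p, n), one series per row
    p, n = series.shape
    if not 1 < L < n:
        raise ValueError("window length must satisfy 1 < L < n")
    K = n - L + 1
    blocks = [np.array([y[i:i + L] for i in range(K)]) for y in series]
    return np.hstack(blocks)

# Toy input (assumption): p = 2 series, n = 5, window length L = 2, so K = 4.
series = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
                   [10.0, 11.0, 12.0, 13.0, 14.0]])
X = mssa_trajectory_matrix(series, L=2)
```

The SVD, grouping and diagonal averaging steps are then applied to this matrix, with diagonal averaging carried out separately inside each K × L block.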
Illustration
Practical problems
Choice of the dimension L → L ≈ n/2, or depending on the periodicity of the data
Selection of M and the way of grouping the indices
Rodrigues and de Carvalho (2008): L and M must be chosen carefully → a poor choice can compromise the results of the analysis
Dataset: Monthly average number of occupied hotel rooms, from 1963 to 1976 (Source: Time Series Data Library, http://robjhyndman.com/TSDL//)
Software: SSA - Matlab Tools for SSA (Eric Breitenberger) and ssa.m (Francisco Alonso)
[Figure: number of occupied rooms by month, Jan 1963 to Dec 1976; y-axis from 400 to 1200]
Illustration
[Figure: top panel, y and y_reconstructed (x-axis 1 to 8, y-axis 400 to 800); bottom panel, residual = y − y_reconstructed (y-axis −50 to 50)]
Illustration
Monthly number of occupied rooms
[Figure: number of occupied rooms by month, Jan 1963 to Dec 1976; y-axis from 400 to 1200]
n = 168, L = 12, K = 168−12+1 = 157
Illustration
Normalized singular values of the monthly number of occupied rooms
If n, L and K are sufficiently large, each harmonic produces two eigentriples with close singular values
[Figure: normalized $\lambda_i$ (y-axis, 0 to 45) against i (x-axis, 0 to 12)]
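The statement that a harmonic produces two eigentriples with close singular values can be checked numerically. The sketch here uses a synthetic sinusoid of period 12, not the hotel data, with the sizes n = 168 and L = 12 from the illustration; normalizing the singular values to percentages of their sum is an assumption about how the plotted values were scaled.

```python
import numpy as np

n, L = 168, 12                        # sizes taken from the illustration
K = n - L + 1                         # 157 rows in the trajectory matrix
t = np.arange(n)
y = np.sin(2 * np.pi * t / 12)        # synthetic harmonic (assumption), period 12

X = np.array([y[i:i + L] for i in range(K)])
s = np.linalg.svd(X, compute_uv=False)    # singular values, largest first

rank = int(np.sum(s > 1e-8 * s[0]))       # numerical rank of the trajectory matrix
ratio = s[1] / s[0]                       # close to 1 when the pair is well matched
normalized = 100 * s / s.sum()            # singular values as percentages of their sum
```

Here the trajectory matrix has numerical rank 2, so the single harmonic is carried entirely by the first two eigentriples. In real data such near-equal pairs in the singular value plot are a common hint for grouping the corresponding components together.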
Illustration
The contribution of the components: $X_{I_1}$: 97.96%, $X_{I_{2,3}}$: 1.42%, $X_{I_{4,5}}$: 0.32%
[Figure: three panels over the 168 months, each comparing y with a reconstruction: y_rec_PC1, y_rec_PC_2_3 and y_rec_PC_4_5]
Illustration
The contribution of the component $X_{I_{1,\ldots,5}}$: 99.70%
[Figure: top panel, y and y_rec_PC_1_to_5 over the 168 months; bottom panel, residuals (y-axis −100 to 100)]
Final remarks
PCA is a very popular and widely used tool for reducing the dimension of high-dimensional data
Classical PCA does not take into account dependence between observations
Time series can be considered as variables or as measurements → observation times are the variables
Results of the different PCA-based techniques are not directly comparable
Practical problems of SSA: choice of L and M
References
Brillinger, D. R., 2001. Time Series: Data Analysis and Theory. Classics in Applied Mathematics, 36, SIAM.
Golyandina, N., Nekrutkin, V. and Zhigljavsky, A., 2001. Analysis of Time Series Structure: SSA and related techniques. Chapman & Hall/CRC Monographs on Statistics & Applied Probability.
Golyandina, N. and Stepanov, D., 2005. SSA-based approaches to analysis and forecast of multidimensional time series. In Proceedings of the Fifth Workshop on Simulation, pp. 293-298.
Hassani, H., 2007. Singular Spectrum Analysis: Methodology and Comparison. Journal of Data Science, Vol. 5, pp. 239-257.
Jolliffe, I. T., 2002. Principal Component Analysis. Springer, New York, 2nd ed.
Rodrigues, P. C. and De Carvalho, M., 2008. Monitoring Calibration of the Singular Spectrum Analysis Method. In Proceedings of COMPSTAT'2008, Vol. 2, pp. 955-964, Physica-Verlag.
Shumway, R. and Stoffer, D., 2000. Time Series Analysis and Its Applications. Springer, New York.