The document discusses estimation of multi-Granger network causal models from time series data. It proposes a joint modeling approach to estimate vector autoregressive (VAR) models for multiple time series datasets simultaneously. The key steps are:
1. Estimate the inverse covariance matrices for each dataset using a factor model approach.
2. Use the estimated inverse covariance matrices in a generalized fused lasso optimization to jointly estimate the VAR coefficient matrices for each dataset.
Simulation results show the joint modeling approach improves estimation of the VAR coefficients and reduces forecasting error compared to estimating the models separately, especially when the number of time points is small. The factor modeling approach also provides a better estimate of the inverse covariance than using the empirical estimate.
1. Estimation of Multi-Granger Network Causal Models.
Andrey Skripnikov (joint work with George Michailidis)
Department of Statistics, University of Florida
August 6, 2015
2. Motivation. Background.
Network Granger causality focuses on estimating Granger-causal effects among p time series, and it can be operationalized through vector autoregressive (VAR) models (Lutkepohl, 1991).
The latter represent a popular class of time series models that has been widely used in applied econometrics and finance, and more recently in biomedical applications (Bernanke et al., 2005; Michailidis et al., 2013).
Factor models are widely used in finance and economics for the purpose of dimension reduction. They significantly help with time- and resource-efficient estimation of covariance matrices in high-dimensional settings (Fan et al., 2011).
3. Motivation. Present work.
In this work, we discuss joint estimation and model selection for multiple Granger-causal networks. As a simple motivating setup, assume that we observe the same set of p variables over two different entities over time. For example, one can monitor GDP, income, etc. (variables) of the USA and Canada (entities) over time.
There are two goals:
- to estimate the dependence of variable values at time t + 1 on variable values at time t for each of the entities (through VAR models);
- to estimate the error covariances for each of the entities (through factor models).
4. Single VAR model formulation.
Notation:
- $X^t = (X^t_1, \ldots, X^t_p)^T$, where $X^t_i$ is the value of variable $i$ at time $t$;
- $\varepsilon^t = (\varepsilon^t_1, \ldots, \varepsilon^t_p)^T$, the vector of observation errors at time $t$.
Model setup:
$$X^t = A_{p \times p} X^{t-1} + \varepsilon^t, \quad \varepsilon^t \sim N(0, \Sigma_{p \times p}). \qquad (1)$$
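As an illustration, model (1) can be simulated directly. The sketch below (Python with NumPy, an assumption since the slides contain no code) rescales a random sparse transition matrix to spectral radius 0.6, matching the stationarity condition used in the simulation section:

```python
import numpy as np

rng = np.random.default_rng(0)
p, T = 5, 200

# Random sparse transition matrix, rescaled so the VAR(1) process is stable
A = rng.normal(size=(p, p)) * (rng.random((p, p)) < 0.2)
A *= 0.6 / max(np.abs(np.linalg.eigvals(A)).max(), 1e-8)

Sigma = np.eye(p)  # error covariance (identity for illustration only)
X = np.zeros((T + 1, p))
for t in range(1, T + 1):
    X[t] = A @ X[t - 1] + rng.multivariate_normal(np.zeros(p), Sigma)
```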
5. Single VAR model formulation.
Assumptions and constraints
- $A$ is sparse (a standard assumption in high-dimensional settings).
Several constraints on the covariance $\Sigma$ have been considered before:
- $\Sigma = \sigma^2 I$ (Lutkepohl, 1991);
- $\Sigma^{-1}$ is sparse (Basu, Michailidis, 2013);
- $\Sigma$ follows a $K$-factor model:
$$\varepsilon^t = \Lambda F_t + U_t, \quad U_t \sim N(0, \Sigma_U), \quad \mathrm{cov}(F_t) = I_K,$$
$$\Sigma = \Lambda \Lambda' + \Sigma_U, \qquad (2)$$
where $F_t$ is a $K \times 1$ vector of unobserved factors ($K$ is the number of factors, $K < p$), $\Lambda$ is a $p \times K$ matrix of factor loadings, and $\Sigma_U$ is a $p \times p$ matrix with a sparse inverse (the idiosyncratic component).
We will use the third constraint, since in finance and economics it makes sense to assume that the errors depend on a few main common underlying factors.
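The error structure in (2) can be sketched as follows; the particular distributions for the loadings and idiosyncratic variances are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
p, K, T = 10, 1, 200

Lambda = rng.normal(size=(p, K))                  # factor loadings
Sigma_U = np.diag(rng.uniform(0.5, 1.5, size=p))  # idiosyncratic, sparse inverse
F = rng.normal(size=(T, K))                       # unobserved factors, cov(F_t) = I_K
U = rng.multivariate_normal(np.zeros(p), Sigma_U, size=T)
eps = F @ Lambda.T + U                            # eps^t = Lambda F_t + U_t

Sigma = Lambda @ Lambda.T + Sigma_U               # implied error covariance (2)
```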
6. Single VAR model formulation.
Problem (1) has an equivalent formulation as a standard regression problem:
$$Y = Z\beta + \varepsilon, \quad \varepsilon \sim N(0, \Sigma_{new}), \qquad (3)$$
where
- $Y = (X_1', \ldots, X_p')'$, with $X_i = (X_i^T, \ldots, X_i^1)'$, $i = 1, \ldots, p$;
- $W_{T \times p} = \begin{pmatrix} X_1^{T-1} & X_2^{T-1} & \cdots & X_p^{T-1} \\ X_1^{T-2} & X_2^{T-2} & \cdots & X_p^{T-2} \\ \vdots & \vdots & & \vdots \\ X_1^{0} & X_2^{0} & \cdots & X_p^{0} \end{pmatrix}$;
- $Z = I_{p \times p} \otimes W_{T \times p}$;
- $\beta = (A_{11}, A_{12}, \ldots, A_{1p}, A_{21}, \ldots, A_{2p}, \ldots, A_{pp})'$ (the matrix $A$ stretched row-wise into a vector);
- $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_p')'$, with $\varepsilon_i = (\varepsilon_i^T, \ldots, \varepsilon_i^1)'$, $i = 1, \ldots, p$.
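The stacking above can be verified numerically. This sketch builds $Y$, $W$, $Z = I \otimes W$, and the row-wise vectorization $\beta$ for a small noiseless example and checks that $Y = Z\beta$ holds exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
p, T = 3, 6
A = rng.normal(size=(p, p)) * 0.3

# Noiseless VAR(1) trajectory X^0, ..., X^T
X = np.zeros((T + 1, p))
X[0] = rng.normal(size=p)
for t in range(1, T + 1):
    X[t] = A @ X[t - 1]

# Stack as in (3): block i of Y holds X_i^T, ..., X_i^1; rows of W are X^{T-1}, ..., X^0
Y = np.concatenate([X[T:0:-1, i] for i in range(p)])
W = X[T - 1::-1, :]
Z = np.kron(np.eye(p), W)
beta = A.reshape(-1)  # row-wise vectorization of A

assert np.allclose(Y, Z @ beta)  # the regression form reproduces the VAR exactly
```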
7. Single VAR model formulation.
We consider the process $\{\varepsilon^t\}$ to be:
- covariance-stationary;
- uncorrelated over time.
Under these conditions, $\varepsilon \sim N(0, \Sigma_{new})$, where $\Sigma_{new} = \Sigma \otimes I_{T \times T}$.
If the true $\Sigma$ is known, the optimization criterion is a standard lasso problem:
$$\min_{\beta} \; ||\Sigma_{new}^{-1/2} (Y - Z\beta)||_2^2 + \lambda ||\beta||_1. \qquad (4)$$
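Criterion (4) reduces to an ordinary lasso after whitening the response and design by $\Sigma_{new}^{-1/2}$. A minimal sketch using scikit-learn's `Lasso` (an implementation choice, not the software used in the slides; note the penalty rescaling needed to match its $\frac{1}{2n}$-scaled objective):

```python
import numpy as np
from sklearn.linear_model import Lasso

def whitened_lasso(Y, Z, Sigma_new, lam):
    """Solve min_b ||Sigma_new^{-1/2}(Y - Zb)||_2^2 + lam * ||b||_1, as in (4)."""
    # Symmetric inverse square root of Sigma_new for the whitening transform
    w, V = np.linalg.eigh(Sigma_new)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Y_t, Z_t = S_inv_half @ Y, S_inv_half @ Z
    n = len(Y_t)
    # sklearn minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1, so alpha = lam/(2n)
    model = Lasso(alpha=lam / (2 * n), fit_intercept=False, max_iter=10000)
    model.fit(Z_t, Y_t)
    return model.coef_
```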
8. Single VAR model formulation. The algorithm.
The Algorithm
Step 1. Get an estimate $\hat{\Sigma}_{new}^{-1}$ of $\Sigma_{new}^{-1}$.
Step 2. Use that estimate $\hat{\Sigma}_{new}^{-1}$ in the following optimization task:
$$\min_{\beta} \; ||\hat{\Sigma}_{new}^{-1/2} (Y - Z\beta)||_2^2 + \lambda ||\beta||_1.$$
The approach to completing Step 1 is the following:
- Calculate $\hat{A}_{OLS} = (Z'Z)^{-1} Z'Y$.
- Get the residuals $\hat{\varepsilon} = Y - Z\hat{A}_{OLS}$ and calculate their covariance matrix.
- Do an eigenvalue decomposition of the residual covariance matrix: the part corresponding to the spiked eigenvalues acts as an estimate of $\Lambda\Lambda'$, while the remaining part is used to estimate $\Sigma_U$ ($\Sigma_U^{-1}$).
- Use the graphical lasso to estimate $\Sigma_U$ ($\Sigma_U^{-1}$).
- Form $\hat{\Sigma} = \hat{\Lambda}\hat{\Lambda}' + \hat{\Sigma}_U$, invert it, and set $\hat{\Sigma}_{new}^{-1} = \hat{\Sigma}^{-1} \otimes I_{T \times T}$.
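Step 1 above can be sketched as follows. This is a minimal illustration, assuming residuals are already in hand; the graphical-lasso penalty `alpha` and the small diagonal ridge are tuning choices of this sketch, not values from the slides:

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def factor_precision(resid, K, alpha=0.1):
    """Factor-model estimate of Sigma^{-1} from (T, p) OLS residuals, K factors."""
    S = np.cov(resid, rowvar=False)
    w, V = np.linalg.eigh(S)                    # eigenvalues in ascending order
    spiked = V[:, -K:] * w[-K:]                 # top-K (spiked) eigencomponents
    LL = spiked @ V[:, -K:].T                   # estimate of Lambda Lambda'
    R = S - LL                                  # remaining (idiosyncratic) part
    R = (R + R.T) / 2 + 1e-6 * np.eye(len(R))   # symmetrize and stabilize
    Sigma_U, _ = graphical_lasso(R, alpha=alpha)  # sparse-inverse estimate of Sigma_U
    Sigma_hat = LL + Sigma_U
    return np.linalg.inv(Sigma_hat)
```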
9. Joint VAR model formulation.
Notation:
- $X^t = (X^t_1, \ldots, X^t_p)^T$, where $X^t_i$ is the value of variable $i$ at time $t$ for the first entity;
- $Y^t = (Y^t_1, \ldots, Y^t_p)^T$, where $Y^t_i$ is the value of variable $i$ at time $t$ for the second entity;
- $\varepsilon^t = (\varepsilon^t_1, \ldots, \varepsilon^t_{2p})^T$, the vector of observation errors at time $t$.
Model setup:
$$\begin{pmatrix} X^t \\ Y^t \end{pmatrix} = A \begin{pmatrix} X^{t-1} \\ Y^{t-1} \end{pmatrix} + \varepsilon^t, \quad \varepsilon^t \sim N(0, \Sigma), \qquad (5)$$
where
$$A_{2p \times 2p} = \begin{pmatrix} (A_{11})_{p \times p} & O_{p \times p} \\ O_{p \times p} & (A_{22})_{p \times p} \end{pmatrix}. \qquad (6)$$
10. Joint VAR model formulation. Assumptions and constraints.
To keep things simple, we make the following assumptions:
Assumptions and constraints
- $\Sigma = \begin{pmatrix} (\Sigma_{11})_{p \times p} & O_{p \times p} \\ O_{p \times p} & (\Sigma_{22})_{p \times p} \end{pmatrix}$.
- $A_{11} \sim A_{22}$ ($A_{11}$ is similar to $A_{22}$) and both are sparse. Although we are dealing with two different entities, we still expect to see common patterns of relationships between the p variables across the entities.
- $\Sigma_{11}$ and $\Sigma_{22}$ follow the K-factor model (2).
11. Joint VAR model formulation. The algorithm.
The Algorithm
Step 1. Get estimates of $\Sigma_{11}^{-1}$ and $\Sigma_{22}^{-1}$ as in the algorithm for separate estimation.
Step 2. Use these estimates to run a joint optimization procedure for estimating $A_{11}$ and $A_{22}$.
The standard regression setup for the joint problem:
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} Z_1 & O \\ O & Z_2 \end{pmatrix} \beta + \varepsilon, \quad \varepsilon \sim N(0, \Sigma_{new}), \qquad (7)$$
where $\Sigma_{new} = \begin{pmatrix} (\Sigma_{11})_{new} & O \\ O & (\Sigma_{22})_{new} \end{pmatrix}$ and $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$.
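The joint criterion is referred to later in the slides as criterion (8) but does not appear in this extraction. A plausible form, consistent with the description of a generalized fused lasso with a sparsity parameter, a fusion parameter, and their ratio $\gamma$ (the precise criterion in the original slides may differ), is:

$$\min_{\beta_1, \beta_2} \; ||\hat{\Sigma}_{new}^{-1/2}(Y - Z\beta)||_2^2 + \lambda_1 \left( ||\beta_1||_1 + ||\beta_2||_1 \right) + \lambda_2 ||\beta_1 - \beta_2||_1, \qquad \gamma = \lambda_1 / \lambda_2,$$

where the $\ell_1$ terms promote sparsity of $A_{11}$ and $A_{22}$, and the fusion term shrinks corresponding coefficients of the two entities toward each other.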
13. Simulation study - data generation mechanism.
1. $A$ is generated with maximum eigenvalue 0.6 so that the resulting VAR model is stationary.
2. All results are obtained over a common grid for the sparsity parameter: $200 \cdot (0.5)^{seq(0, 50, length=150)}$.
3. For $\Sigma = \Lambda\Lambda' + \Sigma_U$:
- $\Lambda_{p \times K} = (b_{ji})$, $b_{ji} \sim N(0, 1)$, $j \le p$, $i \le K$;
- for $\Sigma_U$, a diagonal matrix is generated to make sure that its inverse is sparse and that the signal ratio of $\Lambda\Lambda'$ to $\Sigma_U$ is not small (restricted to be between 1.5 and 3).
4. Other characteristics: SNR = 1, edge density of the transition matrices $A$ is 0.1, $K$ (number of factors) = 1, 50 replications.
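Step 3 can be sketched as follows. The "signal ratio" is assumed here to be the trace ratio $\mathrm{tr}(\Lambda\Lambda') / \mathrm{tr}(\Sigma_U)$, since the slides do not define it precisely; the sketch rescales the diagonal of $\Sigma_U$ to hit a target ratio in [1.5, 3] directly:

```python
import numpy as np

rng = np.random.default_rng(3)
p, K = 10, 1

Lambda = rng.normal(size=(p, K))        # loadings b_ji ~ N(0, 1)
target = rng.uniform(1.5, 3.0)          # desired signal ratio in [1.5, 3]
d = rng.uniform(0.5, 1.5, size=p)
d *= np.trace(Lambda @ Lambda.T) / (target * d.sum())  # rescale to hit target
Sigma_U = np.diag(d)                    # diagonal, so its inverse is sparse

Sigma = Lambda @ Lambda.T + Sigma_U
ratio = np.trace(Lambda @ Lambda.T) / np.trace(Sigma_U)
```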
14. Simulation study. Joint VAR estimation.
Comparing the empirical estimate of the inverse covariance matrix with the factor-model estimate:
- Estimates are based on the residuals obtained after calculating the OLS estimate of $A$.
- Measure used: $||\hat{\Sigma} - \Sigma_{true}||_F / ||\Sigma_{true}||_F$ (normalized Frobenius difference).

Inverse covariance   Empirical   Factor model
p = 5,  t = 20         8.064        0.48
p = 5,  t = 30         4.319        0.57
p = 5,  t = 50         2.971        0.633
p = 10, t = 30        15.726        0.401
p = 10, t = 50         5.954        0.533
p = 10, t = 80         4.246        0.69
15. Simulation study. Joint vs Separate method.
- Matrices $A_{11}$ and $A_{22}$ have the same sparsity pattern but different element values.
- Matrices $\Sigma_{11}$ and $\Sigma_{22}$ are identical.
Table for 1-step mean squared forecasting error:

                 Joint method   Separate method
p = 5,  t = 20   0.49 (0.36)    0.53 (0.40)
p = 5,  t = 30   0.47 (0.36)    0.49 (0.37)
p = 5,  t = 50   0.47 (0.32)    0.48 (0.33)
p = 10, t = 30   0.43 (0.34)    0.47 (0.35)
p = 10, t = 50   0.52 (0.39)    0.54 (0.41)
p = 10, t = 80   0.41 (0.26)    0.43 (0.26)
17. Simulation study. Estimation of A’s.
When more heterogeneity is introduced, the advantage of the joint method starts to diminish. Example:
- Matrices $A_{11}$ and $A_{22}$ have slightly different sparsity patterns and different element values.
- Matrices $\Sigma_{11}$ and $\Sigma_{22}$ are identical.
19. Conclusion.
Main ideas:
- Joint modeling is very important when one does not have enough data for separate estimation. Especially for high-dimensional sparse models, the joint method will almost always outperform the separate method.
- In the high-dimensional setting, factor modeling has a dominant advantage in inverse covariance estimation over simply using the empirical estimate.
Possible improvements:
- The problem can be extended to any number H of entities.
- Coming up with a less computationally demanding setup.
- Developing software that allows more flexibility in picking the sparsity and fusion parameters in the generalized lasso optimization problem.
21. Simulation study. Single VAR estimation.
Here we compare the performance of the following three methods for single VAR estimation:
- using the factor-model estimate of $\Sigma$;
- using the least squares approach ($\Sigma = I$);
- using the oracle approach (plugging in the true $\Sigma$ from the simulation).
All results come from solving criterion (4) with different plugged-in $\Sigma$'s.
23. Simulation study. Single VAR estimation.
[Figure: ROC curves (FN vs FP) and Frobenius differences for the factor (F), least squares (LS), and oracle (O) methods; panels for p = 5 with t = 20, 30, 50.]
24. Simulation study. Single VAR estimation.
[Figure: ROC curves (FN vs FP) and Frobenius differences for the factor (F), least squares (LS), and oracle (O) methods; panels for p = 10 with t = 30, 50, 80.]
25. Simulation study. Joint VAR estimation.
Unfortunately, we face a software limitation: the only generalized lasso package in R can optimize criterion (8) only for a fixed ratio γ. Each run returns solutions only at knots where the ratio of the sparsity parameter to the fusion parameter equals γ. So we cannot set an unrestricted two-dimensional grid for the sparsity and fusion parameters; there will always be that ratio dependence between them. We still tried a couple of approaches to work around this limitation.
After running the algorithm for a range of values of γ, we picked the one that performed best and used it for comparison with the separate method. The experiments were run for very similar matrices A11 and A22, and γ = 0.5 tended to give better results (i.e., more emphasis on fusion than on sparsity).
26. Simulation study. Joint vs Separate method.
For the heterogeneous case, A11 and A22 are generated as follows:
- first generate matrices with the same sparsity pattern and 5% edge density;
- then add 3% of edges to each of the two matrices separately.
For the case of 10 variables, this yields heterogeneous structures for A11 and A22 in the vast majority of cases.