Estimation of Multi-Granger Network Causal Models.
Andrey Skripnikov (joint work with George Michailidis)
Department of Statistics, University of Florida
August 6, 2015
Motivation. Background.
Network Granger causality focuses on estimating Granger causal effects from p time series, and it can be operationalized through vector autoregressive (VAR) models (Lutkepohl, 1991).
VAR models are a popular class of time series models that has been widely used in applied econometrics and finance, and more recently in biomedical applications (Bernanke et al., 2005; Michailidis et al., 2013).
Factor models are widely used in finance and economics for dimension reduction; they enable time- and resource-efficient estimation of covariance matrices in high-dimensional settings (Fan et al., 2011).
Motivation. Present work.
In this work, we discuss joint estimation and model selection for multiple Granger causal networks. For a simple motivating setup, let's assume that we observe the same set of p variables for two different entities over time. For example, one can monitor GDP, income, etc. (the variables) of the USA and Canada (the entities) over time.
There are two goals:
to estimate the dependence of the variable values at time t + 1 on the variable values at time t for each of the entities (through VAR models);
to estimate the error covariances for each of the entities (through factor models).
Single VAR model formulation.
Notation:
$X^t = (X^t_1, \ldots, X^t_p)^T$, where $X^t_i$ is the value of variable $i$ at time $t$;
$\epsilon^t = (\epsilon^t_1, \ldots, \epsilon^t_p)^T$, the vector of observation errors at time $t$.
Model setup:
$$X^t = A_{p \times p} X^{t-1} + \epsilon^t, \quad \epsilon^t \sim N(0, \Sigma_{p \times p}). \tag{1}$$
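To make the setup concrete, here is a toy R sketch of simulating model (1); taking the error covariance to be the identity is an illustrative simplification, not the factor-structured $\Sigma$ used in this work.

```r
# A toy sketch of simulating model (1); Sigma = I is an illustrative choice.
simulate_var1 <- function(A, T_len, x0 = rep(0, nrow(A))) {
  p <- nrow(A)
  X <- matrix(0, T_len + 1, p)              # rows will hold X^0, ..., X^T
  X[1, ] <- x0
  for (t in 1:T_len) {
    X[t + 1, ] <- A %*% X[t, ] + rnorm(p)   # X^t = A X^{t-1} + eps^t
  }
  X
}
```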
Single VAR model formulation.
Assumptions and constraints
$A$ is sparse (appropriate in high-dimensional settings).
Several constraints on the covariance $\Sigma$ have been considered before:
- $\Sigma = \sigma^2 I$ (Lutkepohl, 1991),
- $\Sigma^{-1}$ is sparse (Basu, Michailidis, 2013),
- $\Sigma$ follows a K-factor model:
$$\epsilon^t = \Lambda F_t + U, \quad U \sim N(0, \Sigma_U), \quad \mathrm{cov}(F_t) = I_K,$$
$$\Sigma = \Lambda \Lambda^T + \Sigma_U, \tag{2}$$
where $F_t$ is a $K \times 1$ vector of unobserved factors ($K$ is the number of factors, $K < p$), $\Lambda$ is a $p \times K$ matrix of factor loadings, and $\Sigma_U$ is a $p \times p$ matrix with a sparse inverse (the idiosyncratic component).
We use the third constraint, since in finance and economics it is natural to assume that the errors depend on a few common underlying factors.
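As a quick illustration, a minimal R sketch of drawing errors from the K-factor model (2); the standard-normal loadings and $\Sigma_U = I$ below are illustrative choices only, not values from this work.

```r
# A minimal sketch of the K-factor error model (2), with illustrative
# standard-normal loadings and Sigma_U = I.
simulate_factor_errors <- function(T_len, p, K = 1) {
  Lambda <- matrix(rnorm(p * K), p, K)        # p x K factor loadings
  F_t <- matrix(rnorm(T_len * K), T_len, K)   # factors with cov(F_t) = I_K
  U <- matrix(rnorm(T_len * p), T_len, p)     # idiosyncratic part
  F_t %*% t(Lambda) + U                       # row t is eps^t = Lambda F_t + U
}
```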
Single VAR model formulation.
Problem (1) has an equivalent formulation as a standard regression problem:
$$Y = Z\beta + \epsilon, \quad \epsilon \sim N(0, \Sigma_{new}). \tag{3}$$
Let $Y = (X_1, \ldots, X_p)^T$, where $X_i = (X^T_i, \ldots, X^1_i)^T$, $i = 1, \ldots, p$.
Let
$$W_{T \times p} = \begin{pmatrix} X^{T-1}_1 & X^{T-1}_2 & \cdots & X^{T-1}_p \\ X^{T-2}_1 & X^{T-2}_2 & \cdots & X^{T-2}_p \\ \vdots & \vdots & \ddots & \vdots \\ X^0_1 & X^0_2 & \cdots & X^0_p \end{pmatrix}.$$
Let $Z = I_{p \times p} \otimes W_{T \times p}$.
Let $\beta = (A_{11}, A_{12}, \ldots, A_{1p}, A_{21}, \ldots, A_{2p}, \ldots, A_{pp})^T$ (the matrix $A$ stretched into a vector).
Let $\epsilon = (\epsilon_1, \ldots, \epsilon_p)^T$, where $\epsilon_i = (\epsilon^T_i, \ldots, \epsilon^1_i)$, $i = 1, \ldots, p$.
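A short R sketch of assembling the design in (3); the helper name and the assumed row layout of the data matrix (row r holds $X^{r-1}$) are illustrative choices.

```r
# A sketch of the regression design in (3): W stacks lagged observations,
# Z = I_p (x) W, and Y stacks each variable's series (X^T_i, ..., X^1_i).
build_design <- function(X) {            # X: (T+1) x p, rows X^0, ..., X^T
  T_len <- nrow(X) - 1
  p <- ncol(X)
  W <- X[T_len:1, , drop = FALSE]        # rows X^{T-1}, ..., X^0
  Z <- diag(p) %x% W                     # Kronecker product I_p (x) W
  Y <- as.numeric(X[(T_len + 1):2, ])    # stacks (X^T_i, ..., X^1_i) per i
  list(Y = Y, Z = Z)
}
```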
Single VAR model formulation.
We assume the process $\{\epsilon^t\}$ is:
covariance-stationary,
uncorrelated over time.
From these conditions it follows that, for the setup above, $\epsilon \sim N(0, \Sigma_{new})$, where $\Sigma_{new} = \Sigma \otimes I_{T \times T}$.
If the true $\Sigma$ is known, the optimization criterion is a standard lasso problem:
$$\min_\beta \; \|\Sigma_{new}^{-1/2}(Y - Z\beta)\|_2^2 + \lambda \|\beta\|_1. \tag{4}$$
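Once $\Sigma_{new}^{-1/2}$ is in hand, (4) is a lasso on whitened data. A hedged R sketch using glmnet follows; note that glmnet minimizes $(1/2n)\,\mathrm{RSS} + \lambda\|\beta\|_1$, so its $\lambda$ differs from the one in (4) by a constant factor.

```r
# A minimal sketch of criterion (4), assuming Y, Z and a matrix square root
# Sigma_new_inv_sqrt of Sigma_new^{-1} have already been built.
library(glmnet)

fit_whitened_lasso <- function(Y, Z, Sigma_new_inv_sqrt, lambda) {
  Y_tilde <- as.numeric(Sigma_new_inv_sqrt %*% Y)  # whitened response
  Z_tilde <- Sigma_new_inv_sqrt %*% Z              # whitened design
  # standardize = FALSE, intercept = FALSE keep the problem as stated in (4)
  fit <- glmnet(Z_tilde, Y_tilde, alpha = 1, lambda = lambda,
                standardize = FALSE, intercept = FALSE)
  as.matrix(coef(fit))[-1, 1]                      # drop the (zero) intercept
}
```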
Single VAR model formulation. The algorithm.
The Algorithm
Step 1: Get an estimate $\hat\Sigma^{-1}_{new}$ of $\Sigma^{-1}_{new}$.
Step 2: Use the estimate $\hat\Sigma^{-1}_{new}$ in the following optimization task:
$$\min_\beta \; \|\hat\Sigma_{new}^{-1/2}(Y - Z\beta)\|_2^2 + \lambda \|\beta\|_1.$$
The approach to completing Step 1 is the following:
- Calculate $\hat A_{OLS} = (Z^T Z)^{-1} Z^T Y$.
- Get the residuals $\hat\epsilon = Y - Z \hat A_{OLS}$ and calculate their covariance matrix.
- Do an eigenvalue decomposition of the residual covariance matrix: the part corresponding to the spiked eigenvalues acts as an estimate of $\Lambda \Lambda^T$, while the remaining part is used to estimate $\Sigma_U$ ($\Sigma_U^{-1}$).
- Use the graphical lasso to estimate $\Sigma_U$ ($\Sigma_U^{-1}$).
- Get the estimate $\hat\Sigma = \hat\Lambda \hat\Lambda^T + \hat\Sigma_U$, invert it, and set $\hat\Sigma^{-1}_{new} = \hat\Sigma^{-1} \otimes I_{T \times T}$.
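A hedged R sketch of Step 1 follows, assuming E is the $T \times p$ matrix of OLS residuals; the number of factors K and the graphical-lasso penalty rho are illustrative user choices, not values prescribed by the slides.

```r
# A sketch of Step 1: spiked eigenpairs estimate Lambda t(Lambda), the
# graphical lasso handles the idiosyncratic part Sigma_U.
library(glasso)

estimate_sigma_inv <- function(E, K = 1, rho = 0.1) {
  S <- cov(E)                                   # residual covariance matrix
  eig <- eigen(S, symmetric = TRUE)
  # Spiked part: the top-K eigenpairs estimate Lambda %*% t(Lambda)
  Lambda_hat <- eig$vectors[, 1:K, drop = FALSE] %*%
    diag(sqrt(eig$values[1:K]), K)
  LLt <- Lambda_hat %*% t(Lambda_hat)
  # Remaining part: graphical lasso gives Sigma_U with a sparse inverse
  # (in practice S - LLt may need a small ridge to stay positive definite)
  gl <- glasso(S - LLt, rho = rho)
  Sigma_hat <- LLt + gl$w                       # gl$w estimates Sigma_U
  solve(Sigma_hat)                              # estimate of Sigma^{-1}
}

# As on the slide: Sigma_new^{-1} = Sigma^{-1} (x) I_T
# Sigma_new_inv <- kronecker(estimate_sigma_inv(E), diag(T_len))
```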
Joint VAR model formulation.
Notation:
$X^t = (X^t_1, \ldots, X^t_p)^T$, where $X^t_i$ is the value of variable $i$ at time $t$ for the first entity;
$Y^t = (Y^t_1, \ldots, Y^t_p)^T$, where $Y^t_i$ is the value of variable $i$ at time $t$ for the second entity;
$\epsilon^t = (\epsilon^t_1, \ldots, \epsilon^t_{2p})^T$, the vector of observation errors at time $t$.
Model setup:
$$\begin{pmatrix} X^t \\ Y^t \end{pmatrix} = A \begin{pmatrix} X^{t-1} \\ Y^{t-1} \end{pmatrix} + \epsilon^t, \quad \epsilon^t \sim N(0, \Sigma), \tag{5}$$
where
$$A_{2p \times 2p} = \begin{pmatrix} (A_{11})_{p \times p} & O_{p \times p} \\ O_{p \times p} & (A_{22})_{p \times p} \end{pmatrix}. \tag{6}$$
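For concreteness, a short sketch of assembling the block-diagonal transition matrix in (5)-(6), reusing the simulate_var1 toy from earlier (so $\Sigma = I$ there, an illustrative choice); A11, A22 ($p \times p$) and T_len are assumed to exist.

```r
# A sketch of the joint model (5)-(6) with a block-diagonal A.
p <- nrow(A11)
A_joint <- rbind(cbind(A11, matrix(0, p, p)),
                 cbind(matrix(0, p, p), A22))
XY <- simulate_var1(A_joint, T_len)  # columns 1:p hold X^t, (p+1):2p hold Y^t
```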
Joint VAR model formulation. Assumptions and
constraints.
To keep it simple, we make the following assumptions:
Assumptions and constraints
$$\Sigma = \begin{pmatrix} (\Sigma_{11})_{p \times p} & O_{p \times p} \\ O_{p \times p} & (\Sigma_{22})_{p \times p} \end{pmatrix}.$$
$A_{11} \sim A_{22}$ ($A_{11}$ is similar to $A_{22}$) and both are sparse: although we are dealing with two different entities, we still expect to see common patterns of relationships among the $p$ variables across the entities.
$\Sigma_{11}$ and $\Sigma_{22}$ follow the K-factor model (2).
Joint VAR model formulation. The algorithm.
The Algorithm
Step 1: Get estimates of $\Sigma^{-1}_{11}$ and $\Sigma^{-1}_{22}$ as in the algorithm for separate estimation.
Step 2: Use these estimates to run a joint optimization procedure for estimating $A_{11}$ and $A_{22}$.
The standard regression setup for the joint problem:
$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} Z_1 & O \\ O & Z_2 \end{pmatrix} \beta + \epsilon, \quad \epsilon \sim N(0, \Sigma_{new}), \tag{7}$$
where
$$\Sigma_{new} = \begin{pmatrix} (\Sigma_{11})_{new} & O \\ O & (\Sigma_{22})_{new} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}.$$
Joint estimation algorithm
The optimization criterion is a generalized fused lasso problem:
$$\left\| \begin{pmatrix} ((\Sigma_1)_{new})^{-1/2} Y_1 \\ ((\Sigma_2)_{new})^{-1/2} Y_2 \end{pmatrix} - \begin{pmatrix} ((\Sigma_1)_{new})^{-1/2} Z_1 \beta_{11} \\ ((\Sigma_2)_{new})^{-1/2} Z_2 \beta_{22} \end{pmatrix} \right\|_2^2 + \gamma \lambda \|\beta\|_1 + \lambda \|\beta_{11} - \beta_{22}\|_1. \tag{8}$$
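One way to cast (8) in code is via the genlasso package (the R package discussed at the end of these slides); this hedged sketch assumes pre-whitened inputs Y1t, Y2t, Z1t, Z2t and encodes both penalties in a single matrix D, which is exactly why the sparsity/fusion ratio $\gamma$ is fixed along the computed path.

```r
# A sketch of criterion (8) as a generalized lasso with penalty matrix D.
library(genlasso)

fit_joint <- function(Y1t, Y2t, Z1t, Z2t, gamma) {
  y <- c(Y1t, Y2t)
  X <- rbind(cbind(Z1t, matrix(0, nrow(Z1t), ncol(Z2t))),
             cbind(matrix(0, nrow(Z2t), ncol(Z1t)), Z2t))
  m <- ncol(Z1t)                 # length of beta11 (equals length of beta22)
  # ||D beta||_1 = gamma*||beta||_1 + ||beta11 - beta22||_1, matching (8)
  D <- rbind(gamma * diag(2 * m),
             cbind(diag(m), -diag(m)))
  genlasso(y, X, D)              # X should have full column rank here
}

# Usage sketch: fit <- fit_joint(...); coef(fit, lambda = 0.1)
```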
Simulation study - data generation mechanism.
1. $A$ is generated with maximum eigenvalue 0.6, so that the resulting VAR model is stationary (see the sketch after this list).
2. All results are obtained over a common grid for the sparsity parameter: $200 \cdot 0.5^{\mathrm{seq}(0, 50, \mathrm{length}=150)}$.
3. For $\Sigma = \Lambda \Lambda^T + \Sigma_U$:
$\Lambda_{p \times K} = (b_{ji})$, $b_{ji} \sim N(0, 1)$, $j \leq p$, $i \leq K$;
for $\Sigma_U$, we generated a diagonal matrix, which ensures that the inverse is sparse and that the signal ratio of $\Lambda \Lambda^T$ to $\Sigma_U$ is not small (restricted to be between 1.5 and 3).
4. Other characteristics: SNR = 1, edge density of the transition matrices $A$ is 0.1, $K$ (number of factors) = 1, 50 replications.
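A sketch of item 1 above: draw a sparse matrix and rescale it so its largest eigenvalue modulus is 0.6, which guarantees a stationary VAR(1); the helper name is illustrative.

```r
# Generate a sparse transition matrix with spectral radius 0.6.
generate_A <- function(p, density = 0.1, target = 0.6) {
  A <- matrix(rnorm(p * p) * rbinom(p * p, 1, density), p, p)
  rho <- max(Mod(eigen(A, only.values = TRUE)$values))  # spectral radius
  A * (target / rho)                                    # rescale eigenvalues
}
```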
Simulation study. Joint VAR estimation.
We compare the empirical estimate of the inverse covariance matrix with the factor-model estimate.
Estimates are based on the residuals obtained after computing the OLS estimate of $A$.
Measure used: $\|\hat\Sigma - \Sigma_{true}\|_F / \|\Sigma_{true}\|_F$ (normalized Frobenius difference).

Inverse covariance   Empirical   Factor model
p = 5,  t = 20         8.064        0.48
p = 5,  t = 30         4.319        0.57
p = 5,  t = 50         2.971        0.633
p = 10, t = 30        15.726        0.401
p = 10, t = 50         5.954        0.533
p = 10, t = 80         4.246        0.69
Simulation study. Joint vs Separate method.
Matrices $A_{11}$ and $A_{22}$ have the same sparsity pattern but different element values.
Matrices $\Sigma_{11}$ and $\Sigma_{22}$ are identical.
Table of 1-step mean squared forecasting errors:

                 Joint method   Separate method
p = 5,  t = 20    0.49 (0.36)     0.53 (0.40)
p = 5,  t = 30    0.47 (0.36)     0.49 (0.37)
p = 5,  t = 50    0.47 (0.32)     0.48 (0.33)
p = 10, t = 30    0.43 (0.34)     0.47 (0.35)
p = 10, t = 50    0.52 (0.39)     0.54 (0.41)
p = 10, t = 80    0.41 (0.26)     0.43 (0.26)
Simulation study. Estimation of A’s.
[Figure: ROC curves (FP vs. FN) and Frobenius differences for the joint (J) and separate (S) methods; panels for p = 10 with t = 30, 50, 80.]
Simulation study. Estimation of A’s.
When more heterogeneity is introduced, the advantage of the joint method starts to diminish. Example:
Matrices $A_{11}$ and $A_{22}$ have slightly different sparsity patterns and different element values.
Matrices $\Sigma_{11}$ and $\Sigma_{22}$ are identical.
Simulation study. Estimation of A’s.
[Figure: ROC curves (FP vs. FN) and Frobenius differences for the joint (J) and separate (S) methods in the heterogeneous setting; panels for p = 10 with t = 30, 50, 80.]
Conclusion.
Main ideas:
Joint modeling is most valuable when one doesn't have enough data for separate estimation; for high-dimensional sparse models, the joint method will almost always outperform the separate method.
In the high-dimensional setting, factor modeling has a dominant advantage in inverse covariance estimation over simply using the empirical estimate.
Possible improvements:
The problem can be extended to any number H of entities.
Developing a less computationally demanding setup.
Developing software that allows more flexibility in picking the sparsity and fusion parameters in the generalized lasso optimization problem.
Thanks!
Contact address: usdandres@ufl.edu
Simulation study. Single VAR estimation.
Here we compare the performance of the following three methods for single VAR estimation:
- using the factor-model estimate of $\Sigma$;
- using the least squares approach ($\Sigma = I$);
- using the oracle approach (plugging in the true $\Sigma$ from the simulations).
All results come from solving criterion (4) with the different plugged-in $\Sigma$'s.
Simulation study. Single VAR estimation. Mean forecast
error.
Table of 1-step mean squared forecasting errors:

                 Our method    Least squares     Oracle
p = 5,  t = 20   0.52 (0.93)    0.54 (0.95)   0.49 (0.93)
p = 5,  t = 30   0.59 (0.69)    0.63 (0.70)   0.58 (0.69)
p = 5,  t = 50   0.52 (0.59)    0.57 (0.63)   0.52 (0.60)
p = 10, t = 30   0.63 (0.80)    0.78 (1.01)   0.63 (0.80)
p = 10, t = 50   0.66 (1.01)    0.69 (0.97)   0.66 (1.00)
p = 10, t = 80   0.46 (0.45)    0.54 (0.58)   0.44 (0.44)
Simulation study. Single VAR estimation.
[Figure: ROC curves (FP vs. FN) and Frobenius differences for the factor-model (F), least squares (LS), and oracle (O) methods; panels for p = 5 with t = 20, 30, 50.]
Simulation study. Single VAR estimation.
[Figure: ROC curves (FP vs. FN) and Frobenius differences for the factor-model (F), least squares (LS), and oracle (O) methods; panels for p = 10 with t = 30, 50, 80.]
Simulation study. Joint VAR estimation.
Unfortunately, we have a software limitation: the only generalized lasso package in R can optimize criterion (8) only for a fixed ratio γ. Each time we run it, it gives solutions only at knots where the ratio of the sparsity parameter to the fusion parameter equals γ. So we cannot simply set up an unrestricted two-dimensional grid for the sparsity and fusion parameters; there is always going to be that ratio dependence between them. We still tried a couple of approaches to get around that limitation.
After running the algorithm for a range of values of γ, we picked the one that performed best and used it for comparison with the separate method. The experiments were run for very similar matrices $A_{11}$ and $A_{22}$, and γ = 0.5 tended to give better results (i.e., more emphasis on fusion than on sparsity); a sketch of this search follows.
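```r
# A sketch of the gamma search above; `score` stands in for a hypothetical
# forecasting-error criterion, and fit_joint is the earlier genlasso sketch.
gammas <- c(0.25, 0.5, 1, 2)       # illustrative grid of ratios
fits <- lapply(gammas, function(g) fit_joint(Y1t, Y2t, Z1t, Z2t, gamma = g))
best_gamma <- gammas[which.min(sapply(fits, score))]
```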
Simulation study. Joint vs Separate method.
For the heterogeneous case, $A_{11}$ and $A_{22}$ are generated as follows (see the sketch after this list):
first, generate matrices with the same sparsity pattern and 5% edge density;
then add 3% of edges to each of the two matrices separately.
For the case of 10 variables, in the vast majority of cases this yields heterogeneous structures for matrices $A_{11}$ and $A_{22}$.
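A sketch of this heterogeneous generation; the helper name and element values are illustrative, and in practice each matrix would also be rescaled for stationarity as in the generate_A sketch above.

```r
# Shared 5% support plus 3% entity-specific edges for A11 and A22.
generate_pair <- function(p, shared = 0.05, extra = 0.03) {
  base <- matrix(rbinom(p * p, 1, shared), p, p)    # common sparsity pattern
  add1 <- matrix(rbinom(p * p, 1, extra), p, p)     # entity-1-specific edges
  add2 <- matrix(rbinom(p * p, 1, extra), p, p)     # entity-2-specific edges
  A11 <- (base | add1) * matrix(rnorm(p * p), p, p)
  A22 <- (base | add2) * matrix(rnorm(p * p), p, p)
  list(A11 = A11, A22 = A22)
}
```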