Slides for the C3S workshop in Cologne, in which I presented an extension of Granger causality and Partial Information Decomposition to multiple temporal scales, using the robust state space formulation.
Multiscale Granger Causality and Information Decomposition
1. Multiscale Granger Causality and Information Decomposition
Daniele Marinazzo, Department of Data Analysis, University of Ghent
Luca Faes, FBK and University of Trento
Sebastiano Stramaglia, University of Bari and INFN
C3S Conference, September 2017
http://users.ugent.be/~dmarinaz/
daniele.marinazzo@ugent.be
@dan_marinazzo
2. DYNAMICAL SYSTEMS AND PROCESSES
• Dynamical System $S = \{S_1, \ldots, S_M\}$
  $S = \{X_1, \ldots, X_{M-1}, Y\} = \{X, Y\}$, with $X = \{X_1, \ldots, X_{M-1}\}$
• Target system Y
• Processes: $X = (X_1, X_2, \ldots, X_n, \ldots, X_N)$, $Y = (Y_1, Y_2, \ldots, Y_n, \ldots, Y_N)$
Information Dynamics
Present: $x_n$, $y_n$ (time n)
Past: $x_n^- = [x_{n-1}, x_{n-2}, \ldots]$, $y_n^- = [y_{n-1}, y_{n-2}, \ldots]$ (times n-1, n-2, …)
• Information Storage (regularity of x): $S_x = I(x_n; x_n^-)$ [JT Lizier et al, Information Sci 2012]
• Information Transfer (influence y → x): $T_{y \to x} = I(x_n; y_n^- \mid x_n^-)$ [T Schreiber et al, Phys Rev Lett 2000]
Information-theoretical quantities can be computed from (co)variance, exact for Gaussian processes.
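As a rough illustration of the covariance-based computation, the sketch below estimates information storage and transfer for Gaussian data as log-ratios of partial variances, S_x = 0.5 ln(var(x_n) / var(x_n | x past)) and T_{y→x} = 0.5 ln(var(x_n | x past) / var(x_n | x past, y past)). The truncation of the past to p lags, the function names and the toy coupled process are assumptions made for this example only; they are not taken from the slides.

```python
import numpy as np

def lagged_matrix(x, y, p):
    """Columns: x_n, y_n, then x_{n-1..n-p}, then y_{n-1..n-p}."""
    n = len(x)
    cols = [x[p:], y[p:]]
    cols += [x[p - k:n - k] for k in range(1, p + 1)]
    cols += [y[p - k:n - k] for k in range(1, p + 1)]
    return np.column_stack(cols)

def partial_variance(cov, target, cond):
    """Variance of column `target` not explained linearly by columns `cond`."""
    s_tc = cov[target, cond]
    s_cc = cov[np.ix_(cond, cond)]
    return cov[target, target] - s_tc @ np.linalg.solve(s_cc, s_tc)

def storage_and_transfer(x, y, p=2):
    """Return (S_x, T_{y->x}) in nats, using p past lags (an approximation)."""
    cov = np.cov(lagged_matrix(x, y, p), rowvar=False)
    x_past = list(range(2, 2 + p))
    y_past = list(range(2 + p, 2 + 2 * p))
    var_x_own = partial_variance(cov, 0, x_past)
    var_x_full = partial_variance(cov, 0, x_past + y_past)
    s_x = 0.5 * np.log(cov[0, 0] / var_x_own)
    t_yx = 0.5 * np.log(var_x_own / var_x_full)
    return s_x, t_yx

# Hypothetical toy process: y drives x with one lag, x does not drive y.
rng = np.random.default_rng(0)
N = 5000
u, w = rng.standard_normal(N), rng.standard_normal(N)
x, y = np.zeros(N), np.zeros(N)
for n in range(1, N):
    y[n] = 0.5 * y[n - 1] + w[n]
    x[n] = 0.5 * x[n - 1] + 0.5 * y[n - 1] + u[n]

s_x, t_yx = storage_and_transfer(x, y)   # storage of x, transfer y -> x
s_y, t_xy = storage_and_transfer(y, x)   # storage of y, transfer x -> y
```

With this toy coupling the estimated transfer y → x should be clearly larger than the transfer x → y, which is only affected by finite-sample bias.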
3. STATE SPACE REPRESENTATION
• Practical computation of prediction and entropy measures
Vector Autoregressive (VAR) representation of $U = \{X, Y\} = \{X, V, Z\}$:
$U_n = [X_n\ V_n\ Z_n]^T = \sum_{k=1}^{p} A_k U_{n-k} + \varepsilon_n$
Estimation of VAR parameters $A_k$, $\Sigma = \mathrm{cov}(\varepsilon_n)$: least squares; BIC for model order selection
State Space (SS) representation:
State eq.: $S_{n+1} = A S_n + K E_n$
Observation eq.: $U_n = C S_n + E_n$
SUBMODELS:
Obs. without Z: $[X_n\ V_n]^T = C^{(XV)} S_n + E_n^{(XV)}$
Obs. without V: $[X_n\ Z_n]^T = C^{(XZ)} S_n + E_n^{(XZ)}$
Obs. without V,Z: $X_n = C^{(X)} S_n + E_n^{(X)}$
The SS representation is closed under the formation of sub-models
[L. Barnett and A.K. Seth, PRE 91(4) 040101, 2015;
L Faes et al, arXiv preprint:1602.06155, 2016]
Estimation of SS parameters: DARE (discrete algebraic Riccati equation)
$(A, C, K, V)$, $V = \mathrm{cov}(E_n)$
$(A_2, C_2, K_2, V_2)$, $V_2 = \mathrm{cov}(E_n^{(XV)})$
$(A_3, C_3, K_3, V_3)$, $V_3 = \mathrm{cov}(E_n^{(XZ)})$
$(A_4, C_4, K_4, V_4)$, $V_4 = \mathrm{cov}(E_n^{(X)})$
Partial variances (first diagonal entry of the corresponding innovation covariance):
$\sigma^2_{X|XVZ} = \mathbb{E}[E_{1,n}^2]$
$\sigma^2_{X|XV} = \mathbb{E}[(E_{1,n}^{(XV)})^2]$
$\sigma^2_{X|XZ} = \mathbb{E}[(E_{1,n}^{(XZ)})^2]$
$\sigma^2_{X|X} = \mathbb{E}[(E_{1,n}^{(X)})^2]$
• All partial variances can be computed from the VAR parameters estimated only once!
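To make the submodel idea concrete, here is a minimal sketch that turns VAR parameters into an innovations-form state space model and then obtains the innovation covariance of a submodel (only some observed variables kept) from the Riccati equation. It is a sketch under stated assumptions: the Riccati equation is solved by plain fixed-point iteration rather than by a dedicated DARE solver, and the function names and the bivariate VAR(1) example are hypothetical.

```python
import numpy as np

def var_to_ss(A_list, Sigma):
    """Innovations-form SS (A, C, K, V) of a VAR with coefficients A_1..A_p."""
    M, p = Sigma.shape[0], len(A_list)
    C = np.hstack(A_list)                      # observation matrix [A_1 ... A_p]
    A = np.zeros((M * p, M * p))
    A[:M, :] = C                               # first block row of the companion matrix
    A[M:, :-M] = np.eye(M * (p - 1))           # shift of the stored past values
    K = np.zeros((M * p, M))
    K[:M] = np.eye(M)
    return A, C, K, Sigma

def submodel_innovation_cov(A, C, K, V, keep, n_iter=5000, tol=1e-13):
    """Innovation covariance when only the variables in `keep` are observed."""
    C_r = C[keep]
    Q = K @ V @ K.T                            # state noise covariance
    R = V[np.ix_(keep, keep)]                  # observation noise covariance
    S = K @ V[:, keep]                         # state/observation noise cross-covariance
    P = np.eye(A.shape[0])                     # positive definite starting point
    for _ in range(n_iter):
        G = A @ P @ C_r.T + S
        P_new = A @ P @ A.T + Q - G @ np.linalg.solve(C_r @ P @ C_r.T + R, G.T)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    return C_r @ P @ C_r.T + R

# Hypothetical bivariate VAR(1): variable 1 drives variable 0, unit innovations.
A1 = np.array([[0.5, 0.5],
               [0.0, 0.5]])
A, C, K, V = var_to_ss([A1], np.eye(2))
var_x_full = V[0, 0]                                         # x given past of x and y
var_x_own = submodel_innovation_cov(A, C, K, V, [0])[0, 0]   # x given its own past only
var_y_own = submodel_innovation_cov(A, C, K, V, [1])[0, 0]   # y given its own past only
```

In this example the partial variance of x given only its own past has the closed form (3 + √5)/4 ≈ 1.309, while y is autonomous and keeps unit partial variance; both values come from a single set of VAR parameters.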
6. MULTISCALE ANALYSIS OF TIME SERIES: CHANGE OF TIME SCALE
Observed time series: $Y_n = \{x_n, y_n\}$, $n = 1, \ldots, N$
RESCALING (scale factor $\tau$):
• Traditional procedure for rescaling [M Costa et al, Phys. Rev. Lett. 89, 2002]
  $\bar{x}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} x_{n\tau-l}$, $\bar{y}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} y_{n\tau-l}$
  $\bar{Y}_n = \{\bar{x}_n, \bar{y}_n\}$, $n = 1, \ldots, N/\tau$
  Example ($\tau = 3$, $N = 9$): $\bar{x}_3 = \frac{x_7 + x_8 + x_9}{3}$
• Rescaling can be seen as a two-step procedure [J. Valencia et al, IEEE Trans. Biomed Eng. 56, 2009]
  1) AVERAGING: $\tilde{Y}_n = \frac{1}{\tau} \sum_{l=0}^{\tau-1} Y_{n-l}$, $n = \tau, \ldots, N$
  2) DOWNSAMPLING: $\bar{Y}_n = \tilde{Y}_{n\tau}$, $n = 1, \ldots, N/\tau$
  Example ($\tau = 3$, $N = 9$): $\tilde{x}_9 = \frac{x_7 + x_8 + x_9}{3}$, so that $\bar{x}_3 = \tilde{x}_9$
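The equivalence between the traditional rescaling and the two-step view (averaging, then downsampling) can be checked numerically. The snippet below is a small sketch; the function names are chosen for this example, and it uses the slide's example values of scale factor 3 and N = 9.

```python
import numpy as np

def average(x, tau):
    """Step 1: moving average of the last tau samples.
    Entry i of the result is the averaged value at (1-based) time i + tau."""
    return np.convolve(x, np.ones(tau) / tau, mode="valid")

def downsample(x_avg, tau):
    """Step 2: keep the averaged values at times tau, 2*tau, 3*tau, ..."""
    return x_avg[::tau]

def rescale_traditional(x, tau):
    """Traditional coarse graining: means over non-overlapping windows."""
    n_windows = len(x) // tau
    return x[:n_windows * tau].reshape(n_windows, tau).mean(axis=1)

x = np.arange(1.0, 10.0)            # x_1, ..., x_9
tau = 3
two_step = downsample(average(x, tau), tau)
traditional = rescale_traditional(x, tau)
```

Both routes return the window means 2, 5 and 8; the last one is (x_7 + x_8 + x_9)/3, reached either as the third coarse-grained sample or as the averaged sample at time 9.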
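7. MULTISCALE REPRESENTATION OF LINEAR PROCESSES USING STATE SPACE MODELS
Observed time series: VAR($A_k$, $\Sigma$)
$Y_n = \sum_{k=1}^{p} A_k Y_{n-k} + U_n$, $\Sigma = \mathrm{cov}(U_n)$
AVERAGING
Averaged time series: VARMA($A_k$, $B_l$, $\Sigma$)
$\tilde{Y}_n = \sum_{k=1}^{p} A_k \tilde{Y}_{n-k} + \sum_{l=0}^{q} B_l U_{n-l}$, with $B_l = \frac{1}{\tau} I$, $l = 0, 1, \ldots, q$, $q = \tau - 1$
Equivalent SS($\tilde{A}$, $\tilde{C}$, $\tilde{K}$, $\tilde{V}$) [1]:
$\tilde{Z}_{n+1} = \tilde{A} \tilde{Z}_n + \tilde{K} \tilde{E}_n$
$\tilde{Y}_n = \tilde{C} \tilde{Z}_n + \tilde{E}_n$
$$\tilde{A} = \begin{bmatrix}
A_1 & \cdots & A_{p-1} & A_p & B_1 & \cdots & B_{q-1} & B_q \\
I & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & \cdots & I & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 \\
0 & \cdots & 0 & 0 & I & \cdots & 0 & 0 \\
\vdots & & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & 0 & \cdots & I & 0
\end{bmatrix}$$
$\tilde{C} = [A_1 \cdots A_p\ B_1 \cdots B_q]$
$\tilde{K} = [I\ 0 \cdots 0\ B_0^{-T}\ 0 \cdots 0]^T$
$\tilde{V} = \mathrm{cov}(\tilde{E}_n) = B_0 \Sigma B_0^T$
DOWNSAMPLING
Rescaled time series: SS($A$, $C$, $Q$, $R$, $S$) [2]
$X_{n+1} = A X_n + W_n$
$\bar{Y}_n = C X_n + V_n$
$A = \tilde{A}^{\tau}$, $C = \tilde{C}$, $R = \tilde{V}$
$S = \tilde{A}^{\tau-1} \tilde{K} \tilde{V}$
$Q = Q_\tau$, $Q_\tau = \tilde{A} Q_{\tau-1} \tilde{A}^T + \tilde{K} \tilde{V} \tilde{K}^T$ (with $Q_1 = \tilde{K} \tilde{V} \tilde{K}^T$)
Discrete Algebraic Riccati Equation [2,3]: SS($A$, $C$, $Q$, $R$, $S$) → SS($A$, $C$, $K$, $V$)
$Z_{n+1} = A Z_n + K E_n$
$\bar{Y}_n = C Z_n + E_n$
The State Space model defining the multivariate linear process after rescaling can be obtained from the original VAR parameters and the scale factor.
[1] Aoki & Havenner, Econ. Rev. 10, 1991
[2] Solo, Neural Comp 28, 2016
[3] Barnett & Seth, Phys. Rev. E 91, 2015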
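The chain VAR → averaged VARMA → state space → downsampled state space → Riccati equation can be sketched in a few lines. This is only an illustration under assumptions: the Riccati equation is solved by plain fixed-point iteration instead of a dedicated DARE solver, the Granger causality helper handles the bivariate case only (the driver is simply the other process), and all function names and the example VAR(1) are hypothetical.

```python
import numpy as np

def riccati(A, C, Q, R, S, n_iter=5000, tol=1e-13):
    """Innovation covariance of SS(A, C, Q, R, S) via fixed-point Riccati iteration."""
    P = np.eye(A.shape[0])
    for _ in range(n_iter):
        G = A @ P @ C.T + S
        P_new = A @ P @ A.T + Q - G @ np.linalg.solve(C @ P @ C.T + R, G.T)
        converged = np.max(np.abs(P_new - P)) < tol
        P = P_new
        if converged:
            break
    return C @ P @ C.T + R

def rescaled_ss(A_list, Sigma, tau):
    """SS(A, C, Q, R, S) of the averaged and downsampled process at scale tau."""
    M, p, q = Sigma.shape[0], len(A_list), tau - 1
    I = np.eye(M)
    B = [I / tau] * tau                                  # averaging filter B_0..B_q
    C_t = np.hstack(list(A_list) + B[1:])                # [A_1..A_p B_1..B_q]
    n = M * (p + q)
    A_t = np.zeros((n, n))
    A_t[:M] = C_t
    for i in range(1, p):                                # shift past averaged outputs
        A_t[i * M:(i + 1) * M, (i - 1) * M:i * M] = I
    for j in range(1, q):                                # shift past innovations
        A_t[(p + j) * M:(p + j + 1) * M, (p + j - 1) * M:(p + j) * M] = I
    K_t = np.zeros((n, M))
    K_t[:M] = I
    if q > 0:
        K_t[p * M:(p + 1) * M] = tau * I                 # inverse of B_0 = I / tau
    V_t = B[0] @ Sigma @ B[0].T
    KVK = K_t @ V_t @ K_t.T
    Q = KVK.copy()
    for _ in range(tau - 1):                             # Q_tau = A~ Q_{tau-1} A~^T + K~ V~ K~^T
        Q = A_t @ Q @ A_t.T + KVK
    A = np.linalg.matrix_power(A_t, tau)
    S = np.linalg.matrix_power(A_t, tau - 1) @ K_t @ V_t
    return A, C_t, Q, V_t, S

def multiscale_gc(A_list, Sigma, tau, target):
    """Bivariate Granger causality towards `target` from the other process, at scale tau."""
    A, C, Q, R, S = rescaled_ss(A_list, Sigma, tau)
    t = [target]
    v_full = riccati(A, C, Q, R, S)[target, target]
    v_reduced = riccati(A, C[t], Q, R[np.ix_(t, t)], S[:, t])[0, 0]
    return np.log(v_reduced / v_full)

# Hypothetical VAR(1): process 1 drives process 0.
A1 = np.array([[0.5, 0.5],
               [0.0, 0.5]])
gc_scale1 = multiscale_gc([A1], np.eye(2), tau=1, target=0)
gc_scale3 = multiscale_gc([A1], np.eye(2), tau=3, target=0)
```

At scale 1 the result reduces to the ordinary Granger causality of the VAR (here ln((3 + √5)/4) for the driven process); larger scale factors only change the state space parameters, so no time series is ever filtered or resampled.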
8. MULTISCALE COMPUTATION OF INFORMATION DYNAMICS FOR VAR PROCESSES
• Unidirectional interaction (x → y); coupling diagram labels, given as coefficient (lag): 0.25 (1), 0.5 (2)
$x_n = 0.25\, x_{n-1} + u_n$
$y_n = 0.5\, x_{n-2} + w_n$
[Figure panels: Information Storage ($S_x$, $S_y$); Information Transfer ($T_{x \to y}$, $T_{y \to x}$)]
• Bidirectional interaction; coupling diagram labels, given as coefficient (lag): 0.25 (1), 0.75 (3), 0.25 (5), 0.5 (7)
$x_n = 0.25\, x_{n-1} + 0.75\, y_{n-3} + u_n$
$y_n = 0.25\, y_{n-5} + 0.5\, x_{n-7} + w_n$
[Figure panels: Information Storage ($S_x$, $S_y$); Information Transfer ($T_{x \to y}$, $T_{y \to x}$)]
• Averaging step: introduces autocorrelations (higher Storage)
  does not alter causal interactions (unchanged Transfer)
• Downsampling step: removes autocorrelations
  elicits scale-dependent causal interactions
TE peaks at scales compatible with the interaction delay
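For readers who want to reproduce the two simulated processes, the coefficient (lag) labels of the coupling diagrams can be written as VAR coefficient matrices. The sketch below does that and checks stability through the spectral radius of the companion matrix; the helper names and the (target, source, lag, coefficient) encoding are conventions introduced for this example.

```python
import numpy as np

def var_coefficients(terms, n_vars=2):
    """Build A_1..A_p (stacked along axis 0) from (target, source, lag, coefficient) terms."""
    p = max(lag for _, _, lag, _ in terms)
    A = np.zeros((p, n_vars, n_vars))
    for target, source, lag, coef in terms:
        A[lag - 1, target, source] = coef
    return A

def spectral_radius(A):
    """Largest eigenvalue modulus of the companion matrix; below 1 means a stable VAR."""
    p, M, _ = A.shape
    companion = np.zeros((M * p, M * p))
    companion[:M] = np.hstack(list(A))
    companion[M:, :-M] = np.eye(M * (p - 1))
    return np.max(np.abs(np.linalg.eigvals(companion)))

X, Y = 0, 1
unidirectional = var_coefficients([(X, X, 1, 0.25), (Y, X, 2, 0.5)])
bidirectional = var_coefficients([(X, X, 1, 0.25), (X, Y, 3, 0.75),
                                  (Y, Y, 5, 0.25), (Y, X, 7, 0.5)])
rho_uni = spectral_radius(unidirectional)
rho_bi = spectral_radius(bidirectional)
```

Both spectral radii are below one, so the two processes are stationary and their multiscale measures can be computed exactly from these coefficients and the innovation covariance.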
9. MULTISCALE INTERACTIONS IN CLIMATOLOGY
a) Modern climate data: global land-ocean temperature index (GT) and CO2 concentration measured at monthly resolution from March 1958 to February 2017 (708 data points)
b) Paleoclimate data: GT and CO2 concentration from the Vostok Ice Core data, extended by the EPICA Dome C data, which go back to 800,000 years ago
L Faes, S Stramaglia, G Nollo, D Marinazzo, ‘Multiscale Granger causality’, ArXiv 2017 https://arxiv.org/abs/1703.08487
10. JOINT INFORMATION
In the presence of two sources Yi and Yk, and a
target Yj, we want to quantify the information
transferred to Yj from the sources Yi and Yk
taken together
TRANSFER ENTROPY
JOINT TRANSFER ENTROPY
11. Interaction Information Decomposition (IID) Partial Information Decomposition (PID)
L Faes, D Marinazzo, S Stramaglia, 'Multiscale information decomposition: exact computation for multivariate Gaussian processes',
Entropy, special issue on Multivariate entropy measures and their applications, 2017, 19(8), 408.
PARTIAL INFORMATION DECOMPOSITION
Synergy and redundancy as mutually
exclusive phenomena
12. PARTIAL INFORMATION DECOMPOSITION
PID provides distinct non-negative measures of redundancy and synergy,
thereby accounting for the possibility that redundancy and
synergy may coexist as separate elements of information
modification.
The interaction TE is actually a measure of the ‘net’ synergy
manifested in the transfer of information from the two
sources to the target.
PID components cannot be obtained through classic information
theory by simply subtracting conditional MI terms: one more relation
is needed to solve for all the quantities. Shannon information theory
does not uniquely determine this decomposition.
Redundancy is defined as the minimum of the information
provided by each individual source to the target.
This choice satisfies the desirable property that the redundant TE
is independent of the correlation between the source processes.
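A small numerical sketch may help to see how the extra relation closes the system. It assumes the standard PID bookkeeping, which is not spelled out in the slide text: joint TE = unique_i + unique_k + redundancy + synergy, each individual TE = its unique part + redundancy, and interaction TE = joint TE minus the two individual TEs. Redundancy is set to the minimum of the two individual TEs, as stated above. The function name and the input values are purely illustrative.

```python
def decompose_transfer(t_i, t_k, t_joint):
    """IID and PID terms from the individual TEs (t_i, t_k) and the joint TE (t_joint)."""
    interaction = t_joint - t_i - t_k          # interaction TE (IID)
    redundancy = min(t_i, t_k)                 # minimum-information redundancy (PID)
    unique_i = t_i - redundancy
    unique_k = t_k - redundancy
    synergy = t_joint - unique_i - unique_k - redundancy
    return {
        "interaction": interaction,
        "redundancy": redundancy,
        "unique_i": unique_i,
        "unique_k": unique_k,
        "synergy": synergy,
    }

# Illustrative numbers only (not results from the slides).
terms = decompose_transfer(t_i=0.2, t_k=0.3, t_joint=0.6)
net_synergy = terms["synergy"] - terms["redundancy"]
```

With these inputs synergy and redundancy are both positive, and their difference equals the interaction TE, which is why the interaction TE can only report the net effect.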
13. MULTISCALE INFORMATION DECOMPOSITION
• Validation on simulated linear stochastic processes
[Figure panels: Simulation scheme; Exact profiles of IID and PID measures]
14. Interaction Information Decomposition (IID) Partial Information Decomposition (PID)
MULTISCALE ID IN EPILEPSY
We look at 64 cortical electrodes as
targets, and two depth hippocampal
electrodes (11 and 12) as drivers
[M. Kramer et al., Epilepsy Research 79, 173-186, 2008]
16. http://users.ugent.be/~dmarinaz/
daniele.marinazzo@ugent.be
@dan_marinazzo
• Faes et al., Multiscale Granger Causality, ArXiv 2017
https://arxiv.org/abs/1703.08487
• Faes et al., Multiscale Information Decomposition: Exact
Computation for Multivariate Gaussian Processes,
Entropy 2017, 19(8), 408; doi:10.3390/e19080408
• Faes et al., On the interpretability and computational
reliability of frequency-domain Granger causality,
F1000Research 2017 https://f1000research.com/articles/6-1710/v1
• https://github.com/danielemarinazzo - www.lucafaes.net
THANKS