A non-Gaussian model for causal discovery in the presence of hidden common causes

Shohei Shimizu
Shiga University / Osaka University
Japan
2016 Munich Workshop on
Causal Inference and Information Theory

Abstract
• Managing hidden common causes is
essential in causal discovery
• Non-causally-related observed variables
can be correlated due to hidden common
causes
• Propose a linear non-Gaussian model for
estimating causal direction in cases with
hidden common causes

Motivation
Illustrative example

Strong correlation between chocolate
consumption and number of Nobel
laureates (Messerli12NEJM)
[Figure: chocolate consumption (kg/yr/capita) vs. number of Nobel laureates per 10 million population, 2002-2011. Corr. 0.791, P-value < 0.001]

Does eating more chocolate increase the
number of Nobel laureates?
• Interpretational drift (Maurage+13, J. Nutrition)
[Figure: the observed correlation between Chocolate and Nobel (Corr. 0.791, P-value < 0.001) set against three candidate causal graphs over Chclt and Nobel, each with GDP as a hidden common cause. Manage this gap!]

Structural causal models
(Pearl, 2000,2009; cf. Bollen, 1989)
• A framework for describing causal relations
• Generally speaking, if the value of 𝑥1 has
been changed and then that of 𝑥2 changes,
then 𝑥1 causes 𝑥2
$x_1 = g_1(f, e_1)$
$x_2 = g_2(x_1, f, e_2)$
[Figure: graph over x1 and x2 with hidden common cause f and errors e1, e2; example with Chclt, Nobel and GDP]
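To make the intervention reading of "x1 causes x2" concrete, here is a minimal sketch. It assumes linear functions for g1 and g2 and uniform noise; both are illustrative choices, not taken from the slides, and the names `simulate` and `do_x1` are hypothetical.

```python
import random

def simulate(n, b21, do_x1=None, seed=0):
    """Mean of x2 under x1 = g1(f, e1), x2 = g2(x1, f, e2).

    Assumed for illustration: g1 and g2 are linear and f, e1, e2 are uniform.
    If do_x1 is given, x1 is set to that value (an intervention), which
    cuts the dependence of x1 on f and e1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        f = rng.uniform(-1, 1)
        e1 = rng.uniform(-1, 1)
        e2 = rng.uniform(-1, 1)
        x1 = f + e1 if do_x1 is None else do_x1
        x2 = b21 * x1 + f + e2
        total += x2
    return total / n

# x1 causes x2 (b21 = 0.8): changing x1 changes the mean of x2.
effect_causal = simulate(20000, 0.8, do_x1=1.0) - simulate(20000, 0.8, do_x1=0.0)
# No causal effect (b21 = 0): x2 does not react, although x1 and x2 share f.
effect_none = simulate(20000, 0.0, do_x1=1.0) - simulate(20000, 0.0, do_x1=0.0)
```

Because both calls reuse the same seed, the difference isolates the effect of the intervention on x1.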

Challenge in causal discovery
Data matrix: rows x1, x2; columns obs. 1, obs. 2, …, obs. n, drawn i.i.d. from $p(x_1, x_2)$
Assume that one of the following three models, each with a hidden common cause f, generated the data:
x1 → x2: $x_1 = g_1(f, e_1)$, $x_2 = g_2(x_1, f, e_2)$
x1 ← x2: $x_1 = g_1(x_2, f, e_1)$, $x_2 = g_2(f, e_2)$
No direct effect: $x_1 = g_1(f, e_1)$, $x_2 = g_2(f, e_2)$
In each model, $e_1$, $e_2$ and $f$ have distributions $p(e_1)$, $p(e_2)$, $p(f)$
Estimate which of the three models generated the data

Under what conditions
can we manage the gap?
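• We have shown that it is possible under three
assumptions: i) linearity; ii) acyclicity;
iii) non-Gaussianity (Hoyer+08IJAR; Shimizu+14JMLR)
• The classical Bayesian network approach is incapable of this
[Figure: x1 → x2, x1 ← x2, or no direct effect, each with hidden common cause f1]
x1 → x2: $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$
x1 ← x2: $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$
No direct effect: $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$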

Basic non-Gaussian model
(No hidden common cause)
S. Shimizu, P. O. Hoyer, A. Hyvärinen
and A. Kerminen
Journal of Machine Learning Research
2006

Linear Non-Gaussian Acyclic
Model (LiNGAM) (Shimizu et al., 2006)
• Identifiable: causal directions and coefficients
• Various extensions including nonlinear (Hoyer+08NIPS,
Zhang+09UAI) and cyclic (Lacerda+08UAI) models
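$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$
[Figure: example graph over x1, x2, x3 with coefficients b21, b23, b13 and errors e1, e2, e3]
Assumptions:
– Linearity
– Acyclicity
– Non-Gaussian errors $e_i$
– Independence of errors $e_i$ (no hidden common causes)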
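A minimal sketch of drawing data from such a model, assuming a three-variable graph with edges x3 → x1 (b13), x3 → x2 (b23) and x1 → x2 (b21). The coefficient values and the uniform error distribution are illustrative assumptions, and `generate_lingam` is a hypothetical name.

```python
import random

def generate_lingam(n, b13=0.5, b21=0.8, b23=0.3, seed=0):
    """Sample x_i = sum_{j != i} b_ij x_j + e_i in causal order x3, x1, x2.

    Errors are independent and uniform on [-1, 1], hence non-Gaussian.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        e1, e2, e3 = (rng.uniform(-1, 1) for _ in range(3))
        x3 = e3                          # no parents
        x1 = b13 * x3 + e1               # parent: x3
        x2 = b21 * x1 + b23 * x3 + e2    # parents: x1, x3
        data.append((x1, x2, x3))
    return data

def cov(u, v):
    """Sample covariance of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

x1, x2, x3 = (list(col) for col in zip(*generate_lingam(20000)))
cov_13 = cov(x1, x3)   # close to b13 * var(e3) = 0.5 / 3
cov_23 = cov(x2, x3)   # close to (b21 * b13 + b23) * var(e3) = 0.7 / 3
```

Variables are generated in causal order, which is possible exactly because the graph is acyclic.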

Different directions give
different data distributions
Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$
Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$
where $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$
[Figure: scatter plots of x1 vs. x2 for Model 1 and Model 2, with Gaussian and non-Gaussian (ex. uniform) errors e1, e2]

Independent Component Analysis
(ICA) (Jutten & Herault, 1991; Comon, 1994; Hyvärinen et al., 2001)
• Observed variables $x_i$ are modeled by
$x_i = \sum_{j=1}^{p} a_{ij} s_j$ or $x = As$
where
– Hidden variables $s_j$ ($j = 1, \ldots, p$) are non-Gaussian and
independent
• Then, mixing matrix A is identifiable up to
permutation and scaling of the columns
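The permutation and scaling indeterminacy can be seen directly: permuting and rescaling the columns of A, with the matching change to s, leaves x unchanged. A minimal sketch with illustrative numbers (the matrix, the sample and the helper `mix` are assumptions made for this example):

```python
def mix(A, s):
    """Compute x = A s for a list-of-rows matrix A and a list s."""
    return [sum(a * sj for a, sj in zip(row, s)) for row in A]

A = [[1.0, 0.0],
     [0.8, 1.0]]
s = [0.3, -1.2]           # one sample of the hidden variables

order = [1, 0]            # column permutation
scales = [2.0, -0.5]      # non-zero column scalings

# Column k of A_alt is column order[k] of A times scales[k];
# component k of s_alt is s[order[k]] divided by scales[k].
A_alt = [[row[order[k]] * scales[k] for k in range(2)] for row in A]
s_alt = [s[order[k]] / scales[k] for k in range(2)]

x = mix(A, s)
x_alt = mix(A_alt, s_alt)  # same observed x from a different (A, s) pair
```

The observed data therefore cannot distinguish A from A_alt, which is why ICA alone fixes A only up to column order and scale.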

Sketch of the identifiability proof
• Different directions give different zero/non-
zero patterns of the mixing matrices
– No zeros on the diagonal in the causal model
– No permutation indeterminacy
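Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, i.e., $x = As$ with
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$
Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$, i.e., $x = As$ with
$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$
[Figure: graphs of Model 1 and Model 2 over x1, x2 with errors e1, e2; the zero entries of A are marked]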
LiNGAM with hidden
common causes
P. O. Hoyer, S. Shimizu, A. Kerminen,
and M. Palviainen
Int. J. Approximate Reasoning
2008

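LiNGAM with hidden
common causes (Hoyer+08IJAR)
• Extension to incorporate non-Gaussian hidden
common causes:
$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$
where $f_q$ ($q = 1, \ldots, Q$) are independent
• Example with x1 → x2:
$x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$
$x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$
[Figure: x1 → x2 with hidden common causes f1, f2 and errors e1, e2]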
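A minimal simulation sketch of this model for two observed variables, showing that hidden common causes alone produce correlation even when there is no direct effect (b21 = 0). The loading values in `lam`, the uniform distributions and the function names are illustrative assumptions.

```python
import math
import random

def simulate_hidden(n, lam, b21=0.0, seed=0):
    """x1 = sum_q lam[0][q]*f_q + e1,
    x2 = sum_q lam[1][q]*f_q + b21*x1 + e2,
    with independent uniform (non-Gaussian) f_q, e1, e2."""
    rng = random.Random(seed)
    n_hidden = len(lam[0])
    x1, x2 = [], []
    for _ in range(n):
        f = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        e1, e2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        v1 = sum(l * fq for l, fq in zip(lam[0], f)) + e1
        v2 = sum(l * fq for l, fq in zip(lam[1], f)) + b21 * v1 + e2
        x1.append(v1)
        x2.append(v2)
    return x1, x2

def corr(u, v):
    """Pearson correlation of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

lam = [[1.0, 0.5],     # loadings of f1, f2 on x1
       [1.0, -0.5]]    # loadings of f1, f2 on x2
x1, x2 = simulate_hidden(20000, lam, b21=0.0)
r = corr(x1, x2)       # population value is 1/3 although b21 = 0
```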

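WLG, hidden common causes $f_q$ are assumed to be independent in
$x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$
Dependent hidden common causes $f_1$, $f_2$ can be rewritten in terms of independent ones $e_{f_1}$, $e_{f_2}$:
$\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} e_{f_1} \\ e_{f_2} \end{pmatrix}$
[Figure: two graphs over x1, x2 with errors e1, e2: one with dependent hidden common causes f1, f2, and one with independent hidden common causes $e_{f_1}$, $e_{f_2}$]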

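Different directions give different
zero/non-zero patterns (Hoyer+08IJAR)
• Assumes faithfulness on $x_i$, $f_i$ and that the number of $f_i$ is given
[Figure: scatter plots of x1 vs. x2 for non-Gaussian and Gaussian e1, e2, f1]
Models with one hidden common cause f1 and the patterns of their mixing matrices A (columns ordered as e1, e2, f1; * marks a non-zero entry):
x1 → x2: $A = \begin{pmatrix} * & 0 & * \\ * & * & * \end{pmatrix}$
x1 ← x2: $A = \begin{pmatrix} * & * & * \\ 0 & * & * \end{pmatrix}$
No direct effect: $A = \begin{pmatrix} * & 0 & * \\ 0 & * & * \end{pmatrix}$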
Previous estimation methods
(Hoyer+08IJAR; Henao+11JMLR)
• Explicitly model hidden common causes
• Do model comparison based on maximum
likelihood principle or Bayesian approach
• Need to specify their number and distributions,
which is difficult in general
[Figure: x1 → x2 or x1 ← x2, each with hidden common causes f1, …, fQ and errors e1, e2]

Our proposal:
A Bayesian LiNGAM
approach
S. Shimizu and K. Bollen.
Journal of Machine Learning Research,
2014
and something extra

Key idea (1/2)
• Transform the model to a model with
no hidden common causes
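[Figure: on one side, LiNGAM with hidden common causes: x1 → x2 with coefficient b21, hidden common causes f1, …, fQ and errors e1, e2. On the other side, LiNGAM with no hidden common causes but with possibly different intercepts over obs.: for observations 1, …, m, $x_1^{(m)}$ → $x_2^{(m)}$ with the same coefficient b21, intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$ and errors $e_1^{(m)}$, $e_2^{(m)}$]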

Key idea (2/2)
• Include the sums of hidden common causes as
the model parameters, i.e., observation-specific
intercepts:
m-th obs.: $x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}\ \text{(obs.-specific intercept)}} + b_{21} x_1^{(m)} + e_2^{(m)}$
• Hidden common causes are not modeled explicitly
– It is necessary neither to specify the number of hidden
common causes Q nor to estimate the coefficients $\lambda_{2q}$
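A small sketch of this reparameterization: data generated from the model with Q explicit hidden common causes are reproduced exactly by a model that only sees one intercept per observation and variable. The loadings, the value of b21 and the uniform distributions are illustrative assumptions; the names `mu1` and `mu2` for the intercepts are chosen for this example.

```python
import random

rng = random.Random(0)
n, b21 = 1000, 0.8
lam1 = [1.0, 0.5, -0.3]   # loadings of Q = 3 hidden common causes on x1
lam2 = [0.7, -0.5, 0.2]   # loadings on x2

max_gap = 0.0
n_intercepts = 0
for m in range(n):
    f = [rng.uniform(-1, 1) for _ in lam1]
    e1, e2 = rng.uniform(-1, 1), rng.uniform(-1, 1)

    # Model with explicit hidden common causes.
    x1 = sum(l * fq for l, fq in zip(lam1, f)) + e1
    x2 = sum(l * fq for l, fq in zip(lam2, f)) + b21 * x1 + e2

    # Same observation written with observation-specific intercepts:
    # each intercept is the sum of the hidden common causes' contributions.
    mu1 = sum(l * fq for l, fq in zip(lam1, f))
    mu2 = sum(l * fq for l, fq in zip(lam2, f))
    x1_alt = mu1 + e1
    x2_alt = mu2 + b21 * x1_alt + e2

    n_intercepts += 2
    max_gap = max(max_gap, abs(x1 - x1_alt), abs(x2 - x2_alt))
```

The number of intercepts is 2n whatever the value of Q, which is why Q and the individual loadings never need to be specified.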

Bayesian model selection
Model 3 (x1 → x2):
$x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$
Model 4 (x1 ← x2):
$x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$
$x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$
• Compare the marginal likelihoods with the data standardized
• Once a direction has been estimated, compute the
posterior of the connection strength b21 or b12
• Many obs.-specific intercepts $\mu_i^{(m)}$ ($i = 1, 2$; $m = 1, \ldots, n$)
– Similar to mixed models and multi-level models
– Informative prior
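The comparison can be written in generic Bayesian notation. The symbols below are introduced here for illustration and are not taken from the slides: $D$ is the standardized data, $M_r$ is Model $r$, $\theta_r$ collects that model's parameters (connection strength, error scales and all observation-specific intercepts), and $\eta$ denotes the prior hyper-parameters.

```latex
p(D \mid M_r, \eta) = \int p(D \mid \theta_r, M_r)\, p(\theta_r \mid M_r, \eta)\, d\theta_r,
\qquad r \in \{3, 4\}

\hat{r} = \arg\max_{r \in \{3, 4\}} \; p(D \mid M_r, \hat{\eta}_r)
```

Because the intercepts are integrated out rather than estimated one by one, their large number (2n) is handled by the prior, in the same spirit as random effects in mixed models.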

Prior for the observation-specific
intercepts
• Motivation: Central limit theorem
– Sums of independent variables $f_q^{(m)}$ tend to be more Gaussian
• Approximate the density by a bell-shaped curve dist.:
$\begin{pmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{pmatrix} = \begin{pmatrix} \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)} \\ \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} \end{pmatrix} \sim$ t-distribution with sd $\tau_1, \tau_2$, correlation $\gamma_{12}$, and DOF $\nu$ (here, 8)
– Dependent due to hidden common causes
• Select the hyper-parameter values ($\tau_1, \tau_2 \in \{0.4, 0.6, 0.8\}$)
that maximize the marginal likelihood
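A minimal sketch of drawing correlated intercept pairs from a bivariate t-distribution, using the standard construction of a correlated Gaussian pair divided by the square root of a scaled chi-square variable. It assumes that "sd" means the standard deviation of the t-distributed variable itself, so the draws are rescaled by sqrt((dof - 2) / dof); that reading, the parameter values and the function name are assumptions made for illustration.

```python
import math
import random

def sample_bivariate_t(n, sd1, sd2, rho, dof, seed=0):
    """Draw n pairs from a bivariate t-distribution with standard
    deviations sd1, sd2, correlation rho and dof > 2 degrees of freedom."""
    rng = random.Random(seed)
    shrink = math.sqrt((dof - 2.0) / dof)  # a t variable has variance dof/(dof-2)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        g1 = z1
        g2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(g1, g2) = rho
        w = rng.gammavariate(dof / 2.0, 2.0) / dof      # chi-square(dof) / dof
        s = shrink / math.sqrt(w)
        pairs.append((sd1 * g1 * s, sd2 * g2 * s))
    return pairs

pairs = sample_bivariate_t(50000, 0.4, 0.8, 0.5, 8)
n = len(pairs)
m1 = sum(p[0] for p in pairs) / n
m2 = sum(p[1] for p in pairs) / n
v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
sample_sd1, sample_sd2 = math.sqrt(v1), math.sqrt(v2)
sample_corr = c12 / math.sqrt(v1 * v2)
```

The shared divisor makes the two intercepts dependent even when the correlation is zero, and the heavier tails than a Gaussian leave room for sums that are not yet close to normal.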

Error distributions and other
priors used in the experiment
• Error distributions $p(e_1)$, $p(e_2)$
– Fixed to be the Laplace distribution
– Could instead be estimated, for example by assuming a
family of generalized Gaussian distributions
• Priors for the other parameters
$b_{12} \sim N(0, 0.75^2)$
$b_{21} \sim N(0, 0.75^2)$
$\gamma_{12} \sim U(-1, 1)$
$\mathrm{std}(e_1) \sim U(0, 1)$
$\mathrm{std}(e_2) \sim U(0, 1)$
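A sketch of how these pieces enter a likelihood computation for the x1 → x2 model with observation-specific intercepts. The Laplace density is parameterized by its standard deviation; the priors are read as a Gaussian with standard deviation 0.75 for the connection strengths and a uniform on (0, 1) for the error standard deviations, which is an assumed reading. All function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random

def laplace_logpdf(e, std):
    """Log-density of a zero-mean Laplace distribution with the given
    standard deviation (scale b = std / sqrt(2), variance 2 * b**2)."""
    b = std / math.sqrt(2.0)
    return -math.log(2.0 * b) - abs(e) / b

def draw_from_priors(rng):
    """One draw of the non-intercept parameters from the assumed priors."""
    return {
        "b21": rng.gauss(0.0, 0.75),
        "b12": rng.gauss(0.0, 0.75),
        "std_e1": rng.uniform(0.0, 1.0),
        "std_e2": rng.uniform(0.0, 1.0),
    }

def log_likelihood_forward(x1, x2, mu1, mu2, b21, std_e1, std_e2):
    """Log-likelihood of x1 = mu1 + e1, x2 = mu2 + b21*x1 + e2 with
    Laplace errors; mu1 and mu2 hold one intercept per observation."""
    total = 0.0
    for a, b, m1, m2 in zip(x1, x2, mu1, mu2):
        e1 = a - m1
        e2 = b - m2 - b21 * a
        total += laplace_logpdf(e1, std_e1) + laplace_logpdf(e2, std_e2)
    return total

params = draw_from_priors(random.Random(0))
# One observation with e1 = e2 = 0.5 and unit-scale Laplace errors.
ll = log_likelihood_forward([1.0], [2.0], [0.5], [1.0], 0.5,
                            math.sqrt(2.0), math.sqrt(2.0))
```

A marginal likelihood would average exp(log-likelihood) over such prior draws, including draws of the intercepts; that integration step is left out of this sketch.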

Sociology data
• Source: General Social Survey (n=1380)
– Non-farm background, ages 35-44, white, male, in the labor
force, no missing data for any of the covariates, 1972-2006
• 15 pairs with known temporal directions
(Duncan+1972)
[Figure: status attainment model (Duncan et al., 1972); x2: Son’s Income]

Numbers of successes
(n=1380)
Cf. LiNGAM-GU-UK (Chen+13NECO): 0.20; PNL (Zhang+09UAI): 0.60
[Figure: known (temporal) orderings of 15 pairs among Son’s Education, Father’s Education, Son’s Income, Son’s Occupation, …, with hidden common causes f1]

Conclusion
• Estimation of causal direction in the presence of
hidden common causes is a major challenge in
causal discovery
• Proposed a linear non-Gaussian SEM approach
– Not necessary to model individual hidden common
causes
• Future directions
– Cyclic cases: Using some prior for forcing the
identifiability condition of Lacerda+08UAI?
– Non-stationarity: Combining with Kun’s method
(Huang+15IJCAI)?
