Shohei Shimizu
Osaka University, Japan
Non-Gaussian structural equation
models for causal discovery
2016 Probabilistic Graphical Model Workshop:
Sparsity, Structure and High-dimensionality
References:
https://sites.google.com/site/sshimizu06/home/lingampapers
Abstract
• Estimation of causal direction and
connection strength of two observed
variables in the presence of hidden
common causes
• A key challenge in causal discovery
• We propose a non-Gaussian model
– It does not require us to specify the number of hidden common causes
Illustrative example
Significant correlation between chocolate consumption and the number of Nobel laureates (Messerli12NEJM)
[Figure: scatter plot, 2002-2011. x-axis: chocolate consumption (kg/yr/capita); y-axis: number of Nobel laureates per 10 million population]
Corr. 0.791
P-value < 0.001
Eating more chocolate increases
the number of Nobel laureates??
• Interpretational Drift (Maurage+13, J. Nutrition)
[Figure: three candidate explanations of the correlation (Corr. 0.791, P-value < 0.001) between Chocolate and Nobel: Chclt → Nobel or Chclt ← Nobel, each with GDP as a hidden common cause, or GDP as a hidden common cause alone. Manage this gap!]
Under what conditions
can we manage this gap?
• We have shown that it is possible under the
three assumptions (Hoyer+08IJAR; Shimizu+14JMLR)
– Linearity
– Acyclicity
– Non-Gaussianity
• Performing interventions is often very hard
• The theory is closely related to independent component analysis (ICA) (Hyvarinen+01)
Many application areas
• Epidemiology: sleep problems → depression mood, or sleep problems ← depression mood? (Rosenstrom et al., 2012). Improving health and QOL
• Economics: OpInc.gr, Empl.gr, Sales.gr and R&D.gr at times t, t+1 and t+2 (Moneta et al., 2012). Policy evaluation
• Neuroscience: causal information flow (Boukrina & Graves, 2013)
• Chemistry: what changes absorption spectra? (Campomanes et al., 2014)
Brief review of structural
causal models
Structural causal models
(Pearl, 2000)
• A framework for describing causal relations
(or data generating processes)
• An example of a linear case:
$x_1 := e_1$
$x_2 := b_{21} x_1 + e_2$
[Graph: x1 → x2, with errors e1 and e2; e1 and e2 dependent]
• Generally speaking, if the value of $x_1$ has been changed and then that of $x_2$ changes, then $x_1$ causes $x_2$
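Changing the value of x1 from c to d
• Replace the function determining x1 with a constant c, denoted by do(x1=c), and then change the constant to d (Pearl, 2000)
Original model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$
Intervention do(x1=c): $x_1 = c$, $x_2 = b_{21} x_1 + e_2$
[Graphs: x1 → x2 with errors e1, e2 (original); x1 → x2 with e1 replaced by the constant c (intervention)]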
Average causal effect
(Rubin, 1974; Pearl, 2000)
• Average causal effect of x1 on x2 when
changing x1 from c to d
– Computed based on the models with do(x1=d) and
do(x1=c)
• $E(x_2 \mid do(x_1 = d)) - E(x_2 \mid do(x_1 = c)) = b_{21}(d - c)$
– If you have changed the value of $x_1$ from $c$ to $d$, then $E(x_2)$ will change by $b_{21}(d - c)$
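The do-operation and the average causal effect can be checked numerically. The following is a minimal sketch, assuming illustrative values for b21, c and d and a Laplace error distribution, none of which come from the slides; the helper `simulate_x2` is hypothetical.

```python
import numpy as np

def simulate_x2(x1_value, b21, e2):
    """Structural equation x2 := b21 * x1 + e2 with x1 forced to a constant, i.e. do(x1 = x1_value)."""
    return b21 * x1_value + e2

rng = np.random.default_rng(0)
b21 = 0.8        # assumed coefficient, for illustration only
c, d = 1.0, 3.0  # assumed intervention values
e2 = rng.laplace(size=100_000)  # assumed error distribution; any choice works here

# The same error draws are used under both interventions.
x2_do_c = simulate_x2(c, b21, e2)
x2_do_d = simulate_x2(d, b21, e2)

# Average causal effect: E(x2 | do(x1 = d)) - E(x2 | do(x1 = c))
ace = x2_do_d.mean() - x2_do_c.mean()
print(ace, b21 * (d - c))
```

Because x1 is replaced by a constant, e1 plays no role under the intervention, which is why the sketch never draws it.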
Formulating the problem
Estimation of causal direction
• Suppose that data X was randomly generated from either of the following two models:
Model 1 (x1 → x2): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ $(b_{21} \neq 0)$
or
Model 2 (x1 ← x2): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ $(b_{12} \neq 0)$
• Estimate which model generated the data X based on the data X only
Major difficulty
• Errors $e_1$ and $e_2$ are often dependent
• The regression coefficient of $x_2$ on $x_1$ is not equal to $b_{21}$ even if we know the right causal direction
Model 1 (x1 → x2): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ $(b_{21} \neq 0)$
or
Model 2 (x1 ← x2): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ $(b_{12} \neq 0)$
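Why the regression coefficient misses $b_{21}$ can be seen from a short derivation under Model 1. This is a sketch that only assumes $\mathrm{var}(e_1) > 0$; it is not taken from the slides.

```latex
\frac{\mathrm{cov}(x_1, x_2)}{\mathrm{var}(x_1)}
  = \frac{\mathrm{cov}(e_1,\; b_{21} e_1 + e_2)}{\mathrm{var}(e_1)}
  = b_{21} + \frac{\mathrm{cov}(e_1, e_2)}{\mathrm{var}(e_1)}
```

The second term vanishes only when $e_1$ and $e_2$ are uncorrelated, so dependent errors generally bias the regression coefficient away from $b_{21}$.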
Hidden common causes
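• Such dependency is typically introduced by hidden common causes, say $f_1$
Model 1': $x_1 = \underbrace{\lambda_{11} f_1 + e'_1}_{e_1}$, $x_2 = b_{21} x_1 + \underbrace{\lambda_{21} f_1 + e'_2}_{e_2}$
or
Model 2': $x_1 = b_{12} x_2 + \underbrace{\lambda_{11} f_1 + e'_1}_{e_1}$, $x_2 = \underbrace{\lambda_{21} f_1 + e'_2}_{e_2}$
[Graphs: x1 → x2 (Model 1') and x1 ← x2 (Model 2'), each with hidden common cause f1 and errors e'1, e'2]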
A well-known guideline
(Pearl2000; Spirtes+1993)
• Observe the hidden common cause $f_1$, incorporate it in the models, and carry out three-variable analysis
Model 1': $x_1 = \lambda_{11} f_1 + e'_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e'_2$
or
Model 2': $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e'_1$, $x_2 = \lambda_{21} f_1 + e'_2$
• Errors $e'_1, e'_2$ independent!
Following the guideline is often
very hard
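• A large number of hidden common causes $f_1, f_2, \ldots, f_Q$ may exist (Q unknown)
• Often no idea what they are
Model 1': $x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$
or
Model 2': $x_1 = b_{12} x_2 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1$, $x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$
[Graphs: x1 → x2 (Model 1') and x1 ← x2 (Model 2'), each with hidden common causes f1, …, fQ and errors e'1, e'2]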
Estimation of causal direction
in the presence of
hidden common causes
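• Estimate which model generated the data X, with the hidden common causes $f_q$ unobserved
Model 1': $x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$
or
Model 2': $x_1 = b_{12} x_2 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1$, $x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$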
Note
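• If we intervene on x1 (and x2), we have no hidden common causes
• But interventions are often difficult to do, for ethical and cost reasons
Model 1': $x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$
Model 1'' (after intervening on x1): $x_1 = c$, $x_2 = b_{21} x_1 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e'_2$
[Graphs: in Model 1' the hidden common causes f1, …, fQ point to both x1 and x2; in Model 1'' x1 is set to the constant c, so they point to x2 only]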
Major challenges
1. Estimation of causal direction when temporal information is not available: x1 → x2 or x1 ← x2?
2. Managing hidden common causes: x1 → x2 or x1 ← x2, with a hidden common cause f1 in both cases?
Basic non-Gaussian model
(No hidden common cause)
S. Shimizu, P. O. Hoyer, A. Hyvärinen
and A. Kerminen.
Journal of Machine Learning Research,
2006.
Independent errors
• Implying no hidden common causes
• The two models are distinguishable if the errors e1 and e2 are non-Gaussian (Dodge+00CSTM; Shimizu+06JMLR)
Model 1 (x1 → x2): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$
or
Model 2 (x1 ← x2): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$
$(b_{21}, b_{12} \neq 0)$
Different directions give
different data distributions
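[Figure: scatter plots of x1 against x2 for Model 1 and Model 2, with Gaussian errors (left) and non-Gaussian errors (right)]
Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$
Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$
with $E(e_1) = E(e_2) = 0$ and $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$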
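The asymmetry between the two directions can be illustrated with a small simulation of Model 1 with coefficient 0.8 and unit variances. This is a sketch under stated assumptions: the uniform error distribution, the sample size, and the dependence measure (correlation between squared regressor and squared residual) are illustrative stand-ins chosen here, not the estimation method of the cited papers.

```python
import numpy as np

def residual(y, x):
    """Residual of the least-squares regression of y on x (no intercept; data are near zero-mean)."""
    slope = np.dot(x, y) / np.dot(x, x)
    return y - slope * x

def sq_corr(a, b):
    """Correlation between a**2 and b**2; close to 0 when a and b are independent."""
    return np.corrcoef(a**2, b**2)[0, 1]

def direction_scores(x1, x2):
    """Regressor/residual dependence for the candidates x1 -> x2 and x2 -> x1."""
    forward = sq_corr(x1, residual(x2, x1))
    backward = sq_corr(x2, residual(x1, x2))
    return forward, backward

rng = np.random.default_rng(0)
n = 50_000
w = np.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1

# Non-Gaussian case: var(e1) = 1 and var(e2) = 1 - 0.8**2 = 0.36, so var(x2) = 1.
e1 = rng.uniform(-w, w, n)
e2 = 0.6 * rng.uniform(-w, w, n)
fwd_ng, bwd_ng = direction_scores(e1, 0.8 * e1 + e2)

# Gaussian case with the same variances.
g1 = rng.normal(0.0, 1.0, n)
g2 = rng.normal(0.0, 0.6, n)
fwd_g, bwd_g = direction_scores(g1, 0.8 * g1 + g2)

print("non-Gaussian:", fwd_ng, bwd_ng)
print("Gaussian:    ", fwd_g, bwd_g)
```

With non-Gaussian errors only the correct direction leaves a residual that looks independent of the regressor; with Gaussian errors both directions do, so the data cannot separate the two models.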
Independent Component Analysis
(ICA) (Jutten & Herault, 1991; Comon, 1994)
• The observed random vector x is modeled by
$x = As$, or $x_i = \sum_{j=1}^{p} a_{ij} s_j$
where
– The mixing matrix $A = [a_{ij}]$
– The hidden variables (independent components) $s_i$ are non-Gaussian and mutually independent
• Then, A is identifiable up to permutation and scaling of the columns
Sketch of the identifiability proof
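• Different directions give different zero/non-zero patterns of the mixing matrices
– No zeros on the diagonal in the causal model
– No permutation indeterminacy
Model 1 (x1 → x2): $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, i.e. $x = As$ with
$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$
Model 2 (x1 ← x2): $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$, i.e. $x = As$ with
$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}$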
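The zero/non-zero patterns follow from solving x = Bx + e for x, which gives A = (I − B)^{-1}. A minimal sketch, assuming illustrative coefficient values; the helper `mixing_matrix` is hypothetical:

```python
import numpy as np

def mixing_matrix(B):
    """Solve x = B x + e for x: x = (I - B)^{-1} e, so the mixing matrix is A = (I - B)^{-1}."""
    return np.linalg.inv(np.eye(B.shape[0]) - B)

b21, b12 = 0.8, 0.5  # assumed values, for illustration only

# B[i, j] is the coefficient of x_{j+1} in the equation for x_{i+1}.
B_model1 = np.array([[0.0, 0.0],
                     [b21, 0.0]])  # x1 -> x2
B_model2 = np.array([[0.0, b12],
                     [0.0, 0.0]])  # x1 <- x2

A1 = mixing_matrix(B_model1)
A2 = mixing_matrix(B_model2)
print(A1)  # zero in the upper-right entry
print(A2)  # zero in the lower-left entry
```

The position of the zero off-diagonal entry differs between the two directions, which is what an ICA-based estimate of A can reveal once the permutation of its columns is fixed.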
Linear Non-Gaussian Acyclic
Models (LiNGAM) (Shimizu+06JMLR)
$x_i = \mu_i + \sum_{j \neq i} b_{ij} x_j + e_i$
– Acyclicity
– Non-Gaussian errors $e_i$
– Independence of errors $e_i$ (no hidden common causes)
• Identifiable: directions, coefficients, and intercepts
– Can be uniquely estimated without knowing the causal structure
[Example graph: x1, x2, x3 with coefficients b21, b23, b13 and errors e1, e2, e3]
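Acyclicity means the variables can be arranged in a causal order in which the coefficient matrix B becomes strictly lower triangular. The sketch below recovers such an order from a known B for the three-variable example; the coefficient values and the helper `causal_order` are assumptions made for illustration, and this is not the LiNGAM estimation algorithm itself, which works from data.

```python
import numpy as np

def causal_order(B, tol=1e-12):
    """Return a causal order for B, where B[i, j] != 0 means x_{j+1} -> x_{i+1}.

    Raises ValueError if B contains a directed cycle.
    """
    remaining = list(range(B.shape[0]))
    order = []
    while remaining:
        # A variable with no parents among the remaining ones can come next.
        roots = [i for i in remaining
                 if all(abs(B[i, j]) <= tol for j in remaining if j != i)]
        if not roots:
            raise ValueError("B is not acyclic")
        order.append(roots[0])
        remaining.remove(roots[0])
    return order

# Indices 0, 1, 2 stand for x1, x2, x3; the values are assumed.
B = np.zeros((3, 3))
B[1, 0] = 0.8   # b21: x1 -> x2
B[1, 2] = -0.5  # b23: x3 -> x2
B[0, 2] = 0.3   # b13: x3 -> x1

order = causal_order(B)
permuted = B[np.ix_(order, order)]  # strictly lower triangular in causal order
print(order)
print(permuted)
```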
Extensions
• Cyclic models (Lacerda+08UAI; Hyvarinen+13JMLR) [Graph: x1 ⇄ x2 with errors e1, e2]
• Time series (Hyvarinen+10JMLR; Huang+15IJCAI; Gong15ICML): $x(t) = \sum_{\tau=0}^{k} B_{\tau}\, x(t-\tau) + e(t)$
• Nonlinearity (Zhang+09UAI; Peters+14JMLR; cf. Imoto02PSB): $x_i = f_{i,2}^{-1}\big(f_{i,1}(\text{parents of } x_i) + e_i\big)$
• Discrete variables (Peters+11TPAMI; Park+15NIPS)
LiNGAM with hidden
common causes
P. O. Hoyer, S. Shimizu, A. Kerminen,
and M. Palviainen.
Int. J. Approximate Reasoning
2008
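LiNGAM with hidden common causes (Hoyer+08IJAR)
• Extension to incorporate non-Gaussian hidden common causes $f_q$:
$x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$
where $f_q$ $(q = 1, \ldots, Q)$ are independent
• Two-variable example [Graph: x1 → x2 with hidden common causes f1, f2 and errors e1, e2]:
$x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$
$x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$
• WLG, hidden common causes $f_q$ are assumed to be independent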
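Independent hidden common causes
$x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$
[Graph: x1 → x2 with dependent hidden common causes f1, f2 driven by independent errors $e_{f_1}$, $e_{f_2}$]
Dependent hidden common causes can be rewritten in terms of independent ones:
$\begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} e_{f_1} \\ e_{f_2} \end{bmatrix}$
[Graph: x1 → x2 with independent hidden common causes $f_1 := e_{f_1}$, $f_2 := e_{f_2}$]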
Different causal directions give
different data distributions
(Hoyer, Shimizu, Kerminen and Palviainen, 2008, IJAR)
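• Faithfulness + number of hidden common causes "known"
x1 → x2: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$
or
x1 ← x2: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + b_{12} x_2 + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e_2$
[Figure: the two graphs with hidden common causes f1, …, fQ and errors e1, e2, each with a scatter plot of x1 against x2]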
Previous estimation approaches
• Explicitly model hidden common causes and
compare two models with opposite directions of
causation
– Maximum likelihood principle (Hoyer+08IJAR)
– Bayesian model selection (Henao & Winther, 2011, JMLR)
• Require us to specify the number of hidden
common causes, which is difficult in general
[Figure: the two candidate models, x1 → x2 or x1 ← x2, each with hidden common causes f1, …, fQ and errors e1, e2]
Our proposal:
a Bayesian approach
S. Shimizu and K. Bollen.
Journal of Machine Learning Research,
2014
Key idea (1/2)
• Another look at the LiNGAM with hidden common causes:
m-th obs.: $x_2^{(m)} = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$
[Figure: one graph $x_1^{(m)} \to x_2^{(m)}$ per observation, all with the same coefficient b21, errors $e_1^{(m)}$, $e_2^{(m)}$, and intercepts $\mu_2 + \mu_2^{(1)}, \ldots, \mu_2 + \mu_2^{(m)}$]
• Observations are generated from the LiNGAM model with possibly different intercepts $\mu_2 + \mu_2^{(m)}$
Key idea (2/2)
• Include the sums of hidden common causes as the observation-specific intercepts:
m-th obs.: $x_2^{(m)} = \mu_2 + \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$, where $\mu_2^{(m)}$ is the obs.-specific intercept
• Do not explicitly model hidden common causes
– It is necessary neither to specify the number of hidden common causes Q nor to estimate the coefficients $\lambda_{2q}$
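That the explicit model and the observation-specific-intercept form describe the same data can be verified directly. A minimal sketch, assuming Q = 5 hidden common causes, Laplace-distributed sources and arbitrary coefficients, all chosen here for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q = 1000, 5                 # assumed sample size and number of hidden common causes
b21, mu1, mu2 = 0.8, 0.0, 0.0  # assumed coefficient and intercepts
lam = rng.normal(size=(2, Q))  # lam[i-1, q-1] plays the role of lambda_iq
f = rng.laplace(size=(n, Q))   # non-Gaussian hidden common causes, one row per observation
e = rng.laplace(size=(n, 2))   # non-Gaussian errors e1, e2

# Explicit form: hidden common causes appear in the equations.
x1 = mu1 + f @ lam[0] + e[:, 0]
x2 = mu2 + f @ lam[1] + b21 * x1 + e[:, 1]

# Equivalent form: their sums become observation-specific intercepts mu_i^(m).
mu_obs = f @ lam.T  # shape (n, 2); column i-1 holds mu_i^(m)
x1_alt = mu1 + mu_obs[:, 0] + e[:, 0]
x2_alt = mu2 + mu_obs[:, 1] + b21 * x1_alt + e[:, 1]

print(np.allclose(x1, x1_alt), np.allclose(x2, x2_alt))
```

Only `mu_obs` enters the second form, so neither Q nor the individual coefficients need to be known to write the model down.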
Bayesian model selection
• Compare the marginal likelihoods of these two models with opposite directions:
Model 3 (x1 → x2): $x_1^{(m)} = \mu_1 + \mu_1^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$
Model 4 (x1 ← x2): $x_1^{(m)} = \mu_1 + \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \mu_2^{(m)} + e_2^{(m)}$
• Many additional parameters $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$
– Similar to mixed models and multi-level models
– Informative prior for the observation-specific intercepts
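For reference, the quantity being compared can be written in the standard form of Bayesian model selection. This is a generic sketch: the symbol $\theta_r$, standing for all parameters of Model $r$ (coefficient, intercepts, observation-specific intercepts and error-distribution parameters), is notation assumed here rather than taken from the slides.

```latex
p(X \mid M_r) = \int p(X \mid \theta_r, M_r)\, p(\theta_r \mid M_r)\, d\theta_r ,
\qquad r = 3, 4
```

The model with the larger marginal likelihood is preferred; with equal prior probabilities on the two models this is the same as choosing the model with the larger posterior probability.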
Prior for the observation-specific
intercepts
• Motivation: central limit theorem
– Sums of independent variables tend to be more Gaussian:
$\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$
• Approximate the density by a bell-shaped curve distribution:
$\begin{bmatrix} \mu_1^{(m)} \\ \mu_2^{(m)} \end{bmatrix} \sim$ t-distribution with sd $\tau_1, \tau_2$, correlation $\sigma_{12}$, and DOF $\nu$
• Select the hyper-parameter values that maximize the marginal likelihood
– $\tau_l \in \{0,\ 0.2\,\mathrm{sd}(x_l),\ \ldots,\ 1.0\,\mathrm{sd}(x_l)\}$, $\sigma_{12} \in \{0,\ 0.1,\ \ldots,\ 0.9\}$
– DOF $\nu$ fixed to be 6 in the experiments below
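The central-limit motivation can be checked numerically by looking at how the excess kurtosis of the sum shrinks as more hidden common causes are added. A minimal sketch, assuming uniform hidden common causes with equal weights; both are choices made here for illustration.

```python
import numpy as np

def excess_kurtosis(v):
    """Sample excess kurtosis; 0 for a Gaussian variable."""
    z = (v - v.mean()) / v.std()
    return np.mean(z**4) - 3.0

rng = np.random.default_rng(0)
n = 200_000
kurt = {}
for Q in (1, 2, 10):
    f = rng.uniform(-1.0, 1.0, size=(n, Q))  # assumed non-Gaussian hidden common causes
    lam = np.ones(Q)                         # assumed equal weights
    mu_obs = f @ lam                         # observation-specific intercepts (sums)
    kurt[Q] = excess_kurtosis(mu_obs)

for Q, k in kurt.items():
    print(Q, k)
```

For independent, identically distributed terms with equal weights, the excess kurtosis of the sum is that of a single term divided by Q, so it approaches the Gaussian value 0 as Q grows.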
The chocolate data revisited
[Figure: scatter plot of Nobel laureates against chocolate consumption. Corr. 0.791, P-value < 0.001]
Gaussianity rejected for both "Chocolate consumption" and "Num. Nobel laureates"
Model comparison
• No method was previously available to compare these two models
Conclusions
• Estimation of causal direction in the presence of
hidden common causes is a major challenge in
causal discovery
• We proposed a linear non-Gaussian SEM with possibly different intercepts
– It does not require specifying the number of hidden common causes
• Future work
– Sensitivity to the choice of prior distributions
– Better estimation methods that are computationally and statistically efficient … and many others
Pairwise
analysis
High-dimensional cases
• Huge number of candidate networks
• Analyze every pair of variables and integrate the results to get an entire causal ordering
• Simpler than trying all the combinations of causal orders
[Figure: graphs over x1, x2, x3, x4 with hidden common causes f1 and f3. Steps shown: full graph, integrate the results, prune redundant edges]
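Integrating pairwise results into one causal ordering amounts to a topological sort of the estimated pairwise directions. The sketch below uses Python's standard `graphlib` module; the pairwise directions are hypothetical outputs made up for illustration, and the helper `integrate_pairwise` is not from the slides.

```python
from graphlib import CycleError, TopologicalSorter

def integrate_pairwise(directions):
    """Combine (cause, effect) pairs, one per analyzed pair of variables, into a causal ordering.

    Raises graphlib.CycleError if the pairwise results contradict each other.
    """
    sorter = TopologicalSorter()
    for cause, effect in directions:
        sorter.add(effect, cause)  # the effect must come after its cause
    return list(sorter.static_order())

# Hypothetical results of analyzing every pair among x1, ..., x4.
pairwise = [
    ("x3", "x1"), ("x3", "x2"), ("x3", "x4"),
    ("x1", "x2"), ("x1", "x4"),
    ("x2", "x4"),
]
ordering = integrate_pairwise(pairwise)
print(ordering)

# Contradictory pairwise results are reported rather than silently ordered.
try:
    integrate_pairwise([("x1", "x2"), ("x2", "x1")])
    inconsistent = False
except CycleError:
    inconsistent = True
print(inconsistent)
```

Edges that are implied by the ordering but not needed as direct effects would still have to be pruned in a separate step, which this sketch does not attempt.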
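Different zero/non-zero patterns of the mixing matrices (Hoyer+08IJAR)
• Faithfulness on $x_i$, $f_i$ + number of $f_i$ given
Models (rows of A: x1, x2; columns: e1, e2, f1; * = non-zero):
1. x1 and x2 share f1, no direct effect: $A = \begin{bmatrix} * & 0 & * \\ 0 & * & * \end{bmatrix}$
2. x1 → x2 with f1: $A = \begin{bmatrix} * & 0 & * \\ * & * & * \end{bmatrix}$
3. x1 ← x2 with f1: $A = \begin{bmatrix} * & * & * \\ 0 & * & * \end{bmatrix}$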
More Related Content

What's hot

Ch6.4 & 6.8 Systems of Equations and Inequalities
Ch6.4 & 6.8 Systems of Equations and InequalitiesCh6.4 & 6.8 Systems of Equations and Inequalities
Ch6.4 & 6.8 Systems of Equations and Inequalitiesmdicken
 
8th alg -l5.6--jan7
8th alg -l5.6--jan78th alg -l5.6--jan7
8th alg -l5.6--jan7jdurst65
 
Chapter3 econometrics
Chapter3 econometricsChapter3 econometrics
Chapter3 econometricsVu Vo
 
Chapter8
Chapter8Chapter8
Chapter8Vu Vo
 
Ancestral Causal Inference - WIML 2016 @ NIPS
Ancestral Causal Inference - WIML 2016 @ NIPSAncestral Causal Inference - WIML 2016 @ NIPS
Ancestral Causal Inference - WIML 2016 @ NIPSSara Magliacane
 
Talk: Joint causal inference on observational and experimental data - NIPS 20...
Talk: Joint causal inference on observational and experimental data - NIPS 20...Talk: Joint causal inference on observational and experimental data - NIPS 20...
Talk: Joint causal inference on observational and experimental data - NIPS 20...Sara Magliacane
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learningWanjin Yu
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JMJapheth Muthama
 
Math 3 exponential functions
Math 3 exponential functionsMath 3 exponential functions
Math 3 exponential functionsRon_Eick
 
Inverse of functions
Inverse of functionsInverse of functions
Inverse of functionsLeo Crisologo
 
11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scoresAlexander Decker
 
Application of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresApplication of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresAlexander Decker
 
Worst Practices in Statistical Data Analysis
Worst Practices in  Statistical Data AnalysisWorst Practices in  Statistical Data Analysis
Worst Practices in Statistical Data AnalysisRichard Gill
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...inventionjournals
 

What's hot (18)

Ch6.4 & 6.8 Systems of Equations and Inequalities
Ch6.4 & 6.8 Systems of Equations and InequalitiesCh6.4 & 6.8 Systems of Equations and Inequalities
Ch6.4 & 6.8 Systems of Equations and Inequalities
 
8th alg -l5.6--jan7
8th alg -l5.6--jan78th alg -l5.6--jan7
8th alg -l5.6--jan7
 
Chapter3 econometrics
Chapter3 econometricsChapter3 econometrics
Chapter3 econometrics
 
Chapter8
Chapter8Chapter8
Chapter8
 
Ancestral Causal Inference - WIML 2016 @ NIPS
Ancestral Causal Inference - WIML 2016 @ NIPSAncestral Causal Inference - WIML 2016 @ NIPS
Ancestral Causal Inference - WIML 2016 @ NIPS
 
Talk: Joint causal inference on observational and experimental data - NIPS 20...
Talk: Joint causal inference on observational and experimental data - NIPS 20...Talk: Joint causal inference on observational and experimental data - NIPS 20...
Talk: Joint causal inference on observational and experimental data - NIPS 20...
 
gls
glsgls
gls
 
Causally regularized machine learning
Causally regularized machine learningCausally regularized machine learning
Causally regularized machine learning
 
Regression analysis by Muthama JM
Regression analysis by Muthama JMRegression analysis by Muthama JM
Regression analysis by Muthama JM
 
Math 3 exponential functions
Math 3 exponential functionsMath 3 exponential functions
Math 3 exponential functions
 
Inverse of functions
Inverse of functionsInverse of functions
Inverse of functions
 
11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores11.application of matrix algebra to multivariate data using standardize scores
11.application of matrix algebra to multivariate data using standardize scores
 
Application of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scoresApplication of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scores
 
Worst Practices in Statistical Data Analysis
Worst Practices in  Statistical Data AnalysisWorst Practices in  Statistical Data Analysis
Worst Practices in Statistical Data Analysis
 
Functions
FunctionsFunctions
Functions
 
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...Estimation of Parameters and Missing Responses In Second Order Response Surfa...
Estimation of Parameters and Missing Responses In Second Order Response Surfa...
 
Review for final exam
Review for final examReview for final exam
Review for final exam
 
Discrete and Continuous Random Variables
Discrete and Continuous Random VariablesDiscrete and Continuous Random Variables
Discrete and Continuous Random Variables
 

Viewers also liked

構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展
構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展
構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展Shiga University, RIKEN
 
構造方程式モデルによる因果探索と非ガウス性
構造方程式モデルによる因果探索と非ガウス性構造方程式モデルによる因果探索と非ガウス性
構造方程式モデルによる因果探索と非ガウス性Shiga University, RIKEN
 
因果探索: 観察データから 因果仮説を探索する
因果探索: 観察データから因果仮説を探索する因果探索: 観察データから因果仮説を探索する
因果探索: 観察データから 因果仮説を探索するShiga University, RIKEN
 
非ガウス性を利用した 因果構造探索
非ガウス性を利用した因果構造探索非ガウス性を利用した因果構造探索
非ガウス性を利用した 因果構造探索Shiga University, RIKEN
 
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...PDF uncertainties the LHC made easy: a compression algorithm for the combinat...
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...juanrojochacon
 
因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説Shiga University, RIKEN
 
『バックドア基準の入門』@統数研研究集会
『バックドア基準の入門』@統数研研究集会『バックドア基準の入門』@統数研研究集会
『バックドア基準の入門』@統数研研究集会takehikoihayashi
 

Viewers also liked (7)

構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展
構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展
構造方程式モデルによる因果推論: 因果構造探索に関する最近の発展
 
構造方程式モデルによる因果探索と非ガウス性
構造方程式モデルによる因果探索と非ガウス性構造方程式モデルによる因果探索と非ガウス性
構造方程式モデルによる因果探索と非ガウス性
 
因果探索: 観察データから 因果仮説を探索する
因果探索: 観察データから因果仮説を探索する因果探索: 観察データから因果仮説を探索する
因果探索: 観察データから 因果仮説を探索する
 
非ガウス性を利用した 因果構造探索
非ガウス性を利用した因果構造探索非ガウス性を利用した因果構造探索
非ガウス性を利用した 因果構造探索
 
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...PDF uncertainties the LHC made easy: a compression algorithm for the combinat...
PDF uncertainties the LHC made easy: a compression algorithm for the combinat...
 
因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説因果探索: 基本から最近の発展までを概説
因果探索: 基本から最近の発展までを概説
 
『バックドア基準の入門』@統数研研究集会
『バックドア基準の入門』@統数研研究集会『バックドア基準の入門』@統数研研究集会
『バックドア基準の入門』@統数研研究集会
 

Similar to Non-Gaussian structural equation models for causal discovery

Maths iii quick review by Dr Asish K Mukhopadhyay
Maths iii quick review by Dr Asish K MukhopadhyayMaths iii quick review by Dr Asish K Mukhopadhyay
Maths iii quick review by Dr Asish K MukhopadhyayDr. Asish K Mukhopadhyay
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdfanandsimple
 
Applied numerical methods lec8
Applied numerical methods lec8Applied numerical methods lec8
Applied numerical methods lec8Yasser Ahmed
 
optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptxRichard Gill
 
Regression analysis presentation
Regression analysis presentationRegression analysis presentation
Regression analysis presentationMuhammadFaisal733
 
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.pptArtificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.pptAnonymous9etQKwW
 
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...Yusuf Bhujwalla
 
Supporting Vector Machine
Supporting Vector MachineSupporting Vector Machine
Supporting Vector MachineSumit Singh
 

Similar to Non-Gaussian structural equation models for causal discovery (15)

ERF Training Workshop Panel Data 3
ERF Training WorkshopPanel Data 3ERF Training WorkshopPanel Data 3
ERF Training Workshop Panel Data 3
 
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
 
Maths iii quick review by Dr Asish K Mukhopadhyay
Maths iii quick review by Dr Asish K MukhopadhyayMaths iii quick review by Dr Asish K Mukhopadhyay
Maths iii quick review by Dr Asish K Mukhopadhyay
 
Les5e ppt 08
Les5e ppt 08Les5e ppt 08
Les5e ppt 08
 
Optimization Techniques.pdf
Optimization Techniques.pdfOptimization Techniques.pdf
Optimization Techniques.pdf
 
Implicit differentiation
Implicit differentiationImplicit differentiation
Implicit differentiation
 
Applied numerical methods lec8
Applied numerical methods lec8Applied numerical methods lec8
Applied numerical methods lec8
 
optimizedBell.pptx
optimizedBell.pptxoptimizedBell.pptx
optimizedBell.pptx
 
Regression analysis presentation
Regression analysis presentationRegression analysis presentation
Regression analysis presentation
 
sumstats.ppt
sumstats.pptsumstats.ppt
sumstats.ppt
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
 
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.pptArtificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
Artificial Neural Networks_Bioinsspired_Algorithms_Nov 20.ppt
 
117 chap8 slides
117 chap8 slides117 chap8 slides
117 chap8 slides
 
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...
An RKHS Approach to Systematic Kernel Selection in Nonlinear System Identific...
 
Supporting Vector Machine
Supporting Vector MachineSupporting Vector Machine
Supporting Vector Machine
 

Recently uploaded

Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxdharshini369nike
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Temporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of MasticationTemporomandibular joint Muscles of Mastication
Temporomandibular joint Muscles of Masticationvidulajaib
 
Module 4: Mendelian Genetics and Punnett Square
Module 4:  Mendelian Genetics and Punnett SquareModule 4:  Mendelian Genetics and Punnett Square
Module 4: Mendelian Genetics and Punnett SquareIsiahStephanRadaza
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Cytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxCytokinin, mechanism and its application.pptx
Cytokinin, mechanism and its application.pptxVarshiniMK
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Lajpat Nagar (Delhi) |aasikanpl
 

Recently uploaded (20)

Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
TOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptxTOTAL CHOLESTEROL (lipid profile test).pptx
TOTAL CHOLESTEROL (lipid profile test).pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time

Non-Gaussian structural equation models for causal discovery

  • 1. Shohei Shimizu, Osaka University, Japan. Non-Gaussian structural equation models for causal discovery. 2016 Probabilistic Graphical Model Workshop: Sparsity, Structure and High-dimensionality. References: https://sites.google.com/site/sshimizu06/home/lingampapers
  • 2. Abstract • Estimation of the causal direction and connection strength between two observed variables in the presence of hidden common causes • A key challenge in causal discovery • We propose a non-Gaussian model – It does not require specifying the number of hidden common causes
  • 4. Significant correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM) [Figure: scatter plot, 2002-2011; x-axis: chocolate consumption (kg/yr/capita); y-axis: number of Nobel laureates per 10 million population] Corr. 0.791, P-value < 0.001
  • 5. Eating more chocolate increases the number of Nobel laureates?? • Interpretational Drift (Maurage+13, J. Nutrition) [Figure: three candidate causal graphs relating Chclt (chocolate) and Nobel, separated by "or", each with GDP as a possible hidden common cause] Corr. 0.791, P-value < 0.001 between Chocolate and Nobel, with possible hidden common causes. Manage this gap!
  • 6. Under what conditions can we manage this gap? • We have shown that it is possible under three assumptions (Hoyer+08IJAR; Shimizu+14JMLR) – Linearity – Acyclicity – Non-Gaussianity • Performing interventions is often very hard • The theory is closely related to independent component analysis (ICA) (Hyvarinen+01)
  • 7. Many application areas • Epidemiology (Rosenstrom et al., 2012): improving health and QOL; do sleep problems cause depressed mood, or the reverse? • Economics (Moneta et al., 2012): policy evaluation; growth rates of Empl., Sales, R&D, and OpInc. at times t, t+1, t+2 (Empl.gr(t), Sales.gr(t), R&D.gr(t), OpInc.gr(t), ...) • Neuroscience (Boukrina & Graves, 2013): causal information flow • Chemistry (Campomanes et al., 2014): what changes absorption spectra?
  • 8. Brief review of structural causal models
  • 9. Structural causal models (Pearl, 2000) • A framework for describing causal relations (or data generating processes) • An example of linear cases: $x_1 := e_1$, $x_2 := b_{21} x_1 + e_2$ • Generally speaking, if the value of $x_1$ has been changed and then that of $x_2$ changes, then $x_1$ causes $x_2$ [Figure: graph over x1 and x2 with error terms e1 and e2; e1 and e2 dependent]
  • 10. Changing the value of x1 from c to d • Replace the function determining x1 with a constant c, denoted by do(x1=c), and then change the constant to d (Pearl, 2000) • Original model: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ • Intervention do(x1=c): $x_1 = c$, $x_2 = b_{21} x_1 + e_2$ [Figure: graphs over x1, x2 before the intervention (errors e1, e2) and after it (x1 fixed to c, error e2)]
  • 11. Average causal effect (Rubin, 1974; Pearl, 2000) • Average causal effect of x1 on x2 when changing x1 from c to d – Computed based on the models with do(x1=d) and do(x1=c): $E(x_2 \mid do(x_1 = d)) - E(x_2 \mid do(x_1 = c)) = b_{21}(d - c)$ • If you have changed the value of $x_1$ from c to d, then $E(x_2)$ will change by $b_{21}(d - c)$
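The average causal effect on slide 11 can be checked numerically. The following is a minimal sketch, assuming a uniform error e2 and illustrative values b21 = 0.8, c = 1, d = 3 (none of these values come from the slides); the helper name `mean_x2_under_do` is hypothetical.

```python
import random

def mean_x2_under_do(b21, x1_value, n=20000, seed=0):
    """Estimate E(x2 | do(x1 = x1_value)) in the model x2 = b21*x1 + e2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        e2 = rng.uniform(-1.0, 1.0)  # zero-mean non-Gaussian error (assumed)
        total += b21 * x1_value + e2  # x1 is held fixed by the intervention
    return total / n

# Illustrative values only (assumptions, not taken from the slides).
b21, c, d = 0.8, 1.0, 3.0
ace = mean_x2_under_do(b21, d, seed=1) - mean_x2_under_do(b21, c, seed=2)
print("simulated ACE:", ace, " theoretical b21*(d-c):", b21 * (d - c))
```

The simulated difference should be close to b21(d − c), up to sampling noise.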
  • 13. Estimation of causal direction • Suppose that data X was randomly generated from either of the following two models: Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ $(b_{21} \neq 0)$, or Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ $(b_{12} \neq 0)$ • Estimate which model generated the data X based on the data X only
  • 14. Major difficulty • Errors $e_1$ and $e_2$ are often dependent • The regression coefficient of $x_2$ on $x_1$ is not equal to $b_{21}$ even if we know the right causal direction • Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$ $(b_{21} \neq 0)$, or Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ $(b_{12} \neq 0)$
  • 15. Hidden common causes • Such dependency is typically introduced by hidden common causes, say $f_1$ • Model 1': $x_1 = \lambda_{11} f_1 + e'_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e'_2$, or Model 2': $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e'_1$, $x_2 = \lambda_{21} f_1 + e'_2$ • In both models the original errors are $e_1 = \lambda_{11} f_1 + e'_1$ and $e_2 = \lambda_{21} f_1 + e'_2$
  • 16. A well-known guideline (Pearl2000; Spirtes+1993) • Observe the hidden common cause $f_1$, incorporate it in the models, and carry out three-variable analysis • Errors $e'_1, e'_2$ independent! • Model 1': $x_1 = \lambda_{11} f_1 + e'_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e'_2$, or Model 2': $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e'_1$, $x_2 = \lambda_{21} f_1 + e'_2$
  • 17. Following the guideline is often very hard • A large number of hidden common causes $f_1, f_2, \ldots, f_Q$ may exist (Q unknown) • Often we have no idea what they are • Model 1': $x_1 = \sum_q \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_q \lambda_{2q} f_q + e'_2$, or Model 2': $x_1 = b_{12} x_2 + \sum_q \lambda_{1q} f_q + e'_1$, $x_2 = \sum_q \lambda_{2q} f_q + e'_2$
  • 18. Estimation of causal direction in the presence of hidden common causes • Estimate which model generated the data X • Model 1': $x_1 = \sum_q \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_q \lambda_{2q} f_q + e'_2$, or Model 2': $x_1 = b_{12} x_2 + \sum_q \lambda_{1q} f_q + e'_1$, $x_2 = \sum_q \lambda_{2q} f_q + e'_2$
  • 19. Note • If we intervene on x1 (and x2), we have no hidden common causes • But interventions are often difficult to perform for ethical and cost reasons • Model 1': $x_1 = \sum_q \lambda_{1q} f_q + e'_1$, $x_2 = b_{21} x_1 + \sum_q \lambda_{2q} f_q + e'_2$ • Model 1'' (after intervening on x1): $x_1 = c$, $x_2 = b_{21} x_1 + \sum_q \lambda_{2q} f_q + e'_2$
  • 20. Major challenges 1. Estimation of causal direction when temporal information is not available 2. Managing hidden common causes [Figure: candidate graphs over x1 and x2 with unknown direction, without and with a hidden common cause f1]
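The difficulty described on slides 14 and 15 can be illustrated with a small simulation. This is a sketch under assumed values (b21 = 0.8, loadings of 1 on a single hidden common cause f1, uniform noise); the symbol names follow the reconstructed Model 1', and `ols_slope` is a hypothetical helper. Under these assumptions the least-squares slope converges to b21 + cov(f1-induced terms)/var(x1) = 0.8 + 0.5 rather than to b21.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
b21, lam11, lam21 = 0.8, 1.0, 1.0  # illustrative values (assumptions)
x1, x2 = [], []
for _ in range(20000):
    f1 = rng.uniform(-1, 1)          # hidden common cause
    e1p = rng.uniform(-1, 1)         # e'_1
    e2p = rng.uniform(-1, 1)         # e'_2
    v1 = lam11 * f1 + e1p            # x1 = lambda11*f1 + e'_1
    v2 = b21 * v1 + lam21 * f1 + e2p  # x2 = b21*x1 + lambda21*f1 + e'_2
    x1.append(v1)
    x2.append(v2)

slope = ols_slope(x1, x2)
print("true b21:", b21, " regression slope of x2 on x1:", slope)
```

Even with the correct causal direction, the regression slope is biased away from b21 because f1 makes the errors dependent.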
  • 21. Basic non-Gaussian model (No hidden common cause) S. Shimizu, P. O. Hoyer, A. Hyvärinen and A. Kerminen. Journal of Machine Learning Research, 2006.
  • 22. Independent errors • Implying no hidden common causes • The two models are distinguishable if the errors e1 and e2 are non-Gaussian (Dodge+00CSTM; Shimizu+06JMLR) • Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, or Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$ $(b_{12}, b_{21} \neq 0)$
  • 23. Different directions give different data distributions • Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$ • Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$ • $E(e_1) = E(e_2) = 0$, $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$ [Figure: scatter plots of x1 against x2 for Model 1 and Model 2, with Gaussian and with non-Gaussian errors]
  • 24. Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994) • Observed random vector x is modeled by $x = As$, or $x_i = \sum_{j=1}^{p} a_{ij} s_j$, where – The mixing matrix is $A = [a_{ij}]$ – The hidden variables (independent components) $s_i$ are non-Gaussian and mutually independent • Then, A is identifiable up to permutation and scaling of the columns
  • 25. Sketch of the identifiability proof • Different directions give different zero/non-zero patterns of the mixing matrices – No zeros on the diagonal in the causal model – No permutation indeterminacy • Model 1 ($x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$): $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$ • Model 2 ($x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$): $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$ • In both cases $x = As$ with $s = (e_1, e_2)^T$
  • 26. Linear Non-Gaussian Acyclic Models (LiNGAM) (Shimizu+06JMLR) • Model: $x_i = \mu_i + \sum_{j \neq i} b_{ij} x_j + e_i$ • Assumptions: acyclicity; non-Gaussian errors $e_i$; independence of errors $e_i$ (no hidden common causes) • Identifiable: directions, coefficients, and intercepts – Can be uniquely estimated without knowing the causal structure [Figure: example graph over x1, x2, x3 with coefficients b21, b23, b13 and errors e1, e2, e3]
  • 27. Extensions • Cyclic models (Lacerda+08UAI; Hyvarinen+13JMLR) • Time series (Hyvarinen+10JMLR; Huang+15IJCAI; Gong15ICML): $x(t) = \sum_{\tau=0}^{k} B_\tau x(t - \tau) + e(t)$ • Nonlinearity (Zhang+09UAI; Peters+14JMLR; cf. Imoto02PSB): $x_i = f_{i,2}^{-1}(f_{i,1}(\text{parents of } x_i) + e_i)$ • Discrete variables (Peters+11TPAMI; Park+15NIPS)
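The claim on slide 23 can be made concrete with a toy direction check. The sketch below generates data from Model 1 with uniform errors, regresses each variable on the other, and measures how dependent the residual is on the regressor. The dependence score (absolute correlation of squared values) is an ad hoc choice made for this illustration; it is neither the ICA-based LiNGAM estimator nor the Bayesian approach discussed later, and all function names are hypothetical.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def residual(y, x):
    """Residual of the least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]

def dependence(x, r):
    """Ad hoc score: |corr(x^2, r^2)|, near 0 when x and r are independent."""
    mx = mean(x)
    return abs(corr([(a - mx) ** 2 for a in x], [b ** 2 for b in r]))

# Data from Model 1: x1 = e1, x2 = 0.8*x1 + e2, with var(x1) = var(x2) = 1.
rng = random.Random(0)
half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
x1 = [rng.uniform(-half_width, half_width) for _ in range(5000)]
x2 = [0.8 * a + 0.6 * rng.uniform(-half_width, half_width) for a in x1]

forward = dependence(x1, residual(x2, x1))   # assumes x1 -> x2
backward = dependence(x2, residual(x1, x2))  # assumes x2 -> x1
print("x1 -> x2 score:", forward, " x2 -> x1 score:", backward)
```

With non-Gaussian errors, only the correct direction leaves a residual that is independent of the regressor, so the forward score stays near zero while the backward score does not. With Gaussian errors both scores would be near zero and the direction could not be read off this way.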
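The mixing matrices on slide 25 follow from solving the structural equations for the observed variables. As a short worked step, writing the two-variable model in matrix form with a coefficient matrix $B$ (notation assumed here, not shown on the slide):

```latex
\begin{align*}
x &= Bx + e
  \;\Longrightarrow\; x = (I - B)^{-1} e = A e, \\
\text{Model 1: } B &= \begin{pmatrix} 0 & 0 \\ b_{21} & 0 \end{pmatrix}
  \;\Longrightarrow\; A = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix}, \\
\text{Model 2: } B &= \begin{pmatrix} 0 & b_{12} \\ 0 & 0 \end{pmatrix}
  \;\Longrightarrow\; A = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix}.
\end{align*}
```

ICA recovers $A$ only up to permutation and scaling of its columns, but in both models the diagonal of $A$ is non-zero, which singles out the column order; the position of the remaining zero then indicates the direction.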
  • 28. LiNGAM with hidden common causes P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen. Int. J. Approximate Reasoning 2008
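  • 29. LiNGAM with hidden common causes (Hoyer+08IJAR) • Extension to incorporate non-Gaussian hidden common causes $f_q$: $x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$, where $f_q$ $(q = 1, \ldots, Q)$ are independent • Two-variable example: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$ [Figure: graph over x1, x2 with hidden common causes f1, f2 and errors e1, e2]
  • 30. Independent hidden common causes • WLG, hidden common causes $f_q$ are assumed to be independent in $x_i = \mu_i + \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$ • Dependent hidden common causes $f_1, f_2$ can be written in terms of their independent errors $e_{f_1}, e_{f_2}$: $\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} e_{f_1} \\ e_{f_2} \end{pmatrix}$, so the independent $e_{f_1}, e_{f_2}$ can take the role of the hidden common causes [Figure: graph over x1, x2 with dependent hidden common causes f1, f2, and the equivalent graph with independent hidden common causes]
  • 31. Different causal directions give different data distributions (Hoyer, Shimizu, Kerminen and Palviainen, 2008, IJAR) • Faithfulness + number of hidden common causes "known" • Direction x1 → x2: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$, or direction x1 ← x2: $x_1 = \mu_1 + \sum_{q=1}^{Q} \lambda_{1q} f_q + b_{12} x_2 + e_1$, $x_2 = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q + e_2$
  • 32. Previous estimation approaches • Explicitly model hidden common causes and compare two models with opposite directions of causation – Maximum likelihood principle (Hoyer+08IJAR) – Bayesian model selection (Henao & Winther, 2011, JMLR) • These require us to specify the number of hidden common causes, which is difficult in general [Figure: the two candidate graphs over x1, x2 with hidden common causes f1, ..., fQ and errors e1, e2]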
  • 33. Our proposal: a Bayesian approach S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014
  • 34. Key idea (1/2) • Another look at the LiNGAM with hidden common causes: for the m-th observation, $x_2^{(m)} = \mu_2 + \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$ • Writing $\tilde{\mu}_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$, observations are generated from the LiNGAM model with the same coefficient $b_{21}$ but possibly different intercepts $\mu_2 + \tilde{\mu}_2^{(m)}$ [Figure: one copy of the graph x1 → x2 per observation, from the 1st to the m-th, each with coefficient b21 and its own intercept]
  • 35. Key idea (2/2) • Include the sums of hidden common causes as the observation-specific intercepts: for the m-th observation, $x_2^{(m)} = \mu_2 + \tilde{\mu}_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$, where $\tilde{\mu}_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$ is the observation-specific intercept • Hidden common causes are not modeled explicitly – It is necessary neither to specify the number of hidden common causes Q nor to estimate the coefficients $\lambda_{2q}$
  • 36. Bayesian model selection • Compare the marginal likelihoods of these two models with opposite directions • Model 3 (x1 → x2): $x_1^{(m)} = \mu_1 + \tilde{\mu}_1^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \tilde{\mu}_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$ • Model 4 (x1 ← x2): $x_1^{(m)} = \mu_1 + \tilde{\mu}_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2 + \tilde{\mu}_2^{(m)} + e_2^{(m)}$ • Many additional parameters $\tilde{\mu}_i^{(m)}$ $(i = 1, 2;\ m = 1, \ldots, n)$ – Similar to mixed models and multi-level models – Informative prior for the observation-specific intercepts
  • 37. Prior for the observation-specific intercepts $\tilde{\mu}_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\tilde{\mu}_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$ • Motivation: central limit theorem – Sums of independent variables tend to be more Gaussian • Approximate the density by a bell-shaped distribution: $(\tilde{\mu}_1^{(m)}, \tilde{\mu}_2^{(m)})^T \sim$ t-distribution with sd $\tau_1, \tau_2$, correlation $\sigma_{12}$, and DOF $v$ • Select the hyper-parameter values that maximize the marginal likelihood – $\tau_l \in \{0, 0.2\,\mathrm{sd}(x_l), \ldots, 1.0\,\mathrm{sd}(x_l)\}$ – $\sigma_{12} \in \{0, 0.1, \ldots, 0.9\}$ – DOF fixed to be 6 in the experiments below
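To make the prior on slide 37 concrete, the sketch below draws observation-specific intercept pairs from a bivariate t-distribution with given standard deviations, correlation, and 6 degrees of freedom. The symbols `tau1`, `tau2`, and `sigma12`, the construction (correlated Gaussians divided by a shared chi-square factor), and the function name are assumptions made for this illustration; the slides do not show how the prior is sampled or implemented.

```python
import math
import random

def sample_intercepts(n, tau1, tau2, sigma12, dof=6, seed=0):
    """Draw n pairs from a bivariate t-distribution with sds tau1, tau2,
    correlation sigma12, and dof degrees of freedom (requires dof > 2)."""
    rng = random.Random(seed)
    # A standard t variable has variance dof/(dof-2); rescale to unit variance.
    unit = math.sqrt((dof - 2) / dof)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        y2 = sigma12 * z1 + math.sqrt(1.0 - sigma12 ** 2) * z2  # corr(z1, y2) = sigma12
        w = rng.gammavariate(dof / 2.0, 2.0) / dof  # chi-square(dof) / dof
        s = unit / math.sqrt(w)                     # shared heavy-tail factor
        pairs.append((tau1 * z1 * s, tau2 * y2 * s))
    return pairs

def sd(v):
    m = sum(v) / len(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))

# Illustrative hyper-parameter values (assumptions).
draws = sample_intercepts(20000, tau1=0.4, tau2=1.0, sigma12=0.5)
mu1 = [p[0] for p in draws]
mu2 = [p[1] for p in draws]
m1, m2 = sum(mu1) / len(mu1), sum(mu2) / len(mu2)
cov = sum((a - m1) * (b - m2) for a, b in zip(mu1, mu2)) / len(mu1)
rho = cov / (sd(mu1) * sd(mu2))
print("sd:", sd(mu1), sd(mu2), " correlation:", rho)
```

Setting `tau1 = tau2 = 0` would make every intercept zero, which corresponds to the case without hidden common causes in the hyper-parameter grid.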
  • 38. The chocolate data revisited • Corr. 0.791, P-value < 0.001 between Nobel and Chocolate • Gaussianity rejected for both "Chocolate consumption" and "Num. Nobel laureates"
  • 39. Model comparison • No method was previously available to compare these two models
  • 41. Conclusions • Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery • Proposed a linear non-Gaussian SEM with possibly different intercepts – It does not require specifying the number of hidden common causes • Future work – Sensitivity to the choice of prior distributions – Estimation methods that are more computationally and statistically efficient … and many others
  • 43. Pairwise analysis: high-dimensional cases • Huge number of candidate networks • Analyze every pair of variables and integrate the results to get an entire causal ordering • Simpler than trying all the combinations of causal orders [Figure: pairwise results over x1, x2, x3, x4 with hidden common causes f1, f3; integrate the results; full graph; prune redundant edges]
  • 44. Different zero/non-zero patterns of the mixing matrices (Hoyer+08IJAR) • Faithfulness on $x_i$, $f_i$ + number of $f_i$ given • Three models over x1, x2 with a hidden common cause f1, and their mixing matrices A (* denotes a non-zero entry): 1. $\begin{pmatrix} * & 0 & * \\ 0 & * & * \end{pmatrix}$ 2. $\begin{pmatrix} * & 0 & * \\ * & * & * \end{pmatrix}$ 3. $\begin{pmatrix} * & * & * \\ 0 & * & * \end{pmatrix}$ [Figure: scatter plots of x1 against x2 with Gaussian and with non-Gaussian e1, e2, f1]