
A non-Gaussian model for causal discovery in the presence of hidden common causes



Talk slides at 2016 Munich Workshop on
Causal Inference and Information Theory

Published in: Science


  1. A non-Gaussian model for causal discovery in the presence of hidden common causes. Shohei Shimizu, Shiga University / Osaka University, Japan. 2016 Munich Workshop on Causal Inference and Information Theory.
  2. Abstract • Managing hidden common causes is essential in causal discovery • Observed variables that are not causally related can be correlated due to hidden common causes • We propose a linear non-Gaussian model for estimating causal direction in the presence of hidden common causes
  3. Motivation: illustrative example
  4. Strong correlation between chocolate consumption and number of Nobel laureates (Messerli12NEJM) • Corr. 0.791, P-value < 0.001 • [Figure: scatter plot, 2002-2011; x-axis: chocolate consumption (kg/yr/capita); y-axis: number of Nobel laureates per 10 million population]
  5. Does eating more chocolate increase the number of Nobel laureates? • Interpretational drift (Maurage+13, J. Nutrition) • The correlation (Corr. 0.791, P-value < 0.001) between Chocolate and Nobel is compatible with three graphs, each with GDP as a hidden common cause: Chclt → Nobel, Chclt ← Nobel, or no causal relation between Chclt and Nobel • Manage this gap!
  6. Formulating the problem
  7. Structural causal models (Pearl, 2000, 2009; cf. Bollen, 1989) • A framework for describing causal relations • Generally speaking, if changing the value of $x_1$ changes the value of $x_2$, then $x_1$ causes $x_2$ • Example: $x_1 = g_1(f, e_1)$, $x_2 = g_2(x_1, f, e_2)$ • [Graph: $f$ → $x_1$, $f$ → $x_2$, $x_1$ → $x_2$, with errors $e_1$, $e_2$; node labels: GDP, Chclt, Nobel]
  8. Challenge in causal discovery • Data matrix: observations obs. 1, obs. 2, …, obs. $n$ of $(x_1, x_2)$, drawn i.i.d. from $p(x_1, x_2)$ • Assume that one of three models generated the data, each with a hidden common cause $f$ and densities $p(e_1)$, $p(e_2)$, $p(f)$: (a) $x_1 = g_1(f, e_1)$, $x_2 = g_2(x_1, f, e_2)$; (b) $x_1 = g_1(x_2, f, e_1)$, $x_2 = g_2(f, e_2)$; (c) $x_1 = g_1(f, e_1)$, $x_2 = g_2(f, e_2)$ • Estimate which of the three models generated the data
  9. Under what conditions can we manage the gap? • We have shown that it is possible under three assumptions: i) linearity; ii) acyclicity; iii) non-Gaussianity (Hoyer+08IJAR; Shimizu+14JMLR) • The classical Bayesian network approach is incapable of this • Candidate models: $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = b_{21} x_1 + \lambda_{21} f_1 + e_2$; or $x_1 = b_{12} x_2 + \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$; or $x_1 = \lambda_{11} f_1 + e_1$, $x_2 = \lambda_{21} f_1 + e_2$
  10. Basic non-Gaussian model (no hidden common cause). S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen. Journal of Machine Learning Research, 2006
  11. Linear Non-Gaussian Acyclic Model (LiNGAM) (Shimizu et al., 2006) • Model: $x_i = \sum_{j \neq i} b_{ij} x_j + e_i$ • Assumptions: linearity; acyclicity; non-Gaussian errors $e_i$; independence of errors $e_i$ (no hidden common causes) • Identifiable: causal directions and coefficients • Various extensions, including nonlinear (Hoyer+08NIPS, Zhang+09UAI) and cyclic (Lacerda+08UAI) models • [Graph: $x_1$, $x_2$, $x_3$ with coefficients $b_{21}$, $b_{23}$, $b_{13}$ and errors $e_1$, $e_2$, $e_3$]
  12. Different directions give different data distributions • Model 1: $x_1 = e_1$, $x_2 = 0.8 x_1 + e_2$ • Model 2: $x_1 = 0.8 x_2 + e_1$, $x_2 = e_2$ • $E(e_1) = E(e_2) = 0$, $\mathrm{var}(x_1) = \mathrm{var}(x_2) = 1$ • [Figure: scatter plots of $x_1$ vs. $x_2$ under each model, for Gaussian and non-Gaussian (e.g., uniform) errors]
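The contrast on this slide can be checked numerically. The following is a minimal sketch, not the estimation method of the talk: it simulates Model 1 with uniform errors and, for each candidate direction, measures how strongly the squared regression residual correlates with the squared regressor. The function names, sample size, seed, and the squared-correlation dependence measure are assumptions made for illustration.

```python
import numpy as np


def simulate_model1(n, rng):
    """Draw n samples from Model 1: x1 = e1, x2 = 0.8*x1 + e2 (uniform errors)."""
    # Uniform on [-a, a] has variance a**2 / 3, so a = sqrt(3) gives var(x1) = 1.
    x1 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=n)
    # var(e2) = 0.36, so that var(x2) = 0.64 + 0.36 = 1.
    e2 = rng.uniform(-0.6 * np.sqrt(3.0), 0.6 * np.sqrt(3.0), size=n)
    return x1, 0.8 * x1 + e2


def residual_dependence(cause, effect):
    """|corr(cause**2, residual**2)| after least-squares regression of effect on cause."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = np.dot(cause, effect) / np.dot(cause, cause)
    residual = effect - slope * cause
    return abs(np.corrcoef(cause**2, residual**2)[0, 1])


rng = np.random.default_rng(0)
x1, x2 = simulate_model1(100_000, rng)
dep_forward = residual_dependence(x1, x2)   # candidate direction x1 -> x2
dep_backward = residual_dependence(x2, x1)  # candidate direction x2 -> x1
print(f"x1 -> x2: {dep_forward:.3f}, x2 -> x1: {dep_backward:.3f}")
```

In the true direction the residual is the independent error, so the measure stays near zero. In the reverse direction the residual is uncorrelated with the regressor but not independent of it. With Gaussian errors both values would be near zero, and the two directions could not be told apart this way.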
  13. Independent Component Analysis (ICA) (Jutten & Herault, 1991; Comon, 1994; Hyvarinen et al., 2001) • Observed variables $x_i$ are modeled by $x_i = \sum_{j=1}^{p} a_{ij} s_j$, or $x = As$, where the hidden variables $s_j$ $(j = 1, \dots, p)$ are non-Gaussian and independent • Then, the mixing matrix $A$ is identifiable up to permutation and scaling of the columns
  14. Sketch of the identifiability proof • Different directions give different zero/non-zero patterns of the mixing matrices – No zeros on the diagonal in the causal model – No permutation indeterminacy • Model 1: $x_1 = e_1$, $x_2 = b_{21} x_1 + e_2$, i.e., $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ b_{21} & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$ ($x = As$) • Model 2: $x_1 = b_{12} x_2 + e_1$, $x_2 = e_2$, i.e., $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}$ ($x = As$)
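The mixing matrices on this slide follow from solving $x = Bx + e$ for $x$, which gives $A = (I - B)^{-1}$. A minimal sketch of that step is given here; the helper name `mixing_matrix` and the coefficient value 0.8 are assumptions for illustration only.

```python
import numpy as np


def mixing_matrix(B):
    """Return A = (I - B)^{-1}, so that x = B x + e becomes x = A e."""
    return np.linalg.inv(np.eye(B.shape[0]) - B)


b21, b12 = 0.8, 0.8  # illustrative coefficient values (assumed)
B_model1 = np.array([[0.0, 0.0], [b21, 0.0]])  # Model 1: x1 -> x2
B_model2 = np.array([[0.0, b12], [0.0, 0.0]])  # Model 2: x2 -> x1

A_model1 = mixing_matrix(B_model1)  # equals [[1, 0], [b21, 1]]
A_model2 = mixing_matrix(B_model2)  # equals [[1, b12], [0, 1]]

# Zero/non-zero patterns (True = non-zero) differ between the two directions.
pattern1 = ~np.isclose(A_model1, 0.0)
pattern2 = ~np.isclose(A_model2, 0.0)
print(pattern1.astype(int))
print(pattern2.astype(int))
```

Because both diagonals are non-zero, a column permutation of one pattern cannot produce the other while keeping a zero-free diagonal, which is the point made by the two sub-bullets.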
  15. LiNGAM with hidden common causes. P. O. Hoyer, S. Shimizu, A. Kerminen, and M. Palviainen. Int. J. Approximate Reasoning, 2008
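  16. LiNGAM with hidden common causes (Hoyer+08IJAR) • Extension to incorporate non-Gaussian hidden common causes $f_q$: $x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$, where $f_q$ $(q = 1, \dots, Q)$ are independent • Two-variable example: $x_1 = \sum_{q=1}^{Q} \lambda_{1q} f_q + e_1$, $x_2 = \sum_{q=1}^{Q} \lambda_{2q} f_q + b_{21} x_1 + e_2$ • [Graph: $f_1$, $f_2$ → $x_1$, $x_2$; $x_1$ → $x_2$; errors $e_1$, $e_2$]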
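  17. Independent hidden common causes • WLG, hidden common causes are assumed to be independent • Dependent hidden common causes $f_1$, $f_2$ can be rewritten using their independent errors $e_{f_1}$, $e_{f_2}$: $\begin{pmatrix} f_1 \\ f_2 \end{pmatrix} = \begin{pmatrix} a_{11} & 0 \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} e_{f_1} \\ e_{f_2} \end{pmatrix}$ • Relabeling $f_1 := e_{f_1}$, $f_2 := e_{f_2}$ gives a model of the same form, $x_i = \sum_{q=1}^{Q} \lambda_{iq} f_q + \sum_{j \neq i} b_{ij} x_j + e_i$, with independent hidden common causes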
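  18. Different directions give different zero/non-zero patterns (Hoyer+08IJAR) • Assumptions: faithfulness on $x_i$, $f_i$, and the number of $f_i$ given • Zero/non-zero patterns of the mixing matrix $A$ for the three models (1., 2., 3.), with the independent sources ordered as $(e_1, e_2, f_1)$: no causal relation, $\begin{pmatrix} * & 0 & * \\ 0 & * & * \end{pmatrix}$; $x_1 \to x_2$, $\begin{pmatrix} * & 0 & * \\ * & * & * \end{pmatrix}$; $x_1 \leftarrow x_2$, $\begin{pmatrix} * & * & * \\ 0 & * & * \end{pmatrix}$ • [Figure: scatter plots of $x_1$ vs. $x_2$ for Gaussian and non-Gaussian $e_1$, $e_2$, $f_1$]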
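The three patterns can be derived mechanically from the two-variable model with one hidden common cause. The sketch below assumes the source ordering $s = (e_1, e_2, f_1)$ and arbitrary non-zero coefficient values. The function name and all numbers are hypothetical, and ICA itself would recover the columns only up to permutation and scaling.

```python
import numpy as np


def pattern_with_hidden_cause(b21, b12, lam1, lam2):
    """Zero/non-zero pattern of A in x = A s, with s = (e1, e2, f1)."""
    B = np.array([[0.0, b12], [b21, 0.0]])
    Lam = np.array([[lam1], [lam2]])
    # x = B x + Lam f1 + e  =>  x = (I - B)^{-1} [I, Lam] s
    A = np.linalg.inv(np.eye(2) - B) @ np.hstack([np.eye(2), Lam])
    return (~np.isclose(A, 0.0)).astype(int)


# Illustrative (assumed) coefficients; faithfulness rules out exact cancellations.
no_relation = pattern_with_hidden_cause(0.0, 0.0, 0.5, 0.7)
x1_to_x2 = pattern_with_hidden_cause(0.8, 0.0, 0.5, 0.7)
x2_to_x1 = pattern_with_hidden_cause(0.0, 0.8, 0.5, 0.7)

for name, pat in [("none", no_relation), ("x1->x2", x1_to_x2), ("x2->x1", x2_to_x1)]:
    print(name, pat.tolist())
```

The faithfulness assumption matters here: if, for example, $b_{21} \lambda_{11} + \lambda_{21}$ were exactly zero, the pattern for $x_1 \to x_2$ would lose an entry and no longer be distinctive.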
  19. Previous estimation methods (Hoyer+08IJAR; Henao+11JMLR) • Explicitly model hidden common causes • Do model comparison based on the maximum likelihood principle or a Bayesian approach • Need to specify the number and distributions of the hidden common causes, which is difficult in general • [Graphs: $x_1 \to x_2$ or $x_1 \leftarrow x_2$, each with hidden common causes $f_1, \dots, f_Q$ and errors $e_1$, $e_2$]
  20. Our proposal: a Bayesian LiNGAM approach. S. Shimizu and K. Bollen. Journal of Machine Learning Research, 2014, and something extra
  21. Key idea (1/2) • Transform the model into a model with no hidden common causes • LiNGAM with hidden common causes: $x_1$, $x_2$ with coefficient $b_{21}$, errors $e_1$, $e_2$, and hidden common causes $f_1, \dots, f_Q$ • Transformed model, for each observation $m$: LiNGAM with no hidden common causes ($x_1^{(m)}$, $x_2^{(m)}$, the same coefficient $b_{21}$, errors $e_1^{(m)}$, $e_2^{(m)}$) but with possibly different intercepts $\mu_1^{(m)}$, $\mu_2^{(m)}$ across observations
  22. Key idea (2/2) • Include the sums of hidden common causes as model parameters, i.e., observation-specific intercepts. For the $m$-th observation: $x_2^{(m)} = \underbrace{\sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}}_{\mu_2^{(m)}} + b_{21} x_1^{(m)} + e_2^{(m)}$ • Hidden common causes are not modeled explicitly – It is necessary neither to specify the number of hidden common causes $Q$ nor to estimate the coefficients $\lambda_{2q}$
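The rewriting on this slide can be illustrated with simulated data. This is a minimal sketch under assumed values: the sample size, the number of hidden causes, the coefficients, and the Laplace distributions are all chosen here for illustration and are not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, Q = 1000, 3        # illustrative sample size and number of hidden causes
b21 = 0.8             # illustrative connection strength
lam = rng.normal(size=(2, Q))            # lambda_{iq}, hypothetical values

f = rng.laplace(size=(n, Q))             # non-Gaussian hidden common causes
e = rng.laplace(scale=0.5, size=(n, 2))  # non-Gaussian errors

# Model with explicit hidden common causes.
x1 = f @ lam[0] + e[:, 0]
x2 = f @ lam[1] + b21 * x1 + e[:, 1]

# Observation-specific intercepts: one pair (mu1, mu2) per observation.
mu = f @ lam.T                           # shape (n, 2)

# The same data, written without any reference to f, Q, or lambda.
x1_intercept = mu[:, 0] + e[:, 0]
x2_intercept = mu[:, 1] + b21 * x1_intercept + e[:, 1]

print(np.allclose(x1, x1_intercept), np.allclose(x2, x2_intercept))
print("corr(mu1, mu2) =", np.corrcoef(mu.T)[0, 1])
```

The second form has $2n$ intercepts instead of $Q$ hidden variables and $2Q$ coefficients, which is why the number $Q$ never has to be specified.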
  23. Bayesian model selection • Model 3 ($x_1 \to x_2$): $x_1^{(m)} = \mu_1^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2^{(m)} + b_{21} x_1^{(m)} + e_2^{(m)}$ • Model 4 ($x_1 \leftarrow x_2$): $x_1^{(m)} = \mu_1^{(m)} + b_{12} x_2^{(m)} + e_1^{(m)}$, $x_2^{(m)} = \mu_2^{(m)} + e_2^{(m)}$ • Compare the marginal likelihoods of the two models on standardized data • Once a direction has been estimated, compute the posterior of the connection strength $b_{21}$ or $b_{12}$ • Many observation-specific intercepts $\mu_i^{(m)}$ $(i = 1, 2;\ m = 1, \dots, n)$ – Similar to mixed models and multi-level models – Informative prior
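For reference, the comparison on this slide rests on the standard definition of the marginal likelihood. The notation is introduced here for illustration and is not taken from the slides: $D$ is the standardized data, $M_k$ is Model $k$, and $\theta_k$ collects its parameters, including the observation-specific intercepts. Choosing the model with the larger marginal likelihood assumes equal prior probabilities for the two models.

```latex
p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k,
\qquad
\hat{k} = \arg\max_{k \in \{3, 4\}} p(D \mid M_k)
```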
  24. Prior for the observation-specific intercepts • $\mu_1^{(m)} = \sum_{q=1}^{Q} \lambda_{1q} f_q^{(m)}$, $\mu_2^{(m)} = \sum_{q=1}^{Q} \lambda_{2q} f_q^{(m)}$ • Motivation: central limit theorem – Sums of independent variables tend to be more Gaussian • Approximate the density by a bell-shaped distribution: $(\mu_1^{(m)}, \mu_2^{(m)})^T \sim$ $t$-distribution with sds $\tau_1$, $\tau_2$, correlation $\sigma_{12}$, and DOF $\nu$ (here, 8) – $\mu_1^{(m)}$ and $\mu_2^{(m)}$ are dependent due to the hidden common causes $f_q^{(m)}$ • Select the hyper-parameter values that maximize the marginal likelihood, with $\tau_1, \tau_2 \in \{0.4, 0.6, 0.8\}$
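A prior of this kind can be sampled with a few lines of NumPy. The sketch below assumes that "sd" refers to the standard deviation of the $t$ variate rather than its scale parameter. The function name, argument names, and the particular values passed in are hypothetical.

```python
import numpy as np


def sample_intercepts(n, tau1, tau2, corr, dof, rng):
    """Draw n pairs from a bivariate t-distribution with given sds and correlation."""
    if dof <= 2:
        raise ValueError("the standard deviation is finite only for dof > 2")
    # A t variate with scale matrix S has covariance S * dof / (dof - 2),
    # so shrink the target covariance to obtain the scale matrix.
    cov = np.array([[tau1**2, corr * tau1 * tau2],
                    [corr * tau1 * tau2, tau2**2]])
    scale = cov * (dof - 2) / dof
    z = rng.multivariate_normal(np.zeros(2), scale, size=n)
    w = rng.chisquare(dof, size=n) / dof
    # Dividing a Gaussian vector by sqrt(chi2 / dof) yields a multivariate t.
    return z / np.sqrt(w)[:, None]


rng = np.random.default_rng(0)
mu = sample_intercepts(200_000, tau1=0.4, tau2=0.8, corr=0.5, dof=8, rng=rng)
print(mu.std(axis=0), np.corrcoef(mu.T)[0, 1])
```

A $t$-distribution keeps the bell shape suggested by the central limit theorem while allowing heavier tails than a Gaussian, and the off-diagonal term carries the dependence between the two intercepts.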
  25. Error distributions and other priors used in the experiment • Error distributions $p(e_1)$, $p(e_2)$ – Fixed to be the Laplace distribution – Could instead be estimated, for example by assuming a family of generalized Gaussian distributions • Priors for the other parameters: correlation $\sigma_{12} \sim U(-1, 1)$; $b_{12} \sim N(0, 0.75^2)$; $b_{21} \sim N(0, 0.75^2)$; $\mathrm{std}(e_1) \sim U(0, 1)$; $\mathrm{std}(e_2) \sim U(0, 1)$
  26. Experiment on sociology data
  27. Sociology data • Source: General Social Survey (n=1380) – Non-farm background, ages 35-44, white, male, in the labor force, no missing data for any of the covariates, 1972-2006 • 15 pairs with known temporal directions (Duncan+1972) • [Figure: status attainment model (Duncan et al., 1972); $x_2$: Son’s Income]
  28. Numbers of successes (n=1380) • Known (temporal) orderings of 15 pairs • Cf. LiNGAM-GU-UK (Chen+13NECO): 0.20; PNL (Zhang+09UAI): 0.60 • [Figure: known orderings among Father’s Education, Son’s Education, Son’s Income, Son’s Occupation, …, with hidden common cause $f_1$; row labeled FE with check marks (✔)]
  29. Conclusion
  30. Conclusion • Estimation of causal direction in the presence of hidden common causes is a major challenge in causal discovery • Proposed a linear non-Gaussian SEM approach – Not necessary to model individual hidden common causes • Future directions – Cyclic cases: use a prior to enforce the identifiability condition of Lacerda+08UAI? – Non-stationarity: combine with Kun’s method (Huang+15IJCAI)?
