Towards Causal
Representation Learning
2021.07.06
Suyeong Park
Causal Inference KR Study
Towards Causal Representation Learning
● Authors
○ Bernhard Schölkopf
○ Francesco Locatello
○ Stefan Bauer
○ Nan Rosemary Ke
○ Nal Kalchbrenner
○ Anirudh Goyal
○ Yoshua Bengio
● Paper
● Talks: Yoshua’s talk, Francesco’s talk
2
source: https://www.is.mpg.de/~bs , https://yoshuabengio.org/
Yoshua Bengio
Bernhard Schölkopf
I. Intro to Causal Learning
3
1. Machine Learning
● Problems with conventional machine learning
○ Reliance on the i.i.d. assumption makes generalization difficult.
○ Learning robustly under distribution shift is hard.
○ Existing models are hard to (re)use in new scenarios.
○ It is blind to causality.
4
1. Machine Learning
5
[Figure: causal diagram over variables U, S, B, X, with an unknown relation marked "?"]
source: https://www.nature.com/articles/332495a0.pdf
1. Machine Learning
Classification: cows vs. camels
A camel?
Spurious Correlation!
source: https://bayesgroup.github.io/bmml_sem/2019/Kodryan_Invariant%20Risk%20Minimization.pdf
6
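A minimal, hedged sketch of the spurious-correlation problem above (the toy setup and numbers are mine, not from the slides or the paper): a "background" feature almost always matches the label during training but not at test time, so a classifier that leans on the shortcut collapses under the shift.

```python
# Toy "cows vs. camels": the background (pasture vs. desert) is spuriously
# correlated with the label at training time but flipped at test time.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p_spurious):
    y = rng.integers(0, 2, size=n)                      # 0 = cow, 1 = camel
    shape = y + rng.normal(0, 1.0, size=n)              # weakly informative "animal" feature
    background = np.where(rng.random(n) < p_spurious, y, 1 - y) + rng.normal(0, 0.1, size=n)
    return np.column_stack([shape, background]), y

X_train, y_train = sample(5000, p_spurious=0.95)        # cows almost always on pasture
X_test, y_test = sample(5000, p_spurious=0.05)          # backgrounds flipped at test time

clf = LogisticRegression().fit(X_train, y_train)
print("train acc:", clf.score(X_train, y_train))        # high: the background shortcut works
print("test acc :", clf.score(X_test, y_test))          # drops sharply under the shift
```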
1. Machine Learning with Causality
● By representing the data generating process structurally and applying interventions, we can address these problems (distribution shifts, …)!
7
intervention
● Problems with conventional machine learning
○ Reliance on the i.i.d. assumption makes generalization difficult.
○ Learning robustly under distribution shift is hard.
○ Existing models are hard to (re)use in new scenarios.
○ It is blind to causality.
2. Levels of Causal Modeling
● Related questions:
○ What is the probability that this particular image contains a dog?
○ What is the probability of heart failure given certain diagnostic measurements (e.g., blood
pressure) carried out on a patient?
8
2. Levels of Causal Modeling
● Related questions:
○ Is increasing the number of storks in a country going to boost its human birth rate?
○ Would fewer people smoke if cigarettes were more socially stigmatized?
9
2. Levels of Causal Modeling
● Related questions:
○ Counterfactual: Would a given patient have suffered heart failure if they had started
exercising a year earlier?
○ Intervention: how does the probability of heart failure change if we convince a patient to
exercise regularly?
10
2. Levels of Causal Modeling
● Learning from data
○ Observational vs Interventional
○ Unstructured vs Structured
11
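A hedged toy sketch of why the observational and interventional levels differ (the confounded setup and all numbers are mine, purely illustrative): when a confounder Z drives both treatment X and outcome Y, "seeing" P(Y | X = 1) is not the same as "doing" P(Y | do(X = 1)).

```python
# Seeing vs. doing under confounding: P(Y=1 | X=1) != P(Y=1 | do(X=1)).
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
z = rng.random(n) < 0.5                            # confounder (e.g., underlying health)
x_obs = rng.random(n) < np.where(z, 0.8, 0.2)      # observed treatment depends on Z

def outcome(x, z):
    return rng.random(n) < (0.3 + 0.2 * x + 0.4 * z)   # Y depends on both X and Z

y_obs = outcome(x_obs, z)
y_do1 = outcome(np.ones(n, dtype=bool), z)         # intervene: set X = 1 for everyone

print("P(Y=1 | X=1)     =", round(y_obs[x_obs].mean(), 3))   # inflated by the confounder
print("P(Y=1 | do(X=1)) =", round(y_do1.mean(), 3))          # the causal quantity
```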
II. Causal Representation Learning
12
3. Intervention
● SCM: X_i := f_i(PA_i, U_i), i = 1, …, n
● Intervention: changing U_i, setting X_i to a constant, or changing the functional form of f_i
○ Hard (perfect) intervention: fixing the variable to a constant, e.g., X_i := a
○ Soft (imperfect) intervention: modifying f_i or the noise U_i while keeping the dependence on PA_i
○ Uncertain: we do not know which mechanism or variable the intervention changes
13
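A minimal sketch of these intervention types on a two-variable SCM (the particular equations X := U_X, Y := 2X + U_Y are my own illustration, not the paper's): a hard intervention replaces a mechanism by a constant, a soft intervention changes its functional form.

```python
# Hard vs. soft interventions on a toy SCM: X := U_X, Y := 2*X + U_Y.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def sample(intervention=None):
    u_x = rng.normal(0, 1, n)
    u_y = rng.normal(0, 1, n)
    x = u_x                                  # mechanism f_X
    if intervention == "hard":
        x = np.full(n, 3.0)                  # do(X := 3): replace f_X by a constant
    y = 2 * x + u_y                          # mechanism f_Y(PA_Y, U_Y)
    if intervention == "soft":
        y = 0.5 * x + u_y                    # change the functional form of f_Y
    return x, y

for name in [None, "hard", "soft"]:
    x, y = sample(name)
    print(f"{str(name):>4}: E[Y] = {y.mean():+.2f}, Var[Y] = {y.var():.2f}")
```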
3. SCMs: Structural Causal Models
● Intervention
[Figure from the paper: intervention in an SCM]
source: paper
14
3. The Reichenbach Principle (Common Cause Principle)
If two observables X and Y are statistically dependent, then there exists a variable Z
that causally influences both and explains all the dependence in the sense of making
them independent when conditioned on Z.
15
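A small simulation of the common cause principle (the setup is mine and only illustrative): Z causally influences both X and Y, so X and Y are correlated, but conditioning on Z (here, restricting to a narrow bin of Z) removes the dependence.

```python
# Common cause Z: X and Y are dependent marginally, independent given Z.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
z = rng.normal(0, 1, n)              # common cause
x = z + rng.normal(0, 1, n)          # X <- Z
y = -z + rng.normal(0, 1, n)         # Y <- Z

print("corr(X, Y) overall   :", round(np.corrcoef(x, y)[0, 1], 3))   # clearly nonzero
mask = np.abs(z - 0.5) < 0.05        # condition on Z ≈ 0.5
print("corr(X, Y | Z ≈ 0.5) :", round(np.corrcoef(x[mask], y[mask])[0, 1], 3))  # ≈ 0
```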
4. Independent Causal Mechanisms (ICM)
The causal generative process of a system’s variables is composed of autonomous modules
that do not inform or influence each other. In the probabilistic case, this means that the
conditional distribution of each variable given its causes (i.e., its mechanism) does not
inform or influence the other mechanisms.
16
4. Independent Causal Mechanisms (ICM)
● Autonomous, Invariance
○ We want to intervene on just this variable and remove just this mechanism, while leaving the other mechanisms untouched (everything else in the same setting)!
● Causal factorization
○ P(X_1, …, X_n) = ∏_i P(X_i | PA_i)
○ Changing one mechanism does not change the other mechanisms.
○ Knowing one mechanism does not tell us anything about the other mechanisms.
17
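A minimal sketch of the modularity behind ICM (illustrative assumptions mine): for a cause-effect pair, P(Y | X) is a module that stays fixed when we shift only the cause's mechanism P(X), while the marginal P(Y) moves.

```python
# ICM / causal factorization: shift P(X), the mechanism P(Y|X) stays put.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, x_mean):
    x = rng.normal(x_mean, 1, n)                 # mechanism for the cause X
    y = np.sin(x) + 0.1 * rng.normal(0, 1, n)    # fixed mechanism Y := f(X) + noise
    return x, y

for x_mean in [0.0, 2.0]:                        # intervene only on the cause's mechanism
    x, y = sample(100_000, x_mean)
    resid = y - np.sin(x)                        # behaviour of the mechanism P(Y|X)
    print(f"P(X) mean={x.mean():+.2f} | P(Y|X) residual std={resid.std():.3f} "
          f"(unchanged) | P(Y) mean={y.mean():+.2f} (changed)")
```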
4. Sparse Mechanism Shift (SMS)
Small distribution changes tend to manifest themselves in a sparse or local way in the
causal/disentangled factorization, i.e., they should usually not affect all factors
simultaneously.
18
4. Sparse Mechanism Shift (SMS)
● Changing one mechanism does not affect the other mechanisms.
● Algorithmic independence
○ Kolmogorov complexity
○ The independence of mechanisms can be quantified via mutual algorithmic information.
19
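A short formal statement of the algorithmic view, as I understand it from the Janzing–Schölkopf line of work cited in the paper (notation is a hedged sketch, not copied from the slides): writing K(·) for Kolmogorov complexity and I(· : ·) for algorithmic mutual information, the shortest description of the joint distribution is obtained by describing each mechanism separately, and any two mechanisms share no algorithmic information.

```latex
% Algorithmic formulation of ICM (sketch; equalities hold up to additive constants)
K\bigl(P(X_1,\dots,X_n)\bigr) \;\stackrel{+}{=}\; \sum_{i=1}^{n} K\bigl(P(X_i \mid \mathrm{PA}_i)\bigr),
\qquad
I\bigl(P(X_i \mid \mathrm{PA}_i) : P(X_j \mid \mathrm{PA}_j)\bigr) \;\stackrel{+}{=}\; 0 \quad (i \neq j).
```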
5. Causal Discovery and Machine Learning
● Assumptions about function class
○ Causal discovery is possible under assumptions such as faithfulness.
○ In practice, however, we also need to restrict the complexity of the function class.
○ ANM (Additive Noise Model)
20
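A hedged toy sketch of ANM-based discovery (my own illustration, not the paper's code): assume Y = f(X) + N with N independent of X, fit a nonparametric regression in both directions, and prefer the direction whose residuals are "more independent" of the input (here measured with a simple distance-correlation statistic).

```python
# Bivariate ANM direction test: regress each way, compare residual dependence.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(-2, 2, n)
y = x ** 3 + rng.normal(0, 0.5, n)          # true model: X -> Y with additive noise

def dist_corr(a, b):
    # (biased) sample distance correlation: small iff roughly independent
    a, b = a.reshape(-1, 1), b.reshape(-1, 1)
    A, B = np.abs(a - a.T), np.abs(b - b.T)
    A = A - A.mean(0) - A.mean(1, keepdims=True) + A.mean()
    B = B - B.mean(0) - B.mean(1, keepdims=True) + B.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

def residual_dependence(cause, effect):
    reg = KNeighborsRegressor(n_neighbors=20).fit(cause.reshape(-1, 1), effect)
    resid = effect - reg.predict(cause.reshape(-1, 1))
    return dist_corr(cause, resid)

print("X -> Y residual dependence:", round(residual_dependence(x, y), 3))  # smaller
print("Y -> X residual dependence:", round(residual_dependence(y, x), 3))  # larger
```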
III. What can we do further?
21
6. Learning Causal Variables: Problems of ML
1. Learning Disentangled Representations
○ Disentangled representation
22
6. Learning Causal Variables: Problems of ML
1. Learning Disentangled Representations
source: paper
23
6. Learning Causal Variables: Problems of ML
2. Learning Transferable Mechanisms
○ reusable models
○ If changes in lighting can shift the (image) data distribution, the model should be able to account for such changes as well.
24
6. Learning Causal Variables: Problems of ML
3. Learning interventional world models and reasoning
○ Konrad Lorenz’ notion of thinking as acting in an imagined space
○ Ultimately, causal representation learning may well be the key to letting models reflect actions and imagine alternative scenarios!
25
7. Implications for ML
1. Semi-Supervised Learning
○ Semi-supervised learning only helps in the anticausal direction (where the effect is the input and the cause is the output), not in the causal direction: by ICM, the input marginal P(X) carries no information about the mechanism P(Y | X) when X is the cause.
26
[Figure from the paper: causal direction vs. anti-causal direction]
source: paper
7. Implications for ML
2. Adversarial Vulnerability
○ The causal direction affects whether a classifier is vulnerable to adversarial attacks.
27
7. Implications for ML
3. Robustness and strong generalization
○ Out-of-distribution generalization: keeping the empirical risk low even on out-of-distribution datasets
○ Robust prediction: minimizing the maximum of the OOD risk over a set of environments (see the sketch below)
28
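A hedged formalization of the two bullets above (notation mine, loosely following the paper's robust-risk definition): with a set of environments E, each inducing a distribution P^e over (X, Y), the robust predictor minimizes the worst-case risk.

```latex
% Robust / OOD risk over a set of environments (sketch)
R^{\mathrm{OOD}}(f) \;=\; \max_{e \in \mathcal{E}} \; \mathbb{E}_{(X,Y) \sim P^{e}} \bigl[\, \ell\bigl(f(X), Y\bigr) \,\bigr],
\qquad
f^{\star} \;=\; \arg\min_{f} R^{\mathrm{OOD}}(f).
```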
7. Implications for ML
4. Pre-training, Data Augmentation, Self-supervision
5. Reinforcement Learning
6. Scientific Applications
7. Multi-Task Learning and Continual Learning
29
8. Future Work
● Learning non-linear causal relations at scale
● Learning causal variables
● Understanding the biases of existing DL approaches
● Learning causally correct models for the world and the agent
30
Q&A
31
