SlideShare a Scribd company logo
1 of 38
Download to read offline
Causal Confusion in Imitation Learning
Pim de Haan, Dinesh Jayaraman, and Sergey Levine (NeurIPS 2019)
Dongmin Lee
April, 2020
Causal Confusion in Imitation Learning April, 2020 1 / 38
Outline
1 Introduction
2 Causality and Causal Inference
3 Causality in Imitation Learning
4 Experiments Setting
5 Resolving Causal Misidentification
Causal Graph-Parameterized Policy Learning
Targeted Intervention
6 Experiments
Causal Confusion in Imitation Learning April, 2020 2 / 38
Introduction
Introduction
What is imitation learning?
Learning a policy from examples of expert behavior
The goal of imitation learning is to reproduce the expert’s demonstrations
with generalized intention
Causal Confusion in Imitation Learning April, 2020 3 / 38
Introduction
Introduction
Methods of imitation learning
1 Behavioral cloning
2 Inverse reinforcement learning
Causal Confusion in Imitation Learning April, 2020 4 / 38
Introduction
Introduction
Methods of imitation learning
1 Behavioral cloning - Selected!
2 Inverse reinforcement learning
Causal Confusion in Imitation Learning April, 2020 5 / 38
Introduction
Introduction
What is behavioral cloning?
Causal Confusion in Imitation Learning April, 2020 6 / 38
Introduction
Introduction
Fundamental problem of behavioral cloning: distributional shift
Training and testing state distributions are different, induced respectively
by the expert and learned policies
Causal Confusion in Imitation Learning April, 2020 7 / 38
Introduction
Introduction
What happens under distributional shift? causal misidentification
Behavioral cloning to learn to drive a car in two scenarios (tasks):
policy attends to indicators policy attends to pedestrian
More information yields worse performance Distinguishing correlates of
expert actions in the demonstration set from true causes is usually very
difficult
Causal Confusion in Imitation Learning April, 2020 8 / 38
Introduction
Introduction
What happens under distributional shift? causal misidentification
Behavioral cloning to learn to drive a car in two scenarios (tasks):
policy attends to indicators policy attends to pedestrian
More information yields worse performance Distinguishing correlates of
expert actions in the demonstration set from true causes is usually very
difficult
Causal Confusion in Imitation Learning April, 2020 9 / 38
Introduction
Introduction
What happens under distributional shift? causal misidentification
Behavioral cloning to learn to drive a car in two scenarios (tasks):
policy attends to indicators policy attends to pedestrian
More information yields worse performance
Distinguishing correlates of expert actions in demonstration set from true
causes is usually very difficult...
Causal Confusion in Imitation Learning April, 2020 10 / 38
Causality and Causal Inference
Causality and Causal Inference
What is causality and causal inference?
Causal Confusion in Imitation Learning April, 2020 11 / 38
Causality and Causal Inference
Causality and Causal Inference
Who am I?
Why do you think this picture is a cat?
Causal Confusion in Imitation Learning April, 2020 12 / 38
Causality and Causal Inference
Causality and Causal Inference
Who am I?
Why do you think this picture is a cat?
Causal Confusion in Imitation Learning April, 2020 13 / 38
Causality and Causal Inference
Causality and Causal Inference
What is causality?
Causality (also referred to as causation, or cause and effect) is efficacy
by which one event (a cause) contributes to the production of another
event (an effect)
The cause is partly responsible for the effect, and the effect is partly
dependent on the cause
What is causal inference?
Causal inference is general problem of deducing cause-effect
relationships among variables Causal discovery approaches allow
causal inference from pre-recorded observations under constraints
Causal Confusion in Imitation Learning April, 2020 14 / 38
Causality and Causal Inference
Causality and Causal Inference
What is causality?
Causality (also referred to as causation, or cause and effect) is efficacy
by which one event (a cause) contributes to the production of another
event (an effect)
The cause is partly responsible for the effect, and the effect is partly
dependent on the cause
What is causal inference?
Causal inference is general problem of deducing cause-effect
relationships among variables
Causal discovery approaches allow causal inference from pre-recorded
observations under constraints
Causal Confusion in Imitation Learning April, 2020 15 / 38
Causality and Causal Inference
Causality and Causal Inference
Yoshua Bengio: presentations at NeurIPS, 2019 and AAAI, 2020
Causal Confusion in Imitation Learning April, 2020 16 / 38
Causality and Causal Inference
Causality and Causal Inference
Yann LeCun: ”Lots of people in ML/DL know that causal inference is an
important way to improve generalization. The question is how to do it.”
Recent papers:
25 papers at NeurIPS 2018 workshop on causal learning 18 papers at
NeurIPS 2019 conference 13 papers at AAAI 2020 conference
=⇒ Currently hot topic of research in machine learning
Causal Confusion in Imitation Learning April, 2020 17 / 38
Causality and Causal Inference
Causality and Causal Inference
Yann LeCun: ”Lots of people in ML/DL know that causal inference is an
important way to improve generalization. The question is how to do it.”
Recent papers:
25 papers at NeurIPS 2018 workshop on causal learning
18 papers at NeurIPS 2019 conference
13 papers at AAAI 2020 conference
=⇒ Currently hot topic of research in machine learning!
Causal Confusion in Imitation Learning April, 2020 18 / 38
Causality and Causal Inference
Causality and Causal Inference
Concepts of causality:
Association vs Causation
Intervention and do-calculus
Causal Confusion in Imitation Learning April, 2020 19 / 38
Causality and Causal Inference
Causality and Causal Inference
Association: P(Y |X)
Is there correlation between two variables?
Is mutual information zero?
Statistical inference for observations of two variables
Causation: P(Y |do(X))
Is there causality between two variables?
Does one variable respond to intervention?
Randomized intervention for one variable
Causal Confusion in Imitation Learning April, 2020 20 / 38
Causality and Causal Inference
Causality and Causal Inference
Confounder (also confounding variable, confounding factor) in causality:
The cause of Y is X, and when there is confounder Z that affects X and
Y at the same time, confounding effect occurs We define that
P(Y |do(X)) = P(Y |X)
because observational quantity contains information about correlation
between X and Z, and interventional quantity does not (since X is not
correlated with Z)
Causal Confusion in Imitation Learning April, 2020 21 / 38
Causality and Causal Inference
Causality and Causal Inference
Confounder (also confounding variable, confounding factor) in causality:
The cause of Y is X, and when there is confounder Z that affects X and
Y at the same time, confounding effect occurs
We define that
P(Y |do(X)) = P(Y |X)
because observational quantity contains information about correlation
between X and Z, while interventional quantity does not (since X is not
correlated with Z in randomized intervention)
Causal Confusion in Imitation Learning April, 2020 22 / 38
Causality in Imitation Learning
Causality in Imitation Learning
Imitation learning (i.e., behavioral cloning) setting:
Xt
= [Xt
1, Xt
2, ..., Xt
n]: expert’s state observations at each time t
At
: expert’s actions
=⇒ The goal is to learn a mapping π from Xt
to At
using all (Xt
, At
) from the
expert’s demonstrations
Causal Confusion in Imitation Learning April, 2020 23 / 38
Causality in Imitation Learning
Causality in Imitation Learning
Causal structures:
Expert actions At
are influenced by some information in state
observations Xt
A confounder Zt
influences each state variable in Xt
Some unknown subset of disentangled factors of Xt
(causes) affect
expert actions, and the rest (nuisance variables) do not
Interventional query:
P(At
|do(Xt
))
Causal Confusion in Imitation Learning April, 2020 24 / 38
Experiments Setting
Experiments Setting
Benchmarks:
We add information about previous action, which tends to correlate with
current action in the expert data
Mountain Car
2 state dims
past action integer added
MuJoCo Hopper
11 state dims
3 past action dims added
Causal Confusion in Imitation Learning April, 2020 25 / 38
Experiments Setting
Experiments Setting
Benchmarks:
We add information about previous action, which tends to correlate with
current action in the expert data
30 state variables (disentangled observations)
setting to a disentangled representation by training β-VAE
Causal Confusion in Imitation Learning April, 2020 26 / 38
Experiments Setting
Experiments Setting
Demonstrating causal misidentification
Causal Confusion in Imitation Learning April, 2020 27 / 38
Resolving Causal Misidentification
Resolving Causal Misidentification
Proposed solutions:
1 Learn a policy that map from states to expert actions using randomized
interventional queries
2 Find the true causal intervention using the trained policy
Causal Confusion in Imitation Learning April, 2020 28 / 38
Resolving Causal Misidentification
Resolving Causal Misidentification
1 Causal graph-parametrized policy learning
2 Targeted intervention: find true graph
Expert queries
Rewards
Causal Confusion in Imitation Learning April, 2020 29 / 38
Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning
Causal Graph-Parameterized Policy Learning
Causal graphs as masks
Each state variable Xi in X may either be a cause or not, so there are 2n
possible graphs
We parameterize causal graph G as a vector of n binary variables
Causal Confusion in Imitation Learning April, 2020 30 / 38
Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning
Causal Graph-Parameterized Policy Learning
Graph-parameterized policy
A single graph-parameterized policy
πG(X) = fφ([X G, G])
where is element-wise multiplication, and [·, ·] denotes concatenation
Causal Confusion in Imitation Learning April, 2020 31 / 38
Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning
Causal Graph-Parameterized Policy Learning
Parameters φ of policy network fφ are trained through gradient descent to
minimize:
EG[ (fφ([Xi G, G]), Ai)]
G is drawn uniformly as a bernoulli random vector over all 2n
graphs
: mean squared error loss or cross-entropy loss
Xi, Ai: observations and actions in batches
Causal Confusion in Imitation Learning April, 2020 32 / 38
Resolving Causal Misidentification Targeted Intervention
Targeted Intervention
Expert query mode and policy execution mode
Causal Confusion in Imitation Learning April, 2020 33 / 38
Experiments
Experiments
Intervention by policy execution on Atari games
Causal Confusion in Imitation Learning April, 2020 34 / 38
Experiments
Experiments
Intervention by policy execution on MountainCar and Hopper
Causal Confusion in Imitation Learning April, 2020 35 / 38
Experiments
Experiments
Interpreting the learned causal graph
samples from (top row) learned causal graph
and (bottom row) random causal graph on Pong
Causal Confusion in Imitation Learning April, 2020 36 / 38
Experiments
Experiments
Necessity of disentanglement
intervention on (dis)entangled MountainCar
Causal Confusion in Imitation Learning April, 2020 37 / 38
Thank You!
Any Questions?
Causal Confusion in Imitation Learning April, 2020 38 / 38

More Related Content

What's hot

An Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network ApproachAn Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network ApproachCOST action BM1006
 
A direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsA direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsShiga University, RIKEN
 
密度比推定による時系列データの異常検知
密度比推定による時系列データの異常検知密度比推定による時系列データの異常検知
密度比推定による時系列データの異常検知- Core Concept Technologies
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual TalksYuya Unno
 
統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-Shiga University, RIKEN
 
レコメンドエンジン作成コンテストの勝ち方
レコメンドエンジン作成コンテストの勝ち方レコメンドエンジン作成コンテストの勝ち方
レコメンドエンジン作成コンテストの勝ち方Shun Nukui
 
第8章 ガウス過程回帰による異常検知
第8章 ガウス過程回帰による異常検知第8章 ガウス過程回帰による異常検知
第8章 ガウス過程回帰による異常検知Chika Inoshita
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learningharmonylab
 
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健Preferred Networks
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-Koichiro Gibo
 
[研究室論文紹介用スライド] Adversarial Contrastive Estimation
[研究室論文紹介用スライド] Adversarial Contrastive Estimation[研究室論文紹介用スライド] Adversarial Contrastive Estimation
[研究室論文紹介用スライド] Adversarial Contrastive EstimationMakoto Takenaka
 
Kaggleのテクニック
KaggleのテクニックKaggleのテクニック
KaggleのテクニックYasunori Ozaki
 
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)Preferred Networks
 
分割時系列解析(ITS)の入門
分割時系列解析(ITS)の入門分割時系列解析(ITS)の入門
分割時系列解析(ITS)の入門Koichiro Gibo
 
機械学習によるデータ分析まわりのお話
機械学習によるデータ分析まわりのお話機械学習によるデータ分析まわりのお話
機械学習によるデータ分析まわりのお話Ryota Kamoshida
 
不均衡データのクラス分類
不均衡データのクラス分類不均衡データのクラス分類
不均衡データのクラス分類Shintaro Fukushima
 
ニューラルネットと深層学習の歴史
ニューラルネットと深層学習の歴史ニューラルネットと深層学習の歴史
ニューラルネットと深層学習の歴史Akinori Abe
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法Hiroshi Nakagawa
 
20141220 tokyowebmining state_spacemodel
20141220 tokyowebmining state_spacemodel20141220 tokyowebmining state_spacemodel
20141220 tokyowebmining state_spacemodelKenny ISHIMURA
 
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)Ichigaku Takigawa
 

What's hot (20)

An Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network ApproachAn Introduction to Causal Discovery, a Bayesian Network Approach
An Introduction to Causal Discovery, a Bayesian Network Approach
 
A direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsA direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic models
 
密度比推定による時系列データの異常検知
密度比推定による時系列データの異常検知密度比推定による時系列データの異常検知
密度比推定による時系列データの異常検知
 
機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks機械学習チュートリアル@Jubatus Casual Talks
機械学習チュートリアル@Jubatus Casual Talks
 
統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-統計的因果推論への招待 -因果構造探索を中心に-
統計的因果推論への招待 -因果構造探索を中心に-
 
レコメンドエンジン作成コンテストの勝ち方
レコメンドエンジン作成コンテストの勝ち方レコメンドエンジン作成コンテストの勝ち方
レコメンドエンジン作成コンテストの勝ち方
 
第8章 ガウス過程回帰による異常検知
第8章 ガウス過程回帰による異常検知第8章 ガウス過程回帰による異常検知
第8章 ガウス過程回帰による異常検知
 
MixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised LearningMixMatch: A Holistic Approach to Semi- Supervised Learning
MixMatch: A Holistic Approach to Semi- Supervised Learning
 
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健
PyData.Tokyo Meetup #21 講演資料「Optuna ハイパーパラメータ最適化フレームワーク」太田 健
 
ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-ロジスティック回帰分析の入門 -予測モデル構築-
ロジスティック回帰分析の入門 -予測モデル構築-
 
[研究室論文紹介用スライド] Adversarial Contrastive Estimation
[研究室論文紹介用スライド] Adversarial Contrastive Estimation[研究室論文紹介用スライド] Adversarial Contrastive Estimation
[研究室論文紹介用スライド] Adversarial Contrastive Estimation
 
Kaggleのテクニック
KaggleのテクニックKaggleのテクニック
Kaggleのテクニック
 
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
自然言語処理を 役立てるのはなぜ難しいのか(2022/10/25東大大学院「自然言語処理応用」)
 
分割時系列解析(ITS)の入門
分割時系列解析(ITS)の入門分割時系列解析(ITS)の入門
分割時系列解析(ITS)の入門
 
機械学習によるデータ分析まわりのお話
機械学習によるデータ分析まわりのお話機械学習によるデータ分析まわりのお話
機械学習によるデータ分析まわりのお話
 
不均衡データのクラス分類
不均衡データのクラス分類不均衡データのクラス分類
不均衡データのクラス分類
 
ニューラルネットと深層学習の歴史
ニューラルネットと深層学習の歴史ニューラルネットと深層学習の歴史
ニューラルネットと深層学習の歴史
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法
 
20141220 tokyowebmining state_spacemodel
20141220 tokyowebmining state_spacemodel20141220 tokyowebmining state_spacemodel
20141220 tokyowebmining state_spacemodel
 
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)
機械学習~データを予測に変える技術~で化学に挑む! (サイエンスアゴラ2021)
 

Similar to Causal Confusion in Imitation Learning

Causal Confusion in Imitation Learning
Causal Confusion in Imitation LearningCausal Confusion in Imitation Learning
Causal Confusion in Imitation LearningDongmin Lee
 
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...SYRTO Project
 
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...Toshihiro Kamishima
 
Causality for Policy Assessment and 
Impact Analysis
Causality for Policy Assessment and 
Impact AnalysisCausality for Policy Assessment and 
Impact Analysis
Causality for Policy Assessment and 
Impact AnalysisBayesia USA
 
Impact Analysis V12
Impact Analysis V12Impact Analysis V12
Impact Analysis V12Bayesia USA
 
GWT International Conference 2022 - Practice that transforms intergenerationa...
GWT International Conference 2022 - Practice that transforms intergenerationa...GWT International Conference 2022 - Practice that transforms intergenerationa...
GWT International Conference 2022 - Practice that transforms intergenerationa...Alison Clyde
 
Causal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereCausal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereAmit Sharma
 
M2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparencyM2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparencyBoPeng76
 
Relationship Between Monetary Policy And The Stock Market
Relationship Between Monetary Policy And The Stock MarketRelationship Between Monetary Policy And The Stock Market
Relationship Between Monetary Policy And The Stock MarketCasey Hudson
 
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...Istituto nazionale di statistica
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docxfelicidaddinwoodie
 
Questioning our attitudes and feelings towards Persuasive Technology PT2019
Questioning our attitudes and feelings towards Persuasive Technology PT2019Questioning our attitudes and feelings towards Persuasive Technology PT2019
Questioning our attitudes and feelings towards Persuasive Technology PT2019Robby van Delden
 
Algorithmic Bias In Education
Algorithmic Bias In EducationAlgorithmic Bias In Education
Algorithmic Bias In EducationAudrey Britton
 
Pol soc 360_01_research_design
Pol soc 360_01_research_designPol soc 360_01_research_design
Pol soc 360_01_research_designatrantham
 
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...SYRTO Project
 
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...Oghenovo Obrimah
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practiceAmit Sharma
 
08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)Duke Network Analysis Center
 

Similar to Causal Confusion in Imitation Learning (20)

Causal Confusion in Imitation Learning
Causal Confusion in Imitation LearningCausal Confusion in Imitation Learning
Causal Confusion in Imitation Learning
 
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...
Discussion of “Network Connectivity and Systematic Risk” and “The Impact of N...
 
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...
Future Directions of Fairness-Aware Data Mining: Recommendation, Causality, a...
 
Causality for Policy Assessment and 
Impact Analysis
Causality for Policy Assessment and 
Impact AnalysisCausality for Policy Assessment and 
Impact Analysis
Causality for Policy Assessment and 
Impact Analysis
 
Impact Analysis V12
Impact Analysis V12Impact Analysis V12
Impact Analysis V12
 
GWT International Conference 2022 - Practice that transforms intergenerationa...
GWT International Conference 2022 - Practice that transforms intergenerationa...GWT International Conference 2022 - Practice that transforms intergenerationa...
GWT International Conference 2022 - Practice that transforms intergenerationa...
 
Causal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhereCausal inference in practice: Here, there, causality is everywhere
Causal inference in practice: Here, there, causality is everywhere
 
M2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparencyM2 l10 fairness, accountability, and transparency
M2 l10 fairness, accountability, and transparency
 
Relationship Between Monetary Policy And The Stock Market
Relationship Between Monetary Policy And The Stock MarketRelationship Between Monetary Policy And The Stock Market
Relationship Between Monetary Policy And The Stock Market
 
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
M.E.Bontempi-Panel data: Models, estimation,and the role of attrition and Mea...
 
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
1PPA 670 Public Policy AnalysisBasic Policy Terms an.docx
 
Questioning our attitudes and feelings towards Persuasive Technology PT2019
Questioning our attitudes and feelings towards Persuasive Technology PT2019Questioning our attitudes and feelings towards Persuasive Technology PT2019
Questioning our attitudes and feelings towards Persuasive Technology PT2019
 
Algorithmic Bias In Education
Algorithmic Bias In EducationAlgorithmic Bias In Education
Algorithmic Bias In Education
 
01-research-r.pdf
01-research-r.pdf01-research-r.pdf
01-research-r.pdf
 
Pol soc 360_01_research_design
Pol soc 360_01_research_designPol soc 360_01_research_design
Pol soc 360_01_research_design
 
Conceptual Challenges in Big Data Practices
Conceptual Challenges in Big Data PracticesConceptual Challenges in Big Data Practices
Conceptual Challenges in Big Data Practices
 
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...
Systemic and Systematic risk - Monica Billio, Massimiliano Caporin, Roberto P...
 
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...
Desirable Properties of Parameterizations of Rationality in Rizzo and Whitman...
 
Causal inference in practice
Causal inference in practiceCausal inference in practice
Causal inference in practice
 
08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)08 Inference for Networks – DYAD Model Overview (2017)
08 Inference for Networks – DYAD Model Overview (2017)
 

More from Dongmin Lee

Character Controllers using Motion VAEs
Character Controllers using Motion VAEsCharacter Controllers using Motion VAEs
Character Controllers using Motion VAEsDongmin Lee
 
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Dongmin Lee
 
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...Dongmin Lee
 
Exploration Strategies in Reinforcement Learning
Exploration Strategies in Reinforcement LearningExploration Strategies in Reinforcement Learning
Exploration Strategies in Reinforcement LearningDongmin Lee
 
Maximum Entropy Reinforcement Learning (Stochastic Control)
Maximum Entropy Reinforcement Learning (Stochastic Control)Maximum Entropy Reinforcement Learning (Stochastic Control)
Maximum Entropy Reinforcement Learning (Stochastic Control)Dongmin Lee
 
Let's do Inverse RL
Let's do Inverse RLLet's do Inverse RL
Let's do Inverse RLDongmin Lee
 
모두를 위한 PG여행 가이드
모두를 위한 PG여행 가이드모두를 위한 PG여행 가이드
모두를 위한 PG여행 가이드Dongmin Lee
 
Safe Reinforcement Learning
Safe Reinforcement LearningSafe Reinforcement Learning
Safe Reinforcement LearningDongmin Lee
 
안.전.제.일. 강화학습!
안.전.제.일. 강화학습!안.전.제.일. 강화학습!
안.전.제.일. 강화학습!Dongmin Lee
 
Planning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsPlanning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsDongmin Lee
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
강화학습 알고리즘의 흐름도 Part 2
강화학습 알고리즘의 흐름도 Part 2강화학습 알고리즘의 흐름도 Part 2
강화학습 알고리즘의 흐름도 Part 2Dongmin Lee
 
강화학습의 흐름도 Part 1
강화학습의 흐름도 Part 1강화학습의 흐름도 Part 1
강화학습의 흐름도 Part 1Dongmin Lee
 
강화학습의 개요
강화학습의 개요강화학습의 개요
강화학습의 개요Dongmin Lee
 

More from Dongmin Lee (14)

Character Controllers using Motion VAEs
Character Controllers using Motion VAEsCharacter Controllers using Motion VAEs
Character Controllers using Motion VAEs
 
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va...
 
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learn...
 
Exploration Strategies in Reinforcement Learning
Exploration Strategies in Reinforcement LearningExploration Strategies in Reinforcement Learning
Exploration Strategies in Reinforcement Learning
 
Maximum Entropy Reinforcement Learning (Stochastic Control)
Maximum Entropy Reinforcement Learning (Stochastic Control)Maximum Entropy Reinforcement Learning (Stochastic Control)
Maximum Entropy Reinforcement Learning (Stochastic Control)
 
Let's do Inverse RL
Let's do Inverse RLLet's do Inverse RL
Let's do Inverse RL
 
모두를 위한 PG여행 가이드
모두를 위한 PG여행 가이드모두를 위한 PG여행 가이드
모두를 위한 PG여행 가이드
 
Safe Reinforcement Learning
Safe Reinforcement LearningSafe Reinforcement Learning
Safe Reinforcement Learning
 
안.전.제.일. 강화학습!
안.전.제.일. 강화학습!안.전.제.일. 강화학습!
안.전.제.일. 강화학습!
 
Planning and Learning with Tabular Methods
Planning and Learning with Tabular MethodsPlanning and Learning with Tabular Methods
Planning and Learning with Tabular Methods
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
강화학습 알고리즘의 흐름도 Part 2
강화학습 알고리즘의 흐름도 Part 2강화학습 알고리즘의 흐름도 Part 2
강화학습 알고리즘의 흐름도 Part 2
 
강화학습의 흐름도 Part 1
강화학습의 흐름도 Part 1강화학습의 흐름도 Part 1
강화학습의 흐름도 Part 1
 
강화학습의 개요
강화학습의 개요강화학습의 개요
강화학습의 개요
 

Recently uploaded

Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxLina Kadam
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsResearcher Researcher
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosVictor Morales
 
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfLivre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfsaad175691
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...arifengg7
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProRay Yuan Liu
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSneha Padhiar
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Romil Mishra
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliNimot Muili
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTSneha Padhiar
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunicationnovrain7111
 
Introduction to Machine Learning Part1.pptx
Introduction to Machine Learning Part1.pptxIntroduction to Machine Learning Part1.pptx
Introduction to Machine Learning Part1.pptxPavan Mohan Neelamraju
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
 

Recently uploaded (20)

Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
Stork Webinar | APM Transformational planning, Tool Selection & Performance T...
 
AntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptxAntColonyOptimizationManetNetworkAODV.pptx
AntColonyOptimizationManetNetworkAODV.pptx
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
Novel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending ActuatorsNovel 3D-Printed Soft Linear and Bending Actuators
Novel 3D-Printed Soft Linear and Bending Actuators
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdfLivre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
Livre Implementing_Six_Sigma_and_Lean_A_prac([Ron_Basu]_).pdf
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
Analysis and Evaluation of Dal Lake Biomass for Conversion to Fuel/Green fert...
 
A brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision ProA brief look at visionOS - How to develop app on Apple's Vision Pro
A brief look at visionOS - How to develop app on Apple's Vision Pro
 
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptxCurve setting (Basic Mine Surveying)_MI10412MI.pptx
Curve setting (Basic Mine Surveying)_MI10412MI.pptx
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATIONSOFTWARE ESTIMATION COCOMO AND FP CALCULATION
SOFTWARE ESTIMATION COCOMO AND FP CALCULATION
 
Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________Gravity concentration_MI20612MI_________
Gravity concentration_MI20612MI_________
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
 
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENTFUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
FUNCTIONAL AND NON FUNCTIONAL REQUIREMENT
 
The Satellite applications in telecommunication
The Satellite applications in telecommunicationThe Satellite applications in telecommunication
The Satellite applications in telecommunication
 
Introduction to Machine Learning Part1.pptx
Introduction to Machine Learning Part1.pptxIntroduction to Machine Learning Part1.pptx
Introduction to Machine Learning Part1.pptx
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
 

Causal Confusion in Imitation Learning

  • 1. Causal Confusion in Imitation Learning Pim de Haan, Dinesh Jayaraman, and Sergey Levine (NeurIPS 2019) Dongmin Lee April, 2020 Causal Confusion in Imitation Learning April, 2020 1 / 38
  • 2. Outline 1 Introduction 2 Causality and Causal Inference 3 Causality in Imitation Learning 4 Experiments Setting 5 Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning Targeted Intervention 6 Experiments Causal Confusion in Imitation Learning April, 2020 2 / 38
  • 3. Introduction Introduction What is imitation learning? Learning a policy from examples of expert behavior The goal of imitation learning is to reproduce the expert’s demonstrations with generalized intention Causal Confusion in Imitation Learning April, 2020 3 / 38
  • 4. Introduction Introduction Methods of imitation learning 1 Behavioral cloning 2 Inverse reinforcement learning Causal Confusion in Imitation Learning April, 2020 4 / 38
  • 5. Introduction Introduction Methods of imitation learning 1 Behavioral cloning - Selected! 2 Inverse reinforcement learning Causal Confusion in Imitation Learning April, 2020 5 / 38
  • 6. Introduction Introduction What is behavioral cloning? Causal Confusion in Imitation Learning April, 2020 6 / 38
  • 7. Introduction Introduction Fundamental problem of behavioral cloning: distributional shift Training and testing state distributions are different, induced respectively by the expert and learned policies Causal Confusion in Imitation Learning April, 2020 7 / 38
  • 8. Introduction Introduction What happens under distributional shift? causal misidentification Behavioral cloning to learn to drive a car in two scenarios (tasks): policy attends to indicators policy attends to pedestrian More information yields worse performance Distinguishing correlates of expert actions in the demonstration set from true causes is usually very difficult Causal Confusion in Imitation Learning April, 2020 8 / 38
  • 9. Introduction Introduction What happens under distributional shift? causal misidentification Behavioral cloning to learn to drive a car in two scenarios (tasks): policy attends to indicators policy attends to pedestrian More information yields worse performance Distinguishing correlates of expert actions in the demonstration set from true causes is usually very difficult Causal Confusion in Imitation Learning April, 2020 9 / 38
  • 10. Introduction Introduction What happens under distributional shift? causal misidentification Behavioral cloning to learn to drive a car in two scenarios (tasks): policy attends to indicators policy attends to pedestrian More information yields worse performance Distinguishing correlates of expert actions in demonstration set from true causes is usually very difficult... Causal Confusion in Imitation Learning April, 2020 10 / 38
  • 11. Causality and Causal Inference Causality and Causal Inference What is causality and causal inference? Causal Confusion in Imitation Learning April, 2020 11 / 38
  • 12. Causality and Causal Inference Causality and Causal Inference Who am I? Why do you think this picture is a cat? Causal Confusion in Imitation Learning April, 2020 12 / 38
  • 13. Causality and Causal Inference Causality and Causal Inference Who am I? Why do you think this picture is a cat? Causal Confusion in Imitation Learning April, 2020 13 / 38
  • 14. Causality and Causal Inference Causality and Causal Inference What is causality? Causality (also referred to as causation, or cause and effect) is efficacy by which one event (a cause) contributes to the production of another event (an effect) The cause is partly responsible for the effect, and the effect is partly dependent on the cause What is causal inference? Causal inference is general problem of deducing cause-effect relationships among variables Causal discovery approaches allow causal inference from pre-recorded observations under constraints Causal Confusion in Imitation Learning April, 2020 14 / 38
  • 15. Causality and Causal Inference Causality and Causal Inference What is causality? Causality (also referred to as causation, or cause and effect) is efficacy by which one event (a cause) contributes to the production of another event (an effect) The cause is partly responsible for the effect, and the effect is partly dependent on the cause What is causal inference? Causal inference is general problem of deducing cause-effect relationships among variables Causal discovery approaches allow causal inference from pre-recorded observations under constraints Causal Confusion in Imitation Learning April, 2020 15 / 38
  • 16. Causality and Causal Inference Causality and Causal Inference Yoshua Bengio: presentations at NeurIPS, 2019 and AAAI, 2020 Causal Confusion in Imitation Learning April, 2020 16 / 38
  • 17. Causality and Causal Inference Causality and Causal Inference Yann LeCun: ”Lots of people in ML/DL know that causal inference is an important way to improve generalization. The question is how to do it.” Recent papers: 25 papers at NeurIPS 2018 workshop on causal learning 18 papers at NeurIPS 2019 conference 13 papers at AAAI 2020 conference =⇒ Currently hot topic of research in machine learning Causal Confusion in Imitation Learning April, 2020 17 / 38
  • 18. Causality and Causal Inference Causality and Causal Inference Yann LeCun: ”Lots of people in ML/DL know that causal inference is an important way to improve generalization. The question is how to do it.” Recent papers: 25 papers at NeurIPS 2018 workshop on causal learning 18 papers at NeurIPS 2019 conference 13 papers at AAAI 2020 conference =⇒ Currently hot topic of research in machine learning! Causal Confusion in Imitation Learning April, 2020 18 / 38
  • 19. Causality and Causal Inference Causality and Causal Inference Concepts of causality: Association vs Causation Intervention and do-calculus Causal Confusion in Imitation Learning April, 2020 19 / 38
  • 20. Causality and Causal Inference Causality and Causal Inference Association: P(Y |X) Is there correlation between two variables? Is mutual information zero? Statistical inference for observations of two variables Causation: P(Y |do(X)) Is there causality between two variables? Does one variable respond to intervention? Randomized intervention for one variable Causal Confusion in Imitation Learning April, 2020 20 / 38
  • 21. Causality and Causal Inference Causality and Causal Inference Confounder (also confounding variable, confounding factor) in causality: The cause of Y is X, and when there is confounder Z that affects X and Y at the same time, confounding effect occurs We define that P(Y |do(X)) = P(Y |X) because observational quantity contains information about correlation between X and Z, and interventional quantity does not (since X is not correlated with Z) Causal Confusion in Imitation Learning April, 2020 21 / 38
  • 22. Causality and Causal Inference Causality and Causal Inference Confounder (also confounding variable, confounding factor) in causality: The cause of Y is X, and when there is confounder Z that affects X and Y at the same time, confounding effect occurs We define that P(Y |do(X)) = P(Y |X) because observational quantity contains information about correlation between X and Z, while interventional quantity does not (since X is not correlated with Z in randomized intervention) Causal Confusion in Imitation Learning April, 2020 22 / 38
  • 23. Causality in Imitation Learning Causality in Imitation Learning Imitation learning (i.e., behavioral cloning) setting: Xt = [Xt 1, Xt 2, ..., Xt n]: expert’s state observations at each time t At : expert’s actions =⇒ The goal is to learn a mapping π from Xt to At using all (Xt , At ) from the expert’s demonstrations Causal Confusion in Imitation Learning April, 2020 23 / 38
  • 24. Causality in Imitation Learning Causality in Imitation Learning Causal structures: Expert actions At are influenced by some information in state observations Xt A confounder Zt influences each state variable in Xt Some unknown subset of disentangled factors of Xt (causes) affect expert actions, and the rest (nuisance variables) do not Interventional query: P(At |do(Xt )) Causal Confusion in Imitation Learning April, 2020 24 / 38
  • 25. Experiments Setting Experiments Setting Benchmarks: We add information about previous action, which tends to correlate with current action in the expert data Mountain Car 2 state dims past action integer added MuJoCo Hopper 11 state dims 3 past action dims added Causal Confusion in Imitation Learning April, 2020 25 / 38
  • 26. Experiments Setting Experiments Setting Benchmarks: We add information about previous action, which tends to correlate with current action in the expert data 30 state variables (disentangled observations) setting to a disentangled representation by training β-VAE Causal Confusion in Imitation Learning April, 2020 26 / 38
  • 27. Experiments Setting Experiments Setting Demonstrating causal misidentification Causal Confusion in Imitation Learning April, 2020 27 / 38
  • 28. Resolving Causal Misidentification Resolving Causal Misidentification Proposed solutions: 1 Learn a policy that map from states to expert actions using randomized interventional queries 2 Find the true causal intervention using the trained policy Causal Confusion in Imitation Learning April, 2020 28 / 38
  • 29. Resolving Causal Misidentification Resolving Causal Misidentification 1 Causal graph-parametrized policy learning 2 Targeted intervention: find true graph Expert queries Rewards Causal Confusion in Imitation Learning April, 2020 29 / 38
  • 30. Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning Causal Graph-Parameterized Policy Learning Causal graphs as masks Each state variable Xi in X may either be a cause or not, so there are 2n possible graphs We parameterize causal graph G as a vector of n binary variables Causal Confusion in Imitation Learning April, 2020 30 / 38
  • 31. Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning Causal Graph-Parameterized Policy Learning Graph-parameterized policy A single graph-parameterized policy πG(X) = fφ([X G, G]) where is element-wise multiplication, and [·, ·] denotes concatenation Causal Confusion in Imitation Learning April, 2020 31 / 38
  • 32. Resolving Causal Misidentification Causal Graph-Parameterized Policy Learning Causal Graph-Parameterized Policy Learning Parameters φ of policy network fφ are trained through gradient descent to minimize: EG[ (fφ([Xi G, G]), Ai)] G is drawn uniformly as a bernoulli random vector over all 2n graphs : mean squared error loss or cross-entropy loss Xi, Ai: observations and actions in batches Causal Confusion in Imitation Learning April, 2020 32 / 38
  • 33. Resolving Causal Misidentification Targeted Intervention Targeted Intervention Expert query mode and policy execution mode Causal Confusion in Imitation Learning April, 2020 33 / 38
  • 34. Experiments Experiments Intervention by policy execution on Atari games Causal Confusion in Imitation Learning April, 2020 34 / 38
  • 35. Experiments Experiments Intervention by policy execution on MountainCar and Hopper Causal Confusion in Imitation Learning April, 2020 35 / 38
  • 36. Experiments Experiments Interpreting the learned causal graph samples from (top row) learned causal graph and (bottom row) random causal graph on Pong Causal Confusion in Imitation Learning April, 2020 36 / 38
  • 37. Experiments Experiments Necessity of disentanglement intervention on (dis)entangled MountainCar Causal Confusion in Imitation Learning April, 2020 37 / 38
  • 38. Thank You! Any Questions? Causal Confusion in Imitation Learning April, 2020 38 / 38