Understanding Autoencoders
with Interventions
Felix Leeb – MPI-IS
Causality Discussion Group – 8 March 2023
Representation Learning
• The fundamental claim of representation learning is that our
problem can be better solved using a different (smaller) space
than the input (ambient) space → the Manifold Hypothesis
• So, break down our solution into two pieces:
1. Organize input into a more useful form → learn a representation
2. Focus on what is left → solve the actual problem(s) (not shown)
• In other words, if the input lives in the ambient space ℝᴺ, we only need ℝᵈ, where d ≪ N
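As a toy sketch of this two-step split (the data, encoder, and task below are all invented for illustration, not from the slides): observations live in a 10-dimensional ambient space but actually lie on a 1-D manifold, so a single coordinate suffices as a representation, and the downstream task becomes trivial in that space.

```python
# Toy illustration: 10-D ambient data lying on a 1-D manifold (a line),
# so a 1-D representation is all the downstream task needs.
import random

random.seed(0)
N = 10                              # ambient dimension
direction = [1.0 / N**0.5] * N      # unit vector spanning the 1-D manifold

def generate(t):                    # observation = point on the line
    return [t * v for v in direction]

def encode(x):                      # step 1: representation (project onto the line)
    return sum(xi * vi for xi, vi in zip(x, direction))

def task(s):                        # step 2: the actual problem, trivial in 1-D
    return s > 0.0

samples = [generate(random.uniform(-1, 1)) for _ in range(100)]
codes = [encode(x) for x in samples]
# every 10-D observation is summarised by a single scalar coordinate
```

Since `direction` has unit norm, `encode(generate(t))` recovers `t` exactly, which is the sense in which the representation loses nothing that matters for the task.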
Understanding the representation
• High-level features of good representations:
❑Extensible – easily integrate expert knowledge
❑Compact – efficient time and space complexity
❑Extrapolate – generalize on a semantic level
❑Robust – not sensitive to unimportant changes
❑Self-aware – estimates uncertainties
• Why bother with story-telling when the performance is what matters?
• Form connections with past work → educational
• Identify weaknesses and motivate improvements → innovative
Key question: On the quest for good representations, how can we make sense of what we have?
The Mythos of Model Interpretability
by Zachary Lipton (2017)
Disentanglement (the obvious)
• With the manifold hypothesis, we assumed there are a small
number of underlying factors that give rise to the observation, so
how about the representation just disentangles those factors?
• Simple inductive bias: maximize statistical independence
between latent variables to ensure there’s no overlapping
information
• What if the factors are not statistically independent?
What about non-trivial variable structure?
Example from Yoshua Bengio: a fork and knife are not statistically
independent, but can nevertheless be manipulated separately.
β-VAE, FVAE, DIP-VAE, TC-VAE, β-TC-VAE, etc.
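The basic inductive bias behind this family of methods can be sketched as the β-VAE regulariser (the helper names below are my own, not from the slides): the KL term pulls the approximate posterior towards an isotropic prior, with β controlling how hard the model is pushed towards statistically independent latents.

```python
# Sketch of the beta-VAE objective: reconstruction error plus a KL term that
# regularises a diagonal-Gaussian posterior towards the isotropic prior N(0, I).
import math

def kl_diag_gaussian(mu, log_var):
    """KL( N(mu, exp(log_var)) || N(0, I) ) for a diagonal Gaussian, per sample."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))

def beta_vae_loss(recon_err, mu, log_var, beta=4.0):
    # beta > 1 trades reconstruction quality for stronger pressure
    # towards an independent (disentangled-looking) posterior
    return recon_err + beta * kl_diag_gaussian(mu, log_var)

# a posterior equal to the prior incurs zero KL penalty
assert kl_diag_gaussian([0.0, 0.0], [0.0, 0.0]) == 0.0
```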
Causality: Genuinely predictive models
• Statistical models identify patterns in
the dataset, but these correlations
may be spurious → non-predictive!
• ICM Principle – although individual
factors may not be independent,
the true generative process is
composed of independent mechanisms
(→ interventions in an SCM)
• However, without strong assumptions or supervision, the true causal variables
cannot be identified (much less the full mechanisms) → guarantees are unrealistic
Towards Causal Representation Learning by Schölkopf et al. (2021); Locatello et al. (2018; arXiv:1811.12359)
Guiding Principle: Interventional Consistency
[Diagram: a generative process driven by noise and interventions, true vs. learned.]
Identifiability problem: impossible to guarantee that the model learns the true causal drivers
The effect of each individual lever may be different, but if they are equivalent in aggregate,
then you can’t distinguish the “true” from the learned generative process.
→ So instead, let’s focus on the causal structure of our learned generative process.
Setting the Scene
• I give you a trained (beta-)VAE using a
deep CNN (500k params). Nothing special.
• Trained on 3D-Shapes – synthetic process
with relatively small observations and
6 independent DOFs (no supervision).
• The true factors of variation are
independent, and we see disentanglement
…but is that the full story?
[Figure: latent traversals – ours vs. a disentangled baseline]
Curiosity #1:
Prior doesn’t match the aggregate posterior
• Prior (green) doesn’t match the
aggregate posterior (blue)
• Latent variables are not statistically
independent
• But maybe that’s the point – you
can only trust your regularization
objective so much.
Weird
Ms. Statistics
Curiosity #2:
Decoder extends beyond the training manifold
• Decoder can still generate sensible
samples beyond the aggregate
posterior → good for generative
modeling
• Decoder doesn’t just invert the
encoder, but is doing more work
“for free”.
Weird
Manifold Man
2D Latent Traversal
VAEs display some interventional consistency out of the box
→ How can we use that?
Hypothesis: Latent Space vs Latent Manifold
Can we separate the semantic information S (→ necessary to reconstruct
the sample) from any “exogenous” information U in the latent space,
which the decoder ignores anyway?
For each observation, find a latent vector that makes the
subsequent reconstruction as good as possible (without
straying too far from the prior)
For each point in the latent space, place it as close to the
data manifold as possible consistent with the encoder
(for reconstruction)
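The encoder-side picture above can be sketched as a small optimization (the linear decoder and all parameters here are invented for illustration): for a given observation, descend on reconstruction error plus a prior penalty to find its latent vector.

```python
# Toy sketch: find a latent z that reconstructs x well without straying
# too far from the prior (penalised by the squared norm of z).

def dec(z):                          # toy linear decoder: 2-D latent -> 3-D ambient
    return [z[0], z[0] + z[1], z[1]]

def loss(z, x, lam=0.1):             # reconstruction error + prior penalty
    r = dec(z)
    recon = sum((ri - xi) ** 2 for ri, xi in zip(r, x))
    prior = sum(zi ** 2 for zi in z)
    return recon + lam * prior

def grad(z, x, lam=0.1, eps=1e-5):   # numerical gradient, good enough for a toy
    g = []
    for i in range(len(z)):
        zp = list(z); zp[i] += eps
        zm = list(z); zm[i] -= eps
        g.append((loss(zp, x, lam) - loss(zm, x, lam)) / (2 * eps))
    return g

x = [1.0, 3.0, 2.0]                  # target observation
z = [0.0, 0.0]                       # start at the prior mean
for _ in range(500):                 # plain gradient descent
    g = grad(z, x)
    z = [zi - 0.05 * gi for zi, gi in zip(z, g)]

assert loss(z, x) < loss([0.0, 0.0], x)   # better than staying at the prior mean
```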
Latent Responses
• We quantify the semantic change in the
sample by measuring the effect of the
intervention in the reconstructed observation
• This enables quantifying the relationship
between latent variables by observing
how interventions “propagate”
→ how is semantics captured?
Probing the Learned Manifold
• Assuming the reconstructions have sufficiently high fidelity,
we can treat the decoder as an approximate inverse of the encoder
• Response function: r(z) = enc(dec(z))
• Interventional Response: the change in r(z) induced by an intervention on z
The response function projects
the perturbed point back onto the
latent manifold*
*similar to memorization in Radhakrishnan et al. (2018; arXiv:1810.10333)
[Diagram: mappings between the ambient and latent spaces, relating the data, generative, latent, and “response” manifolds.]
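A minimal toy of the response function r(z) = enc(dec(z)) (the one-line encoder/decoder below are invented, not the paper's networks): when the decoder ignores a latent coordinate, the response projects any perturbed latent back onto the manifold the decoder actually uses.

```python
# Toy: a decoder that ignores the second latent coordinate. The response
# function r(z) = enc(dec(z)) then projects any perturbed latent back onto
# the "latent manifold" the decoder actually uses.

def dec(z):                 # decoder reads only z[0]; z[1] is exogenous noise U
    return [z[0], z[0]]

def enc(x):                 # encoder recovers the single semantic coordinate S
    return [(x[0] + x[1]) / 2.0, 0.0]

def response(z):            # r = enc o dec : filters out what the decoder ignores
    return enc(dec(z))

z = [0.7, -3.0]             # semantic part 0.7, plus junk in the noise direction
print(response(z))          # the noise coordinate is removed, semantics kept
```

Applying the response twice changes nothing, i.e. r is a projection onto the decoder's "used" subspace, which is exactly the filtering behaviour claimed on the next slide.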
Structure of the Latent Space
Assuming the fidelity of the reconstructions is sufficiently high, the response
function filters out noise, leaving only the semantic information in the
latent code.
Latent Response Matrix
• Define the intervention on a latent variable as resampling only that variable (from the prior)
• To identify the causal links between the
latent variables, we intervene on one latent
variable at a time and compute the average
resulting effect on all latent variables
• Note that for this model, interventions on
many of the latent variables don’t result in
any significant effect → non-informative
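The procedure above can be sketched in a few lines (the toy encoder/decoder are invented for illustration): resample one latent variable at a time from the prior and average the resulting change in each coordinate of the response.

```python
# Monte-Carlo sketch of a latent response matrix: M[j][k] is the average
# absolute change in response coordinate k when latent variable j is
# resampled from the prior.
import random

random.seed(0)

def response(z):                     # toy r = enc o dec: only z[0] is "live"
    x = [z[0], z[0]]                 # decoder uses z[0] only; z[1] is dead
    return [(x[0] + x[1]) / 2.0, 0.0]

def response_matrix(n_latent=2, n_samples=500):
    M = [[0.0] * n_latent for _ in range(n_latent)]
    for _ in range(n_samples):
        z = [random.gauss(0, 1) for _ in range(n_latent)]
        base = response(z)
        for j in range(n_latent):                # intervene on variable j...
            zj = list(z)
            zj[j] = random.gauss(0, 1)           # ...by resampling from the prior
            out = response(zj)
            for k in range(n_latent):
                M[j][k] += abs(out[k] - base[k]) / n_samples
    return M

M = response_matrix()
# row 0 (the live variable) shows an effect; row 1 (the dead one) is all zeros,
# mirroring the non-informative latents noted on the slide
```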
Curiosity #3: Unexpected structure emerges
• Despite the true factors being
statistically independent, the
learned variables are not
• Perhaps the latent variables
contain additional structure
selected (implicitly) by our
inductive biases (e.g. continuity)
Cool
Frau Causality
→ What is this unexpected structure in the learned generative process?
Causal Disentanglement
• Conventionally disentanglement is evaluated by quantifying
how predictive each latent variable is for each true factor
• But for a generative model, what matters is how well a latent
variable controls a desired true factor
Conditioned Response Matrix (Causal)
DCI Responsibility Matrix (Statistical)
Eastwood et al. (2018; OpenReview By-7dz-AZ)
Latent Response Maps
• Starting from a 2D projection of the latent space, we can evaluate the
latent motion all over the latent space to map out the latent manifold
directly.
• Think of the response map as a field showing how far the model will
move in the latent space to reach the manifold
• We can use the divergence of the response map to get a sense of
whether the response is converging or diverging at any point in the
latent space
• Lastly, the mean curvature tells us where the response
converges → the latent manifold
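The divergence diagnostic can be sketched numerically (the vector field below is a made-up stand-in for a learned response map): treat the response as a vector field and take finite differences; negative divergence marks regions where the response converges onto the manifold.

```python
# Sketch: divergence of a 2-D "response" vector field via central finite
# differences. The example field moves every point straight onto the line
# y = 0, so its divergence is -1 everywhere (uniformly converging).

def field(x, y):
    return (0.0, -y)        # displacement towards the toy manifold y = 0

def divergence(f, x, y, h=1e-4):
    # central differences: d v_x / dx + d v_y / dy
    dvx_dx = (f(x + h, y)[0] - f(x - h, y)[0]) / (2 * h)
    dvy_dy = (f(x, y + h)[1] - f(x, y - h)[1]) / (2 * h)
    return dvx_dx + dvy_dy

# uniformly negative divergence: the response converges everywhere
print(divergence(field, 0.3, -1.2))
```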
Example Response Map
Double-helix Toy Example
• Given noisy samples from a double helix (3D ambient
space), our representation is 2D
Traversing the Helix Manifold
• Now that we can explicitly map out the latent manifold, we can directly
traverse along the maximum curvature regions of the latent space to
avoid leaving the manifold
→ semantic interpolations
Interpolating between two
(orange) samples. Naively, we
take the Euclidean shortest path
(red), but using the response
maps we can find a more
meaningful path (green).
[Figure panels: Latent Space | Ambient Space]
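The interpolation trick can be sketched as follows (the unit-circle “manifold” and radial-projection “response” are stand-ins for the learned ones): take the naive straight-line path and push each intermediate point back onto the manifold with the response.

```python
# Toy semantic interpolation: the naive Euclidean path between two points on
# a unit circle cuts through the interior; projecting each intermediate point
# back onto the circle (our stand-in response) keeps the path on the manifold.
import math

def response(z):                     # toy projection onto the unit circle
    n = math.hypot(z[0], z[1])
    return [z[0] / n, z[1] / n]

a, b = [1.0, 0.0], [0.0, 1.0]        # two samples on the manifold
naive = [[(1 - t) * ai + t * bi for ai, bi in zip(a, b)]
         for t in [i / 10 for i in range(11)]]
semantic = [response(p) for p in naive]

# the naive midpoint leaves the manifold; the projected one stays on it
print(math.hypot(*naive[5]), math.hypot(*semantic[5]))
```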
So… what does the manifold look like?
[Figure panels: Divergence | Mean Curvature | Decoded Samples]
Decoded Samples
Note: the floor color changes when
crossing the “decision boundaries”
where the latent response has high
divergence.
The high-curvature regions (i.e. where
the responses converge) resemble
10 categories ordered as a circle →
ground-truth hue!
Latent Response Matrix
Conditioned Response Matrix
Corresponding Reconstructions
Another Opportunity for some Interpolations
Shortest path (Euclidean)
Best path (using response maps)
Conclusions
• Naïve disentanglement fails to capture:
• Non-trivial geometry of the true factors (e.g. periodicity)
• Relationships between true factors (e.g. facial hair vs. sex)
• Latent Responses – in reality, the true factors are out of scope, so let’s use
the causal machinery to understand the learned process in its own right.
• Identify causal links between learned variables directly
• Condition on true factors to evaluate causal disentanglement → fairness of
generative model
• Visualize the learned manifold directly (to reveal learned hidden geometry)
Yup!
Frau Causality
Hmm
Manifold Man
Hmm
Ms. Statistics
For links to identifiability: Reizinger et al. (2022; arXiv:2206.02416)
2D MNIST
• Everyone who worked on the
project: Stefan Bauer,
Michel Besserve, and of
course Bernhard Schölkopf
• Thanks to: the EI department
and the MPI-IS
• For more details, see our
arXiv paper: 2106.16091
Thank you!