Emergence of Invariance and
Disentangling in Deep Representations
2018.02.14.
Sangwoo Mo
Overview
• Emergence of Invariance and Disentangling in Deep Representations
• Authors: Achille & Soatto (UCLA)
• Appeared in ICML 2017 Workshop
• Contribution
• Investigates the relations among desirable properties of a representation
• Proposes a measure of network complexity
1. Relation between Properties
Properties for Representation
• A representation $z$ is a stochastic function of the data $x$ that is useful for a given task $y$, while a
nuisance $n$ also affects the data
• A “good representation” should be
• sufficient: $I(z; y) = I(x; y)$
• minimal: minimize $I(z; x)$ among sufficient $z$
• invariant: minimize $I(z; n)$
• disentangled: minimize $TC(z) = KL(p(z) \,\|\, \prod_i p(z_i))$
• However, we will show that only minimal sufficiency is essential; i.e. invariance and
disentanglement follow automatically under some model assumptions
* TC: total correlation
** In fact, the assumptions are not mild; still, the result is quite interesting
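The properties above can be checked by hand on a tiny discrete example. The sketch below is only an illustration: the toy world (independent uniform task bit y and nuisance bit n, with data x = (y, n)) and the helper names `mutual_information` and `pair_joint` are assumptions, not part of the paper.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A; B) in nats for a joint distribution given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def pair_joint(samples, f, g):
    """Joint distribution of (f(s), g(s)) over equally likely samples."""
    joint = defaultdict(float)
    for s in samples:
        joint[(f(s), g(s))] += 1.0 / len(samples)
    return dict(joint)

# Toy world (assumed for illustration): y and n are independent fair bits,
# and the data x = (y, n) records both of them.
world = [(y, n) for y in (0, 1) for n in (0, 1)]
task = lambda s: s[0]
nuisance = lambda s: s[1]
data = lambda s: s

z_copy = lambda s: s      # z = x: sufficient, but neither minimal nor invariant
z_task = lambda s: s[0]   # z = y: sufficient, minimal, and invariant

i_xy = mutual_information(pair_joint(world, data, task))
print(f"I(x;y) = {i_xy:.3f}")
for name, z in (("z = x", z_copy), ("z = y", z_task)):
    i_zy = mutual_information(pair_joint(world, z, task))
    i_zx = mutual_information(pair_joint(world, z, data))
    i_zn = mutual_information(pair_joint(world, z, nuisance))
    print(f"{name}: I(z;y) = {i_zy:.3f}  I(z;x) = {i_zx:.3f}  I(z;n) = {i_zn:.3f}")
```

Both candidates are sufficient, but only z = y has the smaller I(z; x) and zero I(z; n), which is the sense in which minimality and invariance go together in this toy case.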
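[Figure: graphical model relating the nuisance $n$, the task $y$, the data $x$, and the representation $z$]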
Minimal Sufficiency ⇒ IB Lagrangian
• Information Bottleneck (IB) Lagrangian
$\mathcal{L} = H(y \mid z) + \beta \cdot I(z; x)$
• Minimizing the IB Lagrangian yields a minimal sufficient representation
• By the Data Processing Inequality (DPI), a deep network forming the Markov chain
$x \to z_1 \to \cdots \to z_L$
• satisfies $I(z_L; x) \le I(z_1; x)$; i.e. stacking layers increases minimality
• Q. In a real scenario, we do not optimize the IB Lagrangian. Does this still apply?
• A. SGD implicitly promotes minimality
* ResNet also satisfies the Markov chain when we define $z$ as a “block”
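The DPI statement can be illustrated numerically with a chain of discrete stochastic layers. This is a minimal sketch under assumed settings: the 4-symbol alphabet, the `noisy_layer` channel, and the flip probability 0.1 are hypothetical choices, not taken from the paper.

```python
import math

def mutual_information(p_x, channel):
    """I(X; Z) in nats for prior p_x[i] and channel[i][j] = p(z = j | x = i)."""
    n_z = len(channel[0])
    p_z = [sum(p_x[i] * channel[i][j] for i in range(len(p_x))) for j in range(n_z)]
    total = 0.0
    for i, px in enumerate(p_x):
        for j in range(n_z):
            joint = px * channel[i][j]
            if joint > 0:
                total += joint * math.log(joint / (px * p_z[j]))
    return total

def compose(first, second):
    """Channel of the chain x -> z_a -> z_b, i.e. the matrix product first @ second."""
    return [[sum(row[k] * second[k][j] for k in range(len(second)))
             for j in range(len(second[0]))] for row in first]

def noisy_layer(n, flip):
    """Hypothetical stochastic layer: keep the symbol w.p. 1 - flip, else move uniformly."""
    return [[1 - flip if i == j else flip / (n - 1) for j in range(n)] for i in range(n)]

p_x = [0.25] * 4
layer = noisy_layer(4, flip=0.1)
channel = layer          # channel from x to the current depth
infos = []
for depth in range(1, 5):
    infos.append(mutual_information(p_x, channel))
    print(f"I(z_{depth}; x) = {infos[-1]:.4f} nats")
    channel = compose(channel, layer)   # stack one more layer
```

Each additional layer can only lose information about x, so the printed values are non-increasing in depth.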
Model Assumption
• Now we will show that, under some model assumptions,
1. minimality implies invariance and disentanglement
2. SGD implicitly promotes minimality
• Model Assumption
• Assume a log-uniform prior on $w$; i.e. $p(w_i) \propto 1/|w_i|$
• Assume the posterior $w_i \mid D = \epsilon_i \cdot \hat{w}_i$ where $\epsilon_i \sim \log \mathcal{N}(-\alpha_i/2, \alpha_i)$
• $\alpha_i$ is also optimized (Variational Dropout; Kingma’ 2015)
• Then the weight information is
$I(w; D) = -\frac{1}{2} \sum_{i=1}^{\dim(w)} \log \alpha_i + C$
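A small sketch of this noise model, assuming the second parameter of the log-normal is the variance of log ε (so that ε has mean 1). The function names are hypothetical, and the additive constant C is left as a free parameter because the slide does not give its value.

```python
import math
import random

def sample_weight(w_hat, alpha, rng):
    """Draw w = eps * w_hat with log(eps) ~ N(-alpha / 2, alpha); alpha is assumed to be a variance."""
    eps = rng.lognormvariate(-alpha / 2.0, math.sqrt(alpha))
    return eps * w_hat

def weight_information(alphas, const=0.0):
    """I(w; D) = -1/2 * sum_i log(alpha_i) + C, with the unknown constant C passed in."""
    return -0.5 * sum(math.log(a) for a in alphas) + const

rng = random.Random(0)
alpha = 0.5
draws = [sample_weight(1.0, alpha, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
print(f"empirical mean of the multiplicative noise: {mean:.3f}")

# Larger noise variance alpha_i means the weights store less information about D.
print(weight_information([0.5, 0.5]), weight_information([2.0, 2.0]))
```

With this parameterization the noise leaves the mean weight unchanged, and increasing any α_i lowers the weight information.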
Minimality ⇒ Invariance & Disentanglement
• Proposition 1. For a single layer $z = W \cdot x$,
$g(I(W; D)) \le I(z; x) + TC(z) \le g(I(W; D)) + c$
• where $g$ is some strictly increasing function and $c = O(1/\dim(x))$
• Corollary 1. For an MLP,
$I(z_L; x) \le \min_{k<L} I(z_{k+1}; z_k) \le \min_{k<L} I(W_k \cdot z_k; z_k)$
• Here, we can only obtain an upper bound
• ⇒ minimality implies invariance and disentanglement
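To make the total correlation term concrete, the sketch below evaluates TC(z) = Σ_i H(z_i) − H(z) on two tiny discrete codes. The sample sets and the helper names are illustrative assumptions; the paper works with continuous representations.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy in nats of an empirical distribution given as counts."""
    return -sum(c / total * math.log(c / total) for c in counts.values())

def total_correlation(samples):
    """TC(z) = sum_i H(z_i) - H(z) = KL(p(z) || prod_i p(z_i)) over equally weighted samples."""
    n = len(samples)
    joint_h = entropy(Counter(samples), n)
    marginal_h = sum(entropy(Counter(s[i] for s in samples), n)
                     for i in range(len(samples[0])))
    return marginal_h - joint_h

independent = [(a, b) for a in (0, 1) for b in (0, 1)]  # two independent fair bits
entangled = [(a, a) for a in (0, 1)]                    # second component copies the first

print(f"TC(independent) = {total_correlation(independent):.3f}")
print(f"TC(entangled)   = {total_correlation(entangled):.3f}")
```

The independent code has zero total correlation, while the redundant code pays one full bit (log 2 nats), which is the kind of redundancy that the upper bound in Proposition 1 limits.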
SGD ⇒ Minimize Weight Information
• Proposition 2. Let $H$ be the Hessian at the local minimum $\hat{w}$. Assume $(\hat{w}, \beta)$ is an optimal
solution of the IB Lagrangian $\mathcal{L} = H(y \mid z) + \beta \cdot I(z; x)$. Then
$I(w; D) \le K \left[ \log \|H\|_* + \log \|\hat{w}\|_2^2 - \log(\beta K^2) \right]$
• where $K = \dim(w)$ and $\|\cdot\|_*$ is the nuclear norm
• Empirical evidence: SGD converges to flat minima; i.e. $\|H\|_* = \operatorname{tr}(H)$ is small
• ⇒ SGD implicitly minimizes the weight information
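The dependence of the bound on flatness can be sketched numerically. The code below evaluates the right-hand side in the form written on this slide, whose constants may differ from the exact statement in the paper; the two diagonal Hessians, the weight vector, and β = 0.1 are made-up values for illustration.

```python
import math

def nuclear_norm_psd(hessian):
    """Nuclear norm of a symmetric positive semi-definite matrix, which equals its trace."""
    return sum(hessian[i][i] for i in range(len(hessian)))

def information_bound(hessian, w_hat, beta):
    """K * [log ||H||_* + log ||w_hat||_2^2 - log(beta * K^2)], following the slide's form."""
    k = len(w_hat)
    sq_norm = sum(w * w for w in w_hat)
    return k * (math.log(nuclear_norm_psd(hessian))
                + math.log(sq_norm)
                - math.log(beta * k * k))

w_hat = [1.0, 1.0]
sharp = [[10.0, 0.0], [0.0, 10.0]]   # high curvature around the minimum
flat = [[0.1, 0.0], [0.0, 0.1]]      # low curvature around the minimum

bound_sharp = information_bound(sharp, w_hat, beta=0.1)
bound_flat = information_bound(flat, w_hat, beta=0.1)
print(f"sharp minimum: {bound_sharp:.3f}   flat minimum: {bound_flat:.3f}")
```

For the same weights and β, the flatter minimum yields the smaller bound, matching the claim that flat minima carry less weight information.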
2. Measure Network Complexity
Revisit Overfitting
• Let $p_\theta(x, y)$ be the data distribution and $q(\cdot \mid x; w)$ be the neural network
• Decompose the cross-entropy loss
$H_{p,q}(D \mid w) = H(D \mid \theta) + I(\theta; D \mid w) + KL(q \,\|\, p) - I(D; w \mid \theta)$
• The four terms are, in order: intrinsic error, sufficiency, model efficiency, overfitting
• Since $I(D; w \mid \theta)$ is intractable, we use $I(D; w)$ as a regularizer; i.e. solve the IB Lagrangian
$\mathcal{L} = H_{p,q}(D \mid w) + \beta \cdot I(D; w)$
• Also, we will use $I(D; w)$ as a measure of model complexity
• $I(D; w)$ is small when underfitting, large when overfitting
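A minimal sketch of the regularized objective, assuming the weight information is computed with the formula from the Model Assumption slide (−½ Σ log α_i, dropping the constant). The predicted probabilities, labels, and α values are invented for illustration, and the function names are hypothetical.

```python
import math

def cross_entropy(probs, labels):
    """Average negative log-likelihood of the true labels; probs[i][c] = q(y = c | x_i; w)."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def ib_lagrangian(probs, labels, alphas, beta):
    """L = H_{p,q}(D|w) + beta * I(D; w), with I(D; w) = -1/2 * sum_i log(alpha_i) up to a constant."""
    info = -0.5 * sum(math.log(a) for a in alphas)
    return cross_entropy(probs, labels) + beta * info

probs = [[0.9, 0.1], [0.2, 0.8]]   # made-up network outputs for two examples
labels = [0, 1]
alphas = [0.5, 0.5]                # made-up noise variances for two weights

ce = cross_entropy(probs, labels)
for beta in (0.0, 0.1, 1.0):
    print(f"beta = {beta}: loss = {ib_lagrangian(probs, labels, alphas, beta):.4f}")
```

With β = 0 the objective reduces to the plain cross-entropy; a larger β penalizes low-noise (high-information) weights more strongly.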
Revisit Rethinking Generalization
• [Zhang’ 2017] claimed that we need a new generalization theory for deep learning
• Random Label Test: deep networks easily fit random labels
• With random labels the neural network overfits, but the weight information increases
• …and it recovers the bias-variance tradeoff
* [Zhang’ 2017] Understanding Deep Learning Requires Rethinking Generalization. ICLR 2017.
Effect of $\beta$
• As $\beta$ increases, $I(w; D)$ decreases, and $z$ becomes more invariant (= loses information)
Conclusion
• Conclusion
1. The authors proposed properties of a “good representation” and showed that minimal sufficiency
is sufficient for invariance and disentanglement
2. The authors proposed a complexity measure for neural networks, which resolves the paradox of the
rethinking generalization paper
• Research Question
1. minimality ⇒ invariance holds in general, but what about ⇒ disentanglement?
Under which assumptions can we guarantee disentanglement?
2. Weight information seems to be an alternative measure for generalization theory.
How can we estimate $I(w; D)$ efficiently for a general neural network?

 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Emergence of Invariance and Disentangling in Deep Representations

  • 1. Emergence of Invariance and Disentangling in Deep Representations 2018.02.14. Sangwoo Mo
  • 2. Overview
      • Emergence of Invariance and Disentangling in Deep Representations
          • Authors: Achille & Soatto (UCLA)
          • Appeared in ICML 2017 Workshop
      • Contribution
          • Investigate the relations between properties of representations
          • Propose a measure of network complexity
  • 3. 1. Relation between Properties
  • 4. Properties for Representation
      • A representation $z$ is a stochastic function of the data $x$ that is useful for a given task $y$, while a nuisance $n$ also affects the data
      • A “good representation” should be
          • sufficient: $I(z; y) = I(x; y)$
          • minimal: minimize $I(z; x)$ among sufficient $z$
          • invariant: minimize $I(z; n)$
          • disentangled: minimize $TC(z) = KL(p(z) \,\|\, \prod_i p(z_i))$
      • However, we will show that only minimal sufficiency is essential; i.e. invariance and disentanglement are automatically satisfied under some model assumptions
      * TC: total correlation
      ** Actually, the assumption is not mild; still, the result is quite interesting
      [Figure: graphical model over the variables $n$, $y$, $x$, $z$]
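The total correlation used for the "disentangled" property can be made concrete on a small discrete example. The sketch below is illustrative only: it assumes the joint distribution of the components is available as a probability table, which is not how the slides (or a real network) would estimate it, and the function names are hypothetical. It uses the identity TC(z) = Σ_i H(z_i) − H(z), which is equivalent to the KL form on the slide.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats of a probability array (zero entries are skipped)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def total_correlation(joint):
    """TC(z) = sum_i H(z_i) - H(z) for a joint probability table.

    `joint` has one axis per component z_i and its entries sum to 1.
    """
    joint = np.asarray(joint, dtype=float)
    marginal_entropies = 0.0
    for axis in range(joint.ndim):
        # Marginalize out every axis except the current one.
        other_axes = tuple(a for a in range(joint.ndim) if a != axis)
        marginal_entropies += entropy(joint.sum(axis=other_axes))
    return marginal_entropies - entropy(joint)

# Two independent components: fully disentangled, TC is zero.
independent = np.outer([0.5, 0.5], [0.3, 0.7])
# Second component is a copy of the first: TC equals log 2.
copied = np.array([[0.5, 0.0], [0.0, 0.5]])

tc_independent = total_correlation(independent)
tc_copied = total_correlation(copied)
```

The independent table gives a total correlation of zero (up to rounding), while the copied component gives log 2 nats, the entropy of the duplicated bit.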
  • 5. Minimal Sufficiency ⇒ IB Lagrangian
      • Information Bottleneck (IB) Lagrangian: $\mathcal{L} = H(y \mid z) + \beta \cdot I(z; x)$
      • Minimizing the IB Lagrangian yields the minimal sufficient representation
      • By the Data Processing Inequality (DPI), a deep network $x \to z_1 \to \cdots \to z_L$ satisfies $I(z_L; x) \le I(z_1; x)$; i.e. stacking layers increases minimality
      • Q. In a real scenario, we do not optimize the IB Lagrangian. Does it still apply?
      • A. SGD implicitly leads to minimality
      * ResNet also satisfies the Markov chain when each $z$ is defined as a “block”
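The Data Processing Inequality step can be checked numerically on a toy Markov chain. This is a minimal sketch, assuming binary variables and "layers" modeled as noisy channels that flip a bit with a fixed probability; the flip probabilities and helper names are made up for illustration.

```python
import numpy as np

def mutual_information(joint):
    """I(a; b) in nats from a 2-D joint probability table p(a, b)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)   # column vector p(a)
    pb = joint.sum(axis=0, keepdims=True)   # row vector p(b)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

def noisy_layer(flip):
    """Row-stochastic matrix p(output | input) that flips a bit with probability `flip`."""
    return np.array([[1.0 - flip, flip], [flip, 1.0 - flip]])

p_x = np.array([0.5, 0.5])
layer1 = noisy_layer(0.1)   # x  -> z1
layer2 = noisy_layer(0.2)   # z1 -> z2

joint_x_z1 = np.diag(p_x) @ layer1   # p(x, z1) = p(x) p(z1 | x)
joint_x_z2 = joint_x_z1 @ layer2     # p(x, z2) = sum_z1 p(x, z1) p(z2 | z1)

info_z1 = mutual_information(joint_x_z1)
info_z2 = mutual_information(joint_x_z2)
```

Because z2 depends on x only through z1, `info_z2` cannot exceed `info_z1`, which is the sense in which stacking layers can only lose information about the input.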
  • 6. Model Assumption
      • We will now show that, under some model assumptions,
          1. minimality implies invariance and disentanglement
          2. SGD implicitly leads to minimality
      • Model Assumption
          • Assume a log-uniform prior on $w$; i.e. $p(w_i) \propto 1/|w_i|$
          • Assume the posterior $w_i \mid D = \epsilon_i \cdot \hat{w}_i$, where $\epsilon_i \sim \log \mathcal{N}(-\alpha_i/2, \alpha_i)$
          • $\alpha_i$ is also optimized (Variational Dropout; Kingma’ 2015)
      • Then the weight information is $I(w; D) = -\frac{1}{2} \sum_{i=1}^{\dim(w)} \log \alpha_i + C$
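The weight-information formula on this slide is simple enough to sketch directly. The code below is an illustration under stated assumptions: the second parameter of the log-normal is read as a variance (so the standard deviation of log ε_i is sqrt(α_i)), the additive constant C is left as a free argument, and the function names are hypothetical rather than taken from the paper or any library.

```python
import numpy as np

def weight_information(log_alpha, constant=0.0):
    """I(w; D) = -1/2 * sum_i log(alpha_i) + C under the model assumption above."""
    log_alpha = np.asarray(log_alpha, dtype=float)
    return float(-0.5 * np.sum(log_alpha) + constant)

def sample_weights(w_hat, log_alpha, rng):
    """Draw w_i = eps_i * w_hat_i with eps_i ~ logN(-alpha_i / 2, alpha_i).

    Assumption: alpha_i is the variance of log(eps_i), so E[eps_i] = 1.
    """
    alpha = np.exp(np.asarray(log_alpha, dtype=float))
    eps = rng.lognormal(mean=-alpha / 2.0, sigma=np.sqrt(alpha))
    return eps * np.asarray(w_hat, dtype=float)

rng = np.random.default_rng(0)
w_hat = np.array([1.0, -2.0, 0.5])
log_alpha = np.log([0.5, 0.5, 0.5])

noisy_w = sample_weights(w_hat, log_alpha, rng)
info = weight_information(log_alpha)   # equals 1.5 * log(2) when C = 0
```

Larger α_i means noisier weights and a smaller `weight_information`, so under this model adding multiplicative noise to the weights is what reduces the information they store about the dataset.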
  • 7. Minimality ⇒ Invariance & Disentanglement
      • Proposition 1. For a single layer $z = W \cdot x$, $g(I(W; D)) \le I(z; x) + TC(z) \le g(I(W; D)) + c$
          • where $g$ is some strictly increasing function and $c = O(1/\dim(x))$
      • Corollary 1. For an MLP, $I(z_L; x) \le \min_{k<L} I(z_{k+1}; z_k) \le \min_{k<L} I(W_k \cdot z_k; z_k)$
          • Here, we can only obtain the upper bound
      • ⇒ minimality implies invariance and disentanglement
  • 8. SGD ⇒ Minimize Weight Information
      • Proposition 2. Let $H$ be the Hessian at the local minimum $\hat{w}$. Assume $(\hat{w}, \beta)$ is an optimal solution of the IB Lagrangian $\mathcal{L} = H(D \mid w) + \beta \cdot I(w; D)$. Then $I(w; D) \le K [\log \|H\|_* + \log \|\hat{w}\|_2^2 - \log(\beta K^2)]$
          • where $K = \dim(w)$ and $\|\cdot\|_*$ is the nuclear norm
      • Empirical evidence: SGD converges to flat minima; i.e. $\|H\|_* = \mathrm{tr}(H)$ is small
      • ⇒ SGD implicitly minimizes the weight information
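Two facts from this slide can be checked numerically: at a local minimum the Hessian is positive semi-definite, so its nuclear norm equals its trace, and a flatter minimum (smaller Hessian) lowers the bound. The sketch below is only illustrative. It reads the bound as K [log ‖H‖_* + log ‖ŵ‖²₂ − log(β K²)] with K = dim(w), which is an interpretation of the slide's formula, and it uses a random positive semi-definite matrix as a stand-in for a real Hessian.

```python
import numpy as np

def weight_information_bound(hessian, w_hat, beta):
    """Bound as read from the slide: K [log ||H||_* + log ||w_hat||_2^2 - log(beta K^2)]."""
    k = w_hat.size
    nuclear = np.linalg.norm(hessian, ord="nuc")   # sum of singular values
    return float(k * (np.log(nuclear) + np.log(np.sum(w_hat ** 2)) - np.log(beta * k ** 2)))

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 5))
hessian = a @ a.T            # symmetric positive semi-definite stand-in for a Hessian
w_hat = rng.normal(size=5)   # hypothetical local minimum
beta = 0.1                   # hypothetical IB trade-off weight

nuclear_norm = np.linalg.norm(hessian, ord="nuc")
trace = np.trace(hessian)    # matches the nuclear norm because H is PSD

sharp_bound = weight_information_bound(hessian, w_hat, beta)
flat_bound = weight_information_bound(0.1 * hessian, w_hat, beta)   # flatter minimum
```

Scaling the Hessian down by a factor of ten lowers the bound by K · log 10, which is the quantitative version of "flat minima carry little weight information".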
  • 9. 2. Measure Network Complexity
  • 10. Revisit Overfitting
      • Let $p_\theta(x, y)$ be the data distribution and $q(\cdot \mid x; w)$ be the neural network
      • Decompose the cross-entropy loss: $H_{p,q}(D \mid w) = H(D \mid \theta) + I(\theta; D \mid w) + KL(q \,\|\, p) - I(D; w \mid \theta)$
          • The four terms are, in order: intrinsic error, sufficiency, model efficiency, overfitting
      • Since $I(D; w \mid \theta)$ is intractable, we use $I(D; w)$ as a regularizer; i.e. we solve the IB Lagrangian $\mathcal{L} = H_{p,q}(D \mid w) + \beta \cdot I(D; w)$
      • We will also use $I(D; w)$ as a measure of model complexity
          • $I(D; w)$ is small when underfitting and large when overfitting
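The regularized objective on this slide can be sketched as a cross-entropy term plus β times the weight information. This is a rough illustration, not the authors' training code: it assumes the weight information takes the form −½ Σ_i log α_i (dropping the additive constant), it averages the cross-entropy over samples rather than summing over the dataset, and all names are hypothetical.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy in nats for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def ib_lagrangian(logits, labels, log_alpha, beta):
    """Cross-entropy plus beta times the weight information I(D; w).

    Assumption: I(D; w) = -1/2 * sum_i log(alpha_i), with the constant dropped.
    """
    weight_info = -0.5 * float(np.sum(log_alpha))
    return cross_entropy(logits, labels) + beta * weight_info

logits = np.zeros((4, 3))                # uninformative predictions over 3 classes
labels = np.array([0, 1, 2, 0])
log_alpha = np.log([0.5, 0.5])           # two weights with noise variance 0.5

loss_without_penalty = ib_lagrangian(logits, labels, log_alpha, beta=0.0)
loss_with_penalty = ib_lagrangian(logits, labels, log_alpha, beta=1.0)
```

With uniform predictions the cross-entropy is log 3, and turning on β = 1 adds the weight information log 2, so the two terms can be read off separately in this toy case.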
  • 15. Revisit Rethinking Generalization
      • [Zhang’ 2017] claimed that deep learning needs a new generalization theory
      • Random Label Test: deep networks easily fit random labels
      • With random labels the network overfits, but the weight information increases
      • …and it recovers the bias-variance tradeoff
      * [Zhang’ 2017] Understanding Deep Learning Requires Rethinking Generalization. ICLR 2017.
  • 16. Effect of $\beta$
      • As $\beta$ increases, $I(w; D)$ decreases, and $z$ becomes more invariant (= loses information)
  • 17. Conclusion
      • Conclusion
          1. The authors proposed properties of a “good representation” and showed that minimal sufficiency is sufficient for invariance and disentanglement
          2. The authors proposed a complexity measure for neural networks, which resolves the paradox raised by the rethinking-generalization paper
      • Research Questions
          1. minimality ⇒ invariance holds in general, but what about ⇒ disentanglement? Under which assumptions can we guarantee disentanglement?
          2. Weight information seems to be an alternative measure for generalization theory. How can we estimate $I(w; D)$ efficiently for general neural networks?