Deep Learning Theory Lecture Note
Chapter 1-2 (part 1)
2022.03.02.
KAIST ALIN-LAB
Sangwoo Mo
• Setup
• Consider feedforward networks
• Often only consider shallow networks
• Omit advanced architectures (e.g., ResNet, RNN, Transformer) for simplicity
• Consider supervised learning
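• Given training samples $\{(x_i, y_i)\}_{i=1}^{n}$, minimize empirical risk $\hat{\mathcal{R}}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f(x_i), y_i)$
• We aim to minimize the (population) risk $\mathcal{R}(f) = \mathbb{E}_{(x,y)}[\ell(f(x), y)]$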
• Consider (1-dim) binary classification (𝑦 ∈ {+1, −1}) with squared loss
• Omit advanced learning schemes (e.g., self-supervised) for simplicity
Overview of the lecture
• Topics of deep learning theory
• Recall that we aim to minimize the risk $\mathcal{R}(\hat{f})$
• It can be decomposed into 3 terms ($\bar{f} \in \mathcal{F}$ is some reference solution):
1. Approximation: The hypothesis space ℱ is expressive enough
• Risk of the global optimum $\bar{f}$ is small
2. Optimization: Can find the (near-) global optimum with SGD
• Empirical risk of the learned model $\hat{f}$ ≈ empirical risk of the global optimum $\bar{f}$
3. Generalization: Learned model can also predict unseen samples
• Empirical risk $\hat{\mathcal{R}}$ ≈ population risk ℛ
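One standard way to write such a decomposition is the telescoping identity below. It is a reconstruction, since the formula on the slide was lost in extraction: $\hat{\mathcal{R}}$ denotes the empirical risk, $\hat{f}$ the learned model, and $\bar{f}$ the reference solution. The assignment of the four summands to the three topics listed above is an assumption, with the two empirical-vs-population gaps both counted as generalization.

```latex
\mathcal{R}(\hat f)
  = \underbrace{\mathcal{R}(\hat f) - \hat{\mathcal{R}}(\hat f)}_{\text{generalization}}
  + \underbrace{\hat{\mathcal{R}}(\hat f) - \hat{\mathcal{R}}(\bar f)}_{\text{optimization}}
  + \underbrace{\hat{\mathcal{R}}(\bar f) - \mathcal{R}(\bar f)}_{\text{generalization (concentration)}}
  + \underbrace{\mathcal{R}(\bar f)}_{\text{approximation}}
```

The intermediate terms cancel pairwise, so the right-hand side sums back to $\mathcal{R}(\hat f)$.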
Overview of the lecture
Treat them together
ℱ: hypothesis space
𝑓 ∈ ℱ: hypothesis (NN in our case)
$\hat{f} \in \mathcal{F}$: hypothesis that minimizes empirical risk $\hat{\mathcal{R}}(f)$
$\bar{f} \in \mathcal{F}$: hypothesis that minimizes population risk $\mathcal{R}(f)$
• Approximation → Bound function norm
• NN can approximate an arbitrary smooth (Lipschitz) function in a compact domain 𝑆 (closed and bounded, e.g., [0,1])
• ∀𝑔 ∈ 𝒞(𝑆) (space of smooth func.), ∃𝑓 ∈ ℱ such that ℛ(𝑓) − ℛ(𝑔) < 𝜖
• We bound the gap of risks by a function norm
• Specifically, consider two function norms:
• Uniform norm (worst-case)
• $L_1$ norm (avg. case)
Chap 1. Approximation
Loss ℓ is 𝜌-Lipschitz
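The step from a gap of risks to a function norm can be spelled out as follows. This is a sketch under the assumption that $\ell(\cdot, y)$ is $\rho$-Lipschitz in its first argument and that inputs are drawn from a distribution supported on $S$; the formula on the slide did not survive extraction, so its exact form may differ.

```latex
\begin{aligned}
\mathcal{R}(f) - \mathcal{R}(g)
  &= \mathbb{E}\big[\ell(f(x), y) - \ell(g(x), y)\big] \\
  &\le \rho \, \mathbb{E}\,\lvert f(x) - g(x) \rvert
     && \text{(average-case, } L_1\text{-type bound)} \\
  &\le \rho \, \sup_{x \in S} \lvert f(x) - g(x) \rvert
     && \text{(worst-case, uniform bound)}
\end{aligned}
```

Making either norm of $f - g$ smaller than $\epsilon / \rho$ is therefore enough to bring the risk gap below $\epsilon$.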
• Overview of the chapter
• In this chapter, we prove approximation results for finite-width NNs
• (2.1) Constructive proof for specific activations
• (2.2) Universal approximation for general activations
• Here, carefully check the assumptions on the activation function 𝜎 (e.g., sigmoid, ReLU)
• Spoiler
• Chap 3. Define NN as an infinite-width NN 𝑓 = ∫ 𝜎(⋯ ) – a.k.a. Barron’s construction
• Sample finite nodes to approx. integral ⇒ Error goes to 0
• Chap 4. An infinite-width NN near initialization is analytically represented – a.k.a. NTK
• Corresponding hypothesis space (RKHS) is a universal approximator
Chap 2. Approximation of finite-width NN
• Univariate case
• Smooth function can be approximated by a piece-wise constant function
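• Such a piece-wise constant function (a weighted sum of step functions $\sum_i a_i \mathbf{1}[x \ge b_i]$) can approximate an arbitrary function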
• It is a 2-layer MLP with an indicator activation 𝟏[𝑥 ≥ 0]
Chap 2.1 Constructive proof
Key logic: # of nodes 𝑚 ∝ 1/error
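The univariate construction can be sketched numerically. The snippet below is an illustration, not the lecture's own code: it assumes the target 𝑔 is 𝜌-Lipschitz on [0, 1], places thresholds on a uniform grid of spacing eps / rho, and uses the hypothetical helper name `step_network`. Under these assumptions the uniform error is at most eps with m = ceil(rho / eps) indicator nodes.

```python
import math
import numpy as np

def step_network(g, rho, eps):
    """Build f(x) = sum_i a_i * 1[x - b_i >= 0] with m = ceil(rho / eps) nodes."""
    m = math.ceil(rho / eps)
    b = np.arange(m) * (eps / rho)           # thresholds b_0 = 0 < b_1 < ... on [0, 1]
    vals = np.array([g(bi) for bi in b])     # target values at the thresholds
    a = np.diff(vals, prepend=0.0)           # a_0 = g(b_0), a_i = g(b_i) - g(b_{i-1})

    def f(x):
        x = np.asarray(x, dtype=float)
        hidden = (x[..., None] >= b).astype(float)   # hidden layer: indicator activations
        return hidden @ a                            # linear output layer

    return f, m

# Example: g(x) = sin(2*pi*x) is (2*pi)-Lipschitz on [0, 1].
g = lambda x: np.sin(2 * np.pi * x)
f, m = step_network(g, rho=2 * np.pi, eps=0.1)
xs = np.linspace(0.0, 1.0, 1001)
max_err = float(np.max(np.abs(f(xs) - g(xs))))
print(m, max_err)
```

Halving `eps` doubles `m`, which is the 𝑚 ∝ 1/error relation in this setting.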
• Multivariate case
• This logic can be extended to the multivariate case
• A compact set $U \subset \mathbb{R}^d$ can be approx. by a partition of rectangles
• # of nodes ∝ (1/𝛿)^𝑑 (curse of dimension)
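To make the node count concrete, here is a small sketch of a piece-wise constant approximation on a grid partition. It rests on assumptions not stated on the slide: the domain is the cube [0, 1]^d, 𝛿 is the side length of each cell, and each cell uses the target value at its lower corner. The helper name `grid_approx` is hypothetical, and the function evaluates the approximation directly rather than building a network.

```python
import math
import numpy as np

def grid_approx(g, d, delta):
    """Piece-wise constant approximation of g on [0, 1]^d using cubes of side delta."""
    k = math.ceil(1.0 / delta)      # cells per axis
    n_cells = k ** d                # one indicator node per cell

    def f(x):
        x = np.asarray(x, dtype=float)
        idx = np.clip(np.floor(x / delta), 0, k - 1)   # cell index along each axis
        return g(idx * delta)                          # value at the cell's lower corner

    return f, n_cells

# Example target: g(x) = x_1 + ... + x_d, which changes by at most d * delta within a cell.
g = lambda x: x.sum(axis=-1)
rng = np.random.default_rng(0)
d, delta = 3, 0.25
f, n_cells = grid_approx(g, d, delta)
xs = rng.random((2000, d))
max_err = float(np.max(np.abs(f(xs) - g(xs))))

# Number of cells for a fixed delta as the dimension grows.
cells_by_dim = {dim: math.ceil(1.0 / delta) ** dim for dim in (1, 2, 5, 10)}
print(n_cells, max_err, cells_by_dim)
```

With 𝛿 fixed, the cell count is multiplied by 1/𝛿 for every added dimension, which is the exponential growth noted above.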
• Similar to before, a 2-layer MLP can approximate arbitrary 𝑔
• However, the indicator activation $\mathbf{1}_{R_i}$ (indicator of rectangle $R_i$) is an uncommon choice
• Instead, we approximate $\mathbf{1}_{R_i}$ with a 2-layer ReLU composition
⇒ A 3-layer MLP with ReLU activation can approx. arbitrary multivariate 𝑔
• This only guarantees the $L_1$ norm (not the uniform norm)
• Multivariate case
• A 3-layer MLP of ReLU activation can approx. arbitrary multivariate 𝑔
• Proof. The only remaining step is approximating $\mathbf{1}_{R_i}$ with ReLU
• Two steps: an indicator for a 1-dim interval, then an indicator for a 𝑑-dim rectangle
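One common way to carry out these two steps is sketched below. The formulas on the slide were lost in extraction, so this is an assumed construction that may differ in detail: a trapezoid built from four ReLUs approximates the indicator of an interval, and a second ReLU layer combines the 𝑑 per-coordinate trapezoids into an approximate rectangle indicator. The names `soft_interval`, `soft_rectangle` and the ramp width `gamma` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def soft_interval(z, a, b, gamma):
    """Trapezoid from four ReLUs: 1 on [a, b], 0 outside [a - gamma, b + gamma]."""
    return (relu((z - (a - gamma)) / gamma) - relu((z - a) / gamma)
            - relu((z - b) / gamma) + relu((z - (b + gamma)) / gamma))

def soft_rectangle(x, lo, hi, gamma):
    """Approximate indicator of the rectangle prod_j [lo_j, hi_j]; x has shape (..., d)."""
    x = np.asarray(x, dtype=float)
    per_coord = soft_interval(x, lo, hi, gamma)      # first ReLU layer (4 nodes per coordinate)
    d = x.shape[-1]
    return relu(per_coord.sum(axis=-1) - (d - 1))    # second ReLU layer (1 node)

lo = np.array([0.2, 0.2])
hi = np.array([0.6, 0.6])
inside = float(soft_rectangle(np.array([0.4, 0.4]), lo, hi, gamma=0.05))
outside = float(soft_rectangle(np.array([0.9, 0.4]), lo, hi, gamma=0.05))
edge = float(soft_rectangle(np.array([0.175, 0.4]), lo, hi, gamma=0.05))
print(inside, outside, edge)
```

The output matches the true indicator except on a shell of width `gamma` around the rectangle, where it takes values between 0 and 1. The volume of that shell shrinks as `gamma` goes to 0, so the $L_1$ error vanishes, but the pointwise error near the boundary does not, which is why only an $L_1$ guarantee is obtained.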
• (2.1) Constructive proof
• Approximate a univariate function 𝑔: ℝ → ℝ with a 2-layer MLP with an indicator activation (in the uniform norm)
• Approximate a multivariate function $g: \mathbb{R}^d \to \mathbb{R}$ with a 3-layer MLP with a ReLU activation (in the $L_1$ norm)
• Also note that this construction requires exponentially many nodes in the dimension
Summary
Thank you for listening! 😀

More Related Content

What's hot

Introduction to MAML (Model Agnostic Meta Learning) with Discussions
Introduction to MAML (Model Agnostic Meta Learning) with DiscussionsIntroduction to MAML (Model Agnostic Meta Learning) with Discussions
Introduction to MAML (Model Agnostic Meta Learning) with DiscussionsJoonyoung Yi
 
[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning
[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning
[DL輪読会]Randomized Prior Functions for Deep Reinforcement LearningDeep Learning JP
 
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】YutaSuzuki27
 
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...Deep Learning JP
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksJinwon Lee
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide威智 黃
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...Jinwon Lee
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks남주 김
 
Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sangwoo Mo
 
Diffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesisDiffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesisBeerenSahu
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with TransformersToru Tamaki
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 ISungbin Lim
 

What's hot (20)

Introduction to MAML (Model Agnostic Meta Learning) with Discussions
Introduction to MAML (Model Agnostic Meta Learning) with DiscussionsIntroduction to MAML (Model Agnostic Meta Learning) with Discussions
Introduction to MAML (Model Agnostic Meta Learning) with Discussions
 
[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning
[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning
[DL輪読会]Randomized Prior Functions for Deep Reinforcement Learning
 
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】
【学会発表】U-Net++とSE-Netを統合した画像セグメンテーションのための転移学習モデル【IBIS2020】
 
CNN
CNNCNN
CNN
 
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...
[DL輪読会]Encoder-Decoder with Atrous Separable Convolution for Semantic Image S...
 
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear BottlenecksPR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
PR-108: MobileNetV2: Inverted Residuals and Linear Bottlenecks
 
Mobilenetv1 v2 slide
Mobilenetv1 v2 slideMobilenetv1 v2 slide
Mobilenetv1 v2 slide
 
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
PR-330: How To Train Your ViT? Data, Augmentation, and Regularization in Visi...
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)Sharpness-aware minimization (SAM)
Sharpness-aware minimization (SAM)
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
Diffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesisDiffusion models beat gans on image synthesis
Diffusion models beat gans on image synthesis
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Deep ar presentation
Deep ar presentationDeep ar presentation
Deep ar presentation
 
論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
SSII2021 [OS2-03] 自己教師あり学習における対照学習の基礎と応用
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
 
Wasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 IWasserstein GAN 수학 이해하기 I
Wasserstein GAN 수학 이해하기 I
 

Similar to Deep Learning Theory Seminar (Chap 1-2, part 1)

20211019 When does label smoothing help_shared ver
20211019 When does label smoothing help_shared ver20211019 When does label smoothing help_shared ver
20211019 When does label smoothing help_shared verHsing-chuan Hsieh
 
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...Taiji Suzuki
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryAhmed Yousry
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. OverviewOleksandr Baiev
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningSungchul Kim
 
Markov Chain Monte Carlo explained
Markov Chain Monte Carlo explainedMarkov Chain Monte Carlo explained
Markov Chain Monte Carlo explaineddariodigiuni
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with PerformersJoonhyung Lee
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2ananth
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptxasdq4
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Sangwoo Mo
 
generalized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombegeneralized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombeMatt Challacombe
 
Shor's discrete logarithm quantum algorithm for elliptic curves
 Shor's discrete logarithm quantum algorithm for elliptic curves Shor's discrete logarithm quantum algorithm for elliptic curves
Shor's discrete logarithm quantum algorithm for elliptic curvesXequeMateShannon
 
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...Naoki Hayashi
 
머피의 머신러닝: 17장 Markov Chain and HMM
머피의 머신러닝: 17장  Markov Chain and HMM머피의 머신러닝: 17장  Markov Chain and HMM
머피의 머신러닝: 17장 Markov Chain and HMMJungkyu Lee
 

Similar to Deep Learning Theory Seminar (Chap 1-2, part 1) (20)

lecture_09.pptx
lecture_09.pptxlecture_09.pptx
lecture_09.pptx
 
20211019 When does label smoothing help_shared ver
20211019 When does label smoothing help_shared ver20211019 When does label smoothing help_shared ver
20211019 When does label smoothing help_shared ver
 
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...
[NeurIPS2020 (spotlight)] Generalization bound of globally optimal non convex...
 
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousryHands on machine learning with scikit-learn and tensor flow by ahmed yousry
Hands on machine learning with scikit-learn and tensor flow by ahmed yousry
 
Neural Networks. Overview
Neural Networks. OverviewNeural Networks. Overview
Neural Networks. Overview
 
Circuitanlys2
Circuitanlys2Circuitanlys2
Circuitanlys2
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
Markov Chain Monte Carlo explained
Markov Chain Monte Carlo explainedMarkov Chain Monte Carlo explained
Markov Chain Monte Carlo explained
 
CNN for modeling sentence
CNN for modeling sentenceCNN for modeling sentence
CNN for modeling sentence
 
Rethinking Attention with Performers
Rethinking Attention with PerformersRethinking Attention with Performers
Rethinking Attention with Performers
 
L05 language model_part2
L05 language model_part2L05 language model_part2
L05 language model_part2
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptx
 
Ucb2
Ucb2Ucb2
Ucb2
 
Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)Deep Learning Theory Seminar (Chap 3, part 2)
Deep Learning Theory Seminar (Chap 3, part 2)
 
RecursionWeek8.ppt
RecursionWeek8.pptRecursionWeek8.ppt
RecursionWeek8.ppt
 
generalized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombegeneralized_nbody_acs_2015_challacombe
generalized_nbody_acs_2015_challacombe
 
Shor's discrete logarithm quantum algorithm for elliptic curves
 Shor's discrete logarithm quantum algorithm for elliptic curves Shor's discrete logarithm quantum algorithm for elliptic curves
Shor's discrete logarithm quantum algorithm for elliptic curves
 
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
Bayesian Generalization Error and Real Log Canonical Threshold in Non-negativ...
 
머피의 머신러닝: 17장 Markov Chain and HMM
머피의 머신러닝: 17장  Markov Chain and HMM머피의 머신러닝: 17장  Markov Chain and HMM
머피의 머신러닝: 17장 Markov Chain and HMM
 

More from Sangwoo Mo

Brief History of Visual Representation Learning
Brief History of Visual Representation LearningBrief History of Visual Representation Learning
Brief History of Visual Representation LearningSangwoo Mo
 
Learning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataLearning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataSangwoo Mo
 
Hyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningHyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningSangwoo Mo
 
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...Sangwoo Mo
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSangwoo Mo
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video TransformersSangwoo Mo
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksSangwoo Mo
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsSangwoo Mo
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear ComplexitySangwoo Mo
 
Meta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsMeta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsSangwoo Mo
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Sangwoo Mo
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General AudiencesSangwoo Mo
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingSangwoo Mo
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveySangwoo Mo
 
Neural Processes
Neural ProcessesNeural Processes
Neural ProcessesSangwoo Mo
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Sangwoo Mo
 
Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural NetworksSangwoo Mo
 
Emergence of Invariance and Disentangling in Deep Representations
Emergence of Invariance and Disentangling in Deep RepresentationsEmergence of Invariance and Disentangling in Deep Representations
Emergence of Invariance and Disentangling in Deep RepresentationsSangwoo Mo
 
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...Sangwoo Mo
 
Topology for Computing: Homology
Topology for Computing: HomologyTopology for Computing: Homology
Topology for Computing: HomologySangwoo Mo
 

More from Sangwoo Mo (20)

Brief History of Visual Representation Learning
Brief History of Visual Representation LearningBrief History of Visual Representation Learning
Brief History of Visual Representation Learning
 
Learning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated DataLearning Visual Representations from Uncurated Data
Learning Visual Representations from Uncurated Data
 
Hyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement LearningHyperbolic Deep Reinforcement Learning
Hyperbolic Deep Reinforcement Learning
 
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
A Unified Framework for Computer Vision Tasks: (Conditional) Generative Model...
 
Self-supervised Learning Lecture Note
Self-supervised Learning Lecture NoteSelf-supervised Learning Lecture Note
Self-supervised Learning Lecture Note
 
Object-Region Video Transformers
Object-Region Video TransformersObject-Region Video Transformers
Object-Region Video Transformers
 
Deep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural NetworksDeep Implicit Layers: Learning Structured Problems with Neural Networks
Deep Implicit Layers: Learning Structured Problems with Neural Networks
 
Score-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential EquationsScore-Based Generative Modeling through Stochastic Differential Equations
Score-Based Generative Modeling through Stochastic Differential Equations
 
Self-Attention with Linear Complexity
Self-Attention with Linear ComplexitySelf-Attention with Linear Complexity
Self-Attention with Linear Complexity
 
Meta-Learning with Implicit Gradients
Meta-Learning with Implicit GradientsMeta-Learning with Implicit Gradients
Meta-Learning with Implicit Gradients
 
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
Challenging Common Assumptions in the Unsupervised Learning of Disentangled R...
 
Generative Models for General Audiences
Generative Models for General AudiencesGenerative Models for General Audiences
Generative Models for General Audiences
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Domain Transfer and Adaptation Survey
Domain Transfer and Adaptation SurveyDomain Transfer and Adaptation Survey
Domain Transfer and Adaptation Survey
 
Neural Processes
Neural ProcessesNeural Processes
Neural Processes
 
Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)Improved Trainings of Wasserstein GANs (WGAN-GP)
Improved Trainings of Wasserstein GANs (WGAN-GP)
 
Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural Networks
 
Emergence of Invariance and Disentangling in Deep Representations
Emergence of Invariance and Disentangling in Deep RepresentationsEmergence of Invariance and Disentangling in Deep Representations
Emergence of Invariance and Disentangling in Deep Representations
 
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable...
 
Topology for Computing: Homology
Topology for Computing: HomologyTopology for Computing: Homology
Topology for Computing: Homology
 

Recently uploaded

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Deep Learning Theory Seminar (Chap 1-2, part 1)

  • 1. Deep Learning Theory Lecture Note, Chapter 1-2 (part 1). 2022.03.02. KAIST ALIN-LAB, Sangwoo Mo.
  • 2. Overview of the lecture. Setup: consider feedforward networks (often only shallow ones), omitting advanced architectures (e.g., ResNet, RNN, Transformer) for simplicity. Consider supervised learning: given training samples {(x_i, y_i)}, we minimize the empirical risk, while the aim is to minimize the (population) risk. Consider (1-dim) binary classification (y ∈ {+1, −1}) with squared loss, omitting advanced learning schemes (e.g., self-supervised) for simplicity.
  • 3. Overview of the lecture. Topics of deep learning theory: recall that we aim to minimize the risk ℛ(f̂). It can be decomposed into 3 terms (f̄ ∈ ℱ is some reference solution): 1. Approximation: the hypothesis space ℱ is expressive enough, i.e., the risk of the global optimum f̄ is small. 2. Optimization: we can find the (near-)global optimum with SGD, i.e., the empirical risk of the learned model f̂ ≈ that of the global optimum f̄. 3. Generalization: the learned model can also predict unseen samples, i.e., empirical risk ℛ̂ ≈ population risk ℛ. Slide annotation: "Treat them together". Notation: ℱ is the hypothesis space; f ∈ ℱ is a hypothesis (an NN in our case); f̂ ∈ ℱ is the hypothesis that minimizes the empirical risk ℛ̂(f); f̄ ∈ ℱ is the hypothesis that minimizes the population risk ℛ(f).
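One standard way to write the decomposition on this slide is the telescoping identity below. It is a sketch in the slide's notation (f̂ the empirical risk minimizer, f̄ the reference solution); the labels under the braces are an assumed reading of which piece belongs to which topic, not text taken from the slides.

```latex
\begin{aligned}
\mathcal{R}(\hat f)
  &= \underbrace{\mathcal{R}(\hat f) - \widehat{\mathcal{R}}(\hat f)}_{\text{generalization}}
   + \underbrace{\widehat{\mathcal{R}}(\hat f) - \widehat{\mathcal{R}}(\bar f)}_{\text{optimization}} \\
  &\quad
   + \underbrace{\widehat{\mathcal{R}}(\bar f) - \mathcal{R}(\bar f)}_{\text{generalization}}
   + \underbrace{\mathcal{R}(\bar f)}_{\text{approximation}}
\end{aligned}
```

The intermediate terms cancel pairwise, so the identity holds for any f̂ and f̄; the content of the theory lies in bounding each bracket separately.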
  • 4. Chap 1. Approximation. Approximation → bound a function norm. An NN can approximate an arbitrary smooth (Lipschitz) function on a compact domain S (closed and bounded, e.g., [0,1]): ∀g ∈ 𝒞(S) (space of smooth functions), ∃f ∈ ℱ such that ℛ(f) − ℛ(g) < ε. We bound the gap of risks by a function norm, using that the loss ℓ is ρ-Lipschitz. Specifically, consider two function norms: the uniform norm (worst case) and the L₁ norm (average case).
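The step from a risk gap to a function norm can be sketched as follows. This assumes the loss ℓ(·, y) is ρ-Lipschitz in its first argument and that the inputs x are drawn from a distribution μ supported on S; the symbol μ and the exact form of the loss are assumptions made for illustration.

```latex
\begin{aligned}
\bigl|\mathcal{R}(f) - \mathcal{R}(g)\bigr|
  &= \Bigl|\,\mathbb{E}\bigl[\ell(f(x), y) - \ell(g(x), y)\bigr]\Bigr| \\
  &\le \rho\,\mathbb{E}_{x \sim \mu}\bigl|f(x) - g(x)\bigr|
     \;=\; \rho\,\|f - g\|_{L_1(\mu)} \\
  &\le \rho\,\sup_{x \in S}\bigl|f(x) - g(x)\bigr|
     \;=\; \rho\,\|f - g\|_{u}
\end{aligned}
```

The first inequality gives the average-case (L₁) bound and the second the worst-case (uniform) bound, so controlling either norm of f − g controls the gap of risks.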
  • 5. Chap 2. Approximation of finite-width NN. Overview of the chapter: in this chapter, we prove approximation results for finite-width NNs: (2.1) a constructive proof for specific activations; (2.2) universal approximation for general activations. Here, carefully check the assumptions on the activation function σ (e.g., sigmoid, ReLU). Spoiler: Chap 3 defines an NN as an infinite-width NN f = ∫ σ(⋯) (a.k.a. Barron's construction) and samples finitely many nodes to approximate the integral ⇒ the error goes to 0. Chap 4: an infinite-width NN near initialization is analytically represented (a.k.a. NTK), and the corresponding hypothesis space (RKHS) is a universal approximator.
  • 6. Chap 2.1 Constructive proof. Univariate case: a smooth function can be approximated by a piecewise-constant function. Such a piecewise-constant function can therefore approximate an arbitrary smooth function, and it is a 2-layer MLP with an indicator activation 𝟏[x ≥ 0].
  • 7. Chap 2.1 Constructive proof. Univariate case: a smooth function can be approximated by a piecewise-constant function. Such a piecewise-constant function can therefore approximate an arbitrary smooth function, and it is a 2-layer MLP with an indicator activation 𝟏[x ≥ 0]. Key logic: the number of nodes m ∝ 1/error.
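The univariate construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the target g is ρ-Lipschitz on [0, 1] and that the thresholds are placed on an even grid of spacing ε/ρ; the names `step_network` and `sup_error` are hypothetical helpers, not taken from the lecture.

```python
import math

def step_network(g, rho, eps):
    """Return (f, m): f(x) = sum_i a_i * 1[x >= b_i] with m indicator nodes."""
    m = math.ceil(rho / eps)                 # number of hidden nodes, m ∝ 1/eps
    b = [i * eps / rho for i in range(m)]    # thresholds, spaced eps/rho apart
    # a_0 = g(b_0); afterwards each node adds the jump g(b_i) - g(b_{i-1}).
    a = [g(b[0])] + [g(b[i]) - g(b[i - 1]) for i in range(1, m)]

    def f(x):
        # Sum of the weights of all nodes whose indicator fires at x.
        return sum(a_i for a_i, b_i in zip(a, b) if x >= b_i)

    return f, m

def sup_error(f, g, grid=10_000):
    """Largest |f - g| over an evenly spaced grid on [0, 1]."""
    return max(abs(f(k / grid) - g(k / grid)) for k in range(grid + 1))

if __name__ == "__main__":
    for eps in (0.1, 0.01):
        f, m = step_network(math.sin, rho=1.0, eps=eps)   # sin is 1-Lipschitz
        print(f"eps={eps}: m={m}, grid sup error={sup_error(f, math.sin):.4f}")
```

On each interval between consecutive thresholds the network equals g at the left endpoint, so the Lipschitz property keeps the error within ε; shrinking ε by a factor of 10 costs 10 times as many nodes.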
  • 8. Chap 2.1 Constructive proof. Multivariate case: this logic can be extended to multivariate functions. A compact set U ⊂ ℝ^d can be approximated by a partition of rectangles.
  • 9. Chap 2.1 Constructive proof. Multivariate case: this logic can be extended to multivariate functions. A compact set U ⊂ ℝ^d can be approximated by a partition of rectangles. The number of nodes ∝ (1/δ)^d (curse of dimension).
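To get a feel for the (1/δ)^d count, here is a small sketch that tiles the unit cube [0, 1]^d with cubes of side δ. Restricting to the unit cube and to cubic cells is an assumption for illustration, and `num_cells` is a hypothetical helper name.

```python
import math

def num_cells(delta, d):
    """Number of side-delta cubes needed to tile the unit cube [0, 1]^d."""
    return math.ceil(1 / delta) ** d

if __name__ == "__main__":
    # One indicator node per cell, so the node count grows exponentially in d.
    for d in (1, 2, 5, 10):
        print(f"d={d}: {num_cells(0.1, d)} cells")
```

At a fixed resolution δ, each extra input dimension multiplies the number of cells, and hence nodes, by 1/δ.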
  • 10. Chap 2.1 Constructive proof. Multivariate case: this logic can be extended to multivariate functions. A compact set U ⊂ ℝ^d can be approximated by a partition of rectangles. Similar to before, a 2-layer MLP can approximate an arbitrary g. However, the rectangle indicator activation 𝟏_{R_i} is an uncommon choice. Instead, we approximate 𝟏_{R_i} with a 2-layer ReLU composition ⇒ a 3-layer MLP with ReLU activation can approximate an arbitrary multivariate g. Note: this only guarantees approximation in the L₁ norm (not the uniform norm).
  • 11. Chap 2.1 Constructive proof. Multivariate case: a 3-layer MLP with ReLU activation can approximate an arbitrary multivariate g. Proof: the only remaining step is approximating 𝟏_{R_i} with ReLU, first the indicator for a 1-dim interval, then the indicator for a d-dim rectangle.
  • 12. Chap 2.1 Constructive proof. Multivariate case: a 3-layer MLP with ReLU activation can approximate an arbitrary multivariate g. Proof: the only remaining step, and the key logic, is approximating 𝟏_{R_i} with ReLU.
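One common way to build these ReLU indicators is sketched below: four ReLUs form a trapezoid that equals 1 on an interval [a, b], and one further ReLU combines d such trapezoids into a soft rectangle indicator. The margin parameter `gamma` and the function names are assumptions for illustration; the slides' own figure is not available here.

```python
def relu(z):
    return max(0.0, z)

def soft_interval(z, a, b, gamma):
    """Trapezoid from 4 ReLUs: 1 on [a, b], 0 outside [a - gamma, b + gamma]."""
    return (relu((z - (a - gamma)) / gamma) - relu((z - a) / gamma)
            - relu((z - b) / gamma) + relu((z - (b + gamma)) / gamma))

def soft_rectangle(x, lows, highs, gamma):
    """Second ReLU layer: fires only when every coordinate is (nearly) inside."""
    d = len(x)
    total = sum(soft_interval(x_j, a_j, b_j, gamma)
                for x_j, a_j, b_j in zip(x, lows, highs))
    # total == d inside the rectangle; total <= d - 1 once any coordinate
    # lies outside its gamma-enlarged interval, so the output is then 0.
    return relu(total - (d - 1))

if __name__ == "__main__":
    lows, highs, gamma = (0.25, 0.25), (0.75, 0.75), 0.125
    print(soft_rectangle((0.5, 0.5), lows, highs, gamma))   # inside
    print(soft_rectangle((0.5, 1.0), lows, highs, gamma))   # outside
```

The soft indicator differs from the true one only on a margin of width `gamma` around the rectangle, whose volume shrinks as `gamma` goes to 0. The pointwise error on that margin does not shrink, which is consistent with the construction giving an L₁ guarantee rather than a uniform one.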
  • 13. Summary. (2.1) Constructive proof: approximate a univariate function g: ℝ → ℝ with a 2-layer MLP with an indicator activation (in the uniform norm); approximate a multivariate function g: ℝ^d → ℝ with a 3-layer MLP with ReLU activation (in the L₁ norm). Also note that this construction requires exponentially many nodes in the dimension.
  • 14. Thank you for listening! 😀