Deep Learning Theory Lecture Note
Chapter 3 (part 2)
2022.04.06.
KAIST ALIN-LAB
Sangwoo Mo
(3.3) How to sample finite networks?

• Maurey sampling technique
• Let 𝑋 = 𝔼𝑉 for a random variable 𝑉 supported on a set 𝑆, where 𝑉 lives in a Hilbert space (i.e., has an inner product).
• The finite-sample approximation of 𝑋 is 𝑋̂ = (1/𝑘) ∑ᵢ₌₁ᵏ 𝑉ᵢ, where 𝑉₁, …, 𝑉ₖ are sampled i.i.d. from 𝑝(𝑉).
• Here, 𝑋̂ ≈ 𝑋 as 𝑘 → ∞ (precisely, 𝔼‖𝑋 − 𝑋̂‖² = 𝑂(1/𝑘)).
• It is very intuitive… let's prove it! (A quick numerical sanity check follows below.)
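Before the proof, here is a minimal numerical sketch of the 1/𝑘 rate. Everything below is an illustrative assumption, not part of the lecture note: the Hilbert space is ℝᵈ with 𝑑 = 10, 𝑉 is 𝑋 plus Gaussian noise, and all variable names are made up.

import numpy as np

rng = np.random.default_rng(0)
d = 10                                   # use R^d as the Hilbert space
X = rng.normal(size=d)                   # X = E[V]
sigma = 0.5                              # V = X + noise, so E||V - X||^2 = d * sigma^2

def sample_V(n):
    return X + sigma * rng.normal(size=(n, d))

for k in [1, 10, 100, 1000]:
    trials = 2000
    errs = [np.sum((sample_V(k).mean(axis=0) - X) ** 2) for _ in range(trials)]
    # Empirical E||X - Xhat||^2 should match (E||V - X||^2) / k = d * sigma^2 / k
    print(f"k={k:4d}   E||X - Xhat||^2 ~ {np.mean(errs):.5f}   d*sigma^2/k = {d * sigma**2 / k:.5f}")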
• Maurey sampling technique
• Formal statement (Lemma 3.1): 𝔼‖𝑋 − 𝑋̂‖² ≤ 𝔼‖𝑉‖²/𝑘 = 𝑂(1/𝑘), which goes to zero as 𝑘 → ∞.
• Proof strategy (a toy demonstration follows below):
  (1) We bound the 𝔼 form.
  (2) If the 𝔼 over 𝑉₁, …, 𝑉ₖ equals some value 𝐾, then there exist 𝑈₁, …, 𝑈ₖ with value ≤ 𝐾
      (we need only one realization 𝑈₁, …, 𝑈ₖ that satisfies (1/𝑘) ∑ᵢ 𝑈ᵢ ≈ 𝑋).
  This technique is called the "probabilistic method"!
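A toy demonstration of step (2), the probabilistic method. This is a sketch with illustrative names and distributions (not the note's construction): among many sampled tuples (𝑉₁, …, 𝑉ₖ), the best one cannot exceed their average error, so some concrete realization (𝑈₁, …, 𝑈ₖ) achieves squared error ≤ 𝐾.

import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 10, 50, 500
X = rng.normal(size=d)                                   # X = E[V]

# 'trials' independent draws of the tuple (V_1, ..., V_k)
draws = X + 0.5 * rng.normal(size=(trials, k, d))
errs = np.sum((draws.mean(axis=1) - X) ** 2, axis=1)     # ||X - (1/k) sum_i V_i||^2 per tuple

K = errs.mean()            # empirical stand-in for the expectation over (V_1, ..., V_k)
best = errs.min()          # the error of one concrete tuple (U_1, ..., U_k)
print(f"average error K ~ {K:.5f}, best single realization ~ {best:.5f}")
assert best <= K           # the minimum over the draws cannot exceed their average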
• Maurey sampling technique
• Proof of Lemma 3.1 (assembled into a full bound below)
• Cross terms (𝑖 ≠ 𝑗): 𝔼⟨𝑉ᵢ − 𝑋, 𝑉ⱼ − 𝑋⟩ = ⟨𝔼𝑉ᵢ, 𝔼𝑉ⱼ⟩ − ⟨𝔼𝑉ᵢ, 𝑋⟩ − ⟨𝔼𝑉ⱼ, 𝑋⟩ + ‖𝑋‖²
  = ‖𝑋‖² − ‖𝑋‖² − ‖𝑋‖² + ‖𝑋‖² = 0
• Diagonal terms: 𝔼‖𝑉 − 𝑋‖² = 𝔼‖𝑉‖² − 2𝔼⟨𝑉, 𝑋⟩ + ‖𝑋‖² = 𝔼‖𝑉‖² − 2‖𝑋‖² + ‖𝑋‖² = 𝔼‖𝑉‖² − ‖𝑋‖²
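Assembling the two identities above (a LaTeX sketch in the notation of Lemma 3.1): the 𝑘(𝑘−1) cross terms vanish and each of the 𝑘 diagonal terms equals 𝔼‖𝑉‖² − ‖𝑋‖², so

\[
\mathbb{E}\,\bigl\|\hat{X} - X\bigr\|^2
  = \mathbb{E}\,\Bigl\|\tfrac{1}{k}\sum_{i=1}^{k}(V_i - X)\Bigr\|^2
  = \frac{1}{k^2}\sum_{i,j=1}^{k}\mathbb{E}\,\langle V_i - X,\; V_j - X\rangle
  = \frac{\mathbb{E}\|V\|^2 - \|X\|^2}{k}
  \;\le\; \frac{\mathbb{E}\|V\|^2}{k}
  = O(1/k).
\]

Choosing a realization no worse than this expectation (the probabilistic-method step above) then yields fixed 𝑈₁, …, 𝑈ₖ with the same guarantee.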
• Sampling finite-width network
• Lemma 3.1 assumes that 𝑝(𝑉) is a probability distribution: (1) nonnegative, (2) sums to 1.
• However, the "weight distribution" of our infinite-width NN is not a probability distribution!
• Our infinite-width NN: the weight distribution of (𝑤, 𝑏) is sin(⋯), which (1) can be negative and (2) does not sum to 1.
• Q. How can we extend Maurey sampling to a general weight distribution?
• Sampling finite-width network
• For simplicity, let the infinite-width NN be 𝑓(𝑥) = ∫ 𝑔(𝑥; 𝑤) d𝜇(𝑤), where 𝜇 is a signed measure over weight vectors 𝑤.
• 𝑔 is some abstract node, e.g., 𝑔(𝑥; 𝑤) = 𝜎(𝑎⊺𝑥 + 𝑏) for 𝑤 = {𝑎, 𝑏}.
• To convert the general signed measure 𝜇 into a probability measure:
  1) Introduce a sign parameter 𝑠 ∈ {±1} and consider the nonnegative measures 𝜇±.
     • For 𝜇 = 𝜇₊ − 𝜇₋, both 𝜇₊ and 𝜇₋ are nonnegative (Jordan decomposition).
     • Multiply by 𝑠 = +1 on the 𝜇₊ region and by 𝑠 = −1 on the 𝜇₋ region (so Pr(𝑠 = +1) = ‖𝜇₊‖/‖𝜇‖).
  2) Normalize the nonnegative measure to (𝜇₊ + 𝜇₋)/‖𝜇‖ so that its total mass is 1.
     • Multiply the output 𝑔(𝑥; 𝑤, 𝑠) by the normalizing constant ‖𝜇‖ to compensate.
• After this conversion, Maurey sampling extends to a general signed measure, controlling ‖Infinite NN − Finite NN‖ (see the sketch below).
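A minimal sketch of this conversion for a signed measure discretized on a finite grid of weights. The grid, the ReLU node 𝑔, the sine-shaped masses, and every name below are illustrative assumptions, not the note's construction.

import numpy as np

rng = np.random.default_rng(0)

# A signed measure mu, discretized on a grid of weight vectors w = (a, b).
W = rng.normal(size=(200, 2))                 # grid of weights
mu = np.sin(3.0 * W[:, 0])                    # signed masses (can be negative)

def g(x, w):
    a, b = w                                  # abstract node: g(x; w) = relu(a*x + b), scalar x
    return np.maximum(a * x + b, 0.0)

mu_norm = np.abs(mu).sum()                    # total mass ||mu|| = ||mu_+|| + ||mu_-||
probs = np.abs(mu) / mu_norm                  # probability over grid points, (mu_+ + mu_-)/||mu||
signs = np.sign(mu)                           # s = +1 on the mu_+ region, -1 on the mu_- region

def sample_finite_net(k):
    idx = rng.choice(len(W), size=k, p=probs)     # (w_i, s_i) sampled i.i.d.
    # f_hat(x) = (1/k) sum_i ||mu|| * s_i * g(x; w_i)   (Maurey average of the rescaled nodes)
    return lambda x: mu_norm * np.mean([signs[i] * g(x, W[i]) for i in idx], axis=0)

def f_ref(x):
    # dense reference playing the role of the "infinite-width" NN on this grid
    return sum(m * g(x, w) for m, w in zip(mu, W))

xs = np.linspace(-1.0, 1.0, 50)
for k in [10, 100, 1000]:
    f_hat = sample_finite_net(k)
    print(f"k={k:5d}   mean squared gap = {np.mean((f_ref(xs) - f_hat(xs)) ** 2):.5f}")

Each sample is unbiased for the reference network, so as 𝑘 grows the squared gap decays roughly at the 1/𝑘 rate of Lemma 3.1; this gap is the ‖Infinite NN − Finite NN‖ quantity referred to above.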
• Sampling finite-width network
• Applying Maurey sampling, the approximation error of the finite-width NN is bounded as follows (generic form sketched below):
  • (3.1) Univariate case: approx. error ≤ …
  • (3.2) Barron's construction: approx. error ≤ …
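Both cases instantiate the same generic form. A sketch, writing 𝑓_𝜇(𝑥) = ∫ 𝑔(𝑥; 𝑤) d𝜇(𝑤) for the infinite-width network and applying Lemma 3.1 to 𝑉 = ‖𝜇‖ · 𝑠 · 𝑔(·; 𝑤) with (𝑤ᵢ, 𝑠ᵢ) drawn from the normalized measure above; the choice of Hilbert norm (e.g., 𝐿² over the input distribution) is an assumption here:

\[
\mathbb{E}\,\Bigl\| f_\mu - \frac{1}{k}\sum_{i=1}^{k} \|\mu\|\, s_i\, g(\cdot\,; w_i) \Bigr\|^2
  \;\le\; \frac{\mathbb{E}\,\bigl\|\, \|\mu\|\, s\, g(\cdot\,; w) \bigr\|^2}{k}
  \;\le\; \frac{\|\mu\|^2 \, \sup_{w} \|g(\cdot\,; w)\|^2}{k}.
\]

The constructions of (3.1) and (3.2) differ in which measure 𝜇 and which node 𝑔 they use, and hence in the resulting ‖𝜇‖ factor and node bound.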
Thank you for listening! 😀