"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Deep Learning Theory Seminar (Chap 3, part 2)
1. Deep Learning Theory Lecture Note
Chapter 3 (part 2)
2022.04.06.
KAIST ALIN-LAB
Sangwoo Mo
2. • Maurey sampling technique
• Let 𝑋 = 𝔼𝑉 for a random variable 𝑉 supported on a set 𝑆
• The finite-sample approximation of 𝑋 is 𝑋̂ = (1/𝑘) ∑ᵢ₌₁ᵏ 𝑉ᵢ, for 𝑉ᵢ iid sampled from 𝑝(𝑉)
• Here, 𝑋̂ ≈ 𝑋 as 𝑘 → ∞ (precisely, 𝔼‖𝑋 − 𝑋̂‖² = 𝑂(1/𝑘))
• It is very intuitive… let’s prove it!
(3.3) How to sample finite networks?
𝑉 is on a Hilbert space (i.e., has an inner product)
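The 𝑂(1/𝑘) rate is easy to check numerically. A minimal sketch (my own toy setup, not from the slides): take 𝑉 uniform over a finite set 𝑆 ⊂ ℝᵈ, so 𝑘 ⋅ 𝔼‖𝑋 − 𝑋̂‖² should stay roughly constant as 𝑘 grows.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
S = rng.normal(size=(50, d))   # finite support set S in R^d (toy Hilbert space)
X = S.mean(axis=0)             # X = E[V] under the uniform distribution p(V)

def mse(k, trials=2000):
    """Monte Carlo estimate of E||X - X_hat||^2 for X_hat = (1/k) sum_i V_i."""
    idx = rng.integers(0, len(S), size=(trials, k))   # k iid draws per trial
    X_hat = S[idx].mean(axis=1)                       # (trials, d) sample means
    return ((X_hat - X) ** 2).sum(axis=1).mean()

for k in (10, 40, 160):
    # k * E||X - X_hat||^2 should hover around E||V||^2 - ||X||^2
    print(k, round(k * mse(k), 3))
```

In fact the rate is exact here: 𝔼‖𝑋 − 𝑋̂‖² = (𝔼‖𝑉‖² − ‖𝑋‖²)/𝑘, so the printed products agree up to Monte Carlo noise.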
4. • Maurey sampling technique
• Formal statement
𝔼‖𝑋 − 𝑋̂‖² = 𝑂(1/𝑘), which goes to zero as 𝑘 → ∞
(1) We bound the 𝔼 form
(2) If the 𝔼 over 𝑉₁, … , 𝑉ₖ is some value 𝐾, then there exists some realization 𝑈₁, … , 𝑈ₖ with value ≤ 𝐾
(we need only one realization of 𝑈₁, … , 𝑈ₖ that satisfies (1/𝑘) ∑ᵢ 𝑈ᵢ ≈ 𝑋)
This technique is called the “probabilistic method”!
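Step (1) is a short variance computation in the Hilbert-space norm. The following derivation is my reconstruction (consistent with the 𝑂(1/𝑘) rate above), not copied from the slides:

```latex
\mathbb{E}\,\Bigl\| X - \tfrac{1}{k}\sum_{i=1}^{k} V_i \Bigr\|^2
  = \frac{1}{k^2} \sum_{i=1}^{k} \mathbb{E}\,\| V_i - X \|^2
  = \frac{\mathbb{E}\|V\|^2 - \|X\|^2}{k}
  \le \frac{\sup_{v \in S} \|v\|^2}{k},
```

since the cross terms 𝔼⟨𝑉ᵢ − 𝑋, 𝑉ⱼ − 𝑋⟩ vanish for 𝑖 ≠ 𝑗 by independence and 𝔼𝑉ᵢ = 𝑋. Step (2) then picks one realization 𝑈₁, … , 𝑈ₖ whose value is at most this expectation.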
6. • Sampling finite-width network
• Lemma 3.1 assumes that 𝑝(𝑉) is a probability distribution – (1) nonnegative, (2) sums to 1
• However, the “weight distribution” of our infinite-width NN is not a probability distribution!
• Our infinite-width NN
• The weight distribution of (𝑤, 𝑏) is sin ⋯ – (1) it can be negative, (2) its sum is not 1
• Q. How to extend Maurey sampling for general weight distribution?
7. • Sampling finite-width network
• For simplicity, let the infinite-width NN be 𝑓(𝑥) = ∫ 𝑔(𝑥; 𝑤) 𝑑𝜇(𝑤), where 𝜇 is a signed measure over weight vectors 𝑤 ∈ ℝᵈ
• 𝑔 is some abstract node (e.g., 𝑔(𝑥; 𝑤) = 𝜎(𝑎⊺𝑥 + 𝑏) for 𝑤 = {𝑎, 𝑏})
• To convert the general signed measure 𝜇 to a probability measure,
1) Introduce a sign parameter 𝑠 ∈ {±1} and consider the nonnegative measures 𝜇±
• For 𝜇 = 𝜇₊ − 𝜇₋, both 𝜇₊ and 𝜇₋ are nonnegative (Jordan decomposition)
• Attach 𝑠 = +1 to the 𝜇₊ region and 𝑠 = −1 to the 𝜇₋ region (Pr(𝑠 = +1) = ‖𝜇₊‖/‖𝜇‖)
2) Normalize the nonnegative measure to 𝜇/‖𝜇‖ so that it sums to 1
• Multiply the output 𝑔(𝑥; 𝑤, 𝑠) by the normalizing constant ‖𝜇‖
• After this conversion, Maurey sampling extends to a general signed measure
‖ Infinite NN – Finite NN ‖
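The two-step conversion above can be sketched for a discrete signed measure. Everything here (the toy 𝜇, the ReLU node 𝑔, the dimensions) is illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete signed measure: mass mu[j] (possibly negative) on weight W[j]
W = rng.normal(size=(20, 4))        # rows w = (a, b): a in R^3, bias b
mu = rng.normal(size=20)            # signed masses

def g(x, w):
    """Abstract node g(x; w) = sigma(a^T x + b) with ReLU sigma."""
    a, b = w[:-1], w[-1]
    return max(a @ x + b, 0.0)

# 1) Jordan decomposition mu = mu_plus - mu_minus, both nonnegative;
#    attach sign s = +1 to mu_plus atoms and s = -1 to mu_minus atoms
mu_plus, mu_minus = np.maximum(mu, 0.0), np.maximum(-mu, 0.0)
norm = mu_plus.sum() + mu_minus.sum()            # ||mu|| (total variation)

# 2) Normalize to a probability over the 2 * len(W) signed atoms (w, s)
probs = np.concatenate([mu_plus, mu_minus]) / norm
signs = np.concatenate([np.ones(20), -np.ones(20)])

def f_infinite(x):
    """'Infinite-width' NN: f(x) = sum_j mu[j] * g(x; W[j])."""
    return sum(m * g(x, w) for m, w in zip(mu, W))

def f_sampled(x, k):
    """Maurey estimate: average of k samples of ||mu|| * s * g(x; w)."""
    idx = rng.choice(len(probs), size=k, p=probs)
    return norm * np.mean([signs[i] * g(x, W[i % len(W)]) for i in idx])

x = rng.normal(size=3)
```

Each sample has expectation ∑ⱼ (𝜇₊,ⱼ − 𝜇₋,ⱼ) 𝑔(𝑥; 𝑤ⱼ) = 𝑓(𝑥), so `f_sampled(x, k)` approaches `f_infinite(x)` as 𝑘 grows, at the same 𝑂(1/𝑘) mean-squared rate as before.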
8. • Sampling finite-width network
• Applying Maurey sampling, the approx. error of the finite-width NN is bounded in both settings:
• (3.1) Univariate case: approx. error ≤ ⋯
• (3.2) Barron’s construction: approx. error ≤ ⋯