SlideShare a Scribd company logo
1 of 21
Download to read offline
Hikaru Ibayashi@CSCI699 (Feb., 15th, 2023)
Rethinking classic learning theory
in deep neural networks
Today’s papers
• Understanding deep learning requires rethinking generalization
• An experiment suggests we need to “rethink” classic learning theory
• Uniform convergence may be unable to explain generalization in deep learning
• Proposed a learning task where classic learning theory provably fails
Understanding deep learning requires
rethinking generalization [2]
Notations for Classification Tasks
• : Input
• : Label
• :
Training set
• : Hypothesis class
• : Learning algorithm
x ∈
𝒳
y ∈
𝒴
S = ((x1, y1), …, (xm, ym))
ℋ
𝒜
(S → ℋ)
• : Train loss
• : Test loss
• : Generalization error
ℒS(h) =
1
m
m
∑
i=1
ℓ (h (xi), yi)
ℒ
𝒟
(h) = E(x,y)∼
𝒟
[ℓ(h(x), y)]
Δh = ℒ
𝒟
(h) − ℒS(h)
Can we give a bound?
Complexity measures and generalization error bound
• Complexity measures
• Measures of how complex a hypothesis class is
• The less complex, the more generalizing
• VC dimension: Roughly corresponds to the # of parameters
• Bound: with probability (Theorem 6.11 in [1])
• Rademacher complexity: How wrong a hypothesis can be
•
• Bound: (Theorem 26.3 in [1])
Δh ≤
1
δ
2VC(ℋ)
m
1 − δ
RC(ℓ ∘ ℋ, S) =
1
m
E
σ∼{±1}m
[
sup
h∈ℋ
m
∑
i=1
σi ℓ (h (xi), yi)]
Δh ≤ 2 E
S∼Dm
[RC(ℓ ∘ ℋ, S)]
When it comes to DNN,
we need to “rethink” those
Randomization test
horse bird truck
CIFAR-10
horse bird
truck
CIFAR-10 with random labels
😎
• Zero training error
• Strong generalization
🤪
• Zero training error
• No generalization
• VC dimension
• Can’t distinguish 😎 and 🤪 because they both have
• Rademacher complexity
• Recall that Rademacher complexity means how wrong can be
• Randomization test showed can be super wrong, such as 🤪
•
•
• Because , this bound is vacuous
VC(ℋ)
h
h
RC(ℓ ∘ ℋ, S) =
1
m
E
σ∼{±1}m
[
sup
h∈ℋ
m
∑
i=1
σi ℓ (h (xi), yi)]
= 1
Δh ≤ 2 E
S∼Dm
[RC(ℓ ∘ ℋ, S)] = 2 ⋅ 1
−1 ≤ Δh ≤ 1
horse bird
truck
🤪
Failure of classic complexity measures
Failure of classic complexity measures
• Generalization error bound via VC dimension
• AlexNet trained with the CIFAR-10 dataset
• ~ # of parameters = 62,000,000
•
• let
•
VC(ℋ)
m = 50,000
δ = 0.01
Δh ≤
1
0.01
2 × 62,000,000
50,000
Role of regularization
• It’s common to use regularization for generalization
• Data Augmentation, Weight decay, Dropout
• But they are neither necessary nor su
ffi
cient
• Also, there are tricks (not for generalization)
that have implicit regularization e
ff
ects
• early stopping, batch normalization
• There could be some unidenti
fi
ed implicit
regularization e
ff
ect of
→ Invoked many research attempts trying to
identify that
𝒜
Strong generalization
without weight decan and dropout
: Hypothesis class returned by
ℋ
𝒜
𝒜
Uniform convergence may be
unable to explain generalization in deep learning [3]
Key claim
• Classic generalization error bounds are all “uniform convergence bounds”
• The
fi
rst paper, “The entire is too big. Think about an algorithm-dependent ,
otherwise, you will only get vacuous bounds”
• This paper, “Even if it’s the tightest possible algorithm-dependent one, uniform
convergence bound is vacuous”
• There is a learning task where can
fi
nd generalizing ,
but the uniform convergence bound is vacuous.
ℋ ℋ
𝒜
𝒜
h
Tightest algorithm-dependent uniform convergence bound
S = ((x1, y1), (x2, y2), …(xm, ym))
Training dataset
(deterministic)
𝒜
S′

= ((x′

1, y′

1), (x′

2, y′

2), …(x′

m, y′

m))
S′

′

= ((x′

′

1, y′

′

1), (x′

′

2, y′

′

2), …(x′

′

m, y′

′

m))
: Entire hypothesis class
ℋ
Tightest algorithm-dependent uniform convergence bound
: Entire hypothesis class
ℋ
Support of training dataset
Sδ
ℋδ
𝒜
Tightest algorithm-dependent uniform convergence bound
: Entire hypothesis class
ℋ
Support of training dataset
Sδ
ℋδ
𝒜
Largest possible generalization error in ℋδ
• :
• is small constant, is large
•
• Given a
fi
xed vector s.t.
, ,
• : ,
• : Gradient descent
x ∈
𝒳
x = (x1, x2) (x1 ∈ ℝK
, x2 ∈ ℝD
)
K D
y ∈
𝒴
= {−1, + 1}
u ∥u∥2 =
1
m
x1 = 2 ⋅ y ⋅ u x2 ∼
𝒩
(
0,
32
D
I
)
h ∈ ℋ w = (w1, w2) ∈ ℝK+D
(hw(x) = w1x1 + w2x2)
𝒜
A setup where UC bound provably fails
• ℒ(γ)
(y′

, y) =
1 yy′

≤ 0
1 −
yy′

γ
yy′

∈ (0,γ)
0 yy′

≥ γ
x1 x2
Low-dimensional signal
High dimensional noise
Main theorem
1. Gradient descent can
fi
nd a generalizing
2. But the “tightest algorithm-dependent uniform convergence bound” is vacuous
h
Proof (Intuition)
• always comes with some
• But for any , you can
fi
nd the following
1.
2. , where
3. has generalization error less than
4. completely misclassi
fi
es
ϵunif-alg Sδ
Sδ S*
S* ∈ Sδ
S′

* ∈ Sδ S′

* = {((x1, − x2), y) ∣ ((x1, x2), y) ∈ S*}
hS*
ϵ
hS*
S′

*
Support of training dataset
Sδ
S* S′

*
• : Two-layer ReLU networks (100k hidden units)
• : stochastic gradient descent
• : hypersphere surface with radius 1 and 1.1
(of 1000-dimensional)
• :
• Generate by
fl
ipping the radius
• Notice and
ℋ
𝒜
x ∈
𝒳
y ∈
𝒴
{−1, + 1}
S′

S ∼
𝒟
S′

∼
𝒟
An experiment where UC bound fails when
x y = + 1
when
x y = − 1
Original S S′

generalizes well but performs poorly on
hS S′

Because of this extra margin,
misclassi
fi
es
hS S′

Deep learning conjecture
• Over-parameterized deep networks mainly behave like a very simple model
(such as a linear model) and roughly
fi
t the training data
• Plenty of parameters are unused but some of them learn “unnecessary
knowledge” from training data
• Such “unnecessary knowledge” does not a
ff
ect generalization performance
• Example: Even if I have knowledge that “the earth is
fl
at,” I can have normal
conversations in 99% of my daily life and few people think I’m strange.
• However, we can always
fi
nd a dataset where such unnecessary knowledge
seriously a
ff
ect the performance, which establishes a loose uniform
convergence bound.
My understanding: Should we focus on this?
References
1. Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From
theory to algorithms. Cambridge university press, (2014)
2. Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking
generalization." arXiv preprint arXiv:1611.03530 (2016).
3. Nagarajan, Vaishnavh, and J. Zico Kolter. "Uniform convergence may be unable to
explain generalization in deep learning." Advances in Neural Information
Processing Systems 32 (2019).

More Related Content

Similar to Rethinking of Generalization

Unit-II -Soft Computing.pdf
Unit-II -Soft Computing.pdfUnit-II -Soft Computing.pdf
Unit-II -Soft Computing.pdfRamya Nellutla
 
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptxsaadurrehman35
 
Fuzzy Logic.pptx
Fuzzy Logic.pptxFuzzy Logic.pptx
Fuzzy Logic.pptxImXaib
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelineChenYiHuang5
 
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGmohanapriyastp
 
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdfMcSwathi
 
Variational inference
Variational inference  Variational inference
Variational inference Natan Katz
 
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Universitat Politècnica de Catalunya
 
GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesNatan Katz
 
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2tingyuansenastro
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function홍배 김
 
Fuzzy Set Theory
Fuzzy Set TheoryFuzzy Set Theory
Fuzzy Set TheoryAMIT KUMAR
 
Supervised Prediction of Graph Summaries
Supervised Prediction of Graph SummariesSupervised Prediction of Graph Summaries
Supervised Prediction of Graph SummariesDaniil Mirylenka
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Pooyan Jamshidi
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validationgmorishita
 
Fuzzy logic and fuzzy time series edited
Fuzzy logic and fuzzy time series   editedFuzzy logic and fuzzy time series   edited
Fuzzy logic and fuzzy time series editedProf Dr S.M.Aqil Burney
 

Similar to Rethinking of Generalization (20)

Unit-II -Soft Computing.pdf
Unit-II -Soft Computing.pdfUnit-II -Soft Computing.pdf
Unit-II -Soft Computing.pdf
 
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx
53158699-d7c5-4e6e-af19-1f642992cc58-161011142651.pptx
 
Fuzzy Logic.pptx
Fuzzy Logic.pptxFuzzy Logic.pptx
Fuzzy Logic.pptx
 
Paper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipelinePaper Study: Melding the data decision pipeline
Paper Study: Melding the data decision pipeline
 
support vector machine
support vector machinesupport vector machine
support vector machine
 
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
 
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNINGARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
ARTIFICIAL-NEURAL-NETWORKMACHINELEARNING
 
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
15_wk4_unsupervised-learning_manifold-EM-cs365-2014.pdf
 
Lec10.pptx
Lec10.pptxLec10.pptx
Lec10.pptx
 
Variational inference
Variational inference  Variational inference
Variational inference
 
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
 
GAN for Bayesian Inference objectives
GAN for Bayesian Inference objectivesGAN for Bayesian Inference objectives
GAN for Bayesian Inference objectives
 
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2
ANU ASTR 4004 / 8004 Astronomical Computing : Lecture 2
 
The world of loss function
The world of loss functionThe world of loss function
The world of loss function
 
Fuzzy Set Theory
Fuzzy Set TheoryFuzzy Set Theory
Fuzzy Set Theory
 
Supervised Prediction of Graph Summaries
Supervised Prediction of Graph SummariesSupervised Prediction of Graph Summaries
Supervised Prediction of Graph Summaries
 
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
Ensembles of Many Diverse Weak Defenses can be Strong: Defending Deep Neural ...
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 
Fuzzy logic and fuzzy time series edited
Fuzzy logic and fuzzy time series   editedFuzzy logic and fuzzy time series   edited
Fuzzy logic and fuzzy time series edited
 
ann-ics320Part4.ppt
ann-ics320Part4.pptann-ics320Part4.ppt
ann-ics320Part4.ppt
 

Recently uploaded

办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 

Recently uploaded (20)

办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 

Rethinking of Generalization

  • 1. Hikaru Ibayashi@CSCI699 (Feb., 15th, 2023) Rethinking classic learning theory in deep neural networks
  • 2. Today’s papers • Understanding deep learning requires rethinking generalization • An experiment suggests we need to “rethink” classic learning theory • Uniform convergence may be unable to explain generalization in deep learning • Proposed a learning task where classic learning theory provably fails
  • 3. Understanding deep learning requires rethinking generalization [2]
  • 4. Notations for Classification Tasks • : Input • : Label • : Training set • : Hypothesis class • : Learning algorithm x ∈ 𝒳 y ∈ 𝒴 S = ((x1, y1), …, (xm, ym)) ℋ 𝒜 (S → ℋ) • : Train loss • : Test loss • : Generalization error ℒS(h) = 1 m m ∑ i=1 ℓ (h (xi), yi) ℒ 𝒟 (h) = E(x,y)∼ 𝒟 [ℓ(h(x), y)] Δh = ℒ 𝒟 (h) − ℒS(h) Can we give a bound?
  • 5. Complexity measures and generalization error bound • Complexity measures • Measures of how complex a hypothesis class is • The less complex, the more generalizing • VC dimension: Roughly corresponds to the # of parameters • Bound: with probability (Theorem 6.11 in [1]) • Rademacher complexity: How wrong a hypothesis can be • • Bound: (Theorem 26.3 in [1]) Δh ≤ 1 δ 2VC(ℋ) m 1 − δ RC(ℓ ∘ ℋ, S) = 1 m E σ∼{±1}m [ sup h∈ℋ m ∑ i=1 σi ℓ (h (xi), yi)] Δh ≤ 2 E S∼Dm [RC(ℓ ∘ ℋ, S)] When it comes to DNN, we need to “rethink” those
  • 6. Randomization test horse bird truck CIFAR-10 horse bird truck CIFAR-10 with random labels 😎 • Zero training error • Strong generalization 🤪 • Zero training error • No generalization
  • 7. • VC dimension • Can’t distinguish 😎 and 🤪 because they both have • Rademacher complexity • Recall that Rademacher complexity means how wrong can be • Randomization test showed can be super wrong, such as 🤪 • • • Because , this bound is vacuous VC(ℋ) h h RC(ℓ ∘ ℋ, S) = 1 m E σ∼{±1}m [ sup h∈ℋ m ∑ i=1 σi ℓ (h (xi), yi)] = 1 Δh ≤ 2 E S∼Dm [RC(ℓ ∘ ℋ, S)] = 2 ⋅ 1 −1 ≤ Δh ≤ 1 horse bird truck 🤪 Failure of classic complexity measures
  • 8. Failure of classic complexity measures • Generalization error bound via VC dimension • AlexNet trained with the CIFAR-10 dataset • ~ # of parameters = 62,000,000 • • let • VC(ℋ) m = 50,000 δ = 0.01 Δh ≤ 1 0.01 2 × 62,000,000 50,000
  • 9. Role of regularization • It’s common to use regularization for generalization • Data Augmentation, Weight decay, Dropout • But they are neither necessary nor su ffi cient • Also, there are tricks (not for generalization) that have implicit regularization e ff ects • early stopping, batch normalization • There could be some unidenti fi ed implicit regularization e ff ect of → Invoked many research attempts trying to identify that 𝒜 Strong generalization without weight decan and dropout : Hypothesis class returned by ℋ 𝒜 𝒜
  • 10. Uniform convergence may be unable to explain generalization in deep learning [3]
  • 11. Key claim • Classic generalization error bounds are all “uniform convergence bounds” • The fi rst paper, “The entire is too big. Think about an algorithm-dependent , otherwise, you will only get vacuous bounds” • This paper, “Even if it’s the tightest possible algorithm-dependent one, uniform convergence bound is vacuous” • There is a learning task where can fi nd generalizing , but the uniform convergence bound is vacuous. ℋ ℋ 𝒜 𝒜 h
  • 12. Tightest algorithm-dependent uniform convergence bound S = ((x1, y1), (x2, y2), …(xm, ym)) Training dataset (deterministic) 𝒜 S′  = ((x′  1, y′  1), (x′  2, y′  2), …(x′  m, y′  m)) S′  ′  = ((x′  ′  1, y′  ′  1), (x′  ′  2, y′  ′  2), …(x′  ′  m, y′  ′  m)) : Entire hypothesis class ℋ
  • 13. Tightest algorithm-dependent uniform convergence bound : Entire hypothesis class ℋ Support of training dataset Sδ ℋδ 𝒜
  • 14. Tightest algorithm-dependent uniform convergence bound : Entire hypothesis class ℋ Support of training dataset Sδ ℋδ 𝒜 Largest possible generalization error in ℋδ
  • 15. • : • is small constant, is large • • Given a fi xed vector s.t. , , • : , • : Gradient descent x ∈ 𝒳 x = (x1, x2) (x1 ∈ ℝK , x2 ∈ ℝD ) K D y ∈ 𝒴 = {−1, + 1} u ∥u∥2 = 1 m x1 = 2 ⋅ y ⋅ u x2 ∼ 𝒩 ( 0, 32 D I ) h ∈ ℋ w = (w1, w2) ∈ ℝK+D (hw(x) = w1x1 + w2x2) 𝒜 A setup where UC bound provably fails • ℒ(γ) (y′  , y) = 1 yy′  ≤ 0 1 − yy′  γ yy′  ∈ (0,γ) 0 yy′  ≥ γ x1 x2 Low-dimensional signal High dimensional noise
  • 16. Main theorem 1. Gradient descent can fi nd a generalizing 2. But the “tightest algorithm-dependent uniform convergence bound” is vacuous h
  • 17. Proof (Intuition) • always comes with some • But for any , you can fi nd the following 1. 2. , where 3. has generalization error less than 4. completely misclassi fi es ϵunif-alg Sδ Sδ S* S* ∈ Sδ S′  * ∈ Sδ S′  * = {((x1, − x2), y) ∣ ((x1, x2), y) ∈ S*} hS* ϵ hS* S′  * Support of training dataset Sδ S* S′  *
  • 18. • : Two-layer ReLU networks (100k hidden units) • : stochastic gradient descent • : hypersphere surface with radius 1 and 1.1 (of 1000-dimensional) • : • Generate by fl ipping the radius • Notice and ℋ 𝒜 x ∈ 𝒳 y ∈ 𝒴 {−1, + 1} S′  S ∼ 𝒟 S′  ∼ 𝒟 An experiment where UC bound fails when x y = + 1 when x y = − 1 Original S S′ 
  • 19. generalizes well but performs poorly on hS S′  Because of this extra margin, misclassi fi es hS S′ 
  • 20. Deep learning conjecture • Over-parameterized deep networks mainly behave like a very simple model (such as a linear model) and roughly fi t the training data • Plenty of parameters are unused but some of them learn “unnecessary knowledge” from training data • Such “unnecessary knowledge” does not a ff ect generalization performance • Example: Even if I have knowledge that “the earth is fl at,” I can have normal conversations in 99% of my daily life and few people think I’m strange. • However, we can always fi nd a dataset where such unnecessary knowledge seriously a ff ect the performance, which establishes a loose uniform convergence bound. My understanding: Should we focus on this?
  • 21. References 1. Shalev-Shwartz, Shai, and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, (2014) 2. Zhang, Chiyuan, et al. "Understanding deep learning requires rethinking generalization." arXiv preprint arXiv:1611.03530 (2016). 3. Nagarajan, Vaishnavh, and J. Zico Kolter. "Uniform convergence may be unable to explain generalization in deep learning." Advances in Neural Information Processing Systems 32 (2019).