Deep Double Descent
kevin
Modern Learning Theory
● Bigger models tend to overfit
○ Bias-Variance trade-off
○ Weight Regularization
○ Augmentation
○ Dropout
○ BatchNorm
○ Early stopping
○ Data-dependent regularization (mixup, etc.)
○ ...
● Bigger models are always better
Reconciling modern machine learning practice and the bias-variance trade-off
● Bigger models are not good in some regimes
● Even more data hurts!
https://mltheory.org/deep.pdf
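Of the regularizers listed, data-dependent regularization is the least self-explanatory. The sketch below illustrates mixup in NumPy, assuming one-hot labels and a single Beta-sampled mixing coefficient per batch. The function name `mixup_batch` and the default `alpha=0.2` are hypothetical choices, not taken from the slides. Each example and its label are blended with a randomly chosen partner from the same batch:

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Mix a batch with a shuffled copy of itself (mixup-style regularization).

    x: array of shape (batch, ...); y_onehot: array of shape (batch, num_classes).
    alpha: Beta(alpha, alpha) concentration; 0.2 is illustrative, not a tuned value.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)            # one mixing coefficient for the whole batch
    perm = rng.permutation(x.shape[0])      # partner example for each row
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_mixed, lam, perm


# Usage: 8 random 4x4 "images" with 3 classes.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))
labels = rng.integers(0, 3, size=8)
y = np.eye(3)[labels]
x_mix, y_mix, lam, perm = mixup_batch(x, y, alpha=0.2, rng=rng)
```

Because the labels are mixed with the same coefficient as the inputs, each mixed label row still sums to 1 and can be used directly with a soft-target cross-entropy loss.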
TL;DR
- Model-wise double descent
- There is a regime where bigger models are worse
- Sample-wise non-monotonicity
- There is a regime where more samples hurt
- Epoch-wise double descent
- There is a regime where training longer reverses overfitting
Generalization in Deep Learning Era
- Networks can fit `anything`, even random noise
- Much larger capacity than people previously imagined
UNDERSTANDING DEEP LEARNING REQUIRES RETHINKING GENERALIZATION
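The random-noise observation can be reproduced in miniature. This sketch uses a toy model instead of a trained deep network: a fixed random ReLU layer plus a linear readout fitted by minimum-norm least squares, with more features than samples. `fit_random_labels` and its sizes are hypothetical. Once the feature count exceeds the number of samples, the readout can typically interpolate labels that carry no signal at all:

```python
import numpy as np


def fit_random_labels(n=100, d=20, width=500, seed=0):
    """Fit pure-noise labels with an over-parameterized random-ReLU-feature model.

    Only the output layer is fitted (closed-form min-norm least squares); this is a
    toy stand-in for the deep networks in that paper, chosen because it solves exactly.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    y = rng.choice([-1.0, 1.0], size=n)             # labels are random: no signal
    w_hidden = rng.normal(size=(d, width)) / np.sqrt(d)
    features = np.maximum(x @ w_hidden, 0.0)        # fixed random ReLU layer
    # width > n, so lstsq returns the minimum-norm solution that interpolates y.
    w_out, *_ = np.linalg.lstsq(features, y, rcond=None)
    preds = features @ w_out
    train_acc = np.mean(np.sign(preds) == y)
    train_mse = np.mean((preds - y) ** 2)
    return train_acc, train_mse


acc, mse = fit_random_labels()
print(f"train accuracy on random labels: {acc:.2f}, train MSE: {mse:.1e}")
```

Perfect training accuracy here says nothing about test accuracy: with random labels there is nothing to generalize. That is the point of the slide: capacity alone cannot explain why the same models generalize on real labels.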
Generalization in Deep Learning Era
- Over-parameterized networks still perform well
IN SEARCH OF THE REAL INDUCTIVE BIAS: ON THE ROLE OF IMPLICIT REGULARIZATION IN DEEP LEARNING
Generalization in Deep Learning Era
- Deep networks regularize themselves (they have a better loss landscape)
Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes
Generalization in Deep Learning Era
SENSITIVITY AND GENERALIZATION IN NEURAL NETWORKS: AN EMPIRICAL STUDY
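The sensitivity study relates generalization to how strongly a trained network's output reacts to small input changes, measured by the norm of the input-output Jacobian. The NumPy sketch below computes that quantity for a one-hidden-layer tanh network, using the Jacobian of its raw outputs. Both the architecture and the choice of raw outputs are simplifications, and the function names are hypothetical. A finite-difference check is included:

```python
import numpy as np


def jacobian_frobenius_norm(x, w1, b1, w2):
    """Return ||d f(x) / d x||_F and the Jacobian for f(x) = w2 @ tanh(w1 @ x + b1).

    Analytic Jacobian: w2 @ diag(1 - tanh(h)^2) @ w1, with h = w1 @ x + b1.
    """
    h = w1 @ x + b1
    jac = w2 @ ((1.0 - np.tanh(h) ** 2)[:, None] * w1)   # shape (out_dim, in_dim)
    return np.linalg.norm(jac, "fro"), jac


def finite_difference_jacobian(x, w1, b1, w2, eps=1e-6):
    """Central-difference approximation of the same Jacobian, column by column."""
    def f(z):
        return w2 @ np.tanh(w1 @ z + b1)

    cols = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        cols.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.stack(cols, axis=1)


rng = np.random.default_rng(0)
in_dim, hidden, out_dim = 5, 16, 3
w1 = rng.normal(size=(hidden, in_dim)) / np.sqrt(in_dim)
b1 = rng.normal(size=hidden)
w2 = rng.normal(size=(out_dim, hidden)) / np.sqrt(hidden)
x = rng.normal(size=in_dim)

norm, jac = jacobian_frobenius_norm(x, w1, b1, w2)
fd = finite_difference_jacobian(x, w1, b1, w2)
print(f"Jacobian Frobenius norm at x: {norm:.4f}")
```

To use this as a sensitivity metric, one would average the norm over test inputs for each trained model and compare it against that model's generalization gap.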
Model-wise double descent
Architecture
- ResNet18, CNN, Transformers
Label Noise
● `Hard` distribution
● Label noise is sampled only once, not per epoch
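The slide stresses that label noise is drawn once and then held fixed. The sketch below shows that protocol under two assumptions: a fixed noise fraction, and each corrupted label replaced by a uniformly random incorrect class. The slides do not spell out the corruption rule, and `corrupt_labels_once` is a hypothetical helper:

```python
import numpy as np


def corrupt_labels_once(labels, noise_fraction, num_classes, seed=0):
    """Return a fixed noisy copy of `labels` and the indices that were corrupted.

    A `noise_fraction` share of examples is chosen once, and each chosen label is
    replaced by a uniformly random *different* class. Call this once before training
    and reuse the result every epoch; do not resample it per epoch.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    noisy = labels.copy()
    n_noisy = int(round(noise_fraction * labels.size))
    idx = rng.choice(labels.size, size=n_noisy, replace=False)
    # Shift by 1..num_classes-1 (mod num_classes) so the new label always differs.
    offsets = rng.integers(1, num_classes, size=n_noisy)
    noisy[idx] = (labels[idx] + offsets) % num_classes
    return noisy, idx


clean = np.random.default_rng(1).integers(0, 10, size=1000)
noisy, flipped = corrupt_labels_once(clean, noise_fraction=0.2, num_classes=10)
# Every epoch then iterates over (inputs, noisy); `noisy` is never regenerated.
```

Fixing the seed makes the noisy dataset reproducible. Fixing the noise in advance also means a large enough model can eventually memorize the wrong labels, which is what makes the distribution "hard".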
Model-wise double descent
- Model-wise double descent occurs across different architectures, datasets, optimizers, and training procedures
- It also appears in adversarial training
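Model-wise double descent also shows up in far simpler models than those on the slide. The sketch below is a toy linear analogue, not the ResNet18/CNN/Transformer experiments: minimum-norm least squares on isotropic Gaussian features, where the model sees only the first p of d features. `min_norm_test_error` and all sizes are hypothetical. With n = 40 training samples, test error typically peaks near the interpolation threshold p = n and falls again beyond it:

```python
import numpy as np


def min_norm_test_error(n_train, p, d=400, noise_std=0.5, n_test=1000, trials=20, seed=0):
    """Average test MSE of least squares that sees only the first p of d features.

    Toy linear model (an assumption, not the paper's setup): y = x @ beta + noise,
    x ~ N(0, I_d), ||beta|| = 1. Features beyond the first p act as unmodeled signal.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        beta = rng.normal(size=d)
        beta /= np.linalg.norm(beta)
        x_train = rng.normal(size=(n_train, d))
        y_train = x_train @ beta + noise_std * rng.normal(size=n_train)
        x_test = rng.normal(size=(n_test, d))
        y_test = x_test @ beta + noise_std * rng.normal(size=n_test)
        # lstsq gives the least-squares fit when p < n and the min-norm interpolant when p >= n.
        w, *_ = np.linalg.lstsq(x_train[:, :p], y_train, rcond=None)
        errors.append(np.mean((x_test[:, :p] @ w - y_test) ** 2))
    return float(np.mean(errors))


n = 40
model_sizes = [10, 20, 30, 40, 60, 100, 400]
curve = {p: min_norm_test_error(n, p) for p in model_sizes}
for p, err in curve.items():
    print(f"p={p:4d}  test MSE={err:.2f}")
```

The peak comes from the fitted weights blowing up when the model has just barely enough parameters to interpolate noisy data. Past that point, the minimum-norm solution has room to spread the fit across many features.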
Model-wise double descent
Model-wise & Epoch-wise double descent
Epoch-wise double descent
Sufficiently large models can undergo a “double descent” behavior where test error first decreases, then increases near the interpolation threshold, and then decreases again.
Increasing the training time increases the effective model complexity (EMC), and thus a sufficiently large model transitions from under- to over-parameterized over the course of training.
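The paper defines EMC as the largest number of samples on which the training procedure (model plus training recipe, including training time) reaches approximately zero training error, i.e. average training error at most ε. The sketch below estimates it from training errors measured on a grid of sample sizes. The grid, ε = 0.1, and the numbers are hypothetical; with a finite grid the result is only a lower-resolution estimate of the true maximum:

```python
def effective_model_complexity(mean_train_error_by_n, eps=0.1):
    """Largest measured sample size n whose average training error is at most eps.

    `mean_train_error_by_n` maps n -> average training error of one training
    procedure on n samples (averaged over several draws of the dataset).
    Returns 0 if no measured n meets the threshold. eps=0.1 is illustrative.
    """
    feasible = [n for n, err in mean_train_error_by_n.items() if err <= eps]
    return max(feasible, default=0)


# Hypothetical measurements for one architecture at two training lengths.
short_training = {1_000: 0.00, 5_000: 0.02, 10_000: 0.15, 20_000: 0.30}
long_training = {1_000: 0.00, 5_000: 0.00, 10_000: 0.04, 20_000: 0.12}
print(effective_model_complexity(short_training))  # 5000
print(effective_model_complexity(long_training))   # 10000
```

In this made-up example, longer training raises EMC from 5,000 to 10,000. For a 10,000-sample training set, that moves the procedure from below the threshold to right at it, which is the transition the slide describes.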
Epoch-wise double descent
Conventional training is split into two phases:
1. In the first phase, the network learns a function with a small generalization gap
2. In the second phase, the network starts to overfit the data, leading to an increase in test error
Not the complete picture
- In some regimes, the test error decreases again and may reach a lower value at the end of training than at the first minimum
Reminiscent of
- Information bottleneck
- Lottery ticket hypothesis
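The conventional two-phase picture and the additional regime can be told apart from a logged test-error curve. The sketch below assumes a per-epoch list of test errors that has already been smoothed, since raw curves are noisy and this check uses exact comparisons. The function name and example numbers are hypothetical:

```python
def find_epoch_wise_double_descent(test_errors):
    """Look for descent -> ascent -> descent in a per-epoch test-error curve.

    Returns (first_min_epoch, peak_epoch, ends_below_first_min), or None when the
    curve never rises (plain descent) or never comes back down (classic overfitting).
    """
    errs = list(test_errors)
    # First local minimum: the first epoch after which test error goes up.
    first_min = next((i for i in range(len(errs) - 1) if errs[i + 1] > errs[i]), None)
    if first_min is None:
        return None  # monotone descent, no overfitting phase
    # Peak of the overfitting phase: worst error at or after the first minimum.
    peak = max(range(first_min, len(errs)), key=errs.__getitem__)
    if errs[-1] >= errs[peak]:
        return None  # error still at its peak at the end: two-phase picture only
    return first_min, peak, errs[-1] < errs[first_min]


double_descent_curve = [0.50, 0.35, 0.28, 0.30, 0.34, 0.31, 0.26, 0.24]
classic_overfitting = [0.50, 0.35, 0.28, 0.30, 0.34, 0.36]
print(find_epoch_wise_double_descent(double_descent_curve))  # (2, 4, True)
print(find_epoch_wise_double_descent(classic_overfitting))   # None
```

A `True` in the last position corresponds to the case where training longer ends below the first minimum. Early stopping at the first minimum would have left performance on the table.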
Epoch-wise double descent
Epoch-wise double descent (CIFAR-10, CIFAR-100)
Sample-wise non-monotonicity
More data doesn’t necessarily improve performance
For both models, more data hurt performance
Sample-wise non-monotonicity
Transformers
- Language-translation task with no added label noise
Two effects combined
- More samples
- Larger models
4.5x more samples hurts performance for intermediate-sized models
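The same non-monotonicity can be reproduced in a toy linear model where only the number of samples changes. The sketch assumes minimum-norm least squares with a fixed number of features d = 50. This is a simplification, not the translation experiments, and the names and sizes are hypothetical. Test error typically rises as n approaches d, so going from 25 to 50 samples hurts before more data helps again:

```python
import numpy as np


def min_norm_test_error_vs_n(n_train, d=50, noise_std=0.5, n_test=1000, trials=30, seed=0):
    """Average test MSE of min-norm least squares with d features and n_train samples.

    Toy well-specified linear model (an assumption, not the paper's setup):
    y = x @ beta + noise, x ~ N(0, I_d), ||beta|| = 1. The model size d stays fixed.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        beta = rng.normal(size=d)
        beta /= np.linalg.norm(beta)
        x_train = rng.normal(size=(n_train, d))
        y_train = x_train @ beta + noise_std * rng.normal(size=n_train)
        x_test = rng.normal(size=(n_test, d))
        y_test = x_test @ beta + noise_std * rng.normal(size=n_test)
        # n < d: min-norm interpolant; n >= d: ordinary least-squares fit.
        w, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)
        errors.append(np.mean((x_test @ w - y_test) ** 2))
    return float(np.mean(errors))


sample_sizes = [10, 25, 50, 100, 200]
by_n = {n: min_norm_test_error_vs_n(n) for n in sample_sizes}
for n, err in by_n.items():
    print(f"n={n:4d}  test MSE={err:.2f}")
```

Adding samples moves the interpolation threshold. A fixed-size model that was comfortably over-parameterized at n = 25 becomes critically parameterized at n = 50. This mirrors the "two effects combined" point: more samples effectively make the same model smaller.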
Sample-wise non-monotonicity
Conclusion
Take-home message:
Models behave unexpectedly in the transition regime
- Training longer reverses overfitting
- Doubling the number of training epochs is a technique used in some tasks (e.g., object detection)
- Bigger models are worse
- Whether the model can fit the training set is the indicator
- This is formalized as Effective Model Complexity (EMC)
- More data hurts
- sticky :(
- Generalization is still the Holy Grail in deep learning
- It remains an open question (both experimentally and theoretically)
- Connecting data complexity with model complexity is still difficult
- NAS, in some sense, systematically solves this problem
Know your data & model
- Noise level (problem difficulty)
- Model capacity (fitting power)
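The take-home points can be condensed into the paper's rule of thumb: compare the EMC of the training procedure with the number of training samples n. Well below or well above n, increasing effective complexity is expected to help; near n, it may help or hurt. The sketch below encodes that comparison. Because "sufficiently smaller/larger" is not given a numeric value, the relative band `rel_width` is a hypothetical parameter, as is the function name:

```python
def parameterization_regime(emc, n, rel_width=0.2):
    """Classify a training procedure by comparing its EMC with the sample count n.

    `rel_width` (hypothetical) sets how close EMC must be to n to count as critical;
    the source gives no numeric threshold.
    """
    if emc < (1.0 - rel_width) * n:
        return "under-parameterized"   # increasing complexity expected to help
    if emc > (1.0 + rel_width) * n:
        return "over-parameterized"    # increasing complexity expected to help
    return "critical"                  # increasing complexity may help or hurt


print(parameterization_regime(emc=2_000, n=10_000))   # under-parameterized
print(parameterization_regime(emc=10_500, n=10_000))  # critical
print(parameterization_regime(emc=50_000, n=10_000))  # over-parameterized
```

In practice, EMC depends on the noise level of the data as well as on model capacity and training time. This is why knowing both the data and the model matters when deciding whether to scale up.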

More from Kai-Wen Zhao
- Learning visual representation without human label
- Recent Object Detection Research & Person Detection
- Learning to discover monte carlo algorithm on spin ice manifold
- Toward Disentanglement through Understand ELBO
- Deep Reinforcement Learning: Q-Learning
- Paper Review: An exact mapping between the Variational Renormalization Group ...
- NIPS paper review 2014: A Differential Equation for Modeling Nesterov’s Accel...
- High Dimensional Data Visualization using t-SNE
 

