High-dimensional dynamics of generalization error in neural networks
Hikaru Ibayashi
Background
Deep Neural Network’s Mystery
● Huge number of parameters
○ Ex.) VGG net has 155 million parameters
● Training data size can be small
○ Ex.) ImageNet has 1.2 million samples
● Still shows high generalization performance
Background
Chiyuan Zhang et al. empirically raised this mystery in 2017
Background
Labels → Can Fit Perfectly & Generalize Well
Shuffled Labels → Can Fit Perfectly
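The fitting claim can be illustrated with a far simpler stand-in than the deep networks Zhang et al. trained. The sketch below is a minimal illustration, assuming an overparameterized linear model (more parameters `N` than samples `P`) fitted by minimum-norm least squares. The sizes, the synthetic data, and the helper `fit_min_norm` are hypothetical choices, not part of the original experiment.

```python
import numpy as np

def fit_min_norm(X, y):
    """Minimum-norm least-squares weights w for y ≈ X @ w (hypothetical helper)."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
P, N = 50, 200                       # P samples, N parameters (N > P)
X = rng.standard_normal((P, N))      # synthetic inputs (assumption)
w_true = rng.standard_normal(N)
labels = np.sign(X @ w_true)         # "true" labels produced by a linear rule
shuffled = rng.permutation(labels)   # same labels, randomly reassigned to inputs

# Mean squared training error for each labeling
train_err_true = np.mean((X @ fit_min_norm(X, labels) - labels) ** 2)
train_err_shuffled = np.mean((X @ fit_min_norm(X, shuffled) - shuffled) ** 2)
print(train_err_true, train_err_shuffled)
```

Both training errors are zero up to floating-point precision, because any label vector can be interpolated when `N > P` and `X` has full row rank. Only the true labels carry structure that could generalize to new inputs.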
Today’s Paper
Ex.) Learning Dynamics
Teacher-Student Learning Dynamics
Teacher-Student Scenario
Teacher-Student Learning Dynamics
Teacher Net
Train Data
Student Net
Hypothesis
follows [equation not recovered]
with initial value [equation not recovered]
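The equations on these slides did not survive extraction. As a rough sketch of a teacher-student setup of this kind, the code below assumes a noisy linear teacher `y = x · w_bar + eps`, a linear student, Gaussian inputs with variance `1/N`, and full-batch gradient descent from a chosen initial value. The function `simulate` and all parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def simulate(P=100, N=100, sigma_eps=0.5, sigma_w0=0.0, lr=0.1, steps=2000, seed=0):
    """Track the test error of a linear student trained on a noisy linear teacher."""
    rng = np.random.default_rng(seed)
    w_bar = rng.standard_normal(N)                       # teacher weights (assumed Gaussian)
    X = rng.standard_normal((P, N)) / np.sqrt(N)         # P training inputs, variance 1/N
    y = X @ w_bar + sigma_eps * rng.standard_normal(P)   # noisy teacher outputs
    w = sigma_w0 * rng.standard_normal(N)                # student initial value
    errs = np.empty(steps + 1)
    for t in range(steps + 1):
        # Expected squared error on a fresh input: ||w - w_bar||^2 / N + noise variance
        errs[t] = np.sum((w - w_bar) ** 2) / N + sigma_eps ** 2
        w -= lr * X.T @ (X @ w - y)                      # full-batch gradient step
    return errs

errs = simulate()
best_step = int(np.argmin(errs))
print(best_step, errs[0], errs[best_step], errs[-1])
```

With as many samples as parameters and noisy targets, the tracked test error first falls and then rises again, so the step with the lowest error lies strictly inside the training run.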
Teacher-Student Learning Dynamics
After simple rotation
Test error
● Learning
● Overfitting
● Noise
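One way to make the three labels concrete, assuming a linear teacher-student model trained by gradient flow with time constant `tau` and inputs of variance `1/N`: in the eigenbasis of the input correlation matrix each mode evolves independently, and the expected test error splits into a decaying learning term, a growing overfitting term, and a constant noise floor. The formula in the docstring is derived under those assumptions and is not copied from the slides. The name `error_terms` and its parameters are hypothetical.

```python
import numpy as np

def error_terms(lam, t, sigma_w=1.0, sigma_w0=0.0, sigma_eps=0.5, tau=1.0):
    """Return (learning, overfitting, noise) for eigenvalues lam at time t.

    Assumed form, averaged over the modes i:
      learning    = mean_i (sigma_w^2 + sigma_w0^2) * exp(-2 * lam_i * t / tau)
      overfitting = mean_i (sigma_eps^2 / lam_i) * (1 - exp(-lam_i * t / tau))^2
      noise       = sigma_eps^2
    """
    lam = np.asarray(lam, dtype=float)
    decay = np.exp(-lam * t / tau)
    learning = np.mean((sigma_w ** 2 + sigma_w0 ** 2) * decay ** 2)
    # Modes with lam == 0 never move, so they contribute no overfitting.
    safe_lam = np.where(lam > 0, lam, 1.0)
    overfitting = np.mean(np.where(lam > 0, sigma_eps ** 2 * (1 - decay) ** 2 / safe_lam, 0.0))
    noise = sigma_eps ** 2
    return learning, overfitting, noise

learning, overfitting, noise = error_terms(lam=[0.5, 2.0], t=1.0)
print(learning + overfitting + noise)
```

Small eigenvalues dominate the overfitting term at late times, since their contribution approaches `sigma_eps**2 / lam`.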
Teacher-Student Learning Dynamics
How to take the expectation over [symbol not recovered]?
Eigenvalues of [matrix not recovered], where [definition not recovered]
→ Marchenko–Pastur distribution from random matrix theory
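The matrix on this slide was lost in extraction. As a sketch of how the Marchenko–Pastur law is typically used in this setting, the code below assumes a `P x N` Gaussian data matrix `X` with entry variance `1/N` and looks at the eigenvalues of `X.T @ X` with ratio `alpha = P / N`. Under that assumption the nonzero eigenvalues lie between `(sqrt(alpha) - 1)**2` and `(sqrt(alpha) + 1)**2`. The helper names are hypothetical.

```python
import numpy as np

def mp_edges(alpha):
    """Lower and upper edge of the nonzero Marchenko-Pastur bulk for alpha = P / N."""
    return (np.sqrt(alpha) - 1.0) ** 2, (np.sqrt(alpha) + 1.0) ** 2

def mp_density(lam, alpha):
    """Density of the nonzero eigenvalues on a 1-D grid lam.

    For alpha < 1 an extra point mass of weight 1 - alpha sits at zero (not returned here).
    """
    lo, hi = mp_edges(alpha)
    lam = np.asarray(lam, dtype=float)
    inside = (lam > lo) & (lam < hi)
    rho = np.zeros_like(lam)
    rho[inside] = np.sqrt((hi - lam[inside]) * (lam[inside] - lo)) / (2 * np.pi * lam[inside])
    return rho

rng = np.random.default_rng(0)
P, N = 400, 200                                  # illustrative sizes, alpha = 2
alpha = P / N
X = rng.standard_normal((P, N)) / np.sqrt(N)
eigs = np.linalg.eigvalsh(X.T @ X)               # empirical spectrum

lo, hi = mp_edges(alpha)
grid = np.linspace(lo, hi, 20001)
mass = np.sum(mp_density(grid, alpha)) * (grid[1] - grid[0])   # Riemann sum of the density
print(lo, hi, eigs.min(), eigs.max(), mass)
```

Replacing the sum over eigenvalues of one particular dataset by an integral against this density is what makes the expectation tractable in the high-dimensional limit.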
Teacher-Student Learning Dynamics
Final result
Teacher-Student Learning Dynamics (Result)
Eigenvalue Spectrum
Learning Dynamics
Teacher-Student Learning Dynamics (Result)
Optimal stopping time
Signal-to-noise ratio
Low-quality (low-SNR) data calls for earlier stopping; high-SNR data tolerates longer training
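To see how the optimal stopping time depends on the signal-to-noise ratio, here is a minimal sketch for a single eigenmode, assuming gradient flow, zero initial weights, and `snr` defined as teacher weight variance over noise variance. Under those assumptions the single-mode error is minimized at `t = (tau / lam) * log(1 + lam * snr)`. This closed form is derived for the simplified one-mode case and is not quoted from the slides. The function names are hypothetical.

```python
import numpy as np

def mode_error(t, lam, snr, tau=1.0):
    """Single-mode test error in units of the noise variance (zero initial weights)."""
    decay = np.exp(-lam * t / tau)
    return snr * decay ** 2 + (1.0 - decay) ** 2 / lam

def optimal_stopping_time(lam, snr, tau=1.0):
    """Closed-form minimizer of mode_error over t (one-mode assumption)."""
    return tau / lam * np.log1p(lam * snr)

t = np.linspace(0.0, 20.0, 200001)
for snr in (0.1, 1.0, 10.0):
    t_numeric = t[np.argmin(mode_error(t, lam=1.0, snr=snr))]   # brute-force check
    print(snr, optimal_stopping_time(1.0, snr), t_numeric)
```

The brute-force grid search is included only as a sanity check on the closed form. With a full eigenvalue spectrum the optimum has to be found numerically.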
Is the Single-Layer Linear Model Reasonable?
● Linear, Single-Layer, Teacher-Student
● Nonlinear, Two-Layers, Teacher-Student
● Nonlinear, Two-Layers, Real Data
Is the Single-Layer Linear Model Reasonable?
One Difference
Non-Linear Teacher-Student (Result)
Learning Dynamics
Non-Linear Teacher-Student (Result)
Test Error vs Hidden units
(Result of Linear case)
Non-Linear MNIST (Result)
Test Error vs Hidden units
Non-Linear MNIST (Result)
Overfitting Error vs Hidden units
Discussion
What does this paper imply?
● With early stopping, larger model → better generalization
● Deep Learning’s Mystery!!
How can this paper explain?
● Larger model → bigger eigenvalues (Marchenko–Pastur distribution)
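One possible reading of "bigger eigenvalues" is that the smallest nonzero eigenvalue moves away from zero as the model grows; this is an interpretation, not something the slide states. The sketch below checks that reading in a simplified setting: a fixed number of samples `P`, a growing number of parameters `N`, and Gaussian inputs of variance `1/N`, where the Marchenko–Pastur lower edge is `(1 - sqrt(P / N))**2`. The function name and the sizes are hypothetical.

```python
import numpy as np

def smallest_nonzero_eigenvalue(P, N, seed=0):
    """Smallest nonzero eigenvalue of X.T @ X for a P x N Gaussian X with N >= P."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((P, N)) / np.sqrt(N)
    # X @ X.T (P x P) has the same nonzero eigenvalues as X.T @ X (N x N).
    return np.linalg.eigvalsh(X @ X.T).min()

P = 100
sizes = [100, 200, 400, 1600]
empirical = [smallest_nonzero_eigenvalue(P, N) for N in sizes]
predicted = [(1.0 - np.sqrt(P / N)) ** 2 for N in sizes]
for N, e, p in zip(sizes, empirical, predicted):
    print(N, e, p)
```

Modes with tiny eigenvalues are the ones that amplify noise, so a spectrum bounded away from zero limits how badly the model can overfit.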
Discussion
Formal Result
Rademacher Complexity
