High-dimensional dynamics of generalization error in neural networks (Explained)


Explanation of "High-dimensional dynamics of generalization error in neural networks" https://arxiv.org/pdf/1710.03667.pdf


High-dimensional dynamics of generalization error in neural networks
Hikaru Ibayashi

Background
Deep Neural Network’s Mystery
● Huge number of parameters
○ Ex.) VGG-16 has about 138 million parameters
● Training data size can be small by comparison
○ Ex.) ImageNet has 1.2 million samples
● Still shows high generalization performance

Background
Chiyuan Zhang et al. raised this mystery empirically in 2017

Background
● Labels: Can Fit Perfectly & Generalize Well
● Shuffled Labels: Can Fit Perfectly

Teacher-Student Learning Dynamics
Teacher-Student Scenario
● Teacher Net generates the Train Data
● Student Net (Hypothesis) is trained on the Train Data
● Hypothesis follows [update equation not preserved] with initial value [not preserved]
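The teacher-student scenario can be sketched numerically. This is a minimal illustration under assumptions that are not on the slides: a linear teacher with Gaussian weights, Gaussian inputs with variance 1/N per component, additive label noise, a student initialized at zero, and plain full-batch gradient descent. The function name and all default values are hypothetical.

```python
import numpy as np

def simulate_teacher_student(n_features=50, n_samples=100, noise_std=0.5,
                             lr=0.05, n_steps=400, seed=0):
    """Train a linear student on noisy teacher labels; return error curves."""
    rng = np.random.default_rng(seed)
    # Teacher weights (assumed Gaussian) and inputs with variance 1/N per component.
    w_teacher = rng.normal(size=n_features)
    X = rng.normal(scale=1.0 / np.sqrt(n_features), size=(n_samples, n_features))
    # Train data: teacher output plus label noise.
    y = X @ w_teacher + noise_std * rng.normal(size=n_samples)

    w = np.zeros(n_features)  # student starts at zero (an assumption)
    train_err, test_err = [], []
    for _ in range(n_steps):
        residual = y - X @ w
        train_err.append(np.mean(residual ** 2))
        # Expected squared error on a fresh noisy sample, in closed form.
        test_err.append(np.sum((w_teacher - w) ** 2) / n_features + noise_std ** 2)
        # Gradient step on 0.5 * sum of squared residuals.
        w += lr * X.T @ residual
    return np.array(train_err), np.array(test_err)

train_err, test_err = simulate_teacher_student()
print("final train error:", train_err[-1])
print("best test error:", test_err.min(), "at step", int(test_err.argmin()))
print("final test error:", test_err[-1])
```

Comparing the step with the best test error against the final step shows whether stopping early would have helped for a given noise level.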

Teacher-Student Learning Dynamics
After simple rotation
Test error [equation not preserved], with its terms labeled Learning, Overfitting, and Noise
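The rotation and the three labeled terms can be written out as a worked derivation. This is a sketch under stated assumptions rather than the slide's own equations: a linear teacher y = w̄X + ε, a linear student ŷ = wX trained by gradient flow with time constant τ, inputs with variance 1/N per component, independent zero-mean teacher weights, initial student weights, and noise with variances σ_w², σ_{w0}², σ_ε², and the eigendecomposition XXᵀ = VΛVᵀ. All symbols are introduced here for illustration.

```latex
% Assumed dynamics: \tau \dot{w} = y X^T - w X X^T, with X X^T = V \Lambda V^T.
\begin{align}
z &= w V, \qquad \bar{z} = \bar{w} V, \\
\tau \frac{dz_i}{dt} &= \lambda_i (\bar{z}_i - z_i) + \sqrt{\lambda_i}\,\tilde{\epsilon}_i, \\
\bar{z}_i - z_i(t) &= \bigl(\bar{z}_i - z_i(0)\bigr)\, e^{-\lambda_i t/\tau}
  - \frac{\tilde{\epsilon}_i}{\sqrt{\lambda_i}} \bigl(1 - e^{-\lambda_i t/\tau}\bigr), \\
E_g(t) &= \frac{1}{N} \sum_i \Bigl[
  \underbrace{(\sigma_w^2 + \sigma_{w_0}^2)\, e^{-2\lambda_i t/\tau}}_{\text{learning}}
  + \underbrace{\frac{\sigma_\epsilon^2}{\lambda_i} \bigl(1 - e^{-\lambda_i t/\tau}\bigr)^2}_{\text{overfitting}}
  \Bigr] + \underbrace{\sigma_\epsilon^2}_{\text{noise}}.
\end{align}
```

The rotation decouples the weights into independent modes, one per eigenvalue λ_i. The learning term shrinks with time, the overfitting term grows toward σ_ε²/λ_i (it vanishes for λ_i = 0), and the noise term is a constant floor.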

Teacher-Student Learning Dynamics
How to take the expectation over [symbol not preserved]?
● Eigenvalues of [matrix not preserved] follow the Marchenko–Pastur distribution from random matrix theory
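The Marchenko–Pastur step can be checked numerically. The sketch below assumes an N×P input matrix X with i.i.d. Gaussian entries of variance 1/N and the ratio alpha = P/N; these conventions and the function names are assumptions for illustration, not taken from the slides. It compares the eigenvalues of X Xᵀ with the Marchenko–Pastur bulk, whose edges under this scaling are (√alpha ± 1)².

```python
import numpy as np

def mp_density(lam, alpha):
    """Continuous part of the Marchenko-Pastur density for X X^T.

    Assumes X is N x P with i.i.d. entries of variance 1/N and alpha = P/N.
    For alpha < 1 there is an additional point mass (1 - alpha) at zero,
    which this function does not include.
    """
    lo, hi = (np.sqrt(alpha) - 1) ** 2, (np.sqrt(alpha) + 1) ** 2
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    rho = np.zeros_like(lam)
    inside = (lam > lo) & (lam < hi)
    rho[inside] = (np.sqrt((hi - lam[inside]) * (lam[inside] - lo))
                   / (2 * np.pi * lam[inside]))
    return rho

def empirical_eigenvalues(n_features, n_samples, seed=0):
    """Eigenvalues of X X^T for a random Gaussian X with entry variance 1/N."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=1.0 / np.sqrt(n_features), size=(n_features, n_samples))
    return np.linalg.eigvalsh(X @ X.T)

alpha = 2.0
eigs = empirical_eigenvalues(n_features=400, n_samples=800)
lo, hi = (np.sqrt(alpha) - 1) ** 2, (np.sqrt(alpha) + 1) ** 2
print("empirical range:", eigs.min(), eigs.max())
print("predicted bulk edges:", lo, hi)
```

Because the spectrum concentrates on this density in high dimensions, an average over eigenvalues can be replaced by an integral against it.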

Teacher-Student Learning Dynamics
Final result

Teacher-Student Learning Dynamics (Result)
[Figure panels: Eigenvalue Spectrum, Learning Dynamics]

Teacher-Student Learning Dynamics (Result)
Optimal stopping time
Signal-to-noise ratio
Low quality data (low signal-to-noise ratio) calls for earlier stopping; the optimal stopping time grows with the signal-to-noise ratio
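How the optimal stopping time depends on the signal-to-noise ratio can be explored with a short calculation. The sketch assumes a zero-initialized linear student whose expected test error per mode, in units of the teacher weight variance, is exp(-2λt/τ) + (1 - exp(-λt/τ))² / (SNR·λ), plus a constant 1/SNR, with SNR defined as teacher weight variance over noise variance. These formulas and names are assumptions made for illustration. For a single mode, setting the derivative to zero gives t = (τ/λ)·log(1 + λ·SNR), which increases with SNR.

```python
import numpy as np

def expected_test_error(t_grid, eigs, snr, tau=1.0):
    """Expected test error (units of teacher variance); requires all eigs > 0."""
    t = np.asarray(t_grid, dtype=float)[:, None]
    lam = np.asarray(eigs, dtype=float)[None, :]
    decay = np.exp(-lam * t / tau)
    learning = decay ** 2                          # shrinks as training proceeds
    overfitting = (1 - decay) ** 2 / (snr * lam)   # grows as noise is fitted
    return (learning + overfitting).mean(axis=1) + 1.0 / snr

def optimal_stopping_time(eigs, snr, t_grid, tau=1.0):
    """Grid point at which the expected test error is smallest."""
    return t_grid[np.argmin(expected_test_error(t_grid, eigs, snr, tau))]

# Hypothetical spectrum: Gaussian inputs with N = 100 features, P = 200 samples.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(100, 200))  # entry variance 1/N = 0.01
eigs = np.linalg.eigvalsh(X @ X.T)
t_grid = np.linspace(0.0, 50.0, 5001)
for snr in (1.0, 10.0, 100.0):
    print("SNR", snr, "-> optimal stopping time", optimal_stopping_time(eigs, snr, t_grid))
```

Under these assumptions noisier data should be stopped earlier, because the overfitting term carries more weight relative to the learning term.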

Is the Single-Layer Linear Model Reasonable?
● Linear, Single-Layer, Teacher-Student
● Nonlinear, Two-Layers, Teacher-Student
● Nonlinear, Two-Layers, Real Data

Is the Single-Layer Linear Model Reasonable?
One Difference

Non-Linear Teacher Student (Result)
Learning Dynamics

Non-Linear Teacher Student (Result)
Test Error vs Hidden units
(Result of Linear case)

Non-Linear MNIST (Result)
Test Error vs Hidden units

Non-Linear MNIST (Result)
Overfitting Error vs Hidden units

Discussion
What does this paper imply?
● With early stopping, larger model → better generalization (Deep Learning’s Mystery!!)
How can this paper explain it?
● Larger model → bigger eigenvalues (Marchenko–Pastur distribution)
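One way to read "larger model → bigger eigenvalues" is through the edges of the Marchenko–Pastur bulk. The sketch assumes inputs with variance 1/N per component and alpha = P/N (samples over parameters), so that the nonzero eigenvalues lie between (1 - √alpha)² and (1 + √alpha)²; this normalization and the function name are assumptions, not taken from the slides. With the number of samples fixed, a larger model means a smaller alpha.

```python
import numpy as np

def mp_edges(alpha):
    """Lower and upper edge of the nonzero Marchenko-Pastur bulk for alpha = P/N."""
    root = np.sqrt(alpha)
    return (1 - root) ** 2, (1 + root) ** 2

# Fewer samples per parameter (smaller alpha) corresponds to a larger model.
for alpha in (1.0, 0.5, 0.1, 0.01):
    lo, hi = mp_edges(alpha)
    print(f"alpha={alpha}: smallest nonzero eigenvalue ~ {lo:.3f}, largest ~ {hi:.3f}")
```

Under this reading the smallest nonzero eigenvalue moves away from zero as the model grows past alpha = 1, so fewer modes have the small eigenvalues that inflate the overfitting term. For alpha < 1 the remaining eigenvalues are exactly zero, and gradient descent does not move the student along those directions.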


Today’s Paper
Ex.) Learning Dynamics

Formal Result
Rademacher Complexity
