Leipzig, June 2021 1 / 24
The statistical physics of learning revisited:
Phase transitions in layered neural networks
Elisa Oostwal
Michiel Straat
Michael Biehl
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen / NL
Physica A Vol. 564, 2021, 125517 (open access)
Hidden unit specialization in layered neural networks: ReLU vs. sigmoidal activation
the revival of neural networks
success of multi-layered neural networks (Deep Learning)
• availability of large amounts of training data
• increased computational power
• improved training procedures and set-ups
• task specific network designs, e.g. activation functions
many open questions / lack of theoretical understanding
statistical physics of learning
statistical physics of neural networks
training of feed-forward neural networks:
Elizabeth Gardner (1957-1988).
The space of interactions in neural networks.
J. Phys. A 21:257-270, 1988
dynamics of attractor neural networks:
John Hopfield. Neural networks and physical systems with emergent collective computational abilities.
PNAS 79(8):2554-2558, 1982
a successful branch of learning theory:
[Figure: items dated 1991, 2001, 2011]
example: a shallow neural network
soft committee machine
• N units: high-dim. input
• K hidden units with activation g(z)
• linear output
input/output function defined by
• architecture, connectivity, activation functions
• adaptive weights
• regression: learning from example data, labelled by a target function
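As a minimal sketch of the shallow network described on this slide, the following pure-Python snippet evaluates a soft committee machine: K hidden units apply an activation g to their local fields, and the linear output sums the results. The 1/sqrt(N) scaling of the fields, the 1/sqrt(K) output normalization, and the choice g(z) = erf(z/sqrt(2)) for the sigmoidal case are assumptions taken from common conventions in this literature; they are not recoverable from the slide itself.

```python
import math

def sigmoidal(z):
    """Sigmoidal activation; erf(z / sqrt(2)) is an assumed, commonly used choice."""
    return math.erf(z / math.sqrt(2.0))

def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)

def soft_committee(weights, xi, g):
    """Output of a soft committee machine.

    weights: list of K weight vectors, each of length N
    xi:      input vector of length N
    g:       hidden unit activation function
    """
    n = len(xi)
    k = len(weights)
    # local fields x_k = w_k . xi / sqrt(N)  (assumed scaling)
    fields = [sum(w_j * x_j for w_j, x_j in zip(w, xi)) / math.sqrt(n)
              for w in weights]
    # linear output: normalized sum of the hidden unit activations
    return sum(g(x) for x in fields) / math.sqrt(k)

# hypothetical example with K = 2 hidden units and N = 4 inputs
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
xi = [2.0, -2.0, 0.5, 0.5]
out_sigmoidal = soft_committee(weights, xi, sigmoidal)
out_relu = soft_committee(weights, xi, relu)
```

In this toy input the two local fields are +1 and -1, so the odd sigmoidal activation cancels in the output while ReLU keeps only the positive field.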
Leipzig, June 2021 6 / 24
statistical physics of learning in a nutshell
• training by stochastic optimization of all adaptive weights (objective/cost/energy function)
• e.g. Metropolis algorithm, noisy gradient descent (Langevin), with equilibrium (Gibbs-Boltzmann) density P_eq
• control parameter: “inverse temperature” β = 1/T
• “thermal averages” over P_eq
• equilibrium state: compromise/competition between minimal energy (ground state) and the number (volume) of available states with higher energy
• minima of the free energy; microcanonical entropy
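The stochastic training named on this slide can be illustrated with a small Metropolis sampler. The sketch below assumes the standard Gibbs-Boltzmann form P_eq(w) proportional to exp(-β E(w)); the quadratic `toy_energy`, the step size, and the run lengths are hypothetical choices for illustration, not the cost function or settings of the talk.

```python
import math
import random

def metropolis(energy, w0, beta, steps, step_size=0.5, seed=0):
    """Sample weights from P_eq(w) ~ exp(-beta * energy(w)) with Metropolis updates."""
    rng = random.Random(seed)
    w = list(w0)
    e = energy(w)
    energies = []
    for _ in range(steps):
        # propose a random change of one randomly chosen weight
        i = rng.randrange(len(w))
        proposal = list(w)
        proposal[i] += rng.uniform(-step_size, step_size)
        e_new = energy(proposal)
        # accept with probability min(1, exp(-beta * (e_new - e)))
        if e_new <= e or rng.random() < math.exp(-beta * (e_new - e)):
            w, e = proposal, e_new
        energies.append(e)
    return w, energies

def toy_energy(w):
    """Hypothetical quadratic cost with its minimum at w = (1, -1)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 1.0) ** 2

# thermal average of the energy at low and at high formal temperature,
# discarding the first 5000 steps as equilibration
_, cold = metropolis(toy_energy, [0.0, 0.0], beta=20.0, steps=20000)
_, hot = metropolis(toy_energy, [0.0, 0.0], beta=1.0, steps=20000)
mean_cold = sum(cold[5000:]) / len(cold[5000:])
mean_hot = sum(hot[5000:]) / len(hot[5000:])
```

For this two-dimensional quadratic energy the exact thermal average is 1/β, so the low-temperature run stays close to the ground state while the high-temperature run spreads over many states of higher energy.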
Leipzig, June 2021 7 / 24
machine learning specifics
• energy function is defined w.r.t. a specific set of example data (frozen disorder)
• typical properties: additional average of the free energy over the example data
difficult, even for the simplest model density: unstructured input density with independent identically distributed (i.i.d.) components
• disorder-average of the free energy requires (e.g.) the replica trick
Leipzig, June 2021 8 / 24
machine learning at high temperatures 
• a simplifying limit: high (formal) temperature
• large number of examples, with finite 𝛼 in the limit
“learn almost nothing... (high T) ...from very many examples”
• independent i.i.d. examples: the training energy is governed by the generalization error
limitations:
- training error and generalization error cannot be distinguished
- number of examples and training temperature are coupled
- (at best) qualitative agreement with low temperature results
Leipzig, June 2021 9 / 24
modelling: student teacher scenario
[Figure: adaptive student with K hidden units and teacher with M hidden units, both receiving N inputs; the teacher weights are unknown (marked “?”)]
training: minimization of
here: learnable rules, reliable data (outputs provided by teacher)
perfectly matching complexity K=M
two prototypical activation functions:
sigmoidal / ReLU in student and teacher
Leipzig, June 2021 10 / 24
thermodynamic limit, CLT for {x_i, x_j^*}
large N: Central Limit Theorem, {x_i, x_j^*} normally distributed with zero mean and a covariance matrix given by
• order parameters (macroscopic properties of the system)
• model parameters
(+ constant) independent of details (e.g. activation)
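To make the role of the order parameters concrete, the sketch below compares overlaps of weight vectors with the empirical covariance of the corresponding local fields. The definitions used here (overlaps normalized by N, fields scaled by 1/sqrt(N)) and the binary ±1 inputs are assumptions chosen for illustration; the names `student`, `teacher`, `R`, `Q`, `T` follow the usual student-teacher notation and are not taken verbatim from the slide.

```python
import math
import random

def overlap(a, b):
    """Normalized overlap a . b / N between two weight vectors."""
    return sum(x * y for x, y in zip(a, b)) / len(a)

rng = random.Random(1)
N = 200

# hypothetical teacher vector and a student vector correlated with it
teacher = [rng.gauss(0.0, 1.0) for _ in range(N)]
student = [0.6 * t + 0.8 * rng.gauss(0.0, 1.0) for t in teacher]

# order parameters (student-teacher, student-student) and model parameter
R = overlap(student, teacher)
Q = overlap(student, student)
T = overlap(teacher, teacher)

# empirical second moments of the local fields over random i.i.d. inputs
samples = 3000
sum_xx = sum_xy = sum_yy = 0.0
for _ in range(samples):
    # non-Gaussian inputs: i.i.d. components with zero mean and unit variance
    xi = [rng.choice((-1.0, 1.0)) for _ in range(N)]
    x = sum(w * c for w, c in zip(student, xi)) / math.sqrt(N)
    y = sum(w * c for w, c in zip(teacher, xi)) / math.sqrt(N)
    sum_xx += x * x
    sum_xy += x * y
    sum_yy += y * y
cov_xx = sum_xx / samples
cov_xy = sum_xy / samples
cov_yy = sum_yy / samples
```

Up to sampling noise, `cov_xx`, `cov_xy` and `cov_yy` reproduce `Q`, `R` and `T`, which is why averages over inputs can be expressed through a handful of macroscopic quantities.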
Leipzig, June 2021 11 / 24
generalization error
on average over P({x_i, x_j^*})
[D. Saad, S. Solla, 1995]
[M. Straat, 2019]
sigmoidal activation
rectified linear units
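The averages behind the generalization error can be sketched in closed form for both activations. The snippet below assumes eps_g = <(sigma - tau)^2>/2, a 1/sqrt(K) output normalization with K = M, g(z) = erf(z/sqrt(2)) for the sigmoidal case, and zero-mean Gaussian local fields; the two pair averages are the standard Gaussian integrals for erf and ReLU. The function names are hypothetical.

```python
import math

def corr_erf(cxx, cxy, cyy):
    """<g(x) g(y)> for g(z) = erf(z / sqrt(2)) and zero-mean Gaussian (x, y)."""
    return (2.0 / math.pi) * math.asin(cxy / math.sqrt((1.0 + cxx) * (1.0 + cyy)))

def corr_relu(cxx, cxy, cyy):
    """<g(x) g(y)> for g(z) = max(0, z) and zero-mean Gaussian (x, y)."""
    rho = max(-1.0, min(1.0, cxy / math.sqrt(cxx * cyy)))
    theta = math.acos(rho)
    return (math.sqrt(cxx * cyy)
            * (math.sin(theta) + (math.pi - theta) * rho) / (2.0 * math.pi))

def generalization_error(R, Q, T, corr):
    """eps_g = <(sigma - tau)^2> / 2 for K = M and 1/sqrt(K) normalization (assumed)."""
    K, M = len(Q), len(T)
    student = sum(corr(Q[i][i], Q[i][k], Q[k][k]) for i in range(K) for k in range(K))
    teacher = sum(corr(T[m][m], T[m][n], T[n][n]) for m in range(M) for n in range(M))
    cross = sum(corr(Q[i][i], R[i][m], T[m][m]) for i in range(K) for m in range(M))
    return (student + teacher - 2.0 * cross) / (2.0 * K)

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# student identical to the teacher: perfect generalization
eps_perfect_erf = generalization_error(identity, identity, identity, corr_erf)
eps_perfect_relu = generalization_error(identity, identity, identity, corr_relu)

# student uncorrelated with the teacher
eps_random_erf = generalization_error(zeros, identity, identity, corr_erf)
eps_random_relu = generalization_error(zeros, identity, identity, corr_relu)
```

With orthonormal weight vectors, the uncorrelated student gives 1/3 for the sigmoidal case and 1/2 - 1/(2π) for ReLU, while the perfectly aligned student gives zero in both cases.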
Leipzig, June 2021 12 / 24
site symmetry
simplification: orthonormal teacher vectors, isotropic input density
reflects permutation symmetry, allows for hidden unit specialization
[Equations for sigmoidal hidden units and for ReLU activations; entropy (+ constant)]
Leipzig, June 2021 13 / 24
typical learning curves
given: K, g(z), size of the training data set
solve: given 𝛼, determine (global and local) minima of the free energy
obtain learning curves: order parameters and generalization error as a function of the (scaled) training set size
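The procedure on this slide can be sketched numerically for the sigmoidal case. The snippet below assumes the site-symmetric ansatz (R on the diagonal and S off the diagonal of the student-teacher overlaps, C for the student-student cross overlaps, unit norms), orthonormal teacher vectors, g(z) = erf(z/sqrt(2)), and a high-temperature free energy of the form βf = 𝛼 K eps_g - s. The coarse grid search is a simple stand-in for a proper minimization and is not the authors' method.

```python
import math

def eps_g(R, S, C, K):
    """Site-symmetric generalization error for g(z) = erf(z / sqrt(2)), unit norms."""
    def c(overlap):
        # <g(x) g(y)> for unit-variance Gaussian fields with covariance `overlap`
        return (2.0 / math.pi) * math.asin(overlap / 2.0)
    return 0.5 * (2.0 * c(1.0) + (K - 1) * c(C) - 2.0 * c(R) - 2.0 * (K - 1) * c(S))

def entropy(R, S, C, K):
    """Entropy term up to a constant; None outside the admissible region."""
    a = 1.0 + (K - 1) * C - (R + (K - 1) * S) ** 2
    b = 1.0 - C - (R - S) ** 2
    if a <= 0.0 or b <= 0.0:
        return None
    return 0.5 * math.log(a) + 0.5 * (K - 1) * math.log(b)

def free_energy(R, S, C, K, alpha):
    """Assumed high-temperature form: beta * f = alpha * K * eps_g - s."""
    s = entropy(R, S, C, K)
    if s is None:
        return math.inf
    return alpha * K * eps_g(R, S, C, K) - s

def minimize(alpha, K=2):
    """Coarse grid search for the global minimum over (R, S, C)."""
    grid = [i / 20.0 for i in range(-19, 20)]
    best, best_f = None, math.inf
    for R in grid:
        for S in grid:
            for C in grid:
                f = free_energy(R, S, C, K, alpha)
                if f < best_f:
                    best, best_f = (R, S, C), f
    return best

# a few points of a learning curve for K = 2
curve = [(alpha, minimize(alpha)) for alpha in (0.0, 10.0, 100.0)]
```

Evaluating `eps_g` at the returned minima for a range of 𝛼 gives a rough learning curve; tracking local minima as well would require starting a local search from several initial points.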
Leipzig, June 2021 14 / 24
sigmoidal ( K = 2 )
invariance under exchange of the two hidden units
R=S: both units ~ (w_1^* + w_2^*) + noise
symmetry breaking phase transition (second order, continuous) ...
... results in a kink in the typical learning curve
Leipzig, June 2021 15 / 24
ReLU ( K = 2 )
qualitatively identical behavior
Note: the numerical values are irrelevant; the scale depends, among other things, on the pre-factor of g(z)
Physica A Vol. 564, 2021, 125517
Leipzig, June 2021 16 / 24
sigmoidal ( K > 2 )
K=5
permutation symmetry of h.u.
initial R=S phase
discontinuous jump in ε_g
coexistence of poor and good generalization
first order transition, local min. R>S competes with R=S
R>S becomes global minimum, facilitates perfect learning
“anti-specialization” S>R (overlooked in 1998!)
weak/no effect of additional anti-specialization on generalization error
Leipzig, June 2021 17 / 24
ReLU ( K > 2)
K=10
permutation symmetry of h.u.
initial R=S phase
continuous kink in ε_g
competing minima of poor* vs. good generalization
continuous phase transition
global minimum: R>S
local minimum: R<S
* pretty good
Physica A Vol. 564, 2021, 125517
Leipzig, June 2021 18 / 24
ReLU ( large K )
permutation symmetry of h.u.
initial R=S phase
specialized and anti-specialized branches achieve perfect generalization, asymptotically!
(due to partial linearity of ReLU)
continuous phase transition at
degenerate minima: R>S, R<S
Leipzig, June 2021 19 / 24
Monte Carlo simulations
continuous Metropolis, ReLU activation, K=4, N=50, β=1 (=T)
[Figure: histogram of observed R_ij, with regions labelled anti-specialized (R<S), unspecialized (R=S) and specialized (S<R)]
[Figure: gen. error vs. time, specialized and unspecialized initialization]
Leipzig, June 2021 20 / 24
Monte Carlo simulations
stationary generalization error, K=4
[Figure: two panels, sigmoidal activation and ReLU]
• large gap / high barrier between specialized and unspecialized states delays success of learning
• anti-specialized states display near optimal performance for large K
Leipzig, June 2021 21 / 24
Summary
• formal equilibrium of training at high temperature in student/teacher model situations of supervised learning
• unspecialized and partially or anti-specialized configurations compete as local/global minima of the free energy
• phase transitions with scaled number of examples:
- K=2: continuous symmetry-breaking transitions with equivalent competing states
- K>2, sigmoidal activations: first order transition with competing states of distinct generalization ability
- K>2, ReLU networks: continuous transition with competing states of similar performance
Leipzig, June 2021 22 / 24
Outlook
most important question:
which is the decisive property of the activation?
• consider various activation functions (leaky ReLU, swish ... )
• study more complex solutions beyond site-symmetry
piece-wise linear activations
[Figure: piece-wise linear “sigmoidal” activation and ReLU; labels: increasing slope, discontinuous to continuous]
Leipzig, June 2021 23 / 24
outlook (selected topics)
• replica trick / annealed approximation
- low temperatures, vary # of examples and T independently
- mismatched student/teacher networks K ≠ M
- overfitting / underfitting effects
• complementary approach:
- dynamics of stochastic gradient descent
- description in terms of ODE for order parameters
• deep networks
- many hidden layers
- tree-like architectures with uncorrelated branches
• realistic input data
- clustered / correlated data
- recent developments: Zdeborová, Mézard, Goldt et al.
Leipzig, June 2021 24 / 24
Questions?
www.cs.rug.nl/~biehl m.biehl@rug.nl
see also for: algorithm development in machine learning, applications in medicine, life sciences, astronomy …

More Related Content

What's hot

Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelstuxette
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
 
Lecture 1 test
Lecture 1 testLecture 1 test
Lecture 1 testfalcarragh
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biologytuxette
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysistuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNNtuxette
 
Robotics exploatory plans week 1
Robotics exploatory plans week 1Robotics exploatory plans week 1
Robotics exploatory plans week 1Kevin Kopec
 
Finite Element Methode (FEM) Notes
Finite Element Methode (FEM) NotesFinite Element Methode (FEM) Notes
Finite Element Methode (FEM) NotesZulkifli Yunus
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian ProcessHa Phuong
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC datatuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIRtuxette
 
Cross-view Activity Recognition using Hankelets
Cross-view Activity Recognition using HankeletsCross-view Activity Recognition using Hankelets
Cross-view Activity Recognition using HankeletsGeorge Oleinikov
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Shenghui Wang
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural NetworkAshwani Jha
 
Finite Element Method
Finite Element MethodFinite Element Method
Finite Element MethodBharat sharma
 

What's hot (20)

Convolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernelsConvolutional networks and graph networks through kernels
Convolutional networks and graph networks through kernels
 
Fem lecture
Fem lectureFem lecture
Fem lecture
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
 
Lecture 1 test
Lecture 1 testLecture 1 test
Lecture 1 test
 
[ppt]
[ppt][ppt]
[ppt]
 
Kernel methods for data integration in systems biology
Kernel methods for data integration in systems biologyKernel methods for data integration in systems biology
Kernel methods for data integration in systems biology
 
Investigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysisInvestigating the 3D structure of the genome with Hi-C data analysis
Investigating the 3D structure of the genome with Hi-C data analysis
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
Robotics exploatory plans week 1
Robotics exploatory plans week 1Robotics exploatory plans week 1
Robotics exploatory plans week 1
 
Finite Element Methode (FEM) Notes
Finite Element Methode (FEM) NotesFinite Element Methode (FEM) Notes
Finite Element Methode (FEM) Notes
 
010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process010_20160216_Variational Gaussian Process
010_20160216_Variational Gaussian Process
 
Differential analyses of structures in HiC data
Differential analyses of structures in HiC dataDifferential analyses of structures in HiC data
Differential analyses of structures in HiC data
 
Lecture 13 modeling_errors_and_accuracy
Lecture 13 modeling_errors_and_accuracyLecture 13 modeling_errors_and_accuracy
Lecture 13 modeling_errors_and_accuracy
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
Nonnegative Matrix Factorization with Side Information for Time Series Recove...
Nonnegative Matrix Factorization with Side Information for Time Series Recove...Nonnegative Matrix Factorization with Side Information for Time Series Recove...
Nonnegative Matrix Factorization with Side Information for Time Series Recove...
 
Fem presentation
Fem presentationFem presentation
Fem presentation
 
Cross-view Activity Recognition using Hankelets
Cross-view Activity Recognition using HankeletsCross-view Activity Recognition using Hankelets
Cross-view Activity Recognition using Hankelets
 
Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning Similarity Features, and their Role in Concept Alignment Learning
Similarity Features, and their Role in Concept Alignment Learning
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural Network
 
Finite Element Method
Finite Element MethodFinite Element Method
Finite Element Method
 

Similar to The statistical physics of learning revisted: Phase transitions in layered neural networks

Solution of a subclass of lane emden differential equation by variational ite...
Solution of a subclass of lane emden differential equation by variational ite...Solution of a subclass of lane emden differential equation by variational ite...
Solution of a subclass of lane emden differential equation by variational ite...Alexander Decker
 
11.solution of a subclass of lane emden differential equation by variational ...
11.solution of a subclass of lane emden differential equation by variational ...11.solution of a subclass of lane emden differential equation by variational ...
11.solution of a subclass of lane emden differential equation by variational ...Alexander Decker
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSteffen Staab
 
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...University of Groningen
 
11.[36 49]solution of a subclass of lane emden differential equation by varia...
11.[36 49]solution of a subclass of lane emden differential equation by varia...11.[36 49]solution of a subclass of lane emden differential equation by varia...
11.[36 49]solution of a subclass of lane emden differential equation by varia...Alexander Decker
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...IJDKP
 
Lecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningLecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningDanielSchwalbeKoda
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Universitat Politècnica de Catalunya
 
Wasserstein 1031 thesis [Chung il kim]
Wasserstein 1031 thesis [Chung il kim]Wasserstein 1031 thesis [Chung il kim]
Wasserstein 1031 thesis [Chung il kim]Chung-Il Kim
 
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdf
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdfCalculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdf
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdfCONSTRUCTION WORLD SOLUTION
 
An Experimental Setup for Teaching Newton's Law of Cooling
An Experimental Setup for Teaching Newton's Law of Cooling An Experimental Setup for Teaching Newton's Law of Cooling
An Experimental Setup for Teaching Newton's Law of Cooling inventionjournals
 
lecture1-230501075743-146580ac computational chemistry .ppt
lecture1-230501075743-146580ac computational chemistry .pptlecture1-230501075743-146580ac computational chemistry .ppt
lecture1-230501075743-146580ac computational chemistry .pptDrSyedZulqarnainHaid
 
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
The Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistanceThe Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistancePhD Assistance
 
ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016InVID Project
 
chap4_ann.pptx
chap4_ann.pptxchap4_ann.pptx
chap4_ann.pptxImXaib
 
A First Course In With Applications Complex Analysis
A First Course In With Applications Complex AnalysisA First Course In With Applications Complex Analysis
A First Course In With Applications Complex AnalysisElizabeth Williams
 
FEM Lecture.ppt
FEM Lecture.pptFEM Lecture.ppt
FEM Lecture.pptjuzaila
 

Similar to The statistical physics of learning revisted: Phase transitions in layered neural networks (20)

Solution of a subclass of lane emden differential equation by variational ite...
Solution of a subclass of lane emden differential equation by variational ite...Solution of a subclass of lane emden differential equation by variational ite...
Solution of a subclass of lane emden differential equation by variational ite...
 
11.solution of a subclass of lane emden differential equation by variational ...
11.solution of a subclass of lane emden differential equation by variational ...11.solution of a subclass of lane emden differential equation by variational ...
11.solution of a subclass of lane emden differential equation by variational ...
 
Symbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine LearningSymbolic Background Knowledge for Machine Learning
Symbolic Background Knowledge for Machine Learning
 
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
 
11.[36 49]solution of a subclass of lane emden differential equation by varia...
11.[36 49]solution of a subclass of lane emden differential equation by varia...11.[36 49]solution of a subclass of lane emden differential equation by varia...
11.[36 49]solution of a subclass of lane emden differential equation by varia...
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
Lecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine LearningLecture: Interatomic Potentials Enabled by Machine Learning
Lecture: Interatomic Potentials Enabled by Machine Learning
 
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
Life-long / Incremental Learning (DLAI D6L1 2017 UPC Deep Learning for Artifi...
 
Wasserstein 1031 thesis [Chung il kim]
Wasserstein 1031 thesis [Chung il kim]Wasserstein 1031 thesis [Chung il kim]
Wasserstein 1031 thesis [Chung il kim]
 
cb
cbcb
cb
 
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdf
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdfCalculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdf
Calculus_Early_Transcendentals,_second_Edition,_by_Sullivan_and.pdf
 
An Experimental Setup for Teaching Newton's Law of Cooling
An Experimental Setup for Teaching Newton's Law of Cooling An Experimental Setup for Teaching Newton's Law of Cooling
An Experimental Setup for Teaching Newton's Law of Cooling
 
lecture1.ppt
lecture1.pptlecture1.ppt
lecture1.ppt
 
lecture1-230501075743-146580ac computational chemistry .ppt
lecture1-230501075743-146580ac computational chemistry .pptlecture1-230501075743-146580ac computational chemistry .ppt
lecture1-230501075743-146580ac computational chemistry .ppt
 
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
Lifelong / Incremental Deep Learning - Ramon Morros - UPC Barcelona 2018
 
The Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - PhdassistanceThe Advancement and Challenges in Computational Physics - Phdassistance
The Advancement and Challenges in Computational Physics - Phdassistance
 
ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016
 
chap4_ann.pptx
chap4_ann.pptxchap4_ann.pptx
chap4_ann.pptx
 
A First Course In With Applications Complex Analysis
A First Course In With Applications Complex AnalysisA First Course In With Applications Complex Analysis
A First Course In With Applications Complex Analysis
 
FEM Lecture.ppt
FEM Lecture.pptFEM Lecture.ppt
FEM Lecture.ppt
 

More from University of Groningen

Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024University of Groningen
 
Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...University of Groningen
 
Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)University of Groningen
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...University of Groningen
 
2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... 2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... University of Groningen
 
Prototype-based classifiers and their applications in the life sciences
Prototype-based classifiers and their applications in the life sciencesPrototype-based classifiers and their applications in the life sciences
Prototype-based classifiers and their applications in the life sciencesUniversity of Groningen
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learningUniversity of Groningen
 
2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...University of Groningen
 
2013: Prototype-based learning and adaptive distances for classification
2013: Prototype-based learning and adaptive distances for classification2013: Prototype-based learning and adaptive distances for classification
2013: Prototype-based learning and adaptive distances for classificationUniversity of Groningen
 
2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...University of Groningen
 
2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain DataUniversity of Groningen
 
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell CarcinomaUniversity of Groningen
 
June 2017: Biomedical applications of prototype-based classifiers and relevan...
June 2017: Biomedical applications of prototype-based classifiers and relevan...June 2017: Biomedical applications of prototype-based classifiers and relevan...
June 2017: Biomedical applications of prototype-based classifiers and relevan...University of Groningen
 
January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning  January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning University of Groningen
 

More from University of Groningen (20)

Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
 
ESE-Eyes-2023.pdf
ESE-Eyes-2023.pdfESE-Eyes-2023.pdf
ESE-Eyes-2023.pdf
 
APPIS-FDGPET.pdf
APPIS-FDGPET.pdfAPPIS-FDGPET.pdf
APPIS-FDGPET.pdf
 
stat-phys-appis-reduced.pdf
stat-phys-appis-reduced.pdfstat-phys-appis-reduced.pdf
stat-phys-appis-reduced.pdf
 
prototypes-AMALEA.pdf
prototypes-AMALEA.pdfprototypes-AMALEA.pdf
prototypes-AMALEA.pdf
 
stat-phys-AMALEA.pdf
stat-phys-AMALEA.pdfstat-phys-AMALEA.pdf
stat-phys-AMALEA.pdf
 
Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...
 
Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)
 
Biehl hanze-2021
Biehl hanze-2021Biehl hanze-2021
Biehl hanze-2021
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...
 
2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... 2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ...
 
Prototype-based classifiers and their applications in the life sciences
The statistical physics of learning revisited: Phase transitions in layered neural networks

  • 1. Leipzig, June 2021 1 / 24 The statistical physics of learning revisited: Phase transitions in layered neural networks Elisa Oostwal Michiel Straat Michael Biehl Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence University of Groningen / NL Physica A Vol. 564, 2021, 125517 (open access) Hidden unit specialization in layered neural networks: ReLU vs. sigmoidal activation
  • 2. the revival of neural networks: success of multi-layered neural networks (Deep Learning) • availability of large amounts of training data • increased computational power • improved training procedures and set-ups • task-specific network designs, e.g. activation functions; many open questions / lack of theoretical understanding
  • 3. statistical physics of learning: statistical physics of neural networks; training of feed-forward neural networks: Elizabeth Gardner (1957-1988). The space of interactions in neural networks. J. Phys. A 21:257-270, 1988; dynamics of attractor neural networks: John Hopfield. Neural networks and physical systems with emergent collective computational abilities. PNAS 79(8):2554-2558, 1982; a successful branch of learning theory: 1991, 2001, 2011
  • 4. statistical physics of learning
  • 5. example: a shallow neural network (soft committee machine): N units: high-dim. input; K hidden units with activation; linear output; input/output function defined by • architecture, connectivity, activation functions • adaptive weights • regression: learning from example data, e.g. (↑ target function)
  • 6. statistical physics of learning in a nutshell: objective/cost/energy function with • equilibrium state: compromise/competition between minimal energy (ground state) vs. number (volume) of available states with higher energy • e.g. Metropolis algorithm, noisy gradient descent (Langevin) with equilibrium (Gibbs-Boltzmann); control parameter: "inverse temperature" β = 1/T • training by stochastic optimization of all adaptive weights; "thermal averages" over Peq, e.g. minima of free energy; microcanonical entropy:
  • 7. machine learning specifics • energy function is given for a specific set of example data: defined w.r.t. • typical properties: additional average of the free energy over (frozen disorder); difficult, even for the simplest model density: with independent identically distributed (i.i.d.) unstructured input density • disorder-average of the free energy requires (e.g.) replica trick
  • 8. machine learning at high temperatures • a simplifying limit: high (formal) temperature with finite "learn almost nothing... (high T) ...from very many examples" • independent i.i.d. examples: generalization error; limitations: - training error and generalization error cannot be distinguished - number of examples and training temperature are coupled - (at best) qualitative agreement with low temperature results • large number of examples: , in the limit
  • 9. modelling: student teacher scenario (figure: adaptive student with N inputs and K hidden units; teacher with M hidden units); training: minimization of; here: learnable rules, reliable data (outputs provided by teacher), perfectly matching complexity K=M; two prototypical activation functions: sigmoidal / ReLU in student and teacher
  • 10. thermodynamic limit, CLT: large N: Central Limit Theorem, for normally distributed with zero mean and covariance matrix; order parameters: macroscopic properties of the system; model parameters: (+ constant) independent of details (e.g. activation)
  • 11. generalization error: on average over P({x_i, x_j^*}); sigmoidal activation [D. Saad, S. Solla, 1995]; rectified linear units [M. Straat, 2019]
  • 12. site symmetry: simplification: orthonormal teacher vectors, isotropic input density; reflects permutation symmetry, allows for hidden unit specialization; sigmoidal hidden units / ReLU activations; entropy (+ constant)
  • 13. typical learning curves: given: K, g(z), size of the training data set; given α, determine (global and local) minima of; solve: ; obtain learning curves: order parameters and generalization error as a function of the (scaled) training set size
  • 14. sigmoidal ( K = 2 ): invariance under exchange of the two hidden units; R=S: both units ~ (w1* + w2*) + noise; symmetry breaking phase transition (second order, continuous) ... results in a kink in the typical learning curve
  • 15. ReLU ( K = 2 ): qualitatively identical behavior; Note: num. values of and/or are irrelevant, scale depends a.o. on pre-factor of g(z); Physica A Vol. 564, 2021, 125517
  • 16. sigmoidal ( K > 2 ), K=5: permutation symmetry of h.u., initial R=S phase; first order transition, local min. R>S competes with R=S; discontinuous jump in ε_g; coexistence of poor and good generalization; R>S becomes global minimum, facilitates perfect learning; "anti-specialization" S>R (overlooked in 1998!); weak/no effect of additional anti-specialization on generalization error
  • 17. ReLU ( K > 2 ), K=10: permutation symmetry of h.u., initial R=S phase; continuous phase transition; continuous kink in ε_g; competing minima of poor* vs. good generalization (* pretty good); global minimum: R>S; local minimum: R<S; Physica A Vol. 564, 2021, 125517
  • 18. ReLU ( large K ): permutation symmetry of h.u., initial R=S phase; specialized and anti-specialized branch achieve perfect generalization, asymptotically! (due to partial linearity of ReLU); continuous phase transition at degenerate minima: R>S, R<S
  • 19. Monte Carlo simulations: continuous Metropolis, ReLU activation, K=4, N=50, β=1 (=T); gen. error vs. time, specialized and unspecialized initialization; histogram of observed Rij: anti-specialized, specialized, unspecialized (R=S)
  • 20. Monte Carlo simulations, K=4: stationary generalization error: sigmoidal activation: large gap / high barrier between specialized and unspecialized states delays success of learning; ReLU: anti-specialized states display near optimal performance for large K
  • 21. Summary • formal equilibrium of training at high temperature in student/teacher model situations of supervised learning • unspecialized and partially or anti-specialized configurations compete as local/global minima of the free energy • phase transitions with scaled number of examples: - K=2: continuous symmetry-breaking transitions with equivalent competing states - K>2, sigmoidal activations: first order transition with competing states of distinct generalization ability - K>2, ReLU networks: continuous transition with competing states of similar performance
  • 22. Outlook: most important question: which is the decisive property of the activation? • consider various activation functions (leaky ReLU, swish ... ) • study more complex solutions beyond site-symmetry; piece-wise linear activations: piece-wise linear "sigmoidal" activation vs. ReLU; increasing slope: discontinuous to continuous
  • 23. outlook (selected topics) • replica trick / annealed approximation - low temperatures, vary # of examples and T independently - mismatched student/teacher networks K ≠ M - overfitting / underfitting effects • complementary approach: - dynamics of stochastic gradient descent - description in terms of ODE for order parameters • deep networks - many hidden layers - tree-like architectures with uncorrelated branches • realistic input data - clustered / correlated data - recent developments: Zdeborova, Mezard, Goldt et al.
  • 24. Questions? m.biehl@rug.nl; see also www.cs.rug.nl/~biehl for: algorithm development in machine learning; applications in medicine, life sciences, astronomy …
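The student/teacher set-up of slides 5 and 9 to 11 can be illustrated with a short sketch. The formulas themselves did not survive in the transcript, so everything below is an assumption made for illustration: local fields are scaled by 1/√N, the output by 1/√K, the sigmoidal activation is taken to be erf(z/√2), and all function names are hypothetical. The generalization error is estimated by plain Monte Carlo sampling over i.i.d. Gaussian inputs; this is a stand-in for, not a reproduction of, the closed-form averages cited on slide 11.

```python
import math
import numpy as np

def relu(z):
    # rectified linear unit
    return np.maximum(0.0, z)

_erf = np.vectorize(math.erf)

def sigmoidal(z):
    # assumed sigmoidal activation: erf(z / sqrt(2))
    return _erf(z / np.sqrt(2.0))

def scm_output(W, X, g):
    """Soft committee machine: K hidden units, linear output.
    W has shape (K, N), X has shape (P, N); returns shape (P,).
    The 1/sqrt(N) and 1/sqrt(K) scalings are assumptions."""
    K, N = W.shape
    fields = X @ W.T / np.sqrt(N)          # local fields, shape (P, K)
    return g(fields).sum(axis=1) / np.sqrt(K)

def order_parameters(W, B):
    """Overlaps for student weights W (K, N) and teacher weights B (M, N):
    R (student-teacher), Q (student-student), T (teacher-teacher)."""
    N = W.shape[1]
    return W @ B.T / N, W @ W.T / N, B @ B.T / N

def orthonormal_teacher(M, N, rng):
    # teacher vectors with B_m . B_n = N * delta_mn (requires N >= M)
    Qmat, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return np.sqrt(N) * Qmat.T

def mc_generalization_error(W, B, g, n_samples, rng):
    # Monte Carlo estimate of 1/2 <(student - teacher)^2> over N(0, 1) inputs
    X = rng.standard_normal((n_samples, W.shape[1]))
    diff = scm_output(W, X, g) - scm_output(B, X, g)
    return 0.5 * np.mean(diff ** 2)
```

A typical use would draw a teacher with `orthonormal_teacher`, initialize a random student of the same size (K=M), and track `order_parameters` together with `mc_generalization_error` while the student is trained.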
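The site-symmetric simplification of slide 12 and the phase labels used on slides 14 to 19 can be written down compactly. The sketch below assumes the student-teacher overlap matrix takes the value R on the diagonal and S off the diagonal, which is how R and S are used on the later slides; the function names and the tolerance are hypothetical.

```python
import numpy as np

def site_symmetric_overlaps(K, R, S):
    # overlap matrix with R on the diagonal and S everywhere else
    return S * np.ones((K, K)) + (R - S) * np.eye(K)

def phase(R, S, tol=1e-6):
    # labels follow the slides: R=S unspecialized, R>S specialized,
    # R<S anti-specialized
    if abs(R - S) <= tol:
        return "unspecialized"
    return "specialized" if R > S else "anti-specialized"
```

With this ansatz the K x K overlap matrix is described by two numbers only, which is what makes the minimization over order parameters tractable for general K.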
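Slides 6 and 19 mention Metropolis sampling of the student weights at inverse temperature β. The following is a minimal sketch of such a sampler, not the simulation code behind the slides: the quadratic training energy, the Gaussian single-unit proposal, the step size, and the absence of any weight normalization are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def scm_output(W, X):
    # ReLU soft committee machine; scalings are assumptions
    K, N = W.shape
    return relu(X @ W.T / np.sqrt(N)).sum(axis=1) / np.sqrt(K)

def energy(W, X, y):
    # assumed training energy: sum of quadratic deviations over all examples
    return 0.5 * np.sum((scm_output(W, X) - y) ** 2)

def metropolis(W0, X, y, beta, n_steps, step_size, rng):
    """Metropolis sampling with continuous weights.
    Each step perturbs one randomly chosen hidden unit with Gaussian noise
    and accepts the move with probability min(1, exp(-beta * dE))."""
    W = W0.copy()
    E = energy(W, X, y)
    energies = [E]
    for _ in range(n_steps):
        k = rng.integers(W.shape[0])
        W_new = W.copy()
        W_new[k] += step_size * rng.standard_normal(W.shape[1])
        E_new = energy(W_new, X, y)
        # downhill moves are always accepted, uphill moves with Boltzmann weight
        if E_new <= E or rng.random() < np.exp(-beta * (E_new - E)):
            W, E = W_new, E_new
        energies.append(E)
    return W, np.array(energies)
```

Running the sampler from a specialized and from an unspecialized initial student, and recording the overlaps with the teacher along the way, would give the kind of comparison described on slide 19.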