MiWoCI IEEE 2018 1
The statistical physics of learning - revisited
www.cs.rug.nl/~biehl
Michael Biehl
Bernoulli Institute for
Mathematics, Computer Science
and Artificial Intelligence
University of Groningen / NL
MiWoCI IEEE 2018 2
machine learning theory ?
Computational Learning Theory
performance bounds & guarantees
independent of
- specific task
- statistical properties of data
- details of the training
...
Statistical Physics of Learning:
typical properties & phenomena
for models of specific
- systems/network architectures
- statistics of data and noise
- training algorithms / cost functions
...
MiWoCI IEEE 2018 3
A Neural Networks timeline
[timeline figure: Perceptrons (Minsky & Papert), Widrow & Hoff: Adaline, SOM, LVQ, SVM, ...]
www.ibm.com/developerworks/library/cc-cognitive-neural-networks-deep-dive/
mathematical analogies with the theory of disordered magnetic materials
statistical physics
- of network dynamics (neurons)
- of learning processes (weights)
MiWoCI IEEE 2018 4
news from the stone age of neural networks
Statistical Physics of Neural Networks: Two ground-breaking papers
Training, feed-forward networks:
Elizabeth Gardner (1957-1988).
The space of interactions in neural
networks. J. Phys. A 21:257-270 (1988)
Dynamics, attractor neural networks:
John Hopfield. Neural Networks and
physical systems with emergent
collective computational abilities.
PNAS 79(8):2554-2558 (1982)
MiWoCI IEEE 2018 5
overview
From stochastic optimization: Monte Carlo, Langevin dynamics
.... to thermal equilibrium: temperature, free energy, entropy, ...
(.... and back): formal application to optimization
Machine learning: typical properties of large learning systems
training: stochastic optimization of (many) weights,
guided by a data-dependent cost function
randomized data ( frozen disorder )
models: student/teacher scenarios
analysis: order parameters, disorder average, replica trick,
annealed approximation, high temperature limit
Examples: perceptron classifier, “Ising” perceptron, layered networks
Outlook
MiWoCI IEEE 2018 6
stochastic optimization
objective/cost/energy function H(W) with many degrees of freedom W = (w_1, w_2, ..., w_N),
discrete (e.g. w_j = ±1) or continuous (e.g. W ∈ R^N)
Metropolis algorithm
• suggest a (small) change W → W', e.g. a „single spin flip“ w_j → -w_j for a random j
• compute ΔH = H(W') - H(W)
• acceptance of the change
- always if ΔH ≤ 0
- with probability exp(-β ΔH) if ΔH > 0
β controls the acceptance rate for „uphill“ moves
Langevin dynamics
• continuous temporal change, „noisy gradient descent“:
dW/dt = -∇H(W) + η(t)
• with delta-correlated white noise η(t)
(spatial + temporal independence)
the noise strength (temperature) controls the noise level, i.e.
the random deviation from the gradient
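To make the update rule concrete, here is a minimal sketch of the Metropolis single-spin-flip dynamics in Python; the quadratic random-coupling energy H(W) is only an illustrative stand-in for whatever cost function is being optimized, and N, beta and the run length are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

N = 100
C = rng.normal(size=(N, N)) / np.sqrt(N)         # toy random couplings (illustrative stand-in)
C = (C + C.T) / 2

def H(w):
    """toy energy function over binary degrees of freedom w_j = +/-1"""
    return -0.5 * w @ C @ w

def metropolis_step(w, beta):
    """single 'spin flip' Metropolis update at inverse temperature beta"""
    j = rng.integers(N)                          # suggest a small change: flip one random component
    w_new = w.copy()
    w_new[j] *= -1
    dH = H(w_new) - H(w)                         # compute the change of the energy
    # accept always if dH <= 0, with probability exp(-beta*dH) for "uphill" moves
    if dH <= 0 or rng.random() < np.exp(-beta * dH):
        return w_new
    return w

w = rng.choice([-1.0, 1.0], size=N)
beta = 2.0
for t in range(20000):
    w = metropolis_step(w, beta)
print("final energy per degree of freedom:", H(w) / N)

The Langevin variant for continuous weights would instead update w ← w - η ∇H(w) + noise, with the noise variance set by the temperature.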
MiWoCI IEEE 2018 7
thermal equilibrium
Markov chain (Metropolis) / continuous dynamics (Langevin)
stationary density of configurations:
P(W) = exp(-β H(W)) / Z   (Gibbs-Boltzmann density of states)
normalization: Z = Σ_W exp(-β H(W))   „Zustandssumme“, partition function
(for continuous W: an integral over all configurations)
• physics: thermal equilibrium of a physical system at temperature T = 1/β
• optimization: formal equilibrium situation, control parameter T
note: additional constraints
can be imposed on the weights,
for instance: a fixed norm of the weight vector
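As a sanity check of the Gibbs-Boltzmann picture, the following sketch enumerates all 2^N configurations of a small toy system (a random-field energy, purely illustrative), computes the partition function Z and the corresponding Boltzmann probabilities:

import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 10                                           # small enough to enumerate all 2^N configurations
h = rng.normal(size=N)                           # toy random fields (illustrative stand-in)

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
energies = -states @ h                           # toy energy H(W) = -sum_j h_j w_j

beta = 1.0
Z = np.sum(np.exp(-beta * energies))             # partition function ("Zustandssumme")
p = np.exp(-beta * energies) / Z                 # Gibbs-Boltzmann probabilities of the states

print("Z =", Z)
print("probabilities sum to", p.sum())           # = 1 by construction
print("<H>_T =", np.sum(p * energies))           # thermal average of the energy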
MiWoCI IEEE 2018 8
thermal averages and entropy
the role of Z: thermal averages <...>_T in equilibrium, e.g. the mean energy <H>_T,
... can be expressed as derivatives of ln Z, for instance <H>_T = -∂ ln Z / ∂β
assume extensive energy, proportional to the system size N: H = N e
(microcanonical) entropy per degree of freedom:
s(e) = (1/N) ln [ vol. of states with energy E = N e ]
re-write Z as an integral over all possible energies:
Z = ∫ de exp( N [ s(e) - β e ] )
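Continuing the same toy enumeration as above (same illustrative random-field energy), one can verify numerically that the thermal average <H>_T equals -∂ ln Z/∂β, here approximated by a central finite difference:

import itertools
import numpy as np

rng = np.random.default_rng(1)
N = 10
h = rng.normal(size=N)                           # same toy system as above

states = np.array(list(itertools.product([-1, 1], repeat=N)), dtype=float)
energies = -states @ h

def lnZ(beta):
    return np.log(np.sum(np.exp(-beta * energies)))

beta, d = 1.0, 1e-5
p = np.exp(-beta * energies); p /= p.sum()

direct   = np.sum(p * energies)                            # <H>_T from the Boltzmann weights
from_lnZ = -(lnZ(beta + d) - lnZ(beta - d)) / (2 * d)      # -d lnZ / d beta (finite difference)

print(direct, from_lnZ)                          # the two numbers should agree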
MiWoCI IEEE 2018 9
Darwin-Fowler, aka saddle point integration
the integrand exp( N [ s(e) - β e ] ) is a function with a sharp maximum in e;
consider the thermodynamic limit N → ∞:
(1/N) ln Z = max_e [ s(e) - β e ] = -β min_e f(e),
i.e. (1/N) ln Z is given by the minimum of the free energy (per degree of freedom)
f(e) = e - s(e) / β
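A small numerical illustration of the saddle-point (Laplace) argument: for an assumed, purely illustrative entropy function s(e) = -e²/2, the quantity (1/N) ln ∫ de exp(N[s(e) - βe]) approaches max_e [s(e) - βe] as N grows:

import numpy as np

beta = 0.7

def s(e):
    return -0.5 * e**2                           # toy entropy function (purely illustrative)

e = np.linspace(-10.0, 10.0, 200001)
de = e[1] - e[0]
saddle = np.max(s(e) - beta * e)                 # max_e [ s(e) - beta*e ] = -beta * f

def lnZ_over_N(N):
    expo = N * (s(e) - beta * e)
    m = expo.max()                               # shift the exponent to avoid overflow
    return (m + np.log(np.sum(np.exp(expo - m)) * de)) / N

for N in (10, 100, 1000, 10000):
    print(N, lnZ_over_N(N), "->", saddle)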
MiWoCI IEEE 2018 10
free energy and temperature
in large systems (thermodynamic limit) ln Z is dominated by
the states with minimal free energy f = e - s(e)/β
T = 1/β is the temperature at which <H>_T = N e_o, with e_o the energy minimizing f
T controls the competition between - smaller energies
- larger number of available states
T → 0: singles out the lowest energy (groundstate)
Metropolis: only down-hill moves, Langevin: true gradient descent
T → ∞: all states occur with equal probability, independent of energy
Metropolis: accept all random changes
Langevin: the noise term overwhelms the gradient
assumption: ergodicity (all states can be reached in the dynamics)
MiWoCI IEEE 2018 11
statistical physics & optimization
theory of stochastic optimization by means of statistical physics:
- development of algorithms
(e.g. Simulated Annealing)
- analysis of problem properties, even
in absence of practical algorithms
(number of groundstates, minima, ...)
- applicable in many different
contexts, universality
MiWoCI IEEE 2018 12
machine learning
special case machine learning: the adaptive degrees of freedom W are, e.g., all weights
in a neural network, prototype components in LVQ, centers in an RBF network, ...
cost function, defined w.r.t. a given data set D = {x^μ, σ^μ}:
H(W) = Σ_μ ε(W; x^μ, σ^μ)
a sum over examples with feature vectors x^μ and target labels σ^μ (if supervised);
ε(...) is a cost or error measure per example, e.g. the number of misclassifications
training:
• consider the weights as the outcome of a stochastic optimization process
• formal (thermal) equilibrium given by P(W) ∝ exp(-β H(W))
• < ... >_T : thermal average over the training process for a particular data set
MiWoCI IEEE 2018 13
quenched average over training data
• note: the energy/cost function is defined for one particular data set D;
typical properties are obtained by an additional average over randomized data
• typical properties on average over randomized data sets: derivatives of the
quenched free energy ~ < ln Z >_D yield averages of the form < < ... >_T >_D
• the simplest assumption: i.i.d. input vectors x^μ with i.i.d. components
• training labels given by a target function, σ^μ = f(x^μ),
for instance provided by a teacher network
• student / teacher scenarios
control the complexity of the target rule and of the learning system;
analyse training by (stochastic) optimization
[figure: student network with unknown (?) weights learning from a teacher network]
MiWoCI IEEE 2018 14
average over training data: the „replica trick“
< ln Z >_D = lim_{n→0} ( < Z^n >_D - 1 ) / n
Z^n: n non-interacting „copies“ of the system (replicas);
the quenched average < Z^n >_D introduces effective interactions between the replicas
... saddle point integration for < Z^n >_D, quenched free energy;
requires analytic continuation to n → 0
mathematical subtleties, replica symmetry-breaking,
order parameter functions, ...
Marc Mezard, Giorgio Parisi, Miguel Virasoro.
Spin Glass Theory and Beyond. World Scientific (1987)
MiWoCI IEEE 2018 15
annealed approximation and high-T limit
annealed approximation: < ln Z >_D ≈ ln < Z >_D
becomes exact (=) in the high-temperature limit β → 0 (replicas decouple)
• independent single examples: < Z >_D factorizes into identical single-example averages
(average in the exponent for β ≈ 0)
• extensive number of examples: P = α N (prop. to the number of weights)
• high-T limit: β → 0 and α → ∞ with α̃ = β α finite
“ learn almost nothing... ” (high T )
“ ...from infinitely many examples ” (α → ∞)
• saddle point integration: < ln Z >_D / N is dominated by the minimum of the free energy;
the generalization error plays the role
of the energy (in place of the training error)
MiWoCI IEEE 2018 16
example: perceptron training
• student: S(ξ) = sign( J · ξ ), weight vector J ∈ R^N
• teacher: S_T(ξ) = sign( B · ξ ), weight vector B ∈ R^N
• training data: D = { ξ^μ, σ^μ = sign( B · ξ^μ ) }, with independent components ξ_j^μ of
zero mean and unit variance
• Central Limit Theorem (CLT), for large N : the fields x = J · ξ and y = B · ξ are (jointly)
normally distributed with zero means; their covariance is determined by the overlap
R = J · B / ( |J| |B| ), which fully specifies the joint density
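A quick numerical check of the CLT statement, under illustrative choices (unit-length J and B, a prescribed overlap R_target): for i.i.d. inputs, the student and teacher fields come out approximately Gaussian with zero means, unit variances and correlation R.

import numpy as np

rng = np.random.default_rng(2)
N, P = 500, 20000
R_target = 0.6                                   # prescribed student-teacher overlap (illustrative)

B = rng.normal(size=N); B /= np.linalg.norm(B)   # teacher direction (normalized for convenience)
v = rng.normal(size=N); v -= (v @ B) * B; v /= np.linalg.norm(v)
J = R_target * B + np.sqrt(1 - R_target**2) * v  # student direction with overlap R_target

xi = rng.normal(size=(P, N))                     # inputs with i.i.d. zero-mean, unit-variance components
x, y = xi @ J, xi @ B                            # student and teacher fields (|J| = |B| = 1)

print("means:", x.mean(), y.mean())              # both ~ 0
print("vars :", x.var(), y.var())                # both ~ 1
print("corr :", np.corrcoef(x, y)[0, 1], "  vs  R =", R_target)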
MiWoCI IEEE 2018 17
example: perceptron training
i.i.d. isotropic data; the generalization error follows from the Gaussian field statistics
• or, more intuitively, from the geometry: the student disagrees with the teacher whenever
ξ falls into the wedge between the hyperplanes orthogonal to J and B, i.e. with a
probability set by the angle between the two vectors:
ε_g = (1/π) arccos(R)   with the order parameter R = J · B / ( |J| |B| )
[figure: geometric picture of the vectors J and B and the region of disagreement]
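The geometric formula can be checked directly by Monte Carlo; the construction of a student with a prescribed overlap R to the teacher is again an illustrative choice:

import numpy as np

rng = np.random.default_rng(3)
N, P, R = 200, 50000, 0.8                        # dimensions and target overlap (illustrative)

B = rng.normal(size=N); B /= np.linalg.norm(B)   # teacher
v = rng.normal(size=N); v -= (v @ B) * B; v /= np.linalg.norm(v)
J = R * B + np.sqrt(1 - R**2) * v                # student with overlap R to the teacher

xi = rng.normal(size=(P, N))                     # i.i.d. isotropic inputs
disagree = np.mean(np.sign(xi @ J) != np.sign(xi @ B))

print("Monte Carlo eps_g :", disagree)
print("arccos(R)/pi      :", np.arccos(R) / np.pi)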
MiWoCI IEEE 2018 18
example: perceptron training
• entropy:
- all weights with order parameter R (and fixed norm) lie on a hypersphere
with radius ~ (1-R²)^{1/2}, volume ~ (1-R²)^{N/2}, hence
s(R) = (1/2) ln(1-R²)   (+ irrelevant constants)
- or: exponential representation of the δ-functions + saddle point integration...
note: the result carries over to more general order parameter matrices C (many students and teachers)
• re-scaled number of examples α̃ = β α
• high-T free energy (per weight, up to constants):
β f(R) = α̃ ε_g(R) - s(R)
MiWoCI IEEE 2018 19
example: perceptron training
• “physical state”: (arg-)minimum of β f(R) with respect to R
• typical learning curves: R(α̃) and ε_g(α̃) follow from the location of the minimum
[figure: f(R) for increasing α̃ and the resulting learning curve]
• perfect generalization (R → 1, ε_g → 0) is achieved asymptotically for α̃ → ∞
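A sketch of the resulting high-temperature learning curve, minimizing the scaled free energy f(R) = α̃ · arccos(R)/π - ½ ln(1-R²) numerically over R; constants and normalizations follow the conventions assumed above, not necessarily those of the original slides.

import numpy as np

def eps_g(R):
    return np.arccos(R) / np.pi                  # generalization error of the perceptron

def f(R, alpha_t):
    # scaled high-T free energy: "energy" term (generalization error) minus the entropy
    # of the hypersphere of weights with fixed overlap R (irrelevant constants dropped)
    return alpha_t * eps_g(R) - 0.5 * np.log(1.0 - R**2)

R_grid = np.linspace(1e-4, 1 - 1e-7, 200000)
for alpha_t in (0.5, 1, 2, 5, 10, 20):
    R_star = R_grid[np.argmin(f(R_grid, alpha_t))]          # "physical state": arg-min of f
    print(f"alpha~ = {alpha_t:5.1f}   R = {R_star:.4f}   eps_g = {eps_g(R_star):.4f}")

The grid minimum can be checked against the stationarity condition R/√(1-R²) = α̃/π that follows from this particular f: R approaches 1 and ε_g decays smoothly, roughly like 1/α̃, for large α̃.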
MiWoCI IEEE 2018 20
perceptron learning curve
a very simple model:
- linearly separable rule (teacher)
- i.i.d. isotropic random data
- high temperature stochastic training,
with perfect generalization for α̃ → ∞
Modifications/extensions:
- noisy data, unlearnable rules
- low-T results (annealed, replica...)
- unsupervised learning
- structured input data (clusters)
- large margin perceptron and SVM
- variational optimization of energy
function (i.e. training algorithm)
- binary weights (“Ising Perceptron”)
typical learning curve,
on average over random
linearly separable data sets
of a given size
MiWoCI IEEE 2018 21
example: Ising perceptron
• student: binary weights J_j ∈ {-1, +1}
• teacher: binary weights B_j ∈ {-1, +1}
• generalization error unchanged: ε_g = (1/π) arccos(R), with R = J · B / N
• entropy: (1±R)/2 is the probability for alignment/misalignment of a weight component;
s(R) is the entropy of mixing N(1+R)/2 aligned and N(1-R)/2 misaligned components:
s(R) = - [ (1+R)/2 ln( (1+R)/2 ) + (1-R)/2 ln( (1-R)/2 ) ]
MiWoCI IEEE 2018 22
example: Ising perceptron
• competing minima in β f(R) = α̃ ε_g(R) - s(R)
• for small α̃: the global minimum lies at R < 1 (poor generalization);
R = 1 (perfect generalization, f = 0) is only a local minimum
• for intermediate α̃: co-existing phases of poor/perfect
generalization; the lower minimum is stable,
the higher minimum is meta-stable
• for large α̃: only one minimum (R = 1)
“first order phase transition”
to perfect generalization:
the “system freezes” in R = 1
[figure: f(R) for three values of α̃]
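The competing minima can be made explicit with a small scan over α̃, using the mixing entropy from the previous slide and the same (assumed) high-T free energy convention; the printout shows where the R < 1 minimum and the frozen R = 1 state exchange their roles as global minimum, the crossing at equal f marking the first-order transition.

import numpy as np

def s_mix(R):
    p, q = (1 + R) / 2, (1 - R) / 2              # probabilities of aligned / misaligned components
    return -(p * np.log(p) + q * np.log(q))      # entropy of mixing

def f(R, alpha_t):
    return alpha_t * np.arccos(R) / np.pi - s_mix(R)   # scaled high-T free energy (constants dropped)

R = np.linspace(1e-6, 1 - 1e-9, 400000)
for alpha_t in np.arange(0.5, 3.01, 0.25):
    i = np.argmin(f(R, alpha_t))                 # best state with R < 1 on the grid
    f_poor = f(R[i], alpha_t)
    # R = 1 has eps_g = 0 and s = 0, hence f = 0: the perfectly generalizing "frozen" state
    winner = "R = 1 (perfect)" if f_poor > 0 else f"R = {R[i]:.3f}"
    print(f"alpha~ = {alpha_t:4.2f}   f(best R<1) = {f_poor:+.4f}   global minimum: {winner}")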
MiWoCI IEEE 2018 23
Monte Carlo results
(no prior knowledge)
results carry over (qualitatively) to low (zero) temperature training:
e.g. the nature of the phase transitions
[figure: simulation results showing the first order phase transition, the poor/perfect
minima exchanging their roles as local/global minimum at equal f, and finite size effects]
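In the same spirit, here is a minimal Monte Carlo training sketch: Metropolis dynamics on binary student weights with the number of misclassified training examples as the energy. System size, load α, inverse temperature and run length are illustrative choices, not those of the original simulations.

import numpy as np

rng = np.random.default_rng(4)
N, alpha, beta = 101, 3.0, 4.0                   # size, load P/N, inverse temperature (illustrative)
P = int(alpha * N)

B = rng.choice([-1, 1], size=N)                  # binary teacher
xi = rng.normal(size=(P, N))                     # i.i.d. training inputs
sigma = np.sign(xi @ B)                          # teacher labels

def E(J):
    """training energy: number of misclassified examples"""
    return int(np.sum(np.sign(xi @ J) != sigma))

J = rng.choice([-1, 1], size=N)                  # random binary student, no prior knowledge
e = E(J)
for t in range(200 * N):
    j = rng.integers(N)                          # single "spin flip": J_j -> -J_j
    J[j] *= -1
    e_new = E(J)
    if e_new <= e or rng.random() < np.exp(-beta * (e_new - e)):
        e = e_new                                # accept the flip
    else:
        J[j] *= -1                               # reject: undo the flip

R = (J @ B) / N
print(f"training errors: {e}/{P}   overlap R = {R:.3f}   eps_g ~ {np.arccos(R) / np.pi:.3f}")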
MiWoCI IEEE 2018 24
soft committee machine
adaptive student: N input units, K hidden units
σ(ξ) = Σ_{k=1..K} g( J_k · ξ / √N )
teacher: N input units, M hidden units
τ(ξ) = Σ_{m=1..M} g( B_m · ξ / √N )
[figure: adaptive student network with unknown (?) hidden-unit weights and a teacher network]
order parameters (macroscopic properties of the student network):
R_km = J_k · B_m / N,   Q_kl = J_k · J_l / N
model parameters: T_mn = B_m · B_n / N
training: minimization of H = Σ_μ (1/2) [ σ(ξ^μ) - τ(ξ^μ) ]²
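A sketch of the soft committee machine setup; the activation g(x) = erf(x/√2) and the 1/√N field scaling are assumed choices (common in the literature, but conventions vary), and the generalization error is simply estimated on random test inputs rather than computed from the order-parameter formula:

import numpy as np
from scipy.special import erf

rng = np.random.default_rng(5)
N, K, M, P = 500, 3, 3, 20000                    # input dim, student/teacher hidden units, test points

def g(x):
    return erf(x / np.sqrt(2))                   # sigmoidal hidden-unit activation (assumed choice)

B = rng.normal(size=(M, N))                      # teacher hidden-unit weight vectors
J = rng.normal(size=(K, N))                      # (untrained) student hidden-unit weight vectors

def output(W, xi):
    """soft committee machine: sum of hidden-unit activations of the fields W_k . xi / sqrt(N)"""
    return g(xi @ W.T / np.sqrt(N)).sum(axis=1)

xi = rng.normal(size=(P, N))                     # i.i.d. test inputs
eps_g = 0.5 * np.mean((output(J, xi) - output(B, xi)) ** 2)   # quadratic generalization error

R = J @ B.T / N                                  # order parameters R_km (student-teacher overlaps)
Q = J @ J.T / N                                  # order parameters Q_kl (student-student overlaps)
T = B @ B.T / N                                  # model parameters T_mn (teacher-teacher overlaps)
print("eps_g ~", eps_g)
print("R =\n", np.round(R, 2))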
MiWoCI IEEE 2018 25
soft committee machine
exploit the thermodynamic limit, CLT for the hidden-unit fields x_k = J_k · ξ / √N
and y_m = B_m · ξ / √N :
they are normally distributed with zero means and a covariance matrix given by the order
parameters Q, R and T; hence ε_g can be written as a function of Q, R, T (+ constant)
MiWoCI IEEE 2018 26
soft committee machine: hidden unit specialization
• K = M = 2 : symmetry breaking,
phase transition (2nd order)
• K = M > 2 (e.g. K = 5) : 1st order phase transition
with metastable states
[figure: learning curves for K = 2 and K = 5]
MiWoCI IEEE 2018 27
soft committee machine
[figure: adaptive student and teacher networks]
• initial training phase: unspecialized hidden unit weights:
all student units represent the “mean teacher”
• transition to specialization makes perfect agreement possible
MiWoCI IEEE 2018 28
soft committee machine
[figure: adaptive student and teacher networks]
• initial training phase: unspecialized hidden unit weights:
all student units represent the “mean teacher”
• transition to specialization makes perfect agreement possible
• successful training requires a critical number of examples
• the hidden unit permutation symmetry has to be broken:
all K! permutations of the student's hidden units are equivalent
MiWoCI IEEE 2018 29
large hidden layer:
many hidden units
the unspecialized state
remains meta-stable up to a critical number of examples
perfect generalization without prior knowledge:
impossible with order O(NK) examples ?
MiWoCI IEEE 2018 30
what’s next ?
network architecture and design
• activation functions (ReLU etc.)
• deep networks
• tree-like architectures as models
of convolution & pooling
dynamics of network training
• online training by stochastic g.d.
• math. description in terms of ODE
• learning rates, momentum etc.
• regularization, e.g. drop-out,
weight decay etc.
other topics
• concept drift: time-dependent
statistics of data and target
... a lot more & new ideas to come