Mathematics and Artificial Intelligence
Marc Lelarge
INRIA-ENS
CEREMADE Colloquium (Oct. 2023)
Part 1: statistical physics for machine learning
- A simple version of the Approximate Message Passing (AMP) algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: matrix model
  - connection to random matrix theory
  - sparse PCA, community detection, Z2 synchronization, submatrix localization, hidden clique...
A bit of history: 70’s
A bit of history: 2010’s
Applications to high-dimensional statistics
AMP and its state evolution
Given a matrix $W \in \mathbb{R}^{n \times n}$ and scalar functions $f_t : \mathbb{R} \to \mathbb{R}$, let $x^0 \in \mathbb{R}^n$ and
\[
x^{t+1} = W f_t(x^t) - b_t f_{t-1}(x^{t-1}) \in \mathbb{R}^n,
\qquad \text{where} \quad
b_t = \frac{1}{n} \sum_{i=1}^n f_t'(x_i^t) \in \mathbb{R}.
\]
If $W \sim \mathrm{GOE}(n)$, the $f_t$ are Lipschitz and the components of $x^0$ are i.i.d. $\sim X_0$ with $\mathbb{E}[X_0^2] = 1$, then for any nice test function $\Psi : \mathbb{R}^t \to \mathbb{R}$,
\[
\frac{1}{n} \sum_{i=1}^n \Psi\big(x_i^1, \dots, x_i^t\big) \to \mathbb{E}[\Psi(Z_1, \dots, Z_t)],
\]
where $(Z_1, \dots, Z_t) \overset{d}{=} (\sigma_1 G_1, \dots, \sigma_t G_t)$ with $G_s \sim N(0,1)$ i.i.d., and the variances follow the state evolution recursion $\sigma_1^2 = \mathbb{E}[f_0(X_0)^2]$, $\sigma_{t+1}^2 = \mathbb{E}[f_t(\sigma_t G)^2]$.
(Bayati, Montanari ’11)
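As a concrete illustration, here is a minimal numpy sketch of the iteration above; the choice of a fixed denoiser $f_t = \tanh$ at every step and the particular GOE sampling are assumptions made for the demo, not part of the theorem.

```python
import numpy as np

def amp(W, f, fprime, x0, T):
    """AMP iterates x^{t+1} = W f(x^t) - b_t f(x^{t-1}),
    with Onsager coefficient b_t = mean_i f'(x_i^t).
    Uses the same denoiser f at every step for simplicity."""
    x_prev, x = x0, W @ f(x0)            # first step: no Onsager term
    for _ in range(T - 1):
        b = fprime(x).mean()
        x, x_prev = W @ f(x) - b * f(x_prev), x
    return x

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)               # GOE(n): off-diagonal variance 1/n
x0 = rng.standard_normal(n)              # i.i.d. components, E[X0^2] = 1
xT = amp(W, np.tanh, lambda u: 1 - np.tanh(u)**2, x0, T=10)
```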
Sanity check
We have $x^1 = W f_0(x^0)$, so that
\[
x_i^1 = \sum_j W_{ij} f_0(x_j^0),
\]
where $W_{ij} \sim N(0, 1/n)$ i.i.d. (ignoring diagonal terms).
Hence $x^1$ is a centred Gaussian vector whose entries have variance
\[
\frac{1}{n} \sum_j f_0(x_j^0)^2 \approx \mathbb{E}\big[f_0(X_0)^2\big] = \sigma_1^2.
\]
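A quick Monte Carlo check of this variance computation (a sketch; the Gaussian initialization and $f_0 = \tanh$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, f0 = 5000, np.tanh
x0 = rng.standard_normal(n)                 # i.i.d. ~ X0 with E[X0^2] = 1
A = rng.standard_normal((n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)
x1 = W @ f0(x0)
print(x1.var(), np.mean(f0(x0)**2))         # both approximate sigma_1^2
```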
AMP proof of Wigner’s semicircle law
Consider AMP with linear functions $f_t(x) = x$, so that
\[
x^1 = W x^0, \qquad
x^2 = W x^1 - x^0 = (W^2 - \mathrm{Id})\, x^0, \qquad
x^3 = W x^2 - x^1 = (W^3 - 2W)\, x^0,
\]
so $x^t = P_t(W)\, x^0$ with
\[
P_0(x) = 1, \quad P_1(x) = x, \quad P_{t+1}(x) = x P_t(x) - P_{t-1}(x).
\]
The $\{P_t\}$ are Chebyshev polynomials, orthonormal w.r.t. the semicircle density $\mu_{SC}(x) = \frac{1}{2\pi} \sqrt{(4 - x^2)_+}$.
When $\frac{1}{n}\|x^0\|^2 = 1$, we have $\frac{1}{n} \langle x^s, x^t \rangle \approx \frac{1}{n} \operatorname{tr} P_s(W) P_t(W)$.
AMP proof of Wigner’s semicircle law
\[
x^{t+1} = W x^t - x^{t-1}
\]
In this case, AMP state evolution gives
\[
\frac{1}{n} \langle x^s, x^t \rangle \to \mathbb{E}[Z_s Z_t] = \mathbf{1}(s = t).
\]
Since $\frac{1}{n} \langle x^s, x^t \rangle \approx \frac{1}{n} \operatorname{tr} P_s(W) P_t(W)$, the polynomials $P_t$ are orthonormal w.r.t. the limiting empirical spectral distribution of $W$, which must therefore be $\mu_{SC}$.
Credit: Zhou Fan.
Wigner’s semicircle law: experiments
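The experiments can be reproduced along these lines: run the linear AMP recursion and check that the normalized Gram matrix of the iterates is close to the identity, which is exactly the orthonormality statement above (a sketch; sizes and seeds are arbitrary):

```python
import numpy as np

n, T = 3000, 6
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)
x0 = rng.standard_normal(n)
x0 *= np.sqrt(n) / np.linalg.norm(x0)        # enforce ||x0||^2 = n
xs = [x0, W @ x0]                            # x^0, x^1
for _ in range(T - 1):
    xs.append(W @ xs[-1] - xs[-2])           # x^{t+1} = W x^t - x^{t-1}
G = np.array([[u @ v / n for v in xs] for u in xs])
print(np.round(G, 2))                        # approximately the identity
```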
Explaining the Onsager term
\[
x^{t+1} = W x^t - x^{t-1}
\]
The first iteration with an Onsager term appears at $t = 2$.
Then we have $x^2 = W x^1 - x^0 = W^2 x^0 - x^0$, so that
\[
x_1^2 = \sum_i W_{1i}^2\, x_1^0 + \sum_{i,\, j \neq 1} W_{1i} W_{ij}\, x_j^0 - x_1^0
= \Big( \sum_i W_{1i}^2\, x_1^0 + \underbrace{\sum_{i,\, j \neq 1} W_{1i} W_{ij}\, x_j^0}_{\approx N(0,1)} \Big) - x_1^0.
\]
Since $\sum_i W_{1i}^2 \approx 1$ by the law of large numbers, the subtracted $x_1^0$ cancels the first term, leaving an approximately centred Gaussian: the Onsager term removes the bias of the iterate in the direction of $x^0$.
The Onsager term is very similar to the Itô correction in stochastic calculus.
Part 1: statistical physics for machine learning
- A simple version of the AMP algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: matrix model
  - connection to random matrix theory
  - sparse PCA, community detection, Z2 synchronization, submatrix localization, hidden clique...
Low-rank matrix estimation
“Spiked Wigner” model
\[
\underbrace{Y}_{\text{observations}} = \underbrace{\sqrt{\frac{\lambda}{n}}\, X X^{\top}}_{\text{signal}} + \underbrace{Z}_{\text{noise}}
\]
- $X$: vector of dimension $n$ with entries $X_i \overset{\text{i.i.d.}}{\sim} P_0$, $\mathbb{E} X_1 = 0$, $\mathbb{E} X_1^2 = 1$.
- $Z_{i,j} = Z_{j,i} \overset{\text{i.i.d.}}{\sim} N(0, 1)$.
- $\lambda$: signal-to-noise ratio.
- $\lambda$ and $P_0$ are known to the statistician.
Goal: recover the low-rank matrix $X X^{\top}$ from $Y$.
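A generator for synthetic data from this model (a sketch; the Rademacher choice of $P_0$ is one admissible centred, unit-variance prior, and the function name is ours):

```python
import numpy as np

def spiked_wigner(n, lam, rng):
    """Sample (X, Y) with Y = sqrt(lam/n) X X^T + Z, where Z is
    symmetric with N(0,1) entries and P0 = Unif{-1,+1}."""
    X = rng.choice([-1.0, 1.0], size=n)
    A = rng.standard_normal((n, n))
    Z = (A + A.T) / np.sqrt(2)
    return X, np.sqrt(lam / n) * np.outer(X, X) + Z
```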
Principal component analysis (PCA)
Spectral estimator: estimate $X$ using the eigenvector $\hat{x}_n$ associated with the largest eigenvalue $\mu_n$ of $Y / \sqrt{n}$.
B.B.P. phase transition
- if $\lambda \le 1$: $\quad \mu_n \xrightarrow[n \to \infty]{\text{a.s.}} 2 \quad$ and $\quad X \cdot \hat{x}_n \xrightarrow[n \to \infty]{\text{a.s.}} 0$
- if $\lambda > 1$: $\quad \mu_n \xrightarrow[n \to \infty]{\text{a.s.}} \sqrt{\lambda} + \frac{1}{\sqrt{\lambda}} > 2 \quad$ and $\quad |X \cdot \hat{x}_n| \xrightarrow[n \to \infty]{\text{a.s.}} \sqrt{1 - 1/\lambda} > 0$
(Baik, Ben Arous, Péché ’05)
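A numerical illustration of the transition, reusing the `spiked_wigner` sketch above (normalizing the squared overlap by $n$ is our convention for comparing with $1 - 1/\lambda$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
for lam in [0.5, 1.5, 4.0]:
    X, Y = spiked_wigner(n, lam, rng)
    vals, vecs = np.linalg.eigh(Y / np.sqrt(n))
    mu, v = vals[-1], vecs[:, -1]
    print(lam, mu, (X @ v) ** 2 / n)   # mu -> 2 or sqrt(lam)+1/sqrt(lam);
                                       # overlap^2 -> 0 or 1 - 1/lam
```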
Questions
- PCA fails when $\lambda \le 1$, but is it still possible to recover the signal?
- When $\lambda > 1$, is PCA optimal?
- More generally, what is the best achievable estimation performance in both regimes?
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
We can certainly improve on the spectral algorithm!
A scalar denoising problem
For $Y = \sqrt{\gamma}\, X_0 + Z$ where $X_0 \sim P_0$ and $Z \sim N(0, 1)$, the Bayes-optimal estimator of $X_0$ is the posterior mean $\mathbb{E}[X_0 \mid Y]$.
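For the centred binary prior used in the MMSE plots, the posterior mean has a closed form; a short worked computation (our example, assuming $P_0 = \frac{1}{2}(\delta_{-1} + \delta_{+1})$):
\[
\mathbb{E}[X_0 \mid Y = y]
= \frac{e^{\sqrt{\gamma}\, y} - e^{-\sqrt{\gamma}\, y}}{e^{\sqrt{\gamma}\, y} + e^{-\sqrt{\gamma}\, y}}
= \tanh(\sqrt{\gamma}\, y),
\qquad
\mathrm{mmse}(\gamma) = 1 - \mathbb{E}\big[\tanh^2\big(\gamma + \sqrt{\gamma}\, Z\big)\big],
\]
the last expectation being over $Z \sim N(0,1)$, conditioning on $X_0 = 1$ by symmetry.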
Bayes optimal AMP
We define $\mathrm{mmse}(\gamma) = \mathbb{E}\big[ \big(X_0 - \mathbb{E}[X_0 \mid \sqrt{\gamma}\, X_0 + Z]\big)^2 \big]$ and the recursion:
\[
q_0 = 1 - \lambda^{-1}, \qquad q_{t+1} = 1 - \mathrm{mmse}(\lambda q_t).
\]
With the optimal denoiser $g_{P_0}(y, \gamma) = \mathbb{E}[X_0 \mid \sqrt{\gamma}\, X_0 + Z = y]$, AMP is defined by:
\[
x^{t+1} = \sqrt{\frac{\lambda}{n}}\, Y f_t(x^t) - \lambda\, b_t f_{t-1}(x^{t-1}),
\]
where $f_t(y) = g_{P_0}\big( y / \sqrt{\lambda q_t},\ \lambda q_t \big)$.
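For the Rademacher prior this simplifies pleasantly: plugging $g_{P_0}(y, \gamma) = \tanh(\sqrt{\gamma}\, y)$ into $f_t$ gives $f_t(y) = \tanh(y)$ at every step. A minimal sketch under that assumed prior (the spectral initialization is our choice; any initialization with positive overlap plays the same role):

```python
import numpy as np

def bayes_amp(Y, lam, T, x_init):
    """Bayes-optimal AMP sketch for the spiked Wigner model with
    Rademacher prior, where f_t(y) = tanh(y):
    x^{t+1} = sqrt(lam/n) Y f(x^t) - lam * b_t * f(x^{t-1})."""
    n = Y.shape[0]
    M = np.sqrt(lam / n) * Y
    f, fprime = np.tanh, lambda u: 1 - np.tanh(u) ** 2
    x_prev, x = x_init, M @ f(x_init)
    for _ in range(T - 1):
        b = fprime(x).mean()
        x, x_prev = M @ f(x) - lam * b * f(x_prev), x
    return f(x)                       # entrywise posterior-mean estimate

# usage with the spectral initializer v from the PCA experiment above:
# X_hat = bayes_amp(Y, lam=1.5, T=25, x_init=np.sqrt(n) * v)
```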
Bayes optimal AMP: experiment
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
Limiting formula for the MMSE
Theorem (L., Miolane ’19)
\[
\mathrm{MMSE}_n \xrightarrow[n \to \infty]{} \underbrace{1}_{\text{Dummy MSE}} - \; q^*(\lambda)^2
\]
where $q^*(\lambda)$ is the minimizer of
\[
q \ge 0 \;\mapsto\; -\mathbb{E}_{\substack{X_0 \sim P_0 \\ Z_0 \sim N(0,1)}} \log \int_{x_0} dP_0(x_0)\, e^{\sqrt{\lambda q}\, Z_0 x_0 + \lambda q X_0 x_0 - \frac{\lambda q}{2} x_0^2} \;+\; \frac{\lambda}{4} q^2.
\]
A simplified “free energy landscape” $-F(\lambda, q)$ as a function of $q$:
(a) “Easy” phase ($\lambda = 1.01$); (b) “Hard” phase ($\lambda = 0.625$); (c) “Impossible” phase ($\lambda = 0.5$).
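These curves can be regenerated for the Rademacher prior, for which the integral over $P_0$ collapses to a $\cosh$; under that assumed prior, a direct computation reduces the objective above to $\lambda q / 2 - \mathbb{E}\log\cosh(\lambda q + \sqrt{\lambda q}\, Z) + \lambda q^2 / 4$, evaluated here by Monte Carlo:

```python
import numpy as np

def neg_F(lam, q, n_mc=200_000, seed=0):
    """Monte Carlo value of the variational objective for P0 = Unif{-1,1}:
    lam*q/2 - E[log cosh(lam*q + sqrt(lam*q) Z)] + lam*q^2/4."""
    z = np.random.default_rng(seed).standard_normal(n_mc)
    a = lam * q + np.sqrt(lam * q) * z
    log_cosh = np.logaddexp(a, -a) - np.log(2.0)    # numerically stable
    return lam * q / 2 - log_cosh.mean() + lam * q ** 2 / 4

# landscape: tabulate q -> neg_F(lam, q) for the three phases
qs = np.linspace(0.0, 1.0, 60)
curves = {lam: [neg_F(lam, q) for q in qs] for lam in (1.01, 0.625, 0.5)}
```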
Proof ideas: a planted spin system
\[
P(X = x \mid Y) = \frac{1}{Z_n} P_0(x)\, e^{H_n(x)}
\qquad \text{where} \qquad
H_n(x) = \sum_{i \le j} \sqrt{\frac{\lambda}{n}}\, Y_{i,j} x_i x_j - \frac{\lambda}{2n} x_i^2 x_j^2.
\]
Two-step proof:
- Lower bound: Guerra’s interpolation technique, adapted in (Korada, Macris ’09), (Krzakala, Xu, Zdeborová ’16):
\[
Y = \sqrt{t}\, \sqrt{\lambda / n}\, X X^{\top} + Z,
\qquad
Y' = \sqrt{1 - t}\, \sqrt{\lambda}\, X + Z'.
\]
- Upper bound: cavity computations (Mézard, Parisi, Virasoro ’87); Aizenman–Sims–Starr scheme (Aizenman, Sims, Starr ’03), (Talagrand ’10).
Part 1: conclusion
AMP is an iterative denoising algorithm which is optimal when the energy landscape is simple.
Main references for this tutorial: (Montanari, Venkataramanan ’21), (L., Miolane ’19).
Many recent research directions: universality, structured matrices, community detection... and new applications outside electrical engineering, for instance in ecology.
Deep learning, the new kid on the block:
From stochastic localization to sampling thanks to AMP
Target distribution $\mu$.
Diffusion process:
\[
y_t = t\, x^* + B_t, \qquad (x^* \sim \mu) \perp\!\!\!\perp B.
\]
\[
\mu_t(\cdot) = P(x^* \in \cdot \mid y_t), \qquad \mu_0 = \mu \;\to\; \mu_\infty = \delta_{x^*}.
\]
There exists a Brownian motion $G$ such that $y_t$ solves the SDE:
\[
dy_t = m_t(y_t)\, dt + dG_t, \qquad \text{where } m_t(y) = \mathbb{E}[x^* \mid y_t = y].
\]
Idea: use AMP for sampling (El Alaoui, Montanari, Sellke ’22), (Montanari, Wu ’23).
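The sampling recipe is then an Euler discretization of this SDE, with the drift $m_t$ supplied by AMP. A minimal sketch, assuming a user-provided callable `m(t, y)` approximating the posterior mean (in the cited papers this is where the AMP iterates enter):

```python
import numpy as np

def localization_sampler(m, d, T=50.0, n_steps=2000, seed=0):
    """Euler scheme for dy_t = m_t(y_t) dt + dG_t, started at y_0 = 0.
    Since y_t = t x* + B_t, the normalized endpoint y_T / T
    approximates a sample x* from the target distribution mu."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = np.zeros(d)
    for k in range(n_steps):
        y = y + m(k * dt, y) * dt + np.sqrt(dt) * rng.standard_normal(d)
    return y / T
```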
Part 2: mathematics and AI
N. Wiener: the invention of cybernetics
AI winters
Lessons learned from AI winters: Common Task Framework (CTF)
Performance Assessment of Automatic Speech Recognizers (Pallett
’85)
"Definitive tests to fully characterize automatic speech recognizer or system performance cannot be specified at present. However, it is possible to design and conduct performance assessment tests that make use of widely available speech data bases, use test procedures similar to those used by others, and that are well documented. These tests provide valuable benchmark data and informative, though limited, predictive power."
Key factors for the actual success of deep learning
The Bitter Lesson by Rich Sutton
The biggest lesson that can be read from 70 years of AI research is
that general methods that leverage computation are
ultimately the most effective, and by a large margin (...)
Seeking an improvement that makes a difference in the shorter
term, researchers seek to leverage their human knowledge of the
domain, but the only thing that matters in the long run is the
leveraging of computation (...) the human-knowledge approach
tends to complicate methods in ways that make them less suited to
taking advantage of general methods leveraging computation.
Is human-led mathematics over?
If it turns out that some Langlands-like questions can be answered
with the use of computation, there is always the possibility that
the mathematical community will interpret this as a demonstration
that, in hindsight, the Langlands program is not as deep as we
thought it was. There is always room to say, “Aha! Now we see
that it is just a matter of computation.” (Avigad ’22)
Thank you for your attention!