Mathématiques et Intelligence Artificielle
Marc Lelarge
INRIA-ENS
Colloquium du CEREMADE (Oct. 2023)
Part 1: statistical physics for machine learning
- A simple version of the Approximate Message Passing (AMP) algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: matrix model
  - connection to random matrix theory
  - sparse PCA, community detection, Z2 synchronization, submatrix localization, hidden clique...
A bit of history: 70’s
A bit of history: 2010’s
Applications to high-dimensional statistics
AMP and its state evolution
Given a matrix $W \in \mathbb{R}^{n\times n}$ and scalar functions $f_t : \mathbb{R} \to \mathbb{R}$, let $x^0 \in \mathbb{R}^n$ and
$$x^{t+1} = W f_t(x^t) - b_t f_{t-1}(x^{t-1}) \in \mathbb{R}^n, \qquad \text{where } b_t = \frac{1}{n}\sum_{i=1}^n f_t'(x^t_i) \in \mathbb{R}.$$
If $W \sim \mathrm{GOE}(n)$, the $f_t$ are Lipschitz and the components of $x^0$ are i.i.d. $\sim X_0$ with $\mathbb{E}[X_0^2] = 1$, then for any nice test function $\Psi : \mathbb{R}^t \to \mathbb{R}$,
$$\frac{1}{n}\sum_{i=1}^n \Psi\big(x^1_i, \dots, x^t_i\big) \to \mathbb{E}\left[\Psi(Z_1, \dots, Z_t)\right],$$
where $(Z_1, \dots, Z_t) \stackrel{d}{=} (\sigma_1 G_1, \dots, \sigma_t G_t)$ with $G_s \sim \mathcal{N}(0, 1)$ i.i.d.
(Bayati, Montanari ’11)
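A minimal Python sketch of this recursion (our own code, not from the talk), taking $f_t = \tanh$ as an example Lipschitz nonlinearity; state evolution predicts each iterate is asymptotically Gaussian coordinate-wise, which we probe with the empirical kurtosis (Gaussian value: 3).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# W ~ GOE(n): symmetric, off-diagonal variance 1/n (diagonal 2/n).
A = rng.normal(size=(n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)

f = np.tanh                            # f_t, the same at every step here
df = lambda x: 1.0 - np.tanh(x) ** 2   # f_t'

x_prev = np.zeros(n)                   # convention: f_{-1}(x^{-1}) = 0
x = rng.normal(size=n)                 # x^0 with E[X_0^2] = 1
for t in range(10):
    b = df(x).mean()                   # Onsager coefficient b_t
    x, x_prev = W @ f(x) - b * f(x_prev), x
    z = x / x.std()                    # standardize x^{t+1}
    print(t + 1, round((z ** 4).mean(), 3))   # kurtosis, should stay near 3
```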
Sanity check
We have $x^1 = W f_0(x^0)$ so that
$$x^1_i = \sum_j W_{ij} f_0(x^0_j),$$
where $W_{ij} \sim \mathcal{N}(0, 1/n)$ i.i.d. (ignore diagonal terms).
Hence $x^1$ is a centred Gaussian vector whose entries have variance
$$\frac{1}{n}\sum_j f_0(x^0_j)^2 \approx \mathbb{E}\left[f_0(X_0)^2\right] = \sigma_1^2.$$
AMP proof of Wigner’s semicircle law
Consider AMP with linear functions $f_t(x) = x$, so that
$$x^1 = W x^0, \qquad x^2 = W x^1 - x^0 = (W^2 - \mathrm{Id})x^0, \qquad x^3 = W x^2 - x^1 = (W^3 - 2W)x^0,$$
so $x^t = P_t(W)\, x^0$ with
$$P_0(x) = 1, \quad P_1(x) = x, \quad P_{t+1}(x) = x P_t(x) - P_{t-1}(x).$$
$\{P_t\}$ are Chebyshev polynomials, orthonormal w.r.t. the semicircle density $\mu_{SC}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$.
When $\frac{1}{n}\|x^0\|^2 = 1$, we have $\frac{1}{n}\langle x^s, x^t\rangle \approx \frac{1}{n}\operatorname{tr} P_s(W) P_t(W)$.
AMP proof of Wigner’s semicircle law
$$x^{t+1} = W x^t - x^{t-1}$$
In this case, AMP state evolution gives
$$\frac{1}{n}\langle x^s, x^t\rangle \to \mathbb{E}\left[Z_s Z_t\right] = \mathbf{1}(s = t).$$
Since $\frac{1}{n}\langle x^s, x^t\rangle \approx \frac{1}{n}\operatorname{tr} P_s(W) P_t(W)$, the polynomials $P_t$ are orthonormal w.r.t. the limiting empirical spectral distribution of $W$, which must be $\mu_{SC}$.
Credit: Zhou Fan.
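A quick numerical check of this orthonormality (our sketch, under the same GOE assumptions): run the linear iteration and verify that the Gram matrix of the iterates is close to the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3000
A = rng.normal(size=(n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)            # GOE(n)

x = rng.normal(size=n)                # ||x^0||^2 / n ≈ 1
iters = [x]
x_prev = np.zeros(n)
for t in range(5):
    x, x_prev = W @ x - x_prev, x     # x^{t+1} = W x^t - x^{t-1}
    iters.append(x)

# Gram matrix <x^s, x^t>/n should be close to the identity matrix.
G = np.array([[u @ v / n for v in iters] for u in iters])
print(np.round(G, 2))
```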
Wigner’s semicircle law: experiments
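The original slides show figures only. A plausible reconstruction of the experiment (our assumption: the figures compared a GOE spectrum with the semicircle density; requires matplotlib):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n = 2000
A = rng.normal(size=(n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)            # GOE(n)
eigs = np.linalg.eigvalsh(W)          # empirical spectrum

xs = np.linspace(-2.2, 2.2, 400)
mu_sc = np.sqrt(np.maximum(4 - xs ** 2, 0)) / (2 * np.pi)

plt.hist(eigs, bins=60, density=True, alpha=0.5, label="spectrum of W")
plt.plot(xs, mu_sc, label=r"$\mu_{SC}$")
plt.legend(); plt.show()
```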
Explaining the Onsager term
$$x^{t+1} = W x^t - x^{t-1}$$
The first iteration with an Onsager term appears for t = 2.
Then we have $x^2 = W x^1 - x^0 = W^2 x^0 - x^0$, so that
$$x^2_1 = \sum_i W_{1i}^2\, x^0_1 + \sum_{i,\, j\neq 1} W_{1i} W_{ij}\, x^0_j - x^0_1 = \Big(\sum_i W_{1i}^2\, x^0_1 + \underbrace{\sum_{i,\, j\neq 1} W_{1i} W_{ij}\, x^0_j}_{\approx\, \mathcal{N}(0,1)}\Big) - x^0_1.$$
Since $\sum_i W_{1i}^2 \approx 1$, subtracting $x^0_1$ (the Onsager term) cancels the non-Gaussian bias $\sum_i W_{1i}^2\, x^0_1$ and leaves an approximately Gaussian iterate.
The Onsager term is very similar to the Itô-correction in stochastic
calculus.
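To see the cancellation numerically, a small sketch (our code) compares the $t = 2$ iterate with and without the Onsager correction: with it, $x^2$ decorrelates from $x^0$; without it, a bias of order $1/\sqrt{2}$ remains.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
A = rng.normal(size=(n, n)) / np.sqrt(n)
W = (A + A.T) / np.sqrt(2)            # GOE(n)

x0 = rng.normal(size=n)
x1 = W @ x0
x2_amp = W @ x1 - x0                  # with the Onsager correction
x2_naive = W @ x1                     # plain power-type iteration

corr = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print("corr(x2, x0) with Onsager:   ", round(corr(x2_amp, x0), 3))   # ≈ 0
print("corr(x2, x0) without Onsager:", round(corr(x2_naive, x0), 3)) # ≈ 0.7
```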
Part 1: statistical physics for machine learning
- A simple version of the AMP algorithm
- Gap between information-theoretically optimal and computationally feasible estimators
- Running example: matrix model
  - connection to random matrix theory
  - sparse PCA, community detection, Z2 synchronization, submatrix localization, hidden clique...
Low-rank matrix estimation
“Spiked Wigner” model
$$\underbrace{Y}_{\text{observations}} = \underbrace{\sqrt{\tfrac{\lambda}{n}}\, XX^{\top}}_{\text{signal}} + \underbrace{Z}_{\text{noise}}$$
- $X$: vector of dimension $n$ with entries $X_i \overset{\text{i.i.d.}}{\sim} P_0$, $\mathbb{E}X_1 = 0$, $\mathbb{E}X_1^2 = 1$.
- $Z_{i,j} = Z_{j,i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$.
- $\lambda$: signal-to-noise ratio.
- $\lambda$ and $P_0$ are known by the statistician.
Goal: recover the low-rank matrix $XX^{\top}$ from $Y$.
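A minimal sampler for this model (our sketch, taking a Rademacher prior as the running centred binary example; variable names are our own):

```python
import numpy as np

def spiked_wigner(n, lam, rng):
    X = rng.choice([-1.0, 1.0], size=n)    # X_i i.i.d. ~ P0, EX = 0, EX^2 = 1
    Z = rng.normal(size=(n, n))
    Z = (Z + Z.T) / np.sqrt(2)             # symmetric, Z_ij ~ N(0, 1) off-diagonal
    Y = np.sqrt(lam / n) * np.outer(X, X) + Z
    return X, Y

rng = np.random.default_rng(4)
X, Y = spiked_wigner(n=1500, lam=2.0, rng=rng)
```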
Principal component analysis (PCA)
Spectral estimator:
Estimate $X$ using the eigenvector $\hat{x}_n$ associated with the largest eigenvalue $\mu_n$ of $Y/\sqrt{n}$.
B.B.P. phase transition
- if $\lambda \leq 1$: $\quad \mu_n \xrightarrow[n\to\infty]{a.s.} 2 \quad$ and $\quad X \cdot \hat{x}_n \xrightarrow[n\to\infty]{a.s.} 0$
- if $\lambda > 1$: $\quad \mu_n \xrightarrow[n\to\infty]{a.s.} \sqrt{\lambda} + \tfrac{1}{\sqrt{\lambda}} > 2 \quad$ and $\quad |X \cdot \hat{x}_n| \xrightarrow[n\to\infty]{a.s.} \sqrt{1 - 1/\lambda} > 0$
(Baik, Ben Arous, Péché ’05)
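A numerical illustration of the BBP transition (our sketch): the top eigenvalue of $Y/\sqrt{n}$ and the normalized overlap $|X \cdot \hat{x}_n|/\sqrt{n}$, compared with the predictions above.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

for lam in [0.5, 1.0, 2.0, 4.0]:
    X = rng.choice([-1.0, 1.0], size=n)
    Z = rng.normal(size=(n, n)); Z = (Z + Z.T) / np.sqrt(2)
    Y = np.sqrt(lam / n) * np.outer(X, X) + Z
    vals, vecs = np.linalg.eigh(Y / np.sqrt(n))   # ascending eigenvalues
    mu_n = vals[-1]                               # largest eigenvalue
    overlap = abs(vecs[:, -1] @ X) / np.sqrt(n)   # |X . x_n| with unit-norm X/sqrt(n)
    pred_mu = np.sqrt(lam) + 1 / np.sqrt(lam) if lam > 1 else 2.0
    pred_ov = np.sqrt(max(1 - 1 / lam, 0.0))
    print(f"lam={lam}: mu_n={mu_n:.3f} (pred {pred_mu:.3f}), "
          f"overlap={overlap:.3f} (pred {pred_ov:.3f})")
```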
Questions
- PCA fails when $\lambda \leq 1$, but is it still possible to recover the signal?
- When $\lambda > 1$, is PCA optimal?
- More generally, what is the best achievable estimation performance in both regimes?
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
We can certainly improve on the spectral algorithm!
A scalar denoising problem
For $Y = \sqrt{\gamma}\, X_0 + Z$, where $X_0 \sim P_0$ and $Z \sim \mathcal{N}(0, 1)$.
Bayes optimal AMP
We define $\mathrm{mmse}(\gamma) = \mathbb{E}\big[\big(X_0 - \mathbb{E}[X_0 \mid \sqrt{\gamma}X_0 + Z]\big)^2\big]$ and the recursion:
$$q_0 = 1 - \lambda^{-1}, \qquad q_{t+1} = 1 - \mathrm{mmse}(\lambda q_t).$$
With the optimal denoiser $g_{P_0}(y, \gamma) = \mathbb{E}[X_0 \mid \sqrt{\gamma}X_0 + Z = y]$, AMP is defined by:
$$x^{t+1} = \sqrt{\tfrac{\lambda}{n}}\, Y\, f_t(x^t) - \lambda\, b_t\, f_{t-1}(x^{t-1}),$$
where $f_t(y) = g_{P_0}\big(y/\sqrt{\lambda q_t},\, \lambda q_t\big)$.
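For the centred binary (Rademacher) prior, the optimal denoiser is $g_{P_0}(y, \gamma) = \tanh(\sqrt{\gamma}\, y)$. A sketch (our code) of the state-evolution recursion above, with $\mathrm{mmse}(\gamma)$ estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.normal(size=200_000)
X0 = rng.choice([-1.0, 1.0], size=Z.size)

def mmse(gamma):
    # Monte-Carlo estimate of E[(X0 - E[X0 | sqrt(gamma) X0 + Z])^2]
    y = np.sqrt(gamma) * X0 + Z
    return np.mean((X0 - np.tanh(np.sqrt(gamma) * y)) ** 2)

lam = 2.0
q = 1 - 1 / lam                    # q_0 = 1 - lambda^{-1}
for t in range(10):
    q = 1 - mmse(lam * q)          # q_{t+1} = 1 - mmse(lambda * q_t)
    print(t, round(q, 4))
```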
Bayes optimal AMP: experiment
Plot of MMSE
Figure: Spiked Wigner model, centred binary prior (unit variance).
Limiting formula for the MMSE
Theorem (L., Miolane ’19)
$$\mathrm{MMSE}_n \xrightarrow[n\to\infty]{} \underbrace{1}_{\text{dummy MSE}} -\; q^*(\lambda)^2$$
where $q^*(\lambda)$ is the minimizer of
$$q \geq 0 \;\mapsto\; -\mathbb{E}_{\substack{X_0\sim P_0 \\ Z_0\sim\mathcal{N}}}\left[\log \int_{x_0} dP_0(x_0)\, e^{\sqrt{\lambda q}\, Z_0 x_0 + \lambda q\, X_0 x_0 - \frac{\lambda q}{2} x_0^2}\right] + \frac{\lambda}{4} q^2.$$
A simplified “free energy landscape”:
[Figure: $-F(\lambda, q)$ as a function of $q$, in three panels: (a) “Easy” phase (λ = 1.01), (b) “Hard” phase (λ = 0.625), (c) “Impossible” phase (λ = 0.5).]
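The landscape can be scanned by Monte Carlo (our sketch): for $x_0 = \pm 1$ the inner integral collapses to $e^{-\lambda q/2}\cosh(\sqrt{\lambda q}\, Z_0 + \lambda q\, X_0)$. We use the Rademacher prior here; the panels above may correspond to a different centred binary prior, so the exact phase boundaries can differ.

```python
import numpy as np

rng = np.random.default_rng(7)
Z0 = rng.normal(size=100_000)
X0 = rng.choice([-1.0, 1.0], size=Z0.size)

def objective(lam, q):
    # -E[log integral] + lam q^2 / 4, with the integral done in closed form
    h = np.sqrt(lam * q) * Z0 + lam * q * X0
    log_int = -lam * q / 2 + np.logaddexp(h, -h) - np.log(2)   # log cosh(h)
    return -np.mean(log_int) + lam * q ** 2 / 4

for lam in [0.5, 0.625, 1.01]:
    qs = np.linspace(0, 1, 201)
    vals = [objective(lam, q) for q in qs]
    print(f"lam={lam}: q* ≈ {qs[int(np.argmin(vals))]:.2f}")
```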
Proof ideas: a planted spin system
$$\mathbb{P}(X = x \mid Y) = \frac{1}{Z_n} P_0(x)\, e^{H_n(x)}, \qquad \text{where } H_n(x) = \sum_{i \leq j} \sqrt{\tfrac{\lambda}{n}}\, Y_{i,j}\, x_i x_j - \frac{\lambda}{2n}\, x_i^2 x_j^2.$$
Two-step proof:
- Lower bound: Guerra’s interpolation technique, adapted in (Korada, Macris ’09), (Krzakala, Xu, Zdeborová ’16):
$$\begin{cases} Y = \sqrt{t}\, \sqrt{\lambda/n}\, XX^{\top} + Z \\ Y' = \sqrt{1-t}\, \sqrt{\lambda}\, X + Z' \end{cases}$$
- Upper bound: cavity computations (Mézard, Parisi, Virasoro ’87); Aizenman–Sims–Starr scheme (Aizenman, Sims, Starr ’03), (Talagrand ’10).
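For concreteness, a sketch (our code) evaluating the planted Hamiltonian $H_n(x)$ on a spiked-Wigner instance; the pair range $i \leq j$ follows our reading of the slide. The planted configuration typically receives a much larger Gibbs weight than a random one.

```python
import numpy as np

def H_n(x, Y, lam):
    n = x.size
    S = np.outer(x, x)
    M = np.sqrt(lam / n) * Y * S - lam / (2 * n) * S ** 2
    iu = np.triu_indices(n)            # pairs with i <= j
    return M[iu].sum()

rng = np.random.default_rng(9)
n, lam = 200, 2.0
X = rng.choice([-1.0, 1.0], size=n)
Z = rng.normal(size=(n, n)); Z = (Z + Z.T) / np.sqrt(2)
Y = np.sqrt(lam / n) * np.outer(X, X) + Z
x_rand = rng.choice([-1.0, 1.0], size=n)
print(H_n(X, Y, lam), H_n(x_rand, Y, lam))   # planted >> random
```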
Part 1: conclusion
AMP is an iterative denoising algorithm, optimal when the energy landscape is simple.
Main references for this tutorial: (Montanari, Venkataramanan ’21), (L., Miolane ’19).
Many recent research directions: universality, structured matrices, community detection... and new applications outside electrical engineering, such as in ecology.
Deep learning, the new kid on the block:
From stochastic localization to sampling thanks to AMP
Target distribution $\mu$. Diffusion process:
$$y_t = t\, x^* + B_t, \qquad (x^* \sim \mu) \perp\!\!\!\perp B,$$
$$\mu_t(\cdot) = \mathbb{P}\left(x^* \in \cdot \mid y_t\right), \qquad \mu_0 = \mu \;\to\; \mu_\infty = \delta_{x^*}.$$
There exists a Brownian motion $G$ such that $y_t$ solves the SDE:
$$dy_t = m_t(y_t)\, dt + dG_t, \qquad \text{where } m_t(y) = \mathbb{E}\left[x^* \mid y_t = y\right].$$
Idea: use AMP for sampling (El Alaoui, Montanari, Sellke ’22), (Montanari, Wu ’23)
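A minimal Euler–Maruyama sketch (our code) for a toy product target $\mu = \mathrm{Uniform}\{-1,+1\}^n$, where the posterior mean works out to $m_t(y) = \tanh(y)$ coordinate-wise. In the hard models of the talk, $m_t$ would itself be approximated by AMP.

```python
import numpy as np

rng = np.random.default_rng(8)
n, T, dt = 10, 20.0, 0.01

y = np.zeros(n)
t = 0.0
while t < T:
    m = np.tanh(y)                     # m_t(y) = E[x* | y_t = y] for this prior
    y += m * dt + np.sqrt(dt) * rng.normal(size=n)   # dy = m dt + dG
    t += dt

sample = np.sign(y)                    # mu_t localizes on a point of {-1, +1}^n
print(sample)
```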
Part 2: mathematics and AI
N. Wiener: the invention of cybernetics
AI winters
Lessons learned from AI winters: Common Task Framework (CTF)
Performance Assessment of Automatic Speech Recognizers (Pallett
’85)
“Definitive tests to fully characterize automatic speech recognizer
or system performance cannot be specified at present. However, it
is possible to design and conduct performance assessment
tests that make use of widely available speech data bases, use
test procedures similar to those used by others, and that are
well documented. These tests provide valuable benchmark
data and informative, though limited, predictive power.”
Key factors for the actual success of deep learning
The Bitter Lesson by Rich Sutton
The biggest lesson that can be read from 70 years of AI research is
that general methods that leverage computation are
ultimately the most effective, and by a large margin (...)
Seeking an improvement that makes a difference in the shorter
term, researchers seek to leverage their human knowledge of the
domain, but the only thing that matters in the long run is the
leveraging of computation (...) the human-knowledge approach
tends to complicate methods in ways that make them less suited to
taking advantage of general methods leveraging computation.
Is human-led mathematics over?
If it turns out that some Langlands-like questions can be answered
with the use of computation, there is always the possibility that
the mathematical community will interpret this as a demonstration
that, in hindsight, the Langlands program is not as deep as we
thought it was. There is always room to say, “Aha! Now we see
that it is just a matter of computation.” (Avigad ’22)
Thank you for your attention!