QMC: Operator Splitting Workshop, Stochastic Block-Coordinate Fixed Point Algorithms - Jean-Christophe Pesquet, Mar 23, 2018
1. 1/12
STOCHASTIC BLOCK-COORDINATE
FIXED POINT ALGORITHMS
Jean-Christophe Pesquet
Center for Visual Computing, CentraleSupélec, Université Paris-Saclay
Joint work with Patrick Louis Combettes
SAMSI Workshop - March 2018
2. 2/12
Motivation

FIXED POINT ALGORITHM
for n = 0, 1, . . .
    x_{n+1} = x_n + λ_n (T_n x_n − x_n),
where
• x_0 ∈ H, a separable real Hilbert space
• (∀n ∈ N) T_n : H → H
• (λ_n)_{n∈N} relaxation parameters in ]0, +∞[.

• widely used in optimization, game theory, inverse problems, machine learning, ...
• convergence of (x_n)_{n∈N} to x ∈ F = ∩_{n∈N} Fix T_n, under suitable assumptions.

[Portrait: É. Picard (1856-1941)]

In the context of high-dimensional problems, how can the computational issues raised by memory requirements be limited?
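For illustration, a minimal Python sketch of the relaxed fixed point iteration above (not from the talk; the toy affine contraction T and all numerical values are hypothetical):

```python
import numpy as np

def fixed_point_iteration(T, x0, lam=0.5, n_iter=100):
    """Relaxed (Krasnosel'skii-Mann) iteration x_{n+1} = x_n + lam*(T(x_n) - x_n)."""
    x = x0
    for _ in range(n_iter):
        x = x + lam * (T(x) - x)
    return x

# Toy operator: T(x) = A x + b with ||A|| < 1, so Fix T = {(I - A)^{-1} b}.
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
b = np.array([1.0, -1.0])
T = lambda x: A @ x + b

x_star = fixed_point_iteration(T, np.zeros(2))
print(np.allclose(x_star, np.linalg.solve(np.eye(2) - A, b)))  # True
```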
6. 4/12
Block-coordinate algorithm

BLOCK-COORDINATE ALGORITHM
for n = 0, 1, . . .
    for i = 1, . . . , m
        x_{i,n+1} = x_{i,n} + ε_{i,n} λ_n (T_{i,n}(x_{1,n}, . . . , x_{m,n}) + a_{i,n} − x_{i,n}),
where
• (∀x ∈ H) T_n x = (T_{i,n} x)_{1≤i≤m}, where, for every i ∈ {1, . . . , m}, T_{i,n} : H → H_i is measurable
• (ε_n)_{n∈N} = ((ε_{i,n})_{1≤i≤m})_{n∈N} identically distributed D-valued random variables with D = {0, 1}^m \ {0}
• λ_n ∈ ]0, 1]
• a_n = (a_{i,n})_{1≤i≤m} H-valued random variable: possible error term.

a_n ≡ 0 and ε_n ≡ (1, . . . , 1) P-a.s. ⇔ deterministic algorithm with no error
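A minimal Python sketch of this update rule (hypothetical, not from the talk): independent Bernoulli activations, resampled so that ε_n stays in D, are just one admissible choice of distribution, and the error terms a_{i,n} are set to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_block_coordinate(T_blocks, x0, p, lam=1.0, n_iter=200):
    """Sketch of the stochastic block-coordinate fixed point iteration.

    T_blocks[i] maps the full tuple (x_1, ..., x_m) to the i-th block update
    T_{i,n}(x_1, ..., x_m); block i is updated only when eps_{i,n} = 1.
    """
    x = [xi.copy() for xi in x0]
    m = len(x)
    for _ in range(n_iter):
        # independent Bernoulli(p_i) activations, resampled so eps != 0
        eps = rng.random(m) < p
        while not eps.any():
            eps = rng.random(m) < p
        # evaluate T_{i,n} at x_n only for the activated blocks (a_{i,n} = 0)
        x = [x[i] + lam * (T_blocks[i](x) - x[i]) if eps[i] else x[i]
             for i in range(m)]
    return x
```

With every p_i = 1 this reduces to the deterministic algorithm with no error.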
11. 5/12
Illustration of block activation strategy

Variable selection (∀n ∈ N):
x_{1,n} activated when ε_{1,n} = 1
x_{2,n} activated when ε_{2,n} = 1
x_{3,n} activated when ε_{3,n} = 1
x_{4,n} activated when ε_{4,n} = 1
x_{5,n} activated when ε_{5,n} = 1
x_{6,n} activated when ε_{6,n} = 1

How to choose the variable ε_n = (ε_{1,n}, . . . , ε_{6,n})?
P[ε_n = (1, 1, 0, 0, 0, 0)] = 0.1
P[ε_n = (1, 0, 1, 0, 0, 0)] = 0.2
P[ε_n = (1, 0, 0, 1, 1, 0)] = 0.2
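For instance, ε_n can be drawn from an explicit distribution over activation patterns. A hypothetical sketch: the slide lists only three patterns (their probabilities sum to 0.5, the remaining mass belonging to patterns not shown), so we renormalize here.

```python
import numpy as np

rng = np.random.default_rng(1)

# The three activation patterns listed on the slide, renormalized.
patterns = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
])
probs = np.array([0.1, 0.2, 0.2])
probs = probs / probs.sum()

def sample_eps():
    """Draw eps_n from the categorical distribution over patterns."""
    return patterns[rng.choice(len(patterns), p=probs)]

# marginal activation probabilities p_i = P[eps_{i,0} = 1]
print(probs @ patterns)   # e.g. p_1 = 1 here, since x_1 is always activated
print(sample_eps())
```

Note that with only these three patterns the marginal p_6 would be 0, which would violate assumption (v) below; the full distribution must also assign mass to patterns activating x_6.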
16. 6/12
Convergence analysis

NOTATION
(F_n)_{n∈N} sequence of sigma-algebras such that
(∀n ∈ N) F_n ⊂ F and σ(x_0, . . . , x_n) ⊂ F_n ⊂ F_{n+1},
where σ(x_0, . . . , x_n) is the σ-algebra generated by (x_0, . . . , x_n).

ASSUMPTIONS
(i) F ≠ ∅.
(ii) inf_{n∈N} λ_n > 0.
(iii) There exists a sequence (α_n)_{n∈N} in [0, +∞[ such that
∑_{n∈N} √α_n < +∞ and (∀n ∈ N) E(‖a_n‖² | F_n) ≤ α_n.
(iv) For every n ∈ N, E_n = σ(ε_n) and F_n are independent.
(v) For every i ∈ {1, . . . , m}, p_i = P[ε_{i,0} = 1] > 0.
18. 7/12
Convergence results
[Combettes, Pesquet, 2015]
Suppose that sup_{n∈N} λ_n < 1 and that, for every n ∈ N, T_n is quasinonexpansive, i.e.,
(∀z ∈ Fix T_n)(∀x ∈ H) ‖T_n x − z‖ ≤ ‖x − z‖.
Then
(i) (T_n x_n − x_n)_{n∈N} converges strongly P-a.s. to 0.
(ii) Suppose that, almost surely, every sequential cluster point of (x_n)_{n∈N} belongs to F. Then (x_n)_{n∈N} converges weakly P-a.s. to an F-valued random variable.

REMARK
These conditions are met by many algorithms for solving monotone inclusion problems, e.g., the forward-backward or the Douglas-Rachford algorithm.
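To make the remark concrete, here is a sketch (my example, not the talk's) of a forward-backward operator for the lasso problem; such an operator is averaged, hence quasinonexpansive, and its fixed points are the minimizers.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward_operator(A, y, gamma, reg):
    """T = prox_{gamma * reg * ||.||_1} o (Id - gamma * grad f), with
    f(x) = 0.5 * ||A x - y||^2.  For gamma in ]0, 2/||A^T A||[, T is
    averaged (hence quasinonexpansive) and Fix T = argmin f + reg * ||.||_1."""
    def T(x):
        return soft_threshold(x - gamma * A.T @ (A @ x - y), gamma * reg)
    return T
```

Plugged into the fixed point iteration sketched earlier, this operator recovers the (relaxed) iterative soft-thresholding algorithm.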
19. 8/12
Convergence results
[Combettes, Pesquet, 2017]
Assume that
F = {x} = {(x_i)_{1≤i≤m}}
(∀n ∈ N)(∀z = (z_i)_{1≤i≤m} ∈ H) ‖T_n z − x‖² ≤ ∑_{i=1}^m τ_{i,n} ‖z_i − x_i‖²,
where {τ_{i,n} | 1 ≤ i ≤ m, n ∈ N} ⊂ ]0, +∞[. Then

(∀n ∈ N) E(‖x_{n+1} − x‖² | F_0) ≤ (max_{1≤i≤m} p_i / min_{1≤i≤m} p_i) (∏_{k=0}^{n} χ_k) ‖x_0 − x‖² + η_n,

with, for every n ∈ N,
ξ_n = √(α_n / min_{1≤i≤m} p_i),    µ_n = 1 − min_{1≤i≤m} p_i (1 − τ_{i,n}),
χ_n = 1 − λ_n (1 − µ_n) + ξ_n λ_n (1 + λ_n √µ_n),
η_n = ∑_{k=0}^{n} (∏_{ℓ=k+1}^{n} χ_ℓ) λ_k (1 + λ_k √µ_k + λ_k ξ_k) ξ_k.
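A small numerical sketch of how this bound can be evaluated, for uniform p_i = p and constant τ_{i,n} = τ, λ_n = λ (the function name and these simplifications are mine, and the placement of the square root in ξ_n follows the reconstruction above):

```python
import math

def error_bound(p, tau, lam, alphas, dist0=1.0):
    """Evaluate (max_i p_i / min_i p_i) * prod(chi_k) * ||x0 - x||^2 + eta_n
    for uniform p_i = p (so the ratio is 1), constant tau and lam, and a
    given sequence (alpha_n) bounding the conditional error moments."""
    mu = 1.0 - p * (1.0 - tau)
    prod_chi, eta, bounds = 1.0, 0.0, []
    for a in alphas:
        xi = math.sqrt(a / p)
        chi = 1.0 - lam * (1.0 - mu) + xi * lam * (1.0 + lam * math.sqrt(mu))
        prod_chi *= chi
        # eta_n = chi_n * eta_{n-1} + lam * (1 + lam*sqrt(mu) + lam*xi) * xi
        eta = chi * eta + lam * (1.0 + lam * math.sqrt(mu) + lam * xi) * xi
        bounds.append(prod_chi * dist0 + eta)
    return bounds

# error-free case: the bound decays linearly with factor chi = mu
print(error_bound(p=0.5, tau=0.9, lam=1.0, alphas=[0.0] * 5))
```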
20. 8/12
Convergence results
[Combettes, Pesquet, 2017]
Assume that
F = {x} = {(x_i)_{1≤i≤m}}
(∀n ∈ N)(∀z = (z_i)_{1≤i≤m} ∈ H) ‖T_n z − x‖² ≤ ∑_{i=1}^m τ_{i,n} ‖z_i − x_i‖²,
where {τ_{i,n} | 1 ≤ i ≤ m, n ∈ N} ⊂ ]0, +∞[ and
(∀i ∈ {1, . . . , m}) sup_{n∈N} τ_{i,n} < 1.
Suppose that x_0 ∈ L²(Ω, F, P; H).
Then (x_n)_{n∈N} converges to x both in the mean square sense and strongly P-a.s.
21. 9/12
Behavior in the absence of errors
• Under the same assumptions, linear convergence rate.
• Comparison with deterministic case

[Figure: ρ(p)/ρ(1) as a function of p ∈ ]0, 1], for χ ∈ {0.95, 0.8, 0.6, 0.4, 0.2, 0.1}]

ρ(p) = −ln(1 − (1 − χ)p)/p: convergence rate normalized by the computational cost when (∀i ∈ {1, . . . , m}) p_i = p
χ: convergence factor in the deterministic case.
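A quick way to reproduce the normalized-rate curves (a sketch; the sampled values of p are mine):

```python
import math

def rho(p, chi):
    """Convergence rate normalized by computational cost: -ln(1-(1-chi)p)/p."""
    return -math.log(1.0 - (1.0 - chi) * p) / p

for chi in (0.95, 0.8, 0.6, 0.4, 0.2, 0.1):
    ratios = [rho(p, chi) / rho(1.0, chi) for p in (0.1, 0.5, 1.0)]
    print(chi, [round(r, 3) for r in ratios])
```

The ratio stays below 1 and decreases with χ, consistent with the figure: sparser activation is cheaper per iteration but somewhat less efficient per unit of computation.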
22. 9/12
Behavior in the absence of errors
• Under the same assumptions, linear convergence rate.
• Accuracy of upper bounds for a variational problem in multicomponent image recovery

[Figure: E‖x_n − x‖² / E‖x_0 − x‖² (in dB) versus iteration number n, for p = 1, p = 0.8, and p = 0.46; theoretical upper bounds in dashed lines]
23. 10/12
Influence of stochastic errors
Assume that
α_n = O(n^{−θ}) with θ ∈ ]2, +∞[.
Then
E‖x_n − x‖² = O(n^{−θ/2}).
⇒ loss of the linear convergence rate
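A self-contained numerical check of this polynomial rate, using the constants reconstructed on the earlier slide (the uniform p and constant τ, λ values are my choices):

```python
import math

p, tau, lam, theta = 0.5, 0.9, 1.0, 3.0   # hypothetical constants
mu = 1.0 - p * (1.0 - tau)
eta, vals = 0.0, {}
for n in range(1, 2001):
    xi = math.sqrt(n ** (-theta) / p)      # alpha_n = n^{-theta}
    chi = 1.0 - lam * (1.0 - mu) + xi * lam * (1.0 + lam * math.sqrt(mu))
    eta = chi * eta + lam * (1.0 + lam * math.sqrt(mu) + lam * xi) * xi
    if n in (500, 1000, 2000):
        vals[n] = eta * n ** (theta / 2)   # roughly constant: eta_n = O(n^{-theta/2})
print(vals)
```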
24. 11/12
Open issue: deterministic block activation
Let
(∀x ∈ H) |||x|||² = ∑_{i=1}^m ω_i ‖x_i‖²,
where max_{1≤i≤m} ω_i p_i = 1.
Assume that λ_n ≡ 1 and a_n ≡ 0. Then
(∀n ∈ N) E(|||x_{n+1} − x|||² | F_n)
= ∑_{i=1}^m ω_i p_i ‖T_{i,n} x_n − x_i‖² + ∑_{i=1}^m ω_i (1 − p_i) ‖x_{i,n} − x_i‖²
≤ ‖T_n x_n − x‖² + |||x_n − x|||² − ∑_{i=1}^m ω_i p_i ‖x_{i,n} − x_i‖²
≤ |||x_n − x|||² + ∑_{i=1}^m (τ_{i,n} − ω_i p_i) ‖x_{i,n} − x_i‖² ≤ |||x_n − x|||²,
since τ_{i,n} − ω_i p_i ≤ 0
⇒ stochastic Fejér monotonicity [Combettes, Pesquet, 2015]
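A quick Monte Carlo sanity check of the first equality (a toy sketch with scalar blocks, a map T_{i,n} x = x_i/2 with fixed point x = 0, and hypothetical values of p and ω):

```python
import numpy as np

rng = np.random.default_rng(3)

m = 3
p = np.array([0.4, 0.7, 0.9])        # hypothetical activation probabilities
omega = 1.0 / p                      # so that max_i omega_i * p_i = 1
x = rng.standard_normal(m)           # current iterate x_n (scalar blocks)
Tx = 0.5 * x                         # toy T_{i,n} x = x_i / 2, fixed point 0

# empirical E(|||x_{n+1} - 0|||^2 | x_n) over random activations
# (lambda_n = 1, a_n = 0; the identity only depends on the marginals p_i,
#  so the rare all-zero draw is harmlessly allowed in this sketch)
samples = []
for _ in range(100_000):
    eps = rng.random(m) < p
    x_next = np.where(eps, Tx, x)
    samples.append(np.sum(omega * x_next ** 2))

lhs = np.mean(samples)
rhs = np.sum(omega * p * Tx ** 2) + np.sum(omega * (1 - p) * x ** 2)
print(lhs, rhs)                      # the two values should nearly coincide
```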
25. 12/12
Open issue: more directional convergence conditions
Example:
minimize over x ∈ H: f(x) = g(∑_{i=1}^m L_i x_i) + (θ/2) ‖x‖²,
where g: G → R is convex and differentiable with a 1-Lipschitzian gradient, G is a separable real Hilbert space, (∀i ∈ {1, . . . , m}) L_i is a bounded linear operator from H_i to G, and θ ∈ ]0, +∞[.
• stochastic approach:
T_n = Id − γ_n ∇f ⇒ (∀i ∈ {1, . . . , m}) τ_{i,n} = 1 − γ_n θ,
with γ_n < 2 / (‖∑_{i=1}^m L_i^* L_i‖ + 2θ)
• deterministic approach (quasi-cyclic activation):
γ_n < 2 / (‖L_{i_n}‖² + 2θ)