
Startup ADMM
Shin Matsushima
Department of Statistics
Purdue University
Lab Seminar
April 2, 2012

The Paper
"Distributed Optimization and Statistical Learning
via the Alternating Direction Method of Multipliers"
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato and Jonathan Eckstein
Foundations and Trends in Machine Learning Vol. 3, No. 1 (2010) 1-122
URL: http://www.stanford.edu/~boyd/papers/pdf/admm_distr_stats.pdf

Outline
1 Dual Ascent
2 Method of Multipliers
3 Alternating Direction Method of Multipliers (ADMM)
4 Contents of the remaining chapters

Consider the equality-constrained convex optimization problem
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & Ax = b \end{array}$$
where $x \in \mathbf{R}^n$, $A \in \mathbf{R}^{m \times n}$, and $f$ is convex.
Lagrangian:
$$L(x, y) = f(x) + y^T (Ax - b)$$

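Assume strong duality:
$$\inf_x \sup_y L(x, y) = \sup_y \inf_x L(x, y)$$
The dual function $g(y) = \inf_x L(x, y)$ is concave.
The aim is to solve the dual problem
$$\text{maximize} \quad g(y) = \inf_x \left\{ f(x) + y^T (Ax - b) \right\}$$
and to recover a primal optimal point $x^\star$ from $y^\star$ (an optimal solution of the dual problem above):
$$x^\star = \operatorname*{argmin}_x L(x, y^\star)$$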

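Dual Ascent
Procedure (Dual Ascent):
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L(x, y^k) = \operatorname*{argmin}_x \left\{ f(x) + (y^k)^T A x \right\} \\
y^{k+1} &= y^k + \alpha^k (A x^{k+1} - b)
\end{aligned}$$
$A x^{k+1} - b \in \partial g(y^k)$ (a supergradient, since $g$ is concave) because
$$\begin{aligned}
g(y^k) &= \min_x \left\{ f(x) + (y^k)^T (Ax - b) \right\} = f(x^{k+1}) + (y^k)^T (A x^{k+1} - b) \\
g(y) &= \min_x \left\{ f(x) + y^T (Ax - b) \right\} \le f(x^{k+1}) + y^T (A x^{k+1} - b) \\
\Rightarrow \quad g(y) - g(y^k) &\le (y - y^k)^T (A x^{k+1} - b)
\end{aligned}$$
It is necessary to choose an appropriate step size $\alpha^k$.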
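The procedure above can be sketched on a small instance. This is a minimal illustration, assuming the objective $f(x) = \frac{1}{2}\|x - c\|_2^2$ (chosen here only because its x-update has the closed form $x = c - A^T y$) and a constant step size $\alpha = 1/\lambda_{\max}(AA^T)$; the function name `dual_ascent` and the data are hypothetical and do not come from the slides.

```python
import numpy as np

def dual_ascent(A, b, c, alpha, iters=500):
    """Dual ascent for: minimize 0.5*||x - c||^2 subject to A x = b (assumed f)."""
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        # x-update: argmin_x L(x, y^k); closed form for this particular f
        x = c - A.T @ y
        # y-update: step along the supergradient A x^{k+1} - b of g at y^k
        y = y + alpha * (A @ x - b)
    return x, y

# Illustrative data (assumption)
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
c = np.array([1.0, 2.0, 3.0])

# Constant step size; for this f the dual is a concave quadratic with Hessian -A A^T
alpha = 1.0 / np.linalg.norm(A @ A.T, 2)
x, y = dual_ascent(A, b, c, alpha)

# Reference solution from the optimality conditions of this quadratic problem
x_star = c - A.T @ np.linalg.solve(A @ A.T, A @ c - b)
print(x, x_star)
```

For this instance the iterate agrees with the reference solution and satisfies $Ax = b$ up to rounding. With a step size that is too large the dual iterates diverge, which is why the choice of $\alpha^k$ matters.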

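Dual Decomposition
In the problem
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & Ax = b, \end{array}$$
assume $f(x) = \sum_{i=1}^{N} f_i(x_i)$, where $x_i \in \mathbf{R}^{n_i}$ and $x = (x_1, \ldots, x_N)$.
Let $A_i \in \mathbf{R}^{m \times n_i}$ with $A = [A_1 \cdots A_N]$, so that $Ax = \sum_{i=1}^{N} A_i x_i$.
Then the algorithm becomes decentralized: the x-update $x^{k+1} = \operatorname*{argmin}_x L(x, y^k)$ splits into $N$ independent problems,
$$x_i^{k+1} = \operatorname*{argmin}_{x_i} \left\{ f_i(x_i) + (y^k)^T A_i x_i \right\} \quad \text{for } i = 1, \ldots, N$$
$$\underbrace{y^{k+1}}_{\text{broadcast}} = y^k + \alpha^k \big( \underbrace{A x^{k+1}}_{\text{gather}} - b \big)$$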
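The broadcast/gather structure can be sketched as follows. This is an illustrative sketch only, assuming separable terms $f_i(x_i) = \frac{1}{2}\|x_i - c_i\|_2^2$ so that each local update has the closed form $x_i = c_i - A_i^T y$; the name `dual_decomposition` and the data are hypothetical. The list comprehension stands in for work that would run on separate workers.

```python
import numpy as np

def dual_decomposition(A_blocks, c_blocks, b, alpha, iters=500):
    """Dual decomposition with f_i(x_i) = 0.5*||x_i - c_i||^2 (assumed)."""
    y = np.zeros(b.shape[0])
    for _ in range(iters):
        # Broadcast y^k; each block solves its own small problem independently
        x_blocks = [c_i - A_i.T @ y for A_i, c_i in zip(A_blocks, c_blocks)]
        # Gather the contributions A_i x_i^{k+1} to form A x^{k+1}
        Ax = sum(A_i @ x_i for A_i, x_i in zip(A_blocks, x_blocks))
        # Central dual update
        y = y + alpha * (Ax - b)
    return x_blocks, y

# Illustrative data (assumption): A = [A_1 A_2] split by columns
A_blocks = [np.array([[1.0], [1.0]]), np.array([[1.0, 1.0], [-1.0, 0.0]])]
c_blocks = [np.array([1.0]), np.array([2.0, 3.0])]
b = np.array([1.0, 0.0])

A = np.hstack(A_blocks)
alpha = 1.0 / np.linalg.norm(A @ A.T, 2)
x_blocks, y = dual_decomposition(A_blocks, c_blocks, b, alpha)
x = np.concatenate(x_blocks)
print(x)
```

Only the dual variable and the vectors $A_i x_i$ are exchanged; each worker keeps its own $f_i$ and $x_i$ private.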

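Original Problem:
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & Ax = b \end{array}$$
Original Lagrangian:
$$L(x, y) = f(x) + y^T (Ax - b)$$
Augmented Lagrangian:
$$L_\rho(x, y) = f(x) + y^T (Ax - b) + \frac{\rho}{2} \|Ax - b\|_2^2$$
This can be viewed as the ordinary Lagrangian of the following equality-constrained convex
optimization problem, which is equivalent to the original problem:
$$\begin{array}{ll} \text{minimize} & f(x) + \frac{\rho}{2} \|Ax - b\|_2^2 \\ \text{subject to} & Ax = b \end{array}$$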

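Method of Multipliers
Procedure (Method of Multipliers):
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, y^k) = \operatorname*{argmin}_x \left\{ L(x, y^k) + \frac{\rho}{2} \|Ax - b\|_2^2 \right\} \\
y^{k+1} &= y^k + \rho (A x^{k+1} - b)
\end{aligned}$$
The step size is now the fixed constant $\rho$.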
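A minimal sketch of this procedure, again assuming $f(x) = \frac{1}{2}\|x - c\|_2^2$ so that the x-update reduces to the linear system $(I + \rho A^T A)x = c - A^T y^k + \rho A^T b$. The name `method_of_multipliers` and the data are hypothetical.

```python
import numpy as np

def method_of_multipliers(A, b, c, rho=1.0, iters=100):
    """Method of multipliers for: minimize 0.5*||x - c||^2 s.t. A x = b (assumed f)."""
    n = A.shape[1]
    y = np.zeros(A.shape[0])
    H = np.eye(n) + rho * A.T @ A  # Hessian of the augmented Lagrangian in x
    for _ in range(iters):
        # x-update: minimize L_rho(x, y^k) by solving the normal equations
        x = np.linalg.solve(H, c - A.T @ y + rho * A.T @ b)
        # y-update with the fixed step size rho
        y = y + rho * (A @ x - b)
    return x, y

# Illustrative data (assumption)
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
c = np.array([1.0, 2.0, 3.0])

x, y = method_of_multipliers(A, b, c)
print(x, A @ x - b)
```

No step-size tuning is needed here: for this quadratic instance the iteration converges for any $\rho > 0$.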

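A simple way to see why the step size is set to $\rho$ is the following.
Since $x^{k+1}$ minimizes $L_\rho(x, y^k)$,
$$\begin{aligned}
0 &\in \partial_x \left\{ f(x) + (y^k)^T A x + \frac{\rho}{2} \|Ax - b\|_2^2 \right\} \Big|_{x = x^{k+1}} \\
&= \partial f(x^{k+1}) + A^T \left( y^k + \rho (A x^{k+1} - b) \right) \\
&= \partial f(x^{k+1}) + A^T y^{k+1}
\end{aligned}$$
This implies that the method of multipliers maintains
$$0 \in \partial f(x^{k+1}) + A^T y^{k+1}$$
after every iteration. Note that the dual feasibility condition at the optimum is
$$0 \in \partial f(x^\star) + A^T y^\star$$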
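This identity can be checked numerically on one iteration. The sketch below assumes the differentiable objective $f(x) = \frac{1}{2}\|x - c\|_2^2$, so that $\partial f(x) = \{x - c\}$; the data and the comparison step size $\rho/2$ are illustrative assumptions.

```python
import numpy as np

# Illustrative data (assumption)
A = np.array([[1.0, 1.0, 1.0], [1.0, -1.0, 0.0]])
b = np.array([1.0, 0.0])
c = np.array([1.0, 2.0, 3.0])
y = np.zeros(2)
rho = 1.0

# One x-update of the method of multipliers for f(x) = 0.5*||x - c||^2
H = np.eye(3) + rho * A.T @ A
x_next = np.linalg.solve(H, c - A.T @ y + rho * A.T @ b)
grad_f = x_next - c

# Dual update with step size rho: dual feasibility holds exactly
y_rho = y + rho * (A @ x_next - b)
residual_rho = grad_f + A.T @ y_rho

# Dual update with a different step size (rho / 2): the identity is lost
y_half = y + 0.5 * rho * (A @ x_next - b)
residual_half = grad_f + A.T @ y_half

print(np.linalg.norm(residual_rho), np.linalg.norm(residual_half))
```

Only the step size $\rho$ makes the dual residual vanish after each iteration, leaving primal feasibility $Ax = b$ as the only condition still to be reached.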

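The method of multipliers has better convergence properties than dual ascent.
However, the augmented term destroys separability, so the x-update can no longer be split across the blocks $x_i$:
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, y^k) \\
&= \operatorname*{argmin}_x \left\{ L(x, y^k) + \frac{\rho}{2} \|Ax - b\|_2^2 \right\} \\
&= \operatorname*{argmin}_x \left\{ \sum_{i=1}^{N} \left( f_i(x_i) + (y^k)^T A_i x_i \right) + \cdots \right\} \\
y^{k+1} &= y^k + \rho (A x^{k+1} - b)
\end{aligned}$$
Here the omitted quadratic term $\frac{\rho}{2} \| \sum_i A_i x_i - b \|_2^2$ couples the blocks $x_i$.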

ADMM solves problems of the following form:
$$\begin{array}{ll} \text{minimize} & f(x) + g(z) \\ \text{subject to} & Ax + Bz = c \end{array}$$
Augmented Lagrangian:
$$L_\rho(x, z, y) = f(x) + g(z) + y^T (Ax + Bz - c) + \frac{\rho}{2} \|Ax + Bz - c\|_2^2$$

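Alternating Direction Method of Multipliers (ADMM)
Procedure (ADMM):
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, z^k, y^k) \\
&= \operatorname*{argmin}_x \left\{ f(x) + (y^k)^T (Ax + Bz^k - c) + \frac{\rho}{2} \|Ax + Bz^k - c\|_2^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_z L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &= y^k + \rho (A x^{k+1} + B z^{k+1} - c)
\end{aligned}$$
Each iteration consists of an x-minimization step and a z-minimization step, followed by the dual update.
For comparison, the method of multipliers would minimize over $x$ and $z$ jointly:
$$\begin{aligned}
(x^{k+1}, z^{k+1}) &= \operatorname*{argmin}_{x, z} L_\rho(x, z, y^k) \\
y^{k+1} &= y^k + \rho (A x^{k+1} + B z^{k+1} - c)
\end{aligned}$$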
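The alternating updates can be sketched on a toy instance. This is a minimal illustration, assuming $f(x) = \frac{1}{2}\|x - p\|_2^2$ and $g(z) = \frac{1}{2}\|z - q\|_2^2$ so that both minimization steps reduce to small linear systems; the name `admm_quadratic` and the data are hypothetical.

```python
import numpy as np

def admm_quadratic(A, B, c, p, q, rho=1.0, iters=200):
    """ADMM for: minimize 0.5*||x - p||^2 + 0.5*||z - q||^2 s.t. A x + B z = c."""
    n, m = A.shape[1], B.shape[1]
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(A.shape[0])
    Hx = np.eye(n) + rho * A.T @ A
    Hz = np.eye(m) + rho * B.T @ B
    for _ in range(iters):
        # x-minimization with z^k and y^k held fixed
        x = np.linalg.solve(Hx, p - A.T @ y - rho * A.T @ (B @ z - c))
        # z-minimization using the freshly updated x^{k+1}
        z = np.linalg.solve(Hz, q - B.T @ y - rho * B.T @ (A @ x - c))
        # dual update with step size rho
        y = y + rho * (A @ x + B @ z - c)
    return x, z, y

# Illustrative data (assumption): constraints x_1 = z_1 and 2 x_2 = z_2
A = np.array([[1.0, 0.0], [0.0, 2.0]])
B = -np.eye(2)
c = np.zeros(2)
p = np.array([1.0, 2.0])
q = np.array([3.0, 0.0])
x, z, y = admm_quadratic(A, B, c, p, q)

# Reference: solve the joint problem over w = (x, z) with M w = c directly
M = np.hstack([A, B])
d = np.concatenate([p, q])
w_star = d - M.T @ np.linalg.solve(M @ M.T, M @ d - c)
print(np.concatenate([x, z]), w_star)
```

Each step touches only one of $f$ and $g$, which is the point of splitting the joint minimization of the method of multipliers into two passes.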

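Scaled Form of the Alternating Direction Method of
Multipliers (ADMM)
Let $r^k = A x^k + B z^k - c$ (residual) and $u^k = (1/\rho) y^k$ (scaled dual variable).
The ADMM procedure can then be rewritten as follows.
Procedure (scaled form of ADMM):
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x L_\rho(x, z^k, y^k) \\
&= \operatorname*{argmin}_x \left\{ f(x) + (y^k)^T (Ax + Bz^k - c) + \frac{\rho}{2} \|Ax + Bz^k - c\|_2^2 \right\} \\
&= \operatorname*{argmin}_x \left\{ f(x) + \frac{\rho}{2} \|Ax + Bz^k - c + u^k\|_2^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_z L_\rho(x^{k+1}, z, y^k) \\
&= \operatorname*{argmin}_z \left\{ g(z) + \frac{\rho}{2} \|Ax^{k+1} + Bz - c + u^k\|_2^2 \right\} \\
u^{k+1} &= u^k + (A x^{k+1} + B z^{k+1} - c) \\
&= u^k + r^{k+1}
\end{aligned}$$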
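The step from the unscaled to the scaled x-update is a completion of the square. Writing $r = Ax + Bz - c$ and $u = (1/\rho) y$ as defined above, the derivation can be spelled out as follows; terms that do not depend on the variable being minimized are dropped.

```latex
\begin{aligned}
y^T r + \frac{\rho}{2} \|r\|_2^2
  &= \frac{\rho}{2} \left( \|r\|_2^2 + 2 u^T r \right)
     && \text{since } y = \rho u \\
  &= \frac{\rho}{2} \|r + u\|_2^2 - \frac{\rho}{2} \|u\|_2^2 \\
\Rightarrow \quad
L_\rho(x, z, y)
  &= f(x) + g(z) + \frac{\rho}{2} \|Ax + Bz - c + u\|_2^2
     - \frac{\rho}{2} \|u\|_2^2
\end{aligned}
```

The last term is constant with respect to $x$ and $z$, so it does not affect either minimization step.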

Convergence of ADMM
Assumptions:
1 $f$ and $g$ are closed, proper, and convex
2 The unaugmented Lagrangian $L_0(x, z, y)$ has a saddle point
Results: as $k \to \infty$,
1 Residual convergence: $r^k \to 0$
2 Objective convergence: $f(x^k) + g(z^k) \to p^\star$
3 Dual variable convergence: $y^k \to y^\star$, where $y^\star$ is a dual optimal point
In practice:
A few tens of iterations often produce a solution of modest accuracy.
Convergence to a high-accuracy solution can be very slow.

Other Characteristics of ADMM
Optimality conditions and stopping criteria
Some variants:
Varying $\rho^k \to 0$
More general augmenting terms
Inexact x/z-minimization steps
...

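Notable Applications Discussed in the Following Chapters
Chap 5: A problem with a more general constraint,
$$\begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & x \in C, \end{array}$$
is transformed into
$$\begin{array}{ll} \text{minimize} & f(x) + I_C(z) \\ \text{subject to} & x - z = 0, \end{array}$$
where $I_C$ is the indicator function of $C$.
Procedure:
$$\begin{aligned}
x^{k+1} &= \operatorname*{argmin}_x \left\{ f(x) + \frac{\rho}{2} \|x - z^k + u^k\|_2^2 \right\} \\
z^{k+1} &= \operatorname*{argmin}_{z \in C} \frac{\rho}{2} \|x^{k+1} - z + u^k\|_2^2 \\
u^{k+1} &= u^k + (x^{k+1} - z^{k+1})
\end{aligned}$$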
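The z-update above is the Euclidean projection of $x^{k+1} + u^k$ onto $C$. A minimal sketch follows, assuming $f(x) = \frac{1}{2}\|x - p\|_2^2$ and $C$ equal to the nonnegative orthant, both chosen only for illustration; the name `admm_constrained` and the `project` argument are hypothetical.

```python
import numpy as np

def admm_constrained(p, project, rho=1.0, iters=100):
    """Scaled ADMM for: minimize 0.5*||x - p||^2 subject to x in C (assumed f)."""
    x = np.zeros_like(p)
    z = np.zeros_like(p)
    u = np.zeros_like(p)
    for _ in range(iters):
        # x-update: closed form of argmin 0.5*||x - p||^2 + (rho/2)*||x - z + u||^2
        x = (p + rho * (z - u)) / (1.0 + rho)
        # z-update: Euclidean projection of x^{k+1} + u^k onto C
        z = project(x + u)
        # scaled dual update
        u = u + x - z
    return x, z, u

# Illustrative data (assumption): C is the nonnegative orthant
p = np.array([2.0, -1.0, 0.5])
x, z, u = admm_constrained(p, lambda v: np.maximum(v, 0.0))
print(z)
```

Swapping the `project` argument, for example for `lambda v: np.clip(v, 0.0, 1.0)`, changes the constraint set to a box without touching the rest of the loop.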

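Chap 7: Distributed version (consensus). The problem
$$\text{minimize} \quad f(x) = \sum_{i=1}^{N} f_i(x)$$
is transformed into
$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{N} f_i(x_i) \\ \text{subject to} & x_i - z = 0, \quad i = 1, \ldots, N \end{array}$$
Procedure: the x-update $x^{k+1} = \operatorname*{argmin}_x L_\rho(x, z^k, y^k)$ separates over $i$:
$$\begin{aligned}
x_i^{k+1} &= \operatorname*{argmin}_{x_i} \left\{ f_i(x_i) + (y_i^k)^T (x_i - z^k) + \frac{\rho}{2} \|x_i - z^k\|_2^2 \right\} \\
z^{k+1} &= \frac{1}{N} \sum_{i=1}^{N} \left( x_i^{k+1} + \frac{1}{\rho} y_i^k \right) \\
y_i^{k+1} &= y_i^k + \rho (x_i^{k+1} - z^{k+1})
\end{aligned}$$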
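The consensus procedure can be sketched with local objectives $f_i(x) = \frac{1}{2}\|x - a_i\|_2^2$, an assumption made here because the minimizer of $\sum_i f_i$ is then simply the mean of the $a_i$, which gives an easy check. The name `consensus_admm` and the data are hypothetical; each row of the arrays plays the role of one worker.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Consensus ADMM with f_i(x) = 0.5*||x - a_i||^2 (assumed); row i of a is a_i."""
    N, n = a.shape
    x = np.zeros((N, n))  # local variables x_i, one row per worker
    y = np.zeros((N, n))  # local dual variables y_i
    z = np.zeros(n)       # global consensus variable
    for _ in range(iters):
        # local x_i-updates (independent per row; closed form for this f_i)
        x = (a - y + rho * z) / (1.0 + rho)
        # gather and average to form z^{k+1}
        z = np.mean(x + y / rho, axis=0)
        # local dual updates
        y = y + rho * (x - z)
    return x, z, y

# Illustrative data (assumption): three workers, two-dimensional variable
a = np.array([[1.0, 0.0], [2.0, 4.0], [6.0, 2.0]])
x, z, y = consensus_admm(a)
print(z)
```

The only communication per iteration is the averaging step and the broadcast of $z^{k+1}$ back to the workers.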

Contents
4 General patterns
Tips and tools used in the later chapters
5 Constrained Convex Optimization
How to incorporate general constraints
6 $\ell_1$-Norm Problems
Discussion of problems involving the $\ell_1$ norm
7 Consensus and Sharing
Framework for distributed optimization
8 Distributed Model Fitting
Examples of distributed optimization
9 Nonconvex problems
10 Implementation
11 Numerical Examples
12 Conclusion
