Distributed SVM
Harsha Vardhan
IIT Gandhinagar
harsha.tetali@iitgn.ac.in
April 30, 2017
Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 1 / 21
Overview
1 Alternating Direction Method of Multipliers
Objective
The Lagrangian Dual
Formulation of ADMM
2 Distributed SVM
ADMM
Objective
min_{x ∈ R^n, z ∈ R^m}  f(x) + g(z)    (1)
subject to  Ax + Bz = c
ADMM-Lagrangian
Lagrangian without the Penalty Term
L(x, z, λ) = f(x) + g(z) + λ^T (Ax + Bz − c)    (2)
ADMM-Lagrangian
Lagrangian with the Penalty Term
Lρ(x, z, λ) = f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2    (3)
ρ > 0 is called the augmented Lagrangian parameter. The Lagrangian with this added penalty term is also called the augmented Lagrangian.
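As a quick numerical illustration (my own sketch, not part of the original slides), the augmented Lagrangian of (3) can be evaluated directly; the helper name `augmented_lagrangian` and the particular choices of f, g, A, B, c below are hypothetical examples:

```python
import numpy as np

def augmented_lagrangian(f, g, A, B, c, x, z, lam, rho):
    """Evaluate L_rho(x, z, lam) = f(x) + g(z) + lam^T(Ax + Bz - c)
    + (rho/2) * ||Ax + Bz - c||^2, as in Equation (3)."""
    r = A @ x + B @ z - c          # primal residual Ax + Bz - c
    return f(x) + g(z) + lam @ r + 0.5 * rho * np.dot(r, r)

# Toy choices: quadratic f, l1-norm g, and the constraint x - z = 0.
A = np.eye(2); B = -np.eye(2); c = np.zeros(2)
f = lambda x: 0.5 * np.dot(x, x)
g = lambda z: np.sum(np.abs(z))
x = np.array([1.0, -2.0]); z = x.copy(); lam = np.array([0.3, 0.7])
# At a feasible point (residual = 0) the multiplier and penalty terms
# vanish, so L_rho reduces to f(x) + g(z) = 2.5 + 3.0 = 5.5.
val = augmented_lagrangian(f, g, A, B, c, x, z, lam, rho=1.0)
```

At infeasible points the penalty term grows quadratically with the constraint violation, which is what makes the augmented Lagrangian better behaved than the plain one.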
ADMM
Formulation
We first define the optimal value of the primal problem:
p* = inf { f(x) + g(z) : Ax + Bz = c }    (4)
The dual function (reusing the symbol g, now as a function of λ) is:
g(λ) = inf_{x,z} [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (5)
ADMM
Formulation
Assuming that the saddle point of Lρ(x, z, λ) exists and that we have
strong duality, we can write:
p* = d* = sup_λ inf_{x,z} [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (6)
ADMM
Formulation
Writing down the complete optimization problem formulated so far, we have:
p* = d* = sup_λ inf_{x,z} [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (7)
ADMM
Formulation
The problem can be restated as:
sup_λ inf_z inf_x [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (8)
ADMM
Formulation
We first solve the innermost problem, the minimization over x:
sup_λ inf_z inf_x [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (9)
To solve this, we follow the update rule:
x^{(k+1)} := arg min_x Lρ(x, z^{(k)}, λ^{(k)})    (10)
ADMM
Formulation
We follow the Gauss–Seidel approach to update the remaining variables,
i.e. we use the updated values of the variables already updated. Now for
the minimization over z:
sup_λ inf_z inf_x [ f(x) + g(z) + λ^T (Ax + Bz − c) + (ρ/2) ||Ax + Bz − c||^2 ]    (11)
We use:
z^{(k+1)} := arg min_z Lρ(x^{(k+1)}, z, λ^{(k)})    (12)
ADMM
Formulation
Now we need an update for the variable λ; for this we solve the outermost
maximization problem. Taking the gradient of g with respect to λ, we get
∇g(λ) = Ax + Bz − c    (13)
We now move in the direction of ascent to increase the value of the function
g(·), which gives the following update rule:
λ^{(k+1)} := λ^{(k)} + ρ (Ax^{(k+1)} + Bz^{(k+1)} − c)    (14)
Here the step size associated with the gradient is set equal to the
augmented Lagrangian parameter ρ.
ADMM
Formulation
Thus, the final formulation of the ADMM algorithm is:
x^{(k+1)} := arg min_x Lρ(x, z^{(k)}, λ^{(k)})    (15)
z^{(k+1)} := arg min_z Lρ(x^{(k+1)}, z, λ^{(k)})    (16)
λ^{(k+1)} := λ^{(k)} + ρ (Ax^{(k+1)} + Bz^{(k+1)} − c)    (17)
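The three update rules above can be exercised end-to-end on a tiny consensus problem where every step has a closed form. The choice of f and g (two quadratics) with A = I, B = −I, c = 0, and the step count, are my own toy choices, not from the slides:

```python
import numpy as np

# Toy consensus problem: f(x) = 0.5*||x - a||^2, g(z) = 0.5*||z - b||^2,
# subject to x - z = 0 (i.e. A = I, B = -I, c = 0).
# The optimum is x = z = (a + b) / 2.
a = np.array([1.0, 3.0])
b = np.array([5.0, -1.0])
rho = 1.0

x = np.zeros(2); z = np.zeros(2); lam = np.zeros(2)
for _ in range(200):
    # x-update: closed-form argmin of 0.5||x-a||^2 + lam^T(x-z) + (rho/2)||x-z||^2
    x = (a - lam + rho * z) / (1.0 + rho)
    # z-update: closed-form argmin of 0.5||z-b||^2 - lam^T z + (rho/2)||x-z||^2
    z = (b + lam + rho * x) / (1.0 + rho)
    # dual ascent on lambda with step size rho
    lam = lam + rho * (x - z)

# x and z converge to the consensus solution (a + b)/2 = [3.0, 1.0]
```

For this problem the iteration error roughly halves per step, so x and z agree to machine precision well within 200 iterations; in general the x- and z-subproblems need their own solvers rather than closed forms.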
Support Vector Machines
In Support Vector Machines, the main aim is to find a hyperplane, with normal
vector w, that linearly separates the data in the space where the data resides.
This can be posed as the following optimization problem.
Support Vector Machine
Given a dataset {(x_i, y_i)}_{i=1}^{l} (x_i ∈ R^n, y_i ∈ {−1, +1}), the
L2-regularized L2-loss (squared hinge loss) SVM is:
min_w (1/2) ||w||_2^2 + C Σ_{i=1}^{l} max(1 − y_i w^T x_i, 0)^2    (18)
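As a sanity check, the objective in (18) can be evaluated directly with NumPy. This is my own toy example, not from the slides; `svm_objective` is a hypothetical helper name, and the squared hinge follows the L2-loss form stated above:

```python
import numpy as np

def svm_objective(w, X, y, C):
    """L2-regularized squared-hinge SVM objective, as in (18):
    0.5*||w||_2^2 + C * sum_i max(1 - y_i * w^T x_i, 0)^2."""
    margins = np.maximum(1.0 - y * (X @ w), 0.0)   # hinge terms, one per sample
    return 0.5 * np.dot(w, w) + C * np.sum(margins ** 2)

# Tiny check: two points classified with functional margin >= 1 incur
# zero loss, so the objective equals the regularizer 0.5*||w||^2 = 0.5.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0, 0.0])
val = svm_objective(w, X, y, C=1.0)
```

The squared hinge is differentiable everywhere, which is what later makes gradient-based local solvers convenient for the distributed w-updates.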
Distributed Support Vector Machines
The task of linear classification is now distributed among several machines:
each machine holds a different part of the dataset, each of manageable size.
To make the problem amenable to decomposition, we first let
{B_1, B_2, ..., B_m} be a partition of the data indices {1, 2, ..., l}.
SVM in the Distributed Setting
min_w (1/2) ||w||_2^2 + C Σ_{j=1}^{m} Σ_{i∈B_j} max(1 − y_i w^T x_i, 0)^2    (19)
Distributed Support Vector Machine
Suppose each machine, working with its own dataset, finds an optimal w in its
iteration. Then there is no single w but a set of weight vectors, one per
machine; we denote them by w_j for j = 1, 2, ..., m. Since we want the global
weight vector to be a single vector rather than many, we impose a new
artificial consensus condition:
z = w_1 = w_2 = · · · = w_m
SVM in the Distributed Setting
In this setting the distributed SVM takes the following form:
min_{w_1,...,w_m, z} (1/2) ||z||_2^2 + C Σ_{j=1}^{m} Σ_{i∈B_j} max(1 − y_i w_j^T x_i, 0)^2    (20)
subject to  w_j − z = 0,  j = 1, ..., m
Distributed Support Vector Machine
Now let us write the Augmented Lagrangian.
Augmented Lagrangian
L(w, z, λ) = (1/2) ||z||_2^2 + C Σ_{j=1}^{m} Σ_{i∈B_j} max(1 − y_i w_j^T x_i, 0)^2 + Σ_{j=1}^{m} [ (ρ/2) ||w_j − z||_2^2 + λ_j^T (w_j − z) ]    (21)
where w := {w_1, w_2, ..., w_m} and λ := {λ_1, λ_2, ..., λ_m}.
The ADMM Way
Now we use ADMM to optimize the above Lagrangian. As seen in the
ADMM section, we have the following update rules.
ADMM on Distributed SVM
w^{(k+1)} = arg min_w L(w, z^{(k)}, λ^{(k)})    (22)
z^{(k+1)} = arg min_z L(w^{(k+1)}, z, λ^{(k)})    (23)
λ_j^{(k+1)} = λ_j^{(k)} + ρ (w_j^{(k+1)} − z^{(k+1)}),  j = 1, ..., m    (24)
The Update Equations
The problem in (22) can be parallelized across m machines, with each machine
solving the following minimization problem:
w Update
w_j^{(k+1)} = arg min_w [ C Σ_{i∈B_j} max(1 − y_i w^T x_i, 0)^2 + (ρ/2) ||w − z^{(k)}||_2^2 + λ_j^{(k)T} (w − z^{(k)}) ],  j = 1, ..., m    (25)
z Update
z^{(k+1)} = ( Σ_{j=1}^{m} ( ρ w_j^{(k+1)} + λ_j^{(k)} ) ) / (mρ + 1)    (26)
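The update equations above can be sketched end-to-end. This is a minimal illustration, not the implementation from the references: the w-update (25) has no closed form, so it is approximated here by a few gradient-descent steps on the differentiable squared-hinge local objective; the data split, step sizes, and iteration counts are arbitrary toy choices of mine:

```python
import numpy as np

# Toy linearly separable data split across m machines (blocks B_j).
m, C, rho = 2, 1.0, 1.0
X_blocks = [np.array([[2.0, 1.0], [1.5, 2.0]]),
            np.array([[-2.0, -1.0], [-1.0, -2.0]])]
y_blocks = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]

def local_w_update(X, y, z, lam, n_steps=50, lr=0.05):
    """Approximate the w-update (25) by gradient descent on
    C*sum_i max(1 - y_i w^T x_i, 0)^2 + (rho/2)||w - z||^2 + lam^T (w - z)."""
    w = z.copy()
    for _ in range(n_steps):
        s = np.maximum(1.0 - y * (X @ w), 0.0)       # active hinge terms
        grad = -2.0 * C * (X.T @ (s * y)) + rho * (w - z) + lam
        w -= lr * grad
    return w

z = np.zeros(2)
lams = [np.zeros(2) for _ in range(m)]
for _ in range(50):
    # (25): each machine updates its local w_j (in parallel in practice)
    ws = [local_w_update(X_blocks[j], y_blocks[j], z, lams[j]) for j in range(m)]
    # (26): closed-form averaging step for the consensus variable z
    z = sum(rho * ws[j] + lams[j] for j in range(m)) / (m * rho + 1.0)
    # (24): dual update, one multiplier per machine
    for j in range(m):
        lams[j] = lams[j] + rho * (ws[j] - z)
```

After the loop, z separates the toy data (y_i z^T x_i > 0 for all i). Zhang, Lee and Shin use dedicated local solvers (e.g. trust-region or dual coordinate descent) in place of the plain gradient steps here.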
References
Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J., 2011. Distributed
optimization and statistical learning via the alternating direction method of
multipliers. Foundations and Trends in Machine Learning, 3(1), pp.1-122.
Zhang, C., Lee, H. and Shin, K.G., 2012. Efficient Distributed Linear Classification
Algorithms via the Alternating Direction Method of Multipliers. In AISTATS (pp.
1398-1406).
The End
Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 21 / 21

More Related Content

What's hot

A non-stiff boundary integral method for internal waves
A non-stiff boundary integral method for internal wavesA non-stiff boundary integral method for internal waves
A non-stiff boundary integral method for internal waves
Alex (Oleksiy) Varfolomiyev
 
Green Theorem
Green TheoremGreen Theorem
Green Theorem
Sarwan Ursani
 
Daa unit 4
Daa unit 4Daa unit 4
Daa unit 4
Abhimanyu Mishra
 
A Note on Correlated Topic Models
A Note on Correlated Topic ModelsA Note on Correlated Topic Models
A Note on Correlated Topic ModelsTomonari Masada
 
cheb_conf_aksenov.pdf
cheb_conf_aksenov.pdfcheb_conf_aksenov.pdf
cheb_conf_aksenov.pdf
Alexey Vasyukov
 
Relaxation method
Relaxation methodRelaxation method
Relaxation method
Parinda Rajapaksha
 
Dda algorithm
Dda algorithmDda algorithm
Dda algorithm
Mani Kanth
 
Least Square Optimization and Sparse-Linear Solver
Least Square Optimization and Sparse-Linear SolverLeast Square Optimization and Sparse-Linear Solver
Least Square Optimization and Sparse-Linear Solver
Ji-yong Kwon
 
Line drawing algo.
Line drawing algo.Line drawing algo.
Line drawing algo.Mohd Arif
 
Treewidth and Applications
Treewidth and ApplicationsTreewidth and Applications
Treewidth and ApplicationsASPAK2014
 
Longest Common Subsequence & Matrix Chain Multiplication
Longest Common Subsequence & Matrix Chain MultiplicationLongest Common Subsequence & Matrix Chain Multiplication
Longest Common Subsequence & Matrix Chain Multiplication
JaneAlamAdnan
 
Vector calculus
Vector calculusVector calculus
Vector calculus
sujathavvv
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
Kobkrit Viriyayudhakorn
 
Matrix chain multiplication by MHM
Matrix chain multiplication by MHMMatrix chain multiplication by MHM
Matrix chain multiplication by MHM
Md Mosharof Hosen
 
Matrix Product (Modulo-2) Of Cycle Graphs
Matrix Product (Modulo-2) Of Cycle GraphsMatrix Product (Modulo-2) Of Cycle Graphs
Matrix Product (Modulo-2) Of Cycle Graphs
inventionjournals
 
Double Integral
Double IntegralDouble Integral
Double Integral
Keerthana Nambiar
 
IIR filter realization using direct form I & II
IIR filter realization using direct form I & IIIIR filter realization using direct form I & II
IIR filter realization using direct form I & II
Sarang Joshi
 
Image Restoration 2 (Digital Image Processing)
Image Restoration 2 (Digital Image Processing)Image Restoration 2 (Digital Image Processing)
Image Restoration 2 (Digital Image Processing)
VARUN KUMAR
 
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
The Statistical and Applied Mathematical Sciences Institute
 

What's hot (19)

A non-stiff boundary integral method for internal waves
A non-stiff boundary integral method for internal wavesA non-stiff boundary integral method for internal waves
A non-stiff boundary integral method for internal waves
 
Green Theorem
Green TheoremGreen Theorem
Green Theorem
 
Daa unit 4
Daa unit 4Daa unit 4
Daa unit 4
 
A Note on Correlated Topic Models
A Note on Correlated Topic ModelsA Note on Correlated Topic Models
A Note on Correlated Topic Models
 
cheb_conf_aksenov.pdf
cheb_conf_aksenov.pdfcheb_conf_aksenov.pdf
cheb_conf_aksenov.pdf
 
Relaxation method
Relaxation methodRelaxation method
Relaxation method
 
Dda algorithm
Dda algorithmDda algorithm
Dda algorithm
 
Least Square Optimization and Sparse-Linear Solver
Least Square Optimization and Sparse-Linear SolverLeast Square Optimization and Sparse-Linear Solver
Least Square Optimization and Sparse-Linear Solver
 
Line drawing algo.
Line drawing algo.Line drawing algo.
Line drawing algo.
 
Treewidth and Applications
Treewidth and ApplicationsTreewidth and Applications
Treewidth and Applications
 
Longest Common Subsequence & Matrix Chain Multiplication
Longest Common Subsequence & Matrix Chain MultiplicationLongest Common Subsequence & Matrix Chain Multiplication
Longest Common Subsequence & Matrix Chain Multiplication
 
Vector calculus
Vector calculusVector calculus
Vector calculus
 
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
[Lecture 3] AI and Deep Learning: Logistic Regression (Coding)
 
Matrix chain multiplication by MHM
Matrix chain multiplication by MHMMatrix chain multiplication by MHM
Matrix chain multiplication by MHM
 
Matrix Product (Modulo-2) Of Cycle Graphs
Matrix Product (Modulo-2) Of Cycle GraphsMatrix Product (Modulo-2) Of Cycle Graphs
Matrix Product (Modulo-2) Of Cycle Graphs
 
Double Integral
Double IntegralDouble Integral
Double Integral
 
IIR filter realization using direct form I & II
IIR filter realization using direct form I & IIIIR filter realization using direct form I & II
IIR filter realization using direct form I & II
 
Image Restoration 2 (Digital Image Processing)
Image Restoration 2 (Digital Image Processing)Image Restoration 2 (Digital Image Processing)
Image Restoration 2 (Digital Image Processing)
 
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
CLIM Fall 2017 Course: Statistics for Climate Research, Estimating Curves and...
 

Similar to Distributed Support Vector Machines

Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Tomoya Murata
 
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
The Statistical and Applied Mathematical Sciences Institute
 
02 basics i-handout
02 basics i-handout02 basics i-handout
02 basics i-handout
sheetslibrary
 
Linear Machine Learning Models with L2 Regularization and Kernel Tricks
Linear Machine Learning Models with L2 Regularization and Kernel TricksLinear Machine Learning Models with L2 Regularization and Kernel Tricks
Linear Machine Learning Models with L2 Regularization and Kernel Tricks
Fengtao Wu
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
IJERA Editor
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
IJERA Editor
 
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
The Statistical and Applied Mathematical Sciences Institute
 
On the solvability of a system of forward-backward linear equations with unbo...
On the solvability of a system of forward-backward linear equations with unbo...On the solvability of a system of forward-backward linear equations with unbo...
On the solvability of a system of forward-backward linear equations with unbo...
Nikita V. Artamonov
 
2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory
Joe Suzuki
 
Mixed ISS systems
Mixed ISS systemsMixed ISS systems
Mixed ISS systems
MKosmykov
 
Numerical analysis m2 l4slides
Numerical analysis  m2 l4slidesNumerical analysis  m2 l4slides
Numerical analysis m2 l4slides
SHAMJITH KM
 
Bayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal MeasuresBayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal Measures
Joe Suzuki
 
Banco de preguntas para el ap
Banco de preguntas para el apBanco de preguntas para el ap
Banco de preguntas para el ap
MARCELOCHAVEZ23
 
Parallel Bayesian Optimization
Parallel Bayesian OptimizationParallel Bayesian Optimization
Parallel Bayesian Optimization
Sri Ambati
 
PART I.3 - Physical Mathematics
PART I.3 - Physical MathematicsPART I.3 - Physical Mathematics
PART I.3 - Physical Mathematics
Maurice R. TREMBLAY
 
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
Yuko Kuroki (黒木祐子)
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cube
VjekoslavKovac1
 
Hybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksHybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networks
MKosmykov
 

Similar to Distributed Support Vector Machines (20)

Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
Doubly Accelerated Stochastic Variance Reduced Gradient Methods for Regulariz...
 
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
MUMS: Transition & SPUQ Workshop - Practical Bayesian Optimization for Urban ...
 
02 basics i-handout
02 basics i-handout02 basics i-handout
02 basics i-handout
 
A
AA
A
 
Linear Machine Learning Models with L2 Regularization and Kernel Tricks
Linear Machine Learning Models with L2 Regularization and Kernel TricksLinear Machine Learning Models with L2 Regularization and Kernel Tricks
Linear Machine Learning Models with L2 Regularization and Kernel Tricks
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
 
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
Determination of Optimal Product Mix for Profit Maximization using Linear Pro...
 
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
QMC: Undergraduate Workshop, Introduction to Monte Carlo Methods with 'R' Sof...
 
On the solvability of a system of forward-backward linear equations with unbo...
On the solvability of a system of forward-backward linear equations with unbo...On the solvability of a system of forward-backward linear equations with unbo...
On the solvability of a system of forward-backward linear equations with unbo...
 
SASA 2016
SASA 2016SASA 2016
SASA 2016
 
2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory2013 IEEE International Symposium on Information Theory
2013 IEEE International Symposium on Information Theory
 
Mixed ISS systems
Mixed ISS systemsMixed ISS systems
Mixed ISS systems
 
Numerical analysis m2 l4slides
Numerical analysis  m2 l4slidesNumerical analysis  m2 l4slides
Numerical analysis m2 l4slides
 
Bayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal MeasuresBayesian Criteria based on Universal Measures
Bayesian Criteria based on Universal Measures
 
Banco de preguntas para el ap
Banco de preguntas para el apBanco de preguntas para el ap
Banco de preguntas para el ap
 
Parallel Bayesian Optimization
Parallel Bayesian OptimizationParallel Bayesian Optimization
Parallel Bayesian Optimization
 
PART I.3 - Physical Mathematics
PART I.3 - Physical MathematicsPART I.3 - Physical Mathematics
PART I.3 - Physical Mathematics
 
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
[AAAI2021] Combinatorial Pure Exploration with Full-bandit or Partial Linear ...
 
A Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cubeA Szemeredi-type theorem for subsets of the unit cube
A Szemeredi-type theorem for subsets of the unit cube
 
Hybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networksHybrid dynamics in large-scale logistics networks
Hybrid dynamics in large-scale logistics networks
 

Recently uploaded

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
EugeneSaldivar
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
ArianaBusciglio
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
SACHIN R KONDAGURI
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 

Recently uploaded (20)

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
 
Group Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana BuscigliopptxGroup Presentation 2 Economics.Ariana Buscigliopptx
Group Presentation 2 Economics.Ariana Buscigliopptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
"Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe..."Protectable subject matters, Protection in biotechnology, Protection of othe...
"Protectable subject matters, Protection in biotechnology, Protection of othe...
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 

Distributed Support Vector Machines

  • 1. Distributed SVM Harsha Vardhan IIT Gandhinagar harsha.tetali@iitgn.ac.in April 30, 2017 Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 1 / 21
  • 2. Overview 1 Alternating Direction Method of Multipliers Objective The Lagrangian Dual Formulation of ADMM 2 Distributed SVM Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 2 / 21
  • 3. ADMM Objective min x∈Rn,z∈Rm f (x) + g(z) (1) subject to Ax + Bz = c Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 3 / 21
  • 4. ADMM-Lagrangian Lagrangian without the Penalty Term Lρ(x, z, λ) = f (x) + g(z) + λ (Ax + Bz − c) (2) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 4 / 21
  • 5. ADMM-Lagrangian Lagrangian with the Penalty Term Lρ(x, z, λ) = f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (3) ρ > 0 is called the Augmented Lagrangian Parameter. This Lagrangian with added penalty term is also called the Augmented Lagrangian. Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 5 / 21
  • 6. ADMM Formulation We need the following: p∗ = Inf{f (x) + g(z)|Ax + Bz = c} (4) We have the dual problem formulated as: g(λ) = inf x,z f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (5) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 6 / 21
  • 7. ADMM Formulation Assuming that the saddle point of Lρ(x, z, λ) exists and that we have strong duality, we can write: p∗ = d∗ = sup λ inf x,z f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (6) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 7 / 21
  • 8. ADMM Formulation Writing down the complete optimization problem, formulated till now, we have, p∗ = d∗ = sup λ inf x,z f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (7) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 8 / 21
  • 9. ADMM Formulation The problem can be restated as, sup λ inf z inf x f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (8) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 9 / 21
  • 10. ADMM Formulation We try to solve the underlined problem first: sup λ inf z inf x f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (9) To solve this, we follow the rule: x(k+1) := arg min x L(x, z(k) , λ(k) ) (10) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 10 / 21
  • 11. ADMM Formulation We follow the Gauss Seidel Approach to update the remaining variables, i.e. we use the updated values of the variables already updated. Now for the problem underlined below: sup λ inf z inf x f (x) + g(z) + λ (Ax + Bz − c) + ρ 2 Ax + Bz − c 2 (11) We use: z(k+1) := arg min z L(x(k+1) , z, λ(k) ) (12) Harsha Vardhan (IIT Gandhinagar) ADMM April 30, 2017 11 / 21
  • 12. ADMM Formulation. Now we need an update for the variable $\lambda$; for this we solve the outermost maximization problem. Differentiating the dual function $g$ with respect to $\lambda$ gives:
    $\nabla g(\lambda) = Ax + Bz - c$ (13)
    We move in the ascent direction to increase the value of $g(\cdot)$, which yields the update rule:
    $\lambda^{(k+1)} := \lambda^{(k)} + \rho\,(Ax^{(k+1)} + Bz^{(k+1)} - c)$ (14)
    Here the step size of the gradient ascent is set equal to the augmented Lagrangian parameter $\rho$.
  • 13. ADMM Formulation. Thus, the final formulation of the ADMM algorithm is:
    $x^{(k+1)} := \arg\min_x L_\rho(x, z^{(k)}, \lambda^{(k)})$ (15)
    $z^{(k+1)} := \arg\min_z L_\rho(x^{(k+1)}, z, \lambda^{(k)})$ (16)
    $\lambda^{(k+1)} := \lambda^{(k)} + \rho\,(Ax^{(k+1)} + Bz^{(k+1)} - c)$ (17)
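  As a concrete illustration of the three update rules above, here is a minimal Python sketch of ADMM on a toy consensus problem, $\min\, \frac{1}{2}(x-a)^2 + \frac{1}{2}(z-b)^2$ subject to $x - z = 0$ (so $A = 1$, $B = -1$, $c = 0$), where both subproblems have closed-form solutions. The problem, function name, and parameter values are illustrative choices, not part of the slides.

  ```python
  # ADMM on a toy problem: min 0.5*(x-a)^2 + 0.5*(z-b)^2  s.t.  x - z = 0.
  # With A = 1, B = -1, c = 0, the three updates reduce to scalar formulas.

  def admm_toy(a, b, rho=1.0, iters=200):
      x = z = lam = 0.0
      for _ in range(iters):
          # x-update: argmin_x 0.5*(x-a)^2 + lam*(x-z) + (rho/2)*(x-z)^2
          x = (a - lam + rho * z) / (1.0 + rho)
          # z-update (Gauss-Seidel: uses the fresh x just computed)
          z = (b + lam + rho * x) / (1.0 + rho)
          # dual ascent on lambda with step size rho
          lam = lam + rho * (x - z)
      return x, z, lam

  x, z, lam = admm_toy(a=1.0, b=3.0)
  # x and z agree and equal the consensus optimum (a + b) / 2 = 2.0
  ```

  Both primal updates here are obtained by setting the derivative of the augmented Lagrangian to zero, which is exactly what (15) and (16) prescribe when the subproblems are smooth.
  
  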
  • 14. Support Vector Machines. In support vector machines, the aim is to find a hyperplane $w$ that linearly separates the data in the space where it resides. This can be posed as the following optimization problem.
    Support Vector Machine: given a dataset $\{(x_i, y_i)\}_{i=1}^{l}$, $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, the L2-regularized L2-loss (squared hinge loss) SVM is:
    $\min_w\; \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{l} \max(1 - y_i w^\top x_i,\, 0)^2$ (18)
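  The objective in (18) is straightforward to evaluate numerically. The small helper below is an illustrative sketch (the function name and example data are not from the slides) that computes the L2-regularized squared-hinge objective for a weight vector over a dataset:

  ```python
  import numpy as np

  def svm_objective(w, X, y, C):
      """L2-regularized squared-hinge SVM objective:
      0.5 * ||w||^2 + C * sum_i max(1 - y_i * w^T x_i, 0)^2."""
      margins = np.maximum(1.0 - y * (X @ w), 0.0)
      return 0.5 * float(w @ w) + C * float(np.sum(margins ** 2))

  # At w = 0 every margin term is exactly 1, so the objective equals C * l.
  ```

  This is the quantity that the distributed formulation on the next slides decomposes across machines.
  
  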
  • 15. Distributed Support Vector Machines. The task of linear classification is now distributed among several machines, each holding a different dataset of manageable size. To make the problem amenable to decomposition, we first let $\{B_1, B_2, \ldots, B_m\}$ be a partition of the data indices $\{1, 2, \ldots, l\}$.
    SVM in the distributed setting:
    $\min_w\; \frac{1}{2}\|w\|_2^2 + C \sum_{j=1}^{m} \sum_{i \in B_j} \max(1 - y_i w^\top x_i,\, 0)^2$ (19)
  • 16. Distributed Support Vector Machine. Suppose each machine, working on its own data block, finds an optimal weight vector for its subproblem. There is then no single $w$ but a set of weight vectors, one per machine; we denote them $w_j$ for $j = 1, 2, \ldots, m$. Since we want a single global vector, we impose a new artificial consistency constraint: $z = w_1 = w_2 = \cdots = w_m$.
    SVM in the distributed setting then takes the form:
    $\min_{w_1, \ldots, w_m, z}\; \frac{1}{2}\|z\|_2^2 + C \sum_{j=1}^{m} \sum_{i \in B_j} \max(1 - y_i w_j^\top x_i,\, 0)^2$ (20)
    subject to $w_j - z = 0,\; j = 1, \ldots, m$
  • 17. Distributed Support Vector Machine. Now let us write the augmented Lagrangian:
    $L(w, z, \lambda) = \frac{1}{2}\|z\|_2^2 + C \sum_{j=1}^{m} \sum_{i \in B_j} \max(1 - y_i w_j^\top x_i,\, 0)^2 + \sum_{j=1}^{m} \left[ \frac{\rho}{2}\|w_j - z\|_2^2 + \lambda_j^\top (w_j - z) \right]$ (21)
    where $w := \{w_1, w_2, \ldots, w_m\}$ and $\lambda := \{\lambda_1, \lambda_2, \ldots, \lambda_m\}$.
  • 18. The ADMM Way. Now we use ADMM to optimize the above Lagrangian. As derived in the ADMM section, the update rules are:
    $w^{(k+1)} = \arg\min_w L(w, z^{(k)}, \lambda^{(k)})$ (22)
    $z^{(k+1)} = \arg\min_z L(w^{(k+1)}, z, \lambda^{(k)})$ (23)
    $\lambda_j^{(k+1)} = \lambda_j^{(k)} + \rho\,(w_j^{(k+1)} - z^{(k+1)}),\quad j = 1, \ldots, m$ (24)
  • 19. The Update Equations. The problem in (22) splits across the $m$ machines, with machine $j$ solving the following minimization problem:
    $w_j^{(k+1)} = \arg\min_w\; C \sum_{i \in B_j} \max(1 - y_i w^\top x_i,\, 0)^2 + \frac{\rho}{2}\|w - z^{(k)}\|_2^2 + \lambda_j^{(k)\top}(w - z^{(k)}),\quad j = 1, \ldots, m$ (25)
    The $z$ subproblem has the closed-form solution:
    $z^{(k+1)} = \frac{\sum_{j=1}^{m} \left( \rho\, w_j^{(k+1)} + \lambda_j^{(k)} \right)}{m\rho + 1}$ (26)
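  Putting the $w$, $z$, and $\lambda$ updates together, the following Python sketch runs consensus ADMM for the distributed squared-hinge SVM on a tiny hand-made dataset. The data, the block partition, the step sizes, and the inner gradient-descent solver for the $w_j$ subproblem are all illustrative assumptions; in practice each block would live on a separate machine and the $w_j$ update would use a dedicated solver.

  ```python
  import numpy as np

  def dsvm_admm(X, y, blocks, C=1.0, rho=1.0, outer=50, inner=50, lr=0.01):
      """Consensus ADMM for distributed L2-loss SVM, following updates (25)-(26)."""
      n = X.shape[1]
      m = len(blocks)
      W = [np.zeros(n) for _ in range(m)]   # per-machine weight vectors w_j
      L = [np.zeros(n) for _ in range(m)]   # per-machine multipliers lambda_j
      z = np.zeros(n)                       # global consensus vector
      for _ in range(outer):
          # w-update: each "machine" minimizes its own augmented subproblem,
          # here approximately, by plain gradient descent (loss is smooth).
          for j, B in enumerate(blocks):
              Xj, yj, w = X[B], y[B], W[j].copy()
              for _ in range(inner):
                  margins = np.maximum(1.0 - yj * (Xj @ w), 0.0)
                  grad = (-2.0 * C * (margins * yj) @ Xj
                          + rho * (w - z) + L[j])
                  w -= lr * grad
              W[j] = w
          # z-update: closed form from (26)
          z = sum(rho * W[j] + L[j] for j in range(m)) / (m * rho + 1.0)
          # dual update from (24)
          for j in range(m):
              L[j] = L[j] + rho * (W[j] - z)
      return z

  X = np.array([[2.0, 2.0], [3.0, 1.0], [2.0, 3.0],
                [-2.0, -2.0], [-3.0, -1.0], [-2.0, -3.0]])
  y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
  blocks = [[0, 2, 4], [1, 3, 5]]           # two "machines", mixed labels each
  z = dsvm_admm(X, y, blocks)
  # sign(X @ z) recovers the labels on this linearly separable toy set
  ```

  Only the consensus vector $z$ and the multipliers need to be exchanged between machines per iteration; the data blocks $B_j$ never leave their machine, which is the point of the decomposition.
  
  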
  • 20. References.
    Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J., 2011. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), pp. 1-122.
    Zhang, C., Lee, H. and Shin, K.G., 2012. Efficient distributed linear classification algorithms via the alternating direction method of multipliers. In AISTATS, pp. 1398-1406.
  • 21. The End