CVPR 2010: Higher Order Models in Computer Vision: Part 3

Outline: Belief Propagation, Relaxations, MAP-MRF LP, Lagrangian Relaxation, Dual Decomposition, Polyhedral, Higher-order Interactions
Higher Order Models in Computer Vision:
Inference
Carsten Rother, Sebastian Nowozin
Machine Learning and Perception Group
Microsoft Research Cambridge
San Francisco, 18th June 2010
Carsten Rother, Sebastian Nowozin Microsoft Research Cambridge
Higher Order Models in Computer Vision: Inference
Belief Propagation

Belief Propagation Reminder

Sum-product belief propagation, and max-product/max-sum/min-sum belief propagation.

Uses
- Messages: vectors at the factor graph edges,
  $q_{Y_i \to F} \in \mathbb{R}^{\mathcal{Y}_i}$, the variable-to-factor message, and
  $r_{F \to Y_i} \in \mathbb{R}^{\mathcal{Y}_i}$, the factor-to-variable message.
- Updates: iteratively recomputing these vectors
Belief Propagation

Belief Propagation Reminder: Min-Sum Updates (cont)

Variable-to-factor message

$$q_{Y_i \to F}(y_i) = \sum_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$$

Factor-to-variable message

$$r_{F \to Y_i}(y_i) = \min_{\substack{y'_F \in \mathcal{Y}_F \\ y'_i = y_i}} \Big[ E_F(y'_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y'_j) \Big]$$
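The two updates can be written out directly for a toy factor graph. The following Python sketch is illustrative only: the pairwise energy table, `var_to_factor`, and `factor_to_var` are stand-in names, not code from the slides.

```python
# Min-sum message updates on a toy factor graph (sketch).
# States are label indices; factors store energy tables.

import itertools

# Pairwise factor F over variables (0, 1), each with 2 labels: |y0 - y1| energy.
energy = {yF: abs(yF[0] - yF[1]) for yF in itertools.product(range(2), repeat=2)}

def var_to_factor(incoming_r, n_labels):
    """q_{Y_i -> F}(y_i): sum of r_{F' -> Y_i}(y_i) over the *other* factors F'."""
    return [sum(r[y] for r in incoming_r) for y in range(n_labels)]

def factor_to_var(energy, scope, i, incoming_q, n_labels):
    """r_{F -> Y_i}(y_i): minimize E_F plus incoming q over joint states with y_i fixed."""
    r = [float("inf")] * n_labels
    for yF, E in energy.items():
        total = E + sum(incoming_q[j][yF[pos]]
                        for pos, j in enumerate(scope) if j != i)
        yi = yF[scope.index(i)]
        r[yi] = min(r[yi], total)
    return r

# One round: variable 1 carries a unary factor with energies [0.0, 2.5],
# forwarded to F as its variable-to-factor message.
q1 = var_to_factor([[0.0, 2.5]], 2)
r0 = factor_to_var(energy, scope=(0, 1), i=0, incoming_q={1: q1}, n_labels=2)
print(r0)  # -> [0.0, 1.0]
```

Reading off the result: label 0 for $Y_0$ can pair with label 0 of $Y_1$ at zero cost, while label 1 must pay either the disagreement cost or the unary cost of $Y_1$.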
Belief Propagation

Higher-order Factors

Factor-to-variable message

$$r_{F \to Y_i}(y_i) = \min_{\substack{y'_F \in \mathcal{Y}_F \\ y'_i = y_i}} \Big[ E_F(y'_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y'_j) \Big]$$

- Minimization becomes intractable for general $E_F$
- Tractable for special structure of $E_F$
- See (Felzenszwalb and Huttenlocher, CVPR 2004), (Potetz, CVPR 2007), and (Tarlow et al., AISTATS 2010)
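A brute-force version of the factor-to-variable update makes the cost explicit: for a factor over $k$ variables with $L$ labels each, the inner minimization enumerates $L^{k-1}$ joint states per output entry, which is why special structure in $E_F$ matters. The sketch below uses an illustrative cardinality-style energy and flat incoming messages.

```python
# Brute-force factor-to-variable message for a higher-order factor (sketch).
# Cost grows as n_labels**(k-1) per output entry for general E_F.

import itertools

def higher_order_message(E_F, k, n_labels, i, q):
    """r_{F -> Y_i}(y_i) by enumerating all joint states y_F with y_i fixed."""
    r = []
    for yi in range(n_labels):
        best = float("inf")
        for rest in itertools.product(range(n_labels), repeat=k - 1):
            yF = rest[:i] + (yi,) + rest[i:]          # insert y_i at position i
            val = E_F(yF) + sum(q[j][yF[j]] for j in range(k) if j != i)
            best = min(best, val)
        r.append(best)
    return r

# Cardinality-style energy on k = 4 binary variables: penalize deviations from 2 ones.
E_F = lambda yF: abs(sum(yF) - 2)
q = [[0.0, 1.0]] * 4                                   # incoming messages prefer label 0
print(higher_order_message(E_F, k=4, n_labels=2, i=0, q=q))  # -> [2.0, 1.0]
```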
Relaxations

Problem Relaxations

Optimization problems (minimizing $g : G \to \mathbb{R}$ over $\mathcal{Y} \subseteq G$) can become easier if
- the feasible set is enlarged, and/or
- the objective function is replaced with a bound.

(Figure: objective $g$ over the feasible set $\mathcal{Y}$, and a bound $h$ over the enlarged set $\mathcal{Z} \supseteq \mathcal{Y}$.)
Definition (Relaxation (Geoffrion, 1974))

Given two optimization problems $(g, \mathcal{Y}, G)$ and $(h, \mathcal{Z}, G)$, the problem $(h, \mathcal{Z}, G)$ is said to be a relaxation of $(g, \mathcal{Y}, G)$ if
1. $\mathcal{Z} \supseteq \mathcal{Y}$, i.e. the feasible set of the relaxation contains the feasible set of the original problem, and
2. $\forall y \in \mathcal{Y}: h(y) \le g(y)$, i.e. over the original feasible set the objective function $h$ achieves no larger values than the objective function $g$.

(Minimization version.)
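The two defining properties can be checked numerically on a toy problem. In this illustrative sketch the objective is kept ($h = g$, so property 2 holds trivially) and only the feasible set is enlarged from the integers $\{0,1,2,3\}$ to the interval $[0,3]$, approximated by a fine grid:

```python
# Numeric check of the relaxation definition on a toy problem (illustrative).
g = lambda y: (y - 1.3) ** 2

Y = [0, 1, 2, 3]
orig_opt = min(g(y) for y in Y)            # integer problem: best at y = 1

# Enlarged feasible set Z = [0, 3], approximated by a grid (stand-in for Z).
Z = [i / 1000 for i in range(3001)]
relaxed_opt = min(g(z) for z in Z)         # attained near y = 1.3

# Property 1: Z contains Y; property 2: h(y) <= g(y) on Y (here h = g).
# Consequence: the relaxed value lower-bounds the original optimum.
assert relaxed_opt <= orig_opt
print(orig_opt, relaxed_opt)
```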
Relaxations

Relaxations

Relaxed solution $z^*$ provides a bound:
- $h(z^*) \ge g(y^*)$ (maximization: upper bound),
- $h(z^*) \le g(y^*)$ (minimization: lower bound).

The relaxation is typically tractable.

Evidence that relaxations are "better" for learning: (Kulesza and Pereira, 2007), (Finley and Joachims, 2008), (Martins et al., 2009).

Drawback: $z^* \in \mathcal{Z} \setminus \mathcal{Y}$ is possible, i.e.
- the solution is not integral (but fractional), or
- the solution violates constraints (primal infeasible).

To obtain $z \in \mathcal{Y}$ from $z^*$: rounding, heuristics.
Relaxations

Constructing Relaxations

Principled methods to construct relaxations:
- Linear Programming Relaxations
- Lagrangian relaxation
- Lagrangian/Dual decomposition
MAP-MRF LP

MAP-MRF Linear Programming Relaxation

Simple two-variable pairwise MRF.

(Factor graph: variables $Y_1$, $Y_2$ with unary factors $\theta_1$, $\theta_2$ and a pairwise factor $\theta_{1,2}$ over $\mathcal{Y}_1 \times \mathcal{Y}_2$.)
MAP-MRF LP

MAP-MRF Linear Programming Relaxation

(Example labeling: $y_1 = 2$, $y_2 = 3$, i.e. $(y_1, y_2) = (2, 3)$.)

Energy: $E(y; x) = \theta_1(y_1; x) + \theta_{1,2}(y_1, y_2; x) + \theta_2(y_2; x)$

Probability: $p(y \mid x) = \frac{1}{Z(x)} \exp\{-E(y; x)\}$

MAP prediction: $\operatorname{argmax}_{y \in \mathcal{Y}} p(y \mid x) = \operatorname{argmin}_{y \in \mathcal{Y}} E(y; x)$
MAP-MRF LP

MAP-MRF Linear Programming Relaxation (cont)

$$\begin{aligned}
\min_{\mu}\ & \sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} \theta_i(y_i)\, \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \theta_{i,j}(y_i, y_j)\, \mu_{i,j}(y_i, y_j) \\
\text{sb.t.}\ & \sum_{y_i \in \mathcal{Y}_i} \mu_i(y_i) = 1, \quad \forall i \in V, \\
& \sum_{y_j \in \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_i(y_i) \in \{0, 1\}, \quad \forall i \in V,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_{i,j}(y_i, y_j) \in \{0, 1\}, \quad \forall \{i,j\} \in E,\ \forall (y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j.
\end{aligned}$$
→ NP-hard integer linear program
MAP-MRF LP

MAP-MRF Linear Programming Relaxation (cont)

Relaxing the integrality constraints to the unit interval gives

$$\begin{aligned}
\min_{\mu}\ & \sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} \theta_i(y_i)\, \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \theta_{i,j}(y_i, y_j)\, \mu_{i,j}(y_i, y_j) \\
\text{sb.t.}\ & \sum_{y_i \in \mathcal{Y}_i} \mu_i(y_i) = 1, \quad \forall i \in V, \\
& \sum_{y_j \in \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_i(y_i) \in [0, 1], \quad \forall i \in V,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_{i,j}(y_i, y_j) \in [0, 1], \quad \forall \{i,j\} \in E,\ \forall (y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j.
\end{aligned}$$

→ linear program
→ LOCAL polytope
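For a two-variable model the LOCAL polytope LP is small enough to hand to a generic solver. A sketch assuming SciPy is available; the variable ordering and the energies ($\theta_1$, $\theta_2$, an Ising-like $\theta_{1,2}$) are illustrative. Since a two-variable graph is a tree, the LP is tight and returns an integral labeling.

```python
# MAP-MRF LP over the LOCAL polytope for a 2-variable, 2-label model (sketch).
from scipy.optimize import linprog

# Variable order: mu1(0), mu1(1), mu2(0), mu2(1), mu12(00), mu12(01), mu12(10), mu12(11)
c = [0.0, 1.0, 0.5, 0.0, 0.0, 1.0, 1.0, 0.0]   # theta1, theta2, Ising theta12

A_eq = [
    [1, 1, 0, 0, 0, 0, 0, 0],      # mu1 sums to one
    [0, 0, 1, 1, 0, 0, 0, 0],      # mu2 sums to one
    [-1, 0, 0, 0, 1, 1, 0, 0],     # marginalization onto mu1(0)
    [0, -1, 0, 0, 0, 0, 1, 1],     # marginalization onto mu1(1)
    [0, 0, -1, 0, 1, 0, 1, 0],     # marginalization onto mu2(0)
    [0, 0, 0, -1, 0, 1, 0, 1],     # marginalization onto mu2(1)
]
b_eq = [1, 1, 0, 0, 0, 0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
print(res.fun, res.x[:4])  # optimum 0.5 at the integral labeling (y1, y2) = (0, 0)
```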
MAP-MRF LP

MAP-MRF LP Relaxation: Related Work

MAP-MRF LP analysis: (Wainwright et al., Trans. Inf. Theory, 2005), (Weiss et al., UAI 2007), (Werner, PAMI 2005); TRW-S (Kolmogorov, PAMI 2006)

Improving tightness: (Komodakis and Paragios, ECCV 2008), (Kumar and Torr, ICML 2008), (Sontag and Jaakkola, NIPS 2007), (Sontag et al., UAI 2008), (Werner, CVPR 2008), (Kumar et al., NIPS 2007)

Algorithms based on the LP: (Globerson and Jaakkola, NIPS 2007), (Kumar and Torr, ICML 2008), (Sontag et al., UAI 2008)
MAP-MRF LP

MAP-MRF LP Relaxation: General Case

For a higher-order factor over variable set $A$ and a subset $B \subseteq A$:

$$\sum_{y_A \in \mathcal{Y}_A} \mu_A(y_A) = 1, \qquad \sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) = \mu_B(y_B), \quad \forall y_B \in \mathcal{Y}_B.$$

- Generalization to higher-order factors
- For $B \subseteq A$ factors: marginalization constraints
- Defines the LOCAL polytope (Wainwright and Jordan, 2008)
MAP-MRF LP

MAP-MRF LP Relaxation: Known Results

1. LOCAL is tight iff the factor graph is a forest (acyclic).
2. All labelings are vertices of LOCAL (Wainwright and Jordan, 2008).
3. For cyclic graphs there are additional fractional vertices.
4. If all factors have regular energies, the fractional solutions are never optimal (Wainwright and Jordan, 2008).
5. For models with only binary states: half-integrality; integral variables are certain.

See some examples later...
Lagrangian Relaxation

Lagrangian Relaxation: Motivation

$$\sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) = \mu_B(y_B), \quad \forall y_B \in \mathcal{Y}_B.$$

The MAP-MRF LP with higher-order factors is problematic:
- the number of LP variables grows with $|\mathcal{Y}_A|$ → exponential in the number of MRF variables;
- but the number of constraints stays small ($|\mathcal{Y}_B|$).

Lagrangian relaxation allows one to solve the LP by treating the constraints implicitly.
Lagrangian Relaxation

Lagrangian Relaxation

$$\min_y\ g(y) \quad \text{sb.t.}\ y \in D,\ y \in C.$$

Assumption
- Optimizing $g(y)$ over $y \in D$ is "easy".
- Optimizing over $y \in D \cap C$ is hard.

High-level idea: incorporate the $y \in C$ constraint into the objective function.

For an excellent introduction, see (Guignard, "Lagrangean Relaxation", TOP 2003) and (Lemaréchal, "Lagrangian Relaxation", 2001).
Lagrangian Relaxation

Lagrangian Relaxation (cont)

Recipe

1. Express $C$ in terms of equalities and inequalities:
$$C = \{y \in G : u_i(y) = 0,\ \forall i = 1, \dots, I,\ \ v_j(y) \le 0,\ \forall j = 1, \dots, J\},$$
with $u_i : G \to \mathbb{R}$ differentiable, typically affine, and $v_j : G \to \mathbb{R}$ differentiable, typically convex.

2. Introduce Lagrange multipliers, yielding
$$\min_y\ g(y) \quad \text{sb.t.}\ y \in D,\quad u_i(y) = 0\ :\ \lambda_i,\ i = 1, \dots, I,\quad v_j(y) \le 0\ :\ \mu_j,\ j = 1, \dots, J.$$
3. Build the partial Lagrangian:
$$\min_y\ g(y) + \lambda^\top u(y) + \mu^\top v(y) \quad \text{sb.t.}\ y \in D. \tag{1}$$
Theorem (Weak Duality of Lagrangian Relaxation)
For differentiable functions $u_i : G \to \mathbb{R}$ and $v_j : G \to \mathbb{R}$, any $\lambda \in \mathbb{R}^I$, and any non-negative $\mu \in \mathbb{R}^J$, $\mu \ge 0$, problem (1) is a relaxation of the original problem: its value is lower than or equal to the optimal value of the original problem.
Naming the value of the partial Lagrangian
$$q(\lambda, \mu) := \min_{y \in D}\ g(y) + \lambda^\top u(y) + \mu^\top v(y), \tag{1}$$

4. Maximize the lower bound wrt $\lambda, \mu$:
$$\max_{\lambda, \mu}\ q(\lambda, \mu) \quad \text{sb.t.}\ \mu \ge 0$$

→ efficiently solvable if $q(\lambda, \mu)$ can be evaluated
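The recipe can be exercised end-to-end on a toy problem. In this illustrative Python sketch (problem, easy set $D$, and constants all invented for the example), $q(\lambda)$ is evaluated by enumerating $D$, and the constraint value at the minimizer serves as a supergradient for ascent:

```python
# Lagrangian relaxation of a toy problem (illustrative): minimize x + y over
# D = {0,...,3}^2 subject to the coupling constraint u(x, y) = x + y - 3 = 0.

D = [(x, y) for x in range(4) for y in range(4)]
u = lambda x, y: x + y - 3

def q(lam):
    """Dual function: minimize the partial Lagrangian over the easy set D;
    u evaluated at the minimizer is a supergradient of q at lam."""
    val, (x, y) = min(((x + y + lam * u(x, y)), (x, y)) for x, y in D)
    return val, u(x, y)

lam = 0.0
for t in range(1, 2001):
    _, grad = q(lam)
    lam += grad / t          # diminishing-step supergradient ascent
dual_val, _ = q(lam)
print(round(lam, 2), round(dual_val, 2))  # approaches lambda* = -1, q* = 3
```

Here the original optimum ($x + y = 3$, value 3) is matched by the dual optimum, i.e. there is no duality gap on this example.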
Lagrangian Relaxation

Optimizing Lagrangian Dual Functions

4. Maximize the lower bound wrt $\lambda, \mu$:
$$\max_{\lambda, \mu}\ q(\lambda, \mu) \quad \text{sb.t.}\ \mu \ge 0 \tag{2}$$

Theorem (Lagrangean Dual Function)
1. $q$ is concave in $\lambda$ and $\mu$; problem (2) is a concave maximization problem.
2. If $q$ is unbounded above, then the original problem is infeasible.
3. For any $\lambda$, $\mu \ge 0$, let
$$y(\lambda, \mu) = \operatorname{argmin}_{y \in D}\ g(y) + \lambda^\top u(y) + \mu^\top v(y).$$
Then a supergradient of $q$ can be constructed by evaluating the constraint functions at $y(\lambda, \mu)$:
$$u(y(\lambda, \mu)) \in \frac{\partial}{\partial \lambda} q(\lambda, \mu), \quad \text{and} \quad v(y(\lambda, \mu)) \in \frac{\partial}{\partial \mu} q(\lambda, \mu).$$
Lagrangian Relaxation

Subgradients, Supergradients

(Figure: a convex function $f(x)$ with an affine underestimator touching it at $x$.)

A subgradient $g$ of a convex $f$ at $x$ satisfies
$$f(y) \ge f(x) + g^\top (y - x), \quad \forall y,$$
and $x$ is a global minimizer iff $0 \in \partial f(x)$.

- Subgradients for convex functions: global affine underestimators
- Supergradients for concave functions: global affine overestimators
Lagrangian Relaxation

Example behaviour of objectives

(Figure: primal and dual objective values vs. iteration, 0-250, from a 10-by-10 pairwise binary MRF relaxation, submodular.)
Lagrangian Relaxation

Primal Solution Recovery

Assume we solved the dual problem for $(\lambda^*, \mu^*)$.
1. Can we obtain a primal solution $y^*$?
2. Can we say something about the relaxation quality?

Theorem (Sufficient Optimality Conditions)
If for a given $\lambda$, $\mu \ge 0$ we have $u(y(\lambda, \mu)) = 0$ and $v(y(\lambda, \mu)) \le 0$ (primal feasibility), and further
$$\lambda^\top u(y(\lambda, \mu)) = 0, \quad \text{and} \quad \mu^\top v(y(\lambda, \mu)) = 0$$
(complementary slackness), then
- $y(\lambda, \mu)$ is an optimal primal solution to the original problem, and
- $(\lambda, \mu)$ is an optimal solution to the dual problem.
Lagrangian Relaxation

Primal Solution Recovery (cont)

But,
- we might never see a solution satisfying the optimality condition;
- this is the case only if there is no duality gap, i.e. $q(\lambda^*, \mu^*) = g(y^*)$.

Special case: integer linear programs. We can always reconstruct a primal solution to
$$\min_y\ g(y) \quad \text{sb.t.}\ y \in \operatorname{conv}(D),\ y \in C. \tag{3}$$

For example, for subgradient method updates it is known that (Anstreicher and Wolsey, MathProg 2009)
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} y(\lambda^t, \mu^t) \to y^* \text{ of (3)}.$$
Lagrangian Relaxation

Application to Higher-order Factors

$$\sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) = \mu_B(y_B), \quad \forall y_B \in \mathcal{Y}_B.$$

- (Komodakis and Paragios, CVPR 2009) propose a MAP inference algorithm suitable for higher-order factors
- It can be seen as Lagrangian relaxation of the marginalization constraints of the MAP-MRF LP
- See also (Johnson et al., ACCC 2007), (Schlesinger and Giginyak, CSC 2007), (Wainwright and Jordan, 2008, Sect. 4.1.3), (Werner, PAMI 2005), (Werner, TechReport 2009), (Werner, CVPR 2008)
Lagrangian Relaxation

Komodakis MAP-MRF Decomposition

$$\begin{aligned}
\min_{\mu}\ & \sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} \theta_i(y_i)\, \mu_i(y_i) + \sum_{\{i,j\} \in E} \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \theta_{i,j}(y_i, y_j)\, \mu_{i,j}(y_i, y_j) \\
\text{sb.t.}\ & \sum_{y_i \in \mathcal{Y}_i} \mu_i(y_i) = 1, \quad \forall i \in V, \\
& \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E, \\
& \sum_{y_j \in \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_i(y_i) \in \{0, 1\}, \qquad \mu_{i,j}(y_i, y_j) \in \{0, 1\}.
\end{aligned}$$
In inner product notation the objective is $\min_{\mu} \sum_{i \in V} \langle \theta_i, \mu_i \rangle + \sum_{\{i,j\} \in E} \langle \theta_{i,j}, \mu_{i,j} \rangle$, with the same constraints.
Lagrangian Relaxation

Komodakis MAP-MRF Decomposition

Adding a higher-order factor over variable set $A$:

$$\begin{aligned}
\min_{\mu}\ & \sum_{i \in V} \langle \theta_i, \mu_i \rangle + \sum_{\{i,j\} \in E} \langle \theta_{i,j}, \mu_{i,j} \rangle + \sum_{y_A \in \mathcal{Y}_A} \theta_A(y_A)\, \mu_A(y_A) \\
\text{sb.t.}\ & \sum_{y_i \in \mathcal{Y}_i} \mu_i(y_i) = 1, \quad \forall i \in V, \\
& \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E, \\
& \sum_{y_j \in \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E,\ \forall y_i \in \mathcal{Y}_i, \\
& \sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) = \mu_B(y_B), \quad \forall B \subset A,\ y_B \in \mathcal{Y}_B, \\
& \sum_{y_A \in \mathcal{Y}_A} \mu_A(y_A) = 1, \\
& \mu_i(y_i) \in \{0, 1\}, \qquad \mu_{i,j}(y_i, y_j) \in \{0, 1\}, \qquad \mu_A(y_A) \in \{0, 1\}.
\end{aligned}$$
Attach Lagrange multipliers $\phi_{A \to B}(y_B)$ to the marginalization constraints:
$$\sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) = \mu_B(y_B)\ :\ \phi_{A \to B}(y_B), \quad \forall B \subset A,\ y_B \in \mathcal{Y}_B.$$
Lagrangian Relaxation

Komodakis MAP-MRF Decomposition

Dualizing the marginalization constraints gives

$$\begin{aligned}
q(\phi) := \min_{\mu}\ & \sum_{i \in V} \langle \theta_i, \mu_i \rangle + \sum_{\{i,j\} \in E} \langle \theta_{i,j}, \mu_{i,j} \rangle + \sum_{y_A \in \mathcal{Y}_A} \theta_A(y_A)\, \mu_A(y_A) \\
& + \sum_{\substack{B \subset A, \\ y_B \in \mathcal{Y}_B}} \phi_{A \to B}(y_B) \Big( \sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) - \mu_B(y_B) \Big) \\
\text{sb.t.}\ & \sum_{y_i \in \mathcal{Y}_i} \mu_i(y_i) = 1, \quad \forall i \in V, \\
& \sum_{(y_i, y_j) \in \mathcal{Y}_i \times \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = 1, \quad \forall \{i,j\} \in E, \\
& \sum_{y_A \in \mathcal{Y}_A} \mu_A(y_A) = 1, \\
& \sum_{y_j \in \mathcal{Y}_j} \mu_{i,j}(y_i, y_j) = \mu_i(y_i), \quad \forall \{i,j\} \in E,\ \forall y_i \in \mathcal{Y}_i, \\
& \mu_i(y_i) \in \{0, 1\}, \qquad \mu_{i,j}(y_i, y_j) \in \{0, 1\}, \qquad \mu_A(y_A) \in \{0, 1\}.
\end{aligned}$$
Distributing $\phi_{A \to B}(y_B)$ over the two terms separates the $\mu_B$ part from the $\mu_A$ part, and regrouping splits the objective into a pairwise part and a higher-order part:
$$\begin{aligned}
q(\phi) = \min_{\mu}\ & \underbrace{\sum_{i \in V} \langle \theta_i, \mu_i \rangle + \sum_{\{i,j\} \in E} \langle \theta_{i,j}, \mu_{i,j} \rangle - \sum_{\substack{B \subset A, \\ y_B \in \mathcal{Y}_B}} \phi_{A \to B}(y_B)\, \mu_B(y_B)}_{\text{pairwise subproblem}} \\
& + \underbrace{\sum_{\substack{B \subset A, \\ y_B \in \mathcal{Y}_B}} \phi_{A \to B}(y_B) \sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) + \sum_{y_A \in \mathcal{Y}_A} \theta_A(y_A)\, \mu_A(y_A)}_{\text{higher-order subproblem}}
\end{aligned}$$

subject to the same normalization, pairwise-consistency, and integrality constraints as before (with the dualized marginalization constraints removed).
Lagrangian Relaxation

Message Passing

(Figure: the higher-order factor $A$ sends messages $\phi_{A \to B}$, $\phi_{A \to C}$, $\phi_{A \to D}$, $\phi_{A \to E}$ to its neighbours $B$, $C$, $D$, $E$.)

Maximizing $q(\phi)$ corresponds to message passing from the higher-order factor to the unary factors.
Lagrangian Relaxation

The Higher-order Subproblem

$$\begin{aligned}
\min_{\mu_A}\ & \sum_{\substack{B \subset A, \\ y_B \in \mathcal{Y}_B}} \phi_{A \to B}(y_B) \sum_{\substack{y_A \in \mathcal{Y}_A, \\ [y_A]_B = y_B}} \mu_A(y_A) + \sum_{y_A \in \mathcal{Y}_A} \theta_A(y_A)\, \mu_A(y_A) \\
\text{sb.t.}\ & \sum_{y_A \in \mathcal{Y}_A} \mu_A(y_A) = 1, \qquad \mu_A(y_A) \in \{0, 1\}, \quad \forall y_A \in \mathcal{Y}_A.
\end{aligned}$$

- Simple structure: select one element
- Simplify to an optimization problem over $y_A \in \mathcal{Y}_A$
Lagrangian Relaxation

The Higher-order Subproblem (cont)

$$\min_{y_A \in \mathcal{Y}_A}\ \theta_A(y_A) + \sum_{\substack{B \subset A, \\ y_B \in \mathcal{Y}_B}} I([y_A]_B = y_B)\, \phi_{A \to B}(y_B)$$

- Solvability depends on the structure of $\theta_A$ as a function of $y_A$
- (For now) assume we can solve for $y_A^* \in \mathcal{Y}_A$
- How can we use this to solve the original large LP?
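For small factors the subproblem can be solved by enumeration. An illustrative Python sketch (the Potts-like `theta_A` and the message values `phi` are invented for the example, with messages here attached to singleton subsets only):

```python
# Higher-order subproblem by enumeration (sketch): minimize over a single joint
# state y_A the factor energy plus the messages the state agrees with.

import itertools

k, n_labels = 3, 2
theta_A = lambda yA: 0.0 if len(set(yA)) == 1 else 2.0   # Potts-like: all-equal is cheap

# phi[(j, yj)] for singleton subsets B = {j}.
phi = {(j, yj): 0.3 * yj for j in range(k) for yj in range(n_labels)}

def solve_subproblem():
    best, best_yA = float("inf"), None
    for yA in itertools.product(range(n_labels), repeat=k):
        val = theta_A(yA) + sum(phi[(j, yA[j])] for j in range(k))
        if val < best:
            best, best_yA = val, yA
    return best_yA, best

print(solve_subproblem())  # the all-zeros state wins here
```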
Lagrangian Relaxation

Maximizing the Dual Function: Subgradient Method

Maximizing $q(\phi)$:
1. Initialize $\phi$
2. Iterate:
   2.1 Given $\phi$, solve the pairwise MAP subproblem, obtaining $\mu^*$ (unary and pairwise indicators)
   2.2 Given $\phi$, solve the higher-order MAP subproblem, obtaining $y_A^*$
   2.3 Update $\phi$

Supergradient: the component for each $B \subset A$, $y_B \in \mathcal{Y}_B$ is
$$\frac{\partial}{\partial \phi_{A \to B}(y_B)} q(\phi) \ni I([y_A^*]_B = y_B) - \mu_B^*(y_B).$$

Supergradient ascent with step size $\alpha^t > 0$:
$$\phi_{A \to B}(y_B) \leftarrow \phi_{A \to B}(y_B) + \alpha^t \big( I([y_A^*]_B = y_B) - \mu_B^*(y_B) \big), \quad \forall B \subset A,\ y_B \in \mathcal{Y}_B.$$

The step size $\alpha^t$ must satisfy the "diminishing step size conditions": i) $\lim_{t \to \infty} \alpha^t = 0$, and ii) $\sum_{t=0}^{\infty} \alpha^t = \infty$.
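One update step of this rule, written out concretely. All values below ($y_A^*$, $\mu^*$, the step size, singleton subsets $B = \{j\}$) are illustrative stand-ins for one iteration's subproblem outputs:

```python
# One supergradient step on phi (sketch): the component for (B, y_B) is the
# indicator that the higher-order solution y*_A restricts to y_B, minus the
# pairwise solution's indicator mu*_B(y_B).

k, n_labels = 3, 2
phi = {(j, yj): 0.0 for j in range(k) for yj in range(n_labels)}

yA_star = (0, 1, 0)                      # from the higher-order subproblem
mu_star = {(0, 0): 1, (0, 1): 0,         # from the pairwise subproblem
           (1, 0): 1, (1, 1): 0,
           (2, 0): 1, (2, 1): 0}

alpha_t = 0.5                            # diminishing step size at iteration t
for (j, yj) in phi:
    phi[(j, yj)] += alpha_t * ((1 if yA_star[j] == yj else 0) - mu_star[(j, yj)])

print(phi[(1, 1)], phi[(1, 0)])  # only variable 1 disagrees, so only its messages move
```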
Lagrangian Relaxation

Subgradient Method (cont)

Typical step size choices:

1. Simple diminishing step size,
$$\alpha^t = \frac{1 + m}{t + m},$$
where $m > 0$ is an arbitrary constant.

2. Polyak's step size,
$$\alpha^t = \beta^t\, \frac{\bar{q} - q(\lambda^t, \mu^t)}{\|u(y(\lambda^t, \mu^t))\|^2 + \|v(y(\lambda^t, \mu^t))\|^2},$$
where $0 < \beta^t < 2$ is a diminishing step size, and $\bar{q} \ge q(\lambda^*, \mu^*)$ is an upper bound on the optimal dual objective.
Lagrangian Relaxation

Subgradient Method (cont)

Primal solution recovery:

- Uniform average,
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} y(\lambda^t, \mu^t) \to y^*.$$

- Geometric series (volume algorithm),
$$\bar{y}^t = \gamma\, y(\lambda^t, \mu^t) + (1 - \gamma)\, \bar{y}^{t-1},$$
where $0 < \gamma < 1$ is a small constant such as $\gamma = 0.1$, and $\bar{y}^t$ might be slightly primal infeasible.
Lagrangian Relaxation

Pattern-based Higher-order Potentials

Example (Pattern-based potentials)
Given a small set of patterns $P = \{y^1, y^2, \dots, y^K\}$, define
$$\theta_A(y_A) = \begin{cases} C_{y_A} & \text{if } y_A \in P, \\ C_{\max} & \text{otherwise.} \end{cases}$$

- (Rother et al., CVPR 2009), (Komodakis and Paragios, CVPR 2009)
- Generalizes the $P^n$-Potts model (Kohli et al., CVPR 2007)
- "Patterns have low energies": $C_y \le C_{\max}$ for all $y \in P$
Can solve
   \min_{y_A \in \mathcal{Y}_A} \; \theta_A(y_A) + \sum_{B \subset A, \; y_B \in \mathcal{Y}_B} I([y_A]_B = y_B) \, \phi_{A \to B}(y_B)
by
1. testing all y_A \in P, and
2. fixing \theta_A(y_A) = C_{\max} and solving for the minimum of the second term.
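A toy sketch of this two-case minimization (illustrative only: binary variables, singleton subsets B = {i}, and made-up message values phi):

```python
# Two-case minimization for a pattern-based factor: test the few patterns
# explicitly, then handle the C_max case, where the message term decouples.

def minimize_pattern_factor(patterns, C, C_max, phi, n):
    """min over y_A of theta_A(y_A) plus unary messages consistent with y_A."""
    def msg_sum(y):
        return sum(phi[(i, y[i])] for i in range(n))

    # Case 1: test every pattern explicitly (|P| is small).
    best_pattern = min(C[p] + msg_sum(p) for p in patterns)

    # Case 2: theta_A(y_A) = C_max; the remaining term decouples over
    # variables, so each one minimizes its own message independently.
    best_free = C_max + sum(min(phi[(i, 0)], phi[(i, 1)]) for i in range(n))
    return min(best_pattern, best_free)

n = 3
patterns = [(0, 0, 0), (1, 1, 1)]
C = {(0, 0, 0): 0.0, (1, 1, 1): 1.0}                        # pattern costs C_{y_A}
phi = {(i, s): float(s) for i in range(n) for s in (0, 1)}  # toy messages favor label 0
val = minimize_pattern_factor(patterns, C, C_max=10.0, phi=phi, n=n)
```

Here the pattern (0, 0, 0) wins with total cost 0; the decoupled C_max case costs 10.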
Dual Decomposition
Constructing Relaxations
Principled methods to construct relaxations
Linear Programming Relaxations
Lagrangian relaxation
Lagrangian/Dual decomposition
Dual Decomposition
Dual/Lagrangian Decomposition
Additive structure
   \min_y \sum_{k=1}^{K} g_k(y)
   sb.t. \; y \in \mathcal{Y}_k, \quad \forall k = 1, \dots, K,
such that \sum_{k=1}^{K} g_k(y) is hard, but for any k,
   \min_y \; g_k(y) \quad sb.t. \; y \in \mathcal{Y}_k
is tractable.
(Guignard, "Lagrangean Decomposition", MathProg 1987)
(Conejo et al., "Decomposition Techniques in Mathematical Programming", 2006)
Dual Decomposition
Dual/Lagrangian Decomposition
Idea
1. Selectively duplicate variables ("variable splitting"),
      \min_{y_1, \dots, y_K, y} \sum_{k=1}^{K} g_k(y_k)
      sb.t. \; y \in \mathcal{Y},
            y_k \in \mathcal{Y}_k, \quad k = 1, \dots, K,
            y = y_k : \lambda_k, \quad k = 1, \dots, K,    (4)
   where \mathcal{Y} \supseteq \bigcap_{k=1,\dots,K} \mathcal{Y}_k.
2. Add coupling equality constraints between the duplicated variables
3. Apply Lagrangian relaxation to the coupling constraints
Also known as dual decomposition and variable splitting.
Dual Decomposition
Dual/Lagrangian Decomposition (cont)
Relaxing the coupling constraints with multipliers \lambda_k gives
   \min_{y_1, \dots, y_K, y} \sum_{k=1}^{K} g_k(y_k) + \sum_{k=1}^{K} \lambda_k^\top (y - y_k)
   sb.t. \; y \in \mathcal{Y},
         y_k \in \mathcal{Y}_k, \quad k = 1, \dots, K,
Dual Decomposition
Dual/Lagrangian Decomposition (cont)
Rearranging,
   \min_{y_1, \dots, y_K, y} \sum_{k=1}^{K} \left( g_k(y_k) - \lambda_k^\top y_k \right) + \left( \sum_{k=1}^{K} \lambda_k \right)^\top y
   sb.t. \; y \in \mathcal{Y},
         y_k \in \mathcal{Y}_k, \quad k = 1, \dots, K,
The problem is decomposed.
There can be multiple possible decompositions of different quality.
Dual Decomposition
Dual/Lagrangian Decomposition (cont)
Dual function
   q(\lambda) := \min_{y_1, \dots, y_K, y} \sum_{k=1}^{K} \left( g_k(y_k) - \lambda_k^\top y_k \right) + \left( \sum_{k=1}^{K} \lambda_k \right)^\top y
   sb.t. \; y \in \mathcal{Y},
         y_k \in \mathcal{Y}_k, \quad k = 1, \dots, K,
Dual maximization problem
   \max_\lambda \; q(\lambda)
   sb.t. \; \sum_{k=1}^{K} \lambda_k = 0,
where \{\lambda \mid \sum_{k=1}^{K} \lambda_k = 0\} is the domain where q(\lambda) > -\infty.
Dual Decomposition
Dual/Lagrangian Decomposition (cont)
Dual maximization problem
   \max_\lambda \; q(\lambda)
   sb.t. \; \sum_{k=1}^{K} \lambda_k = 0,
Projected subgradient method, projecting the subgradient (y - y_k) onto the constraint set:
   (y - y_k) - \frac{1}{K} \sum_{\ell=1}^{K} (y - y_\ell) = \frac{1}{K} \sum_{\ell=1}^{K} y_\ell - y_k.
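The projected update above can be sketched on a toy two-block instance (the feasible sets, costs, and diminishing step size are all hypothetical illustration choices; subproblems are solved by enumeration):

```python
# Projected subgradient for a two-block dual decomposition over binary
# vectors. Each subproblem minimizes g_k(y) - lambda_k^T y over its own
# feasible list; the projected subgradient is ybar - y_k.

def solve_sub(c, lam, Y):
    """argmin over y in Y of (c - lam)^T y."""
    return min(Y, key=lambda y: sum((ci - li) * yi for ci, li, yi in zip(c, lam, y)))

def dual_decomposition(c1, c2, Y1, Y2, iters=100):
    n = len(c1)
    lam1 = [0.0] * n  # lambda_2 = -lambda_1 keeps sum_k lambda_k = 0
    for t in range(1, iters + 1):
        lam2 = [-l for l in lam1]
        y1 = solve_sub(c1, lam1, Y1)
        y2 = solve_sub(c2, lam2, Y2)
        ybar = [(a + b) / 2 for a, b in zip(y1, y2)]  # mean of the y_k
        alpha = 1.0 / t                               # diminishing step size
        lam1 = [l + alpha * (yb - ya) for l, yb, ya in zip(lam1, ybar, y1)]
    return y1, y2

# Hypothetical sets: Y1 enforces y[0] <= y[1]; Y2 enforces y[0] + y[1] <= 1.
Y1 = [(0, 0), (0, 1), (1, 1)]
Y2 = [(0, 0), (0, 1), (1, 0)]
y1, y2 = dual_decomposition([1.0, -1.0], [0.0, 0.0], Y1, Y2)
```

On this instance the two subproblems agree after two iterations on (0, 1), the minimizer of the combined objective over Y1 ∩ Y2.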
Dual Decomposition
Example (Connectivity, Vicente et al., CVPR 2008)
   \min_y \; E(y) + C(y),
where E(y) is a normal binary pairwise segmentation energy, and
   C(y) = \begin{cases} 0 & \text{if the marked points are connected in } y, \\ \infty & \text{otherwise.} \end{cases}
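The hard constraint C(y) is easy to state in code. The following sketch (a hypothetical helper, not part of the solver of Vicente et al.) checks connectivity of two marked pixels on a 4-connected grid mask:

```python
# C(y) = 0 if marked pixels p and q are connected through foreground
# pixels of the binary mask y, infinity otherwise (BFS on the 4-grid).
from collections import deque

def C(y, p, q):
    rows, cols = len(y), len(y[0])
    if not (y[p[0]][p[1]] and y[q[0]][q[1]]):
        return float("inf")  # a marked pixel is not even foreground
    seen, todo = {p}, deque([p])
    while todo:
        r, c = todo.popleft()
        if (r, c) == q:
            return 0.0
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and y[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                todo.append((nr, nc))
    return float("inf")

mask = [[1, 1, 0],
        [0, 1, 0],
        [0, 1, 1]]
```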
Duplicate the variable ("variable splitting"):
   \min_{y,z} \; E(y) + C(z) \quad sb.t. \; y = z.
Attach a multiplier \lambda to the coupling constraint:
   \min_{y,z} \; E(y) + C(z) \quad sb.t. \; y = z : \lambda.
Relaxing the coupling constraint gives
   \min_{y,z} \; E(y) + C(z) + \lambda^\top (y - z).
   q(\lambda) := \underbrace{\min_y \; E(y) + \lambda^\top y}_{\text{Subproblem 1}} + \underbrace{\min_z \; C(z) - \lambda^\top z}_{\text{Subproblem 2}}
Subproblem 1: normal binary image segmentation energy
Subproblem 2: shortest path problem (no pairwise terms)
→ Maximize q(\lambda) to find the best lower bound
The above derivation is simplified; see (Vicente et al., CVPR 2008) for a more sophisticated decomposition using three subproblems.
Dual Decomposition
Other examples
Plenty of applications of decomposition:
(Komodakis et al., ICCV 2007)
(Strandmark and Kahl, CVPR 2010)
(Torresani et al., ECCV 2008)
(Werner, TechReport 2009)
(Kumar and Koller, CVPR 2010)
(Woodford et al., ICCV 2009)
(Vicente et al., ICCV 2009)
Dual Decomposition
Geometric Interpretation
Theorem (Lagrangian Decomposition Primal)
Let \mathcal{Y}_k be finite sets and g(y) = c^\top y a linear objective function. Then the optimal solution of the Lagrangian decomposition attains the value of the following relaxed primal optimization problem:
   \min_y \sum_{k=1}^{K} g_k(y)
   sb.t. \; y \in \operatorname{conv}(\mathcal{Y}_k), \quad \forall k = 1, \dots, K.
Therefore, optimizing the Lagrangian decomposition dual is equivalent to optimizing the primal objective on the intersection of the convex hulls of the individual feasible sets.
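A tiny worked example (hypothetical, not from the tutorial) makes the geometry concrete: the dual optimum can sit at a fractional point of the intersection of the convex hulls.

```latex
% Two feasible sets in the plane with empty integral intersection:
%   \mathcal{Y}_1 = \{(0,0), (1,1)\}, \quad \mathcal{Y}_2 = \{(0,1), (1,0)\},
%   \mathcal{Y}_1 \cap \mathcal{Y}_2 = \emptyset.
% The two segments conv(Y_1) and conv(Y_2) cross in a single point:
\operatorname{conv}(\mathcal{Y}_1) \cap \operatorname{conv}(\mathcal{Y}_2)
  = \bigl\{ (\tfrac{1}{2}, \tfrac{1}{2}) \bigr\}.
% With c = (1,1)^\top the decomposition dual attains c^\top y_D = 1 at the
% fractional point y_D = (1/2, 1/2): a valid lower bound even though no
% integral feasible point exists.
```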
Dual Decomposition
Geometric Interpretation (cont)
[Figure: the polytopes conv(Y_1) and conv(Y_2), their intersection conv(Y_1) ∩ conv(Y_2), the decomposition optimum y_D, the true optimum y^*, and the objective direction c^\top y]
Dual Decomposition
Lagrangian Relaxation vs. Decomposition
Lagrangian Relaxation
Needs explicit complicating constraints
Lagrangian/Dual Decomposition
Can work with black-box solver for subproblem (implicit)
Can yield stronger bounds
[Figure: Lagrangian relaxation optimizes over an outer set Y_1 \supseteq conv(Y_1), giving optimum y_R; decomposition optimizes over conv(Y_1) ∩ conv(Y_2), giving y_D, which lies closer to the true optimum y^*]
Polyhedral Higher-order Interactions
Polyhedral View on Higher-order Interactions
Idea
MAP-MRF LP relaxation: LOCAL polytope
Incorporate higher-order interactions directly by adding constraints to LOCAL
Use techniques from polyhedral combinatorics
See (Werner, CVPR 2008), (Nowozin and Lampert, CVPR 2009), (Lempitsky et al., ICCV 2009)
Polyhedral Higher-order Interactions
Representation of Polytopes
Convex polytopes have two equivalent representations:
- as a convex combination of extreme points,
- as a set of facet-defining linear inequalities.
A linear inequality with respect to a polytope can be
- valid: does not cut off the polytope,
- representing a face: valid and touching,
- facet-defining: representing a face of dimension one less than the polytope.
[Figure: polytope Z with inequalities d_1^\top y ≤ 1, d_2^\top y ≤ 1, d_3^\top y ≤ 1 illustrating the three cases]
Polyhedral Higher-order Interactions
Intersecting Polytopes
Construction
1. Consider the marginal polytope M(G) of a discrete graphical model
2. Vertices of M(G) are feasible labelings
3. Add linear inequalities d^\top \mu \leq b that cut off vertices
4. Optimize over the resulting intersection M'(G),
      E_A(\mu_A) = \begin{cases} 0 & \text{if } \mu \text{ is valid}, \\ \infty & \text{otherwise.} \end{cases}
[Figure: marginal polytope M(G) with a cutting inequality d^\top \mu \leq b producing M'(G)]
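As a minimal illustration (a toy stand-in, not a real marginal polytope), an inequality dᵀµ ≤ b that removes one vertex of the unit square:

```python
# Cutting off a vertex with a linear inequality: the unit square's vertices
# {0,1}^2 stand in for labelings; the inequality mu_0 + mu_1 <= 1 (hypothetical
# choice) cuts off the vertex (1,1) while keeping the other three feasible.
from itertools import product

d, b = (1.0, 1.0), 1.0  # inequality: mu_0 + mu_1 <= 1

vertices = list(product((0, 1), repeat=2))
kept = [v for v in vertices if sum(di * vi for di, vi in zip(d, v)) <= b]
cut = [v for v in vertices if v not in kept]
```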
Polyhedral Higher-order Interactions
Defining the Inequalities
Questions
1. Where do the inequalities come from?
2. Is this construction always possible?
3. How do we optimize over M'(G)?
Polyhedral Higher-order Interactions
1. Deriving Combinatorial Polytopes
“Foreground mask must be connected”
Global constraint on labelings
Polyhedral Higher-order Interactions
Constructing the Inequalities