Upcoming SlideShare
×

04 structured prediction and energy minimization part 1

760 views
657 views

Published on

Published in: Technology
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
760
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
16
0
Likes
0
Embeds 0
No embeds

No notes for slide

04 structured prediction and energy minimization part 1

1. 1. Prediction Problem G: Generality G: Optimality Part 6: Structured Prediction and Energy Minimization (1/2) Sebastian Nowozin and Christoph H. Lampert Colorado Springs, 25th June 2011Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
2. 2. Prediction Problem G: Generality G: OptimalityPrediction ProblemPrediction Problem y ∗ = f (x) = argmax g (x, y ) y ∈Y g (x, y ) = p(y |x), factor graphs/MRF/CRF, g (x, y ) = −E (y ; x, w ), factor graphs/MRF/CRF, g (x, y ) = w , ψ(x, y ) , linear model (e.g. multiclass SVM), → diﬃculty: Y ﬁnite but very largeSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
3. 3. Prediction Problem G: Generality G: OptimalityPrediction ProblemPrediction Problem y ∗ = f (x) = argmax g (x, y ) y ∈Y g (x, y ) = p(y |x), factor graphs/MRF/CRF, g (x, y ) = −E (y ; x, w ), factor graphs/MRF/CRF, g (x, y ) = w , ψ(x, y ) , linear model (e.g. multiclass SVM), → diﬃculty: Y ﬁnite but very largeSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
4. 4. Prediction Problem G: Generality G: OptimalityPrediction ProblemPrediction Prblem (cont) Deﬁnition (Optimization Problem) Given (g , Y, G, x), with feasible set Y ⊆ G over decision domain G, and given an input instance x ∈ X and an objective function g : X × G → R, ﬁnd the optimal value α = sup g (x, y ), y ∈Y and, if the supremum exists, ﬁnd an optimal solution y ∗ ∈ Y such that g (x, y ∗ ) = α.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
5. 5. Prediction Problem G: Generality G: OptimalityPrediction ProblemThe feasible set Ingredients Decision domain G, typically simple (G = Rd , G = 2V , etc.) Feasible set Y ⊆ G, deﬁning the problem-speciﬁc structure Objective function g : X × G → R. Terminology Y = G: unconstrained optimization problem, G ﬁnite: discrete optimization problem, G = 2Σ for ground set Σ: combinatorial optimization problem, Y = ∅: infeasible problem.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
6. 6. Prediction Problem G: Generality G: OptimalityPrediction ProblemExample: Feasible Sets (cont) Jij yi yj Jjk yj yk Yi Yj Yk (+1) (−1) (−1) hi yi hj yj hk yk Ising model with external ﬁeld Graph G = (V , E ) “External ﬁeld”: h ∈ RV Interaction matrix: J ∈ RV ×V Objective, deﬁned on yi ∈ {−1, 1} 1 1 g (y ) = hi yi + hj yj + hk yk + Jij yi yj + Jjk yj yk 2 2Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
7. 7. Prediction Problem G: Generality G: OptimalityPrediction ProblemExample: Feasible Sets (cont) Ising model with external ﬁeld Y = G = {−1, +1}V 1 g (y ) = Ji,j yi yj + hi yi 2 (i,j)∈E i∈V Unconstrained Objective function contains quadratic termsSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
8. 8. Prediction Problem G: Generality G: OptimalityPrediction ProblemExample: Feasible Sets (cont) G = {0, 1}(V ×{−1,+1})∪(E ×{−1,+1}×{−1,+1}) , Y = {y ∈ G : ∀i ∈ V : yi,−1 + yi,+1 = 1, ∀(i, j) ∈ E : yi,j,+1,+1 + yi,j,+1,−1 = yi,+1 , ∀(i, j) ∈ E : yi,j,−1,+1 + yi,j,−1,−1 = yi,−1 }, 1 g (y ) = Ji,j (yi,j,+1,+1 + yi,j,−1,−1 ) 2 (i,j)∈E 1 − Ji,j (yi,j,+1,−1 + yi,j,−1,+1 ) 2 (i,j)∈E + hi (yi,+1 − yi,−1 ) i∈V Constrained, more variables Objective function contains linear terms onlySebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
9. 9. Prediction Problem G: Generality G: OptimalityPrediction ProblemEvaluating f : what do we want? f (x) = argmax g (x, y ) y ∈Y For evaluating f (x) we want an algorithm that 1. is general: applicable to all instances of the problem, 2. is optimal: provides an optimal y ∗ , 3. has good worst-case complexity: for all instances the runtime and space is acceptably bounded, 4. is integral: its solutions are restricted to Y, 5. is deterministic: its results and runtime are reproducible and depend on the input data only.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
10. 10. Prediction Problem G: Generality G: OptimalityPrediction ProblemEvaluating f : what do we want? f (x) = argmax g (x, y ) y ∈Y For evaluating f (x) we want an algorithm that 1. is general: applicable to all instances of the problem, 2. is optimal: provides an optimal y ∗ , 3. has good worst-case complexity: for all instances the runtime and space is acceptably bounded, 4. is integral: its solutions are restricted to Y, 5. is deterministic: its results and runtime are reproducible and depend on the input data only. wanting all of them → impossibleSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
11. 11. Prediction Problem G: Generality G: OptimalityPrediction ProblemGiving up some properties Hard problem Generality Optimality Worst-case Integrality Determinism complexity → giving up one or more properties allows us to design algorithms satisfying the remaining properties might be suﬃcient for the task at handSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
12. 12. Prediction Problem G: Generality G: OptimalityG: Generality Hard problem Generality Optimality Worst-case Integrality Determinism complexitySebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
13. 13. Prediction Problem G: Generality G: OptimalityG: GeneralityGiving up Generality Identify an interesting and tractable subset of instances Set of all instances Tractable subsetSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
14. 14. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: MAP Inference in Markov Random Fields Although NP-hard in general, it is tractable... with low tree-width (Lauritzen, Spiegelhalter, 1988) with binary states, pairwise submodular interactions (Boykov, Jolly, 2001) with binary states, pairwise interactions (only), planar graph structure (Globerson, Jaakkola, 2006) with submodular pairwise interactions (Schlesinger, 2006) with P n -Potts higher order factors (Kohli, Kumar, Torr, 2007) with perfect graph structure (Jebara, 2009)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
15. 15. Prediction Problem G: Generality G: OptimalityG: GeneralityBinary Graph-Cuts Energy function: unary and pairwise E (y ; x, w ) = EF (yF ; x, wtF )+ EF (yF ; x, wtF ) F ∈F1 F ∈F2 Restriction 1 (wlog) EF (yi ; x, wtF ) ≥ 0 Restriction 2 (regular/submodular/attractive) EF (yi , yj ; x, wtF ) = 0, if yi = yj , EF (yi , yj ; x, wtF ) = EF (yj , yi ; x, wtF ) ≥ 0, otherwise.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
16. 16. Prediction Problem G: Generality G: OptimalityG: GeneralityBinary Graph-Cuts Energy function: unary and pairwise E (y ; x, w ) = EF (yF ; x, wtF )+ EF (yF ; x, wtF ) F ∈F1 F ∈F2 Restriction 1 (wlog) EF (yi ; x, wtF ) ≥ 0 Restriction 2 (regular/submodular/attractive) EF (yi , yj ; x, wtF ) = 0, if yi = yj , EF (yi , yj ; x, wtF ) = EF (yj , yi ; x, wtF ) ≥ 0, otherwise.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
17. 17. Prediction Problem G: Generality G: OptimalityG: GeneralityBinary Graph-Cuts (cont) Construct auxiliary undirected graph One node {i}i∈V per variable s {i, s} Two extra nodes: source s, sink t Edges {i, t} i j k Edge Graph cut weight {i, j} EF (yi = 0, yj = 1; x, wtF ) l m n {i, s} EF (yi = 1; x, wtF ) {i, t} EF (yi = 0; x, wtF ) t Find linear s-t-mincut Solution deﬁnes optimal binary labeling of the original energy minimization problemSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
18. 18. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Input image (http://pdphoto.org)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
19. 19. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Color model log-oddsSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
20. 20. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Independent decisionsSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
21. 21. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation g (x, y , w ) = log p(yi |xi ) + w C (xi , xj )I (yi = yj ) i∈V (i,j)∈E Gradient strength 2 C (xi , xj ) = exp(γ xi − xj ) γ estimated from mean edge strength (Blake et al, 2004) w ≥ 0 controls smoothingSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
22. 22. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation w =0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
23. 23. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Small w > 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
24. 24. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Medium w > 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
25. 25. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Figure-Ground Segmentation Large w > 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
26. 26. Prediction Problem G: Generality G: OptimalityG: GeneralityGeneral Binary Case Is there a larger class of energies for which binary graph cuts are applicable? (Kolmogorov and Zabih, 2004), (Freedman and Drineas, 2005) Theorem (Regular Binary Energies) E (y ; x, w ) = EF (yF ; x, wtF ) + EF (yF ; x, wtF ) F ∈F1 F ∈F2 is a energy function of binary variables containing only unary and pairwise factors. The discrete energy minimization problem argminy E (y ; x, w ) is representable as a graph cut problem if and only if all pairwise energy functions EF for F ∈ F2 with F = {i, j} satisfy Ei,j (0, 0) + Ei,j (1, 1) ≤ Ei,j (0, 1) + Ei,j (1, 0).Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
27. 27. Prediction Problem G: Generality G: OptimalityG: GeneralityGeneral Binary Case Is there a larger class of energies for which binary graph cuts are applicable? (Kolmogorov and Zabih, 2004), (Freedman and Drineas, 2005) Theorem (Regular Binary Energies) E (y ; x, w ) = EF (yF ; x, wtF ) + EF (yF ; x, wtF ) F ∈F1 F ∈F2 is a energy function of binary variables containing only unary and pairwise factors. The discrete energy minimization problem argminy E (y ; x, w ) is representable as a graph cut problem if and only if all pairwise energy functions EF for F ∈ F2 with F = {i, j} satisfy Ei,j (0, 0) + Ei,j (1, 1) ≤ Ei,j (0, 1) + Ei,j (1, 0).Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
28. 28. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Class-independent Object Hypotheses (Carreira and Sminchisescu, 2010) PASCAL VOC 2009/2010 segmentation winner Generate class-independent object hypotheses Energy (almost) as before g (x, y , w ) = Ei (yi ) + w C (xi , xj )I (yi = yj ) i∈V (i,j)∈E Fixed unaries   ∞ if i ∈ Vfg and yi = 0 Ei (yi ) = ∞ if i ∈ Vbg and yi = 1 0 otherwise  Test all w ≥ 0 using parametric max-ﬂow (Picard and Queyranne, 1980), (Kolmogorov et al., 2007)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
29. 29. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Class-independent Object Hypotheses (Carreira and Sminchisescu, 2010) PASCAL VOC 2009/2010 segmentation winner Generate class-independent object hypotheses Energy (almost) as before g (x, y , w ) = Ei (yi ) + w C (xi , xj )I (yi = yj ) i∈V (i,j)∈E Fixed unaries   ∞ if i ∈ Vfg and yi = 0 Ei (yi ) = ∞ if i ∈ Vbg and yi = 1 0 otherwise  Test all w ≥ 0 using parametric max-ﬂow (Picard and Queyranne, 1980), (Kolmogorov et al., 2007)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
30. 30. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Class-independent Object Hypotheses (Carreira and Sminchisescu, 2010) PASCAL VOC 2009/2010 segmentation winner Generate class-independent object hypotheses Energy (almost) as before g (x, y , w ) = Ei (yi ) + w C (xi , xj )I (yi = yj ) i∈V (i,j)∈E Fixed unaries   ∞ if i ∈ Vfg and yi = 0 Ei (yi ) = ∞ if i ∈ Vbg and yi = 1 0 otherwise  Test all w ≥ 0 using parametric max-ﬂow (Picard and Queyranne, 1980), (Kolmogorov et al., 2007)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
31. 31. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Class-independent Object Hypotheses (cont) Input image (http://pdphoto.org)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
32. 32. Prediction Problem G: Generality G: OptimalityG: GeneralityExample: Class-independent Object Hypotheses (cont) CPMC proposal segmentations (Carreira and Sminchisescu, 2010)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
33. 33. Prediction Problem G: Generality G: OptimalityG: Optimality Hard problem Generality Optimality Worst-case Integrality Determinism complexitySebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
34. 34. Prediction Problem G: Generality G: OptimalityG: OptimalityGiving up Optimality Solving for y ∗ is hard, but is it necessary? pragmatic motivation: in many applications a close-to-optimal solution is good enough computational motivation: set of “good” solutions might be large and ﬁnding just one element can be easy For machine learning models modeling error: we always use the wrong model estimation error: preference for y ∗ might be an artifactSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
35. 35. Prediction Problem G: Generality G: OptimalityG: OptimalityGiving up Optimality Solving for y ∗ is hard, but is it necessary? pragmatic motivation: in many applications a close-to-optimal solution is good enough computational motivation: set of “good” solutions might be large and ﬁnding just one element can be easy For machine learning models modeling error: we always use the wrong model estimation error: preference for y ∗ might be an artifactSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
36. 36. Prediction Problem G: Generality G: OptimalityG: OptimalityLocal Search Y y0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
37. 37. Prediction Problem G: Generality G: OptimalityG: OptimalityLocal Search Y y0 N (y 0 )Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
38. 38. Prediction Problem G: Generality G: OptimalityG: OptimalityLocal Search Y y0 y1 N (y 0 )Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
39. 39. Prediction Problem G: Generality G: OptimalityG: OptimalityLocal Search Y y0 y1 y2 y3 N (y 3 ) N (y 0 ) N (y 1 ) N (y 2 ) y∗ ∗ N (y )Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
40. 40. Prediction Problem G: Generality G: OptimalityG: OptimalityLocal Search Y y0 y1 y2 y3 N (y 3 ) N (y 0 ) N (y 1 ) N (y 2 ) y∗ N (y ∗ ) Nt : Y → 2Y , neighborhood system Optimization with respect to Nt (y ) must be tractable: y t+1 = argmax g (x, y ) y ∈Nt (y t )Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
41. 41. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Iterated Conditional Modes (ICM) Iterated Conditional Modes (ICM), (Besag, 1986) g (x, y ) = log p(y |x) y ∗ = argmax log p(y |x) y ∈Y Neighborhoods Ns (y ) = {(y1 , . . . , ys−1 , zs , ys+1 , . . . , yS )|zs ∈ Ys }Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
42. 42. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Iterated Conditional Modes (ICM) Iterated Conditional Modes (ICM), (Besag, 1986) y t+1 = argmax log p(y1 , y2 , . . . , yV |x) t t y1 ∈Y1Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
43. 43. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Iterated Conditional Modes (ICM) Iterated Conditional Modes (ICM), (Besag, 1986) y t+1 = argmax log p(y1 , y2 , y3 , . . . , yV |x) t t t y2 ∈Y2Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
44. 44. Prediction Problem G: Generality G: OptimalityG: OptimalityNeighborhood Size ICM neighborhood Nt (y t ): all states reachable from y t by changing a single variable (Besag, 1986) Neighborhood size: in general, larger is better (VLSN, Ahuja, 2000) Example: neighborhood along chainsSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
45. 45. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Block ICM Block Iterated Conditional Modes (ICM) (Kelm et al., 2006), (Kittler and F¨glein, 1984) o t y t+1 = argmax log p(yC1 , yV C1 |x) yC1 ∈YC1Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
46. 46. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Block ICM Block Iterated Conditional Modes (ICM) (Kelm et al., 2006), (Kittler and F¨glein, 1984) o t y t+1 = argmax log p(yC2 , yV C2 |x) yC2 ∈YC2Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
47. 47. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Multilabel Graph-Cut Binary graph-cuts are not applicable to multilabel energy minimization problems (Boykov et al., 2001): two local search algorithms for multilabel problems Sequence of binary directed s-t-mincut problems Iteratively improve multilabel solutionSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
48. 48. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β Swap Neighborhood Select two diﬀerent labels α and β Fix all variables i for which yi ∈ {α, β} / Optimize over remaining i with yi ∈ {α, β} Nα,β : Y × N × N → 2Y , Nα,β (y , α, β) := {z ∈ Y : zi = yi if yi ∈ {α, β}, / otherwise zi ∈ {α, β}}.Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
49. 49. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap illustrated 5-label problem α − β-swapSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
50. 50. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap illustrated 5-label problem α − β-swapSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
51. 51. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap illustrated 5-label problem α − β-swapSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
52. 52. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap illustrated 5-label problem α − β-swapSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
53. 53. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap illustrated 5-label problem α − β-swapSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
54. 54. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap derivation y t+1 = argmin E (y ; x) y ∈Nα,β (y t ,α,β) Constant: drop out Unary: combine Pairwise: binary pairwiseSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
55. 55. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap derivation y t+1 = argmin Ei (yi ; x) + Ei,j (yi , yj ; x) y ∈Nα,β (y t ,α,β) i∈V (i,j)∈E Constant: drop out Unary: combine Pairwise: binary pairwiseSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
56. 56. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap derivation y t+1 = argmin Ei (yit ; x) + Ei (yi ; x) y ∈Nα,β (y t ,α,β) i∈V , i∈V , yit ∈{α,β} / yit ∈{α,β} + Ei,j (yit , yjt ; x) + Ei,j (yi , yjt ; x) (i,j)∈E , (i,j)∈E , yit ∈{α,β},yjt ∈{α,β} / / yit ∈{α,β},yjt ∈{α,β} / + Ei,j (yit , yj ; x) + Ei,j (yi , yj ; x) . (i,j)∈E , (i,j)∈E , yit ∈{α,β},yjt ∈{α,β} / yit ∈{α,β},yjt ∈{α,β} Constant: drop out Unary: combine Pairwise: binary pairwiseSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
57. 57. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap derivation y t+1 = argmin Ei (yit ; x) + Ei (yi ; x) y ∈Nα,β (y t ,α,β) i∈V , i∈V , yit ∈{α,β} / yit ∈{α,β} + Ei,j (yit , yjt ; x) + Ei,j (yi , yjt ; x) (i,j)∈E , (i,j)∈E , yit ∈{α,β},yjt ∈{α,β} / / yit ∈{α,β},yjt ∈{α,β} / + Ei,j (yit , yj ; x) + Ei,j (yi , yj ; x) . (i,j)∈E , (i,j)∈E , yit ∈{α,β},yjt ∈{α,β} / yit ∈{α,β},yjt ∈{α,β} Constant: drop out Unary: combine Pairwise: binary pairwiseSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
58. 58. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap graph construction Directed graph G = (V , E ) tα α tα V = {α, β} ∪ {i ∈ V : yi ∈ {α, β}}, i tα k j E = {(α, i, tiα ) : ∀i ∈ V : yi ∈ {α, β}} ∪ ni,j {(i, β, tiβ ) : ∀i ∈ V : yi ∈ {α, β}} ∪ i j ... k ni,j {(i, j, ni,j ) : ∀(i, j), (j, i) ∈ E : yi , yj ∈ {α, β}}. tβ j tβ tβ i β k Edge weights tiα , tiβ , and ni,j ni,j = Ei,j (α, β; x) tiα = Ei (α; x) + Ei,j (α, yj ; x) (i,j)∈E, yj ∈{α,β} / tiβ = Ei (β; x) + Ei,j (β, yj ; x) (i,j)∈E, yj ∈{α,β} /Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
59. 59. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap graph construction Directed graph G = (V , E ) tα α tα V = {α, β} ∪ {i ∈ V : yi ∈ {α, β}}, i tα k j E = {(α, i, tiα ) : ∀i ∈ V : yi ∈ {α, β}} ∪ ni,j {(i, β, tiβ ) : ∀i ∈ V : yi ∈ {α, β}} ∪ i j ... k ni,j {(i, j, ni,j ) : ∀(i, j), (j, i) ∈ E : yi , yj ∈ {α, β}}. tβ j tβ tβ i β k Edge weights tiα , tiβ , and ni,j ni,j = Ei,j (α, β; x) tiα = Ei (α; x) + Ei,j (α, yj ; x) (i,j)∈E, yj ∈{α,β} / tiβ = Ei (β; x) + Ei,j (β, yj ; x) (i,j)∈E, yj ∈{α,β} /Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
60. 60. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap move tα i α tα k tα j ni,j i j ... k ni,j tβ j tβ tβ i β k Side of cut determines yi ∈ {α, β} Iterate all possible (α, β) combinations Semi-metric requirement on pairwise energies Ei,j (yi , yj ; x) = 0 ⇔ yi = yj Ei,j (yi , yj ; x) = Ei,j (yj , yi ; x) ≥ 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
61. 61. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap move C α i j ... k β Side of cut determines yi ∈ {α, β} Iterate all possible (α, β) combinations Semi-metric requirement on pairwise energies Ei,j (yi , yj ; x) = 0 ⇔ yi = yj Ei,j (yi , yj ; x) = Ei,j (yj , yi ; x) ≥ 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
62. 62. Prediction Problem G: Generality G: OptimalityG: Optimalityα-β-swap move C α i j ... k β Side of cut determines yi ∈ {α, β} Iterate all possible (α, β) combinations Semi-metric requirement on pairwise energies Ei,j (yi , yj ; x) = 0 ⇔ yi = yj Ei,j (yi , yj ; x) = Ei,j (yj , yi ; x) ≥ 0Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
63. 63. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Stereo Disparity Estimation Infer depth from two images Discretized multi-label problem α-expansion solution close to optimalSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
64. 64. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Stereo Disparity Estimation Infer depth from two images Discretized multi-label problem α-expansion solution close to optimalSebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
65. 65. Prediction Problem G: Generality G: OptimalityG: OptimalityModel Reduction Energy minimization problem: many decision to make jointly Model reduction 1. Fix a subset of decisions 2. Optimize the smaller remaining model Example: forcing yi = yj for pairs (i, j)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
66. 66. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Superpixels in Labeling Problems Input image: 500-by-375 pixels (187,500 decisions)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)
67. 67. Prediction Problem G: Generality G: OptimalityG: OptimalityExample: Superpixels in Labeling Problems Image with 149 superpixels (149 decisions)Sebastian Nowozin and Christoph H. LampertPart 6: Structured Prediction and Energy Minimization (1/2)