The harmony potential: fusing local and global information for semantic image segmentation

Semantic image segmentation is the process of assigning semantically relevant labels to all pixels in an image. Hierarchical Conditional Random Fields (HCRFs) are a popular and successful approach to this problem. One reason for their popularity is their ability to incorporate contextual information at different scales. However, existing HCRF models do not allow multiple labels to be assigned to individual nodes. At higher scales in the image, this results in an oversimplified model, since multiple classes can reasonably be expected to appear within a single region. This simplified model especially limits the impact that observations at larger scales may have on the CRF model. Furthermore, neglecting the information at larger scales is undesirable, since class-label estimates based on these scales are more reliable than those at smaller, noisier scales.

Transcript

  • 1. Introduction Graph cuts for image segmentation The harmony potential Experimental results Discussion The harmony potential: fusing local and global information for semantic image segmentation Andrew D. Bagdanov bagdanov@cvc.uab.es Departamento de Ciencias de la Computación Universidad Autónoma de Barcelona CVPR 2010 (to appear) J. Gonfaus, X. Boix, J. van de Weijer, J. Serrat, J. González The harmony potential
  • 2. Outline: 1 Introduction; 2 Graph cuts for image segmentation; 3 The harmony potential; 4 Experimental results; 5 Discussion
  • 3. Outline, section 1: Introduction (Semantic image segmentation; Semantic categories; Our main idea)
  • 4. Giving semantics to pixels. [Figure panels: Image, Object, Class] Semantic image segmentation is not object segmentation. Only for simple cases are they the same.
  • 5. Turning a hard problem into a harder one. [Figure panels: Image, Object, Class] The object is to assign semantic labels to every pixel. Fine distinctions must be made.
  • 6. Make that a very hard one. [Figure panels: Image, Object, Class] The object is to assign semantic labels to every pixel. Fine distinctions must be made. Occlusions, varying viewpoint and size complicate things.
  • 7. Semantic categories. The 20 semantic categories for Pascal: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, diningtable, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.
  • 8. State of the art: Conditional Random Fields (CRFs). One of the most successful approaches to image segmentation is the Hierarchical CRF approach. Using potential functions, information at different scales can be incorporated into the segmentation. We identify three levels of scale: local, mid-level and global [Zhu, NIPS2008]. We show how these three levels of scale can be integrated in a way that preserves their unique characteristics. Existing techniques apply overly simplified models of context that do not generalize upward from local to global scales.
  • 9. Global constraints on label combinations. Our principal idea is to use global classification to enhance segmentation results. Global image classification results tend to be less noisy than local ones. We will use them to constrain the combinations of semantic labels we are likely to encounter during segmentation. We also show how the resulting optimization problem can be made tractable by learning to efficiently subsample label combinations at the global level.
  • 10. Outline, section 2: Graph cuts for image segmentation (Smoothness potentials; Potts potentials; Robust $P^N$)
  • 11. Some terminology. We represent our segmentation problem as a graph $G = (V, E)$: $V$ is used for indexing random variables, and $E$ is the set of undirected edges representing compatibility relationships between random variables. $X = \{X_i\}$ denotes the set of random variables or nodes, for $i \in V$. An energy function is defined over configurations of the random variables. By the Hammersley-Clifford theorem, the probability of a configuration $x = \{x_i\}$ can be written as the normalized negative exponential of an energy function, $P(x) \propto \exp(-E(x))$, with $E(x) = \sum_{c \in C} \varphi_c(x_c)$, where $\varphi_c$ is the potential function of clique $c \in C$.
  • 12. Consistency potentials for labeling problems. The energy function of $G$ can be written as: $E(x) = \sum_{i \in V} \phi(x_i) + \sum_{(i,j) \in E_L} \psi_L(x_i, x_j) + \sum_{(i,g) \in E_G} \psi_G(x_i, x_g)$. The unary term $\phi(x_i)$ depends on a single probability $P(X_i = x_i \mid O_i)$, where $O_i$ is the observation that affects $X_i$ in the model. The smoothness potential $\psi_L(x_i, x_j)$ determines the pairwise relationship between two local nodes. The consistency potential $\psi_G(x_i, x_g)$ expresses the dependency between local nodes and a global node. The Maximum a Posteriori (MAP) estimate of the optimal labeling is: $x^* = \arg\min_x E(x)$.
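The energy on slide 12 can be evaluated directly for a small graph. The following is a minimal sketch, assuming every local node is connected to one global node; the function name `total_energy`, the toy potentials, and all numbers are invented for illustration and are not taken from the talk.

```python
from typing import Callable, Dict, Sequence, Tuple


def total_energy(
    x: Sequence[str],
    x_g: str,
    unary_local: Sequence[Dict[str, float]],
    unary_global: Dict[str, float],
    local_edges: Sequence[Tuple[int, int]],
    psi_local: Callable[[str, str], float],
    psi_global: Callable[[str, str], float],
) -> float:
    """E(x) = unary terms + pairwise smoothness terms + local-global consistency terms."""
    # Unary terms over all nodes in V (local nodes plus the global node).
    e_unary = sum(unary_local[i][x_i] for i, x_i in enumerate(x)) + unary_global[x_g]
    # Smoothness terms over the local edges E_L.
    e_smooth = sum(psi_local(x[i], x[j]) for i, j in local_edges)
    # Consistency terms over the edges E_G (assumed: each local node to the global node).
    e_consistency = sum(psi_global(x_i, x_g) for x_i in x)
    return e_unary + e_smooth + e_consistency


# Toy problem: three local nodes in a chain plus one global node (hypothetical values).
unary_local = [
    {"cat": 0.2, "dog": 1.5},
    {"cat": 0.4, "dog": 0.9},
    {"cat": 1.1, "dog": 0.3},
]
unary_global = {"cat": 0.1, "dog": 0.8}
local_edges = [(0, 1), (1, 2)]

example_energy = total_energy(
    x=["cat", "cat", "dog"],
    x_g="cat",
    unary_local=unary_local,
    unary_global=unary_global,
    local_edges=local_edges,
    psi_local=lambda a, b: 0.5 if a != b else 0.0,   # assumed smoothness weight
    psi_global=lambda a, g: 1.0 if a != g else 0.0,  # assumed consistency weight
)
print(example_energy)
```

The MAP labeling is the assignment that minimizes this quantity; for realistic graphs it is found with graph cuts rather than by enumeration.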
  • 13. Representing semantic segmentations. Each node represents an image region. Each node takes a single label from the set of semantic categories.
  • 14. Smoothness: only local constraints. Adds an additional constraint on neighboring nodes. Usually enforces gradual (local) changes.
  • 15. Potts: $\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \neq x_g]$, where $T[\cdot]$ is the indicator function. A new node enforces global consistency among local labels. Consistency with a single global label [Plath, ICML2009].
  • 16. Robust $P^N$: consistency + “anything goes”. Extends the Potts potential [Kohli, CVPR2008]. A “free label” at the global node allows any local combination.
  • 17. Outline, section 3: The harmony potential (Motivation revisited; Blowing up the problem)
  • 18. Different features for discriminations. The previously mentioned approaches all try to make global distinctions using local information: either by voting of local observations (Potts), or by penalizing rampantly discordant local label assignments (Robust $P^N$). None of these techniques try to exploit truly global information to constrain local labels. And none incorporate the notion of encoding combinations of primitive node labels at the global level.
  • 19. The harmony potential: symphony of semantics. Let $L = \{l_1, \ldots, l_M\}$ denote the set of semantic class labels from which local nodes $X_i$ take their labels. The global node $X_g$, instead, will take labels from $P(L)$, the power set of $L$. In this way, we can represent any combination of primitive labels from $L$ at the global node. The harmony potential is now defined as: $\psi_G(x_i, x_g) = \gamma_i^l \, T[x_i \notin x_g]$.
  • 20. The harmony potential: selective subsets. Only labels that do not agree with the subset are penalized. Can represent more diverse combinations.
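The difference between the Potts consistency potential and the harmony potential can be sketched in a few lines. This is an illustrative sketch only: the weight $\gamma_i^l$ is replaced by a single constant `gamma`, which is an assumption, and the function names are hypothetical.

```python
from typing import FrozenSet, Sequence


def potts_potential(x_i: str, x_g: str, gamma: float = 1.0) -> float:
    """Penalize a local label that differs from the single global label."""
    return gamma if x_i != x_g else 0.0


def harmony_potential(x_i: str, x_g: FrozenSet[str], gamma: float = 1.0) -> float:
    """Penalize a local label only if it is not in the global label subset."""
    return gamma if x_i not in x_g else 0.0


def total_penalty_potts(local_labels: Sequence[str], x_g: str) -> float:
    return sum(potts_potential(x_i, x_g) for x_i in local_labels)


def total_penalty_harmony(local_labels: Sequence[str], x_g: FrozenSet[str]) -> float:
    return sum(harmony_potential(x_i, x_g) for x_i in local_labels)


# A scene with a person riding a horse, plus one spurious "dog" superpixel.
local_labels = ["person", "horse", "person", "dog"]

potts_cost = total_penalty_potts(local_labels, "person")
harmony_cost = total_penalty_harmony(local_labels, frozenset({"person", "horse"}))
print(potts_cost, harmony_cost)
```

With a single global label, Potts penalizes the legitimate "horse" superpixel as well as the spurious "dog" one; with the subset {person, horse} at the global node, only "dog" is penalized.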
  • 21. Potentials: the gory details. The unary potential of the local nodes is: $\phi_L(x_i) = -\mu_L K_i \, \omega_L(x_i) \log P(X_i = x_i \mid O_i)$, where $\mu_L$ is the weighting factor of the local unary potential, $K_i$ normalizes over the number of pixels inside superpixel $i$, and $\omega_L(x_i)$ is a learned per-class normalization. $P(X_i = x_i \mid O_i)$ is the classification score given an observed representation $O_i$ of the region, which is based on a bag-of-words built from features of superpixel $i$ and those superpixels adjacent to it.
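As a rough illustration of the local unary potential, the sketch below evaluates the formula for a single superpixel and label. It assumes that $K_i$ is simply the pixel count of the superpixel and that $\mu_L$ and $\omega_L(x_i)$ are given scalars; the slide does not spell out these details, so treat the parameter choices as hypothetical.

```python
import math


def local_unary(prob: float, num_pixels: int, mu_l: float = 1.0, omega_l: float = 1.0) -> float:
    """phi_L(x_i) = -mu_L * K_i * omega_L(x_i) * log P(X_i = x_i | O_i).

    Assumption: K_i is taken to be the number of pixels in the superpixel.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -mu_l * num_pixels * omega_l * math.log(prob)


# A confident score yields a low energy, an uncertain score a high one.
confident = local_unary(prob=0.9, num_pixels=10)
uncertain = local_unary(prob=0.1, num_pixels=10)
print(confident, uncertain)
```

Because the potential scales with superpixel size under this assumption, large regions contribute more to the total energy than small ones.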
  • 22. More potentials. The global unary potential is defined as: $\phi_G(x_g) = -\mu_G \, \omega_G(x_g) \log P(X_g = x_g \mid O_g)$, where $\mu_G$ is the weighting factor of the global unary potential, and $\omega_G(x_g)$ is again a per-class normalization like the one used in the local unary potential. The main difference comes in the computation of $P(X_g = x_g \mid O_g)$, which is the posterior: $P(X_g = x_g \mid O_g) \propto P(O_g \mid X_g = x_g) \, P(X_g = x_g)$.
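The posterior on slide 22 is a likelihood times a prior, normalized over the candidate global labels. The sketch below shows that computation for a handful of candidate subsets; the likelihood and prior values are invented, and treating $\omega_G$ as a plain scalar is an assumption.

```python
import math
from typing import Dict, FrozenSet


def posterior(
    likelihood: Dict[FrozenSet[str], float],
    prior: Dict[FrozenSet[str], float],
) -> Dict[FrozenSet[str], float]:
    """Normalize likelihood * prior over the candidate global labels."""
    unnormalized = {label: likelihood[label] * prior[label] for label in likelihood}
    z = sum(unnormalized.values())
    return {label: value / z for label, value in unnormalized.items()}


def global_unary(prob: float, mu_g: float = 1.0, omega_g: float = 1.0) -> float:
    """phi_G(x_g) = -mu_G * omega_G(x_g) * log P(X_g = x_g | O_g)."""
    return -mu_g * omega_g * math.log(prob)


cat = frozenset({"cat"})
cat_sofa = frozenset({"cat", "sofa"})
dog = frozenset({"dog"})

# Hypothetical likelihoods P(O_g | X_g = x_g) and priors P(X_g = x_g).
likelihood = {cat: 0.6, cat_sofa: 0.3, dog: 0.1}
prior = {cat: 0.5, cat_sofa: 0.25, dog: 0.25}

post = posterior(likelihood, prior)
energies = {label: global_unary(p) for label, p in post.items()}
print(post[cat], energies[cat])
```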
  • 23. Holy crap that’s a lot of labels! We have turned a barely tractable optimization problem into a (seemingly) spectacularly intractable one. To optimize the energy function, we must optimize over $2^{|L|}$ possible global node labels. If we had an analytic form for $P(\ell = x_g^* \mid O)$ we might be able to do something. We don’t. Instead, we will use the probability that a certain label $\ell \in P(L)$ appears in $x_g^*$, given all the observations $O$ required by the model.
  • 24. Ranked subsampling of $P(L)$. We can do this using the following posterior: $P(\ell \subseteq x_g^* \mid O) \propto P(\ell \subseteq x_g^*) \, P(O \mid \ell \subseteq x_g^*)$. This allows us to effectively rank possible global node labels, and thus to prioritize candidates in the search for the optimal label $x_g^*$. $P(\ell \subseteq x_g^* \mid O)$ establishes an order on subsets of the (unknown) optimal labeling of the global node $x_g^*$ that guides the consideration of global labels. We may not be able to exhaustively consider all labels in $P(L)$, but at least we consider the most likely candidates for $x_g^*$. And image classification can give us an estimate of this posterior.
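For scale, the 20 Pascal categories alone give $2^{20} = 1{,}048{,}576$ possible global labels. The sketch below illustrates the general idea of ranking subsets and keeping only the best few. It is not the authors' estimator: it assumes independent per-class presence probabilities (as an image classifier might supply) and scores a subset by the product of those probabilities. The function name, the `max_size` cap, and all values are hypothetical.

```python
import heapq
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def rank_label_subsets(
    presence_prob: Dict[str, float],
    top_k: int,
    max_size: int,
) -> List[Tuple[float, FrozenSet[str]]]:
    """Return the top_k non-empty label subsets, scored by the product of class probabilities.

    Assumption: classes are treated as independent, so the probability that every
    label of a subset is present in the image is the product of per-class probabilities.
    """
    classes = sorted(presence_prob)
    scored = []
    for size in range(1, max_size + 1):
        for subset in combinations(classes, size):
            score = math.prod(presence_prob[c] for c in subset)
            scored.append((score, frozenset(subset)))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


# Hypothetical image-level classifier outputs.
presence_prob = {"person": 0.9, "horse": 0.8, "dog": 0.1}
ranked = rank_label_subsets(presence_prob, top_k=3, max_size=3)
for score, subset in ranked:
    print(sorted(subset), round(score, 3))
```

Only the retained subsets are then considered as labels for the global node, which keeps the optimization tractable.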
  • 25. Outline, section 4: Experimental results (Datasets and implementation; Results: Pascal VOC 2009; Results: MSRC-21)
  • 26. Datasets. We have evaluated the harmony potential approach on two standard, publicly available datasets. The Pascal VOC 2009 Segmentation Challenge dataset contains 2250 color images of 20 different semantic classes. This set is split into 750 images for training, 750 images for testing, and 750 for validation. The Microsoft MSRC-21 dataset contains 591 color images of 21 object classes. We do our own splits for cross-validation on MSRC-21.
  • 27. Unsupervised segmentation. Images are first over-segmented with quick-shift to derive superpixels [Fulkerson, ICCV 2009]. This preserves object boundaries while simplifying the representation. Working at the superpixel level reduces the number of nodes in the CRF by $10^2$ to $10^5$ per image.
  • 28. Local classification scores: $P(X_i = x_i \mid O_i)$. We extract patches with 50% overlap on a regular grid at several resolutions (12, 24, 36 and 48 pixels in diameter). Patches are described with SIFT, color and, for MSRC-21, location features. A vocabulary is constructed using k-means to quantize to 1000 SIFT words and 400 color words. An SVM classifier using an intersection kernel is built for each semantic category. A similar number of positive and negative examples are used: around a total of 8,000 superpixel samples for MSRC-21, and 20,000 for VOC 2009 for each class.
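The intersection kernel mentioned on slide 28 compares two bag-of-words histograms by summing their bin-wise minima. The following is a minimal pure-Python sketch of that kernel and the Gram matrix an SVM would consume; the toy histograms are invented and the function names are hypothetical.

```python
from typing import List, Sequence


def intersection_kernel(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: sum over bins of min(h1[b], h2[b])."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise kernel values, as needed by an SVM with a precomputed kernel."""
    return [[intersection_kernel(a, b) for b in histograms] for a in histograms]


# Three toy visual-word histograms (3 bins each).
histograms = [
    [1.0, 2.0, 3.0],
    [2.0, 1.0, 3.0],
    [0.0, 0.0, 6.0],
]
gram = gram_matrix(histograms)
print(gram)
```

The resulting matrix is symmetric, and each diagonal entry equals the total mass of the corresponding histogram.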
  • 29. Global classification scores: $P(X_g = x_g \mid O_g)$. For the Pascal 2009 dataset we use our entry to the 2009 VOC Classification Challenge [Khan, PAMI2010 (submitted)]. It uses a bag-of-words representation based on SIFT and color SIFT, plus spatial pyramids and color attention [Khan, ICCV 2009]. An SVM classifier with a $\chi^2$ kernel is trained for each semantic category in the dataset. SVM outputs are re-normalized to generate an estimate of the global label: $P(X_g = x_g \mid O_g)$.
  • 30. MAP inference. The optimal MAP label configuration $x^*$ is inferred using α-expansion graph cuts [Kolmogorov, PAMI2004]. The global node uses the 100 most probable label subsets obtained from ranked subsampling. No significant improvements were observed by considering more than 100 label subsets. The average time to do MAP inference for an image in MSRC-21 is 0.24 seconds and in VOC 2009 is 0.32 seconds.
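To see how a short list of candidate global labels constrains the local labels, the toy sketch below finds the exact minimum of a simplified energy that drops the smoothness term, so each local node can be optimized independently for every candidate. This is deliberately not α-expansion, and every name and number in it is an assumption made for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def map_without_smoothness(
    unary_local: Sequence[Dict[str, float]],
    unary_global: Dict[FrozenSet[str], float],
    gamma: float = 1.0,
) -> Tuple[float, FrozenSet[str], List[str]]:
    """Exact MAP for unary + harmony terms only (no pairwise smoothness term)."""
    best = None
    for x_g, global_cost in unary_global.items():
        total = global_cost
        labels = []
        for costs in unary_local:
            # Each label costs its unary energy plus gamma if it is outside the subset x_g.
            label, cost = min(
                ((l, c + (0.0 if l in x_g else gamma)) for l, c in costs.items()),
                key=lambda pair: pair[1],
            )
            labels.append(label)
            total += cost
        if best is None or total < best[0]:
            best = (total, x_g, labels)
    return best


# Hypothetical unary energies for three superpixels.
unary_local = [
    {"cat": 0.2, "dog": 0.3, "sofa": 1.0},
    {"cat": 0.9, "dog": 0.8, "sofa": 0.1},
    {"cat": 0.5, "dog": 0.4, "sofa": 0.9},
]
# Hypothetical candidate global labels with their unary energies.
unary_global = {
    frozenset({"cat", "sofa"}): 0.2,
    frozenset({"dog"}): 0.6,
    frozenset({"cat"}): 0.3,
}

best_energy, best_global, best_labels = map_without_smoothness(unary_local, unary_global)
print(best_energy, sorted(best_global), best_labels)
```

In this example the third superpixel slightly prefers "dog" on its own, but the global context switches it to "cat", which is the kind of correction the harmony potential is meant to provide.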
  • 31. Cross-validation of CRF parameters. For MSRC-21 we learn the CRF parameters with a 5-fold cross-validation of the union of training and validation sets. If we only use the validation set of 59 images, we overfit to this small set. For VOC 2009, we used the available validation set to train CRF parameters. Since the background class always appears in combination with other classes, we do not allow the harmony potential to apply any penalization to the background class.
  • 32. Qualitative results
  • 33. Qualitative results (II)
  • 34. Quantitative results

    Class          BONN   BROOKES   Harmony potential
    Background     83.9   79.6      80.5
    Aeroplane      64.3   48.3      62.3
    Bicycle        21.8    6.7      24.1
    Bird           21.7   19.1      28.3
    Boat           32.0   10.0      30.5
    Bottle         40.2   16.6      32.7
    Bus            57.3   32.7      42.2
    Car            49.4   38.1      48.1
    Cat            38.8   25.3      22.8
    Chair           5.2    5.5       9.1
    Cow            28.5    9.4      30.1
    DiningTable    22.0   25.1       7.9
    Dog            19.6   13.3      21.5
    Horse          33.6   12.3      41.9
    Motorbike      45.5   35.5      49.6
    Person         33.6   20.7      31.5
    PottedPlant    27.3   13.4      26.1
    Sheep          40.4   17.1      37.0
    Sofa           18.1   18.4      20.1
    Train          33.6   37.5      39.4
    TV/Monitor     46.1   36.4      31.1
    Average        36.3   24.8      34.1
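As a quick consistency check on the table, the reported averages can be recomputed from the 21 per-class scores (background plus 20 categories). The snippet below only re-derives numbers already on the slide; the variable names are arbitrary.

```python
# Per-class scores in slide order: Background, Aeroplane, ..., TV/Monitor.
scores = {
    "BONN": [83.9, 64.3, 21.8, 21.7, 32.0, 40.2, 57.3, 49.4, 38.8, 5.2,
             28.5, 22.0, 19.6, 33.6, 45.5, 33.6, 27.3, 40.4, 18.1, 33.6, 46.1],
    "BROOKES": [79.6, 48.3, 6.7, 19.1, 10.0, 16.6, 32.7, 38.1, 25.3, 5.5,
                9.4, 25.1, 13.3, 12.3, 35.5, 20.7, 13.4, 17.1, 18.4, 37.5, 36.4],
    "Harmony potential": [80.5, 62.3, 24.1, 28.3, 30.5, 32.7, 42.2, 48.1, 22.8, 9.1,
                          30.1, 7.9, 21.5, 41.9, 49.6, 31.5, 26.1, 37.0, 20.1, 39.4, 31.1],
}

# Mean over all 21 classes, rounded to one decimal as on the slide.
averages = {method: round(sum(vals) / len(vals), 1) for method, vals in scores.items()}
print(averages)
```

The recomputed means match the "Average" column of the slide.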
  • 35. Qualitative results
  • 36. Outline, section 5: Discussion (Computational considerations; The future; Reflections)
  • 37. A modest cluster proposal. 4 Dell R610i 1U rack servers, each with: 2x Intel Xeon E5502 Quad Core CPUs, 24GB RAM, 4x Broadcom 10Gb Ethernet adapters, and 1x 160GB 7.2K RPM disk. Two units with: PERC 6/i SAS RAID controller. One unit with: 5x 300GB 10K RPM disk.
  • 38. Organizing computations
  • 39. Some (mostly meaningless) numbers. Days of Pascal challenge: 45. Seconds of computation: 3,888,000. Estimated GFLOPS: 307.2. Sustained CPU utilization: 80%. Total GFLOP: 955,514,880. Images: 15,000. Pixels (assuming 640 × 480): 4,608,000,000. GFLOP/image: 63,700.99. GFLOP/pixel: 0.21.
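The figures on slide 39 follow from one another by simple arithmetic. The short script below reproduces the chain, using only the inputs stated on the slide (45 days, 307.2 GFLOPS, 80% utilization, 15,000 images at 640 × 480); the variable names are arbitrary.

```python
days = 45
seconds = days * 24 * 60 * 60           # seconds of computation
peak_gflops = 307.2                     # estimated GFLOPS of the cluster
utilization = 0.80                      # sustained CPU utilization

total_gflop = seconds * peak_gflops * utilization

images = 15_000
pixels = images * 640 * 480             # assuming 640 x 480 images

gflop_per_image = total_gflop / images
gflop_per_pixel = total_gflop / pixels

print(f"seconds:      {seconds:,}")
print(f"total GFLOP:  {total_gflop:,.2f}")
print(f"pixels:       {pixels:,}")
print(f"GFLOP/image:  {gflop_per_image:,.2f}")
print(f"GFLOP/pixel:  {gflop_per_pixel:.2f}")
```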
  • 40. Conclusions. The harmony potential works well for fusing global information into local segmentations. It works by modeling global observations as subsets of the local label set. Ranked subsampling, driven by the same posterior as used to define the global potential function, renders the optimization problem tractable. The harmony potential gets state-of-the-art results on difficult, publicly available datasets. Most useful when multiple semantic classes co-occur frequently.
  • 41. Prospectus. Semantic image segmentation has come a long way, but still has a long way to go. Segmentation will become a mainstream event in Pascal VOC 2010. We have shown that combining global information with local can be tractable and improves on the state of the art. Currently, combining mid-level information is where the game is being played. Detection is probably the key. We can also begin to think about what types of new applications are enabled by such combinations.
  • 42. Final words. Semantic image segmentation is hard. Participating in a competition like the Pascal VOC is very hard. But it brings many technologies, people, groups and ideas together. [Photos: Xavier, Pep, Fahad]
