PASCAL VOC 2010: semantic object segmentation and action recognition in still images


In this talk, I will discuss the extensions we have made to our approach to semantic image segmentation. I will show how the results of object detectors and spatial priors can be naturally integrated into our hierarchical conditional random field (HCRF) approach based on the harmony potential. The addition of these extra cues, as well as class-specific normalization of classifier outputs, significantly improves segmentation quality.


  1. PASCAL VOC 2010: semantic object segmentation and action recognition in still images
     Andrew D. Bagdanov (bagdanov@cvc.uab.es), Departamento de Ciencias de la Computación, Universidad Autónoma de Barcelona
     With Xavier, Pep, Nataliya, Wenjuan and Fahad: the CVC PASCAL VOC Team
  2. Overview
     On 03/05/2010 the PASCAL VOC competition was announced and the training and validation sets were published. The 20 semantic categories for the competition remain the same: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, and tv/monitor.
  3. Old competitions, new competitions
     There are two (+ 1/2) main challenges in PASCAL. Image classification is the prediction of the presence/absence of an instance of a class in a test image. Object detection is the prediction of the bounding box and label of each object from the twenty target classes in a test image. Semantic image segmentation is the assignment of one of the twenty class labels to every pixel in a test image. Image segmentation is becoming a mainstream competition. Action recognition in still images was included as a new "taster challenge" this year; taster competitions are used to measure interest in new problems.
  4. Our contributions to PASCAL VOC 2010
     Last year we participated in the Detection, Classification and Segmentation challenges. This year we decided to concentrate on Classification and Segmentation; our segmentation technique relies heavily on classification. We also fielded a team in Action Recognition this year to see what that's all about. As always, success in PASCAL VOC challenges is approximately 85% engineering, 10% inspiration and 5% luck (if you're lucky).
  5. Outline
     1 Introduction (overview of the challenges; our contribution and main ideas)
     2 The harmony potential 2.0: fusing across scale (building on last year's submission; fusing across scales and learning)
     3 Action recognition (a torrent of features; exploiting the size of the problem)
     4 Discussion
  6. Giving semantics to pixels
     (Figure panels: Image, Object, Class.) Semantic image segmentation is not object segmentation; only in simple cases are they the same.
  7. Turning a hard problem into a harder one
     (Figure panels: Image, Object, Class.) The objective is to assign semantic labels to every pixel; fine distinctions must be made.
  8. Make that a very hard one
     (Figure panels: Image, Object, Class.) The objective is to assign semantic labels to every pixel, and fine distinctions must be made. Occlusions, varying viewpoint and size complicate things.
  9. Action recognition in still images
     New competition this year: human action recognition in still images. Individual images are sampled from the Flickr dataset, and the bounding box of the human in each image is provided. Very important: we don't have to solve the detection problem. Action recognition is offered as a "taster challenge" in order to gauge interest in the general problem. It was difficult to hypothesize about what would succeed and what would not in this challenge.
  10. Action classes (figure of example images)
  11. Segmentation: the role of context
     Context provides very important cues for making fine discriminations at the (super-)pixel scale. We can exploit three levels of scale: local, mid-level and global [Zhu, NIPS2008]. Existing techniques apply overly-simplified models of context that do not generalize upward from local to global scales.
  12. Segmentation: global constraints on label combinations
     Our principal idea is to use global classification to enhance segmentation results. Global image classification results tend to be less noisy than local ones, so we use them to constrain the combinations of semantic labels we are likely to encounter during segmentation. We showed last year how a tractable inference technique can be devised for this labeling problem (our PASCAL 2009 entry). This year we also show how mid-level context can be incorporated in the form of object detections, and how position priors can similarly be incorporated into the framework to provide class-specific location information. Finally, we devised a stochastic steepest-ascent technique for optimizing the many parameters in a class-specific way.
  13. Action recognition: driven by data limitations
     Initial experiments confirmed our intuition about the limitations of the data. Structural learning: sampling of pose space not dense enough. Latent SVM: object interactions under-sampled as well. Multiple kernel learning: converges to simple selection. From a very early stage, we decided to treat action recognition as an image classification problem, and we exploit the small size of the dataset by performing extensive cross-validation. Features are one of our strong points, and we had to get the feature pipeline running for Classification in any case.
  14. HCRFs for the labeling problem
     We represent our segmentation problem as a graph G = (V, E), where V indexes the random variables and E is the set of undirected edges representing compatibility relationships between them. X = {X_i} denotes the set of random variables (nodes), for i ∈ V. An energy function is defined over configurations of the random variables: by the Hammersley-Clifford theorem, the probability of a configuration x = {x_i} can be written as the negative exponential of an energy function E(x) = Σ_{c ∈ C} φ_c(x_c), where φ_c is the potential function of clique c ∈ C.
  15. Consistency potentials for labeling problems
     The energy function of G can be written as
         E(x) = Σ_{i ∈ V} φ(x_i) + Σ_{(i,j) ∈ E_L} ψ_L(x_i, x_j) + Σ_{(i,g) ∈ E_G} ψ_G(x_i, x_g).
     The unary term φ(x_i) depends on a single probability P(X_i = x_i | O_i), where O_i is the observation that affects X_i in the model. The smoothness potential ψ_L(x_i, x_j) determines the pairwise relationship between two local nodes. The consistency potential ψ_G(x_i, x_g) expresses the dependency between local nodes and a global node. The maximum a posteriori (MAP) estimate of the optimal labeling is x* = arg min_x E(x).
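To make the energy concrete, here is a minimal sketch (not the authors' code) that evaluates E(x) for a labeling over superpixel nodes plus one global node. The arrays `unary`, `pairwise_weight` and `global_weight` are hypothetical stand-ins for the potentials defined above, and the global term is a simplified Potts-style proxy for the harmony potential.

```python
import numpy as np

def labeling_energy(labels, global_label, unary, edges, pairwise_weight, global_weight):
    """Evaluate E(x) = sum_i phi(x_i) + sum_(i,j) psi_L(x_i, x_j) + sum_i psi_G(x_i, x_g).

    labels       : (N,) int array, label of each local (superpixel) node
    global_label : set of class indices assigned to the global node
    unary        : (N, K) array, phi(x_i = k) for each node and class
    edges        : list of (i, j) index pairs of neighbouring local nodes
    """
    # Unary term: data cost of each local node taking its label.
    energy = unary[np.arange(len(labels)), labels].sum()

    # Smoothness term: a simple Potts penalty for disagreeing neighbours.
    for i, j in edges:
        if labels[i] != labels[j]:
            energy += pairwise_weight

    # Consistency term: penalize local labels not covered by the global node
    # (a simplified stand-in for the harmony potential).
    for lab in labels:
        if lab not in global_label:
            energy += global_weight
    return energy
```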
  16. HCRF models of image segmentation
     Smoothness potentials used in the literature: Potts (Shotton et al., CVPR2008), Robust P^N (Plath et al., ICML2009), and Free (Ladicky et al., ICCV2009). In the diagrams, colored nodes represent (hidden) semantic labels, dark nodes represent image measurements, and red edges represent penalties imposed by the potential.
  17. Different features for discrimination
     The previously mentioned approaches all try to make global distinctions using local information: either by voting of local observations (Potts), or by penalizing rampantly discordant local label assignments (P^N). None of these techniques tries to exploit truly global information to constrain local labels, and none incorporates the notion of encoding combinations of primitive node labels at the global level.
  18. The harmony potential: selective subsets
     Only labels that do not agree with the selected subset are penalized, so the potential can represent more diverse label combinations.
  19. The harmony potential: overview (figure)
  20. Ranked subsampling of P(L)
     We can do this using the following posterior:
         P(ℓ ⊆ x_g* | O) ∝ P(ℓ ⊆ x_g*) P(O | ℓ ⊆ x_g*).
     This allows us to effectively rank possible global node labels, and thus to prioritize candidates in the search for the optimal label x_g*. P(ℓ ⊆ x_g* | O) establishes an order on subsets of the (unknown) optimal labeling of the global node x_g* that guides the consideration of global labels. We may not be able to exhaustively consider all labels in P(L), but at least we consider the most likely candidates for x_g*. Image classification can give us an estimate of this posterior.
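A minimal sketch of how such a ranking could be produced from per-class image-classification probabilities. The independence assumption and the function name are ours, as an illustration rather than the exact procedure used in the submission.

```python
import itertools
import numpy as np

def rank_global_label_subsets(class_probs, max_subset_size=4, top_k=100):
    """Rank candidate label sets for the global node by a posterior built from
    per-class image classification scores (classes assumed independent).

    class_probs : dict mapping class name -> P(class present | image)
    Returns the top_k subsets, most probable first.
    """
    classes = list(class_probs)
    scored = []
    for size in range(1, max_subset_size + 1):
        for subset in itertools.combinations(classes, size):
            # Log-probability that exactly the classes in `subset` are present.
            logp = sum(np.log(class_probs[c]) for c in subset)
            logp += sum(np.log(1.0 - class_probs[c]) for c in classes if c not in subset)
            scored.append((logp, subset))
    scored.sort(reverse=True)
    return [subset for _, subset in scored[:top_k]]

# Example: candidates = rank_global_label_subsets({"cow": 0.8, "sheep": 0.3, "person": 0.6})
```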
  21. PASCAL 2010: pushing the limit
     The previous slides describe the approach used for our PASCAL 2009 submission, whose discriminative model was based only on SVMs trained to discriminate object classes from their own backgrounds. Starting from the harmony potential approach, this year we concentrated on adding cues derived from different levels of mid-level context, and we found the HCRF model with the harmony potential to be very useful for performing this fusion. Our hypothesis at the end of the 2009 competition was that detection would be essential for pushing forward the state of the art.
  22. PASCAL 2010: fusing across scales
     1 FG/BG: 20 SVMs trained to discriminate classes from their own background. The same discriminative model used last year; essential for localizing object boundaries.
     2 CLASS: 20 SVMs trained to discriminate each object class from the other objects. Essential for distinguishing objects with similar backgrounds (e.g. cows from sheep, birds from planes). Incorporated directly into the unary potential.
     3 LOC: 20 class-specific location priors, computed from ground-truth segmentations by simple spatial averaging. A form of top-down mid-level context.
     4 OBJ: 20 class-specific object detectors [Felzenszwalb 2010], converted to superpixel scores by selecting the highest-scoring detection intersecting each pixel of the superpixel (sketched below). A type of bottom-up mid-level context.
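A sketch of the OBJ conversion step (cue 4), assuming detections are available as axis-aligned boxes with scores; the data layout below is hypothetical.

```python
import numpy as np

def detections_to_superpixel_scores(superpixel_map, detections, num_classes):
    """For each superpixel and class, keep the score of the highest-scoring
    detection of that class whose bounding box intersects any of its pixels.

    superpixel_map : (H, W) int array of superpixel ids
    detections     : list of (class_id, score, (x0, y0, x1, y1)) tuples
    """
    num_sp = superpixel_map.max() + 1
    scores = np.full((num_sp, num_classes), -np.inf)
    for class_id, score, (x0, y0, x1, y1) in detections:
        # Superpixels touched by this bounding box.
        touched = np.unique(superpixel_map[y0:y1 + 1, x0:x1 + 1])
        scores[touched, class_id] = np.maximum(scores[touched, class_id], score)
    return scores
```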
  23. PASCAL 2010: learning unary potentials
     We compute the unary potential by weighting the classification scores {s_i(k, x_i)}_{k ∈ F} through a sigmoid function. The unary potential becomes
         φ_L(x_i) = -μ_L K_i Σ_{k ∈ F} log [ 1 / (1 + exp(f_i(k, x_i))) ],   with f_i(k, x_i) = a(k, x_i) s_i(k, x_i) + b(k, x_i).
     μ_L is the weighting factor of the local unary potential, and K_i normalizes over the number of pixels inside the superpixel. We have two sigmoid parameters for each class/cue pair: a(k, x_i) and b(k, x_i).
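A small sketch of this cue fusion for one superpixel and one candidate label, with hypothetical dictionaries `a` and `b` holding the per-class, per-cue sigmoid parameters and `scores` holding s_i(k, x_i).

```python
import numpy as np

def local_unary_potential(scores, a, b, mu_L, K_i):
    """phi_L(x_i) = -mu_L * K_i * sum_k log(1 / (1 + exp(f_k))),  f_k = a_k * s_k + b_k.

    scores, a, b : dicts keyed by cue name (e.g. 'FG/BG', 'CLASS', 'LOC', 'OBJ')
                   for a fixed superpixel i and candidate label x_i
    mu_L         : weight of the local unary potential
    K_i          : normalization by the number of pixels inside the superpixel
    """
    total = 0.0
    for cue, s in scores.items():
        f = a[cue] * s + b[cue]                    # per-class, per-cue sigmoid argument
        total += np.log(1.0 / (1.0 + np.exp(f)))   # log of the sigmoid-weighted score
    return -mu_L * K_i * total
```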
  24. Datasets
     We evaluated the harmony potential approach on two standard, publicly available datasets. The PASCAL VOC 2010 Segmentation Challenge dataset contains 2250 color images of 20 different semantic classes; this set is split into 750 images for training, 750 for testing, and 750 for validation. The Microsoft MSRC-21 dataset contains 591 color images of 21 object classes; we do our own splits for cross-validation on MSRC-21.
  25. Unsupervised segmentation
     Images are first over-segmented with quick-shift to derive superpixels [Fulkerson, ICCV 2009]. This preserves object boundaries while simplifying the representation. Working at the superpixel level reduces the number of nodes in the CRF by 10^2 to 10^5 per image.
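An over-segmentation along these lines can be reproduced with the quick-shift implementation in scikit-image (the talk refers to the implementation of [Fulkerson, ICCV 2009]); the parameter values below are illustrative, not the ones used for the submission.

```python
import numpy as np
from skimage import data
from skimage.segmentation import quickshift

image = data.astronaut()                      # any RGB image as an (H, W, 3) array
segments = quickshift(image, kernel_size=3, max_dist=6, ratio=0.5)

num_superpixels = len(np.unique(segments))
print(f"{image.shape[0] * image.shape[1]} pixels -> {num_superpixels} superpixels")
```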
  26. Local classification scores: P(X_i = x_i | O_i)
     We extract patches with 50% overlap on a regular grid at several resolutions (12, 24, 36 and 48 pixels in diameter). Patches are described with SIFT, color and, for MSRC-21, location features. A vocabulary is constructed using k-means to quantize to 1000 SIFT words and 400 color words. An SVM classifier using an intersection kernel is built for each semantic category. A similar number of positive and negative examples is used per class: around 8,000 superpixel samples in total for MSRC-21, and 20,000 for VOC 2010.
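A condensed sketch of this local classification pipeline: descriptors are quantized with k-means, pooled into a bag-of-words histogram, and classified with a histogram-intersection-kernel SVM through scikit-learn's precomputed-kernel interface. Descriptor extraction is omitted, and the random arrays are stand-ins for real SIFT/color descriptors and FG/BG labels.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

def intersection_kernel(A, B):
    """Histogram intersection kernel: K(a, b) = sum_d min(a_d, b_d)."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

def bow_histogram(descriptors, kmeans):
    """Visual-word occurrence histogram for the descriptors of one superpixel."""
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Build a visual vocabulary from a pool of local descriptors (stand-ins for SIFT).
descriptor_pool = np.random.rand(5000, 128)
kmeans = MiniBatchKMeans(n_clusters=1000, n_init=3).fit(descriptor_pool)

# One FG/BG SVM per class on histogram-intersection kernels (toy data below).
train_hists = np.vstack([bow_histogram(np.random.rand(40, 128), kmeans) for _ in range(60)])
train_labels = np.random.randint(0, 2, size=60)
svm = SVC(kernel="precomputed", C=1.0)
svm.fit(intersection_kernel(train_hists, train_hists), train_labels)
```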
  27. Global potential and general approach
     For the PASCAL 2010 dataset we use our entry to the 2010 VOC Classification Challenge [Khan, IJCV2010 (submitted)]. It uses a bag-of-words representation based on SIFT and color SIFT, plus spatial pyramids and color attention [Khan, ICCV 2009]. An SVM classifier with a χ² kernel is trained for each semantic category in the dataset. The FG/BG and CLASS cues are computed by training a discriminative model using an SVM with a histogram intersection kernel. Except for the additional cues and the optimization strategy, the architecture is the same as the approach we described at CVPR [Gonfaus, CVPR2010].
  28. Learning the HCRF parameters
     We found it essential to train the per-class sigmoid parameters through cross-validation: classification scores are learned independently, are unbalanced, and are effectively incomparable in many cases. The sigmoid functions weight the importance of each cue for each class. In addition to these (180) sigmoid parameters, we must also learn the weighting factors for each potential. We use a stochastic steepest-ascent technique to optimize these parameters on a validation set: in each step we randomly generate new parameter instances using a Gibbs-like sampling strategy (sketched below).
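A minimal sketch of this kind of stochastic, coordinate-wise search: perturb one randomly chosen parameter at a time and keep the change if validation performance improves. `evaluate_on_validation` is a hypothetical callback that runs segmentation with the given parameters and returns a score such as mAP; the step size and acceptance rule are assumptions.

```python
import numpy as np

def stochastic_steepest_ascent(params, evaluate_on_validation, n_iters=3000, step=0.1, rng=None):
    """Greedy stochastic search over HCRF parameters (sigmoid a/b terms, potential weights).

    params : dict of parameter name -> float, the current setting
    evaluate_on_validation : callable(dict) -> float, validation score (higher is better)
    """
    rng = rng or np.random.default_rng(0)
    best = dict(params)
    best_score = evaluate_on_validation(best)
    names = list(best)
    for _ in range(n_iters):
        candidate = dict(best)
        name = names[rng.integers(len(names))]       # pick one parameter at random
        candidate[name] += rng.normal(scale=step)    # Gibbs-like local perturbation
        score = evaluate_on_validation(candidate)
        if score > best_score:                       # accept only improvements
            best, best_score = candidate, score
    return best, best_score
```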
  29. History: PASCAL VOC 2009 (per-class segmentation accuracy, %)
     Background, Aeroplane, Bicycle, Bottle, Chair, Boat, Bird, Bus, Car, Cat:
       BONN               83.9  64.3  21.8  21.7  32.0  40.2  57.3  49.4  38.8   5.2
       BROOKES            79.6  48.3   6.7  19.1  10.0  16.6  32.7  38.1  25.3   5.5
       Harmony potential  80.5  62.3  24.1  28.3  30.5  32.7  42.2  48.1  22.8   9.1
     Dining Table, Potted Plant, TV/Monitor, Motorbike, Average, Person, Sheep, Horse, Train, Sofa, Cow, Dog:
       BONN               28.5  22.0  19.6  33.6  45.5  33.6  27.3  40.4  18.1  33.6  46.1  36.3
       BROOKES             9.4  25.1  13.3  12.3  35.5  20.7  13.4  17.1  18.4  37.5  36.4  24.8
       Harmony potential  30.1   7.9  21.5  41.9  49.6  31.5  26.1  37.0  20.1  39.4  31.1  34.1
  30. Qualitative results: MSRC-21 (figure)
  31. Quantitative results: MSRC-21
     MSRC-21 contains more multi-class images than PASCAL. Our performance demonstrates the benefits of incorporating the global scale when making local decisions.
  32. Qualitative results: PASCAL 2010 (figure)
  33. Quantitative results: PASCAL 2010
     FG/BG shows the performance of our baseline (PASCAL 2009) approach. At the top, performance on the validation set (i.e. how well we thought we were doing). "Image tags" indicates how well the technique can perform with perfect global information.
  34. The cost of segmentation
     The optimal MAP label configuration x* is inferred using α-expansion graph cuts [Kolmogorov, PAMI2004]. The global node uses the 100 most probable label subsets obtained from ranked subsampling. (Figure: mAP on MSRC-21 and on PASCAL VOC 2010 as a function of the number of global label subsets selected, from 1 to 200.)
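The overall inference can be pictured as the loop below: for each of the top-ranked global label sets, run α-expansion on the local nodes and keep the lowest-energy result. `alpha_expansion` and `rank_global_label_subsets` stand for the graph-cut solver and the ranking step sketched earlier; both are placeholders rather than a specific library API.

```python
def infer_map_labeling(class_probs, unary, edges,
                       alpha_expansion, rank_global_label_subsets, top_k=100):
    """Approximate MAP inference: restrict local labels to each candidate global
    label set, solve with alpha-expansion, and keep the best-energy labeling."""
    best_labels, best_energy = None, float("inf")
    for global_label in rank_global_label_subsets(class_probs, top_k=top_k):
        labels, energy = alpha_expansion(unary, edges, allowed_labels=global_label)
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return best_labels, best_energy
```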
  35. Qualitative results: PASCAL 2010 failures
     Context is sometimes weighted too heavily, and when the global classifier fails, little can be done.
  36. Every little bit helps (figure)
  37. A photo finish
     Contribution of each cue (mAP on PASCAL VOC 2010): FG-BG 33.9, CLASS 23.4, LOC 20.1, OBJ 26.2, FG-BG + CLASS 36.6, All 40.4. (Figure: mAP on PASCAL VOC 2010 versus the number of parameter-optimization iterations, from 0 to 3000.)
     The final results were tough to call between BONN and CVC. In the end, fusion over many scales and per-class, per-feature parameter optimization won.
  38. The action recognition taster
     Images were collected from Flickr using action queries. A set of nine actions was chosen in the end; they are disjoint from the main challenge dataset. Only a subset of the people are annotated (bounding box + action), and each annotated person is labelled with exactly one action class. Important point: we don't have to solve the detection problem. Most action classes in the challenge contain either large variations in scale or large variations in pose (or both).
  39. Dataset breakdown (images / annotated objects)
     Class                train        val          trainval     test
                          img   obj    img   obj    img   obj    img   obj
     Phoning               25    25     25    26     50    51     -     -
     Playinginstrument     27    38     27    38     54    76     -     -
     Reading               25    26     26    27     51    53     -     -
     Ridingbike            25    33     25    33     50    66     -     -
     Ridinghorse           27    35     26    36     53    71     -     -
     Running               26    47     25    47     51    94     -     -
     Takingphoto           25    27     26    28     51    55     -     -
     Usingcomputer         26    29     26    30     52    59     -     -
     Walking               25    41     26    42     51    83     -     -
     Total                226   301    228   307    454   608     -     -
  40. Grouplets and poselets
     Two state-of-the-art techniques for action recognition in still images: the grouplets of Fei-Fei Li [Yao et al., CVPR2010], and the latent poses of Greg Mori [Yang et al., CVPR2010].
  41. Treat it like image classification
     Initial experiments confirmed our intuition about the limitations of the data. Structural learning: sampling of pose space not dense enough. Latent SVM: complexity of object interactions problematic. Multiple kernel learning: converges to simple selection. State-of-the-art techniques rely on learning complex structural models of pose variations, which require many training examples. From a very early stage, we decided to treat action recognition as an image classification problem, and we exploit the small size of the dataset by performing extensive cross-validation.
  42. The classification pipeline (figure)
  43. Action recognition: features
     SIFT, color SIFT (normalized rg and opponent), self-similarity, SURF, PHOG (good for capturing pose), and color attention (focuses on interesting color features). Sparse and dense variations of most of these, plus a range of pyramid configurations (1, 2×2, 3×3, 4×4). Object detectors are also incorporated using a simple occurrence histogram [Felzenszwalb 2010] (sketched below). The goal was to incorporate all of this into a BoVW classifier and push the limits of what is possible using classical BoW on actions.
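The detector-based feature mentioned above can be as simple as the sketch below: a per-image histogram of how many detections of each object class fire above a threshold. The (class_id, score) layout and the thresholding are our assumptions for illustration.

```python
import numpy as np

def detector_occurrence_histogram(detections, num_classes=20, threshold=0.0):
    """Count, per object class, the detections scoring above `threshold` in one image.

    detections : iterable of (class_id, score) pairs from pre-trained object detectors
    Returns an L1-normalized occurrence histogram to append to the BoW representation.
    """
    hist = np.zeros(num_classes)
    for class_id, score in detections:
        if score > threshold:
            hist[class_id] += 1
    return hist / max(hist.sum(), 1.0)
```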
  44. Action recognition: contextual pyramids
     Context was also important for most object classes. We used a type of foreground/background pyramid decomposition that splits features into object or background. This was done using a kind of spatial soft-assignment based on the distance to the boundary of the object. For some classes, we also assigned contextual object regions that model the appearance of objects associated with them (the "horsy box").
  45. Action recognition: learning in the design space
     In the end, after all of the combinatorics introduced by pyramids and other variations, we had about 100 feature configurations in a big pool. Most attempts to automatically learn the parameters of these features were total failures, except one: initial experiments with multiple kernel learning showed that MKL quickly converges towards class-specific feature selection rather than mixing. With such a small dataset, and a little heuristic trimming, we were able to exhaustively explore part of the design space (sketched below). This resulted in the best per-class feature combinations.
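A sketch of the exhaustive per-class exploration, assuming each candidate feature configuration yields a precomputed kernel matrix and that average precision is estimated with cross-validation. The kernel pool, the summation rule for combining kernels, and the names are illustrative.

```python
import itertools
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def best_feature_combination(kernels, labels, max_combo=3, folds=5):
    """Exhaustively score combinations of precomputed kernels for one action class.

    kernels : dict of feature name -> (N, N) precomputed kernel matrix
    labels  : (N,) binary labels for the class
    Combined kernels are summed, which corresponds to concatenating feature maps.
    """
    best_combo, best_ap = None, -np.inf
    for size in range(1, max_combo + 1):
        for combo in itertools.combinations(kernels, size):
            K = sum(kernels[name] for name in combo)
            svm = SVC(kernel="precomputed")
            ap = cross_val_score(svm, K, labels, cv=folds,
                                 scoring="average_precision").mean()
            if ap > best_ap:
                best_combo, best_ap = combo, ap
    return best_combo, best_ap

# Usage: best_combo, ap = best_feature_combination({"sift": K_sift, "phog": K_phog}, y)
```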
  46. Action recognition: classification
     We experimented with a number of kernels (histogram intersection, χ², bin-ratio distance). There wasn't a huge difference among these kernels. In the end, we chose histogram intersection for our submission as it appeared to generalize better: in addition to over-fitting less, it has no parameters to tune and is very fast.
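For completeness, a sketch of two of the kernels compared, both usable through scikit-learn's precomputed-kernel SVC. The χ² kernel comes from sklearn.metrics.pairwise; the histogram intersection kernel is written out (it has no free parameters, which is part of why it was preferred). The toy histograms are stand-ins for the real action features.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def histogram_intersection_kernel(A, B):
    """K(a, b) = sum_d min(a_d, b_d) for row-wise histograms A (n, d) and B (m, d)."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

# Toy L1-normalized BoW histograms standing in for the real features.
X_train = np.abs(np.random.rand(30, 1400))
X_train /= X_train.sum(axis=1, keepdims=True)
y_train = np.random.randint(0, 2, size=30)

svm_hik = SVC(kernel="precomputed").fit(histogram_intersection_kernel(X_train, X_train), y_train)
svm_chi2 = SVC(kernel="precomputed").fit(chi2_kernel(X_train, gamma=0.5), y_train)
```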
  47. Overall results: average precision (figure)
  48. Per-class AP (figure)
  49. Per-technique median average precision (figure)
  50. Qualitative results
     When the horsy box and the detectors fail, context dominates; the classifier is still surprisingly robust.
  51. Qualitative results
     Some fine discriminations are very difficult to make, probably difficult even for humans.
  52. Qualitative results
     People taking photos should be banned. Classes with large pose variations were the most difficult.
  53. Discussion: semantic image segmentation
     The harmony potential works well for fusing global information into local segmentations, and this year we showed that the framework is also appropriate for incorporating different types of mid-level cues. Ranked sub-sampling, driven by the same posterior used to define the global potential function, renders the optimization problem tractable. The approach is most useful when multiple semantic classes co-occur frequently. Per-class learning of the parameters was essential (about +5% in the final results).
  54. Discussion: action recognition
     This year's taster challenge on action recognition was little more than a toy. However, we have demonstrated what is possible using proven techniques from image classification. We feel that object context, and in particular object-interaction context, is the way forward. The PASCAL dataset is the right direction to go (more general), but we need more samples.
  55. The future: segmentation
     Semantic image segmentation has come a long way, but it still has a long way to go, and it is becoming a mainstream event in PASCAL. This year ended in a sort of three-way détente in segmentation between CVC (winner 2010), BONN (winner 2009) and OXFORD (best paper award, ECCV 2010). Each has its own approach, and each approach has its advantages and disadvantages; engineering can probably maximize results. The problem is becoming mature, and we can begin thinking about what new applications are enabled by such technologies.
  56. The future: action recognition
     It seems that action recognition in still images is a popular challenge, and the PASCAL organizers are keen to promote it in the future. The concentration will remain on still images, but perhaps with more attention to incorporating user interaction as well. The community seems to be becoming more interested in the "alternative" PASCAL challenges, and the multimedia community probably has an important role to play here.
