Object Discovery using CNN Features in Egocentric Videos
M. Bolaños, M. Garolera and P. Radeva
SenseCam and Narrative wearable cameras
Lifelogging and Egocentric Vision
Present a new field of research with high potential:
• Automatic nutrition diary.
• Memory aid for patients with mild cognitive impairment.
Introduction Scheme Methodology Validation Conclusions
Lifelogging cameras passively acquire pictures 24-7-365, producing data about
the user's typical activities and habits.
Focus of the Work
Hypothesis:
Individuals’ environment is “constructed” by sets of objects that characterize it.
Object Discovery ≠ Object Detection?
Goal:
To develop automatic techniques for object discovery to characterize the environment
of the person wearing the camera.
Baseline Object Discovery Proposal
Lee & Grauman’s “Learning the Easy Things First: Self-Paced Visual Category
Discovery”
They propose to use:
a) As features for object candidates:
• LAB histograms: colour information.
• Pyramid HOG: shape information.
• Spatial Pyramid Matching: texture information (visual words technique based on SIFT).
b) Iterative, clustering-based, easy-first discovery.
Our Proposal
Iteratively discover, in a semi-supervised way, the most relevant classes from
egocentric videos for a particular user, based on the following methodology:
• How can we sample each image? Objectness object detector.
• How can we represent real-world objects? CNN features.
• How can we reuse knowledge? Refill strategy.
• How can we discover new “concepts”? Clustering.
Ego-Object Discovery Scheme (I)
Ego-Object Discovery Scheme (II)
Ego-Object Discovery Scheme (III)
Objectness detector
Object detection method that extracts a set of object candidates as the initial
step of the object discovery.
Maximizes 3 complementary terms:
• Multi-scale Saliency
• Color Contrast
• Superpixels Straddling
CNN Features
We propose a set of features based on a Convolutional Neural Network
pre-trained on ImageNet objects.
After removing its last layer, the network can be used as a powerful feature
extractor for real-world objects (output of dimension 4096).
Refill Strategy
Refill strategy: reuses previously acquired knowledge and adds context from
already known objects, which yields more robust clusters.
Figure: Bag of Refill usage.
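The refill idea, reinserting previously labelled samples into the pool that gets clustered, might be sketched as follows. This is only an illustration under stated assumptions: the function name `build_clustering_pool`, the dictionary layout of the bag of refill, the `refill_per_class` parameter and the random sampling are all hypothetical and are not taken from the slides.

```python
import random


def build_clustering_pool(unlabeled, bag_of_refill, refill_per_class=2, seed=0):
    """Mix unlabeled candidates with a few already-labelled samples per class.

    unlabeled:      list of feature vectors without a label yet.
    bag_of_refill:  dict mapping a discovered class name to the feature
                    vectors labelled in earlier iterations (assumed layout).
    Returns a list of (features, label) pairs; label is None when unknown.
    """
    rng = random.Random(seed)
    pool = [(features, None) for features in unlabeled]
    for label, samples in bag_of_refill.items():
        # Assumption: a fixed number of samples per known class is refilled.
        k = min(refill_per_class, len(samples))
        pool.extend((features, label) for features in rng.sample(samples, k))
    return pool


if __name__ == "__main__":
    candidates = [[0.1, 0.3], [0.2, 0.1]]
    bag = {"hand": [[1.0, 1.1], [1.2, 0.9], [1.1, 1.0]], "tvmonitor": [[2.0, 2.1]]}
    for features, label in build_clustering_pool(candidates, bag):
        print(label, features)
```

Clusters that end up containing refilled samples carry a known label as context, which is one plausible way the strategy could make clusters more robust.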
Dataset
1,000 images from a person's work day.
50,000 object candidates were extracted.
To validate our method, we used the most frequent:
Tests Settings
An object candidate counts as a true detection if the Overlapping Score (OS)
between the detected region and the ground truth is ≥ 0.4.
76% of the object detector samples are FPs (“No Objects”).
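The overlap test can be illustrated with a short sketch. It assumes that the Overlapping Score is the usual intersection over union of axis-aligned boxes, which the slides do not state explicitly; the function names and the `(x_min, y_min, x_max, y_max)` box format are hypothetical.

```python
def overlap_score(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate, ground_truth, threshold=0.4):
    """Return the class of the best-overlapping ground-truth box.

    ground_truth is a list of (label, box) pairs. A candidate whose best
    overlap stays below the threshold is treated as a false positive.
    """
    best_label, best_os = "No Object", 0.0
    for label, gt_box in ground_truth:
        score = overlap_score(candidate, gt_box)
        if score >= threshold and score > best_os:
            best_label, best_os = label, score
    return best_label


if __name__ == "__main__":
    truth = [("hand", (0, 0, 10, 10))]
    print(label_candidate((2, 0, 12, 10), truth))
    print(label_candidate((5, 0, 15, 10), truth))
```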
Tests Comparison
Final F-Measure, Purity and Accuracy for each setting.
F-Measure evolution for each setting.
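For readers unfamiliar with these clustering measures, here is a minimal sketch of purity and F-measure. The slides do not say which variants were used, so the definitions below (majority-label purity and the class-weighted best-match F-measure) are an assumption, and the function names are hypothetical.

```python
from collections import Counter


def purity(clusters):
    """Fraction of samples that carry their cluster's majority label.

    clusters is a list of clusters, each a list of ground-truth labels.
    """
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    majority = sum(Counter(c).most_common(1)[0][1] for c in clusters if c)
    return majority / total


def f_measure(clusters):
    """Class-size-weighted average of each class's best F1 over all clusters."""
    total = sum(len(c) for c in clusters)
    if total == 0:
        return 0.0
    class_sizes = Counter(label for c in clusters for label in c)
    score = 0.0
    for label, n_class in class_sizes.items():
        best = 0.0
        for c in clusters:
            hits = c.count(label)
            if hits == 0:
                continue
            precision = hits / len(c)
            recall = hits / n_class
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_class / total) * best
    return score


if __name__ == "__main__":
    example = [["hand", "hand", "tvmonitor"], ["tvmonitor", "tvmonitor"]]
    print(purity(example), f_measure(example))
```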
Real Clusters Obtained (simplified)
Figure: clusters and hard samples for the tvmonitor and hand classes.
Extended Recent Work (I)
Egocentric Dataset of the University of Barcelona (EDUB)
• 4,912 images.
• Acquired by 4 people on 8 different days.
• 11,294 object labels (21 different classes).
Also tested on public datasets of intentional images: MSRC and PASCAL 12.
Extended Recent Work (II)
Added SVM Filter strategy for removing FPs produced by the object detectors.
Figure panels: Setting 1, Setting 3, Setting 5.
Extended Recent Work (III)
Conclusions
Object discovery algorithm that outperforms the state of the art and relies on:
• Refill strategy for knowledge reuse.
• Feature extraction using a pre-trained CNN.
• SVM filtering of FP candidates.
• Egocentric lifelogging dataset with object labels (EDUB).
• Comparison on three datasets: EDUB, PASCAL and MSRC.
Future lines:
1. Discovering objects, scenes and people to characterize the user’s environment,
also adding information provided by event segmentation techniques.
2. Algorithm code and dataset to be released soon.


Editor's Notes

  • #4 These are some examples of lifelogging sets acquired with a wearable camera. Some of them capture the objects and environment surrounding the user quite well, but on many occasions, because the pictures are not acquired intentionally, they contain no useful information, and many appear blurry. Analysing these pictures, we considered that the environment of any individual is composed of, and can be described by, the objects, people and scenes that appear in their daily life images. Based on this, our goal is to discover the set of objects that characterize this environment using an Object Discovery algorithm. Note that Object Discovery ≠ Object Detection: we do not use a supervised technique to detect a specific object, but gradually discover new concepts that describe the environment.
  • #6 A good object discovery proposal was given by Lee & Grauman. They propose a set of colour, shape and texture features for describing each object candidate, after first applying an object detection method. They then apply an iterative semi-supervised object discovery method that starts with the easiest objects.
  • #7 In our proposal we want to keep the good aspects of other methods, but at the same time define a new methodology useful for solving the object discovery problem in non-intentional images. Therefore, our newly defined methodology has to tackle some very precise problems. Sampling? Objectness. Rich representation? CNN features. Knowledge reuse? Refill. Remove FPs? SVM filtering. Discover new concepts? Clustering. We additionally make public the first egocentric lifelogging dataset (EDUB).
  • #11 For extracting the set of general object candidates (without an initial specific label), we use Ferrari’s Objectness. Their proposal combines three measures to select the best object candidates, each with an associated objectness score.
  • #12 For obtaining a rich feature representation for our objects, we used as a feature extractor a CNN pre-trained on general object images, provided by Hinton et al.
  • #13 In order to have more robust clusters and reuse the acquired knowledge, we developed the refill strategy, which inserts previously labeled samples during the clustering step. Then, to remove a large part of the FPs caused by the object detector, we trained an Object vs NoObject SVM.
  • #21 In this work we have proposed a new object discovery methodology that uses: CNN features for a rich representation. SVM filtering for removing FP candidates. Refill strategy for applying a knowledge reuse. Moreover: First public … …
  • #23 Our clustering step is based on 3 parts: (1) agglomerative Ward clustering, which introduces a minimum-variance criterion; (2) the Silhouette coefficient, for selecting the best cluster for the user to label; (3) a One-Class SVM trained on the newly labeled samples, for expanding the label to harder instances of the same concept.
  • #24 As future work we plan to: define an algorithm for discovering objects, scenes and people that takes advantage of event segmentation techniques for a better characterisation of the user; perform iterative, complementary scene and object discovery; and make the method user-discriminative, so that it detects only the objects relevant to the user.
  • #25 Quick overview!
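
Note #23 mentions using the Silhouette coefficient to pick the cluster shown to the user for labeling. A minimal pure-Python sketch of that selection step follows. It assumes Euclidean distance and a mean silhouette per cluster; the function names are hypothetical, and the Ward clustering and One-Class SVM parts are not reproduced here.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_per_cluster(points, labels):
    """Mean silhouette coefficient of each cluster (needs at least 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = {}
    for label, members in clusters.items():
        values = []
        for i, p in enumerate(members):
            if len(members) == 1:
                values.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(euclidean(p, q) for j, q in enumerate(members) if j != i)
            a /= len(members) - 1
            # b: mean distance to the nearest other cluster
            b = min(
                sum(euclidean(p, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
        scores[label] = sum(values) / len(values)
    return scores


def best_cluster(points, labels):
    """Label of the cluster with the highest mean silhouette."""
    scores = silhouette_per_cluster(points, labels)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 14), (12, 10)]
    print(best_cluster(pts, [0, 0, 1, 1, 1]))
```

A compact, well-separated cluster scores close to 1, which makes it a natural "easy" candidate to present to the user first.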