Visual Object Category Recognition
Ashish Gupta
Centre for Vision, Speech, and Signal Processing
Contents
• Introduction
• Related work
• Overview: Object recognition system
• Object classification & detection
• Conclusions
• Future work
Introduction
Research Topic: Visual object category recognition using weakly supervised learning.
DIPLECS: Artificial cognitive system for autonomous systems.
• Interested in object interactions determined by their functional properties.
• All objects in the same category have the same functional properties.
• Recognition is based on the object’s visual properties.
Introduction
Research Topic: Visual object category recognition using weakly supervised learning.
• A very large training set is required to learn the large appearance variation within a category.
• So we utilize huge image datasets such as Flickr® and Google™ Image.
• These images are corrupt and incompletely labelled.
• Therefore, weakly supervised learning, which can handle corrupt and noisy training data, is utilized.
Object model: bag-of-visual-words
Creating a visual codebook
The occurrence frequency of visual words is characteristic of the object.
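The following is a minimal Python sketch of codebook creation. It assumes that local descriptors have already been extracted from each image and that the visual words are k-means cluster centres. The slides do not specify the clustering algorithm, its initialisation or the distance, so `build_codebook`, `bow_histogram` and the first-k initialisation are illustrative assumptions.

```python
from typing import List, Sequence

Descriptor = Sequence[float]

def sq_dist(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(d: Descriptor, codebook: List[List[float]]) -> int:
    """Index of the visual word (cluster centre) closest to descriptor d."""
    return min(range(len(codebook)), key=lambda j: sq_dist(d, codebook[j]))

def build_codebook(descriptors: List[Descriptor], k: int, iters: int = 20) -> List[List[float]]:
    """Plain k-means: the k centres become the visual words.
    Initialising with the first k descriptors is a simplifying assumption."""
    centres = [list(d) for d in descriptors[:k]]
    for _ in range(iters):
        clusters: List[List[Descriptor]] = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centres)].append(d)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(m[i] for m in members) / len(members)
                              for i in range(len(members[0]))]
    return centres

def bow_histogram(descriptors: List[Descriptor], codebook: List[List[float]]) -> List[float]:
    """Normalized occurrence frequency of each visual word in one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]
```

Each image then becomes a normalized histogram of visual-word frequencies. In the sketch, the codebook size is simply the parameter `k`.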
Object model: bag-of-visual-words
A test image can be classified based on the distance of its normalized codebook histogram from the codebook histograms of the positive and negative training samples.
Figure panels: codebook of positive samples; codebook of negative samples; codebook of test image.
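Below is a sketch of the distance-based decision. It assumes Euclidean distance and a 1-nearest-neighbour rule over the normalized histograms. The slides do not say which distance or decision rule was used, and `classify_by_distance` is a hypothetical name.

```python
from typing import List, Sequence

Histogram = Sequence[float]

def euclidean(a: Histogram, b: Histogram) -> float:
    """Euclidean distance between two normalized visual-word histograms."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def classify_by_distance(test: Histogram,
                         positives: List[Histogram],
                         negatives: List[Histogram]) -> bool:
    """Return True (object present) if the nearest training histogram is positive."""
    d_pos = min(euclidean(test, h) for h in positives)
    d_neg = min(euclidean(test, h) for h in negatives)
    return d_pos < d_neg

# Usage: a test image dominated by visual word 0 lies closest to the positive sample.
print(classify_by_distance([0.7, 0.3], [[0.8, 0.2]], [[0.2, 0.8]]))  # True
```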
Object model: bag-of-visual-words
Visual codebooks for positive and negative samples of the ‘car’ category in PASCAL VOC 2006
Object model: bag-of-visual-words
Visual codebooks for the ‘car’ and ‘cow’ categories in the PASCAL VOC 2009 dataset
Classification
ROC (Receiver Operating Characteristic) curves are used to evaluate classification performance.
ROC for the ‘car’ category in PASCAL VOC 2006
The linear kernel K(x, y) = xᵀy was used since it is fast.
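The sketch below assumes a linear classifier that scores a histogram x as K(w, x) = wᵀx for some learned weight vector w. The training step, for example a linear SVM, is not shown, and the function names are illustrative. The code computes ROC points by lowering a threshold through the scores, then finds the area under the curve with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple

def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """K(x, y) = x^T y."""
    return sum(a * b for a, b in zip(x, y))

def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(false-positive rate, true-positive rate) pairs as the decision threshold
    is lowered through the scores; labels are 1 (positive) or 0 (negative)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # tied scores share one threshold, so emit a point only after the last of them
        if rank == len(order) - 1 or scores[order[rank + 1]] != scores[i]:
            points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Usage with hypothetical weights and four test histograms.
w = [1.0, 0.0]
X = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
labels = [1, 0, 1, 0]
print(auc(roc_points([linear_kernel(w, x) for x in X], labels)))  # 0.75
```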
Improve Classification
Larger visual codebook:
• More representative of the category
• Higher computational cost
ROC of the ‘car’ category in PASCAL VOC 2006 for codebook sizes from 20 to 20000 visual words.
Improve Classification
Training and test images in the dataset scaled down by the same factor.
Training and test images scaled down by different factors.
Improve Classification
Scale-down factor | Training samples: Dataset 1 | Training samples: Dataset 2
/1 | Y | N
/2 | Y | Y
Y/N: whether the test image is classified correctly.
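The sketch below shows the preprocessing this table suggests, where one scale-down factor is applied to training and test images alike. Averaging 2×2 blocks for the factor of 2 is an assumption, because the slides do not state the resampling method. `downscale_by_2` and `prepare` are hypothetical helpers that work on grey-level images stored as nested lists.

```python
from typing import List

Image = List[List[float]]  # grey-level image as rows of pixel values

def downscale_by_2(img: Image) -> Image:
    """Halve width and height by averaging each 2x2 block (an odd last row/column is dropped)."""
    h, w = len(img) // 2 * 2, len(img[0]) // 2 * 2
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]

def prepare(images: List[Image], factor: int) -> List[Image]:
    """Apply the same scale-down factor (1 or 2) to every image, training and test alike."""
    if factor == 1:
        return images
    if factor == 2:
        return [downscale_by_2(im) for im in images]
    raise ValueError("only factors 1 and 2 are sketched here")
```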
Improve Classification
ROC for 20 visual categories in PASCAL VOC 2009
The PASCAL VOC 2009 dataset is larger and more challenging than the 2006 dataset.
Improve Classification
ROC for PASCAL VOC 2009 with training and test images scaled down by a factor of 2
ROC for PASCAL VOC 2009 using a universal visual vocabulary
Object localization using sliding window
The poor localization results are due to:
• Lack of structural information in the bag-of-words object model
• The classifier learning the object background
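Here is a minimal sliding-window sketch. It assumes that each key-point has already been assigned a visual-word index, and that the image classifier is any function scoring a normalized histogram. A fixed window size and step are used, and all names are hypothetical. Within each window, the bag-of-words histogram discards where the key-points lie, which is the missing structural information named above.

```python
from typing import Callable, List, Optional, Tuple

Keypoint = Tuple[float, float, int]  # (x, y, visual-word index)
Box = Tuple[int, int, int, int]      # (x0, y0, x1, y1), x1 and y1 exclusive

def window_histogram(kps: List[Keypoint], box: Box, n_words: int) -> List[float]:
    """Normalized bag-of-words histogram of the key-points inside one window."""
    x0, y0, x1, y1 = box
    counts = [0] * n_words
    for x, y, word in kps:
        if x0 <= x < x1 and y0 <= y < y1:
            counts[word] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_words

def sliding_window(kps: List[Keypoint], img_w: int, img_h: int,
                   win_w: int, win_h: int, step: int, n_words: int,
                   score: Callable[[List[float]], float]) -> Tuple[Optional[Box], float]:
    """Score every window position with the image classifier; return the best one."""
    best_box: Optional[Box] = None
    best_score = float("-inf")
    for y0 in range(0, img_h - win_h + 1, step):
        for x0 in range(0, img_w - win_w + 1, step):
            box = (x0, y0, x0 + win_w, y0 + win_h)
            s = score(window_histogram(kps, box, n_words))
            if s > best_score:
                best_box, best_score = box, s
    return best_box, best_score
```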
Visual codebook
Training images with bounding boxes → good codebook with an equal population of positive and negative visual words.
Training images without bounding boxes → either the positive background differs from the negative images, or the positive background is similar to the negative images.
With no bounding box utilized, the codebook consists of a majority of negative (background) visual words.
Visual codebook
Training images without bounding boxes, where the positive background differs from the negative images: classification is based on object context (background) rather than object features.
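The sketch below shows why bounding boxes change the codebook population. It assumes key-points carry descriptors and that positive images come with an annotated box. The boxes are always used to measure the fraction of object descriptors in the clustering pool, but they are only used to select descriptors when `use_boxes` is True. The balancing step keeps the first background descriptors rather than a random subsample, purely for simplicity. `codebook_pool` and `in_box` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float, Sequence[float]]  # (x, y, descriptor)
Box = Tuple[float, float, float, float]          # (x0, y0, x1, y1)

def in_box(x: float, y: float, box: Box) -> bool:
    x0, y0, x1, y1 = box
    return x0 <= x < x1 and y0 <= y < y1

def codebook_pool(positive_images: List[Tuple[List[Keypoint], Box]],
                  negative_images: List[List[Keypoint]],
                  use_boxes: bool):
    """Return the descriptors to cluster and the fraction of them lying on the object."""
    obj, bg = [], []
    for kps, box in positive_images:
        for x, y, desc in kps:
            (obj if in_box(x, y, box) else bg).append(desc)
    for kps in negative_images:
        bg.extend(desc for _, _, desc in kps)
    if use_boxes:
        # keep as many background descriptors as object descriptors
        # (first ones taken for simplicity; a random subsample would be more usual)
        bg = bg[:len(obj)]
    pool = obj + bg
    return pool, (len(obj) / len(pool) if pool else 0.0)
```

With a small object in a cluttered positive image, the unboxed pool is dominated by background descriptors. The boxed pool is balanced.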
Improve Classification
The detection at each iteration estimates a bounding box, which yields a better visual codebook, which in turn leads to better detection.
Object detection
• Key-point configurations used as features are a discriminative object feature set.
• A configuration of visual words adds structural information to the bag-of-words model.
• Harvest frequent and discriminative configurations.
• Encode configurations as vectors called transaction vectors.
• The association between a transaction vector and the training type (positive or negative) is an association rule.
• The Apriori algorithm finds association rules with high confidence in a support-confidence framework.
Figure: transaction vector encoding a key-point configuration
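The following is a sketch of transaction encoding, together with the support and confidence of a rule. It assumes that a configuration is a key-point plus its k nearest neighbouring key-points, encoded as a set of visual-word items (multiplicity is ignored). An extra item carries the image label, "pos" or "neg". The neighbourhood definition and all names are assumptions, not the method from the slides.

```python
from math import dist
from typing import FrozenSet, List, Tuple

Keypoint = Tuple[float, float, int]  # (x, y, visual-word index)

def image_transactions(kps: List[Keypoint], k: int, label: str) -> List[FrozenSet[str]]:
    """One transaction per key-point: the visual words of the key-point and of its
    k nearest neighbouring key-points, plus an item for the image label."""
    transactions = []
    for i, (x, y, word) in enumerate(kps):
        neighbours = sorted((dist((x, y), (ox, oy)), ow)
                            for j, (ox, oy, ow) in enumerate(kps) if j != i)
        items = {f"w{word}"} | {f"w{ow}" for _, ow in neighbours[:k]}
        items.add(label)
        transactions.append(frozenset(items))
    return transactions

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the association rule antecedent -> consequent."""
    s = support(antecedent, transactions)
    return support(antecedent | consequent, transactions) / s if s else 0.0

T = (image_transactions([(0, 0, 1), (1, 0, 2), (10, 10, 3)], k=1, label="pos")
     + image_transactions([(0, 0, 1), (5, 5, 3)], k=1, label="neg"))
print(confidence(frozenset({"w1"}), frozenset({"pos"}), T))        # 0.5
print(confidence(frozenset({"w1", "w2"}), frozenset({"pos"}), T))  # 1.0
```

In this toy example, the longer configuration {w1, w2} is more discriminative than {w1} alone.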
Apriori algorithm
• Uses breadth-first search and a hash-tree structure to count candidate configurations.
• Longer configurations have lower support, since they occur less frequently, but higher confidence, since they are more discriminative.
• Downward closure lemma: every subset of a frequent configuration is itself frequent, so configurations with infrequent subsets are pruned.
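Below is a compact Python sketch of Apriori's level-wise (breadth-first) search, with the downward-closure pruning step. For brevity it counts support with a linear scan instead of a hash tree, and `apriori` is an illustrative name rather than a library function.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]

def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset (configuration) whose support is at least min_support."""
    n = len(transactions)

    def keep_frequent(candidates: Set[Itemset]) -> Dict[Itemset, float]:
        # support counting by a linear scan (the original algorithm uses a hash tree)
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    level = keep_frequent({frozenset([item]) for t in transactions for item in t})
    frequent = dict(level)
    size = 2
    while level:  # breadth-first: one itemset length per pass
        prev = set(level)
        # join: unions of two frequent (size-1)-itemsets that have exactly `size` items
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # prune (downward closure): every (size-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, size - 1))}
        level = keep_frequent(candidates)
        frequent.update(level)
        size += 1
    return frequent
```

The frequent itemsets returned here are the candidate configurations. Rules such as configuration → positive are then kept only if their confidence is high.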
Object localization
Training data set → generate transactions → Apriori data mining → association rules
Test image (from the test data set) → generate transactions → generate confidence for each transaction using the association rules → threshold confidence
• A confidence is assigned to every key-point in the image.
• Key-points with sufficiently high confidence are retained.
• Key-points that occur on common background objects, such as doors and windows, can also have high confidence.
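Here is a sketch of the thresholding step. It assumes that mined rules are given as (antecedent configuration, confidence of antecedent → object) pairs. It also assumes that a key-point takes the highest confidence among the rules matching its transaction, which is an assumption because other aggregations are possible. The names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple, TypeVar

K = TypeVar("K")
Rule = Tuple[FrozenSet[str], float]  # (antecedent configuration, confidence of antecedent -> object)

def keypoint_confidence(transaction: FrozenSet[str], rules: List[Rule]) -> float:
    """Highest confidence among rules whose antecedent occurs in this key-point's transaction."""
    return max((conf for antecedent, conf in rules if antecedent <= transaction), default=0.0)

def localize(keypoints: Sequence[K], transactions: Sequence[FrozenSet[str]],
             rules: List[Rule], threshold: float) -> List[K]:
    """Keep the key-points whose confidence reaches the threshold."""
    return [kp for kp, t in zip(keypoints, transactions)
            if keypoint_confidence(t, rules) >= threshold]
```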
Object classification using Apriori
Training data set → generate transactions → Apriori data mining → association rules
Test images (from the test data set) → generate transactions → generate confidence for each transaction → sum confidence
ROC for ‘car’ in PASCAL VOC 2006
The summed confidence score depends on the object's scale in the image (a larger object produces more key-points, and hence more terms in the sum), which explains the comparatively poor performance of this approach.
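The minimal sketch below shows why the summed score depends on scale. It uses the same assumed rule format and max-confidence rule matching as a per-key-point score, and the key-point counts are made up for illustration. The same object seen at a larger scale yields more matching key-points, and therefore a larger sum.

```python
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], float]  # (antecedent configuration, confidence)

def keypoint_confidence(transaction: FrozenSet[str], rules: List[Rule]) -> float:
    """Highest confidence among rules whose antecedent occurs in the transaction."""
    return max((conf for ante, conf in rules if ante <= transaction), default=0.0)

def image_score(transactions: List[FrozenSet[str]], rules: List[Rule]) -> float:
    """Summed confidence over all key-point transactions of one image."""
    return sum(keypoint_confidence(t, rules) for t in transactions)

rules = [(frozenset({"w1", "w2"}), 0.9)]
small = [frozenset({"w1", "w2"})] * 2  # object seen small: 2 matching key-points (hypothetical)
large = [frozenset({"w1", "w2"})] * 8  # same object seen larger: 8 matching key-points
print(image_score(small, rules), image_score(large, rules))
```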
Conclusions
• The ‘bag-of-words’ model is good for classification but poor for localization.
• Separating foreground from background gives better visual codebooks.
• The good classification on the PASCAL VOC 2006 dataset is attributed to recognition of object context rather than object features.
• The dataset utilized should have sufficient variation in the appearance of the object and its background.
• A larger visual vocabulary gives slightly better classification but is computationally more expensive.
• The visual vocabulary built has a majority of background visual words, since bounding boxes are not utilized during training.
Conclusions
• Improving the proportion of visual words representing the object in the vocabulary is vital for good classification.
• Incorporate the object boundary contour into the descriptor.
• Using frequent and discriminative key-point configurations is a promising approach for object localization.
• A low-quality dataset results in a weak visual codebook and classifiers biased to the training data.
• Classification using key-point configurations was poor compared to ‘bag-of-words’ for PASCAL VOC 2006.
Future Work
• Improve the visual codebook by increasing the proportion of visual words pertaining to object features. Combine Apriori-based localization and clustering for visual word selection in an iterative approach.
• Model visual scene information (use the GIST descriptor by Torralba). Learn co-occurrence statistics of a scene and a visual category. Recognition of the scene would serve as a prior for object presence and improve object recognition performance.
• Improve object localization by using context priming.
• Model object contextual information to aid foreground-background disambiguation for better object localization.
Future Work
• Share feature information between visual categories. The size of a universal visual vocabulary should increase sub-linearly with the number of visual categories.
• Combine image segmentation and classification to improve the object model and provide better classification performance.
• Build a hierarchical framework for visual categorization:
  • Representation: combine local and global features.
  • Model: combine semantic and structural object models.
  • Classification: combine generative and discriminative approaches.