
【arXiv】Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection


arXiv:1509.07627
http://arxiv.org/abs/1509.07627

In this paper, we evaluate convolutional neural network (CNN) features using the AlexNet architecture developed by [9] and the very deep convolutional network (VGGNet) architecture developed by [16]. To date, most CNN researchers have used features extracted from the final fully connected layers before the output. However, since the effectiveness of a feature representation is likely to depend on the problem, this study also evaluates the convolutional layers adjacent to the fully connected layers, along with simple tuning for feature concatenation (e.g., layer 3 + layer 5 + layer 7) and transformation, using tools such as principal component analysis. In our experiments, we carried out detection and classification tasks using the Caltech 101 and Daimler Pedestrian Benchmark datasets.



  1. Feature Evaluation of Deep Convolutional Neural Networks for Object Recognition and Detection. Hirokatsu Kataoka, Kenji Iwata, Yutaka Satoh, National Institute of Advanced Industrial Science and Technology (AIST). http://www.hirokatsukataoka.net/ arXiv preprint arXiv:1509.07627, http://arxiv.org/abs/1509.07627
  2. Feature Evaluation •  A significant task in computer vision –  Based on DeCAF [Donahue+, ICML2014], we evaluate several CNN features + an SVM classifier –  Representative architectures: AlexNet [Krizhevsky+, NIPS2012] & VGGNet [Simonyan+, ICLR2015] –  Basic idea 1: which layer of the CNN architecture has the better features? –  Basic idea 2: mid- & high-level CNN features should be concatenated! (e.g. Layer 3 + Layer 5 + Layer 7)
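The DeCAF-style pipeline on this slide, fixed CNN features fed to an SVM, can be sketched minimally as follows. This is an illustrative sketch only: random Gaussian blobs stand in for real CNN-layer activations, and a simple primal linear SVM (subgradient descent on the hinge loss) stands in for whatever SVM implementation the authors used.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Primal linear SVM via subgradient descent on the hinge loss.
    X: (n_samples, n_features); y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # samples violating the margin
        grad_w = lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy stand-ins for per-image CNN feature vectors (8-dim here).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (50, 8)),   # "pedestrian" class
               rng.normal(-2.0, 1.0, (50, 8))])  # "background" class
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

Swapping in activations extracted from different layers of AlexNet or VGGNet, while keeping the classifier fixed, is what lets the slides compare layers fairly.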
  3. CNN Architecture & Feature Extraction •  AlexNet & VGGNet –  AlexNet: 8-layer architecture –  VGGNet: 16-layer architecture (each pooling layer and the last 2 FC layers are used as feature vectors) [Diagram: layer-by-layer structure of AlexNet and VGGNet, showing image input, convolutional, max-pooling, fully connected, and softmax layers, labeled Layer 1 through Layer 7]
  4. Experiment •  Settings –  Layers: 3–7 (middle and deeper layers) •  Conv., pooling, and fully connected layers –  Concatenation and transformation •  Layers 345, 456, 567, 357 •  Principal component analysis (PCA): 1,500 dims –  Classifier •  Support vector machine (SVM) •  Parameters are based on DeCAF [Donahue+, ICML2014] •  Datasets –  Daimler pedestrian benchmark dataset (pedestrian detection) [Munder+, TPAMI2006] –  Caltech 101 dataset (object classification) [Fei-Fei+, CVPRW2004]
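The PCA transformation from the settings above can be sketched with a plain SVD of the centered feature matrix. The slides reduce each feature vector to 1,500 dimensions; the sketch below uses smaller sizes and random data in place of real layer activations.

```python
import numpy as np

def pca_reduce(X, k):
    """Center X (n_samples, n_features) and project onto its top-k
    principal axes. The right singular vectors of the centered data
    are the principal axes, ordered by decreasing variance."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 512))  # stand-in for one layer's features
reduced = pca_reduce(feats, 16)      # slides: k = 1,500 per layer
```

Note that k cannot exceed min(n_samples, n_features), so fitting PCA to 1,500 dimensions requires at least 1,500 training samples.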
  5. Results on the Daimler dataset •  Daimler pedestrian benchmark dataset –  VGGNet Layer 5 (original vector) gives the best rate (99.35%) –  In AlexNet, Layer 3 with PCA gives the best rate (98.71%) Mid-level layers tend to give better rates on the pedestrian detection data
  6. Results on the Caltech 101 dataset •  Caltech 101 dataset –  VGGNet Layer 5 (original vector) gives the best rate (91.80%) –  In AlexNet, Layer 5 with PCA gives the best rate (78.37%) The layer just before the FC layers performs well in object classification
  7. Feature Concatenation •  Three-layer concatenation with PCA –  Layers 345, 456, 567, 357 –  4,500 dimensions (1,500 dims per vector) –  Left: Daimler –  Right: Caltech 101 [Charts: accuracy per concatenation setting; left: Daimler, right: Caltech 101] VGGNet layers 567 give the most effective combination Pedestrian detection: mid-level features Object classification: high-level features
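The "layer 357" setting above amounts to stacking three per-layer feature vectors, each already PCA-reduced to 1,500 dims, into one 4,500-dim descriptor per sample. A minimal sketch, with random arrays standing in for the reduced features:

```python
import numpy as np

rng = np.random.default_rng(0)
layer3 = rng.normal(size=(10, 1500))  # mid-level conv features after PCA
layer5 = rng.normal(size=(10, 1500))  # pooling-layer features after PCA
layer7 = rng.normal(size=(10, 1500))  # high-level FC features after PCA

# Concatenate along the feature axis: 3 x 1,500 = 4,500 dims per sample.
feat_357 = np.concatenate([layer3, layer5, layer7], axis=1)
```

The combined vector then goes to the same SVM classifier as the single-layer features.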
  8. Conclusion •  Feature evaluation with AlexNet & VGGNet –  VGGNet performs better than AlexNet –  Mid-level features are good for pedestrian detection, and high-level features are good for the object classification task –  Concatenating VGGNet's 5th pooling layer and last 2 FC layers is the best setting on both the Daimler pedestrian benchmark and Caltech 101 datasets –  PCA is an effective transformation for CNN features
