9. nlab. Machine Perception Group
• Linking Visual Concept Detection with Viewer Demographics [A. Ulges et al., ICMR 2012]
➡ Detects concepts in video clips (semantic indexing), then estimates viewer demographics from the detected concepts
• Online video recommendation based on multimodal fusion and relevance feedback [Bo Yang et al., CIVR 2007]
➡ Calculates the similarity of video clips using features such as color, motion, and audio tempo
Kohei Yamamoto
Related works
Thursday, February 26, 2015
18.
•Modeling the shape of the scene: a holistic representation of the spatial envelope [A. Oliva et al., IJCV, 2001]
•Commonly used for describing the scene of an image
3 x 4 x 4 x 20 = 960 dim
Gist
[Figure: Pipeline of Gist descriptor*1; the factors 3, 4 x 4, and 20 are labeled RGB, regions, and dimensional responses]
*1: Evaluation of GIST descriptors for web-scale image search [Matthijs Douze et al., CIVR 2009]
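The dimension count on the Gist slide, 3 x 4 x 4 x 20 = 960, can be checked with a small sketch. It assumes the 20 filter responses per color channel have already been computed (random placeholder data stands in for real filter outputs) and only illustrates averaging each response map over a 4 x 4 grid of regions. The function name `gist_pool` and the input layout are hypothetical, not taken from the cited papers.

```python
import numpy as np

def gist_pool(responses, grid=4):
    """Average each response map over a grid x grid layout of regions.

    responses: array of shape (channels, filters, height, width).
    Returns a flat vector of length channels * filters * grid * grid.
    """
    c, f, h, w = responses.shape
    # Crop so that height and width divide evenly into the grid.
    h_crop, w_crop = (h // grid) * grid, (w // grid) * grid
    r = responses[:, :, :h_crop, :w_crop]
    # Split each map into grid x grid blocks and average inside each block.
    blocks = r.reshape(c, f, grid, h_crop // grid, grid, w_crop // grid)
    pooled = blocks.mean(axis=(3, 5))  # shape: (c, f, grid, grid)
    return pooled.reshape(-1)

# Placeholder input: 3 color channels, 20 response maps each (assumed shapes).
rng = np.random.default_rng(0)
fake_responses = rng.random((3, 20, 32, 32))
descriptor = gist_pool(fake_responses)
print(descriptor.shape)
```

With 3 channels, 20 response maps, and a 4 x 4 grid, the flattened vector has 960 entries, matching the figure on the slide.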
19.
•Improving the Fisher kernel for large-scale image classification [F. Perronnin et al., ECCV, 2010]
•State-of-the-art method for bag-of-visual-words-based global image representation
Fisher Vector
[Figure: two pipelines from local descriptors to image features. BoVW: SIFT descriptor → vector quantization → BoVW. Fisher Vector: SIFT descriptor → Gaussian Mixture Model (GMM) → Fisher Vector]
* Image representation of Fisher Vector: http://www.slideshare.net/takao-y/fisher-vector
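The Fisher Vector encoding named on this slide can be sketched as follows. This is a minimal illustration, assuming a diagonal-covariance GMM whose parameters are already given (random placeholders here, not a model fitted to SIFT descriptors). It uses the gradients with respect to the GMM means and standard deviations, followed by the power and L2 normalizations associated with the improved Fisher kernel. The function name `fisher_vector` and all shapes are hypothetical.

```python
import numpy as np

def fisher_vector(x, weights, means, sigmas):
    """Encode local descriptors x (T, D) with a diagonal GMM.

    weights: (K,), means: (K, D), sigmas: (K, D) standard deviations.
    Returns a vector of length 2 * K * D.
    """
    T, D = x.shape
    # Whitened differences between every descriptor and every component.
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (T, K, D)
    # Log of weight * Gaussian density for each (descriptor, component) pair.
    log_prob = (-0.5 * np.sum(diff ** 2, axis=2)
                - np.sum(np.log(sigmas), axis=1)
                - 0.5 * D * np.log(2 * np.pi)
                + np.log(weights))
    # Soft assignments (posteriors), computed in a numerically stable way.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)  # (T, K)
    # Gradients with respect to means and standard deviations.
    g_mu = np.einsum('tk,tkd->kd', gamma, diff) / (T * np.sqrt(weights)[:, None])
    g_sigma = (np.einsum('tk,tkd->kd', gamma, diff ** 2 - 1)
               / (T * np.sqrt(2 * weights)[:, None]))
    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalization
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv    # L2 normalization

# Placeholder data: 50 descriptors of dimension 8, GMM with 4 components.
rng = np.random.default_rng(0)
K, D, T = 4, 8, 50
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
x = rng.normal(size=(T, D))
fv = fisher_vector(x, weights, means, sigmas)
print(fv.shape)
```

Unlike a BoVW histogram, which has one bin per visual word, this encoding has 2 x K x D entries, so even a small GMM yields a high-dimensional image feature.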
21.
•Achieves surprisingly high performance in visual recognition
•Caffe*1 (dim = 4096)
•Pre-trained model*2
•Uses the response of the last layer as the image feature
*1 Caffe: software that provides a pre-trained CNN model of Krizhevsky's architecture [Yangqing Jia]
http://caffe.berkeleyvision.org/
*2 ImageNet classification with deep convolutional neural networks [A. Krizhevsky et al., NIPS 2012]
Convolutional Neural Networks (CNN)
[Figure: pre-trained model by the SuperVision group*2, which repeats convolution and pooling]
23.
•Train a classifier for each image feature
•Linear logistic regression
•Solver type = L2 regularization
•Late fusion
•The final score is defined as the average of the classifier scores (posterior probabilities)
•Search for the best combination of features
Learning
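The late-fusion step and the search over feature combinations on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: the per-feature posterior probabilities are made-up numbers, the search is exhaustive over all non-empty subsets, and accuracy at a 0.5 threshold is used as the selection criterion. The slide does not specify the search strategy or the criterion, so the function names and these choices are hypothetical.

```python
from itertools import combinations

def late_fusion(scores_per_feature):
    """Average the posterior probabilities of several classifiers, per sample."""
    n = len(scores_per_feature)
    return [sum(column) / n for column in zip(*scores_per_feature)]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def best_combination(scores, labels):
    """Try every non-empty subset of features; return (subset, accuracy)."""
    best = (None, -1.0)
    names = sorted(scores)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            fused = late_fusion([scores[name] for name in subset])
            acc = accuracy(fused, labels)
            if acc > best[1]:  # keep the first subset reaching the top accuracy
                best = (subset, acc)
    return best

# Made-up posterior probabilities from three per-feature classifiers.
scores = {
    "gist": [0.9, 0.6, 0.4, 0.2],
    "fisher": [0.6, 0.3, 0.7, 0.6],
    "cnn": [0.45, 0.3, 0.8, 0.1],
}
labels = [1, 0, 1, 0]
best = best_combination(scores, labels)
print(best)
```

Exhaustive search is feasible here because the number of image features is small; with many features the number of subsets grows exponentially and a greedy search would be a more practical choice.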
34.
• Proposed a method that estimates the viewer demographics of video clips using various image features
• Evaluated the proposed method experimentally and confirmed its effectiveness
• Fisher vector encoding of local descriptors is effective
• Local descriptors that take color information into account, such as RGB-SIFT, are effective
• Integrating various image features and combining key-frames of video clips improves classification accuracy
Conclusion
35.
• Consider a larger set of demographics (not only gender but
also region and age group)
• Compare our method with a method using high-level concepts (semantic indexing approach)
• Improve the CNN (Caffe) model by fine-tuning
• Use multimodal features such as audio and text features
Future work