RecSysTV2014

nlab. Machine Perception Group
http://www.nlab.ci.i.u-tokyo.ac.jp/
Content-based viewer estimation using image
features for recommendation of video clips
Kohei Yamamoto, Riku Togashi, Hideki Nakayama
Nakayama Lab. (Machine Perception Group)
Graduate School of Information Science and Technology
The University of Tokyo
Oct. 10, 2014
Thursday, February 26, 2015

Agenda
• Introduction
• Experiment
• Result
• Conclusion & Future work

Background
YouTube
Recommendation
• Popularity
• Rule-based filtering
• Collaborative filtering
• Content-based filtering
Monetize only 14%
YouTube Press Statistics: http://youtube.com/yt/press/statistics (March 12)

Background
YouTube
Recommendation (• Popularity ...) → AD / Content recommendation
Monetize only 14%. Why?
Cold-start problem (sparse information)
ex) No login
    Freshly uploaded
    Meaningless title, tag
→ Content-based estimation

Objective
Estimating viewer profiles from image features of video clips
Image features → (content-based estimate) → Viewer profile
Overcome cold-start problem

Related works
• Linking Visual Concept Detection with Viewer Demographics [A. Ulges et al., ICMR2012]
➡ detect concepts of video clips (semantic indexing), then estimate demographics from the concepts
• Online video recommendation based on multimodal fusion and relevance feedback [Bo Yang et al., CIVR2007]
➡ calculate the similarity of video clips using features such as colors, movements, and tempo of sound

Problem in previous works
Semantic indexing
• Lack of important information to estimate viewership
• High cost
label:[singer] = ?

Novelty
Existing method: video → label → demographic, through semantic indexing
(labels such as “Soccer”, “Skateboard”, “Basketball”, “Baseball”, ...)
Proposed method: video → demographic, directly
(Diagram label: Logistic Regression)

Experiment
Creating dataset → Extracting features → Learning
• Creating dataset: thumbnail and key-frame images from YouTube
• Extracting features: feature vectors, e.g. x = (0, 1, 0, 1.5, -1, ...), y = (1, 0.5, 2, 1.5, ...)
• Learning: classification / regression with logistic regression
$$p_i = \frac{\exp(\mathbf{w}^\top \mathbf{x}_i + b)}{1 + \exp(\mathbf{w}^\top \mathbf{x}_i + b)}$$
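The logistic-regression step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `posterior`, the weights `w`, the bias `b`, the toy feature vector `x`, and the choice of "Male" as the positive class are all assumptions made for the example.

```python
import math

def posterior(w, x, b):
    """Return exp(w.x + b) / (1 + exp(w.x + b)) for one feature vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Numerically stable evaluation of the same expression.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Toy values (assumptions for illustration only).
w = [0.5, -0.25, 1.0]
x = [0.0, 1.0, 0.5]
b = 0.1

p = posterior(w, x, b)
# Assumed convention: p is the probability of the "Male" class.
label = "Male" if p >= 0.5 else "Female"
print(round(p, 3), label)
```

In a real run, `w` and `b` would be learned from the training clips rather than set by hand.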

Dataset
YouTube Trends Map: http://www.youtube.com/trendsmap (scraping)
• Area - Japan
• Age - All
• Sex - Male/Female

Split  Class   Video clips  Thumbnail images  Keyframe images
Train  Male    5000         5000              20000
Train  Female  5000         5000              20000
Test   Male    500          500               2000
Test   Female  500          500               2000

Thumbnail images + Keyframe images: fusion

Image features
・Gist
・Fisher vector
  ・SIFT
  ・C-SIFT
  ・Opponent-SIFT
  ・RGB-SIFT
・CNN (Caffe*1)
*1 Caffe: software which provides the pre-trained CNN model of Krizhevsky’s architecture [Yangqing Jia]
http://caffe.berkeleyvision.org/

Gist
• Modeling the shape of the scene: a holistic representation of the spatial envelope [A. Oliva et al., IJCV, 2001]
• Commonly used for describing the scene of an image
3 (RGB) x 4 x 4 (regions) x 20 (dimensional responses) = 960 dim
Figure: Pipeline of Gist descriptor*1
*1: Evaluation of GIST descriptors for web-scale image search [Matthijs Douze et al., CIVR 2009]
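To make the 960-dimension count concrete, here is a hedged sketch of the pooling step only. It assumes 20 filter responses per colour channel, each averaged over a 4 x 4 grid and then concatenated. The real filter bank is not reproduced: random numbers stand in for the filter outputs, and the names `grid_pool` and `gist_like` are hypothetical.

```python
import random

def grid_pool(response, grid=4):
    """Average a 2-D response map (list of rows) over a grid x grid layout."""
    h, w = len(response), len(response[0])
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            rows = range(gy * h // grid, (gy + 1) * h // grid)
            cols = range(gx * w // grid, (gx + 1) * w // grid)
            cell = [response[r][c] for r in rows for c in cols]
            pooled.append(sum(cell) / len(cell))
    return pooled

def gist_like(responses_per_channel, grid=4):
    """Concatenate pooled responses over all channels and filters."""
    feature = []
    for channel in responses_per_channel:   # 3 colour channels
        for response in channel:            # 20 response maps per channel
            feature.extend(grid_pool(response, grid))
    return feature

random.seed(0)
# Stand-in for real filter outputs: 3 channels x 20 maps of 32 x 32 values.
fake = [[[[random.random() for _ in range(32)] for _ in range(32)]
         for _ in range(20)] for _ in range(3)]
feature = gist_like(fake)
print(len(feature))  # 3 * 4 * 4 * 20 = 960
```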

Fisher Vector
• Improving the Fisher kernel for large-scale image classification [F. Perronnin et al., ECCV, 2010]
• State-of-the-art method of bag-of-visual-words based global image representation
Figure*: local descriptors (SIFT descriptors) are encoded into image features; BoVW uses vector quantization, Fisher Vector uses a Gaussian Mixture Model (GMM)
* Image representation of Fisher Vector: http://www.slideshare.net/takao-y/fisher-vector

Fisher Vector
Implementation:
Local descriptor → PCA (dim = 64) → Fisher kernel (64 Gaussians, x4 regions) → Fisher Vector (dim = 32768) → Image feature
Local descriptors:
・SIFT
・C-SIFT
・Opponent-SIFT
・RGB-SIFT
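The stated dimension of 32768 is consistent with a Fisher Vector that keeps gradients with respect to both the GMM means and variances, i.e. 2 x (number of Gaussians) x (descriptor dimension) per region. That formulation is an assumption about the setup here, and `fisher_vector_dim` is a hypothetical helper used only to check the arithmetic.

```python
def fisher_vector_dim(descriptor_dim, n_gaussians, n_regions=1):
    """Dimension of a Fisher Vector with mean and variance gradients."""
    per_region = 2 * n_gaussians * descriptor_dim
    return per_region * n_regions

# Values from the slide: PCA to 64 dims, 64 Gaussians, 4 regions.
dim = fisher_vector_dim(descriptor_dim=64, n_gaussians=64, n_regions=4)
print(dim)  # 32768
```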

Convolutional Neural Networks (CNN)
• Achieves surprisingly high performance in visual recognition
• Caffe*1 (dim = 4096)
• Pre-trained model*2
• Uses the response of the last layer as image features
Figure: pre-trained model by the SuperVision group*2 (repeated convolution & pooling)
*1 Caffe: software which provides the pre-trained CNN model of Krizhevsky’s architecture [Yangqing Jia]
http://caffe.berkeleyvision.org/
*2 ImageNet classification with deep convolutional neural networks [A. Krizhevsky et al., NIPS2012]

Experiment
Creating dataset → Extracting features → Learning
• Creating dataset: key-frame images, audio, ... from YouTube
• Extracting features: feature vectors, e.g. x = (0, 1, 0, 1.5, -1, ...), y = (1, 0.5, 2, 1.5, ...)
• Learning: classification / regression with logistic regression
$$p_i = \frac{\exp(\mathbf{w}^\top \mathbf{x}_i + b)}{1 + \exp(\mathbf{w}^\top \mathbf{x}_i + b)}$$

Learning
• Train classifiers for each image feature
  • Linear logistic regression
  • Solver type = L2 regularization
• Late fusion
  • Final score is defined as the average of each classifier score (posterior probability)
  • Search the best combination of features
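The late-fusion rule and the search over feature combinations can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the per-feature posteriors are made-up numbers, the 0.5 decision threshold and the use of accuracy as the selection criterion are assumptions, and the function names are hypothetical. In practice the search would use held-out data rather than the test labels.

```python
from itertools import combinations

def late_fusion(scores_by_feature, names):
    """Average the posterior probabilities of the chosen classifiers."""
    n = len(scores_by_feature[names[0]])
    return [sum(scores_by_feature[f][i] for f in names) / len(names)
            for i in range(n)]

def accuracy(scores, labels, threshold=0.5):
    """Fraction of clips whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def best_combination(scores_by_feature, labels):
    """Exhaustively try every non-empty subset of features."""
    best = (0.0, ())
    features = sorted(scores_by_feature)
    for k in range(1, len(features) + 1):
        for names in combinations(features, k):
            acc = accuracy(late_fusion(scores_by_feature, names), labels)
            if acc > best[0]:
                best = (acc, names)
    return best

# Made-up posteriors for four clips (assumed coding: 1 = male, 0 = female).
scores = {
    "gist":     [0.9, 0.4, 0.4, 0.2],
    "cnn":      [0.4, 0.7, 0.55, 0.1],
    "rgb_sift": [0.8, 0.7, 0.3, 0.6],
}
labels = [1, 1, 0, 0]
best_acc, best_names = best_combination(scores, labels)
print(best_acc, best_names)
```

In this toy data no single feature classifies all four clips correctly, while an averaged pair does, which is the effect late fusion aims for.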

Result (Classification accuracy)
Figure: precision, recall, and F1 for Gist/CNN (Caffe), Fisher Vector (RGB-SIFT), and Thumbnail+Keyframe
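For reference, the three metrics named on this slide can be computed from true and predicted labels as in the sketch below. The label vectors are made-up values and `precision_recall_f1` is a hypothetical helper; it does not reproduce the reported numbers.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up labels for six clips (assumed coding: 1 = male, 0 = female).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```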

Gist: 65%

Fisher Vector (RGB-SIFT): 70%

Late fusion of Fisher Vector scores: 72% (the chart also labels 73%)

Integrating key-frame scores: 74% (the chart also labels 73%)

Subjective evaluation: 79%

74% (integrating key-frame scores) vs. 79% (subjective evaluation)

Top 15 samples of high probability (Fisher Vector (RGB-SIFT))

Conclusion
• Proposed a method that estimates the viewer demographics of video clips using various image features
• Performed an evaluation experiment of the proposed method and confirmed its effectiveness
• Fisher vector encoding of local descriptors is effective
• Local descriptors that take color information into account, such as RGB-SIFT, are effective
• Integrating various image features of video clips and combining them with key-frames improves classification accuracy

Future work
• Consider a larger set of demographics (not only gender but also region and age group)
• Compare our method with the method using high-level concepts (semantic indexing approach)
• Improve the CNN (Caffe) model by fine-tuning
• Use multimodal features such as audio features and text features

Thank you for listening.
Questions?
