1. Genre Tagging Task: Prediction using Bag-of-(visual)-Words Approaches
Schmiedeke, Kelm and Sikora
Communication Systems Group
Technische Universität Berlin
Tuesday, 9 October 2012
2. Motivation
Schmiedeke: “Prediction using Bag-of-(visual)-Words Approaches”
3. Experimental Setup
[Pipeline diagram]
Textual branch: ASR transcripts and metadata (title, description, comments, tags) are translated into English and mapped to a bag-of-words representation, then classified with a Support Vector Machine (multiclass, one-vs.-one; linear or RBF kernel) or with Naive Bayes without a-priori knowledge to predict genre labels.
Visual branch: grey SURF and colour SURF features are extracted from key frames, combined by temporal pooling, and classified by nearest-neighbour classification based on the Jensen-Shannon divergence.
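The visual branch classifies pooled SURF histograms with a nearest-neighbour rule under the Jensen-Shannon divergence. A minimal pure-Python sketch of that rule (the toy histograms and genre labels are illustrative, not from the paper):

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence (natural log); 0 * log 0 := 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence between two normalized histograms."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def nearest_neighbour(query, labelled):
    """1-NN genre prediction under JS divergence, as in the visual branch."""
    return min(labelled, key=lambda item: js_divergence(query, item[0]))[1]

# Toy labelled term vectors (codeword histograms) for two genres
train = [([0.7, 0.2, 0.1], "sports"),
         ([0.1, 0.1, 0.8], "cooking")]
print(nearest_neighbour([0.6, 0.3, 0.1], train))  # prints "sports"
```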
4. Extracting textual features
Vocabulary is built on video documents
• Stemming
• Stop word removal
• (Translation into English)
Term vectors are generated for each video document
• (calculation of tf-idf)
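The textual feature extraction can be sketched with a small stdlib-only tf-idf (the documents and stop-word list are toy examples; the system's stemming step is mentioned only as a comment):

```python
import math
from collections import Counter

STOP_WORDS = {"and", "the", "a", "of"}  # toy stop-word list

def tokenize(doc):
    """Lowercase, split, and drop stop words (stemming omitted here)."""
    return [w for w in doc.lower().split() if w not in STOP_WORDS]

def tfidf_vectors(docs):
    """Build a vocabulary over all documents and return one tf-idf
    mapping (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # document frequency of each term
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["soccer match highlights and goals",
        "cooking pasta recipe tutorial",
        "soccer world cup goals"]
vecs = tfidf_vectors(docs)
# terms occurring in fewer documents get a higher idf weight
```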
5. Extracting visual features
SURF features are extracted from each key frame
• At keypoints and on a regular grid
Vocabulary is built using k-means on SURF features of the development set
• 2048 codewords
Term vector for a single video is obtained by bin-wise pooling of each key frame's term vector
• max, avg, median
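The bin-wise pooling step can be sketched as follows (toy 4-bin histograms stand in for the real 2048-codeword vectors):

```python
import statistics

def pool_histograms(frame_hists, method="avg"):
    """Bin-wise pooling of per-key-frame codeword histograms into a single
    video-level term vector (max / avg / median, as on the slide)."""
    pool = {"max": max,
            "avg": lambda col: sum(col) / len(col),
            "median": statistics.median}[method]
    # transpose: one column per codeword bin, one row per key frame
    return [pool(col) for col in zip(*frame_hists)]

# Toy histograms for 3 key frames over a 4-codeword vocabulary
hists = [[1, 0, 2, 0],
         [0, 1, 2, 1],
         [2, 1, 0, 0]]

print(pool_histograms(hists, "max"))  # prints [2, 1, 2, 1]
print(pool_histograms(hists, "avg"))
```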
6. Official runs
7. Additional runs (textual)
To answer some research questions (for this database):
• Is translation into English useful?
  (linear SVM, C=1, non-translated)
• What is the effect of classification methods?
  (non-translated metadata)
• Resources?
  (linear SVM, C=1, translated)
8. Additional runs (visual)
• Which pooling method works best?
  (linear SVM, C=1)
• Grey SURF vs. colour SURF
  (linear SVM, C=1, pooled by averaging)
• Local vs. global features
9. Backup
10. Backup
11. Additional runs (fusion)
• Direct fusion?
  (linear SVM, C=1)
12. MediaEval 2011: Genre Tagging
Question: What is a video's blip.tv category?
Blip.tv database (cc): ~350 h
• 247 training videos
• 1727 test videos
Official evaluation measure is Mean Average Precision (MAP)
Workshop will be held 1-2 September 2011 in Pisa, Italy (satellite of Interspeech)
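The MAP measure used for the official evaluation can be computed as sketched below (the video ids and ground-truth sets are toy examples):

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a relevant set."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at each relevant rank
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one per query/genre."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Toy example: two genres, ranked video ids vs. ground-truth sets
runs = [(["v1", "v3", "v2"], {"v1", "v2"}),
        (["v4", "v5"], {"v5"})]
print(mean_average_precision(runs))  # (5/6 + 1/2) / 2
```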