UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task

  1. UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task. Jurandy Almeida¹, Thiago Salles², Eder F. Martins², Otávio A. B. Penatti¹, Ricardo da S. Torres¹, Marcos A. Gonçalves², and Jussara M. Almeida². ¹RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), {jurandy.almeida, penatti, rtorres}@ic.unicamp.br. ²Dept. of Computer Science, Federal University of Minas Gerais (UFMG), {tsalles, ederfm, mgoncalv, jussara}@dcc.ufmg.br. MediaEval'12, Pisa, Italy, October 4-5, 2012.
  2. Genre Tagging Task at MediaEval 2012. Automatically assign tags to Internet videos collected from blip.tv. The focus is on tags related to the category of the video. Each video belongs to only one category.
  3. Available Resources. Table: resources made available for the task.
     Textual: Title, Tags, Descriptions (used); Tweets, ASR transcripts (not used).
     Visual: Videos (used); Shots and Keyframes (not used).
  4. The 26 Genre Tags: Default Category, Documentary, Technology, The Mainstream Media, Music and Entertainment, Business, Politics, Food & Drink, Educational, Health, Videoblogging, Conferences and Other Events, Religion, Literature, Movies and Television, The Environment, Sports, Personal or Auto-biographical, Art, School and Education, Comedy, Travel, Gaming, Web Development and Sites, Citizen Journalism, Autos and Vehicles.
  5. Distribution of Genres (Development Set). Figure: number of objects per genre in the development set (bar chart, y-axis from 0 to 800); the classes are highly imbalanced.
  6. Proposed Framework. It relies on two learning strategies: (1) a meta-learning scheme (stacked generalization) for combining classifiers trained on metadata modeled as a bag-of-words; (2) a histogram of motion patterns combined with a KNN classifier for visual information.
  7. Visual Approach (HMP). Keyframes: not used. We apply an algorithm for comparing video sequences: "Comparison of Video Sequences with Histograms of Motion Patterns", J. Almeida, N. J. Leite, and R. S. Torres, IEEE International Conference on Image Processing (ICIP), 2011. It relies on three steps: (1) partial decoding; (2) feature extraction; (3) signature generation.
  8. Visual Approach (HMP). Figure: overview of the HMP pipeline: (1) partial decoding recovers the DC coefficients of the macroblocks of the I-frames (previous, current, next), yielding a time series of pixel blocks; (2) feature extraction compares each block with its spatial and temporal neighbors, encoding the result as a binary motion pattern (e.g., 0101100110010011); (3) signature generation accumulates the patterns into a histogram. [Almeida et al., Comparison of Video Sequences with Histograms of Motion Patterns, ICIP 2011]
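To make the three steps concrete, here is a minimal Python sketch of steps 2 and 3, assuming step 1 (partial decoding) has already produced the DC coefficients as an array. The 16-bit spatio-temporal encoding below is one plausible simplification, not the exact scheme of the ICIP 2011 paper, and `hmp_signature`/`hmp_similarity` are illustrative names.

```python
import numpy as np

def hmp_signature(dc):
    """Simplified HMP signature (sketch). `dc`: DC coefficients of the
    macroblocks of the I-frames, shape (frames, rows, cols); partial
    decoding (step 1) is assumed to have produced it already."""
    f, r, c = dc.shape
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
               if (di, dj) != (0, 0)]          # 8 spatial neighbours
    hist = {}
    for t in range(1, f - 1):                  # step 2: feature extraction
        for i in range(1, r - 1):
            for j in range(1, c - 1):
                # Hypothetical 16-bit pattern: compare the block against
                # its 8 neighbours in the previous and in the next frame.
                bits = [dc[t + dt, i + di, j + dj] > dc[t, i, j]
                        for dt in (-1, 1) for di, dj in offsets]
                code = sum(int(b) << k for k, b in enumerate(bits))
                hist[code] = hist.get(code, 0) + 1   # step 3: accumulate
    total = sum(hist.values())
    return {code: n / total for code, n in hist.items()}  # L1-normalised

def hmp_similarity(h1, h2):
    """Histogram intersection, a natural similarity for such histograms."""
    return sum(min(h1[k], h2[k]) for k in set(h1) & set(h2))
```

A KNN classifier over videos (slide 10) can then use such a pairwise similarity between signatures to find a test video's nearest labeled neighbors.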
  9. Textual Approach (BoW). Title, tags, and description used as textual features. Only metadata in English. Porter stemming algorithm to remove affixes. Stop words removed. Bag-of-words using TF×IDF weights.
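A minimal sketch of this textual pipeline with NLTK and scikit-learn (neither tool is named on the slide; the `videos` metadata dicts and their field names are hypothetical):

```python
from nltk.stem import PorterStemmer
from nltk.corpus import stopwords   # requires nltk.download("stopwords")
from sklearn.feature_extraction.text import TfidfVectorizer

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

def preprocess(text):
    # keep alphabetic tokens, drop stop words, strip affixes (Porter)
    return " ".join(stemmer.stem(w) for w in text.lower().split()
                    if w.isalpha() and w not in stops)

videos = [{"title": "Cooking pasta", "tags": "food recipe",
           "description": "how to cook pasta at home"}]   # toy example

docs = [preprocess(v["title"] + " " + v["tags"] + " " + v["description"])
        for v in videos]
vectorizer = TfidfVectorizer()            # bag-of-words with TF×IDF weights
X_text = vectorizer.fit_transform(docs)   # sparse (n_videos, n_terms)
```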
  10. The Classifiers. KNN for visual features; stacked generalization for textual features.
     Level 0:
     ◮ K-Nearest Neighbors (KNN): assigns a class to a test example using its k nearest training examples.
     ◮ Naïve Bayes (NB): a probabilistic learning method that infers a model for each class and assigns to a test example the class whose model would most probably have generated it.
     ◮ Random Forests (RF): a variant of bagging of decision trees, in which an ensemble of de-correlated trees is learned.
     Level 1:
     ◮ Support Vector Machine (SVM): finds an optimal separating hyperplane between the positive and negative training documents, maximizing the distance (margin) to the closest points of either class.
  11. Stacked Generalization. Figure: the stacked generalization framework.
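A minimal scikit-learn sketch of this two-level scheme, reusing the TF×IDF matrix `X_text` from the previous sketch and assuming a hypothetical label vector `y` (one genre per video); the hyperparameters are illustrative, not the ones used in the submitted runs:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_predict

level0 = [KNeighborsClassifier(n_neighbors=5),
          MultinomialNB(),
          RandomForestClassifier(n_estimators=100)]

# Level 0: out-of-fold class probabilities become the meta-features, so
# the level-1 learner never sees predictions a base classifier made on
# its own training folds (the key idea of stacked generalization).
meta_features = np.hstack([
    cross_val_predict(clf, X_text, y, cv=5, method="predict_proba")
    for clf in level0])

# Level 1: the SVM learns how to weigh the level-0 opinions.
level1 = LinearSVC().fit(meta_features, y)

# At test time, the level-0 models are refit on all training data and
# their stacked probabilities for a test video are fed to the SVM.
for clf in level0:
    clf.fit(X_text, y)
```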
  12. Experimental Protocol. 5-fold cross-validation. Development set: 5,288 videos (∼33%); test set: 10,161 videos (∼67%). Vocabulary of 54,796 terms, with a mean of 31.03 terms per video. Total of 4,888 visual features, with a mean of 3,703.59 visual features per video.
  13. Experimental Results. Table: MAP for each submitted run.
     Run 1 (audio/visual only, no ASR): MAP 0.1238
     Run 4 (everything allowed, no uploader ID): MAP 0.2112
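For reference, a sketch of how MAP can be computed for this task (a plain reimplementation of the metric, not necessarily the official evaluation script): for each of the 26 tags, rank all test videos by the classifier's score for that tag and compute the average precision of that ranking; MAP is the mean over tags.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def mean_average_precision(scores, y_true, classes):
    """scores: (n_videos, n_tags) array of per-tag scores;
    y_true: NumPy array with the true tag of each video;
    classes: the list of the 26 genre tags, in column order."""
    aps = [average_precision_score((y_true == c).astype(int), scores[:, k])
           for k, c in enumerate(classes)]
    return float(np.mean(aps))
```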
  14. Experimental Results. Figure: AP per class achieved in each run (run 1 vs. run 4; average precision from 0 to 1).
  15. Experimental Results. Visual features perform better on the most imbalanced classes, such as "food and drink" (AP = 0.372), "school and education" (AP = 0.509), and "autos and vehicles" (AP = 0.692). Textual features perform better on the classes most frequent in the development set, such as "default category" (AP = 0.860) and "business" (AP = 0.557).
  16. Conclusion. Remarks: we used a different learning strategy for each data modality, video similarity for processing visual content and an ensemble of classifiers for textual content. Findings: the obtained results demonstrate that the proposed framework is promising; combining textual and visual features can lead to better results. Future work: investigate learning strategies for combining features from different modalities, and consider other information sources, such as ASR transcripts, to include more features semantically related to each category.
  17. Acknowledgements. Organizers of the Tagging Task and MediaEval 2012; Brazilian funding agencies FAPEMIG, FAPESP, CAPES, and CNPq.
