TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization with one-vs-all classifiers & MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks

1. TUD MediaEval 2012 Tagging Task. Presenter: Martha A. Larson, Multimedia Information Retrieval Lab, Delft University of Technology, 05-10-2012.
2. Outline
• TUD-MM: Multi-modality video categorization with one-vs-all classifiers
  • Peng Xu, Yangyang Shi, Martha A. Larson
• MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks
  • Yangyang Shi, Martha A. Larson, Catholijn M. Jonker
3. TUD-MM: Multi-modality video categorization with one-vs-all classifiers. Peng Xu, Yangyang Shi, Martha A. Larson, Delft University of Technology, 05-10-2012.
4. Introduction
• Features from different modalities
  • Visual features: visual-words-based representation and a global video representation
  • Text features: ASR transcripts and metadata, represented by term frequency and LDA topics
• Classification and fusion
  • One-vs-all linear SVMs
  • Reciprocal Rank Fusion
  • Post-processing procedure to assign one category label to each video
5. Visual representations
• Visual-words-based video representation (sketched in the code below)
  • SIFT features are extracted from each key-frame
  • The visual vocabulary is built by hierarchical k-means clustering
  • The video is represented by the normalized term frequency of visual words over the entire video
• Global video representation
  • Edit features
  • Content features
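A minimal sketch of the visual-words pipeline described above. It assumes SIFT descriptors have already been extracted per key-frame (the slides do not name the extraction toolkit), and it substitutes flat k-means for the hierarchical variant for brevity, which is an assumption rather than the authors' exact setup:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, vocab_size=1000):
    """Cluster SIFT descriptors (one row each) into a visual vocabulary.

    The slides use hierarchical k-means; flat k-means is used here
    for brevity (an assumption, not the authors' exact setup).
    """
    return KMeans(n_clusters=vocab_size, n_init=10).fit(descriptors)

def video_histogram(vocab, keyframe_descriptors):
    """Normalized term frequency of visual words over the whole video."""
    # Quantize every key-frame descriptor to its nearest visual word.
    words = vocab.predict(np.vstack(keyframe_descriptors))
    tf = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return tf / tf.sum()  # L1-normalized visual-word histogram
```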
6. Classification and Fusion
• One-vs-all linear SVM
  • C is determined by 5-fold cross-validation
• Reciprocal Rank Fusion (RRF)* (sketched below)
  • K = 60 balances the influence of lower-ranked items
  • The weights w(r) are determined by the cross-validation errors of each modality
• Post-processing procedure

* G. V. Cormack, C. L. A. Clarke, and S. Buettcher. Reciprocal rank fusion outperforms Condorcet and individual rank learning methods. SIGIR '09, pages 758-759.
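A sketch of weighted Reciprocal Rank Fusion as described above. Each ranked list contributes w(r) / (K + rank) to an item's fused score; the slides derive the weights from cross-validation errors, so here they are simply passed in:

```python
def reciprocal_rank_fusion(rankings, weights, k=60):
    """Weighted Reciprocal Rank Fusion over per-modality ranked lists.

    rankings: list of ranked lists of video ids, one list per modality.
    weights:  per-modality weights w(r) (derived from cross-validation
              errors on the slides; supplied directly here).
    k=60 dampens the influence of lower-ranked items, as in Cormack et al.
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        for rank, vid in enumerate(ranking, start=1):
            scores[vid] = scores.get(vid, 0.0) + w / (k + rank)
    # Return video ids sorted by fused score, best first.
    return sorted(scores, key=scores.get, reverse=True)
```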
7. Result analysis
• MAP of different runs:

        Run_1    Run_2    Run_3    Run_4    Run_5    *Run_6   *Run_7
  MAP   0.0061   0.3127   0.2279   0.3675   0.2157   0.0577   0.0047

• Run_1 to Run_5 are official runs
• Run_6 is the visual-only run without post-processing
• Run_7 is the visual-only run with the global feature
8. Performance of visual features
[Figure: per-genre average precision of the visual-word (VW) and global features against a random baseline; all values fall between 0 and 0.025]
9. MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks. Yangyang Shi, Martha A. Larson, Catholijn M. Jonker, Delft University of Technology, 05-10-2012.
10. Models for One-best List and Confusion Networks
[Diagram: the ASR output feeds three classifiers: Dynamic Bayesian Networks, Support Vector Machines, and Conditional Random Fields]
11. One-best List SVM
[Pipeline: vocabulary with cut-off 3 → TF-IDF features → linear-kernel multi-class SVM (C = 0.5)]
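A minimal sketch of the pipeline on this slide. "Cut-off 3" is read here as a minimum document frequency of 3, which is an interpretation since the slide does not define the cut-off precisely:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Vocabulary cut-off interpreted as min_df=3 (an assumption);
# LinearSVC is multi-class (one-vs-rest) with a linear kernel.
classifier = make_pipeline(
    TfidfVectorizer(min_df=3),
    LinearSVC(C=0.5),  # C = 0.5 as given on the slide
)

# Hypothetical usage on ASR one-best transcripts and genre labels:
# classifier.fit(transcripts_train, genres_train)
# predicted = classifier.predict(transcripts_test)
```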
12. One-best List DBN
[Diagram: a dynamic Bayesian network with three variables per time slice, E_n, T_n, and W_n, connected across adjacent slices (E1 E2 E3 / T1 T2 T3 / W1 W2 W3)]
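The slide shows only the graph structure, not its parameterization. One plausible first-order reading, offered as an assumption rather than the authors' stated model, takes $W_n$ as the $n$-th word of the one-best list, $T_n$ as a latent tag state, and $E_n$ as an auxiliary observed variable, giving the factorization

$$P(W_{1:N}, T_{1:N}, E_{1:N}) = \prod_{n=1}^{N} P(T_n \mid T_{n-1}) \, P(W_n \mid T_n) \, P(E_n \mid T_n).$$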
13. One-best List DBN (continued)
14. Results on the ASR-only Run

  Models                 MAP
  Run2-one-best SVM      0.23
  Run2-one-best DBN      0.25
  Run2-one-best CRF      0.10
  Run2-CN-CRF            0.09
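For reference, a minimal sketch of how the MAP figures in these tables are computed, assuming binary relevance per ranked list:

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP for one genre query: ranked_relevance holds 1/0 per ranked video."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    # Precision at each rank, counted only where a relevant video appears.
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_query_relevance):
    """MAP: the mean of AP over all genre queries."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))
```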
15. Average Precision on Each Genre
[Figure: per-genre average precision for the DBN and SVM models; values range from 0 to 0.8]
16. Discussion and Future Work
• Discussion
  • Visual-only methods can be improved in several ways
    • Feature selection or dimensionality reduction methods can be applied
    • Genre-level video representation
  • CRF failure
    • A document is treated as one item rather than as individual words
    • The feature set is too large for training to converge
  • DBN outperforms SVM: the sequential order information probably helps prediction
• Potential
  • Generate clear and useful labels
17. Thank you!
