~ Multimodal Video Classification ~

                            ARF (Austria-Romania-France) team


       Bogdan IONESCU*1,3              Ionuț MIRONICĂ1              Klaus SEYERLEHNER2
           bionescu@imag.pub.ro          imironica@imag.pub.ro               music@cp.jku.at

          Peter KNEES2                  Jan SCHLÜTER4                  Markus SCHEDL2
            peter.knees@jku.at            jan.schlueter@ofai.at           markus.schedl@jku.at

           Horia CUCU1                     Andi BUZO1                 Patrick LAMBERT3
            horia.cucu@upb.ro              andi.buzo@upb.ro           patrick.lambert@univ-savoie.fr


    *this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
    1 University POLITEHNICA of Bucharest
    2, 3 (logos only)
    4 Austrian Research Institute for Artificial Intelligence
Presentation outline


          • The approach

          • Video content description

          • Experimental results

          • Conclusions and future work




MediaEval - Pisa, Italy, 4-5 October 2012   1/16
The approach
  > challenge: find a way to assign (genre) tags to unknown videos;
  > approach: machine learning paradigm;


  [Figure: labeled data (genre tags such as web, food, autos, …) train a classifier; the classifier turns the unlabeled video database into a tagged video database.]

The approach: classification
  > the entire process relies on the concept of “similarity” computed
  between content annotations (numeric features),

  > this year's focus is on:

       objective 1: go multimodal (truly): visual, audio, text;

       objective 2: test a broad range of classifiers and descriptor
       combinations;


Video content description - audio
   block-level audio features
  (also capture local temporal information)

    [Figure: blocks, e.g. 50% overlapping; labels: average, median, variance, ...]

    • Spectral Pattern ~ soundtrack’s timbre;
    • delta Spectral Pattern ~ strength of onsets;
    • variance delta Spectral Pattern ~ variation of the onset strength;
    • Logarithmic Fluctuation Pattern ~ rhythmic aspects;
    • Correlation Pattern ~ loudness changes;
    • Spectral Contrast Pattern ~ “toneness”;
    • Local Single Gaussian model ~ timbral;
    • George Tzanetakis model ~ timbral;

    [Klaus Seyerlehner et al., MIREX’11, USA]
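
A minimal sketch of the block-level idea, not the authors' implementation: a frame-by-band array is cut into blocks with 50% overlap, each block is flattened so the temporal order inside it is kept, and the block vectors are summarized by average, median and variance. The function name, the block size of 8 frames and the random input are assumptions made for illustration only.

```python
import numpy as np

def block_level_summary(frames, block_size=8, overlap=0.5):
    """Cut a (n_frames, n_bands) array into overlapping blocks and summarize them."""
    if frames.shape[0] < block_size:
        raise ValueError("need at least one full block of frames")
    # 50% overlap means the next block starts half a block later.
    hop = max(1, int(block_size * (1.0 - overlap)))
    starts = range(0, frames.shape[0] - block_size + 1, hop)
    # One flattened vector per block, so the local temporal order is preserved.
    blocks = np.stack([frames[s:s + block_size].ravel() for s in starts])
    # Summarize all block vectors of the soundtrack into global statistics.
    return {
        "n_blocks": blocks.shape[0],
        "average": blocks.mean(axis=0),
        "median": np.median(blocks, axis=0),
        "variance": blocks.var(axis=0),
    }

# Toy input (assumption): 20 frames of a 4-band spectrum, random values.
rng = np.random.default_rng(0)
spectrogram = rng.random((20, 4))
summary = block_level_summary(spectrogram, block_size=8, overlap=0.5)
print(summary["n_blocks"], summary["average"].shape)
```

Which statistic is used for which pattern is not stated on the slide, so the sketch simply computes all three.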

Video content description - audio
     standard audio features
    (audio frame-based)

    • Zero-Crossing Rate,
    • Linear Predictive Coefficients,
    • Line Spectral Pairs,
    • Mel-Frequency Cepstral Coefficients,
    • spectral centroid, flux, rolloff, and kurtosis,
    + variance of each feature over a certain window.

    [Figure: frame-level features f1, f2, …, fn over time are aggregated into a global feature = mean & variance.]

    [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
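
A minimal sketch of the frame-based scheme, assuming plain NumPy instead of the Yaafe toolbox cited on the slide: two of the listed features (zero-crossing rate and spectral centroid) are computed per frame and then collapsed into a global feature by mean and variance. The frame length, hop size, sample rate and test tone are hypothetical values chosen for the example.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of equal length."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of consecutive sample pairs whose sign differs, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_centroid(frames, sample_rate):
    """Magnitude-weighted mean frequency of each frame, in Hz."""
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    return (mag * freqs).sum(axis=1) / np.maximum(mag.sum(axis=1), 1e-12)

def global_descriptor(per_frame):
    """Global feature = mean and variance of each frame-level feature."""
    return np.concatenate([per_frame.mean(axis=0), per_frame.var(axis=0)])

# Toy input (assumption): one second of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)
frames = frame_signal(tone, frame_len=512, hop=256)
features = np.column_stack([zero_crossing_rate(frames),
                            spectral_centroid(frames, sample_rate)])
descriptor = global_descriptor(features)
print(frames.shape, descriptor.shape)
```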

Video content description - visual
   MPEG-7 & color/texture descriptors
  (visual frame-based)

    • Local Binary Pattern,
    • Autocorrelogram,
    • Color Coherence Vector,
    • Color Layout Pattern,
    • Edge Histogram,
    • Classic color histogram,
    • Scalable Color Descriptor,
    • Color moments.

    [Figure: frame-level features f1, f2, …, fn over time are aggregated into a global feature = mean & dispersion & skewness & kurtosis & median & root mean square.]

    [OpenCV toolbox, http://opencv.willowgarage.com]
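
A minimal sketch of the temporal aggregation named on this slide, assuming NumPy rather than the OpenCV toolbox: frame-level descriptors are collapsed into one global feature made of mean, dispersion, skewness, kurtosis, median and root mean square. "Dispersion" is taken here to mean the standard deviation, which is an assumption, and the 8-bin gray-level histogram of random frames only stands in for a real descriptor.

```python
import numpy as np

def global_visual_descriptor(per_frame):
    """Aggregate (n_frames, n_dims) frame descriptors into one global vector."""
    mean = per_frame.mean(axis=0)
    std = per_frame.std(axis=0)             # "dispersion", assumed to be the std
    centered = per_frame - mean
    safe_std = np.where(std > 0, std, 1.0)  # avoid division by zero
    skewness = (centered ** 3).mean(axis=0) / safe_std ** 3
    kurtosis = (centered ** 4).mean(axis=0) / safe_std ** 4
    median = np.median(per_frame, axis=0)
    rms = np.sqrt((per_frame ** 2).mean(axis=0))
    return np.concatenate([mean, std, skewness, kurtosis, median, rms])

# Toy input (assumption): 10 random 16x16 gray frames, 8-bin normalized histograms.
rng = np.random.default_rng(1)
video = rng.integers(0, 256, size=(10, 16, 16))
hists = np.stack([np.histogram(f, bins=8, range=(0, 256))[0] / f.size
                  for f in video])
descriptor = global_visual_descriptor(hists)
print(hists.shape, descriptor.shape)
```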

Video content description - visual
   feature descriptors
  (visual frame-based)
  • Histogram of Oriented Gradients (HoG)
  ~ counts occurrences of gradient orientation
  in localized portions of an image (20° per bin)

  • Harris corner detector

  • Speeded Up Robust Features (SURF)

  [Figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]

  [OpenCV toolbox, http://opencv.willowgarage.com]
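
A minimal sketch of the orientation histogram behind HoG with 20° bins, written in NumPy rather than with the OpenCV toolbox. Two choices are assumptions not stated on the slide: orientations are unsigned (0° to 180°, hence 9 bins), and each pixel votes with its gradient magnitude instead of a plain count. Cell layout and block normalization of a full HoG descriptor are left out.

```python
import numpy as np

def orientation_histogram(cell, bin_width_deg=20.0):
    """Magnitude-weighted histogram of unsigned gradient orientations."""
    gy, gx = np.gradient(cell.astype(float))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    n_bins = int(round(180.0 / bin_width_deg))
    # Clamp guards against a rounding result of exactly 180 degrees.
    bins = np.minimum((angle // bin_width_deg).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# Toy input (assumption): an 8x8 intensity ramp and its transpose.
ramp = np.tile(np.arange(8.0), (8, 1))      # intensity grows along x
hist_h = orientation_histogram(ramp)        # all votes near 0 degrees
hist_v = orientation_histogram(ramp.T)      # all votes near 90 degrees
print(hist_h.argmax(), hist_v.argmax())
```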

Video content description - text
   TF-IDF descriptors
  (Term Frequency-Inverse Document Frequency)

  > text sources: ASR and metadata,

     1. remove XML markup,

     2. remove terms below the 5th percentile of the frequency distribution,

     3. select the term corpus: for each genre class, retain the m terms (e.g. m =
     150 for ASR and 20 for metadata) with the highest χ2 values that
     occur more frequently in that class than in its complement classes,

     4. represent each document by its TF-IDF values.
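
Steps 3 and 4 can be sketched as follows, under stated assumptions: the χ2 score comes from a 2x2 document contingency table per (term, genre) pair, a term is kept for a genre only if it occurs in a larger share of that genre's documents than of the other documents, and TF-IDF uses relative term frequency times log(N/df). The exact variants used by the authors are not given on the slide, step 2 (percentile filtering) is omitted, and the four toy documents are made up.

```python
import math
from collections import Counter

def chi2_score(a, b, c, d):
    """Chi-square of a 2x2 table: a/b = class/other docs with the term,
    c/d = class/other docs without it."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def select_terms(docs, labels, m):
    """Keep, per genre, the m highest-chi2 terms that favor that genre."""
    all_terms = sorted({t for d in docs for t in d})
    vocab = set()
    for genre in sorted(set(labels)):
        n_in = sum(1 for l in labels if l == genre)
        n_out = len(labels) - n_in
        scored = []
        for term in all_terms:
            a = sum(1 for d, l in zip(docs, labels) if l == genre and term in d)
            b = sum(1 for d, l in zip(docs, labels) if l != genre and term in d)
            # Only terms more frequent in the genre than in its complement.
            if n_in and n_out and a / n_in > b / n_out:
                scored.append((chi2_score(a, b, n_in - a, n_out - b), term))
        scored.sort(key=lambda x: (-x[0], x[1]))
        vocab.update(term for _, term in scored[:m])
    return sorted(vocab)

def tfidf(docs, vocab):
    """One row per document: relative term frequency times log(N / df)."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    rows = []
    for d in docs:
        counts = Counter(d)
        rows.append([counts[t] / len(d) * math.log(n / df[t]) if df[t] else 0.0
                     for t in vocab])
    return rows

# Toy corpus (assumption): already tokenized, two genres.
docs = [["engine", "car", "road"], ["car", "wheel", "engine"],
        ["recipe", "oven", "salt"], ["salt", "recipe", "road"]]
labels = ["autos", "autos", "food", "food"]
vocab = select_terms(docs, labels, m=2)
matrix = tfidf(docs, vocab)
print(vocab)
```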



Experimental results: devset (5,127 seq.)
  > classifiers from Weka (Bayes, lazy, functional, trees, etc.),
  > cross-validation (train 50% – test 50%),
  avg. Fscore (over all genres)




    - visual descriptor performance: 30%±10%,
    - using more visual descriptors is not more accurate than using a few,
    - best: LBP+CCV+histogram (Fscore=41.2%).
                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
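
A minimal sketch of the evaluation protocol, assuming plain Python instead of Weka: a random 50%/50% split of the sequence indices and an F-score averaged over genres with equal weight per genre (macro averaging). Whether the reported figure is macro-averaged in exactly this way is an assumption, and the labels below are toy data, not results.

```python
import random

def split_half(n_items, seed=0):
    """Random 50%/50% split of item indices into train and test."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    half = n_items // 2
    return idx[:half], idx[half:]

def average_fscore(y_true, y_pred):
    """F-score per genre, then the unweighted mean over all genres."""
    scores = []
    for genre in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == genre and p == genre)
        fp = sum(1 for t, p in pairs if t != genre and p == genre)
        fn = sum(1 for t, p in pairs if t == genre and p != genre)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)

# Split sized like the devset (5,127 sequences).
train_idx, test_idx = split_half(5127)

# Toy predictions (assumption) for three genres.
y_true = ["autos", "autos", "food", "food", "web", "web"]
y_pred = ["autos", "food", "food", "food", "web", "autos"]
score = average_fscore(y_true, y_pred)
print(len(train_idx), len(test_idx), round(score, 3))
```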

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  avg. Fscore (over all genres)




     - audio is still better than visual (improvement of ~6%),

     - the proposed block-based features are better than the standard ones (by ~10%),

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  avg. Fscore (over all genres)




     - ASR from LIMSI is more representative than ASR from LIUM (by ~3%),

     - best performance: ASR LIMSI + metadata (Fscore=68%).

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  avg. Fscore (over all genres)




     - audio-visual close to text (ASR) for the automatic descriptors,

     - increasing the number of modalities increases the performance.

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: official runs (9,550 seq.)
  > train on devset, test on testset (SVM linear),

  [Figure: results of the official runs; chart annotations: "MediaEval 2011 MAP 12%" (left) and "MediaEval 2011 MAP 10.3%" (right)]

     Run1: LBP + CCV + hist + audio block-based
     Run2: TF-IDF on ASR LIMSI
     Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
     Run4: audio block-based
     Run5: TF-IDF on metadata + ASR LIMSI




Experimental results: official runs (9,550 seq.)
  > genre MAP for Run 5: TF-IDF on ASR + metadata,
                  Run 1: visual + audio
  autos                             gaming   religion   environment
  52%                                71%      71%           50%
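
For reference, a minimal sketch of how a per-genre average precision and the resulting MAP can be computed from ranked lists. It assumes every relevant video appears somewhere in the ranked list, so the sum is divided by the number of relevant items found; the official evaluation tool may normalize differently. The rankings are toy data, not the run results.

```python
def average_precision(ranked_relevance):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP = mean of the per-genre average precisions."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)

# Toy rankings (assumption): 1 = video truly belongs to the genre, 0 = it does not.
rankings = {"autos": [1, 0, 1, 0], "gaming": [0, 1, 1]}
ap_autos = average_precision(rankings["autos"])
ap_gaming = average_precision(rankings["gaming"])
map_score = mean_average_precision(rankings)
print(round(ap_autos, 3), round(ap_gaming, 3), round(map_score, 3))
```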




Conclusions and future work
  > classification adapts to the corpus – changing the corpus will
  change the performance;
  > audio-visual descriptors are inherently limited;
  > how far can we go with ad-hoc classification without human
  intervention?

  > future work:
      more elaborate late fusion?
      pursue tests on the entire data set;
      perhaps a more elaborate Bag-of-Visual-Words.

    Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and
    Prof. Nicu Sebe from the University of Trento for their support.

Thank you! Any questions?




MediaEval - Pisa, Italy, 4-5 October 2012   16/16
                                                17

More Related Content

Viewers also liked

GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account MatchingMediaEval2012
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonSharon Jimenez
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...MediaEval2012
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskMediaEval2012
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012MediaEval2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesMediaEval2012
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...MediaEval2012
 
Papiloma humano
Papiloma humanoPapiloma humano
Papiloma humano
alexitolindoo
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skillsJNavarro0321
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingMediaEval2012
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskMediaEval2012
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskMediaEval2012
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMMediaEval2012
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanewilovepurin
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing Plan
JPemberton15
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4
souzadea1
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
Stanislav Litvinenko
 

Viewers also liked (20)

10 ρ. δρακουλησ
10 ρ. δρακουλησ10 ρ. δρακουλησ
10 ρ. δρακουλησ
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account Matching
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharon
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing Task
 
κειμενο
κειμενοκειμενο
κειμενο
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
 
Papiloma humano
Papiloma humanoPapiloma humano
Papiloma humano
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skills
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-Tagging
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanew
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing Plan
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
 

Similar to ARF @ MediaEval 2012: Multimodal Video Classification

Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_featuresBo Li
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computation
Jan Morovic
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskMediaEval2012
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletter
Leigh Smead
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™
SkyRonDotOrg
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprinting
WietskevdHeuvel
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
Antonio Tejero de Pablos
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Zachary S. Brown
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School Handout
KatherineHaratsis
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Stephan Baumann
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-trainingscottmertz
 

Similar to ARF @ MediaEval 2012: Multimodal Video Classification (14)

Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computation
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging Task
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletter
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprinting
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School Handout
 
Dmk audioviz
Dmk audiovizDmk audioviz
Dmk audioviz
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-training
 

More from MediaEval2012

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval2012
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding MediaEval2012
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingMediaEval2012
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012MediaEval2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...MediaEval2012
 
The CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsThe CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsMediaEval2012
 
Overview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskOverview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskMediaEval2012
 
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval2012
 
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval2012
 
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...MediaEval2012
 
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...MediaEval2012
 
The MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioThe MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioMediaEval2012
 
LIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodLIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodMediaEval2012
 
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...MediaEval2012
 
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskUNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskMediaEval2012
 
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012MediaEval2012
 

More from MediaEval2012 (20)

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 Opening
 
Closing
ClosingClosing
Closing
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music Tagging
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
ARF @ MediaEval 2012: Multimodal Video Classification

  • 1. ~ Multimodal Video Classification ~
       ARF (Austria-Romania-France) team
       Bogdan IONESCU*1,3 (bionescu@imag.pub.ro), Ionuț MIRONICĂ1 (imironica@imag.pub.ro), Klaus SEYERLEHNER2 (music@cp.jku.at),
       Peter KNEES2 (peter.knees@jku.at), Jan SCHLÜTER4 (jan.schlueter@ofai.at), Markus SCHEDL2 (markus.schedl@jku.at),
       Horia CUCU1 (horia.cucu@upb.ro), Andi BUZO1 (andi.buzo@upb.ro), Patrick LAMBERT3 (patrick.lambert@univ-savoie.fr)
       * this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
       1 University POLITEHNICA of Bucharest; 4 Austrian Research Institute for Artificial Intelligence
  • 2. Presentation outline
       • The approach
       • Video content description
       • Experimental results
       • Conclusions and future work
       MediaEval - Pisa, Italy, 4-5 October 2012
  • 3. The approach
       > challenge: find a way to assign (genre) tags to unknown videos;
       > approach: machine learning paradigm.
       [diagram: labeled data (web, food, autos, …) is used to train a classifier; the classifier turns unlabeled data from the video database into labeled data in the tagged video database]
  • 4. The approach: classification
       > the entire process relies on the concept of “similarity” computed between content annotations (numeric features);
       > this year's focus is on:
           objective 1: go (truly) multimodal: visual, audio, text;
           objective 2: test a broad range of classifiers and descriptor combinations.
  • 5. Video content description - audio
       block-level audio features (capture also local temporal information):
       • Spectral Pattern ~ soundtrack's timbre;
       • delta Spectral Pattern ~ strength of onsets;
       • variance delta Spectral Pattern ~ variation of the onset strength;
       • Logarithmic Fluctuation Pattern ~ rhythmic aspects;
       • Correlation Pattern ~ loudness changes;
       • Spectral Contrast Pattern ~ “toneness”;
       • Local Single Gaussian model ~ timbral;
       • George Tzanetakis model ~ timbral.
       [figure: blocks of audio frames, e.g. 50% overlapping, summarized by average, median, variance, ...]
       [Klaus Seyerlehner et al., MIREX'11, USA]
  • 6. Video content description - audio
       standard audio features (audio frame-based):
       • Zero-Crossing Rate,
       • Linear Predictive Coefficients,
       • Line Spectral Pairs,
       • Mel-Frequency Cepstral Coefficients,
       • spectral centroid, flux, rolloff, and kurtosis,
       + mean and variance of each feature over a certain window.
       [figure: frame-level features f1, f2, …, fn over time; global feature = mean & variance (var{f2}, …, var{fn})]
       [B. Mathieu et al., Yaafe toolbox, ISMIR'10, Netherlands]
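The "mean and variance over a window" aggregation on this slide can be illustrated with a minimal Python sketch. The function name `global_descriptor`, the toy frame values, and the use of population variance are assumptions made for illustration; the slides do not specify the window handling or the variance estimator.

```python
from statistics import fmean, pvariance


def global_descriptor(frames):
    """Concatenate the per-dimension mean and variance of frame-level features.

    `frames` is a list of equally long feature vectors, one per audio frame.
    Population variance is an assumption; the slides do not name the estimator.
    """
    if not frames:
        raise ValueError("need at least one frame")
    columns = list(zip(*frames))  # one tuple per feature dimension
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means + variances


# Toy example: three frames with two features each (hypothetical values).
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
descriptor = global_descriptor(frames)
```

The result has twice as many dimensions as a single frame: the means first, then the variances, which gives one fixed-length vector per window regardless of how many frames it contains.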
  • 7. Video content description - visual
       MPEG-7 & color/texture descriptors (visual frame-based):
       • Local Binary Pattern,
       • Autocorrelogram,
       • Color Coherence Vector,
       • Color Layout Pattern,
       • Edge Histogram,
       • Classic color histogram,
       • Scalable Color Descriptor,
       • Color moments.
       [figure: frame-level features f1, f2, …, fn over time; global feature = mean & dispersion & skewness & kurtosis & median & root mean square]
       [OpenCV toolbox, http://opencv.willowgarage.com]
  • 8. Video content description - visual
       feature descriptors (visual frame-based):
       • Histogram of oriented Gradients (HoG) ~ counts occurrences of gradient orientation in localized portions of an image (20° per bin);
       • Harris corner detector;
       • Speeded Up Robust Feature (SURF).
       [figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]
       [OpenCV toolbox, http://opencv.willowgarage.com]
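To make the "20° per bin" orientation counting concrete, here is a small Python sketch that builds a single orientation histogram over a whole grayscale image. It is a simplification and not the OpenCV implementation used by the team: unsigned orientations (0° to 180°, hence 9 bins), central-difference gradients, and treating the whole image as one cell are all assumptions, and `orientation_histogram` is a hypothetical name.

```python
import math


def orientation_histogram(image, bin_width_deg=20):
    """Count gradient orientations of a 2-D grayscale image in fixed-width bins.

    Assumes unsigned orientations in [0, 180) degrees and treats the whole
    image as a single cell; a full HoG would use many cells plus block
    normalization.
    """
    n_bins = 180 // bin_width_deg
    hist = [0.0] * n_bins
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            if gx == 0 and gy == 0:
                continue  # flat pixel, no orientation to count
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width_deg), n_bins - 1)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy images: intensity grows left-to-right, then top-to-bottom.
horizontal_ramp = [[x for x in range(5)] for _ in range(5)]
vertical_ramp = [[y] * 5 for y in range(5)]
hist_h = orientation_histogram(horizontal_ramp)
hist_v = orientation_histogram(vertical_ramp)
```

For the horizontal ramp every gradient points along 0°, so all counts fall in the first bin; for the vertical ramp they fall in the bin covering 80° to 100°.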
  • 9. Video content description - text
       TF-IDF descriptors (Term Frequency-Inverse Document Frequency)
       > text sources: ASR and metadata;
       1. remove XML markup;
       2. remove terms below the 5th percentile of the frequency distribution;
       3. select the term corpus: for each genre class, retain the m terms (e.g. m = 150 for ASR and 20 for metadata) with the highest χ² values that occur more frequently in that class than in the complement classes;
       4. represent each document by the TF-IDF values of the selected terms.
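Steps 3 and 4 above can be sketched in plain Python as follows. This is an illustrative sketch, not the team's implementation: the function names, the toy documents, the document-level (presence/absence) χ² table, and the `tf * log(N/df)` weighting are assumptions, and steps 1 and 2 (markup removal and rare-term filtering) are omitted.

```python
import math
from collections import Counter


def chi2(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 class/term contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def select_terms(docs, labels, m):
    """Per class, keep the m highest-chi2 terms that are more frequent inside it."""
    vocab = set().union(*docs)
    selected = set()
    for cls in sorted(set(labels)):
        inside = [set(d) for d, lab in zip(docs, labels) if lab == cls]
        outside = [set(d) for d, lab in zip(docs, labels) if lab != cls]
        scored = []
        for term in vocab:
            n11 = sum(term in d for d in inside)   # class docs containing the term
            n01 = sum(term in d for d in outside)  # other docs containing the term
            n10, n00 = len(inside) - n11, len(outside) - n01
            # Skip terms that are not relatively more frequent in this class.
            if outside and n11 / len(inside) <= n01 / len(outside):
                continue
            scored.append((chi2(n11, n10, n01, n00), term))
        # Highest chi2 first; ties broken alphabetically for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        selected.update(term for _, term in scored[:m])
    return selected


def tfidf(docs, terms):
    """Return the sorted term list and one TF-IDF vector per document."""
    terms = sorted(terms)
    n = len(docs)
    df = {t: sum(t in d for d in docs) for t in terms}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([
            counts[t] / len(d) * math.log(n / df[t]) if (df[t] and d) else 0.0
            for t in terms
        ])
    return terms, vectors


# Toy corpus (hypothetical): tokenized documents with genre labels.
docs = [["engine", "car", "road"], ["car", "wheel", "road"],
        ["recipe", "oven", "road"], ["oven", "cake", "recipe"]]
labels = ["autos", "autos", "food", "food"]
terms, vectors = tfidf(docs, select_terms(docs, labels, m=1))
```

With m = 1 the sketch keeps one term per genre, so each document becomes a two-dimensional vector; the real setting on the slide (m = 150 for ASR, 20 for metadata) yields correspondingly larger vocabularies.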
  • 10. Experimental results: devset (5,127 seq.)
       > classifiers from Weka (Bayes, lazy, functional, trees, etc.);
       > cross-validation (train 50% / test 50%), avg. F-score (over all genres):
       - visual descriptors reach an F-score of 30% ± 10%,
       - using more visual descriptors is not more accurate than using few,
       - best: LBP + CCV + histogram (F-score = 41.2%).
       [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
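As a reminder of what "avg. F-score (over all genres)" measures, the following Python sketch computes a per-genre F1 score and averages it. The unweighted (macro) average is an assumption, since the slides do not say whether genres are weighted by size, and the labels below are invented toy data.

```python
def average_fscore(y_true, y_pred):
    """Unweighted mean of per-genre F1 scores (macro averaging is an assumption)."""
    genres = sorted(set(y_true))
    scores = []
    for g in genres:
        tp = sum(t == g and p == g for t, p in zip(y_true, y_pred))
        fp = sum(t != g and p == g for t, p in zip(y_true, y_pred))
        fn = sum(t == g and p != g for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the genre is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy ground truth and predictions (hypothetical).
y_true = ["autos", "autos", "food", "food"]
y_pred = ["autos", "food", "food", "food"]
score = average_fscore(y_true, y_pred)
```

Here "autos" scores 2/3 and "food" scores 4/5, so the average is their mean.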
  • 11. Experimental results: devset (5,127 seq.)
       > cross-validation (train 50% / test 50%), avg. F-score (over all genres):
       - audio is still better than visual (improvement ~6%),
       - the proposed block-based audio features are better than the standard ones (by ~10%).
       [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 12. Experimental results: devset (5,127 seq.)
       > cross-validation (train 50% / test 50%), avg. F-score (over all genres):
       - ASR from LIMSI is more representative than ASR from LIUM (~3%),
       - best performance: ASR LIMSI + metadata (F-score = 68%).
       [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 13. Experimental results: devset (5,127 seq.)
       > cross-validation (train 50% / test 50%), avg. F-score (over all genres):
       - among the automatic descriptors, audio-visual comes close to text (ASR),
       - increasing the number of modalities increases the performance.
       [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
  • 14. Experimental results: official runs (9,550 seq.)
       > train on devset, test on testset (SVM linear):
       Run1: LBP + CCV + hist + audio block-based
       Run2: TF-IDF on ASR LIMSI
       Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
       Run4: audio block-based
       Run5: TF-IDF on metadata + ASR LIMSI
       [chart: MAP per run; annotations: MediaEval 2011 MAP 12%, MediaEval 2011 MAP 10.3%]
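The official runs are scored with MAP (mean average precision). A minimal Python sketch of the metric follows; it assumes each genre comes with a ranked list of relevance flags in which every relevant video appears, and the function names and toy rankings are hypothetical rather than taken from the task's evaluation tooling.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list of booleans (True = relevant).

    Assumes all relevant items appear somewhere in the ranking.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this relevant item's rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings_by_genre):
    """Mean of the per-genre average precisions."""
    aps = [average_precision(r) for r in rankings_by_genre.values()]
    return sum(aps) / len(aps)


# Toy rankings (hypothetical): one ranked result list per genre.
rankings = {"autos": [True, False, True], "food": [False, True]}
map_score = mean_average_precision(rankings)
```

In this toy case "autos" has an average precision of (1 + 2/3) / 2 and "food" of 1/2, and MAP is the mean of the two.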
  • 15. Experimental results: official runs (9,550 seq.)
       > genre MAP for Run 5 (TF-IDF on ASR + metadata) and Run 1 (visual + audio)
       [chart: MAP per genre; highlighted genres: autos, gaming, religion, environment; highlighted values: 52%, 71%, 71%, 50%]
  • 16. Conclusions and future work
       > classification adapts to the corpus: changing the corpus will change the performance;
       > audio-visual descriptors are inherently limited;
       > how far can we go with ad-hoc classification without human intervention?
       > future work:
           • more elaborate late fusion?
           • pursue tests on the entire data set;
           • perhaps a more elaborate Bag-of-Visual-Words.
       Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and Prof. Nicu Sebe from University of Trento for their support.
  • 17. Thank you! Any questions?