KIT at MediaEval 2012 - Content-based Genre
Classification with Visual Cues
Tomas Semela
Makarand Tapaswi

MediaEval 2012 Workshop
Institute for Anthropomatics




KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association     www.kit.edu
Motivation


      Rapid growth of digital
      multimedia data in the
      broadcast and web video
      domain



      Need for efficient
      automated indexing and
      content search



2           KIT at MediaEval 2012 – Content-based Genre Classification with Visual Cues   Institute for Anthropomatics
            MediaEval 2012 Workshop
Challenges



    Broadcast TV domain
      Channel archives
      Digital distribution
      Web offerings
      Arrangement in genres:
         Highly characteristic
         Low variance
         Clear boundaries

    Web video domain
      Video portals like YouTube (user content)
      Arrangement in categories:
         Resemblance to topics (Autos – Animals – Travel)
         Variation in production values and style
         Not limited to the characteristics of a single genre


Related work



      System from University of Torino, Italy
              Extract video features from aural, visual, cognitive and structural
              cues
              Neural network for classification



                    M. Montagnuolo, A. Messina, “Parallel Neural
                    Networks for Multimodal Video Genre Classification”,
                    Multimedia Tools and Appl., 41(1):125–159, 2009




4    05.10.2012   KIT at MediaEval 2012 – Content-based Genre Classification with Visual Cues   Institute for Anthropomatics
                  MediaEval 2012 Workshop
KIT System




      Visual feature extraction from keyframes
      SVM classification system
      Fusion of results with majority voting

Low-level visual features



      Color
              Color moments
              HSV histogram
              Color auto correlogram
      Texture
              Wavelet texture
              Edge histogram
              Co-occurrence texture

      Global features for each video

            H. K. Ekenel, T. Semela, and R. Stiefelhagen, “Content-based video
            genre classification using multiple cues”, AIEMPro'10, pages 21-26,
            2010.
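
As a rough illustration of two of the color features listed above, the sketch below computes color moments and an HSV histogram for a keyframe and then pools them into one vector per video. The bin counts, the use of RGB for the moments, and the averaging over keyframes are assumptions made for this example; the slides do not state how the global per-video features were formed.

```python
import colorsys
import math


def color_moments(pixels):
    """Mean, standard deviation and skewness for each RGB channel.

    `pixels` is a list of (r, g, b) tuples with values in [0, 1].
    """
    n = len(pixels)
    moments = []
    for c in range(3):
        values = [p[c] for p in pixels]
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / n
        third = sum((v - mean) ** 3 for v in values) / n
        # Signed cube root keeps the third moment in the units of the channel.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        moments.extend([mean, math.sqrt(var), skew])
    return moments


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Normalized joint HSV histogram (the bin counts are an assumption)."""
    hb, sb, vb = bins
    hist = [0.0] * (hb * sb * vb)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1.0
    total = sum(hist)
    return [x / total for x in hist]


def video_descriptor(keyframes):
    """Average per-keyframe features into one vector per video (assumed pooling)."""
    per_frame = [color_moments(f) + hsv_histogram(f) for f in keyframes]
    return [sum(col) / len(per_frame) for col in zip(*per_frame)]
```

Each keyframe here is simply a list of pixels; a real system would read them from decoded video frames.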




SIFT – For each keyframe

      Interest point detection
          Dense sampling
      Spatial-pyramid
          1x1 – 2x2 – 1x3
      SIFT descriptors
          SIFT
          rgbSIFT
          opponentSIFT
      Bag-of-visual-words
          Codebook (500-dim.)
          Codeword histograms
    K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek,
    “Empowering Visual Categorization with the GPU”, IEEE
    Transactions on Multimedia, 13(1):60-70, 2011.
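
The bag-of-visual-words step with a spatial pyramid can be sketched as follows. This is a minimal pure-Python illustration, not the GPU implementation cited above: the function names are hypothetical, hard nearest-codeword assignment is assumed, and the 1x3 level is read as three horizontal bands, which the slide does not confirm.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    best, best_dist = 0, float("inf")
    for i, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, word))
        if dist < best_dist:
            best, best_dist = i, dist
    return best


# Pyramid levels as (rows, cols): 1x1, 2x2 and 1x3.
# Treating 1x3 as three horizontal bands is an assumption.
PYRAMID = [(1, 1), (2, 2), (3, 1)]


def pyramid_histogram(points, descriptors, codebook, width, height):
    """Concatenate one L1-normalized codeword histogram per pyramid cell."""
    words = [nearest_codeword(d, codebook) for d in descriptors]
    k = len(codebook)
    feature = []
    for rows, cols in PYRAMID:
        cells = [[0.0] * k for _ in range(rows * cols)]
        for (x, y), w in zip(points, words):
            r = min(int(y * rows / height), rows - 1)
            c = min(int(x * cols / width), cols - 1)
            cells[r * cols + c][w] += 1.0
        for hist in cells:
            total = sum(hist)
            # Empty cells contribute an all-zero histogram.
            feature.extend(v / total if total else 0.0 for v in hist)
    return feature
```

Under these assumptions the pyramid has 1 + 4 + 3 = 8 cells, so a 500-word codebook would give 8 x 500 = 4000 dimensions per descriptor type and keyframe.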
Classification

      Training of one support vector machine (SVM) for each
      genre and each feature
              Binary classification (one vs. all)
              RBF kernel
              Cross-validation
              Fusion at the decision level
              Majority voting (probability output)
              SIFT: keyframes classified individually, output averaged over video
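
The decision-level steps above might look roughly like the sketch below. It assumes each per-feature SVM already yields one probability per genre; the voting rule (each feature votes for its top genre, ties broken by summed probability) is only one possible reading of "majority voting (probability output)", and both function names are hypothetical.

```python
def average_over_keyframes(keyframe_probs):
    """Average per-keyframe genre probabilities into one vector per video."""
    n = len(keyframe_probs)
    return [sum(col) / n for col in zip(*keyframe_probs)]


def fuse_by_voting(per_feature_probs):
    """Rank genres by votes; each feature votes for its most probable genre.

    Ties in the vote count are broken by the summed probability (assumed rule).
    Returns genre indices ordered from most to least likely.
    """
    n_genres = len(per_feature_probs[0])
    votes = [0] * n_genres
    summed = [0.0] * n_genres
    for probs in per_feature_probs:
        top = max(range(n_genres), key=lambda g: probs[g])
        votes[top] += 1
        for g, p in enumerate(probs):
            summed[g] += p
    return sorted(range(n_genres),
                  key=lambda g: (votes[g], summed[g]),
                  reverse=True)
```

For the SIFT features, `average_over_keyframes` would be applied first, and its output passed to the fusion step alongside the global visual features.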




Domain Knowledge


      Video distribution in the development set:
              Autos 8 videos
              Technology ~ 500 videos


      Use this information in the final prediction of the category
      as a likelihood of the distribution on blip.tv:
         1. SVM scores for each video normalized to unit sum
         2. Divide these probabilities by the square root of the number of
            videos in the development set for each category to include the
            a-priori knowledge of the class distribution
         3. Finally, step one is repeated to obtain unit sum
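
The three steps can be written down directly. The sketch below follows them as stated on the slide; the function name and the dictionary layout are assumptions, and the scores are assumed to be non-negative with a positive sum.

```python
import math


def apply_prior(svm_scores, dev_counts):
    """Re-weight per-category scores for one video using development-set counts.

    `svm_scores` maps category -> non-negative score for the video.
    `dev_counts` maps category -> number of development-set videos.
    """
    # Step 1: normalize the scores to unit sum.
    total = sum(svm_scores.values())
    probs = {c: s / total for c, s in svm_scores.items()}

    # Step 2: divide by the square root of the development-set count.
    weighted = {c: p / math.sqrt(dev_counts[c]) for c, p in probs.items()}

    # Step 3: normalize again to unit sum.
    z = sum(weighted.values())
    return {c: w / z for c, w in weighted.items()}
```

For example, `apply_prior({"autos": 1.0, "technology": 1.0}, {"autos": 8, "technology": 500})` re-weights two equal scores using the counts quoted above.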


Evaluation


       Blip.tv data with ~ 9550 clips
       Two configurations with/without prior domain knowledge


            Configuration      Run       Features           MAP

            No prior           run1      Visual             0.3008
            No prior           run2      SIFT               0.2329
            No prior           run3      Visual + SIFT      0.3499

            Prior              run4      Visual             0.3461
            Prior              run5      SIFT               0.1448
            Prior              run6      Visual + SIFT      0.3581
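
For readers unfamiliar with the metric, mean average precision (MAP) can be sketched as below. This is the textbook, non-interpolated definition and not necessarily the exact evaluation script used for the official runs; the function names and input layout are assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant_ids)
    hits, precision_sum = 0, 0.0
    for rank, vid in enumerate(ranked_ids, start=1):
        if vid in relevant:
            hits += 1
            # Precision at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, ground_truth):
    """Mean of the per-category average precisions.

    `rankings` maps category -> video ids ordered by decreasing score.
    `ground_truth` maps category -> ids of videos that belong to it.
    """
    aps = [average_precision(rankings[c], ground_truth[c]) for c in ground_truth]
    return sum(aps) / len(aps)
```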


Evaluation – Run 6




Evaluation




       Run 6 (MAP):

       Top 4 categories:
           autos and vehicles (0.812)
           health (0.668)
           movies and television (0.602)
           religion (0.578)

       Worst 4 categories:
           citizen journalism (0.158)
           documentary (0.119)
           videoblogging (0.100)
           travel (0.010)




Conclusions & Future Work

      Conclusions
         Visual-based classification shows limitations for category tagging
         Few categories with satisfactory results
         SIFT: only slight improvement of overall results
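         Prior domain knowledge improves results overall (Visual: 0.3008 to 0.3461,
         Visual + SIFT: 0.3499 to 0.3581), but not for SIFT alone (0.2329 to 0.1448)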
      Future Work
         Temporal features
         Mid-level semantics
            Action Detection, Audio segmentation
         ASR & Metadata integration
         Individual classification approach & features for each genre

Thank you


