Advances in Semantic Analysis
       of Multimedia




Dr. Gerald Friedland
International Computer Science Institute
Berkeley, CA
friedland@icsi.berkeley.edu
The Internet Today




Internet Use Today




Raphaël Troncy: Linked Media: Weaving non-textual content into the Semantic Web, MozCamp, 03/2009.
Types of Videos




Addressable Market for
                          Enterprise Video Applications




• Security: $1.2 Billion (Total Market $7.8B, 2005; Source: JP Freeman) ($7B in 06; Source: Lehman)
• Asset Tracking: $480m by 2010; $4.0 Billion Commercially (RFID in 2006 2.4B) (Total Asset protection $14.7B) (Source: Lehman report 2006)
• QA/Operational Efficiency: $700m (source: Envysion, Arrowsight, corporate analysis)
• Intelligent Marketing: $200m (source: T3CI corporate analysis)
• Compliance: $450m (source: JP Freeman)
• BI: $400m (Reporting and Analysis 4B) (Total BI market $13.3B) (source: IDC BI tools 03-08)
• Training: $600m (source: Forrester Enterprise Software report 2005)
• Government (Intelligence, Defense, Homeland Security)
Multimedia Capabilities:
       1985


• Record
• Store
• Play
• Random Seek
• Annotate Manually


Multimedia Capabilities:
       2009

• Record
• Store
• Stream
• Play
• Random Seek
• Annotate Manually

Multimedia Capabilities:
      Wanted
       • Semantic Navigation
       • Search
       • Content Compare
       • Object Cut & Paste
       • Annotate Automatically
       • Infer over Content

=> Make multimedia “understandable”
for computers.
Problems


• Multimedia data is very dense, so manual
  annotation is not feasible.
• Multimedia content analysis is
  difficult and rarely good enough to
  create reliable products.


My Research...
Layers (top to bottom):
• Semantic Web: Network, Knowledge
• Semantic Computing: Context, Understanding
• Artificial Intelligence: Machine Learning, Recognition
• Signal/Text Processing: Filtering, Features
• Input data: Images, Audio, Video, Text
My Research...


Hypotheses:
• Multimedia content analysis works
  better when every cue is taken into
  account (e.g., video AND audio).
• Semantics is enabled through
  context. This converts AI research into
  products.
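
To make the first hypothesis concrete, here is a minimal sketch of one way to combine cues from two modalities: a weighted sum of per-class scores (often called late fusion). This is an illustration only, not a method taken from the slides; the function names, the score dictionaries, and the `audio_weight` parameter are all hypothetical.

```python
def fuse_scores(audio_scores, video_scores, audio_weight=0.5):
    """Weighted sum of per-class scores from two modalities (hypothetical helper).

    Only classes scored by both modalities are kept.
    """
    classes = audio_scores.keys() & video_scores.keys()
    return {
        c: audio_weight * audio_scores[c] + (1.0 - audio_weight) * video_scores[c]
        for c in classes
    }


def decide(audio_scores, video_scores, audio_weight=0.5):
    """Return the class with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return max(fused, key=fused.get)


# Made-up scores: audio alone slightly prefers "alice",
# while video strongly prefers "bob".
audio = {"alice": 0.55, "bob": 0.45}
video = {"alice": 0.20, "bob": 0.80}
print(decide(audio, video))
```

In this toy case the weak audio preference is overridden by the stronger video evidence, which is the kind of effect the hypothesis relies on.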
Context
Sources of Context:
• Inclusion of prior knowledge
• Combination of algorithms
• Multimodality:
  – audio+video+...
  – extra hardware
• Human interaction
• ...

Context as Key:
 Example 1



      →   Cut          Horse    →
          Paste   ^V   Meadow




Visual Object Extraction

Simple Interactive
        Object Extraction (SIOX)


           →                   →




Image          User Input           Output


  Context delivered by human interaction
SIOX: Algorithm Idea
                   Color Signatures from image retrieval:




Y. Rubner, C. Tomasi, and L. J. Guibas: The Earth Mover’s Distance as a Metric for Image
Retrieval. Int. Journal of Computer Vision, 40(2):99–121, 2000.


Idea: Instead of searching an image database, use Color
          Signatures to search inside an image.
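
The idea can be sketched roughly as follows. This is a heavily simplified illustration and does not reproduce the published SIOX method: it assumes plain RGB tuples, builds each "signature" by averaging colors that fall into the same coarse color cell, and labels every pixel by whichever signature (foreground or background) contains the closest color. The function names and the `cell` parameter are hypothetical.

```python
from math import dist


def build_signature(samples, cell=32):
    """Group sample colors into coarse RGB cells; return one mean color per cell."""
    cells = {}
    for r, g, b in samples:
        key = (r // cell, g // cell, b // cell)
        cells.setdefault(key, []).append((r, g, b))
    return [
        tuple(sum(color[i] for color in group) / len(group) for i in range(3))
        for group in cells.values()
    ]


def nearest_distance(color, signature):
    """Euclidean distance from a color to the closest signature entry."""
    return min(dist(color, entry) for entry in signature)


def segment(image, fg_samples, bg_samples):
    """Return a mask that is True where a pixel is closer to the foreground signature."""
    fg_sig = build_signature(fg_samples)
    bg_sig = build_signature(bg_samples)
    return [
        [nearest_distance(px, fg_sig) < nearest_distance(px, bg_sig) for px in row]
        for row in image
    ]


# Tiny 2x2 image: reddish pixels on the left, bluish pixels on the right.
image = [[(200, 30, 30), (20, 20, 220)],
         [(210, 40, 35), (25, 30, 200)]]
mask = segment(image, fg_samples=[(205, 35, 30)], bg_samples=[(20, 25, 210)])
print(mask)
```

Here the user-marked samples play the role of the context delivered by human interaction: they define what "foreground" and "background" look like for this particular image.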



SIOX in GIMP
             SIOX
            Button




G. Friedland, K. Jantz, T. Lenz, F. Wiesel, R. Rojas: “Object Cut and Paste in
Images and Videos”, International Journal of Semantic Computing, Vol 1,
No 2, pp. 221-247, World Scientific, USA, June 2007.
SIOX in Inkscape




SIOX in Blender




Extensions
Extracting multiple similar
objects at once:




          →




Sub-Pixel Refinement
      Problem: Spill colors and foreground
      disappearance



           →



Original          SIOX         GraphCut


           →



Sub-Pixel Refinement
Detail Refinement Brush:
Coarse Interaction



                    →




                    →




VideoSIOX

1st Frame:




Subsequent
Frames:


More Information



 http://www.siox.org




Shoesurfer




Context as Key:
Example 2




Speaker Diarization: Who
            Spoke When?
            Audiotrack:


             Segmentation:




             Clustering:



G. Friedland, O. Vinyals, Y. Huang, C. Müller: “Prosodic and other Long-Term
Features for Speaker Diarization”, IEEE Transactions on Audio, Speech, and
Language Processing, Vol 17, No 5, pp. 985-993, July 2009.
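
The two stages named above, segmentation and clustering, can be illustrated with a toy sketch. It is not the system from the cited paper: it assumes a single scalar feature per audio frame (a stand-in for real acoustic or prosodic features), cuts the track into fixed-length segments, and merges segments greedily by the distance between their mean values. All names and the `threshold` parameter are hypothetical.

```python
def segment_means(features, seg_len):
    """Split a 1-D feature sequence into fixed-length segments; return each mean."""
    means = []
    for start in range(0, len(features), seg_len):
        chunk = features[start:start + seg_len]
        means.append(sum(chunk) / len(chunk))
    return means


def cluster(means, threshold):
    """Greedy agglomerative clustering over segment indices.

    Repeatedly merges the two clusters with the closest centroids and stops
    once the closest pair is further apart than `threshold`.
    """
    clusters = [[i] for i in range(len(means))]

    def centroid(members):
        return sum(means[i] for i in members) / len(members)

    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = abs(centroid(clusters[a]) - centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return clusters


def diarize(features, seg_len, threshold):
    """Return one speaker label per segment ("who spoke when")."""
    means = segment_means(features, seg_len)
    labels = [None] * len(means)
    # Number speakers in order of their first appearance.
    for speaker, members in enumerate(sorted(cluster(means, threshold), key=min)):
        for i in members:
            labels[i] = speaker
    return labels


# Made-up track: low-valued frames, then high-valued frames, then low again.
track = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0, 1.0, 1.0, 1.1, 0.9]
print(diarize(track, seg_len=4, threshold=1.0))
```

The first and third segments end up with the same label because their feature means are close, while the middle segment is assigned to a second speaker.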
Analyzing Meetings




Dominance Estimation
I Know You...



http://www.icsi.berkeley.edu/~fractor/ioda_demo.avi




Narrative Theme Navigation




G. Friedland, L. Gottlieb, A. Janin: “Joke-o-mat: Browsing Sitcoms Punchline by
Punchline”, Proceedings of ACM Multimedia, Beijing, China, October 2009.
Joke-O-Mat: Demo




http://www.youtube.com/watch?v=1qfa84Ulm5s




Connecting Multimedia
and Semantic Technologies
   GStreamer
   Appscio
   Source (Device Driver, File) → User Component 1, User Component 2, ..., User Component n → Recorder




Semantic Media
Framework
Pipeline Framework:
• C/C++/Java Interface
• Custom Event Source 1, Custom Event Source 2, ..., Custom Event Source n
Integrated Development Environment
Video Application Server:
• Web Technology Interface
• Scripting & Logic Engine
• Services Connector
Arrow labels: Events, Code


  http://www.appscio.com
Semantic Analysis of
Multimedia Data
• enables automatic logical
  inference on perceptually
  encoded data
• enables more “natural”
  interaction with the computer:
  “do what the user means”
• Interfaces nicely with Semantic
  Web technologies

A note...




            James A. Hendler


MySTT



 Open-Source, open-model,
 state-of-the-art speech
 recognizer for multiparty
 conversations.

 Release Date: February 2010
4th IEEE International
  Conference on Semantic
  Computing 2010




Paper Deadline: May 3rd, 2010
Upcoming...




Thank You!
Questions?
Contact:
Dr. Gerald Friedland
International Computer Science Institute
Berkeley, CA
http://www.gerald-friedland.org
friedland@icsi.berkeley.edu

Semantics And Multimedia