Automatic Metadata Extraction

                                Marco Bertini
                         Università di Firenz...
Vidivideo and IM3I


Presentation given by Marco Bertini on the VidiVideo and IM3I projects at the first EUscreen Open Workshop in Mykonos, Greece, on June 23 and 24, 2010.

Published in: Technology, Education

Vidivideo and IM3I

  1. Automatic Metadata Extraction
     Marco Bertini, Università di Firenze - MICC, www.micc.unifi.it
     Thursday, 1 July 2010
  2. The problem
     The massive increase in digital audio-visual information poses high demands on advanced storage and search engines for consumers and professional archives. Video is now a natural form of communication for the Internet and mobile devices. Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction.
  3. Two solutions
     www.vidivideo.info
     www.im3i.eu
  4. VidiVideo: project overview
     The VidiVideo project addressed the challenge of creating substantially enhanced semantic access to video, implemented in a search engine. The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part that runs off-line, in which detectors for more than 1000 semantic concepts, collected in a thesaurus, process and automatically annotate the video; and an interactive part that provides a video search engine for both technical and non-technical users.
  5. VidiVideo: project results
     The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection. The VidiVideo system has achieved the highest performance in the most important international object and concept recognition contests (PASCAL VOC and TRECVID). The interactive part provides desktop-based and web-based search engines. The system permits different query modalities (free text, natural language, graphical composition of concepts using Boolean and temporal relations, and query by visual example) and visualizations for video retrieval and browsing.
  6. VidiVideo: project partners
     Call Identifier: FP7-SME-2010-1
     Submitted: 03 December 2009
     Name of the co-ordinating person: Dr.-Ing. Georgios Ioannidis
     E-Mail: gi@in-two.com
     Fax: +49-179-33-2286677

     No.  Participant Name                                     Type  Short Name  Country
     1    IN2 search interfaces development Ltd                SME   IN2         UK
     2    spring techno GmbH                                   SME   SPRING      DE
     3    VISup Srl                                            SME   VISUP       IT
     4    Hogeschool voor de Kunsten Utrecht                   RTDP  HKU         NL
     5    University Firenze                                   RTDP  UNIFI       IT
     6    Instituto de Engenharia de Sistemas e Computadores   RTDP  INESC-ID    PT
  7. IM3I: project overview
     IM3I aims to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives. IM3I will provide a service-oriented architecture that allows multiple viewpoints upon the multimedia data available in a repository, and provides better ways to interact with and share rich media. This paves the way for a multimedia information management platform that is more flexible, adaptable and customisable than current repository software. This in turn enables new opportunities for content owners to exploit their digital assets.
  8. IM3I: project results
     • Developed a set of tools for automatic audio-visual annotation and search
     • Developed a set of web services to manage, create and orchestrate the indexing services
     • Developed a set of specialized search and management interfaces
     • IM3I authoring platform: allows professional users to import and publish repositories of digital media, author web-based environments for end-users, and create elaborate workflow patterns and search & retrieval interfaces that support a diversity of end-user interactions and scenarios
  9. IM3I: project partners
  10. The IM3I backend
  11. Visual annotation
      • Split a video by detecting shots and large content changes with a very fast algorithm
      • Use different annotation strategies and types of detectors:
        • low level (color, B/W, motion)
        • Haar-based boosted classifiers
        • HOG + SVMs
        • Bag-of-words
        • k-NN + voting (for tag suggestion)
      • simple MPEG-7 XML format (full and fragment)
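
The "k-NN + voting" strategy for tag suggestion can be illustrated with a minimal sketch. The descriptor values, the Euclidean distance, and the function name `suggest_tags` are assumptions made for illustration; the slides do not say which features or distance the IM3I detectors use.

```python
import math
from collections import Counter

def suggest_tags(query, labelled, k=3, top=2):
    """Return the `top` most voted tags among the k nearest labelled items."""
    # Rank labelled items by Euclidean distance to the query descriptor.
    nearest = sorted(labelled, key=lambda item: math.dist(query, item[0]))[:k]
    # Each neighbour votes once for every tag it carries.
    votes = Counter(tag for _, tags in nearest for tag in tags)
    return [tag for tag, _ in votes.most_common(top)]

# Hypothetical 2-D descriptors with their tags.
labelled = [
    ((0.0, 0.0), ("beach", "sea")),
    ((0.1, 0.0), ("beach", "sand")),
    ((0.0, 0.2), ("sea",)),
    ((5.0, 5.0), ("city",)),
]
suggested = suggest_tags((0.05, 0.05), labelled)
```

With these toy values the three nearest items vote twice each for "beach" and "sea", so those two tags are suggested.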
  12. Baseline: typical BoW
      [Pipeline diagram: feature extraction, hierarchical clustering, visual-words histogram, learning]
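
As a rough illustration of the histogram stage of a bag-of-words pipeline, the sketch below assigns each local descriptor to its nearest visual word and counts the assignments. It assumes the vocabulary has already been built (the slide uses hierarchical clustering for that step, which is not reproduced here); the descriptor values and the name `bow_histogram` are hypothetical.

```python
import math
from collections import Counter

def bow_histogram(descriptors, vocabulary):
    """Quantize descriptors to their nearest visual word; return a normalized histogram."""
    counts = Counter(
        # Index of the closest vocabulary entry for descriptor d.
        min(range(len(vocabulary)), key=lambda w: math.dist(d, vocabulary[w]))
        for d in descriptors
    )
    total = len(descriptors)
    return [counts[w] / total for w in range(len(vocabulary))]

# Toy vocabulary of three visual words and four local descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]
hist = bow_histogram(descriptors, vocabulary)
```

The resulting fixed-length histogram is what a learner (for example an SVM) would take as input, regardless of how many keypoints the image produced.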
  13. Fusion schemes
      • Early fusion: integrates unimodal features before learning concepts.
      • Late fusion: first reduces unimodal features to separately learned concept scores, then integrates these scores to learn concepts.
  14. Fusion schemes
      • Early fusion: integrates unimodal features before learning concepts.
      • Late fusion: first reduces unimodal features to separately learned concept scores, then integrates these scores to learn concepts.
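
The structural difference between the two schemes can be sketched as follows. This is a simplification under stated assumptions: a plain dot product stands in for a learned concept detector, the weights are made up, and the late-fusion combiner is a fixed function, whereas the slide describes it as learned.

```python
def linear_score(features, weights):
    """Stand-in for a learned concept detector: a plain dot product."""
    return sum(f * w for f, w in zip(features, weights))

def early_fusion(visual, audio, joint_weights):
    # Concatenate the unimodal features first, then apply ONE detector.
    return linear_score(visual + audio, joint_weights)

def late_fusion(visual, audio, visual_weights, audio_weights, combine):
    # Score each modality separately, then combine the two scores.
    scores = [linear_score(visual, visual_weights), linear_score(audio, audio_weights)]
    return combine(scores)

visual, audio = [1.0, 2.0], [0.5]  # hypothetical unimodal feature vectors
early = early_fusion(visual, audio, [0.2, 0.1, 0.4])
late = late_fusion(visual, audio, [0.3, 0.1], [1.0],
                   combine=lambda s: sum(s) / len(s))
```

In early fusion a single model sees all features at once; in late fusion each modality keeps its own model and only the scores meet.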
  15. Early fusion approach
      [Diagram: hierarchical clustering]
      • Hypothesis: MSERs isolate semantically relevant information.
      • Idea: represent points by their spatial relation to regions: inside, outside, or just on the border.
      • Sampling: SIFT-SURF, dense.
  16. Late fusion approach
      [Diagram: two hierarchical clustering branches whose outputs are combined]
      • Use SURF/SIFT + MSER
      • Use geometric descriptors for MSERs
  17. Test: baseline
      [Results table: method, sampling, number of points, timings, average and maximum accuracy]
      • Best (accuracy, computational cost): SURF 64 Grid 10
      • SURF 64 Grid 5: +7-8% accuracy, +300% time
      • the number of points influences accuracy
  18. Test: early fusion
      [Results table: method, number of points, sampling, timings, average and maximum accuracy]
      • Best (accuracy, computational cost): EF SURF 64 Grid 10
      • EF SURF 64 Borders: many points; accuracy close to that of Grid 10 but higher computational cost
      • EF SURF 64 Grid 10 is worse than SURF 64 Grid 10, but much faster (50% of the execution time)
  19. Test: late fusion
      [Results table: Method 1, Method 2, Accuracy]
      • weights of 0.6 (better method) and 0.4 (worse method) lead to good results
      • best performance: dense sampling + sparse sampling
      • best combination: SURF 64 + EF SURF 64 Grid 10 (improved accuracy, modest increase in computational cost)
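
The 0.6/0.4 weighting reported above amounts to a weighted sum of per-concept scores. A minimal sketch, assuming each method outputs a score per concept; the score values and the name `fuse_scores` are invented for illustration and are not results from the slides.

```python
def fuse_scores(better_scores, worse_scores, w_better=0.6, w_worse=0.4):
    """Weighted sum of per-concept scores from two methods."""
    return {
        concept: w_better * better_scores[concept] + w_worse * worse_scores[concept]
        for concept in better_scores
    }

surf = {"beach": 0.9, "city": 0.2}      # hypothetical scores from the better method
ef_surf = {"beach": 0.4, "city": 0.7}   # hypothetical scores from the worse method
fused = fuse_scores(surf, ef_surf)
predicted = max(fused, key=fused.get)   # concept with the highest fused score
```

Here the two methods disagree, and the larger weight on the better method decides the outcome in favour of "beach".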
  20. Conclusions
      • Early fusion strategies:
        • ~ baseline accuracy
        • faster
      • Late fusion strategies:
        • better accuracy than baseline
        • each method corrects some errors made by the other
        • fuse keypoints/regions (SURF, fusion of SURF and MSER)
      • IM3I users will be able to choose what's best for them
  21. The users
  22. Video search engine
      Our goal is to provide a search engine for videos for both technical and non-technical users. Provide different interfaces that permit different query modalities: free text, natural language, graphical composition of concepts using Boolean and temporal relations, and query by visual example. In addition, exploit ontologies and their structure to encode semantic relations between concepts, which makes it possible, for example, to expand queries to synonyms and concept specializations.
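
Query expansion over an ontology can be sketched with a small dictionary standing in for the real ontology. The concept names, the dictionary layout, and the function `expand_query` are assumptions for illustration; the slides do not describe the ontology format or the reasoning engine used.

```python
# Toy ontology: each concept lists its synonyms and its direct specializations.
ONTOLOGY = {
    "vehicle": {"synonyms": ["conveyance"], "specializations": ["car", "boat"]},
    "car": {"synonyms": ["automobile"], "specializations": []},
    "boat": {"synonyms": [], "specializations": []},
}

def expand_query(concept, ontology):
    """Return the concept plus its synonyms and all transitive specializations."""
    expanded, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue
        expanded.append(current)
        entry = ontology.get(current, {})
        expanded.extend(s for s in entry.get("synonyms", []) if s not in expanded)
        # Specializations are visited in turn so their own synonyms are included.
        stack.extend(entry.get("specializations", []))
    return expanded

terms = expand_query("vehicle", ONTOLOGY)
```

A search for "vehicle" would then also match items annotated with "car", "automobile" or "boat".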
  23. Sirio and Orione
      • Design goals/assumptions:
        • semantic content-based retrieval
        • efficient web-based interface
      • System features:
        • Sirio is a Rich Internet Application (in Adobe Flex) front end.
        • Orione is a web service search engine
        • Support for multiple ontologies and ontology reasoning
        • Results are in Media RSS format (queries treated as RSS feeds)
        • New search engine able to scale to a large number of instances of ontology concepts
      • System interface query options:
        • ontology exploration using a graph-based view
        • compact keyframe-based results presentation / streaming videos
        • concept drag & drop facility (to build complex queries)
        • natural language query (with Boolean/temporal ops.)
        • free text query (for Google-like search)
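
Returning results in Media RSS format means wrapping each hit in an RSS item that carries `media:` elements. The sketch below builds such a feed with the standard library; the result fields (`title`, `video_url`, `keyframe_url`), the URLs, and the function name are hypothetical, and the actual feed layout produced by Orione is not documented in the slides.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # Media RSS namespace
ET.register_namespace("media", MEDIA_NS)

def results_to_media_rss(query, results):
    """Serialize search results as an RSS 2.0 feed with Media RSS elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    for r in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = r["title"]
        # media:content points at the video, media:thumbnail at its keyframe.
        ET.SubElement(item, f"{{{MEDIA_NS}}}content",
                      {"url": r["video_url"], "medium": "video"})
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", {"url": r["keyframe_url"]})
    return ET.tostring(rss, encoding="unicode")

feed = results_to_media_rss("beach AND boat", [
    {"title": "Shot 12",
     "video_url": "http://example.org/v/12.mp4",
     "keyframe_url": "http://example.org/k/12.jpg"},
])
```

Treating a query as a feed lets any RSS-aware client display or subscribe to the results.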
  24. Sirio and Orione
  25. Sirio and Orione
  26. Sirio and Orione
  27. Sirio and Orione
  28. Sirio and Orione
  29. Sirio and Orione
  30. Sirio and Orione
  31. Sirio and Orione
  32. Andromeda
      • Design goals/assumptions:
        • semantic content-based browsing
        • efficient web-based interface using RIA
      • System features:
        • Query manager as a Rich Internet Application (in Adobe Flex). Connects to web service (search engine)
        • Support for multiple ontologies and ontology reasoning
      • System interface query options:
        • Shows the concepts with the most instances in a concept cloud view
        • Graph representation of semantic data structure
        • Multiple automatic layout algorithms for spatial positioning and manual drag & drop
        • Thumbnails view of the instances of each concept
        • Access to video metadata and video streaming
        • Access to social content related to ontology concepts (Flickr, YouTube, and real-time tweets from Twitter)
  33. Andromeda
  34. Andromeda
  35. Andromeda
  36. Andromeda
  37. Andromeda
  38. Andromeda
  39. Pan
      • Design goals/assumptions:
        • complete/correct automatic annotations
        • help in training new automatic concept detectors
      • System features:
        • Rich Internet Application (in Adobe Flex).
        • video streaming using the same system as Sirio and Andromeda
        • new backend
        • geotagging using Google Maps
      • System interface options:
        • Integrated with web-based search engine and automatic video annotation
        • Multiple user profiles: a simple user may change his own annotations, while a super user can import the annotations of other users, e.g. to supervise the annotation process within an organization.
  40. Pan
  41. Pan
  42. Pan
  43. Pan
  44. Pan
  45. Pan
  46. Daphnis
      • Design goals/assumptions:
        • build on image tagging made popular by Flickr and tag clouds
        • connect to social web sites
        • allow CBIR
      • System features:
        • Rich Internet Application (in Adobe Flex).
        • Connects to Flickr (and also Facebook, if needed)
        • Approximate nearest neighbour search using MPEG-7 descriptors, to scale to a large number of images
      • System interface options:
        • users can tag images and retrieve images based on tags, or use tags to filter the results of similarity-based retrieval.
      • Ongoing work:
        • merging with automatic video annotation for automatic tagging
        • adoption of mechanisms for tag suggestion, based on recent research work in this field (use content, tags and geolocalization)
  47. Daphnis
  48. Daphnis
  49. Daphnis
  50. Daphnis
  51. IM3I: authoring platform
      A CMS approach to repository analysis, authoring and publication
  52. IM3I: authoring platform
      Authoring IM3I end-user functionality typically covers five distinct stages:
      • Importing an existing repository from RSS and various XML streams
      • Extending the associated datamodel
      • Editing layout and editing features
      • Editing Search and Retrieval interfaces
      • Embedding the IM3I end-user interfaces in a (corporate) website
  53. Editing workflow demo
      • Step 1: Importing a video repository
      • Step 2: Enhancing the datamodel
      • Step 3: Authoring layouts
      • Step 4: Publishing the repository
  54. I: Importing a repository
      • Importing an existing repository to an internal and flexible datamodel
      • Aggregating and harmonizing multiple repositories
      • Visualisation of markup and preview of contents
      • Flexible mapping by drag-and-drop
  55. I: Importing a repository
      Mapping the contents of video RSS to an IM3I Datamodel
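
Mapping RSS contents to a datamodel can be pictured as a table from RSS tags to datamodel element names, applied to every feed item. The mapping table, the element names and the sample feed below are assumptions for illustration; in the IM3I platform the mapping is defined interactively by drag-and-drop, and its internal representation is not given in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: RSS tag -> datamodel element name.
FIELD_MAP = {"title": "name", "link": "media_file", "pubDate": "date"}

def import_items(rss_xml, field_map):
    """Turn each RSS <item> into a dict keyed by datamodel element names."""
    root = ET.fromstring(rss_xml)
    records = []
    for item in root.iter("item"):
        record = {}
        for rss_tag, element_name in field_map.items():
            node = item.find(rss_tag)
            # Tags missing from the feed item are simply left out of the record.
            if node is not None and node.text:
                record[element_name] = node.text.strip()
        records.append(record)
    return records

RSS = """<rss version="2.0"><channel><item>
<title>Interview</title><link>http://example.org/a.mp4</link>
</item></channel></rss>"""
records = import_items(RSS, FIELD_MAP)
```

Aggregating several repositories would then amount to running the same import with a different mapping table per source.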
  56. II: Enhancing the Datamodel
      • Datamodels contain the descriptions of your repository and in this way stipulate what can be shown to or retrieved by an end-user.
      • Datamodels can reference each other
      • Datamodels can be extended over time by adding elements
      • Elements are based on types: media files, URIs, date, string, etc.
      • Elements can be shared across datamodels to allow search & retrieval across multiple collections
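
The datamodel ideas above (typed elements, extension over time, sharing across datamodels) can be sketched with two small classes. The class names, fields and sample datamodels are assumptions for illustration, not the IM3I schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Element:
    name: str
    type: str  # e.g. "media file", "URI", "date", "string"

@dataclass
class Datamodel:
    name: str
    elements: list = field(default_factory=list)

    def add(self, element):
        """Extend the datamodel with a new element."""
        self.elements.append(element)

def shared_elements(a, b):
    """Elements present in both datamodels; these allow cross-collection search."""
    return [e for e in a.elements if e in b.elements]

title = Element("title", "string")
news = Datamodel("news", [title, Element("broadcast", "date")])
films = Datamodel("films", [title])
films.add(Element("translation", "string"))  # extending a datamodel over time
common = shared_elements(news, films)
```

Because `title` is shared, a single query on that element can retrieve items from both collections.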
  57. II: Enhancing the Datamodel
      Adding a ‘translation’ element to the datamodel
  58. II: Enhancing the Datamodel
      Adding a ‘translation’ element to the datamodel
  59. III: Layout and Functionality
      Easy manipulation of a repository's layout by:
      • Table metaphor (easy editing of table characteristics)
      • Drag and drop of graphical elements
      • Drag and drop of repository contents into cells
      • Easy manipulation of look and feel
      • Easy addition of editing functionalities to a layout
      • Easy preview and markup functionalities
  60. III: Layout and Functionality
      Defining a layout table
  61. III: Layout and Functionality
      Dragging repository contents to layout
  62. III: Layout and Functionality
      Previewing layout
  63. IV: Embedding in website
      Easy blend-in of layouts into corporate websites:
      • By means of plugins for CMSs (e.g. WebManager, WordPress, Typo3)
      • Using <embed> </embed>
      • Allowing for elaborate workflow patterns in combining multiple layouts
  64. IV: Embedding in website
      [Screenshot labels: original contents; added translation functionality]
  65. The super users
  66. Atlante - process manager
      • Web application that is used for creation, technical administration and monitoring of the IM3I processing pipeline (e.g. automatic annotation process, media transcoding, etc.)
      • This web application has multiple user profiles:
        • managers
        • administrators
      • Main functions of this application are:
        • creation of new types of (distributed) process
        • parameter setting for new types of process
        • creation of a “Multiprocess” composed of sets of single (distributed) processes
        • starting/pausing/stopping a process
        • monitoring running processes
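
The process functions listed above (start, pause, stop, monitor, and grouping single processes into a "Multiprocess") suggest a small state machine. The sketch below is one possible reading: the state names, the allowed transitions and the class layout are assumptions, and nothing here reflects the actual Atlante implementation or its distributed execution.

```python
from enum import Enum

class State(Enum):
    CREATED = "created"
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"

# Allowed (action, current state) -> next state; an assumption for this sketch.
TRANSITIONS = {
    ("start", State.CREATED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
}

class Process:
    def __init__(self, name, params=None):
        self.name, self.params, self.state = name, params or {}, State.CREATED

    def apply(self, action):
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state.value} process")
        self.state = TRANSITIONS[key]

class Multiprocess:
    """A set of single processes driven together."""
    def __init__(self, processes):
        self.processes = list(processes)

    def apply(self, action):
        for p in self.processes:
            p.apply(action)

    def monitor(self):
        return {p.name: p.state.value for p in self.processes}

pipeline = Multiprocess([Process("annotation"),
                         Process("transcoding", {"codec": "h264"})])
pipeline.apply("start")
pipeline.apply("pause")
status = pipeline.monitor()
```

Keeping the transitions in one table makes invalid requests, such as restarting a stopped process, fail explicitly instead of silently.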
  67. Atlante
  68. Atlante
  69. Atlante
  70. Gaia - media manager
      • Web application that will be used for technical administration and monitoring of the database
      • Main functions of this application are:
        • media management
        • configuration of metadata, broadcasters, annotation types, concept types and media types
        • monitoring of media annotations by the technical backend
  71. Gaia
  72. Gaia
  73. One more thing...
  74. [no text]
  75. [no text]
  76. ACM MM 2010 Workshop
      3rd International Workshop on Automated Information Extraction in Media Production (AIEMPro'10)
      Organizers:
      • Dr. Robbie De Sutter, Vlaamse Radio- en Televisieomroep - Medialab
      • Jean-Pierre Evain, European Broadcasting Union / Union Européenne de Radiotélévision
      • Dr. Gerald Friedland, ICSI (International Computer Science Institute)
      • Dr. Alberto Messina, RAI Radiotelevisione Italiana, Centre for Research and Technological Innovation
      • Dr. Masanori Sano, NHK (Japan Broadcasting Corporation) Science and Technology Research Laboratories
