Vidivideo and IM3I
Presentation given by Marco Bertini at the first EUscreen Open Workshop in Mykonos, Greece, on June 23 and 24, 2010, on the VidiVideo and IM3I projects

Published in Technology, Education
Transcript

  • 1. Automatic Metadata Extraction. Marco Bertini, Università di Firenze - MICC, www.micc.unifi.it. Thursday, 1 July 2010
  • 2. The problem. The massive increase in digital audio-visual information places high demands on advanced storage and search engines for consumers and professional archives. Video is now a natural form of communication for the Internet and mobile devices. Video search engines are the product of progress in many technologies: visual and audio analysis, machine learning techniques, as well as visualization and interaction.
  • 3. Two solutions: www.vidivideo.info and www.im3i.eu
  • 4. VidiVideo: project overview. The VidiVideo project addressed the challenge of creating substantially enhanced semantic access to video, implemented in a search engine. The outcome of the project is an audio-visual search engine composed of two parts: an automatic annotation part that runs off-line, in which detectors for more than 1000 semantic concepts are collected in a thesaurus to process and automatically annotate the video, and an interactive part that provides a video search engine for both technical and non-technical users.
  • 5. VidiVideo: project results. The automatic annotation part of the system performs audio and video segmentation, speech recognition, speaker clustering and semantic concept detection. The VidiVideo system has achieved the highest performance in the most important international object and concept recognition contests (PASCAL VOC and TRECVID). The interactive part provides desktop-based and web-based search engines. The system permits different query modalities (free text, natural language, graphical composition of concepts using boolean and temporal relations, and query by visual example) and visualizations for video retrieval and browsing.
  • 6. VidiVideo: project partners. Call identifier: FP7-SME-2010-1. Submitted: 03 December 2009. Co-ordinating person: Dr.-Ing. Georgios Ioannidis (e-mail gi@in-two.com, fax +49-179-33-2286677).
    No. | Participant name | Type | Short name | Country
    1 | IN2 search interfaces development Ltd | SME | IN2 | UK
    2 | spring techno GmbH | SME | SPRING | DE
    3 | VISup Srl | SME | VISUP | IT
    4 | Hogeschool voor de Kunsten Utrecht | RTDP | HKU | NL
    5 | University Firenze | RTDP | UNIFI | IT
    6 | Instituto de Engenharia de Sistemas e Computadores | RTDP | INESC-ID | PT
  • 7. IM3I: project overview. IM3I aims to provide the creative media sector with new ways of searching, summarising and visualising large multimedia archives. IM3I will provide a service-oriented architecture that allows multiple viewpoints on the multimedia data available in a repository, and provides better ways to interact with and share rich media. This paves the way for a multimedia information management platform which is more flexible, adaptable and customisable than current repository software. This in turn enables new opportunities for content owners to exploit their digital assets.
  • 8. IM3I: project results • Developed a set of tools for automatic audio-visual annotation and search • Developed a set of web services to manage, create and orchestrate the indexing services • Developed a set of specialized search and management interfaces • IM3I authoring platform: allows professional users to import and publish repositories of digital media, author web-based environments for end users, and create elaborate workflow patterns and search & retrieval interfaces that support a diversity of end-user interactions and scenarios
  • 9. IM3I: project partners
  • 10. The IM3I backend
  • 11. Visual annotation • Split a video by detecting shots and large content changes with a very fast algorithm • Use different annotation strategies and types of detectors: • low level (color, B/W, motion) • Haar-based boosted classifiers • HOG + SVMs • Bag-of-words • k-NN + voting (for tag suggestion) • simple MPEG-7 XML format (full and fragment)
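
The slide does not say which shot detection algorithm is used. As a minimal sketch of one common fast approach, the example below compares intensity histograms of consecutive frames and reports a new shot when they differ strongly. The function names, the flat-list frame representation, the bin count and the threshold are all assumptions made for illustration, not details of the IM3I system.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    hist = [0] * bins
    for value in frame:
        hist[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in hist]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return the indices of frames that start a new shot (hypothetical threshold)."""
    if not frames:
        return []
    boundaries = []
    previous = gray_histogram(frames[0])
    for index in range(1, len(frames)):
        current = gray_histogram(frames[index])
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(index)
        previous = current
    return boundaries


# Toy "video": two dark frames, two bright frames, one dark frame.
dark = [10, 20, 15, 12]
bright = [240, 250, 245, 230]
frames = [dark, dark, bright, bright, dark]
shot_starts = detect_shot_boundaries(frames)
```

With this toy input, boundaries are reported where the content switches between dark and bright frames.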
  • 12. Baseline: typical BoW [Diagram labels: feature extraction, hierarchical clustering, visual words, histogram, learning]
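
The histogram step of a bag-of-words pipeline can be sketched as follows, assuming the visual vocabulary has already been produced by a clustering step. The two-dimensional descriptors and the three-word vocabulary are toy values chosen for illustration; real SIFT or SURF descriptors have 64 or 128 dimensions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return a normalized histogram."""
    counts = [0] * len(vocabulary)
    for descriptor in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda i: squared_distance(descriptor, vocabulary[i]),
        )
        counts[nearest] += 1
    total = sum(counts)
    return [count / total for count in counts] if total else counts


# Hypothetical vocabulary (cluster centres) and local descriptors of one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.8, 0.9), (0.1, 0.9)]
histogram = bow_histogram(descriptors, vocabulary)
```

The resulting fixed-length histogram is what a classifier would receive in the learning stage.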
  • 13. Fusion schemes • Early fusion: integrates unimodal features before learning concepts. • Late fusion: first reduces unimodal features to separately learned concept scores, then integrates these scores to learn concepts.
  • 14. Fusion schemes
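
The difference between the two schemes is where the combination happens. In the sketch below, early fusion concatenates normalized unimodal feature vectors so that a single learner sees them together, while late fusion combines scores that separate learners have already produced. The plain mean used for late fusion is an assumption standing in for the second learning stage mentioned on the slide, and all names and values are illustrative.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so that no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def early_fusion(unimodal_features):
    """Concatenate normalized unimodal feature vectors into one vector for a single learner."""
    fused = []
    for vector in unimodal_features:
        fused.extend(l2_normalize(vector))
    return fused


def late_fusion(unimodal_scores):
    """Combine per-modality concept scores; the mean is a stand-in for a learned combiner."""
    return sum(unimodal_scores) / len(unimodal_scores)


visual = [3.0, 4.0]   # hypothetical visual feature vector
audio = [0.0, 2.0]    # hypothetical audio feature vector
fused_vector = early_fusion([visual, audio])   # input to one classifier
fused_score = late_fusion([0.8, 0.4])          # outputs of two classifiers
```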
  • 15. Early fusion approach [Diagram label: hierarchical clustering] • Hypothesis: MSERs isolate semantically relevant information. • Idea: represent points by their spatial relation to regions (inside, outside, or just on the border) • Sampling: SIFT-SURF, dense.
  • 16. Late fusion approach [Diagram: two hierarchical clustering branches whose outputs are combined] • Use SURF/SIFT + MSER • Use geometric descriptors for MSERs
  • 17. Test: baseline [Table columns: Method, Sampling, # points, Time (two columns), Avg. accuracy, Max accuracy] • Best: SURF 64 Grid 10 (trade-off between accuracy and computational cost) • SURF 64 Grid 5: +7-8% accuracy, +300% time • the number of points influences accuracy
  • 18. Test: early fusion [Table columns: Method, Sampling, # points, Time (two columns), Avg. accuracy, Max accuracy] • Best: EF SURF 64 Grid 10 (trade-off between accuracy and computational cost) • EF SURF 64 Borders: many points, accuracy close to that of Grid 10 but higher computational cost • EF SURF 64 Grid 10 is worse than SURF 64 Grid 10, but much faster (50% of the execution time)
  • 19. Test: late fusion [Table columns: Method 1, Method 2, Accuracy] • weights of 0.6 (best method) and 0.4 (worst method) lead to good results • best performance: dense sampling + sparse sampling • best combination: SURF 64 + EF SURF 64 Grid 10 (improved accuracy, modest increase in computational cost)
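
The 0.6 / 0.4 weighting can be illustrated with a small sketch. It assumes that each method outputs a score per concept and that "best" and "worst" are decided by accuracy on some validation data; the function name, the concept names and all numbers are invented for the example.

```python
def fuse_two_methods(scores_1, accuracy_1, scores_2, accuracy_2,
                     w_best=0.6, w_worst=0.4):
    """Weight the more accurate method by w_best and the other by w_worst."""
    if accuracy_1 >= accuracy_2:
        w1, w2 = w_best, w_worst
    else:
        w1, w2 = w_worst, w_best
    return {
        concept: w1 * scores_1[concept] + w2 * scores_2[concept]
        for concept in scores_1
    }


# Hypothetical per-concept scores from two detectors and their validation accuracies.
scores_dense = {"car": 0.9, "person": 0.2}
scores_sparse = {"car": 0.5, "person": 0.6}
fused = fuse_two_methods(scores_dense, 0.70, scores_sparse, 0.62)
```

Here the first method is the more accurate one, so its scores receive the 0.6 weight.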
  • 20. Conclusions • Early fusion strategies: • accuracy close to the baseline • faster • Late fusion strategies: • better accuracy than the baseline • each method corrects some errors made by the other • fuse keypoints/regions (SURF, fusion of SURF and MSER) • IM3I users will be able to choose what's best for them
  • 21. The users
  • 22. Video search engine. Our goal is to provide a video search engine for both technical and non-technical users. It offers different interfaces that permit different query modalities: free text, natural language, graphical composition of concepts using boolean and temporal relations, and query by visual example. In addition, it exploits ontologies and their structure to encode semantic relations between concepts, which makes it possible, for example, to expand queries to synonyms and concept specializations.
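
Query expansion over an ontology can be sketched as a traversal that collects synonyms and specializations of the queried concept. The dictionary below is a toy stand-in for a real ontology; its structure, the concept names and the function name are assumptions made for illustration.

```python
# Hypothetical miniature ontology: each concept lists synonyms and specializations.
ONTOLOGY = {
    "car": {"synonyms": ["automobile"], "specializations": ["taxi", "police car"]},
    "taxi": {"synonyms": ["cab"], "specializations": []},
}


def expand_query(concept, ontology):
    """Return the concept plus all synonyms and specializations reachable from it."""
    expanded = []
    pending = [concept]
    while pending:
        term = pending.pop(0)
        if term in expanded:
            continue  # guard against cycles and repeated terms
        expanded.append(term)
        entry = ontology.get(term, {})
        pending.extend(entry.get("synonyms", []))
        pending.extend(entry.get("specializations", []))
    return expanded


terms = expand_query("car", ONTOLOGY)
```

A search for "car" would then also match shots annotated with any of the expanded terms.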
  • 23. Sirio and Orione • Design goals/assumptions: • semantic content-based retrieval • efficient web-based interface • System features: • Sirio is a Rich Internet Application (in Adobe Flex) front end • Orione is a web service search engine • Support for multiple ontologies and ontology reasoning • Results are in Media RSS format (queries treated as RSS feeds) • New search engine able to scale to a large number of instances of ontology concepts • System interface query options: • ontology exploration using a graph-based view • compact keyframe-based results presentation / streaming videos • concept drag & drop facility (to build complex queries) • natural language query (with Boolean/temporal operators) • free text query (for Google-like search)
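
Returning results as Media RSS means that each query can be consumed like a feed. The sketch below builds such a feed with Python's standard library, using the Media RSS namespace for the `media:content` and `media:thumbnail` elements. The result fields, URLs and channel texts are hypothetical and do not reflect Orione's actual output.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def results_to_media_rss(query, results):
    """Serialize search results as an RSS 2.0 feed with Media RSS elements."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Results for: {query}"
    ET.SubElement(channel, "link").text = "http://example.org/search"  # placeholder
    ET.SubElement(channel, "description").text = "Video search results"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, f"{{{MEDIA_NS}}}content",
                      {"url": result["video_url"], "type": "video/mp4"})
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail",
                      {"url": result["keyframe_url"]})
    return ET.tostring(rss, encoding="unicode")


feed = results_to_media_rss("car AND street", [{
    "title": "Shot 12",
    "video_url": "http://example.org/v/12.mp4",
    "keyframe_url": "http://example.org/k/12.jpg",
}])
```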
  • 24. Sirio and Orione
  • 25. Sirio and Orione
  • 26. Sirio and Orione
  • 27. Sirio and Orione
  • 28. Sirio and Orione
  • 29. Sirio and Orione
  • 30. Sirio and Orione
  • 31. Sirio and Orione
  • 32. Andromeda • Design goals/assumptions: • semantic content-based browsing • efficient web-based interface using RIA • System features: • Query manager as a Rich Internet Application (in Adobe Flex), which connects to a web service (search engine) • Support for multiple ontologies and ontology reasoning • System interface query options: • Shows the concepts with the most instances in a concept cloud view • Graph representation of the semantic data structure • Multiple automatic layout algorithms for spatial positioning, and manual drag & drop • Thumbnails view of the instances of each concept • Access to video metadata and video streaming • Access to social content related to ontology concepts (Flickr, YouTube, and real-time tweets from Twitter)
  • 33. Andromeda
  • 34. Andromeda
  • 35. Andromeda
  • 36. Andromeda
  • 37. Andromeda
  • 38. Andromeda
  • 39. Pan • Design goals/assumptions: • complete/correct automatic annotations • help in training new automatic concept detectors • System features: • Rich Internet Application (in Adobe Flex) • video streaming using the same system as Sirio and Andromeda • new backend • geotagging using Google Maps • System interface options: • Integrated with the web-based search engine and automatic video annotation • Multiple user profiles: a simple user may change their own annotations, while a super user can import the annotations of other users, e.g. to supervise the annotation process within an organization
  • 40. Pan
  • 41. Pan
  • 42. Pan
  • 43. Pan
  • 44. Pan
  • 45. Pan
  • 46. Daphnis • Design goals/assumptions: • build on image tagging made popular by Flickr and tag clouds • connect to social web sites • allow CBIR • System features: • Rich Internet Application (in Adobe Flex) • Connects to Flickr (and also Facebook, if needed) • Approximate nearest neighbour search using MPEG-7 descriptors, to scale to a large number of images • System interface options: • users can tag images and retrieve images based on tags, or use tags to filter the results of similarity-based retrieval • Ongoing work: • merging with automatic video annotation for automatic tagging • adoption of mechanisms for tag suggestion, based on recent research work in this field (using content, tags and geolocalization)
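
Tag suggestion by k-NN and voting can be sketched as follows: find the tagged images whose descriptors are closest to the query image and propose the tags that occur most often among them. The example uses exact brute-force search where the slide mentions approximate nearest neighbour search, and toy two-dimensional vectors in place of MPEG-7 descriptors; the names and data are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def suggest_tags(query_descriptor, tagged_images, k=3, max_tags=2):
    """Let the k nearest tagged images vote for the tags to suggest."""
    neighbours = sorted(
        tagged_images,
        key=lambda image: squared_distance(query_descriptor, image["descriptor"]),
    )[:k]
    votes = Counter(tag for image in neighbours for tag in image["tags"])
    return [tag for tag, _ in votes.most_common(max_tags)]


# Hypothetical collection of already tagged images.
collection = [
    {"descriptor": (0.10, 0.10), "tags": ["beach", "sea"]},
    {"descriptor": (0.20, 0.10), "tags": ["beach", "sunset"]},
    {"descriptor": (0.15, 0.20), "tags": ["beach", "sea"]},
    {"descriptor": (0.90, 0.90), "tags": ["city"]},
]
suggested = suggest_tags((0.10, 0.15), collection)
```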
  • 47. Daphnis
  • 48. Daphnis
  • 49. Daphnis
  • 50. Daphnis
  • 51. IM3I: authoring platform. A CMS approach to repository analysis, authoring and publication
  • 52. IM3I: authoring platform. Authoring IM3I end-user functionality typically covers five distinct stages: • Importing an existing repository from RSS and various XML streams • Extending the associated datamodel • Editing layout and editing features • Editing search and retrieval interfaces • Embedding the IM3I end-user interfaces in a (corporate) website
  • 53. Editing workflow demo • Step 1: Importing a video repository • Step 2: Enhancing the datamodel • Step 3: Authoring layouts • Step 4: Publishing the repository
  • 54. I: Importing a repository • Importing an existing repository into an internal and flexible datamodel • Aggregating and harmonizing multiple repositories • Visualisation of markup and preview of contents • Flexible mapping by drag-and-drop
  • 55. I: Importing a repository. Mapping the contents of video RSS to an IM3I Datamodel
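
The mapping step can be pictured as a table from source tags to datamodel fields. The sketch below parses a small RSS document and renames its item fields according to such a table; in the IM3I tool this mapping is done by drag-and-drop, so the mapping dictionary, the target field names and the sample feed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical video RSS feed from an existing repository.
RSS = """<rss version="2.0"><channel><title>Archive</title>
<item><title>Evening news</title><link>http://example.org/v/1</link>
<pubDate>Thu, 01 Jul 2010 20:00:00 GMT</pubDate></item>
</channel></rss>"""

# Source tag -> datamodel field (both sides are illustrative names).
FIELD_MAPPING = {
    "title": "programme_title",
    "link": "media_uri",
    "pubDate": "broadcast_date",
}


def import_items(rss_text, mapping):
    """Convert each RSS item into a record keyed by datamodel field names."""
    root = ET.fromstring(rss_text)
    records = []
    for item in root.iter("item"):
        record = {}
        for source_tag, target_field in mapping.items():
            node = item.find(source_tag)
            if node is not None and node.text:
                record[target_field] = node.text.strip()
        records.append(record)
    return records


records = import_items(RSS, FIELD_MAPPING)
```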
  • 56. II: Enhancing the Datamodel • Datamodels contain the descriptions of your repository and in this way stipulate what can be shown to, or retrieved by, an end user. • Datamodels can reference each other • Datamodels can be extended over time by adding elements • Elements are based on types: media files, URIs, date, string, etc. • Elements can be shared across datamodels to allow search & retrieval across multiple collections
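
The relationship between datamodels and typed, shareable elements can be sketched with two small classes. This is only an illustration of the ideas listed above, not the IM3I data structures; the class names, element names and the helper function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    """A typed element, e.g. of type "media file", "URI", "date" or "string"."""
    name: str
    type: str


@dataclass
class Datamodel:
    """A named collection of elements that can be extended over time."""
    name: str
    elements: list = field(default_factory=list)

    def add_element(self, element):
        if element not in self.elements:
            self.elements.append(element)


def shared_elements(first, second):
    """Elements present in both datamodels, usable for cross-collection search."""
    return [element for element in first.elements if element in second.elements]


title = Element("title", "string")
news = Datamodel("news_videos", [title, Element("broadcast_date", "date")])
documentaries = Datamodel("documentaries", [title])
documentaries.add_element(Element("translation", "string"))  # extending a datamodel
common = shared_elements(news, documentaries)
```

Because the `title` element is shared, a single query on it could span both collections.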
  • 57. II: Enhancing the Datamodel. Adding a ‘translation’ element to the datamodel
  • 58. II: Enhancing the Datamodel. Adding a ‘translation’ element to the datamodel
  • 59. III: Layout and Functionality. Easy manipulation of a repository's layout by: • Table metaphor (easy editing of table characteristics) • Drag and drop of graphical elements • Drag and drop of repository contents into cells • Easy manipulation of look and feel • Easy addition of editing functionalities to a layout • Easy preview and markup functionalities
  • 60. III: Layout and Functionality. Defining a layout table
  • 61. III: Layout and Functionality. Dragging repository contents to layout
  • 62. III: Layout and Functionality. Previewing layout
  • 63. IV: Embedding in website. Easy blending of layouts into corporate websites • By means of plugins for CMSs (e.g. WebManager, WordPress, Typo3) • Using <embed> </embed> • Allowing for elaborate workflow patterns by combining multiple layouts
  • 64. IV: Embedding in website [Screenshot labels: original contents; added translation functionality]
  • 65. The super users
  • 66. Atlante - process manager • Web application that is used for the creation, technical administration and monitoring of IM3I processing pipelines (e.g. automatic annotation process, media transcoding, etc.) • This web application has multiple user profiles: • managers • administrators • Main functions of this application are: • creation of a new type of (distributed) process • parameter setting for a new type of process • creation of a “Multiprocess” composed of sets of single (distributed) Processes • starting/pausing/stopping a process • monitoring running processes
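
The process life cycle listed above (start, pause, stop, monitor) and the idea of a "Multiprocess" built from single processes can be sketched as a small state machine. The state names, the allowed transitions and both classes are assumptions made for illustration and are not taken from Atlante.

```python
class Process:
    """A single processing step with a small start/pause/stop state machine."""

    # (current state, action) -> next state; hypothetical transition table.
    TRANSITIONS = {
        ("created", "start"): "running",
        ("running", "pause"): "paused",
        ("paused", "start"): "running",
        ("running", "stop"): "stopped",
        ("paused", "stop"): "stopped",
    }

    def __init__(self, name, params=None):
        self.name = name
        self.params = dict(params or {})
        self.state = "created"

    def apply(self, action):
        key = (self.state, action)
        if key not in self.TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state} process")
        self.state = self.TRANSITIONS[key]


class Multiprocess:
    """A set of single processes that are controlled and monitored together."""

    def __init__(self, name, processes):
        self.name = name
        self.processes = list(processes)

    def apply(self, action):
        for process in self.processes:
            process.apply(action)

    def monitor(self):
        return {process.name: process.state for process in self.processes}


pipeline = Multiprocess("ingest", [
    Process("transcoding", {"format": "mp4"}),
    Process("annotation"),
])
pipeline.apply("start")
pipeline.apply("pause")
status = pipeline.monitor()
```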
  • 67. Atlante
  • 68. Atlante
  • 69. Atlante
  • 70. Gaia - media manager • Web application that will be used for technical administration and monitoring of the database • Main functions of this application are: • media management • configuration of metadata, broadcasters, annotation types, concept types and media types • monitoring of media annotations by the technical backend
  • 71. Gaia
  • 72. Gaia
  • 73. One more thing...
  • 76. ACM MM 2010 Workshop: 3rd International Workshop on Automated Information Extraction in Media Production (AIEMPro'10). Organizers: • Dr. Robbie De Sutter, Vlaamse Radio- en Televisieomroep - Medialab • Jean-Pierre Evain, European Broadcasting Union / Union Européenne de Radiotélévision • Dr. Gerald Friedland, ICSI (International Computer Science Institute) • Dr. Alberto Messina, RAI Radiotelevisione Italiana, Centre for Research and Technological Innovation • Dr. Masanori Sano, NHK (Japan Broadcasting Corporation) Science and Technology Research Laboratories