Fiat 20080921 results PISA

In the research project PISA we have investigated how powerful search engines can be built, given a library of audiovisual material that has been analysed objectively and intelligently.

Published in: Technology, Travel, Business

  1. PISA – Proof of Concept: Production, Indexing and Search of Audiovisual Material
  2. PISA - Positioning
     • VRT-Medialab (medialab.vrt.be) - technical R&D
     • IBBT (www.ibbt.be) – Interdisciplinary Research Institute
     • PISA – Research Project on Production and Indexing of Audiovisual Media
     • 21 man-years
     • Computer Assisted Manufacturing
     • Unsupervised Feature Extraction
     • Search Engine Technology
  3. Context - Digital Media Production
     [Diagram of the production platform in three layers. Suprastructure: Metadata Management. Production and distribution: Ingest, Editing, Mastering, Media Asset Management, Playout. Infrastructure: Networks and Storage.]
  4. Digital Asset Management, Content Management…
     [Diagram of the production platform in three layers. Suprastructure: Metadata Management. Production and distribution. Infrastructure: Networks and Storage.]
  5. User Expectations
     Assumptions:
     • An item is relevant or it is not
     • A “scene” is the logical unit of search
     The ideal search engine
     • retrieves all relevant items (recall 100%)
     • without false positives (precision 100%)
     • enables instant access to digital media
     • with respect for intellectual property.
     [Diagram labels: Communication (Information); General Data; Metadata; Suprastructure: Metadata Management; Production and distribution; Infrastructure: Networks and Storage; Production Platform.]
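The two quality targets on slide 5 can be made concrete with a small sketch. Under the slide's own assumption that an item is either relevant or not, recall is the share of relevant items that were retrieved, and precision is the share of retrieved items that are relevant. The function name and the scene identifiers below are made up for illustration; returning 1.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Convention (assumption): an empty denominator counts as a perfect score.
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall


# Hypothetical scene ids: four relevant scenes, three retrieved, two in common.
relevant = {"scene-01", "scene-02", "scene-03", "scene-04"}
retrieved = {"scene-01", "scene-02", "scene-09"}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

The "ideal search engine" of the slide corresponds to both values being 1.0 at the same time.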
  6. Archiving – Disclosure, Annotation, …
     [Two screenshots of the legacy archive system, translated from Dutch.]
     Search screen FILM (set 16, 1 result), query on keywords: ibm and vrt. Search fields: archive number, broadcast year/month/day, fragment number, fragment duration, series, format, tape number, episode, episode number, programme, broadcast date, fragment title, text, category, recording date, recording number, journalist, rights holder. Status message: "The strings required for the operation are not defined". Function-key bar (F11 to F20, Enter): Exit, Sets, Refset, Show, Previous, Next/Clear, Thesaurus, Command, Search.
     Retrieved record (page 1 of 3):
     • archive number: ALG 20010813 1
     • fragment number: 1
     • series: 1000 ZONNEN EN GARNALEN
     • tape number: E03024404
     • format: DBCM
     • fragment title: 1000 ZONNEN & GARNALEN
     • picture: KL/PALPLUS
     • fragment duration: 18 20
     • text:
       0'00" Tourist reportage magazine: overview of topics, opening titles
       0'50" Today: artist Luc Hofkens designed an oasis on his roof terrace in Borgerhout that is reminiscent of the Grand Canyon; interview with Luc and his wife Marilou; exterior shots of the roof and surroundings, outside of a worker's house, pan across rock walls, crates with water, planting, photo album showing the progress of the works
       4'00" Junior: Klaartje Alaerts, 13 years old, wants to become an astronaut; she visits the Euro Space Center with space shuttles and rockets; simulation in a space shuttle; interview; has seen a UFO; builds a small rocket herself and launches it
       7'50" De Scheurkalender: archive advertising film IBM; interview with Maurice De Wilde; first personal computer
     • keywords: Belgium; Borgerhout; artist; oasis; art; Grand Canyon (nature area); roof; terrace; interview; Euro Space Center; space travel; PC; boat trip; wealth; passenger; gastronomy; restaurant; staff; holiday; interior shot; ship; Beckers Leen; VRT; Lotto; radio announcer; sound studio; invention; barbecue; concrete mixer; IBM; advertising spot
     • rights holder: VRT
  7. Aha - The Search Engine!
  8. Issues – Catch-22
     • Automated processing of information is a key discriminator, but it requires correct and structured metadata
     • “Annotation” of rich media requires semantic awareness and interpretation, and is therefore at best an approximation
     • Product Engineering is the source of structured and meaningful information, but creative staff are not receptive to technology
  9. Objectives - Proof of Concept
     • One Set of Numbers(!)
     • Model Driven Development
     • Computer Assisted Manufacturing
     • Unsupervised Feature Extraction
     • Efficient Search and Retrieval
     Develop an extensible data model and a consistent application framework, accessible via an intuitive user interface (digitizing analogue and disintegrated information flows)
  10. Milestone 1 – Search Engine
  11. Milestone 1 – Search Engine
     • Search federation by system integration
     • Faceted search
     • Integrated application of keywords
     • Intuitive and structured presentation of results
     • Direct access to audiovisual material
     [Diagram labels: Search Client (Custom Development); Search Engine (Lucene/SOLR); <NewsML-G2>; Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Media Asset Management System (Ardome); Current news items (Ardome).]
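To illustrate what faceted search over federated sources means in practice, here is a minimal in-memory sketch. It is not the Lucene/SOLR API and not the project's implementation: the records, the field names (`source`, `keywords`, `text`) and the helper functions are all hypothetical, and the matching is a plain substring test. A query returns hits, each hit contributes to per-facet counts, and selecting a facet value narrows the result set.

```python
from collections import Counter

# Hypothetical records, as if merged from the three sources named on the slide.
RECORDS = [
    {"id": 1, "source": "Basisplus", "keywords": ["IBM", "interview"],
     "text": "archive advertising film IBM"},
    {"id": 2, "source": "Ardome", "keywords": ["IBM", "PC"],
     "text": "news item on the first personal computer by IBM"},
    {"id": 3, "source": "EBU Superpop", "keywords": ["space travel"],
     "text": "raw footage Euro Space Center"},
]


def _matches(record, field, value):
    """True if the record carries the facet value (list or scalar field)."""
    field_value = record[field]
    if isinstance(field_value, list):
        return value in field_value
    return field_value == value


def search(records, query, filters=None):
    """Case-insensitive substring match on 'text', narrowed by facet filters."""
    filters = filters or {}
    return [
        rec for rec in records
        if query.lower() in rec["text"].lower()
        and all(_matches(rec, f, v) for f, v in filters.items())
    ]


def facet_counts(hits, field):
    """Count how many hits carry each value of the given facet field."""
    counts = Counter()
    for rec in hits:
        value = rec[field]
        counts.update(value if isinstance(value, list) else [value])
    return dict(counts)


hits = search(RECORDS, "ibm")
print(facet_counts(hits, "source"))  # {'Basisplus': 1, 'Ardome': 1}
print([r["id"] for r in search(RECORDS, "ibm", {"source": "Ardome"})])  # [2]
```

In a real deployment the counting would be delegated to the search engine; the sketch only shows the interaction pattern the slide calls structured presentation of results.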
  12. Shot Segmentation and Scene Recognition
  13. Character Recognition
  14. Video copy detection
     • Identify duplicates
     • Generation tracking
     • Grouping of search results
     • Intellectual Property Protection
  15. Milestone 2 – Feature Extraction
     Time-coded properties and indexing allow random access to material fragments:
     • Shot segmentation and Keyframe extraction
     • Subtitle processing and Speech recognition
     • Taxonomy-driven topic detection
     • Face recognition
     • Scene recognition
     • Copy detection
     [Diagram labels: Search Engine (Lucene/SOLR); <NewsML-G2>; Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Media Asset Management System (Ardome); Actual news items (Ardome); Face Detection; Shot Segmentation; Topic Detection; Speech Recognition; Media Production.]
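The idea that time-coded properties allow random access to fragments can be sketched as an inverted index whose postings are start times rather than word positions. This is an illustrative sketch only: the function names, the feature labels and the index layout are assumptions, and the segment start times (0, 50, 240 and 470 seconds) are simply the timecodes 0'00", 0'50", 4'00" and 7'50" from the archive record on slide 6.

```python
from bisect import bisect_right

# Segment start times in seconds, sorted ascending (illustrative values).
SEGMENT_STARTS = [0, 50, 240, 470]

# Inverted index: term -> list of (start_seconds, feature) postings.
INDEX = {}


def add_posting(term, start_seconds, feature):
    """Register that a feature extractor found `term` at a given start time."""
    INDEX.setdefault(term.lower(), []).append((start_seconds, feature))


def segment_at(seconds):
    """Return the position of the segment containing a non-negative time."""
    return bisect_right(SEGMENT_STARTS, seconds) - 1


def seek_points(term):
    """Return the sorted, de-duplicated start times for a search term."""
    return sorted({start for start, _ in INDEX.get(term.lower(), [])})


# Hypothetical extractor output.
add_posting("IBM", 470, "subtitle")
add_posting("interview", 50, "speech")
add_posting("interview", 470, "speech")
add_posting("astronaut", 240, "topic")

print(seek_points("interview"))  # [50, 470]
print(segment_at(300))           # 2
```

A player can then jump straight to each returned start time instead of opening the item at its beginning.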
  16. Work in Progress (due Q4 2008)
     • Multi-lingual
     • Access control and Intellectual Property Protection
     • Audio segmentation and classification
     • Music transcription
     • Fractal-based visual indexing
     • …
  17. Conclusion
     • Enterprise search – structured metadata, limited number of libraries, limited number of records per library, dependencies between objects
     • Intelligent search federation is aware of the media production process - scripts, webpages, subtitles and formal annotation may represent the same editorial object
     • Random access to audiovisual material requires an index that is based on timecode and not on « word position in a document »
     • Ontology-driven application logic is essential to create semantic awareness, i.e. resolving synonyms and disambiguating homonyms
     • The perfect search engine is not for sale yet and requires design and development from the ground up.
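The point about resolving synonyms and disambiguating homonyms can be illustrated with a toy ontology lookup. Everything in this sketch is hypothetical: the concept ids, the labels, the context words and the overlap-based scoring are invented for illustration and say nothing about the ontology or logic used in PISA.

```python
# Hypothetical mini-ontology: concept id -> synonym labels and context hints.
ONTOLOGY = {
    "concept:personal_computer": {
        "labels": {"pc", "personal computer"},
        "context": {"ibm", "computer", "software"},
    },
    "concept:political_correctness": {
        "labels": {"pc", "political correctness"},
        "context": {"debate", "language", "politics"},
    },
    "concept:space_travel": {
        "labels": {"space travel", "spaceflight"},
        "context": {"rocket", "astronaut"},
    },
}


def candidates(term):
    """Return every concept that lists the term as a label (homonyms included)."""
    term = term.lower()
    return [cid for cid, concept in ONTOLOGY.items() if term in concept["labels"]]


def resolve(term, context_words=()):
    """Pick one concept; homonyms are decided by overlap with context words.

    On a tie, the concept defined first in ONTOLOGY wins (a sketch-level choice).
    """
    found = candidates(term)
    if not found:
        return None
    context = {word.lower() for word in context_words}
    return max(found, key=lambda cid: len(ONTOLOGY[cid]["context"] & context))


def expand(term, context_words=()):
    """Expand a query term to all synonyms of its resolved concept."""
    concept_id = resolve(term, context_words)
    return sorted(ONTOLOGY[concept_id]["labels"]) if concept_id else [term.lower()]


print(resolve("PC", ["IBM", "interview"]))  # concept:personal_computer
print(resolve("PC", ["politics"]))          # concept:political_correctness
print(expand("spaceflight"))                # ['space travel', 'spaceflight']
```

Searching on the expanded labels rather than the literal term is what lets a query for one synonym find material annotated with another.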
  18. Future Work - From « Metadata » to CAD/CAM?
  19. Future Work - From « Metadata » to CAD/CAM?
  20. http://medialab.vrt.be/pisa, http://projects.ibbt.be/pisa, Maarten.verwaest@vrt.be
