search and retrieval of audiovisual material
 

Usage Rights

© All Rights Reserved

    Presentation Transcript

    • PISA: Production, Indexing and Search of Audio-visual Material. The mathematical logic behind search and retrieval of audiovisual material. Valérie De Witte, VRT-medialab
    • Archiving (screenshot of an archive search screen and a retrieved record)
      Search screen FILM (Set: 16, Count: 1, page 1 of 3), query on keywords: ibm and vrt
      Search fields: archive number, broadcast year, month, day, fragment number, fragment duration, series, format, tape number, episode, episode number, programme, broadcast date, fragment title, text, category, recording date, recording number, journalist, rights holder, keywords
      Function keys: F11 F12 F13 F14 F17 F18 F19 F20 Ent
      Function key labels: Exit, Sets, Refset, Show, Previous, Next/Clear, Thesaurus, Command, Search
      On-screen message: The strings required for the operation are not defined
      Retrieved record:
      - archive number: ALG 20010813 1
      - fragment number: 1
      - series: 1000 ZONNEN EN GARNALEN
      - tape number: E03024404
      - format: DBCM
      - fragment title: 1000 ZONNEN & GARNALEN
      - image: KL/PALPLUS
      - fragment duration: 18 20
      - text:
        0'00" Tourist reportage magazine: overview of topics, opening titles
        0'50" VANDAAG: artist Luc Hofkens designed an oasis on his roof terrace in Borgerhout that is reminiscent of the Grand Canyon. Interview with Luc and his wife Marilou. Exterior shots: roof with surroundings, outside of the worker's house, pan over rock walls, crates of water, planting, photo album showing the progress of the works
        4'00" JUNIOR: Klaartje Alaerts, 13 years old, wants to become an astronaut. She visits the Euro Space Center with space shuttles and rockets, simulation in a space shuttle, interview, has seen a UFO, builds a small rocket herself and launches it
        7'50" DE SCHEURKALENDER: archive IBM advertising film, interview with Maurice De Wilde, first personal computer
      - keywords: BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE AREA); ROOF; TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF; HOLIDAY; INDOOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO; RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE; CONCRETE MIXER; IBM; COMMERCIAL
      - rights holder: VRT
    • Issues
      -> “Annotation” provides structured metadata and must scale with the growing volume of information
      -> Automated processing of information is a key issue, but it requires correct and structured metadata
      -> Product Engineering is the source of structured and meaningful information
    • Alternative solution
    • Milestone 1 – Searching Audiovisual Material
      Assumptions:
      • A “scene” is the logical unit of search
      The ideal search engine:
      • retrieves all relevant items (recall 100%)
      • without false positives (precision 100%)
      • provides grouping of similar results
      • gives instant access to digital media
      • with respect for intellectual property
      [Diagram components: Search Client (Custom Development); Search Engine (Lucene/SOLR); Media Asset Management System (Ardome); Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Actual news items (Ardome); NewsML-G2]
    • Milestone 2 – Computer Assisted Analysis
      • Shot segmentation
      • Audio classification
      • Face detection
      • Face recognition
      • Scene detection
      • Subtitle processing
      • Topic recognition
      [Diagram components: Legacy Video Library (Basisplus); Raw Material (EBU Superpop); Actual news items (Ardome); Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); NewsML-G2; Media Production; Shot Segmentation; Face Detection; Scene Detection; Topic Recognition]
    • Search systems
      Current search implementations are excellent in terms of search capabilities:
      - Boolean logic (AND, OR and NOT operators)
      - truncation (plural, stemming, capital letters)
      - thesaurus (synonyms, homonyms, …)
      - structured metadata and range search
      - single-word and phrase searching
      But… retrieval efficiency:
      - coverage (composition of the index used, which parts of the documents are indexed, update frequency)
      - response time (average waiting time between issuing a search command and displaying the first batch of results on the screen)
      - user effort (user-friendly interface)
      - output options (number of output options, layout, clarity)
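The Boolean operators listed above can be illustrated with set operations on a small inverted index. This is a minimal sketch, not the archive system's implementation: the three documents, `build_index` and `term` are hypothetical names and data chosen for illustration, and only the query terms "ibm" and "vrt" come from the search screen shown earlier.

```python
from collections import defaultdict

# Hypothetical toy collection: document id -> free-text description.
DOCS = {
    1: "ibm commercial from the vrt archive",
    2: "interview about the first personal computer",
    3: "vrt report on the euro space center",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Return the ids of documents containing the word (empty set if none)."""
    return INDEX.get(word.lower(), set())


both = term("ibm") & term("vrt")          # ibm AND vrt
either = term("ibm") | term("computer")   # ibm OR computer
without = term("vrt") - term("ibm")       # vrt AND NOT ibm

print(both, either, without)
```

Truncation, thesaurus expansion and phrase search would each add a step before the set operations (for example, mapping a word to its stem or to its synonyms); they are left out to keep the sketch short.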
    • Qualitative evaluation
      -> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$
      - fraction of the returned results that are relevant
      - requires knowledge of the relevant and non-relevant hits in the set of retrieved documents
    • Qualitative evaluation
      -> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$
      - fraction of the relevant documents in the collection that are retrieved
      - requires knowledge not only of the relevant and retrieved documents but also of those not retrieved
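The two definitions can be computed directly from sets of document identifiers. The following is a small sketch under the assumption that relevance judgements are available as a set; the function names and the example identifiers are illustrative and not taken from the slides.

```python
def precision(relevant, retrieved):
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # convention chosen here for an empty result list
    return len(relevant & retrieved) / len(retrieved)


def recall(relevant, retrieved):
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # convention chosen here when nothing is relevant
    return len(relevant & retrieved) / len(relevant)


# Hypothetical judgement: four relevant documents exist in the collection,
# the system returned three hits, two of which are relevant.
relevant_docs = {"a", "b", "c", "d"}
retrieved_docs = {"a", "b", "x"}

p = precision(relevant_docs, retrieved_docs)  # 2 of 3 hits are relevant
r = recall(relevant_docs, retrieved_docs)     # 2 of 4 relevant docs were found
print(f"precision={p:.2f} recall={r:.2f}")
```

Note that `recall` needs the full set of relevant documents, including those the system never returned, which is why it is harder to measure on a large archive.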
    • Qualitative evaluation
      • There is often an inverse relationship between precision and recall: increasing one tends to reduce the other
      • Which of recall and precision matters more depends on the use case
        -> in some use cases only the hits at the top of the list have to be relevant, and there is no interest in looking at every relevant document (high precision)
        -> in other use cases we want recall to be as high as possible and will tolerate low precision in the results
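The trade-off can be made concrete by evaluating a ranked result list at different cut-off depths. This sketch assumes a ranked list and a set of relevant identifiers, both hypothetical; `precision_at_k` and `recall_at_k` are illustrative helper names.

```python
def precision_at_k(ranked, relevant, k):
    """Precision over the top-k hits of a ranked result list."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall_at_k(ranked, relevant, k):
    """Recall over the top-k hits of a ranked result list."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


# Hypothetical ranking: relevant hits are "a", "b" and "c"; "d" is never found.
ranked_hits = ["a", "x", "b", "y", "c"]
relevant_docs = {"a", "b", "c", "d"}

for k in (1, 3, 5):
    print(k, precision_at_k(ranked_hits, relevant_docs, k),
          recall_at_k(ranked_hits, relevant_docs, k))
```

In this example, reading only the first hit gives precision 1.0 but recall 0.25, while reading all five hits raises recall to 0.75 and lowers precision to 0.6.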
    • Trouvaille [Figure: precision versus recall, positioning Trouvaille, Actual Search and Google]
    • Trouvaille
      • Thesaurus application:
        • During search: keywords in auto-completion, spellcheck and synonyms
      • User-friendly interface:
        • Faceted search: programme, genre, journalist
        • Different output views: keywords, thumbnails, Google Maps
      • Use of a standard: NewsML-G2
      • Metadata is time-coded -> matching keyframe
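Faceted search over the fields named above (programme, genre, journalist) amounts to counting field values in the current hit list and filtering on a chosen value. The sketch below is an assumption about how that could look in plain Python; the hit records, their values and the helper names are hypothetical and do not describe how Trouvaille or Lucene/SOLR implement it.

```python
from collections import Counter

# Hypothetical hits; field values are placeholders, not real archive data.
HITS = [
    {"programme": "Programme A", "genre": "news", "journalist": "Journalist 1"},
    {"programme": "Programme A", "genre": "news", "journalist": "Journalist 2"},
    {"programme": "Programme B", "genre": "magazine", "journalist": "Journalist 1"},
]


def facet_counts(hits, field):
    """Count how many hits carry each value of the given metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def refine(hits, field, value):
    """Keep only the hits whose field equals the selected facet value."""
    return [hit for hit in hits if hit.get(field) == value]


genre_facet = facet_counts(HITS, "genre")
news_hits = refine(HITS, "genre", "news")
news_by_journalist = facet_counts(news_hits, "journalist")

print(genre_facet)
print(news_by_journalist)
```

Selecting a facet value and recomputing the counts on the refined list is what lets a user narrow a large result set step by step.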
    • Trouvaille: future work
      • Clustering: integration of copy detection to find duplicates in the retrieved hits
      • Intelligent Information Clustering: concept relationships detection
      • Feature extraction: topic detection
      • Combination of system quality and user satisfaction for the evaluation
      [Figure: precision versus recall, both up to 100%, with labels: Intelligent Information clustering, Feature extraction, Trouvaille (MS1), Actual Search, Google]
    • Trouvaille