PISA
Production, Indexing and Search
of Audio-visual Material
 The mathematical logic behind search and retrieval
          of audiovisual material
           Valérie De Witte, VRT-medialab
Archiving



Search screen FILM               Set: 16   Count: 1   page 1 of 3

 Query
   keywords: ibm and vrt
   (all other search fields are empty: archive number, broadcast year, month,
   day, fragment number, fragment duration, series, format, tape number,
   episode, episode number, programme, broadcast date, fragment title, text,
   category, recording date, recording number, journalist, rights holder)

 SETS
 The strings required for the operation are not defined

 Function keys: F11 Exit, F12 Sets, F13 Refset, F14 Show, F17 Previous,
   F18 Next/Clear, F19 Thesaurus, F20 Command, Enter Search

 Retrieved record
   archive number    : ALG 20010813 1
   fragment number   : 1
   series            : 1000 ZONNEN EN GARNALEN
   tape number       : E03024404
   format            : DBCM
   fragment title    : 1000 ZONNEN & GARNALEN
   picture           : KL/PALPLUS
   fragment duration : 18 20
   text              : 0'00" TOERISTISCH REPORTAGEMAGAZINE OVERZICHT
                       ONDERWERPEN GENERIEK TOERISTISCH REPORTAGEMAGAZINE,
                       OVERZICHT ONDERWERPEN
                       0'50" VANDAAG : KUNSTENAAR LUC HOFKENS ONTWIERP EEN OASE
                       OP ZIJN DAKTERRAS IN BORGERHOUT DIE DOET DENKEN AAN DE
                       GRAND CANYON INTERVIEW MET LUC EN ZIJN VROUW
                       MARILOU BUITENBEELD DAK MET OMGEVING BUITENKANT
                       ARBEIDERSWONING, PANO OVER ROTSWANDEN, KRATEN MET WATER,
                       BEPANTING, FOTOALBUM MET VERLOOP WERKEN
                       4'00" JUNIOR : KLAARTJE ALAERTS, 13 JAAR WIL ASTRONAUTEN
                       WORDEN ZE BEZOEKT HET EUROSPACE CENTER MET RUIMTEVEREN,
                       RAKETTEN SIMULATIE IN RUIMTEVEER, INTERVIEW, HEEFT EEN
                       UFO GEZIEN MAAKT ZELF KLEIN RAKETJE, SCHIET HET AF
                       7'50" DE SCHEURKALENDER : ARCHIEF RECLAMEFILM IBM
                       INTERVIEW MAURICE DE WILDE, EERSTE PERSOONLIJKE COMPUTER
   keywords          : BELGIE; BORGERHOUT; ARTIEST; OASE; KUNST; GRAND
                       CANYON (NATUURGEBIED); DAK; TERRAS; INTERVIEW; EURO
                       SPACE CENTER; RUIMTEVAART; PC; BOOTTOCHT; RIJKDOM;
                       PASSAGIER; GASTRONOMIE; RESTAURANT; PERSONEEL;
                       VAKANTIE; BINNENBEELD; SCHIP; BECKERS LEEN; VRT;
                       LOTTO; RADIOOMROEPSTER; KLANKSTUDIO; UITVINDING;
                       BARBECUE; BETONMOLEN; IBM; RECLAMESPOT
   rights holder     : VRT
medialab
Issues




               -> “Annotation” provides structured metadata and
                  needs to become scalable for the increasing set
                  of information

               -> Automated processing of information is a key
                  issue, but it requires correct and structured
                  metadata

               -> Product Engineering is the source of structured
                  and meaningful information




Alternative solution




Milestone 1 – Searching Audiovisual Material
    Assumptions:
    • A “scene” is the logical unit of search

    The ideal search engine:
    • retrieves all relevant items (recall 100%)
    • without false positives (precision 100%)
    • provides grouping of similar results
    • gives instant access to digital media
    • with respect to intellectual property.

    [Architecture diagram. Components shown: Legacy Video Library (Basisplus),
    Raw Material (EBU Superpop), Actual news items (Ardome), NewsML-G2,
    Media Asset Management System (Ardome), Search Engine (Lucene/SOLR),
    Search Client (Custom Development)]
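
To make the assumption that a “scene” is the logical unit of search more concrete, the sketch below models one scene as a record with time codes and flattens it into a document that a search engine could index. The class name `Scene`, the field names and the id format are illustrative assumptions, not the schema used in the project.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """One searchable unit: a scene inside a programme (hypothetical schema)."""
    programme_id: str
    start_s: float  # scene start, in seconds from the start of the programme
    end_s: float    # scene end, in seconds from the start of the programme
    text: str       # free-text description of the scene
    keywords: list[str] = field(default_factory=list)


def to_index_document(scene: Scene) -> dict:
    """Flatten a scene into a plain dict that a search engine could index."""
    return {
        # Hypothetical id: programme id plus the time range of the scene.
        "id": f"{scene.programme_id}#{scene.start_s:g}-{scene.end_s:g}",
        "programme_id": scene.programme_id,
        "start_s": scene.start_s,
        "end_s": scene.end_s,
        "text": scene.text,
        "keywords": sorted(k.lower() for k in scene.keywords),
    }


scene = Scene(
    programme_id="demo-programme",  # made-up identifier
    start_s=470.0,
    end_s=600.0,
    text="Archive advertising film, interview about the first personal computer",
    keywords=["PC", "IBM"],
)
doc = to_index_document(scene)
```

Because every hit carries its own start and end time, a search client could jump straight to the matching part of the media instead of the start of the programme.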
Milestone 2 – Computer Assisted Analysis
    • Shot segmentation
    • Audio classification
    • Face detection
    • Face recognition
    • Scene detection
    • Subtitle processing
    • Topic recognition

    [Architecture diagram. Components shown: Legacy Video Library (Basisplus),
    Raw Material (EBU Superpop), Actual news items (Ardome), NewsML-G2,
    Media Asset Management System (Ardome), Search Engine (Lucene/SOLR),
    Media Production, Shot Segmentation, Face Detection, Scene Detection,
    Topic Recognition]
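
The slides do not say which algorithms the analysis modules use. As one common textbook approach to shot segmentation, the sketch below compares grey-level histograms of consecutive frames and reports a shot boundary when they differ strongly. Frames are modelled as flat lists of pixel intensities; the function names, the bin count and the threshold are assumptions chosen for illustration.

```python
def histogram(frame: list[int], bins: int = 4, max_value: int = 256) -> list[float]:
    """Normalised grey-level histogram of one frame (pixel values 0..max_value-1)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_boundaries(frames: list[list[int]], threshold: float = 0.5) -> list[int]:
    """Return indices of frames that start a new shot.

    The distance between two histograms is half the sum of absolute bin
    differences, which lies between 0 (identical) and 1 (disjoint).
    """
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Toy input: two dark frames, two bright frames, one dark frame.
dark = [10, 20, 30, 40]
bright = [200, 210, 220, 230]
frames = [dark, dark, bright, bright, dark]
cuts = shot_boundaries(frames)
```

A fixed threshold only catches hard cuts; gradual transitions such as fades would need a different treatment.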
Search systems

      Current search implementations are excellent in terms of search capabilities
                - Boolean logic (AND, OR and NOT operators)
                - truncation (plural, stemming, capital letters)
                - thesaurus (synonyms, homonyms, …)
                - structured metadata and range search
                - single-word and phrase searching

      But retrieval efficiency also depends on
                - coverage (composition of the index, which parts of the documents
                  are indexed, update frequency)
                - response time (average waiting time between issuing a search
                  command and displaying the first batch of results on the screen)
                - user effort (user-friendly interface)
                - output options (number of output options, layout, clarity)
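
The Boolean operators in the first list map directly onto set operations over an inverted index. The sketch below is a minimal in-memory illustration, not how any particular search engine implements it; the toy documents and the function name `build_index` are made up for the example.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:()")].add(doc_id)
    return index


# Made-up documents for illustration only.
docs = {
    "d1": "IBM advertising film from the VRT archive",
    "d2": "Interview about the first personal computer by IBM",
    "d3": "VRT radio studio report",
}
index = build_index(docs)

ibm_and_vrt = index["ibm"] & index["vrt"]  # AND: intersection of posting sets
ibm_or_vrt = index["ibm"] | index["vrt"]   # OR: union of posting sets
vrt_not_ibm = index["vrt"] - index["ibm"]  # NOT: set difference
```

Here `ibm AND vrt` matches only `d1`, the one document that mentions both terms.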




Qualitative evaluation

      -> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$

           - fraction of the returned results that are relevant

           - requires knowledge of the relevant and non-relevant hits in the
             set of retrieved documents
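
As a small worked example of the definition above, precision can be computed from two sets of document ids. The ids are made up, and returning 0.0 for an empty result list is a convention chosen for this sketch.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # convention chosen here for an empty result list
    return len(retrieved & relevant) / len(retrieved)


retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7"}

# 2 of the 4 retrieved documents (d1 and d3) are relevant.
p = precision(retrieved, relevant)
```

Only the retrieved documents need a relevance judgement here: `d7` is relevant but was not retrieved, and it does not affect the result.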




Qualitative evaluation

      -> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$

           - fraction of the relevant documents in the collection that are
             retrieved

           - requires knowledge not only of the relevant and retrieved
             documents but also of those not retrieved
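
As a small worked example of the definition above, recall can be computed from two sets of document ids. The ids are made up, and returning 0.0 when nothing is relevant is a convention chosen for this sketch.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0  # convention chosen here when no document is relevant
    return len(retrieved & relevant) / len(relevant)


retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7", "d8", "d9"}

# 2 of the 5 relevant documents (d1 and d3) were retrieved.
r = recall(retrieved, relevant)
```

The denominator counts `d7`, `d8` and `d9`, which the search never returned. That is why measuring recall needs relevance judgements for the whole collection, not just for the result list.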




Qualitative evaluation

      • There is often an inverse relationship between precision and recall:
        increasing one tends to reduce the other

      • Which of the two matters more depends on the use case

           -> in some use cases only the hits at the top of the list have to be
              relevant, and there is no interest in looking at every relevant
              document (high precision)

           -> in other use cases we want recall to be as high as possible and
              accept low-precision results
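
The trade-off can be illustrated by cutting a ranked result list at different depths. The sketch below computes precision and recall for the top k hits; the ranking and the relevance judgements are made up, and the function assumes 1 <= k <= len(ranking) and a non-empty relevant set.

```python
def precision_recall_at_k(
    ranking: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Precision and recall when only the top-k hits of a ranking are shown."""
    top_k = ranking[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / k, hits / len(relevant)


# Made-up ranked result list and relevance judgements.
ranking = ["d1", "d5", "d3", "d8", "d2", "d9", "d7", "d4"]
relevant = {"d1", "d3", "d7"}

for k in (1, 3, 8):
    p, r = precision_recall_at_k(ranking, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

With this toy ranking, showing only the first hit gives perfect precision but finds one of the three relevant documents, while showing all eight hits finds all three at the cost of five irrelevant ones.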




Trouvaille

      [Plot: precision (vertical axis) against recall (horizontal axis).
      Labels shown: Actual Search, Google]



Trouvaille

      • Thesaurus application:
          • During search: keywords in auto-completion, spellcheck and
             synonyms
      • User-friendly interface:
          • Faceted search: programme, genre, journalist
          • Different output views: keywords, thumbnails, Google Maps
      • Use of a standard: NewsML-G2
      • Metadata is time-coded
          -> Matching keyframe
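
Faceted search of the kind listed above is typically requested from Solr through its standard `facet` and `facet.field` query parameters. The sketch below only builds such a request URL; the host, the core name `scenes` and the field names are assumptions, not the actual Trouvaille configuration.

```python
from urllib.parse import urlencode


def build_faceted_query(
    base_url: str, text: str, facet_fields: list[str], rows: int = 10
) -> str:
    """Build a Solr /select URL that asks for hits plus facet counts."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # One facet.field parameter per field to count values for.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


url = build_faceted_query(
    "http://localhost:8983/solr/scenes",    # hypothetical host and core
    "ibm AND vrt",
    ["programme", "genre", "journalist"],   # hypothetical field names
)
```

The response to such a request would contain, next to the hits, a count per programme, genre and journalist that a client can show as clickable filters.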




Trouvaille: future work

      • Clustering: integration of copy detection to find duplicates in the
        retrieved hits
      • Intelligent information clustering: detection of concept relationships
      • Feature extraction: topic detection
      • Combination of system quality and user satisfaction for the evaluation

      [Plot: precision against recall, both axes running to 100%.
      Labels shown: Google, Actual Search, Trouvaille (MS1), Feature extraction,
      Intelligent Information clustering]
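
To illustrate the first future-work item, the sketch below groups retrieved hits that share a fingerprint, so that duplicates can be shown as one cluster. It assumes a separate copy-detection step has already attached a `fingerprint` value to each hit; that field, the hit structure and the exact-match comparison are simplifications made for this example, since real copy detection usually has to match near-duplicates as well.

```python
from collections import defaultdict


def cluster_by_fingerprint(hits: list[dict]) -> list[list[str]]:
    """Group hit ids that share the same (hypothetical) fingerprint value."""
    groups: dict[str, list[str]] = defaultdict(list)
    for hit in hits:
        groups[hit["fingerprint"]].append(hit["id"])
    # Dicts keep insertion order, so clusters appear in order of first hit.
    return list(groups.values())


# Made-up hits; s1 and s3 are assumed to be copies of the same footage.
hits = [
    {"id": "s1", "fingerprint": "a3f"},
    {"id": "s2", "fingerprint": "77c"},
    {"id": "s3", "fingerprint": "a3f"},
]
clusters = cluster_by_fingerprint(hits)
```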

Trouvaille




