audio and video fingerprinting




John Schavemaker, Werner Bailer, Peter-Jan Doets, Jaap Blom
Vdfp audio and video fingerprinting

  1. audio and video fingerprinting
     John Schavemaker, Werner Bailer, Peter-Jan Doets, Jaap Blom
  2. The technology in brief
     • Duplicate detection (video fingerprinting): does a video exist in our databases?
     • Categorization: what category of video is it? News, sports, film?
     • Object and logo recognition: does an object or logo (image) exist in our databases?
     See also our online state-of-the-art report:
     http://research.imagesforthefuture.org/index.php/video-fingerprinting-state-of-the-art-report/
  3. Duplicate detection
     QUESTION: does a video exist in our databases?
     Video fingerprints account for changes in:
     • resolution
     • codec
     • noise
     • color
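To illustrate how a fingerprint can survive changes in resolution and brightness, here is a minimal Python sketch of a block-mean frame hash. It is not the method used by any tool in this presentation; the function names, the 8x8 grid, and the representation of a frame as a list of grayscale rows are all assumptions made for the example. Averaging over a fixed grid removes the dependence on resolution, and thresholding against the frame's own mean removes the dependence on a uniform brightness shift.

```python
from statistics import mean


def block_means(frame, grid=8):
    """Average a grayscale frame (list of pixel rows) down to grid*grid cells.

    Assumes the frame is at least `grid` pixels wide and high.
    """
    height, width = len(frame), len(frame[0])
    cells = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [frame[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(mean(block))
    return cells


def frame_fingerprint(frame, grid=8):
    """Return a grid*grid-bit integer: one bit per cell, set if above the mean."""
    cells = block_means(frame, grid)
    threshold = mean(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > threshold else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Two frames would then be treated as duplicates when the Hamming distance between their fingerprints falls below a chosen threshold. A hash this simple does not handle cropping, logos, or picture-in-picture.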
  4. SWOT video fingerprinting
     STRENGTHS
     • mature technology
     • very good performance on produced material
     • many commercial packages available on the market
     WEAKNESSES
     • many competing vendors: which software package to choose?
     • suitability for video material that is not produced?
     OPPORTUNITIES
     • larger video databases
     • non-produced material
     • open standard for video fingerprints
     • combination with audio
     THREATS
     • closed standards for video fingerprints
     • video encryption
     • clever "users"
  5. Video categorization
     QUESTION: what category of video is it?
     Close-up of a face, indoor sports, outdoor sports?
     Images: UvA, http://www.science.uva.nl/research/mediamill/
  6. SWOT video categorization
     STRENGTHS
     • very promising technique
     • generic recognition possible
     • complements duplicate detection and object recognition
     • bridges the 'semantic gap'
     WEAKNESSES
     • immature technique
     • performance depends (strongly) on the training examples used
     • training the system for new categories takes relatively long
     OPPORTUNITIES
     • combination of categories
     • faster and better learning
     • automatic annotation
     THREATS
     • variety too large for a category
     • choice of categories
     • dependent on annotation of the training examples
  7. Object and logo recognition
     QUESTION: does an object or logo exist in our databases?
     Picture from http://www.omniperception.com/
  8. SWOT object and logo recognition
     STRENGTHS
     • good, robust performance
     • commercial packages
     • fast learning and recognition
     • revolution in computer vision
     WEAKNESSES
     • 2D objects (logos) only
     • true duplicate detection
     • computationally intensive
     OPPORTUNITIES
     • larger video databases
     • open standard
     • 3D object recognition
     THREATS
     • pre-processing of all material required
     • patents
  9. video fingerprinting
  10. Use of FP: identification
      Training phase:
        labeled multimedia items (with metadata) → audio/visual signal → fingerprint extraction → fingerprints and metadata
      Identification phase:
        unlabeled multimedia items → audio/visual signal → fingerprint extraction → match → which item? → metadata
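The two phases on this slide can be sketched as a small in-memory index. This is only an illustration under stated assumptions: fingerprints are taken to be integers compared by Hamming distance, and the class name `FingerprintIndex`, its methods, and the `max_distance` parameter are hypothetical. A real system would use an indexed search rather than the linear scan shown here.

```python
def hamming_distance(a, b):
    """Number of differing bits between two integer fingerprints."""
    return bin(a ^ b).count("1")


class FingerprintIndex:
    """Stores (fingerprint, metadata) pairs and answers 'which item is this?'."""

    def __init__(self, max_distance=8):
        # Largest Hamming distance still accepted as a match (assumed value).
        self.max_distance = max_distance
        self.entries = []

    def enroll(self, fingerprint, metadata):
        """Training phase: store the fingerprint of a labeled item."""
        self.entries.append((fingerprint, metadata))

    def identify(self, fingerprint):
        """Identification phase: return the metadata of the closest stored
        fingerprint, or None if nothing is within max_distance."""
        best, best_distance = None, self.max_distance + 1
        for stored, metadata in self.entries:
            distance = hamming_distance(stored, fingerprint)
            if distance < best_distance:
                best, best_distance = metadata, distance
        return best
```

Returning None for distant queries matters in practice: an unlabeled item that is not in the archive should produce no match rather than the least bad one.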
  11. Sound & Vision Pilot
      • Observations
        • Problem harder than expected
        • Transformations
          • Crop & scale
          • Brightness/contrast
          • Logos, captions
          • Picture-in-picture (PIP): very difficult
        • Many matching sequences of black frames
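One way to keep black sequences from matching each other is to drop near-black frames before fingerprints are extracted. The sketch below assumes 8-bit grayscale frames stored as lists of rows; the function names and both thresholds are illustrative assumptions, not values used in the pilot.

```python
def is_black_frame(frame, luma_threshold=16, max_bright_fraction=0.01):
    """True if almost no pixel in the frame is brighter than luma_threshold."""
    pixels = [p for row in frame for p in row]
    bright = sum(1 for p in pixels if p > luma_threshold)
    return bright <= max_bright_fraction * len(pixels)


def drop_black_frames(frames, **kwargs):
    """Return (index, frame) pairs for frames that carry visible content.

    The original index is kept so that matches can still be located in time.
    """
    return [(i, f) for i, f in enumerate(frames) if not is_black_frame(f, **kwargs)]
```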
  12. Sound & Vision Pilot – results ZiuZ
      • TNO has used the ZiuZ video fingerprinting tool on the dataset
      • ZiuZ video fingerprinting is optimized for child-abuse material:
        • short clips
        • low resolution
        • low image quality
      • Preliminary results on the Sound & Vision dataset show:
        • the material is very challenging
        • some, but limited, recall performance
        • the application domain differs
        • queries containing multiple clips of reference material are not supported by this version of the tool
  13. Sound & Vision Pilot – Results JRS
      • Recall: 36% (min: 16%, max: 55%)
      • Precision: difficult to determine; many black sequences match, so manual checking is needed
  14. Sound & Vision Pilot - Results
      • Transformations our system handles
  15. Sound & Vision Pilot - Results
      • False positives
  16. Experiments with SIFT (1)
      • We do not have a SIFT-based fingerprinting solution in the consortium
      • JRS has a SIFT-based interactive tool to locate recurring objects in video
      • Created a video from an episode plus its source clips, then performed analysis and search
  17. Experiments with SIFT (2)
  18. Experiments with SIFT (3)
  19. Experiments with SIFT (4)
      • Conclusion
        • SIFT can handle cases of scaling and cropping reliably
        • even PIP with distortions
      • Scalability issues
        • time for extraction and especially matching
        • not sure if ranking of matches is still reliable on huge datasets
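The matching cost mentioned above can be seen in a brute-force matcher with a nearest-neighbor ratio test, a common way to match SIFT descriptors. The slides do not say how the JRS tool matches descriptors, so this is an assumption for illustration only; `match_descriptors` and the `ratio` default are hypothetical, and descriptors are plain tuples of floats rather than real 128-dimensional SIFT vectors.

```python
import math


def match_descriptors(query, reference, ratio=0.75):
    """Match each query descriptor to its nearest reference descriptor.

    A match is kept only if the nearest neighbor is clearly closer than the
    second nearest (distance ratio test). Every query descriptor is compared
    with every reference descriptor, so the cost grows with
    len(query) * len(reference).
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((math.dist(q, r), ri) for ri, r in enumerate(reference))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches
```

With thousands of descriptors per frame and a large archive, this all-pairs comparison is what makes matching the bottleneck; approximate nearest-neighbor indexes are the usual remedy.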
  20. Characteristics of the data set - audio
      • Not all archive fragments contain audio
      • Often the original audio is used: just cut-and-paste, no serious distortions
      • Sometimes the audio is replaced or combined with a voice-over
      • The time segmentation of the audio in the episode differs from that of the video. The audio is not always used with the corresponding video fragments; the example on the next slide illustrates this. The other way around, and other variations, also occur.
  21. Characteristics of the data set – audio example
      [Diagram: time line of one archive video (video and audio tracks) next to the time line of one Andere Tijden episode (video and audio tracks)]
      Continuous audio fragment, with several shorter video fragments
  22. Characteristics of the data set - audio
      • Limitations of the use of audio
        • the reference material must contain audio
        • the audio track might not originate from the same material as the video track; this depends on the video material used
        • the playout speed must not be changed too much (less than +/- 2%)
      • Advantages of the use of audio
        • Highly robust algorithms
        • Usually audio is undistorted; video is cropped, scaled, etc.
        • Audio is usually used continuously, while video fragments are cut and pasted from different sections of the reference video and 'glued together'
  23. Identification results - audio
      • Only checked if the correct archive file name is returned

      Episode                           Correct   Missed   False positive
      Liggadjati                              8        3                0
      Veertig jaar STER-reclame              10        4                1
      75 jaar afsluitdijk                     0        5                2
      Strijd tegen de file                    9        1                6
      Kronkels van de Maas                    1        9                1
      Op zoek naar Nederland                  2        6                1
      Modderen in de polder: Lelystad         3        1                2
      Burgemeesters in oorlogstijd            6       10                0
      De wording van Paars                    8        1                0
      Pim en zijn volk                        7        3                0

      Annotation on slide: silent parts in the video
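The slide gives counts only. Assuming its columns read Episode, Correct, Missed, False positive (the header is flattened in this transcript), and treating "Missed" as false negatives, recall and precision can be derived as in the sketch below. The derived figures are arithmetic on the table and are not reported by the authors.

```python
# (episode, correct, missed, false_positive), copied from the slide.
ROWS = [
    ("Liggadjati", 8, 3, 0),
    ("Veertig jaar STER-reclame", 10, 4, 1),
    ("75 jaar afsluitdijk", 0, 5, 2),
    ("Strijd tegen de file", 9, 1, 6),
    ("Kronkels van de Maas", 1, 9, 1),
    ("Op zoek naar Nederland", 2, 6, 1),
    ("Modderen in de polder: Lelystad", 3, 1, 2),
    ("Burgemeesters in oorlogstijd", 6, 10, 0),
    ("De wording van Paars", 8, 1, 0),
    ("Pim en zijn volk", 7, 3, 0),
]


def recall(correct, missed):
    """Fraction of the fragments present that were found."""
    total = correct + missed
    return correct / total if total else 0.0


def precision(correct, false_positive):
    """Fraction of the returned matches that were right."""
    total = correct + false_positive
    return correct / total if total else 0.0


def totals(rows):
    """Sum the three count columns over all episodes."""
    return (
        sum(r[1] for r in rows),
        sum(r[2] for r in rows),
        sum(r[3] for r in rows),
    )


if __name__ == "__main__":
    correct, missed, false_positive = totals(ROWS)
    print(f"recall    {recall(correct, missed):.2f}")
    print(f"precision {precision(correct, false_positive):.2f}")
```

Over all ten episodes the counts sum to 54 correct, 43 missed, and 13 false positives, which gives a recall of about 0.56 and a precision of about 0.81 under the column assumption above.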
  24. Fingerprinting – audio algorithm
      • Algorithm well-known from literature:
        • Haitsma, Kalker, "A Highly Robust Audio Fingerprinting System", in Proceedings of the 3rd International Conference on Music Information Retrieval (ISMIR), October 2002.
      • Features: energy in 33 audio frequency bands
      • Every 11.6 ms a 32-bit sub-fingerprint is computed, consisting of coarsely quantized differences between these energy samples
      • The fingerprint consists of a time series of sub-fingerprints
      • The implementation returns the best matching fragments only (settings chosen to return no false positives)
      • The algorithm is highly robust and highly discriminative
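The bit derivation in the cited paper can be sketched as follows: each of the 32 bits is the sign of the energy difference between two adjacent bands, differenced again against the previous frame. The sketch starts from band energies that are already computed; framing, windowing, the FFT, and the band layout are omitted, and the function names are assumptions. It illustrates the published scheme, not the consortium's implementation.

```python
def sub_fingerprints(band_energies):
    """Derive one sub-fingerprint per frame transition.

    band_energies: list of frames, each a list of band energies
    (33 bands give 32 bits per sub-fingerprint).
    Bit m is 1 when
        (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0.
    """
    result = []
    for prev, cur in zip(band_energies, band_energies[1:]):
        bits = 0
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if diff > 0 else 0)
        result.append(bits)
    return result


def bit_error_rate(fp_a, fp_b, bits_per_sub=32):
    """Fraction of differing bits between two equally long, non-empty
    sequences of sub-fingerprints."""
    errors = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return errors / (bits_per_sub * len(fp_a))
```

Because only the sign of the differences is kept, a uniform change in volume leaves the bits unchanged. Two fragments are declared a match when the bit error rate over a block of sub-fingerprints is below a chosen threshold.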
  25. Future improvements on current results
      • Trailing parts contain silence and black frames (no content). The silences give rise to false positives and irrelevant detections. A silence/activity detector is needed to exclude these parts.
      • Our current implementation, taken from the literature, returns only one fragment per reference file.
      • Our current implementation has only coarse time localization.
      • Combination of audio and video fingerprinting
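A silence/activity detector of the kind called for above could be as simple as an energy threshold per window. The sketch below is one possible approach, not the authors' design: it assumes mono samples scaled to the range -1.0 to 1.0, and the window length and threshold are arbitrary illustrative values.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def active_windows(samples, window=1024, threshold=0.01):
    """Return one flag per full window: True if the window contains activity.

    A trailing partial window is ignored. Windows flagged False would be
    skipped when extracting or matching fingerprints.
    """
    flags = []
    for start in range(0, len(samples) - window + 1, window):
        flags.append(rms(samples[start:start + window]) >= threshold)
    return flags
```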
  26. Consortium
      http://instituut.beeldengeluid.nl/
      http://www.joanneum.at/en/digital.html
      http://www.ziuz.com
      http://hs-art.com/
      http://www.tno.nl
