Information Retrieval On Digital Video Information


Transcript

  • 1. The TREC2001 Video Track: Information Retrieval on Digital Video Information
    • Alan F. Smeaton, Centre for Digital Video Processing, Dublin City University, Ireland
    • Paul Over, National Institute of Standards and Technology, USA
    • Cash J. Costello, Applied Physics Laboratory, Johns Hopkins University, USA
    • Arjen P. de Vries, CWI, Amsterdam, The Netherlands
    • David Doermann, Laboratory for Language and Media Processing, University of Maryland, USA
    • Alexander Hauptmann, School of Computer Science, Carnegie Mellon University, USA
    • Mark E. Rorvig, School of Library and Information Sciences, University of North Texas, USA
    • John R. Smith, IBM T.J. Watson Research Center, USA
    • Lide Wu, Dept. of Computer Science, Fudan University, China
  • 2.
    • TREC2001
    • TREC2001 Video Track
    • TREC2001 Video Track Tasks
      • Shot Boundary Detection Task
      • Search Task
    • Search Task
    • Participants in Search Task & Their Focus
    • Summary of approaches by participants
    • Conclusion
    Presentation overview 2/ 21 TREC2001 Video Track: Information Retrieval on Digital Video Information
  • 3. TREC (Text REtrieval Conference)
    • Annual activity (1992- ) to “benchmark the retrieval effectiveness of Information Retrieval tasks”
    • Co-ordinator NIST (National Institute of Standards and Technology, US) defines & distributes:
      • Test document corpus
      • Topics (queries)
    • Participating groups each develop an IR system, run the Topics against the Test document corpus, and send the results to NIST
    • NIST generates relevance assessments and calculates performance in terms of precision & recall
    • Annual conference in Gaithersburg, Maryland
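As a rough illustration of the two measures named above, the following minimal sketch computes precision and recall for a single topic. The function name and the document identifiers are hypothetical and are not taken from the NIST evaluation software.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one topic.

    retrieved: identifiers a system returned for the topic
    relevant:  identifiers judged relevant by the assessors
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of returned items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical result list and relevance judgements for one topic.
retrieved = ["doc3", "doc7", "doc9", "doc12"]
relevant = ["doc3", "doc9", "doc20"]
p, r = precision_recall(retrieved, relevant)
```

Here two of the four returned items are relevant, and two of the three relevant items were found.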
  • 4. “Tracks” in TREC
    • Different streams, each introduced to focus on a particular sub-problem in Information Retrieval
    • 15 different “tracks” have been introduced, some stopped, some continuing, e.g.:
      • Interactive track 1993-
      • Chinese language track 1995-1998
      • Web track 1998-
      • Question Answering track 1998-
      • Video track 2001-
  • 5. Video Track in TREC2001
    • 1st Video Track in 2001
    • Promote progress in content-based retrieval from digital video via open, metrics-based evaluation
    • 12 participating groups (5 USA, 2 Asia, 5 Europe), contributing to the definition of the corpus, topics and tasks via discussion, and to the running of the track
    • Following the TREC framework, NIST co-ordinated and provided:
      • Video document corpus
      • Topic queries
  • 6. Video Track in TREC2001
    • Video document corpus - total 11.2 hours (85 video files in MPEG-1 format; 6.3 Gbytes), mostly documentary in nature, varying in age, style and quality, e.g.:
      • “A New Horizon” (16 min; colour; documentary) - This Great Plains orientation tape explains the boundaries of the Great Plains Region, which is one of five regions that make up the Bureau of Reclamation
      • “Challenge at Glen Canyon” (26 min; colour; documentary) - Shows how the repair of the spillway damage caused by flooding along the Colorado River System was conducted
  • 7. Video Track in TREC2001
    • 74 Topics (queries) - with multimedia examples (audio/image/video) along with each topic, e.g.:
      • Topic #8: “find clips showing the planet Jupiter” (with 2 images depicting Jupiter)
      • Topic #32: “find clips with a chopper landing” (with 3 audio clips of a helicopter sound)
      • Topic #54: “find clips showing Glen Canyon dam” (with a short video clip showing Glen Canyon dam)
    • Topic statistics:
      • Number of topics: 74
      • No. topics with image examples / Avg. number of images: 26 / 2.0
      • No. topics with audio examples / Avg. number of audio: 10 / 4.3
      • No. topics with video examples / Avg. number of videos: 51 / 2.4
  • 8. Tasks in Video Track in TREC2001
    • Two distinctive tasks:
      • Shot Boundary Detection task: engineering exercise to evaluate the accuracy of automatically detecting camera shot boundaries in the video corpus
        • Facilitates higher-level video indexing/browsing (e.g. scene detection/navigation, news story segmentation…)
    • (Diagram labels: Video file, Camera shot)
  • 9. Tasks in Video Track in TREC2001
    • Two distinctive tasks:
      • Search task: running topic queries against the video corpus, searching for the video segments that answer the queries
        • Automatic
        • Interactive
      • Answer segments are submitted to NIST for evaluation
  • 10. Participating Groups in Search Task
    • Among the 12 participating groups in the TREC2001 Video Track:
      • all 12 groups took part in the Shot Boundary Task
      • 8 groups took part in the Search Task
    • Participants in Search Task:
      • Carnegie Mellon University, USA
      • Dublin City University, Ireland
      • Fudan University, China
      • IBM Research, USA
      • Johns Hopkins University, USA
      • Lowlands Group, The Netherlands
      • University of Maryland, USA
      • University of North Texas, USA
  • 11. Carnegie Mellon University (USA)
    • Used Informedia Digital Video Library’s standard processing modules
      • Shot Boundary Detection (using color histogram comparison)
      • Keyframe extraction
      • Speech recognition (using Sphinx speech recogniser with a 64,000-word vocabulary)
      • Face detection
      • Video OCR
      • Image search based on color histogram features in different colour spaces and on textures
    • Informedia interface used for Interactive track; users were allowed to switch between multiple image search engines
    • Image retrieval augmented to process I-frames (not only keyframes)
    • Speaker identification component used to compare the query audio example to the audio in the retrieved video segment
    • Image retrieval & video OCR had the largest impact on performance
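The slide names histogram comparison as the shot boundary technique but gives no details. The following is a minimal sketch of the general idea only, not the Informedia implementation: it assumes each frame is a non-empty flat list of grey-level values (0-255) rather than colour data, and it uses a fixed, hypothetical threshold on the difference between consecutive histograms.

```python
def histogram(frame, bins=8):
    """Normalised intensity histogram of a frame (flat list of 0-255 values)."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_difference(h1, h2):
    """L1 distance between two normalised histograms, in the range [0, 2]."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Return each index i where a cut is declared between frame i-1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


# Toy sequence: three dark frames followed by two bright frames.
dark = [10] * 100
bright = [240] * 100
frames = [dark, dark, dark, bright, bright]
cuts = detect_cuts(frames)
```

A single fixed threshold like this handles hard cuts only; gradual transitions such as fades and dissolves would need a different treatment.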
  • 12. Dublin City University (Ireland)
    • Used the Físchlár Digital Video System
    • Shot boundary detection & Keyframe extraction
    • Allowed users to browse through keyframes with different browsing interfaces including:
      • Timeline browser (linear, spatial keyframe presentation)
      • Slide Show browser (linear, temporal keyframe presentation)
      • Hierarchical browser (hierarchical, spatial keyframe presentation)
    • 30 test users (final year undergrads & research students) interacted with the system in a controlled environment
      • 12 topic queries / user
      • 6 minutes / topic query
      • within-user setting (each user used all 3 browsers 4 times each, in round robin fashion)
    • Timeline browser yielded the largest number of answer submissions but the lowest precision; Slide Show browser the reverse
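One way to read the round robin design above is sketched below: each user cycles through the three browsers over their 12 topics, so every browser is used exactly 4 times. Rotating the starting browser by user index is an assumption made here for illustration; the slides do not say how the order was varied between users.

```python
BROWSERS = ["Timeline", "Slide Show", "Hierarchical"]


def assign_browsers(user_index, num_topics=12):
    """Return the browser used for each of a user's topics, in order."""
    # Assumption: the starting browser rotates with the user index.
    offset = user_index % len(BROWSERS)
    return [BROWSERS[(offset + t) % len(BROWSERS)] for t in range(num_topics)]


# Schedule for a hypothetical second user (index 1).
schedule = assign_browsers(user_index=1)
```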
  • 13. Fudan University (China)
    • Tried 17 topics (including people searching, video text searching, camera motion, etc.)
    • Feature extraction module:
      • qualitative camera motion analysis module
      • face detection/recognition module (skin color based segmentation + motion/shape filtering, use of a new optimal discrimination criterion)
      • video text detection/recognition module (vertical edge based methods to detect text blocks; improved logical level technique to binarize text blocks)
      • speaker recognition / speaker clustering module
      • Speech SDK (Microsoft) to get transcript
    • Off-line indexing followed by on-line searching
  • 14. IBM Research
    • Members from IBM T.J. Watson Research Center & IBM Almaden Research Center
    • Using IBM CueVideo System
      • Shot Boundary Detection & Keyframe extraction
      • MPEG-7 visual descriptors for indexing keyframes & answering automatic searches
      • Statistical model for classifying & generating labels/scores for:
        • events (fire, smoke, launch)
        • scenes (greenery, land, outdoors, rock, sand, sky, water)
        • objects (airplane, boat, rocket, vehicle, faces)
      • Query/filter pipelines for cascaded content- & model-based searching, e.g. “shots that have similar colour to this image, have label ‘outdoors’ and show a ‘boat’”
    • Compared performance of content/model-based system vs. speech-based system: best results obtained by combining the two methods
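The cascaded query in the example above can be pictured as a chain of filters applied one after another. The sketch below is illustrative only and is not the CueVideo API: the `Shot` structure, the histogram-intersection similarity and all thresholds are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    colour_hist: list                            # normalised colour histogram
    labels: dict = field(default_factory=dict)   # label -> classifier score


def colour_similarity(h1, h2):
    """Histogram intersection of two normalised histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def similar_colour(query_hist, min_sim):
    """Content-based filter: keep shots whose colours resemble the query image."""
    return lambda shot: colour_similarity(shot.colour_hist, query_hist) >= min_sim


def has_label(label, min_score=0.5):
    """Model-based filter: keep shots the classifier scored highly for a label."""
    return lambda shot: shot.labels.get(label, 0.0) >= min_score


def run_pipeline(shots, filters):
    """Apply each filter in turn, passing survivors on to the next stage."""
    for keep in filters:
        shots = [s for s in shots if keep(s)]
    return shots


shots = [
    Shot("s1", [0.1, 0.2, 0.7], {"outdoors": 0.9, "boat": 0.8}),
    Shot("s2", [0.1, 0.3, 0.6], {"outdoors": 0.8}),
    Shot("s3", [0.8, 0.1, 0.1], {"outdoors": 0.9, "boat": 0.9}),
]
query_hist = [0.1, 0.2, 0.7]
result = run_pipeline(
    shots,
    [similar_colour(query_hist, 0.8), has_label("outdoors"), has_label("boat")],
)
result_ids = [s.shot_id for s in result]
```

Shot `s3` is dropped by the colour stage and `s2` by the `boat` stage, leaving only `s1`.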
  • 15. Johns Hopkins University (USA)
    • Automatic searching:
      • Keyframes are indexed by color histogram & image texture
      • Query representation consists of the image & video portions of the information need
      • Similarity is measured by a weighted distance between the image features of the query representation and those of the indexed keyframes; the shots associated with the most similar keyframes are then retrieved
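A weighted feature distance of the kind described above might look like the following sketch. The feature vectors, the weights and the use of Euclidean distance per feature are all assumptions for illustration; the slides do not specify the actual distance function or weights.

```python
import math


def weighted_distance(query, keyframe, weights):
    """Weighted sum of per-feature Euclidean distances (smaller = more similar)."""
    return sum(
        w * math.dist(query[name], keyframe[name]) for name, w in weights.items()
    )


def rank_shots(query, index, weights):
    """Return shot ids ordered from most to least similar keyframe.

    index maps each shot id to the feature dict of its keyframe.
    """
    return sorted(
        index, key=lambda shot_id: weighted_distance(query, index[shot_id], weights)
    )


# Hypothetical two-dimensional colour and texture features per keyframe.
index = {
    "shot_a": {"colour": [0.9, 0.1], "texture": [0.2, 0.8]},
    "shot_b": {"colour": [0.5, 0.5], "texture": [0.5, 0.5]},
    "shot_c": {"colour": [0.1, 0.9], "texture": [0.9, 0.1]},
}
query = {"colour": [0.8, 0.2], "texture": [0.3, 0.7]}
weights = {"colour": 0.6, "texture": 0.4}
ranking = rank_shots(query, index, weights)
```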
  • 16. Lowlands Group (The Netherlands)
    • Joint group formed by the database group of CWI, the multimedia group of TNO, the vision group of the University of Amsterdam and the language technology group of the University of Twente
    • Retrieval engine based on:
      • face detection
      • camera motion detection (pan, tilt, zoom)
      • monologue detection
      • video OCR detection
    • System heuristically selected a set of filters based on the detectors by analysing the query text with WordNet
    • Compared performance with a transcript-based system (transcripts provided by CMU)
    • Transcript-based system outperformed features-based system
  • 17. University of Maryland (USA)
    • Temporal Color Correlogram - to capture the spatio-temporal relationship of colors in a video shot
    • Using MERIT system with VideoLogger video editing software (from Virage)
    • Processing steps:
      • Keyframe extraction (1st frame in the shot)
      • Static image color correlogram calculation
      • Temporal correlogram calculation (by shot segmentation in equal intervals); shot features then fed into CMRS retrieval system
    • TREC topic queries were translated into example videos/images
  • 18. University of North Texas (USA)
    • Keyframe extraction (frames every 5 seconds)
    • Redundant keyframe removal (to ensure presence of frames outside the prescribed normal distribution limits)
    • Resulting keyframes placed into UNT’s Brighton Image Searcher application (retrieval based on mathematical measures that correspond to primitive image features)
    • 2 group members worked on 13 topics, retrieving keyframes relevant to each topic
    • Chosen keyframes were then used as exemplars to find other, similar keyframes
    • Precision scores were better than expected due to the presence of human judgement
  • 19. Summary & Analysis of Approaches
    • Varied approaches by different groups
      • Interactive searching vs. automatic searching
      • Speech recognition transcript vs. visual-only
      • Various combinations of different features for retrieval
      • Experienced groups vs. new groups in video retrieval
    • Performance (Precision) results varied greatly:
      • Interactive: Best group 0.6 - Worst group 0.23 (across same 31 topics)
      • Automatic: 0.609 - 0.002
    • The video track was still shaping itself in 2001 & not complete
      • only small-scale comparisons possible (within-topic, between closely related system variants)
      • cross-system comparison possible only after achieving better consistency in topic formulation, agreement on better measures, and larger numbers of data points
    • Difficulties & unforeseen problems highlighted, tackled in 2nd Video track in TREC2002
  • 20. Conclusions
    • Revealed lots of issues to be addressed in evaluating the performance of retrieval on digital video information
    • There are groups working in this area worldwide who have the capability and the systems to support real information retrieval on significant volumes of digital video content
    • 2nd Video Track (2002)
      • more than 20 participating groups
      • video document corpus of 68.5 hours
      • a focused set of 25 topic queries
      • Tasks:
        • Shot Boundary Detection - as before
        • Semantic feature extraction task (face, indoor/outdoor, landscape/cityscape, speech/music/monologue, etc.)
        • Search - interactive or automatic as before
  • 21. Conclusion
    • TREC2001 Video Track website with papers:
    • http://www-nlpir.nist.gov/projects/t01v/t01v.html
    • Authors’ Note: The authors wish to extend their sympathies to the family and friends of their co-author, Mark E. Rorvig, who passed away shortly before this paper was submitted.