Information Retrieval On Digital Video Information - Presentation Transcript
The TREC2001 Video Track: Information Retrieval on Digital Video Information
Alan F. Smeaton, Centre for Digital Video Processing, Dublin City University, Ireland
Paul Over, National Institute of Standards and Technology, USA
Cash J. Costello, Applied Physics Laboratory, Johns Hopkins University, USA
Arjen P. de Vries, CWI, Amsterdam, The Netherlands
David Doermann, Laboratory for Language and Media Processing, University of Maryland, USA
Alexander Hauptmann, School of Computer Science, Carnegie Mellon University, USA
Mark E. Rorvig, School of Library and Information Sciences, University of North Texas, USA
John R. Smith, IBM T.J. Watson Research Center, USA
Lide Wu, Dept. of Computer Science, Fudan University, China
TREC2001
TREC2001 Video Track
TREC2001 Video Track Tasks
Shot Boundary Detection Task
Search Task
Participants in Search Task & Their Focus
Summary of approaches by participants
Conclusion
Presentation overview
Annual activity (1992- ) to “benchmark the retrieval effectiveness of Information Retrieval tasks”
Co-ordinator NIST (National Institute of Standards and Technology, US) defines & distributes:
Test document corpus
Topics (queries)
Participating groups each develop an IR system, run the Topics against the Test document corpus, and send the results to NIST
NIST generates relevance assessments and calculates performance in terms of precision & recall
Annual conference in Gaithersburg, Maryland
TREC (Text REtrieval Conference)
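The slide above names precision and recall as the measures NIST computes from the relevance assessments. As a minimal sketch of those two measures for a single topic (the function name and the example segment IDs are illustrative assumptions, not taken from the TREC evaluation software):

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one topic.

    retrieved: IDs of the items a system returned (hypothetical IDs).
    relevant:  IDs judged relevant by the assessors.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of returned items that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant items that were returned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data only: 4 items returned, 3 judged relevant, 2 in common.
p, r = precision_recall(["s1", "s2", "s3", "s4"], ["s2", "s4", "s9"])
```

In this made-up example two of the four returned items are relevant and two of the three relevant items are found.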
Different streams, each introduced to focus on a particular sub-problem in Information Retrieval
15 different “tracks” have been introduced, some stopped, some continuing, e.g.:
Interactive track 1993-
Chinese language track 1995-1998
Web track 1998-
Question Answering track 1998-
Video track 2001-
“Tracks” in TREC
1st Video Track in 2001
Promote progress in content-based retrieval from digital video via open, metrics-based evaluation
12 participating groups (5 USA, 2 Asia, 5 Europe), who contributed to the definition of the corpus, topics and tasks via discussion, and to the running of the track
Following the TREC framework, NIST co-ordinated the track and provided:
Video document corpus
Topic queries
Video Track in TREC2001
Video document corpus - total 11.2 hours (85 video files in MPEG-1 format; 6.3 Gbytes), mostly documentary in nature, varying in age, style and quality, e.g.:
Video Track in TREC2001
“A New Horizon” (16 min; colour; documentary) - This Great Plains orientation tape explains the boundaries of the Great Plains Region, which is one of five regions that make up the Bureau of Reclamation
“Challenge at Glen Canyon” (26 min; colour; documentary) - Shows how the spillway damage caused by flooding along the Colorado River System was repaired
74 Topics (queries) - with multimedia examples (audio/image/video) along with each topic, e.g.:
Topic #8: “find clips showing the planet Jupiter”
(with 2 images depicting Jupiter)
Topic #32: “find clips with a chopper landing”
(with 3 audio clips of a helicopter sound)
Topic #54: “find clips showing Glen Canyon dam”
(with a short video clip showing Glen Canyon dam)
Video Track in TREC2001
Number of topics: 74
Topics with image examples: 26 (average 2.0 images each)
Topics with audio examples: 10 (average 4.3 audio clips each)
Topics with video examples: 51 (average 2.4 video clips each)
Two distinct tasks:
Shot Boundary Detection task: engineering exercise to evaluate the accuracy of automatically detecting camera shot boundaries in the video corpus
Tasks in Video Track in TREC2001
Facilitates higher-level video indexing/browsing (e.g. scene detection/navigation, news story segmentation…)
[Figure labels: Video file, Camera shot]
Tasks in Video Track in TREC2001
Two distinct tasks:
Search task: running topic queries against the video corpus, searching for the video segments that answer the queries
Automatic
Interactive
Answer segments are submitted to NIST for evaluation
Among 12 participating groups in the TREC2001 Video Track:
all 12 groups took part in the Shot Boundary Task
8 groups took part in the Search Task
Participants in Search Task:
Carnegie Mellon University, USA
Dublin City University, Ireland
Fudan University, China
IBM Research, USA
Johns Hopkins University, USA
Lowlands Group (Netherlands)
University of Maryland, USA
University of North Texas, USA
Participating Groups in Search Task
Used Informedia Digital Video Library’s standard processing modules
Shot Boundary Detection (using color histogram comparison)
Keyframe extraction
Speech recognition (using Sphinx speech recogniser with 64,000 word vocabulary)
Face detection
Video OCR
Image search based on color histogram features in different colour spaces and textures
Informedia interface used for interactive searching; users could switch between multiple image search engines
Image retrieval augmented to process I-frames (not only keyframes)
Speaker identification component used to compare query audio example to the audio in the retrieved video segment
Image retrieval & video OCR had the largest impact on performance
Carnegie Mellon University (USA)
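The slide above mentions shot boundary detection by color histogram comparison. The following is only a generic sketch of that idea, not the Informedia implementation: frames are assumed to be non-empty lists of (r, g, b) pixel tuples, and the bin count, distance measure and threshold are illustrative choices.

```python
def color_histogram(frame, bins=4):
    """Normalised joint RGB histogram of a frame.

    frame: non-empty list of (r, g, b) tuples with values in 0..255
           (an assumed toy representation of a decoded video frame).
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in frame:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(frame)
    return [h / total for h in hist]


def histogram_distance(h1, h2):
    """Half the L1 distance between two normalised histograms (range 0..1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold=0.5):
    """Return indices i where frame i differs sharply from frame i - 1."""
    hists = [color_histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]


# Toy clip: three reddish frames followed by two bluish frames.
red = [(250, 10, 10)] * 16
blue = [(10, 10, 250)] * 16
cuts = detect_cuts([red, red, red, blue, blue])
```

A fixed threshold like this only catches hard cuts; gradual transitions such as fades and dissolves generally need comparisons over a longer window.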
Using Físchlár Digital Video System
Shot boundary detection & Keyframe extraction
Allowed users to browse through keyframes with 3 different browsing interfaces, including a Timeline browser and a Slide Show browser
30 test users (final year undergrads & research students) interacted with the system in controlled environment
12 topic queries / user
6 minutes / topic query
within-user setting (each user used all 3 browsers 4 times each, in round-robin fashion)
The Timeline browser led to the largest number of answer submissions but the lowest precision; the Slide Show browser showed the reverse
Dublin City University (Ireland)
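The round-robin design described above (12 topics per user, 3 browsers used 4 times each) can be sketched as a simple rotation. This is an assumed scheme for illustration, not the schedule actually used; the third browser is not named in this transcript, so `third_browser` is a placeholder.

```python
from collections import Counter

# "third_browser" is a placeholder: the transcript names only two of the three.
BROWSERS = ["timeline", "slide_show", "third_browser"]


def browser_schedule(user_index, n_topics=12):
    """Browser to use for each of a user's topics, rotating per topic.

    Offsetting the rotation by user_index varies which browser each
    user starts with (an assumption about how order was balanced).
    """
    return [BROWSERS[(user_index + t) % len(BROWSERS)] for t in range(n_topics)]


schedule_user0 = browser_schedule(0)
usage_user0 = Counter(schedule_user0)
```

With 12 topics and 3 browsers, any starting offset gives each browser exactly 4 uses per user.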
Tried 17 topics, including people searching, video text searching, camera motion, etc.
Feature extraction module:
qualitative camera motion analysis module
face detection/recognition module (skin color based segmentation + motion/shape filtering, use of a new optimal discrimination criterion)
video text detection/recognition module (vertical edge based methods to detect text blocks; improved logical level technique to binarize text blocks)
speaker recognition / speaker clustering module
Speech SDK (Microsoft) to get transcript
Off-line indexing followed by on-line searching
Fudan University (China)
Members from IBM T.J. Watson Research Center & IBM Almaden Research Center
Using IBM CueVideo System
Shot Boundary Detection & Keyframe extraction
MPEG-7 visual descriptors for indexing keyframes & answering automatic searches
Statistical model for classifying & generating labels/scores for:
events (fire, smoke, launch)
scenes (greenery, land, outdoors, rock, sand, sky, water)
objects (airplane, boat, rocket, vehicle, faces)
Query/filter pipelines for cascaded content- & model-based searching, e.g. “shots that have similar colour to this image, have label ‘outdoors’ and show a ‘boat’”
Compared performance of content/model-based system vs. speech-based system: best results obtained by combining the two methods
IBM Research
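The cascaded query/filter idea on this slide can be sketched as a chain of filters applied in turn to a list of shots. This is not the CueVideo API: the `Shot` record, the three-value colour feature, the label scores and the thresholds below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    color: tuple                                  # hypothetical colour feature
    labels: dict = field(default_factory=dict)    # label -> model score in [0, 1]


def color_filter(query_color, max_distance):
    """Content-based step: keep shots whose colour feature is close to the query."""
    def apply(shots):
        return [
            s for s in shots
            if sum(abs(a - b) for a, b in zip(s.color, query_color)) <= max_distance
        ]
    return apply


def label_filter(label, min_score):
    """Model-based step: keep shots whose classifier score for a label is high enough."""
    def apply(shots):
        return [s for s in shots if s.labels.get(label, 0.0) >= min_score]
    return apply


def run_pipeline(shots, filters):
    """Apply each filter to the output of the previous one."""
    for f in filters:
        shots = f(shots)
    return shots


# Made-up shots and scores, mirroring the slide's example query.
shots = [
    Shot("a", (0.2, 0.3, 0.5), {"outdoors": 0.9, "boat": 0.8}),
    Shot("b", (0.2, 0.3, 0.5), {"outdoors": 0.9, "boat": 0.1}),
    Shot("c", (0.9, 0.05, 0.05), {"outdoors": 0.95, "boat": 0.9}),
]
result = run_pipeline(
    shots,
    [
        color_filter((0.25, 0.3, 0.45), 0.2),
        label_filter("outdoors", 0.5),
        label_filter("boat", 0.5),
    ],
)
result_ids = [s.shot_id for s in result]
```

Each stage narrows the candidate set, so the cheapest or most selective filter can be placed first without changing the final result.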
Automatic searching:
Keyframes are used for indexing by color histogram & image texture
Query representation consisting of image & video portion of information need
Similarity measured as a weighted distance between the image features of the query representation and those of the indexed keyframes; the shots associated with the most similar keyframes are then retrieved.
Johns Hopkins University (USA)
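A minimal sketch of ranking shots by a weighted distance over colour and texture features, in the spirit of the slide above. The feature vectors, the L1 distance, the equal weights and the dictionary layout are assumptions for illustration, not details of the JHU system.

```python
def l1(a, b):
    """L1 distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def weighted_distance(query, keyframe, w_color=0.5, w_texture=0.5):
    """Weighted sum of colour and texture distances (weights are assumed)."""
    return (
        w_color * l1(query["color"], keyframe["color"])
        + w_texture * l1(query["texture"], keyframe["texture"])
    )


def rank_shots(query, keyframes, top_k=2):
    """Return the shots owning the keyframes most similar to the query."""
    best = {}
    for kf in keyframes:
        d = weighted_distance(query, kf)
        # A shot is scored by its closest keyframe.
        if kf["shot"] not in best or d < best[kf["shot"]]:
            best[kf["shot"]] = d
    return sorted(best, key=best.get)[:top_k]


# Made-up two-bin features for three indexed keyframes.
keyframes = [
    {"shot": "shot1", "color": [1.0, 0.0], "texture": [0.5, 0.5]},
    {"shot": "shot2", "color": [0.0, 1.0], "texture": [0.5, 0.5]},
    {"shot": "shot3", "color": [0.8, 0.2], "texture": [0.4, 0.6]},
]
query = {"color": [1.0, 0.0], "texture": [0.5, 0.5]}
ranked = rank_shots(query, keyframes)
```

Here the query features stand in for the image and video examples that accompany a topic.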
Joint group formed by the database group of CWI, the multimedia group of TNO, the vision group of the University of Amsterdam and the language technology group of the University of Twente
Retrieval engine based on:
face detection
camera motion detection (pan, tilt, zoom)
monologue detection
video OCR detection
System heuristically selected a set of filters based on the detectors by analysing the query text with WordNet
Compared performance with a transcript-based system (transcripts provided by CMU)
Transcript-based system outperformed feature-based system
Lowlands Group (The Netherlands)
Temporal Color Correlogram - to capture the spatio-temporal relationship of colors in a video shot
Using MERIT system with VideoLogger video editing software (from Virage)
Keyframe extraction (1st frame in the shot) => static image color correlogram calculation => temporal correlogram calculation (by shot segmentation in equal intervals, then shot features fed into CMRS retrieval system)
TREC topic queries were translated into example videos/images
University of Maryland (USA)
Keyframe extraction (frames every 5 seconds)
Redundant keyframe removal (to ensure presence of frames outside the prescribed normal distribution limits)
Resulting keyframes placed into UNT’s Brighton Image Searcher application (retrieval based on mathematical measures that correspond to primitive image features)
2 team members ran 13 topics, retrieving keyframes relevant to each topic
Chosen keyframes were then used as an exemplar to find other keyframes similar to them.
Precision scores were better than expected, owing to the presence of human judgement
University of North Texas (USA)
Varied approaches by different groups
Interactive searching vs. automatic searching
Speech recognition transcript vs. visual-only
Various combinations of different features for retrieval
Experienced groups vs. new groups in video retrieval
Performance (Precision) results varied greatly:
Interactive: Best group 0.6 - Worst group 0.23 (across same 31 topics)
Automatic: 0.609 - 0.002
The video track was still taking shape in 2001 and was not yet complete
only small-scale comparisons possible (within-topic, between closely related system variants)
cross-system comparison possible only after achieving better consistency in topic formulation, agreement on better measures, and larger numbers of data points
Difficulties & unforeseen problems were highlighted, to be tackled in the 2nd Video Track in TREC2002
Summary & Analysis of Approaches
Conclusions
Revealed many issues to be addressed in evaluating the performance of retrieval on digital video information
There are groups working in this area worldwide who have the capability and the systems to support real information retrieval on significant volumes of digital video content
Conclusion
TREC2001 Video Track website with papers:
http://www-nlpir.nist.gov/projects/t01v/t01v.html
Authors’ Note: The authors wish to extend their sympathies to the family and friends of their co-author, Mark E. Rorvig, who passed away shortly before this paper was submitted.