Notes: the data set; the ground truth creation; saying some words about submissions.
Motivation – why is this useful (the Placing Task)? Spur innovation. Videos are harder than images: they are different in nature and less likely to be well tagged, hence the video/audio-only run.
Participating teams (US, Belgium+UK, Germany, Netherlands, France, France+Spain, Brazil):
• University of California, Berkeley, USA
• University of Ghent, Belgium + University of Cardiff, UK
• Technische Universität Berlin, Germany
• Delft University of Technology, The Netherlands
• Commissariat à l'énergie atomique, France
• INRIA, IRISA, France + University Pompeu Fabra, Spain
• University of Campinas, Brazil
How were test videos selected?
Red signifies a team connected to an organiser.
Highest performance @ 1km: 1166/4182 = 27.9% (CEALIST), rising to 48% @ 10km.
Comparison with last year: UGENT got 40% @ 1km, 57% @ 10km – BEWARE COMPARISONS.
----- Meeting Notes (04/10/2012 01:16) -----
CEALIST: combination of language models and user models.
Why is UNICAMP so much better on the visual-only run? They used Histogram of Motion Patterns, plus 2 x Bag of Scenes using Colour and Edge Directivity Descriptors.
Teams to test on the previous year's data set in future, for comparison.
For text-based approaches: see this team.
For xxx. Etc.
Working Notes for the Placing Task at MediaEval 2012
Placing Task Organisers: Adam Rae (Yahoo! Research), Pascal Kelm (Technische Universität Berlin)
Task Description
• Given a video, how accurately can it be placed on a map, i.e. assigned latitude and longitude coordinates?
Task Overview
• Automatic location annotation of online videos
• 7 teams submitted results (up 17%)
  – 5 veterans
  – 2 new participants
• First year for code sharing – GitHub (currently)
Data
• Provided
  – Textual metadata: tags, titles, descriptions
  – Visual: 9 visual features extracted from key frames every 4 seconds
  – Additional media: images with textual and visual feature data
• Available (external)
  – Up to the participant, but controlled according to run submission
Data
• Training
  – 15,563 videos (combination of last year's training and test data)
  – 3,185,258 additional Flickr images
• Test
  – 4,182 videos
Evaluation
• Take the latitude and longitude suggested by participants for each video
• Compute the Haversine distance between that and the 'true' location
• Group results into buckets of increasing radii, e.g. 1km, 10km, 20km, etc.
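The evaluation above can be sketched in a few lines of Python. This is an illustrative implementation, not the organisers' actual scoring script; the function names and the exact radius buckets are assumptions for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (Haversine) distance in km between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def bucket_accuracy(errors_km, radii=(1, 10, 20, 100, 1000)):
    """Fraction of predictions whose error falls within each radius.

    Buckets are cumulative: a video correct at 1km also counts at 10km.
    """
    n = len(errors_km)
    return {radius: sum(e <= radius for e in errors_km) / n for radius in radii}
```

For example, `haversine_km(51.5074, -0.1278, 48.8566, 2.3522)` (London to Paris) gives roughly 343 km. Each submitted run is scored by computing the per-video error with `haversine_km` and then summarising with `bucket_accuracy`.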
Overall Best Results
[Chart: percentage of correct locations @ 1km by team – TUD, ICSI, TUB, GENT, UNICAMP, IRISA, CEALIST; scale 0%–30%; organiser-connected teams highlighted]
Only Restriction: No new material, gazetteer permitted
[Chart: correct test videos (0–4,500) vs. distance from ground truth (1–100,000 km, log scale) for ICSI, TUD, UG-CU, UNICAMP, CEA_LIST, and the London Baseline]
Restriction: Visual Only
[Chart: correct test videos (0–4,500) vs. distance from ground truth (1–100,000 km, log scale) for CEA_LIST, ICSI, IRISA, UG-CU, UNICAMP, and TUB]
Detected trends and activity of note
• What classes of approaches were taken (has this changed since last year?)
  – Textual, visual
  – Graph modelling
  – User modelling
  – …combinations of the above
• Challenging assumptions
  – Spatial locality and visual stability?
• Absolute performance lower than last year – but…
  – Different data set
  – Less textual metadata in general
Future of the task
• Still room for improvement
• Still a valuable task?
• Standard of science improving
• Need new organisers! Talk to Pascal and me