SED2012 Dataset
 

Presentation of the SED2012 dataset @ MMSys 2013, Oslo, Norway


Usage Rights

© All Rights Reserved

  • Events with 1 or 2 photos are much harder to detect, e.g. by methods based on clustering.

SED2012 Dataset Presentation Transcript

  • The 2012 Social Event Detection Dataset
    Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1), Raphaël Troncy (2), Yiannis Kompatsiaris (1)
    (1) CERTH-ITI, Thessaloniki, Greece
    (2) EURECOM, Sophia Antipolis, France
    Oslo, 28 Feb - 1 Mar 2013
  • SED2012 Overview
    • Large collection (>160K) of CC-licensed Flickr photos and some of their metadata
    • Event annotations for 149 target events (of specific categories and locations of interest)
    • Primary use: social event detection
      – Used in the context of MediaEval 2012 (SED task)
    • Secondary uses: image geotagging, distractors in CBIR, city summarization
  • Dataset Overview
    Flickr photo collection
    • 167,332 photos
    • 4,422 unique contributors
    • Creative Commons licenses
    Event annotations
    • Challenge 1: Technical events in Germany
    • Challenge 2: Soccer events in Hamburg and Madrid
    • Challenge 3: Indignados movement events in Madrid
  • Data Collection Process
    • Flickr API: http://www.flickr.com/services/api/
    • Used method flickr.photos.search with five geographical centres: Barcelona, Cologne, Hamburg, Hannover, Madrid
    • Time period: Jan 2009 – Dec 2011
    • All photos CC licensed
    • 403 photos from the EventMedia collection
      R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
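The collection step above can be illustrated with a minimal sketch that only builds a flickr.photos.search request URL for one of the five cities. The city-centre coordinates, the search radius, the page size and the `extras` list are assumptions chosen for illustration; the slides do not state the exact parameter values that were used. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# REST endpoint of the Flickr API.
FLICKR_REST = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed values, not from the slides).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}


def build_search_url(city, api_key, radius_km=20, page=1):
    """Build a flickr.photos.search URL for geotagged, CC-licensed photos
    taken between Jan 2009 and Dec 2011 around the given city centre."""
    lat, lon = CITY_CENTRES[city]
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,          # assumed radius
        "radius_units": "km",
        "min_taken_date": "2009-01-01",
        "max_taken_date": "2011-12-31",
        "license": "1,2,3,4,5,6",     # Creative Commons license ids
        "has_geo": 1,                 # geotagged photos only
        "extras": "geo,tags,date_taken,owner_name",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; a real key is needed to run the query.
    print(build_search_url("Madrid", "YOUR_API_KEY"))
```

Fetching all results would mean requesting successive pages until the result set is exhausted; that loop is left out here because it needs a valid API key and network access.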
  • Photo Distribution
    [Figures/tables: place distribution, yearly distribution, language distribution]
  • Dataset Collection Motivation
    Selection of five cities (three German, two Spanish):
    • Include large number of non-English text metadata (cf. language distribution table)
    • Ensure existence of numerous events for the target types
    • Include distractor images:
      – Challenge 2: Cologne and Hannover as distractors for Hamburg, Barcelona as distractor for Madrid
      – Challenge 3: Barcelona as distractor for Madrid
    Selection of only geotagged photos:
    • Ease of annotation
    Selection of only CC-licensed photos:
    • Reuse of collection for research
  • Tag Statistics (1/2)
    • 51,611 unique tags
    • [Figure: number of users using each tag]
    • Prevalence of location-specific tags
    • Event-specific tags
  • Tag Statistics (2/2)
    • [Figure labels: barcelona, spain, madrid]
    • >20K photos have no tags
    • 83.9%: less than or equal to 10 tags
    • >57% of tags appear once or twice
    • >40K tags appear less than 10 times
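Statistics of the kind listed on the two tag slides can be computed along the following lines. This is a sketch on toy data, not the authors' procedure: the input layout (one list of tags per photo), the function name `tag_statistics` and the reading of the 83.9% figure as a share of photos are all assumptions.

```python
from collections import Counter


def tag_statistics(photo_tags):
    """Summarise tag usage.

    photo_tags: list with one list of tags per photo (assumed layout).
    A tag repeated within a single photo is counted once for that photo.
    """
    counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_tags = len(counts)
    n_photos = len(photo_tags)
    rare = sum(1 for c in counts.values() if c <= 2)
    small = sum(1 for tags in photo_tags if len(set(tags)) <= 10)
    return {
        "unique_tags": n_tags,
        "untagged_photos": sum(1 for tags in photo_tags if not tags),
        "share_tags_appearing_once_or_twice": rare / n_tags if n_tags else 0.0,
        "share_photos_with_at_most_10_tags": small / n_photos if n_photos else 0.0,
        "most_common": counts.most_common(3),
    }


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    toy = [
        ["madrid", "spain"],
        ["madrid"],
        [],
        ["barcelona", "spain", "gaudi"],
        ["madrid", "15m"],
    ]
    print(tag_statistics(toy))
```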
  • User Statistics
    • 60% of users contribute less than 10 photos
    • The 30 most active users contribute ~30% of the dataset
  • Ground Truth Creation
    • Manual annotations by use of CrEve
      – web-based annotation
      – two-round annotation by five annotators (three in the first, two in the second)
      – interactive annotation (search & annotate)
      – each round terminated as soon as no new event-related photos were discovered
      – approximate effort: 100 person-hours
      C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012
    • Annotations for Challenge 1 enriched by EventMedia (403 photos featuring technical events in Germany)
  • Ground Truth Statistics (1/3)
    • 10 events related with >100 photos
    • ~27% of events associated with 1 or 2 photos
  • Ground Truth Statistics (2/3)
    • 106 events are captured by single users
    • 9 events captured by more than 10 people
    • The majority of events last for less than a day (typical for soccer)
    • Erroneous timestamps in photos
  • Ground Truth Statistics (3/3)
    Madrid events [map labels]: Santiago Bernabeu stadium, Puerta del Sol, Stadium of Butarque, Vicente Calderon stadium
  • Technical Event Examples
    PHP Unconf. 2010, Gamescom 2009, CeBIT 2010, Convention Camp 2011
  • Soccer Event Examples
    Real Madrid – Milan (2010), World Cup 2010, St. Pauli – HSV (2010), Spain – Colombia (2011)
  • Indignados Event Examples
    Inaugural march, 15 May; Large gathering, 20 May; Gathering, 15 Oct; Demonstration, 17 Nov
  • Evaluation
    • F-measure (macro), Precision, Recall
      – measure the goodness of the retrieved photos, but not how well they were clustered into events
    • Normalized Mutual Information (NMI)
      – compares the automatically extracted clustering of photos into events with the ground truth
    • Evaluation script is made available together with the dataset
    • Implementation of event detection available: http://mklab.iti.gr/project/sed2012_certh
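The two kinds of measure named above can be sketched as follows. This is not the evaluation script distributed with the dataset: the NMI normalisation used here (mutual information divided by the arithmetic mean of the two entropies) is one common choice and is an assumption, and the slide does not spell out how the macro averaging of the F-measure is done, so only the per-set precision, recall and F-measure are shown. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(truth, pred):
    """Normalized Mutual Information between two labelings of the same photos.

    truth[i] and pred[i] are the ground-truth and detected event of photo i.
    Normalisation: 2 * I(T; P) / (H(T) + H(P)), an assumed choice.
    """
    n = len(truth)
    joint = Counter(zip(truth, pred))
    n_t, n_p = Counter(truth), Counter(pred)
    mi = sum(
        (c / n) * math.log(c * n / (n_t[t] * n_p[p]))
        for (t, p), c in joint.items()
    )
    denom = entropy(truth) + entropy(pred)
    # Both labelings consist of a single cluster: treat them as identical.
    return 2 * mi / denom if denom > 0 else 1.0


def precision_recall_f1(retrieved, relevant):
    """Precision, recall and F-measure for two sets of photo ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Same grouping with different label names: NMI is 1 (up to rounding).
    print(nmi([0, 0, 1, 1], ["b", "b", "a", "a"]))
    # 2 of 4 retrieved photos are relevant, 2 of 3 relevant photos are found.
    print(precision_recall_f1({1, 2, 3, 4}, {2, 3, 5}))
```

The toy calls illustrate the distinction made on the slide: the set-based measures only look at which photos were retrieved, while NMI looks at how the photos were grouped.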
  • Questions
    @sympapadopoulos
    www.slideshare.net/sympapadopoulos