SED2012 Dataset

Presentation of the SED2012 dataset @ MMSys 2013, Oslo, Norway

  • Events with 1 or 2 photos are much harder to detect, e.g. by methods based on clustering.
  • Transcript

    • 1. The 2012 Social Event Detection Dataset
      Symeon Papadopoulos¹, Emmanouil Schinas¹, Vasileios Mezaris¹, Raphaël Troncy², Yiannis Kompatsiaris¹
      ¹ CERTH-ITI, Thessaloniki, Greece
      ² EURECOM, Sophia Antipolis, France
      Oslo, 28 Feb - 1 Mar 2013
    • 2. SED2012 Overview
      • Large collection (>160K) of CC-licensed Flickr photos and some of their metadata
      • Event annotations for 149 target events (of specific categories and locations of interest)
      • Primary use: social event detection
        – Used in the context of MediaEval 2012 (SED task)
      • Secondary uses: image geotagging, distractors in CBIR, city summarization
    • 3. Dataset Overview
      Flickr photo collection
      • 167,332 photos
      • 4,422 unique contributors
      • Creative Commons licenses
      Event annotations
      • Challenge 1: Technical events in Germany
      • Challenge 2: Soccer events in Hamburg and Madrid
      • Challenge 3: Indignados movement events in Madrid
    • 4. Data Collection Process
      • Flickr API:
        – Used method with five geographical centres: Barcelona, Cologne, Hamburg, Hannover, Madrid
        – Time period: Jan 2009 – Dec 2011
        – All photos CC-licensed
      • 403 photos from the EventMedia collection
      R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th Intern. Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
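The collection step on this slide could be sketched roughly as follows. The slide does not name the Flickr API method that was used, so `flickr.photos.search` is an assumption here, as are the city-centre coordinates, the search radius, the `extras` fields, and the reading of the time period as the date the photo was taken. The helper `build_search_params` is hypothetical and only builds the query parameters; it sends no request.

```python
from datetime import date

# Approximate city-centre coordinates (assumed for illustration; the slides
# give neither the exact centres nor the radius that was used).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}

# Flickr licence ids 1-6 denote the Creative Commons licences.
CC_LICENSE_IDS = "1,2,3,4,5,6"


def build_search_params(city, radius_km=20,
                        start=date(2009, 1, 1), end=date(2011, 12, 31)):
    """Build query parameters for one geographical centre (hypothetical helper)."""
    lat, lon = CITY_CENTRES[city]
    return {
        "method": "flickr.photos.search",  # assumed; not named on the slide
        "lat": lat,
        "lon": lon,
        "radius": radius_km,               # assumed value
        "radius_units": "km",
        "min_taken_date": start.isoformat(),
        "max_taken_date": end.isoformat(),
        "license": CC_LICENSE_IDS,         # CC-licensed photos only
        "has_geo": 1,                      # geotagged photos only
        "extras": "geo,tags,date_taken,license",
    }


if __name__ == "__main__":
    for city in CITY_CENTRES:
        print(city, build_search_params(city))
```

An API key and result paging would still be needed to run such a query against Flickr; both are left out of this sketch.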
    • 5. Photo Distribution
      [Figure panels: place distribution; yearly distribution; language distribution]
    • 6. Dataset Collection Motivation
      Selection of five cities (three German, two Spanish):
      • Include a large amount of non-English text metadata (cf. language distribution table)
      • Ensure existence of numerous events for the target types
      • Include distractor images:
        – Challenge 2: Cologne and Hannover as distractors for Hamburg, Barcelona as distractor for Madrid
        – Challenge 3: Barcelona as distractor for Madrid
      Selection of only geotagged photos:
      • Ease of annotation
      Selection of only CC-licensed photos:
      • Reuse of collection for research
    • 7. Tag Statistics (1/2)
      • 51,611 unique tags
      • Prevalence of location-specific tags
      • Event-specific tags
      [Figure axis: number of users using the tag]
    • 8. Tag Statistics (2/2)
      • >20K photos have no tags
      • >57% of tags appear once or twice
      • 83.9%: less than or equal to 10 tags
      • >40K tags appear fewer than 10 times
      [Figure labels: barcelona, spain, madrid]
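Statistics of the kind reported on this slide can be computed from per-photo tag lists along the following lines. This is a minimal sketch: the function `tag_statistics` and its input layout are hypothetical, a tag's frequency is taken to be the number of photos carrying it, and the 83.9% figure is assumed to refer to photos with at most 10 tags.

```python
from collections import Counter


def tag_statistics(photo_tags):
    """photo_tags: list of tag lists, one list per photo (hypothetical layout)."""
    # Count each tag once per photo, so a frequency is a number of photos.
    tag_counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_photos = max(len(photo_tags), 1)   # avoid division by zero
    n_tags = max(len(tag_counts), 1)
    return {
        "unique_tags": len(tag_counts),
        "untagged_photos": sum(1 for tags in photo_tags if not tags),
        "share_photos_with_at_most_10_tags":
            sum(1 for tags in photo_tags if len(set(tags)) <= 10) / n_photos,
        "share_tags_used_once_or_twice":
            sum(1 for c in tag_counts.values() if c <= 2) / n_tags,
        "tags_used_fewer_than_10_times":
            sum(1 for c in tag_counts.values() if c < 10),
    }


if __name__ == "__main__":
    toy = [["madrid", "spain"], [], ["madrid"], ["barcelona", "spain", "madrid"]]
    print(tag_statistics(toy))
```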
    • 9. User Statistics
      • 60% of users contributed fewer than 10 photos
      • The 30 most active users contribute ~30% of the dataset
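The two user figures on this slide describe how skewed the contributions are. A rough way to derive them from a list of photo owners is sketched below; `user_statistics` and its parameters are hypothetical names, not part of the dataset tooling.

```python
from collections import Counter


def user_statistics(photo_owners, low_activity_threshold=10, top_k=30):
    """photo_owners: one owner id per photo (hypothetical input layout)."""
    photos_per_user = Counter(photo_owners)
    n_users = max(len(photos_per_user), 1)   # avoid division by zero
    n_photos = max(len(photo_owners), 1)
    low_activity = sum(1 for c in photos_per_user.values()
                       if c < low_activity_threshold)
    from_top_users = sum(c for _, c in photos_per_user.most_common(top_k))
    return {
        "users": len(photos_per_user),
        "share_low_activity_users": low_activity / n_users,
        "share_photos_from_top_users": from_top_users / n_photos,
    }


if __name__ == "__main__":
    owners = ["a"] * 12 + ["b"] * 3 + ["c"] * 5
    print(user_statistics(owners, top_k=1))
```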
    • 10. Ground Truth Creation
      • Manual annotations by use of CrEve
        – web-based annotation
        – two-round annotation by five annotators (three in the first, two in the second)
        – interactive annotation (search & annotate)
        – each round terminated as soon as no new event-related photos were discovered
        – approximate effort: 100 person-hours
      C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012
      • Annotations for Challenge 1 enriched by EventMedia (403 photos featuring technical events in Germany)
    • 11. Ground Truth Statistics (1/3)
      • 10 events are associated with >100 photos
      • ~27% of events are associated with 1 or 2 photos
    • 12. Ground Truth Statistics (2/3)
      • 106 events are captured by single users
      • 9 events are captured by more than 10 people
      • The majority of events last for less than a day (typical for soccer)
      • Erroneous timestamps in photos
    • 13. Ground Truth Statistics (3/3)
      Madrid events
      [Figure labels: Santiago Bernabeu stadium; Puerta del Sol; Stadium of Butarque; Vicente Calderon stadium]
    • 14. Technical Event Examples
      PHP Unconf. 2010; Gamescom 2009; CeBIT 2010; Convention Camp 2011
    • 15. Soccer Event Examples
      Real Madrid – Milan (2010); World Cup 2010; St. Pauli – HSV (2010); Spain – Colombia (2011)
    • 16. Indignados Event Examples
      Inaugural march, 15 May; Large gathering, 20 May; Gathering, 15 Oct; Demonstration, 17 Nov
    • 17. Evaluation
      • F-measure (macro), Precision, Recall
        – goodness of retrieved photos, but not how well they were clustered into events
      • Normalized Mutual Information (NMI)
        – compares automatically extracted clustering of photos into events with the ground truth
      • Evaluation script is made available together with the dataset.
      • Implementation of event detection available:
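To make the difference between the two kinds of measure concrete, here is a minimal sketch. It is not the evaluation script distributed with the dataset. Precision, recall and F-measure are computed on the set of retrieved photos only (the macro-averaging scheme is not specified on the slide and is not reproduced), while NMI compares the two clusterings. The normalisation 2·I/(H_truth + H_pred) and the restriction to photos labelled in both assignments are assumptions.

```python
import math
from collections import Counter


def precision_recall_f(relevant, retrieved):
    """Set-based scores: were the right photos retrieved, ignoring clustering?"""
    hits = len(set(relevant) & set(retrieved))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def entropy(counts, n):
    """Shannon entropy (natural log) of a cluster-size distribution."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def nmi(truth, predicted):
    """truth, predicted: dict mapping photo id -> event label."""
    common = set(truth) & set(predicted)  # assumed: compare shared photos only
    n = len(common)
    if n == 0:
        return 0.0
    t_sizes = Counter(truth[p] for p in common)
    p_sizes = Counter(predicted[p] for p in common)
    joint = Counter((truth[p], predicted[p]) for p in common)
    mutual_info = sum(
        (n_ab / n) * math.log(n_ab * n / (t_sizes[a] * p_sizes[b]))
        for (a, b), n_ab in joint.items()
    )
    denom = entropy(t_sizes, n) + entropy(p_sizes, n)
    if denom == 0:  # both sides put every photo in a single event
        return 1.0
    return 2 * mutual_info / denom


if __name__ == "__main__":
    truth = {1: "A", 2: "A", 3: "B", 4: "B"}
    good = {1: "x", 2: "x", 3: "y", 4: "y"}
    bad = {1: "x", 2: "y", 3: "x", 4: "y"}
    print(precision_recall_f(set(truth), set(good)))
    print(nmi(truth, good), nmi(truth, bad))
```

In the toy example both predictions retrieve exactly the relevant photos, so the set-based scores cannot tell them apart; only NMI separates the clustering that matches the ground truth from the one that mixes the two events.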
    • 18. Questions @sympapadopoulos