QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections


Published in: Technology

  1. 1. QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
        Markus Brenner, Prof. Ebroul Izquierdo
        Multimedia and Vision Research Group, Queen Mary University of London, UK
  2. 2. OBJECTIVE
        In collaborative photo collections:
        1. Find and detect social events
        2. Retrieve photos associated with the events
        … with the help of additional, external information
  3. 3. INTRODUCTION AND BACKGROUND
        • The Internet enables people to host, access and share their photos online, for example through websites like Flickr and Facebook
        • Collaborative annotations and tags as well as public comments are commonplace
        • The information people assign varies greatly, but it often includes some reference to what happened, where, and who was involved
        • These observed experiences or occurrences are simply referred to as events
  4. 4. INTRODUCTION AND BACKGROUND
        • It is easier to search through photo collections if photos are grouped into events
        • Link events in photo collections to public social media like online news feeds
        • Automatically link news with corresponding photos
        • Provide additional information that might be relevant to users to facilitate their search, like the date and location of an event
  5. 5. OVERVIEW OF FRAMEWORK
        [Diagram: a query passes through Gathering External Data, Preprocessing, Detecting Events and Retrieving Photos, yielding detected events and retrieved photos. Box labels include: looking up geographic locations, translating terms, compiling names of geographic locations, expanding the topic, topic-specific (soccer matches*), matching geographic locations, composing textual features, extracting visual features, limiting search space, classification on textual features, expanding feature space, visual pruning (classification). External resources shown: Google Geocoding API, Google Translate API, GeoNames, WordNet, DBpedia (via SPARQL).]
        * Example. The framework is extendable to other topics.
  6. 6. GATHERING EXTERNAL DATA
        • Expanding the topic
        • Handling geographic locations (e.g. compiling names of locations)
  7. 7. Expanding the Topic
        • Social events often revolve around a topic
        • Examples: festivals, sport events, …
        • Problem: users do not adhere to a controlled vocabulary
        • Idea: expand the textual representation of a given topic
        • Example: expand the term "concert" with related terms like festival, gig, band, sound, etc.
        • Accomplished through a combination of WordNet, DBpedia and some initial evidence
  8. 8. Handling Geographic Locations
        • The venue location of a social event is an important cue
        • To expand the query, we want a more complete understanding of the venue, such as the city and country in which an event takes place
        • Beneficial because users often refer to different levels of the geographic hierarchy, e.g. a foreigner refers to the country, a local to the city
        • Also consider geographic coordinates to later match geo-tagged photos
        • Use the Google Geocoding API
  9. 9. Compiling Names of Locations
        • Identify and understand any textual annotations in photos that refer to geographic locations
        • Used in the retrieval process to isolate photos that likely do not correspond to the venue of a queried event
        • Extract all countries and larger cities from the GeoNames dataset
  10. 10. Topic-Specific: Soccer Matches
          • Use DBpedia (SPARQL) to find all soccer clubs and associated stadiums for a given city in the query
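A lookup of this kind could be phrased roughly as follows. This is only a sketch: the slides do not show the actual query, so the function name `build_stadium_query` is hypothetical, and the choice of the DBpedia ontology terms `dbo:SoccerClub`, `dbo:ground` and `dbo:location` to link clubs, stadiums and cities is an assumption about how the data is modeled. The code only builds the query string; sending it to a SPARQL endpoint is left out.

```python
def build_stadium_query(city_label: str, lang: str = "en") -> str:
    """Build a SPARQL query for soccer clubs and their grounds in a city.

    Assumption: clubs are typed dbo:SoccerClub, linked to a stadium via
    dbo:ground, and the stadium is linked to its city via dbo:location.
    """
    # Escape backslashes and double quotes so the label stays a valid literal.
    safe = city_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?club ?stadium WHERE {{
  ?club a dbo:SoccerClub ;
        dbo:ground ?stadium .
  ?stadium dbo:location ?city .
  ?city rdfs:label "{safe}"@{lang} .
}}
""".strip()


query = build_stadium_query("London")
```

The resulting stadium names could then be added to the query terms for the corresponding city.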
  11. 11. PREPROCESSING
          • Matching geographic locations
          • Translating terms and stop-words
          • Composing textual features
  12. 12. Matching Geographic Locations
          • Geo-tagged photos are becoming more and more popular
          • Identify photos as belonging or not belonging to a venue (and to an event when also considering the time)
          • For each venue, compile two sets of photos (within and outside its bounds)
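The within/outside split could be sketched as below. The slides do not say how a venue's bounds are defined; this sketch assumes a circle around the venue's coordinates, uses the haversine great-circle distance, and borrows the 500 m bounding threshold listed under the implementation details. The names `haversine_m` and `split_by_venue` and the tuple layout of a photo are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_venue(photos, venue, radius_m=500.0):
    """Split geo-tagged photos into those within and outside a venue's radius.

    photos: iterable of (photo_id, lat, lon); lat/lon are None if not geo-tagged.
    venue:  (lat, lon) of the venue.
    """
    inside, outside = [], []
    for photo_id, lat, lon in photos:
        if lat is None or lon is None:
            continue  # no coordinates: the photo belongs to neither set
        distance = haversine_m(lat, lon, venue[0], venue[1])
        (inside if distance <= radius_m else outside).append(photo_id)
    return inside, outside


venue = (51.5560, -0.2795)
photos = [("a", 51.5565, -0.2790), ("b", 48.8566, 2.3522), ("c", None, None)]
inside, outside = split_by_venue(photos, venue)
```

Photos without coordinates are left out of both sets, so they remain candidates for the text-based steps.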
  13. 13. Translating Terms and Stop-words
          • Photos get annotated and tagged in many different languages
          • Translate topic-related terms and stop-words into other languages
          • Limit to the languages prevailing in the countries in which the query venues are located
          • Use the Google Translate API
  14. 14. Composing Textual Features
          • Concatenate all information into a combined textual representation (title, description, keywords, username, …)
          • Also include information obtained from external sources
          • Use a Roman preprocessor to convert text to lower case, strip punctuation and whitespace, and remove accents from Unicode characters
          • Eliminate common stop-words, numbers and terms commonly associated with photography
          • Apply a language-agnostic character-based tokenizer
          • Convert tokens into a matrix of occurrences (TF-IDF)
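The preprocessing chain above can be approximated with the standard library alone. This is a minimal sketch, not the authors' implementation: the stop-word list, the trigram size and the smoothed IDF formula are assumptions, and the function names are made up for illustration.

```python
import math
import re
import unicodedata
from collections import Counter

# Hypothetical stop-word list, including a few photography-related terms.
STOP_WORDS = {"the", "a", "at", "of", "photo", "canon", "nikon"}


def normalize(text):
    """Lower-case, strip accents, punctuation and digits, drop stop-words."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


def char_ngrams(text, n=3):
    """Language-agnostic tokenizer: overlapping character n-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf(docs, n=3):
    """Return one {token: weight} dict per document (a sparse TF-IDF row)."""
    token_lists = [char_ngrams(normalize(d), n) for d in docs]
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in token_lists:
        term_freq = Counter(toks)
        rows.append({
            tok: count * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in term_freq.items()
        })
    return rows


rows = tfidf(["FC Barcelona match", "Barcelona concert"])
```

Storing each row as a dictionary keeps the representation sparse, which matters because character n-grams produce a large vocabulary.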
  15. 15. RETRIEVING PHOTOS OF AN EVENT
          In the most basic case, we (already) know about a specific event, and we wish to simply retrieve all photos associated with it
          • Classification-based approach
          • Limiting search space
          • Expanding feature space
          • Visual pruning
  16. 16. Classification-based Approach: I
          • Treat each event independently (for a series of events, we instantiate a separate classifier for each event)
          • Train the classifier for each event on the textual features composed beforehand
          • No separate training dataset required
  17. 17. Classification-based Approach: II
          • Binary classification, but also introduce a third class that reflects events of the same topic to improve results
          • Possible to include features of another query
          • Two different fusing strategies implemented
          • Experiment with multiple classifiers (Linear SVC, SGD, …)
          • Use a sparse data representation and sparse-adjusted classifiers
  18. 18. Limiting Search Space
          • Generally, the date and time a photo was captured are effective cues to bound the search space
          • For each event's prediction step, we consider only those photos that lie within the event's temporal search window, which is one of:
            → Specified by the query (e.g. New Year's Eve)
            → Retrieved by the framework through external topic-specific sources (e.g. the specific days of a concert tour)
            → Roughly estimated (based on a clustering scheme) in the forthcoming event detection method
          • Exclude photos not matching the geographic location
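Bounding the candidates by capture time amounts to a simple filter. The sketch below assumes each photo is a `(photo_id, capture_time)` pair and that the window is half-open; both the data layout and the function name `in_window` are illustrative, not taken from the slides.

```python
from datetime import datetime


def in_window(photos, start, end):
    """Keep the IDs of photos captured within [start, end)."""
    return [photo_id for photo_id, taken in photos if start <= taken < end]


photos = [
    ("p1", datetime(2011, 12, 31, 23, 50)),
    ("p2", datetime(2012, 1, 1, 0, 10)),
    ("p3", datetime(2012, 1, 3, 12, 0)),
]
# Window for a New Year's Eve query: the evening of 31 December until the next morning.
candidates = in_window(photos, datetime(2011, 12, 31, 18, 0), datetime(2012, 1, 1, 6, 0))
```

Only the surviving candidates would then be passed to the event's classifier.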
  19. 19. Expanding Feature Space
          • Expand the feature space based on query information and the photo collection itself
          • Helpful when "training" information is sparse (the case when there are few geo-tagged photos)
          • Iterative process:
            1. Train an initial classifier on the few query terms available
            2. Then compile a new list of textual terms based on the predicted outcome over all applicable photos
            3. Finally, use the gained terms to refine the initial query terms
          • Example: photos related to a specific music venue contain terms of the playing band or artist
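One round of this expansion can be illustrated with a deliberately simplified stand-in. Instead of a trained classifier, the sketch below predicts a photo as positive when it shares at least one term with the query; the function name `expand_terms` and the thresholds `top_k`, `min_count` and `min_ratio` are assumptions chosen for the example.

```python
from collections import Counter


def expand_terms(docs, query_terms, top_k=3, min_count=2, min_ratio=0.5):
    """One expansion round over tokenized photo texts (lists of terms).

    Stand-in for the initial classifier: a photo counts as positive
    if it shares at least one term with the query.
    """
    query = set(query_terms)
    # Step 1: predict positives with the initial query terms.
    positives = [set(doc) for doc in docs if query & set(doc)]
    # Step 2: collect terms that occur mostly in the predicted positives.
    pos_freq = Counter(term for doc in positives for term in doc)
    all_freq = Counter(term for doc in docs for term in set(doc))
    candidates = [
        (count, term) for term, count in pos_freq.items()
        if term not in query
        and count >= min_count
        and count / all_freq[term] >= min_ratio
    ]
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    gained = {term for _, term in candidates[:top_k]}
    # Step 3: refine the query with the gained terms.
    return sorted(query | gained)


docs = [
    ["o2", "arena", "coldplay"],
    ["o2", "coldplay", "live"],
    ["beach", "sunset"],
    ["coldplay", "poster"],
]
expanded = expand_terms(docs, ["o2"])
```

In this toy collection the venue term "o2" pulls in the band name "coldplay", which mirrors the music-venue example on the slide.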
  20. 20. Visual Pruning
          • Mixing textual and visual features is not straightforward
          • Employ a cascade of two separate classifiers, each separately adjusted to its feature space and data representation
          • First a fast textual classification, then visual binary pruning on the few remaining photos
          • Utilize MPEG-7 color and texture features
          • Experiment with several classifiers (Random Forest, SVC with RBF kernel, Linear SVC)
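The control flow of such a cascade is small enough to sketch. Here both classifiers are plain callables supplied by the caller, which is an assumption made to keep the example self-contained; the real stages would be trained models on textual and MPEG-7 features.

```python
def cascade(photos, text_clf, visual_clf):
    """Run the cheap textual stage first, the visual stage only on survivors."""
    survivors = [photo for photo in photos if text_clf(photo)]
    return [photo for photo in survivors if visual_clf(photo)]


visual_calls = []


def toy_text_clf(photo):
    # Hypothetical textual stage: accept photos whose tags mention the event.
    return "concert" in photo["tags"]


def toy_visual_clf(photo):
    # Hypothetical visual stage: record the call, reject very dark images.
    visual_calls.append(photo["id"])
    return photo["brightness"] > 0.2


photos = [
    {"id": 1, "tags": ["concert", "stage"], "brightness": 0.6},
    {"id": 2, "tags": ["beach"], "brightness": 0.9},
    {"id": 3, "tags": ["concert"], "brightness": 0.1},
]
kept = cascade(photos, toy_text_clf, toy_visual_clf)
```

The expensive visual stage is only evaluated for photos 1 and 3, which is the point of ordering the stages this way.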
  21. 21. DETECTING EVENTS
          Two proposals:
          • If the date but not the time of day is known, apply a clustering method to all candidates of a given day → the largest clusters then reflect events
          • Otherwise, expand the approach by performing a prediction step for any day instead of just the selected days conforming to the events → this grows the search space
          • In both cases, apply a threshold (number of photos relating to a potential event) prior to considering a new event
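The thresholding idea can be shown with a much simpler grouping than the clustering the slides mention. The sketch below assumes each candidate photo already carries a matched venue, groups photos by venue and calendar day, and keeps only groups that reach a minimum size; `detect_events` and `min_photos` are illustrative names and values.

```python
from collections import defaultdict
from datetime import datetime


def detect_events(photos, min_photos=3):
    """Group photos by (venue, day) and keep groups with enough photos.

    photos: iterable of (photo_id, venue, capture_time).
    Returns (key, photo_ids) pairs, largest group first.
    """
    groups = defaultdict(list)
    for photo_id, venue, taken in photos:
        groups[(venue, taken.date())].append(photo_id)
    events = [(key, ids) for key, ids in groups.items() if len(ids) >= min_photos]
    return sorted(events, key=lambda item: -len(item[1]))


photos = [
    ("a", "Camp Nou", datetime(2012, 4, 21, 20, 0)),
    ("b", "Camp Nou", datetime(2012, 4, 21, 20, 30)),
    ("c", "Camp Nou", datetime(2012, 4, 21, 21, 15)),
    ("d", "Camp Nou", datetime(2012, 4, 22, 10, 0)),
    ("e", "San Siro", datetime(2012, 4, 21, 19, 0)),
]
events = detect_events(photos)
```

Groups below the threshold, such as the single photo taken on the following day, are not reported as events.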
  22. 22. EXPERIMENTS
          • Dataset
          • Implementation details and setup
          • Results
  23. 23. Dataset 2012 MediaEval SED Dataset – Challenge II 167.332 photos collected from Flickr Metadata: unique Flickr ID, capture timestamp, username, title, description, keywords and partial geographic coordinates (in about a fifth of the cases:) Ground truth in the form of event clusters (specifying associated photos) for two topics/challenges “Training set”: 2011 MediaEval SED Dataset
  24. 24. Implementation Details and Setup
          • Define an event as a distinct combination of location and date (one event per day at the same location)
          • Use English names of locations only
          • Bounding threshold of 500 meters
          • Default: Linear SVC, no feature expansion, no visual pruning
          • Evaluation measures: Precision (P), Recall (R), F-score, Normalized Mutual Information (NMI)
  25. 25. Dataset Setup
          • Focus on Challenge II
          • Challenge I/III: the current approach has limitations
            → No event/venue detection through social media websites like Twitter
            → Only basic venue/location detection/clustering, which is an issue when the destination covers a large area (e.g. an entire country)
  26. 26. Results: Challenge II
          • Detected: 32 events
          • Identified several thousand photos not belonging to any relevant venue → substantial reduction of candidates → large amount of training samples

          Configuration             P      R      F      NMI
          Default configuration     79.0   67.1   72.6   0.65
          Basic event detection     56.0   69.6   62.0   0.53   (worse)
          With visual pruning       83.2   61.9   71.0   0.63
          With feature expansion    79.0   66.9   72.5   0.65
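The F column in the results above is consistent with the balanced F-score, the harmonic mean of precision and recall. That reading is an assumption, since the slides only say "F-score". A quick check, with the caveat that recomputing from the rounded P and R values can differ from the reported F by about 0.1:

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


default_f = f_score(79.0, 67.1)   # reported as 72.6
pruning_f = f_score(83.2, 61.9)   # reported as 71.0
```

Visual pruning trades recall for precision, which is why its F-score ends up slightly below the default configuration.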
  27. 27. CONCLUSION
          • External information, e.g. about a venue, is helpful for both event detection and retrieval of associated photos
          • Finding and linking external data in a uniform way is still challenging
          • Visual information does not improve results much
          • Future considerations:
            → Social media websites like Facebook and Twitter
            → Improved venue/location detection/clustering
  28. 28. Thank you! Questions?