Atu media eval_sed2014

CLUSTERING AND RETRIEVAL OF
SOCIAL EVENTS IN FLICKR
Maia Zaharieva, Daniel Schopfhauser, Manfred Del Fabro, Matthias Zeppelzauer
MediaEval Workshop, October 16-17, 2014, Barcelona, Spain

SOCIAL EVENT CLUSTERING
How far can we go using available metadata only?

Fundamental principles:

• simple but robust heuristics / decision criteria

• minimize the number of parameters

• minimize assumptions on dataset

• multi-stage approach: from the most reliable to the least reliable information

TIME-BASED CLUSTERING
• Adaptive approach

• No assumptions about the duration of an event

• Input: single item clusters

• Iterative cluster merging if

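• Time difference between consecutive images by the same user < thresholdTIME (e.g. 2h)

[Figure: timeline of capture times (ct1, ct2, ..., ctn); the time differences t(ct1,ct2), t(ct2,ct3), t(ct3,ct4), t(ct4,ct5) between consecutive items are compared against the threshold tht; the grouped items are labelled "Event cluster 1"]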
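
The merging rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `time_based_clusters` and the `(photo_id, user_id, capture_time)` tuple layout are assumptions, and a single pass over each user's sorted capture times stands in for the iterative merging of single-item clusters.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_based_clusters(photos, threshold=timedelta(hours=2)):
    """Group photos per user by gaps between consecutive capture times.

    photos: iterable of (photo_id, user_id, capture_time) tuples
            (hypothetical layout, chosen for this sketch).
    """
    by_user = defaultdict(list)
    for photo_id, user_id, capture_time in photos:
        by_user[user_id].append((capture_time, photo_id))

    clusters = []
    for items in by_user.values():
        items.sort()  # chronological order per user
        current = [items[0][1]]
        for (prev_time, _), (cur_time, cur_id) in zip(items, items[1:]):
            if cur_time - prev_time < threshold:
                current.append(cur_id)    # same event: gap below threshold
            else:
                clusters.append(current)  # gap too large: close the cluster
                current = [cur_id]
        clusters.append(current)
    return clusters


photos = [
    ("a", "u1", datetime(2014, 10, 16, 10, 0)),
    ("b", "u1", datetime(2014, 10, 16, 11, 30)),
    ("c", "u1", datetime(2014, 10, 16, 15, 0)),
    ("d", "u2", datetime(2014, 10, 16, 10, 15)),
]
print(time_based_clusters(photos))
```

No event duration is fixed here: a cluster keeps growing as long as each consecutive gap stays below the threshold.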

LOCATION-BASED CLUSTERING I
• The very same adaptive approach

• No assumptions about the size of an event

• Input: time-based clusters

• Iterative cluster merging if

• min geo distance between two clusters < thresholdLOC

• min time difference < thresholdTIME
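
A possible reading of these two conditions, sketched under assumptions: clusters are dicts with `times` (seconds) and `coords` ((lat, lon) pairs), distances use the haversine formula, and both thresholds are passed in by the caller because the slides give no value for thresholdLOC. All names are hypothetical.

```python
import math
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def min_geo_distance(c1, c2):
    # Clusters without geo-tags yield infinity and are never merged here.
    return min((haversine_km(p, q) for p in c1["coords"] for q in c2["coords"]),
               default=math.inf)


def min_time_gap(c1, c2):
    return min(abs(s - t) for s in c1["times"] for t in c2["times"])


def merge_by_location(clusters, threshold_loc_km, threshold_time_s):
    """Merge clusters until no pair satisfies both conditions."""
    clusters = [{"times": list(c["times"]), "coords": list(c["coords"])}
                for c in clusters]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if (min_geo_distance(a, b) < threshold_loc_km
                    and min_time_gap(a, b) < threshold_time_s):
                a["times"] += b["times"]
                a["coords"] += b["coords"]
                del clusters[j]
                merged = True
                break  # restart the pair scan after every merge
    return clusters


example = [
    {"times": [0], "coords": [(48.2082, 16.3738)]},
    {"times": [3600], "coords": [(48.2090, 16.3745)]},
    {"times": [1800], "coords": [(41.3851, 2.1734)]},
]
result = merge_by_location(example, threshold_loc_km=1.0, threshold_time_s=7200)
print(len(result))
```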

LOCATION-BASED CLUSTERING II
• Input: time-based clusters

• Predefined radius (1km)

• Estimation of representative location for each cluster

• distance from each geo-tagged photo to all other geo-tagged photos

• representative location = location with min distance to all others

• Cluster merging if

• distance between representative locations is within the predefined radius

• temporal constraints apply
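
The representative-location step can be illustrated as a medoid computation. This sketch assumes that "min distance to all others" means the smallest summed haversine distance, which the slides do not state explicitly; the function names are hypothetical and the 1 km default mirrors the predefined radius above.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def representative_location(coords):
    """Geo-tag with the smallest summed distance to all others (medoid)."""
    if not coords:
        return None  # cluster has no geo-tagged photos
    return min(coords, key=lambda p: sum(haversine_km(p, q) for q in coords))


def within_radius(coords_a, coords_b, radius_km=1.0):
    """True if the two representative locations lie within the radius."""
    rep_a = representative_location(coords_a)
    rep_b = representative_location(coords_b)
    if rep_a is None or rep_b is None:
        return False
    return haversine_km(rep_a, rep_b) <= radius_km


cluster_a = [(0.0, 0.0), (0.0, 0.01), (0.0, 0.02)]
print(representative_location(cluster_a))
print(within_radius(cluster_a, [(0.0, 0.015)]))
```

Unlike a centroid, the medoid is always one of the actual photo locations.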

TEXT-BASED CLUSTERING
• Input: time-based / location-based clusters

• Features:

• Event-based term dictionaries

• Global topics (LDA)

• Combined merging scheme: iteratively merge clusters if

• Intersection of term dictionaries > 40% OR

• Number of shared topics > 2 AND

• Temporal and/or spatial constraints apply
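
One way to read the combined criterion is (dictionary overlap OR shared topics) AND constraints. The sketch below follows that reading. It assumes the 40% overlap is measured relative to the smaller dictionary, which the slides do not specify, and it takes the result of the temporal/spatial check as a boolean; all names are hypothetical.

```python
def should_merge(terms_a, terms_b, topics_a, topics_b, constraints_ok,
                 min_overlap=0.40, min_shared_topics=2):
    """Decide whether two clusters are merged in the text-based stage.

    terms_*:  sets of dictionary terms per cluster
    topics_*: sets of topic ids (e.g. dominant LDA topics) per cluster
    constraints_ok: outcome of the temporal and/or spatial check
    """
    if not constraints_ok:
        return False
    smaller = min(len(terms_a), len(terms_b))
    # Assumption: overlap is normalised by the smaller dictionary.
    overlap = len(terms_a & terms_b) / smaller if smaller else 0.0
    shared_topics = len(topics_a & topics_b)
    return overlap > min_overlap or shared_topics > min_shared_topics


a_terms = {"concert", "vienna", "rock", "live", "stage"}
b_terms = {"concert", "rock", "live", "festival"}
print(should_merge(a_terms, b_terms, set(), set(), constraints_ok=True))
```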

RESULTS & LESSONS LEARNED
• Overall: metadata alone achieves robust results and generalizes well

• Capture time, user, and location information alone already achieve impressive results

• Text-based analysis adds only a marginal gain on the test data
Social Event Clustering

                                       Development set     Test set
Method                                 F1       NMI        F1       NMI
Text + Time-based clustering           0.9356   0.9873     0.9476   0.9886
Text + Location-based clustering II    0.9343   0.9872     0.9466   0.9884
Location-based clustering I            0.9178   0.9840     0.9407   0.9872
Location-based clustering II           0.9159   0.9836     0.9404   0.9871
Time-based clustering                  0.9098   0.9822     0.9386   0.9866

SOCIAL EVENT RETRIEVAL
• Features

• TF-IDF cluster representation

• Capture time

• Location information (GPS data, GeoNames)

• Event topic models (one class SVM)

• Query expansion (WordNet synsets)

• Overall score considers all available constraints
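
To make the TF-IDF cluster representation concrete, here is a minimal sketch that ranks clusters against a text query by cosine similarity. The weighting variant (raw term counts times a smoothed log IDF) and all names are assumptions. The slides do not specify the exact formula, and the other features (time, location, event models) are left out.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists, one per event cluster."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so that terms shared by all clusters keep a small weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_clusters(query_tokens, docs):
    """Return cluster indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


docs = [
    ["concert", "rock", "vienna"],
    ["wedding", "church", "vienna"],
    ["football", "match", "stadium"],
]
ranking = rank_clusters(["rock", "concert"], docs)
print(ranking)
```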

RESULTS & LESSONS LEARNED
• Unsupervised approach (no event models, no query expansion) performs best on the test queries

• Query expansion reduces precision

• Event-type models do not help on the test queries
Social Event Retrieval

                                       Development queries          Test queries
Configuration                          R        P        F1         R        P        F1
event models, no query expansion       0.4656   0.8990   0.5367     0.2242   0.4570   0.2287
event models, query expansion          0.5052   0.8974   0.6192     0.2365   0.3268   0.2109
no event models, no query expansion    0.4770   0.4391   0.3838     0.4057   0.4203   0.2877

FOR QUESTIONS AND DISCUSSIONS …
… come and see us at the poster session :)
maia.zaharieva@[tuwien|univie].ac.at
