CLUSTERING AND RETRIEVAL OF SOCIAL EVENTS IN FLICKR
Maia Zaharieva, Daniel Schopfhauser, Manfred Del Fabro, Matthias Zeppezauer
MediaEval Workshop, October 16-17, 2014, Barcelona, Spain
SOCIAL EVENT CLUSTERING
How far can we go using available metadata only?	

Fundamental principles:	

• simple but robust heuristics / decision criteria	

• minimize the number of parameters

• minimize assumptions on dataset	

• multi-stage approach: from the most reliable to the least reliable information
TIME-BASED CLUSTERING
• Adaptive approach	

• No assumptions about the duration of an event	

• Input: single item clusters	

• Iterative cluster merging if	

• Time difference between consecutive images by the
same user < thresholdTIME (e.g. 2h)
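
The merging rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `time_based_clusters`, the `(photo_id, user_id, capture_time)` tuple layout, and the default of 2 hours (taken from the "e.g. 2h" on the slide) are assumptions. Because every photo starts as its own cluster and only consecutive photos of the same user are compared, a single pass over each user's sorted photos gives the same result as iterative merging.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def time_based_clusters(photos, threshold=timedelta(hours=2)):
    """Group photos per user by gaps between consecutive capture times.

    photos: iterable of (photo_id, user_id, capture_time) tuples
            (hypothetical layout, chosen for this sketch).
    """
    by_user = defaultdict(list)
    for photo_id, user_id, capture_time in photos:
        by_user[user_id].append((capture_time, photo_id))

    clusters = []
    for items in by_user.values():
        items.sort()  # chronological order per user
        current = [items[0][1]]
        for (prev_time, _), (time, photo_id) in zip(items, items[1:]):
            if time - prev_time < threshold:
                # Gap below the threshold: same event cluster.
                current.append(photo_id)
            else:
                # Gap too large: close the cluster and start a new one.
                clusters.append(current)
                current = [photo_id]
        clusters.append(current)
    return clusters
```

No event duration is fixed in advance: a cluster keeps growing as long as each consecutive gap stays below the threshold.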
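[Figure: timeline of capture times (ct) ct1, ct2, ct3, ct4, ct5, ..., ctn. The differences t=(ct1,ct2), t=(ct2,ct3), t=(ct3,ct4), t=(ct4,ct5) between consecutive capture times are compared against the threshold tht; the consecutive photos whose differences stay below it form Event cluster 1.]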
LOCATION-BASED CLUSTERING I
• The very same adaptive approach	

• No assumptions about the size of an event	

• Input: time-based clusters	

• Iterative cluster merging if	

• min geo distance between two clusters < thresholdLOC	

• min time difference < thresholdTIME
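
A rough sketch of this merging step, under stated assumptions: each photo is a dict with a `time` in hours and an optional `geo` tuple of (latitude, longitude); distances use the haversine formula; and the function names and threshold parameters are hypothetical. The slides give no value for thresholdLOC, so it is left as an argument.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def min_geo_distance(c1, c2):
    """Smallest distance between any two geo-tagged photos of two clusters."""
    dists = [haversine_km(p["geo"], q["geo"])
             for p in c1 for q in c2 if p.get("geo") and q.get("geo")]
    return min(dists, default=math.inf)  # no geo tags: never merge


def min_time_difference(c1, c2):
    """Smallest capture-time difference (hours) between two clusters."""
    return min(abs(p["time"] - q["time"]) for p in c1 for q in c2)


def merge_by_location(clusters, threshold_loc_km, threshold_time_h):
    """Merge cluster pairs until no pair satisfies both thresholds."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                close = min_geo_distance(clusters[i], clusters[j]) < threshold_loc_km
                recent = min_time_difference(clusters[i], clusters[j]) < threshold_time_h
                if close and recent:
                    clusters[i].extend(clusters.pop(j))
                    merged = True
                    break
            if merged:
                break
    return clusters
```

The pairwise loop is quadratic in the number of clusters; it is written for clarity rather than speed.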
LOCATION-BASED CLUSTERING II
• Input: time-based clusters	

• Predefined radius (1km)	

• Estimation of representative location for each cluster	

• distance from each geo-tagged photo to all other geo-tagged photos	

• representative location = location with min distance to all others	

• Iterative cluster merging if	

• distance between representative locations within the predefined radius	

• temporal constraints apply
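
The representative location described above can be read as a medoid. The sketch below assumes that "min distance to all others" means the smallest summed distance, which the slides do not state explicitly; the function names are hypothetical, and the temporal constraint is left out for brevity.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def representative_location(locations):
    """Return the geo-tagged location with the smallest summed distance
    to all other locations of the cluster (assumed interpretation)."""
    if not locations:
        return None
    return min(locations,
               key=lambda p: sum(haversine_km(p, q) for q in locations))


def within_radius(locs1, locs2, radius_km=1.0):
    """Spatial merge test: representative locations at most radius_km apart."""
    r1 = representative_location(locs1)
    r2 = representative_location(locs2)
    if r1 is None or r2 is None:
        return False  # a cluster without geo tags cannot be matched
    return haversine_km(r1, r2) <= radius_km
```

Choosing an actual photo location instead of the mean makes the representative less sensitive to a single far-away geo tag.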
TEXT-BASED CLUSTERING
• Input: time-based / location-based clusters	

• Features:	

• Event-based term dictionaries	

• Global topics (LDA)	

• Combined Merging Schema: iteratively merge clusters if	

• Intersection of term dictionaries > 40% OR	

• Number of shared topics > 2 AND	

• Temporal and/or spatial constraints apply
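
One possible reading of the combined merging schema is sketched below. Two points are assumptions, since the slides do not settle them: the conditions are grouped as (term overlap OR shared topics) AND constraints, and the 40% overlap is measured relative to the smaller of the two term dictionaries. All names are hypothetical.

```python
def term_overlap(terms_a, terms_b):
    """Share of the smaller term dictionary that also occurs in the other
    (assumed normalisation)."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / min(len(terms_a), len(terms_b))


def should_merge(terms_a, terms_b, topics_a, topics_b, constraints_ok,
                 min_overlap=0.4, min_shared_topics=2):
    """Decide whether two clusters are merged in the text-based stage.

    terms_*:        sets of terms from the event-based term dictionaries
    topics_*:       sets of global (e.g. LDA) topic ids assigned to a cluster
    constraints_ok: result of the temporal and/or spatial checks
    """
    text_match = (term_overlap(terms_a, terms_b) > min_overlap
                  or len(topics_a & topics_b) > min_shared_topics)
    return text_match and constraints_ok
```

Keeping the temporal and spatial constraints as a hard condition means that textual similarity alone never merges two clusters.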
RESULTS & LESSONS LEARNED
• Overall: metadata alone achieves robust results and generalizes well	

• Capture time, user, and location information alone already achieve impressive results	

• Text-based analysis adds only a marginal gain on the test data (F1 0.9386 to 0.9476)
Social Event Clustering
                                       Development set    Test set
Method                                 F1       NMI       F1       NMI
Text + Time-based clustering           0.9356   0.9873    0.9476   0.9886
Text + Location-based clustering II    0.9343   0.9872    0.9466   0.9884
Location-based clustering I            0.9178   0.9840    0.9407   0.9872
Location-based clustering II           0.9159   0.9836    0.9404   0.9871
Time-based clustering                  0.9098   0.9822    0.9386   0.9866
SOCIAL EVENT RETRIEVAL
• Features	

• TF-IDF cluster representation	

• Capture time	

• Location information (GPS data, GeoNames)	

• Event topic models (one-class SVM)	

• Query expansion (WordNet synsets)	

• Overall score considers all available constraints
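
To illustrate only the TF-IDF part of the feature list, the sketch below ranks clusters against a textual query by cosine similarity. It is an assumption-laden example: the slides do not say which TF-IDF weighting or similarity measure is used, nor how the text score is combined with time, location, and event-model scores. The function names and the smoothed `log(n / df) + 1` weighting are choices made for this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists, one per cluster (hypothetical layout)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_clusters(query_tokens, docs):
    """Return cluster indices ordered by decreasing similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(t for t in query_tokens if t in idf)  # drop unknown terms
    q = {t: c * idf[t] for t, c in q_tf.items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In a full system this text score would be one term among several, since the overall score takes every constraint given in the query into account.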
RESULTS & LESSONS LEARNED
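• Unsupervised approach (no event models, no query expansion) performs best on the test queries (F1 0.2877)	

• Query expansion reduces precision (test queries: 0.4570 to 0.3268)	

• Event-type models do not help on the test queries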
Social Event Retrieval
                                       Development queries         Test queries
Configuration                          R        P        F1        R        P        F1
event models, no query expansion       0.4656   0.8990   0.5367    0.2242   0.4570   0.2287
event models, query expansion          0.5052   0.8974   0.6192    0.2365   0.3268   0.2109
no event models, no query expansion    0.4770   0.4391   0.3838    0.4057   0.4203   0.2877
maia.zaharieva@[tuwien|univie].ac.at
FOR QUESTIONS AND DISCUSSIONS …
… come and see us at the poster session :)
