1. Social Event Detection with Clustering and Filtering
Yanxiang Wang Australian National University
Lexing Xie Australian National University
Hari Sundaram Arizona State University
3. Introduction
• Previous Approaches
– Supervised: Firan, CIKM’10 [2]
– Unsupervised: Becker, WSDM’10 [1]; Papadopoulos [3]
• Partially specified queries motivate a clustering-and-filtering approach
[1] Becker: Learning Similarity Metrics for Event Identification in Social Media
[2] Firan: Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge
[3] Papadopoulos: Cluster-Based Landmark and Event Detection for Tagged Photo Collections
SED with Clustering and Filtering 3
4. Similarity Metric
• Time: time difference in minutes, $1 - \frac{|t_1 - t_2|}{t_w}$
• Location: great-circle distance, $1 - \frac{gcd}{50}$
• Tag: Jaccard index, $\frac{|t_a \cap t_b|}{|t_a \cup t_b|}$
• Text: cosine similarity, $\frac{A \cdot B}{\|A\| \, \|B\|}$
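The four metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the constant 50 is assumed to be in kilometres, similarities are clipped at 0, and text is compared on raw term counts (the slides do not say whether TF-IDF weighting was used).

```python
import math
from collections import Counter


def time_similarity(t1_min, t2_min, window_min):
    """1 - |t1 - t2| / t_w, with times in minutes; clipped at 0 (assumption)."""
    return max(0.0, 1.0 - abs(t1_min - t2_min) / window_min)


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance via the haversine formula, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def location_similarity(p, q, scale=50.0):
    """1 - gcd / 50; the unit of the constant is assumed to be km."""
    return max(0.0, 1.0 - great_circle_km(p[0], p[1], q[0], q[1]) / scale)


def jaccard(tags_a, tags_b):
    """|a ∩ b| / |a ∪ b| over two tag collections."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def cosine(text_a, text_b):
    """Cosine similarity of whitespace-tokenised term-count vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0
```

For example, two photos taken 30 minutes apart with a 60-minute window get a time similarity of 0.5, and tag sets {a, b} and {b, c} get a Jaccard index of 1/3.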
5. Overview
[Figure: pipeline overview. Clustering on Tag + Text + Time + Location, followed by three filtering stages: Time + Location, Visual, and Tag + Text.]
6. Clustering
• Incremental clustering [1]
1. Time clustering
2. Tag + Text + Location: $w_t s_t + w_x s_x + w_l s_l$
– Weighted-sum combination
– Each weight corresponds to the feature's training performance
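A single-pass incremental clustering loop with a weighted-sum similarity might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: the function names, the threshold, and the choice to compare each item against a cluster's first member (rather than a centroid) are all hypothetical.

```python
def weighted_similarity(sims, weights):
    """Weighted sum, e.g. w_t*s_t + w_x*s_x + w_l*s_l, over named features."""
    return sum(weights[name] * sims[name] for name in weights)


def incremental_cluster(items, similarity, threshold):
    """Single pass: each item joins its best-matching cluster or opens a new one."""
    clusters = []
    for item in items:
        best, best_score = None, threshold
        for cluster in clusters:
            # The first member stands in for the cluster representative (assumption).
            score = similarity(item, cluster[0])
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([item])
        else:
            best.append(item)
    return clusters


# Step 1 on toy data: timestamps in minutes, 60-minute window, threshold 0.5.
def time_sim(a, b):
    return max(0.0, 1.0 - abs(a - b) / 60)


time_clusters = incremental_cluster([0, 5, 8, 200, 204], time_sim, threshold=0.5)
```

Step 2 would reuse `incremental_cluster` inside each time cluster, passing a similarity function built on `weighted_similarity`.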
7. Filtering
1. Time + Location:
– Time: outside time-frame
– Location: outside radius of central point
2. Tag + Text: Query Expansion
3. Visual: Concept List
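The first filtering stage can be sketched as follows. The slides do not say how a whole cluster is judged, so this example assumes a cluster is kept when at least half of its photos fall inside the time-frame and the radius. The field names, `min_fraction`, and the query values are hypothetical, and every photo is assumed to carry coordinates and clusters are assumed non-empty.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def in_scope(photo, start, end, center, radius_km):
    """True if the photo lies inside both the time-frame and the radius."""
    in_time = start <= photo["taken"] <= end
    in_place = haversine_km(photo["lat"], photo["lon"], center[0], center[1]) <= radius_km
    return in_time and in_place


def filter_clusters(clusters, start, end, center, radius_km, min_fraction=0.5):
    """Keep clusters in which at least min_fraction of the photos are in scope."""
    kept = []
    for cluster in clusters:
        hits = sum(in_scope(p, start, end, center, radius_km) for p in cluster)
        if hits / len(cluster) >= min_fraction:
            kept.append(cluster)
    return kept


# Hypothetical query: May 2009, within 50 km of central Barcelona.
start, end = datetime(2009, 5, 1), datetime(2009, 5, 31, 23, 59)
center = (41.39, 2.17)
clusters = [
    [{"taken": datetime(2009, 5, 10, 20, 0), "lat": 41.38, "lon": 2.12}],   # in scope
    [{"taken": datetime(2009, 5, 10, 20, 0), "lat": 41.90, "lon": 12.50}],  # too far away
    [{"taken": datetime(2009, 8, 1, 12, 0), "lat": 41.38, "lon": 2.12}],    # too late
]
kept = filter_clusters(clusters, start, end, center, radius_km=50.0)
```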
8. Tag + Text Filtering
• Use the Flickr API to construct the query
– Tag: flickr.tags.getClusters
– Text: flickr.photos.search
• Use the online event directory last.fm to retrieve tag and text information
• Filter the clusters with the same similarity metric, $w_t s_t + w_x s_x$
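As a rough sketch of how the two Flickr methods named above could be called, the snippet below only builds the REST request URLs and sends nothing. The helper `flickr_url`, the placeholder API key, and the example tag and text values are assumptions for illustration; the slides do not show the actual queries.

```python
from urllib.parse import urlencode

# Flickr REST endpoint; responses are requested as plain JSON.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def flickr_url(method, api_key, **params):
    """Build a GET URL for a Flickr REST method (hypothetical helper)."""
    query = {"method": method, "api_key": api_key,
             "format": "json", "nojsoncallback": 1}
    query.update(params)
    return FLICKR_REST + "?" + urlencode(query)


# Tag expansion: clusters of tags related to a seed tag.
tag_clusters_url = flickr_url("flickr.tags.getClusters", "YOUR_API_KEY", tag="soccer")

# Text expansion: photos matching a free-text query.
search_url = flickr_url("flickr.photos.search", "YOUR_API_KEY",
                        text="soccer Barcelona", per_page=50)
```

The tags and titles returned by these calls would then form the expanded query against which each cluster is scored.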
10. Visual Filtering
• Filter out clusters that match an invalid concept
• E.g. the concept list for a soccer event:

Concept        Threshold
Beach          0.3
Flower Scene   0.4
Infant         0.3
…
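One way to apply such a list is sketched below. It assumes that each photo comes with per-concept detector scores in [0, 1], that a cluster's score for a concept is the mean over its photos, and that a cluster is dropped when any listed concept reaches its threshold. None of these choices is stated on the slide, and the list holds only the three entries shown there.

```python
# Invalid concepts for a soccer query, with the thresholds from the table above.
INVALID_CONCEPTS = {"Beach": 0.3, "Flower Scene": 0.4, "Infant": 0.3}


def mean_scores(cluster):
    """Average each concept's detector score over the photos of a cluster."""
    concepts = {name for photo in cluster for name in photo}
    return {name: sum(photo.get(name, 0.0) for photo in cluster) / len(cluster)
            for name in concepts}


def passes_visual_filter(cluster, invalid=INVALID_CONCEPTS):
    """True if no invalid concept reaches its threshold (assumed direction)."""
    scores = mean_scores(cluster)
    return all(scores.get(name, 0.0) < limit for name, limit in invalid.items())


# Toy clusters: each photo is a dict of concept -> detector score.
match_photos = [{"Beach": 0.05, "Infant": 0.1}, {"Beach": 0.1}]
beach_photos = [{"Beach": 0.8}, {"Beach": 0.6}]
```

Here `match_photos` passes the filter (mean Beach score 0.075), while `beach_photos` is rejected (mean Beach score 0.7).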
11. Training
• Setup
– No training set from the organizers
– Compiled from a subset of the Upcoming dataset
– Additional random photos from Flickr
• Result
– F1 of 80% after clustering
– F1 of 40% after filtering
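For reference, and assuming the standard definition is the one used here, F1 is the harmonic mean of precision $P$ and recall $R$:

$$F_1 = \frac{2PR}{P + R}$$

For instance, $P = 0.5$ and $R = 1/3$ give $F_1 = \frac{2 \cdot 0.5 \cdot (1/3)}{0.5 + 1/3} = 0.4$. These values are purely illustrative and are not the system's measured precision and recall.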
12. Result
• Query Expansion
– Challenge 1: Barcelona, Rome, soccer
– Challenge 2: Paradiso, Parc del Forum
• Runs
– Different thresholds µ for the tag + text filtering
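The effect of varying µ can be sketched as below. The per-cluster similarities, the equal weights, and the three µ values are made up for illustration; the slides do not report the values used in the submitted runs.

```python
def tag_text_score(s_tag, s_text, w_tag=0.5, w_text=0.5):
    """Weighted sum w_t*s_t + w_x*s_x; equal weights are a placeholder."""
    return w_tag * s_tag + w_text * s_text


def filter_by_threshold(cluster_sims, mu):
    """Ids of clusters whose score against the expanded query is at least mu."""
    return [cid for cid, (s_tag, s_text) in cluster_sims.items()
            if tag_text_score(s_tag, s_text) >= mu]


# Hypothetical (tag, text) similarities of three clusters to the query.
cluster_sims = {"c1": (0.6, 0.4), "c2": (0.2, 0.1), "c3": (0.0, 0.5)}

# One "run" per threshold: a higher mu keeps fewer clusters.
runs = {mu: filter_by_threshold(cluster_sims, mu) for mu in (0.1, 0.2, 0.4)}
```

A low µ favours recall by keeping more clusters, and a high µ favours precision by keeping only the closest matches.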
14. Summary
• Simple clustering and filtering algorithm
[Figure legend: Didn’t find, Incorrect result, Correct result]
15. Future work
• Thorough result analysis on the available ground truth
• Refine the filtering process
• Incorporate methods to merge and rank clusters
16. Thoughts for SED 2012 (and beyond?)
• Provide a common training set?
– E.g. 2009 photos for training, 2010 for evaluation
• TREC-style ranked-list evaluation
– E.g. AP, F1 vs. depth, to easily see what an algorithm could achieve
• Accommodate other event definitions?
– Multi-city long-lasting events, e.g. Olympic torch relay
http://www.flickr.com/search/?q=olympic+torch+relay+2010&s=rec
– Recurring events, e.g. French Open Tennis