Introduction
- Online social media sites extensively distribute content related to real-world events.
  - Event: something that occurs at a certain time in a certain place.
Goal: Identifying Events and Associated Social Media Documents
Known-Event Identification
- Social media content related to known events
  - resides in multiple social media sites, each contributing different information.
- When retrieving cross-site social media documents for the same event,
  - highly specific queries miss many relevant event documents (high precision, low recall).
Known-Event Identification
- First step: use the known event properties to achieve high-precision results.
- Second step: use term extraction and frequency analysis to improve recall.
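The two steps can be sketched in Python as follows. This is a minimal illustration under assumptions of our own, not the authors' implementation: documents are plain strings, the known event properties are given as a list of query terms, and "frequency analysis" is reduced to keeping new terms that recur across the step-1 results. The function names, the stop-word list, and the `min_count`/`k` parameters are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "at", "in", "of", "and", "to", "for", "is", "on"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def precise_retrieval(docs, event_terms):
    """Step 1: keep only documents containing every known-property term."""
    required = set(event_terms)
    return [d for d in docs if required <= set(tokenize(d))]


def expansion_terms(seed_docs, event_terms, min_count=2, k=5):
    """Step 2a: frequency analysis over the step-1 results; keep up to k new
    terms that occur in at least min_count of those documents."""
    counts = Counter(
        t for d in seed_docs for t in set(tokenize(d)) if t not in event_terms
    )
    frequent = [(t, c) for t, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda tc: (-tc[1], tc[0]))  # deterministic tie-break
    return {t for t, _ in frequent[:k]}


def two_step_retrieval(docs, event_terms, min_count=2, k=5):
    """Step 2b: add documents that mention an expansion term, to raise recall."""
    precise = precise_retrieval(docs, event_terms)
    expansion = expansion_terms(precise, event_terms, min_count, k)
    extra = [d for d in docs if d not in precise and expansion & set(tokenize(d))]
    return precise, expansion, precise + extra
```

Matching on a single expansion term is deliberately crude; it only shows how terms mined from high-precision results can pull in documents that the original, highly specific query missed.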
Unknown-Event Identification
- The proposed online clustering framework
  - leverages multiple features to decide when two social media documents correspond to the same event.
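To make "multiple features" concrete, the sketch below represents each document by three features and computes one similarity score per feature. The choice of features (tags, posting time, geo-location), the `Doc` type, and the linear `scale_hours`/`scale_km` cut-offs are hypothetical illustrations, not the framework's actual feature set or similarity metrics.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical multi-feature document: tags, posting time, geo-location."""
    tags: set
    timestamp: float  # seconds since epoch
    lat: float
    lon: float


def tag_similarity(a, b):
    """Jaccard overlap of the two tag sets."""
    if not a.tags and not b.tags:
        return 0.0
    return len(a.tags & b.tags) / len(a.tags | b.tags)


def time_similarity(a, b, scale_hours=24.0):
    """1.0 for identical timestamps, falling linearly to 0 at scale_hours apart."""
    hours = abs(a.timestamp - b.timestamp) / 3600.0
    return max(0.0, 1.0 - hours / scale_hours)


def haversine_km(a, b):
    """Great-circle distance between the two documents' locations."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp = p2 - p1
    dl = math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def location_similarity(a, b, scale_km=10.0):
    """1.0 for the same spot, falling linearly to 0 at scale_km apart."""
    return max(0.0, 1.0 - haversine_km(a, b) / scale_km)


def feature_similarities(a, b):
    """One similarity score per feature; a combiner decides 'same event'."""
    return {
        "tags": tag_similarity(a, b),
        "time": time_similarity(a, b),
        "location": location_similarity(a, b),
    }
```

Keeping the scores separate, rather than merging them into one vector, is what lets a later step weigh each feature differently.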
Social Media Document Clustering Framework
[Figure: social media documents → document feature representation → event clusters]
Ensemble Algorithm
- The proposed online clustering framework
  - deploys ensemble learning methods to associate each feature with a weight and a threshold that capture the feature's importance.
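A minimal sketch of how per-feature weights and thresholds could drive a single-pass (online) clustering decision. Assumptions of our own: the weights and thresholds in `FEATURE_PARAMS` are fixed by hand instead of learned, each feature casts a weighted vote only when its similarity clears its threshold, and a cluster is summarized by the union of its tags, the mean posting hour, and the place of its first member. All names and values are hypothetical.

```python
# Hypothetical (weight, threshold) per feature. In the framework these are
# learned with ensemble methods; here they are fixed by hand for illustration.
FEATURE_PARAMS = {
    "tags": (0.5, 0.2),
    "time": (0.3, 0.5),
    "place": (0.2, 0.5),
}
DECISION_THRESHOLD = 0.5  # hypothetical cut-off on the combined score


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_similarities(doc, cluster):
    """One similarity per feature between a document and a cluster summary."""
    return {
        "tags": jaccard(doc["tags"], cluster["tags"]),
        "time": max(0.0, 1.0 - abs(doc["hour"] - cluster["hour"]) / 24.0),
        "place": 1.0 if doc["place"] == cluster["place"] else 0.0,
    }


def ensemble_score(sims):
    """Each feature adds its weight only if its similarity clears its threshold."""
    return sum(w for f, (w, t) in FEATURE_PARAMS.items() if sims[f] >= t)


def online_cluster(docs):
    """Single pass over the stream: join the best-scoring cluster if its
    combined score reaches DECISION_THRESHOLD, otherwise open a new cluster."""
    clusters, labels = [], []
    for doc in docs:
        scores = [ensemble_score(feature_similarities(doc, c)) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < DECISION_THRESHOLD:
            clusters.append({"tags": set(doc["tags"]), "hour": doc["hour"],
                             "place": doc["place"], "n": 1})
            labels.append(len(clusters) - 1)
            continue
        c = clusters[best]
        c["n"] += 1
        c["hour"] += (doc["hour"] - c["hour"]) / c["n"]  # running mean hour
        c["tags"] |= doc["tags"]  # cluster tags = union of member tags
        labels.append(best)  # "place" stays that of the first member
    return labels
```

For example, a halftime photo posted from a bar can still join a stadium cluster: its tag and time votes (0.5 + 0.3) outweigh the missing place vote.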
Improving Identification Effectiveness
- How events behave over time has a significant impact on the effectiveness of the document clustering procedure.
- Refining the clustering procedure to benefit from these factors is a challenging task.
URLs
- URLs in event-related social streams are ubiquitous: individuals use them to share meaningful event-related external content.
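The slide does not say how URLs enter the clustering. One plausible way, assumed here, is to treat the set of URLs in a document as one more feature and score two documents by the overlap of their normalized URLs. The normalization rules (lowercase scheme and host, drop the fragment and a trailing slash) and the function names are hypothetical choices for illustration.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Canonical form: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def url_similarity(urls_a, urls_b):
    """Jaccard overlap of the normalized URLs in two documents."""
    a = {normalize_url(u) for u in urls_a}
    b = {normalize_url(u) for u in urls_b}
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Shortened links (different short URLs for the same page) would defeat this simple comparison unless they are expanded first.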
Bursty Vocabulary
- The social media content related to an event tends to revolve around a central topic.
  - This central topic is expressed by a set of terms that are significantly more frequent than usual (bursty terms).
  - Events that span a wide time range exhibit a different set of these bursty terms at different points of their lifetime.
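A common way to flag bursty terms, assumed here rather than taken from the deck, is to compare a term's relative frequency in the current time window with its add-one smoothed frequency in a background period. The `min_ratio` and `min_count` cut-offs and the whitespace tokenization are hypothetical.

```python
from collections import Counter


def bursty_terms(window_docs, background_docs, min_ratio=3.0, min_count=2):
    """Terms whose relative frequency in the window is at least min_ratio
    times their add-one smoothed background frequency, strongest first."""
    window = Counter(t for d in window_docs for t in d.lower().split())
    background = Counter(t for d in background_docs for t in d.lower().split())
    w_total = sum(window.values())
    b_total = sum(background.values())
    vocab = len(set(window) | set(background))
    ratios = {}
    for term, count in window.items():
        if count < min_count:
            continue  # ignore terms too rare to call a burst
        p_window = count / w_total
        p_background = (background[term] + 1) / (b_total + vocab)
        if p_window / p_background >= min_ratio:
            ratios[term] = p_window / p_background
    return sorted(ratios, key=lambda t: (-ratios[t], t))
```

Running the function over successive windows of a long event would surface a different bursty set per window, which is the behavior the slide describes.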
Time Decay
- Adds a time decay function to the clustering framework that
  - penalizes clusters that have been inactive for a long time;
  - re-triggers events that have been inactive for some time if the similarity score without the time-decay factor is strong enough.
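The deck does not give the shape of the decay function, so the sketch below assumes an exponential penalty on hours of cluster inactivity. The decision rule mirrors the two bullets: join when the decayed score is high enough, or re-trigger a dormant event when the raw (undecayed) similarity is strong enough. `DECAY_RATE` and both thresholds are hypothetical values.

```python
import math

DECAY_RATE = 0.1           # hypothetical decay rate per hour of inactivity
ASSIGN_THRESHOLD = 0.5     # hypothetical cut-off on the decayed score
RETRIGGER_THRESHOLD = 0.9  # hypothetical cut-off on the raw score


def decayed_similarity(similarity, hours_inactive, rate=DECAY_RATE):
    """Exponentially penalize clusters that have been inactive for long."""
    return similarity * math.exp(-rate * hours_inactive)


def should_join(similarity, hours_inactive):
    """Join an active-enough cluster, or re-trigger a dormant one when the
    similarity without the time-decay factor is strong enough."""
    if decayed_similarity(similarity, hours_inactive) >= ASSIGN_THRESHOLD:
        return True
    return similarity >= RETRIGGER_THRESHOLD
```

With these values, a similarity of 0.8 joins a cluster active an hour ago but not one silent for two days, while 0.95 re-triggers the dormant cluster.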
Experimental Evaluation
- Data
  - Upcoming dataset: 273,842 multi-featured Flickr photos that correspond to 9,613 real-world events from the Upcoming event database.
- The BurstyV + TimeDec technique (bursty vocabulary combined with time decay) obtained the highest-quality results.
Conclusion
- This article discussed the event identification task under two different scenarios: known- and unknown-event identification.
- We showed how to identify event content effectively by
  - exploiting rich features of the social media documents;
  - revealing temporal patterns of the relevant content.
Fotis Psallidas: Columbia University
Hila Becker: Google, Inc.
For instance, YouTube might contain videos for the Super Bowl event, whereas Twitter users might discuss the event by sharing short text messages, or tweets.
Such highly specific queries tend to retrieve event-related documents with high precision but with low recall.
We can cluster our document collection according to each of the feature representations discussed; each would have its own