Event Identification in Social Media

    Notes on slide 1

    Social media sites host a variety of event information. We use the traditional event definition from the event detection literature, stating that an event is… In particular, we consider events that range from …

    Our goal is to… This could facilitate applications such as… As can be seen in this image, this is similar to news aggregation sites, but for events, and includes a variety of rich media content. Our approach for event identification uses clustering to group similar event documents, such that…

    Social media data quality is uneven. When developing our approach, we also had to consider the scalability of our algorithms, as there exists a vast amount of social media event data on the web.

    Define and motivate the event identification task…

    We have different notions of similarity for different types of features… We have to come up with a principled way to combine these different notions into a single similarity measure.

    We can cluster our document collection according to any of the feature representations discussed; each would have its own…

    Mention briefly where the event IDs came from…

    I’ll add a note on where these experiments stand

    Backup slide

    Event Identification in Social Media - Presentation Transcript

    1. EVENT IDENTIFICATION IN SOCIAL MEDIA
      • Hila Becker, Luis Gravano (Columbia University)
      • Mor Naaman (Rutgers University)
    2. Social Media Sites Host Many “Event” Documents
      • Photo-sharing: Flickr
      • Video-sharing: YouTube
      • Social networking: Facebook
      • “Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]
        • Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade
        • Smaller events, without traditional news coverage: local food drive, street fair
      Social media documents for the “All Points West” festival, Liberty State Park, New Jersey, 8/8/08
    3. Identifying Events and Associated Social Media Documents
      • Applications
        • Event search and browsing
        • Local search
      • General approach: group similar documents via clustering, so that each cluster corresponds to one event and its associated social media documents
    4. Event Identification: Challenges
      • Uneven data quality
        • Missing, short, uninformative text
        • … but revealing structured context available: tags, date/time, geo-coordinates
      • Scalability
      • Dynamic data stream of event information
      • Unknown number of events
        • The number of clusters is a necessary input for many clustering algorithms
        • Difficult to estimate
    5. Clustering Social Media Documents
      • Social media document representation
      • Social media document similarity
      • Social media document clustering
        • Clustering task: definition
        • Ensemble algorithm: combining multiple clustering results
      • Preliminary evaluation
    6. Social Media Document Representation
      • Features: Title, Description, Tags, Date/Time, Location, All-Text
    7. Social Media Document Similarity
      • Text: tf-idf weights, cosine similarity
      • Location: geo-coordinate proximity
      • Time: proximity in minutes
      [Figure: document features (Title, Description, Tags, Date/Time, Location, All-Text) and their representations (Title, Description, Tags, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity, All-Text)]
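The three similarity notions on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the smoothed idf formula, the haversine distance, and the linear cut-offs `max_km` and `max_minutes` are hypothetical choices, not details given in the slides.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of tokenized documents.

    Assumption: a smoothed idf, log((1 + n) / (1 + df)) + 1, is used so that
    terms occurring in every document still get a positive weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(h)))


def location_similarity(a, b, max_km=50.0):
    """Map geo-coordinate proximity to [0, 1]; max_km is a made-up cut-off."""
    return max(0.0, 1.0 - haversine_km(a, b) / max_km)


def time_similarity(minutes_a, minutes_b, max_minutes=60 * 24 * 365):
    """Map proximity in minutes to [0, 1]; max_minutes is a made-up cut-off."""
    return max(0.0, 1.0 - abs(minutes_a - minutes_b) / max_minutes)
```

Each function returns a score in [0, 1], which keeps the different feature types comparable when they are later combined.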
    8. Social Media Document Clustering Framework
      [Diagram: social media documents → document feature representation → event clusters]
    9. Clustering: Ensemble Algorithm
      • Clusterers: C_title, C_tags, C_time
      • Weights: W_title, W_tags, W_time (learned in a training step)
      • Consensus function f(C, W): combine ensemble similarities
      • Output: ensemble clustering solution
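One way to read the consensus function, following the Experimental Setup slide ("weighted average of clusterers’ binary predictions"), is sketched here. The data layout, the function name `consensus_similarity`, and the weight values are hypothetical and only serve the illustration.

```python
def consensus_similarity(doc_a, doc_b, assignments, weights):
    """Weighted average of the clusterers' binary same-cluster votes.

    assignments: {clusterer_name: {doc_id: cluster_id}}  (hypothetical layout)
    weights:     {clusterer_name: non-negative weight}
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    score = 0.0
    for name, weight in weights.items():
        clusters = assignments[name]
        # A clusterer votes 1 if it put both documents in the same cluster.
        same = clusters[doc_a] == clusters[doc_b]
        score += weight * (1.0 if same else 0.0)
    return score / total


# Toy ensemble: three clusterers, three photos, made-up weights.
assignments = {
    "title": {"p1": 0, "p2": 0, "p3": 1},
    "tags": {"p1": 0, "p2": 0, "p3": 0},
    "time": {"p1": 0, "p2": 1, "p3": 1},
}
weights = {"title": 0.25, "tags": 0.5, "time": 0.25}

pair_score = consensus_similarity("p1", "p2", assignments, weights)
```

In this toy setup the title and tags clusterers agree that `p1` and `p2` belong together while the time clusterer disagrees, so the pair receives the combined weight of the two agreeing clusterers.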
    10. Clustering: Measuring Quality
      • Homogeneous clusters
      • Complete clusters
      • Metric: Normalized Mutual Information (NMI), the shared information between the clustering solution and the “ground truth”
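A small sketch of how NMI could be computed from two label lists. Several normalizations of mutual information are in use; dividing by the mean of the two entropies is an assumption here, since the slides do not say which variant was applied.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred, truth):
    """Mutual information between two labelings of the same documents."""
    n = len(pred)
    joint = Counter(zip(pred, truth))
    pred_counts = Counter(pred)
    truth_counts = Counter(truth)
    mi = 0.0
    for (p, t), c in joint.items():
        mi += (c / n) * math.log(c * n / (pred_counts[p] * truth_counts[t]))
    return mi


def nmi(pred, truth):
    """NMI normalized by the mean of the two entropies (assumed variant)."""
    if len(pred) != len(truth):
        raise ValueError("label lists must have the same length")
    h_pred, h_truth = entropy(pred), entropy(truth)
    if h_pred == 0.0 and h_truth == 0.0:
        return 1.0  # both solutions put everything in a single cluster
    return mutual_information(pred, truth) / ((h_pred + h_truth) / 2)
```

A clustering that matches the ground truth up to a renaming of cluster IDs scores 1, while a clustering that is statistically independent of the ground truth scores 0.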
    11. Experimental Setup
      • Data: >270K Flickr photos
        • Event labels from Yahoo!’s “upcoming” event database
        • Split into 3 parts for training/validation/testing
      • Clusterers: single pass algorithm with centroid similarity
      • Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
      • Consensus function: weighted average of clusterers’ binary predictions
      • Final prediction step: single pass clustering algorithm
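A single-pass algorithm with centroid similarity could look roughly like the sketch below. The `threshold` value, the sparse-dict vector format, and the use of cosine similarity against the centroid are assumptions made for illustration. Note that this style of clustering does not need the number of events in advance, which speaks to the challenge raised earlier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold=0.5):
    """Visit each vector once: join the most similar centroid if its
    similarity reaches the threshold, otherwise open a new cluster."""
    sums = []    # per-cluster sum of member vectors
    sizes = []   # per-cluster member count
    labels = []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, total in enumerate(sums):
            centroid = {t: w / sizes[idx] for t, w in total.items()}
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            sums.append(dict(vec))
            sizes.append(1)
            labels.append(len(sums) - 1)
        else:
            for t, w in vec.items():
                sums[best][t] = sums[best].get(t, 0.0) + w
            sizes[best] += 1
            labels.append(best)
    return labels


# Toy documents with made-up term weights.
docs = [
    {"parade": 1.0, "nyc": 1.0},
    {"parade": 1.0, "nyc": 0.8},
    {"festival": 1.0, "nj": 1.0},
    {"festival": 0.9, "nj": 1.0},
]
labels = single_pass_cluster(docs)
```

Because each document is compared only against the current centroids, the cost grows with the number of clusters rather than with the number of document pairs, which matters at the scale mentioned on this slide.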
    12. Preliminary Evaluation Results
      • Individual clusterer performance
        • Highest NMI: Tags, All-Text
        • Lowest NMI: Description, Title
      • Ensemble performance, compared against all individual clusterers
        • Highest overall performance in terms of NMI
        • More homogeneous clusters: each event is spread over fewer clusters
      Details in paper
      Future Work: Alternative Choices
      • Document similarity metric
        • Ensemble approach
          • Weight assignment
          • Choice of clusterers
        • Train a classifier to predict document similarity
          • Features correspond to similarity scores
            • All-text, title, tags, time, location, etc.
            • Numeric values in [0,1]
          • State-of-the-art classifiers: SVM, Logistic Regression, …
    13. Future Work: Alternative Choices
      • Final clustering step
        • Apply graph partitioning algorithms
        • Requires estimating the number of clusters
      • Evaluation metrics: beyond NMI
      • Datasets
        • Flickr, LastFM, YouTube
        • Exploit social network connections
    14. Conclusions
      • Identified events and their corresponding social media documents
        • Proposed a clustering solution
        • Leveraged different representations of social media documents
        • Employed various social media similarity metrics
      • Developed a weighted ensemble clustering approach
      • Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs