Major project presentation

Presentation Transcript

  • Topic Detection From Events
  • Abstract We live in a world of information overload, and manually annotating text with topics is no longer feasible. In this paper, we deal with tweets. We first recognize the topics and entities they mention. We then cluster the tweets on the recognized entities and verbs to obtain a hierarchy of clusters. Each cluster is labelled with its most frequent entities.
  • Tools Used ● Twitter NLP ● Wiki Semantic Distance ● VerbNet
  • Approach To get the first level of clusters:
    1. Tokenize the tweets.
    2. Apply POS tagging.
    3. Apply IOB tagging to each token using feature extraction.
    4. Extract entities by applying rules to the IOB-tagged tweet.
    5. For each identified entity, find the nearest Wikipedia entity by string edit distance.
    6. Create an inverted index based on the identified entities.
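
Steps 5 and 6 above can be illustrated with a minimal sketch. It assumes the entities have already been extracted per tweet and that a list of candidate Wikipedia titles is available in memory; the function names, the sample titles, and the case-insensitive comparison are illustrative assumptions, not details taken from the slides.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def nearest_title(entity: str, titles: list[str]) -> str:
    """Return the candidate title with the smallest edit distance.

    Comparing in lower case is an assumption made for this sketch.
    """
    return min(titles, key=lambda t: edit_distance(entity.lower(), t.lower()))


def build_inverted_index(tweet_entities: dict[int, list[str]],
                         titles: list[str]) -> dict[str, set[int]]:
    """Map each normalised entity to the ids of the tweets that mention it."""
    index: dict[str, set[int]] = defaultdict(set)
    for tweet_id, entities in tweet_entities.items():
        for entity in entities:
            index[nearest_title(entity, titles)].add(tweet_id)
    return dict(index)


# Hypothetical data: misspelled entities as they might appear in tweets.
titles = ["Election", "Parliament", "Ballot"]
tweets = {1: ["electon"], 2: ["parliment", "Election"]}
index = build_inverted_index(tweets, titles)
```

With this index, all tweets that mention the same normalised entity can be looked up in one step, which is what makes it a natural starting point for the first level of clusters.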
  • Approach contd... Then:
    1. Run k-means clustering at each level, using Jaccard similarity as the similarity metric.
    2. Take the most frequent tags in each cluster and use them to label the cluster.
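
The two building blocks named above, Jaccard similarity and frequency-based labelling, can be sketched as follows. This is not the full clustering procedure: the slides do not say how cluster centres are represented or updated, so the sketch only shows the similarity measure, one assignment step against centres given as tag sets, and the labelling step. All function names and sample tags are hypothetical.

```python
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def assign(tweet_tags: set[str], centres: list[set[str]]) -> int:
    """Index of the centre most similar to the tweet (assumed tag-set centres)."""
    return max(range(len(centres)), key=lambda i: jaccard(tweet_tags, centres[i]))


def label_cluster(cluster: list[set[str]], top_n: int = 2) -> list[str]:
    """Return the top_n most frequent tags in a cluster.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tag for tags in cluster for tag in tags)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:top_n]]


# Hypothetical cluster of three tweets, each reduced to its set of tags.
cluster = [{"election", "vote"}, {"election", "ballot"}, {"election", "vote"}]
labels = label_cluster(cluster)
```

Repeating the assignment and labelling inside each resulting cluster is one way to obtain the hierarchy the slides describe.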
  • Architecture
  • Results
  • Results contd...
  • Results contd...
  • Conclusion Our methods successfully cluster tweets into a semantically related hierarchy. The dataset we used was constrained to a single domain, namely elections. Wiki semantic distance might be more useful on a more diverse dataset, so future work can experiment with different datasets to find out when wiki semantic distance begins to significantly outperform Jaccard similarity.
  • Thanks! Team 15: Garima Ahuja, Harish Kolli, Ashwin Venkatram