Major project presentation

Published in: Technology, Education

  1. 1. Topic Detection From Events
  2. 2. Abstract We live in a world of information overload, and manually annotating text with topics is no longer an option. In this paper, we deal with tweets. First, we recognize the topics/entities they mention. We then cluster the tweets based on the recognized entities and verbs to get a hierarchy of clusters. The clusters are then labelled with their most frequent entities.
  3. 3. Tools Used ● Twitter NLP ● Wiki Semantic Distance ● VerbNet
  4. 4. Approach To get the first level of clusters: 1. Tokenize the tweets. 2. Apply POS tagging. 3. Apply IOB tagging to each token using feature extraction. 4. Extract entities by applying rules to the IOB-tagged tweet. 5. For each identified entity, find the nearest Wikipedia entity using string edit distance. 6. Create an inverted index based on the identified entities.
  5. 5. Approach contd... Then: 1. We apply k-means clustering at each level, using Jaccard similarity as the similarity metric. 2. We take the most frequent tags from each cluster and use them to label it.
  6. 6. Architecture
  7. 7. Results
  8. 8. Results contd...
  9. 9. Results contd...
  10. 10. Conclusion Our methods successfully cluster tweets into a semantically related hierarchy. We used a dataset constrained to a specific domain, i.e. elections. Wiki semantic distance might be more useful for a more diverse dataset. Future work can experiment with different datasets to find out when wiki semantic distance begins to significantly outperform Jaccard similarity.
  11. 11. Thanks! Team 15 Garima Ahuja Harish Kolli Ashwin Venkatram
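
Steps 5 and 6 of the Approach slide (nearest Wikipedia entity by string edit distance, then an inverted index over entities) can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the function names, the tiny list of Wikipedia titles, and the sample mentions are all hypothetical, and a real system would match against a far larger title list.

```python
from collections import defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def nearest_title(mention: str, titles: list) -> str:
    """Return the title with the smallest case-insensitive edit distance."""
    return min(titles, key=lambda t: edit_distance(mention.lower(), t.lower()))


def build_inverted_index(tweet_entities: dict, titles: list) -> dict:
    """Map each normalized entity to the set of tweet ids that mention it."""
    index = defaultdict(set)
    for tweet_id, mentions in tweet_entities.items():
        for mention in mentions:
            index[nearest_title(mention, titles)].add(tweet_id)
    return dict(index)


# Hypothetical data: entity mentions as they might come out of step 4.
WIKI_TITLES = ["Barack Obama", "Mitt Romney", "Election"]
TWEET_ENTITIES = {
    0: ["barak obama", "elections"],
    1: ["mitt romny", "election"],
    2: ["Barack Obama"],
}
INDEX = build_inverted_index(TWEET_ENTITIES, WIKI_TITLES)
```

With this index, all tweets that share a normalized entity can be looked up directly, which gives a natural starting point for the first level of clusters.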
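
The clustering and labelling steps from the "Approach contd..." slide might look something like the sketch below for a single level of the hierarchy. The slides do not say how centroids are computed for set-valued data, so the majority-vote centroid used here is an assumption, as are the function names, the deterministic seeding with the first k tweets, and the sample tag sets.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def majority_centroid(members: list) -> set:
    """Assumed centroid rule: keep tags present in at least half of the members."""
    counts = Counter(tag for tags in members for tag in tags)
    return {tag for tag, n in counts.items() if 2 * n >= len(members)}


def kmeans_jaccard(tweets: list, k: int, iterations: int = 10) -> list:
    """Lloyd-style iterations, assigning each tweet to its most similar centroid."""
    centroids = [set(t) for t in tweets[:k]]  # deterministic seeding for the sketch
    assignment = []
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: jaccard(t, centroids[c])) for t in tweets
        ]
        if new_assignment == assignment:
            break  # converged: no tweet changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [t for t, a in zip(tweets, assignment) if a == c]
            if members:
                centroids[c] = majority_centroid(members)
    return assignment


def label_clusters(tweets: list, assignment: list, top_n: int = 2) -> dict:
    """Label each cluster with its most frequent tags (ties broken alphabetically)."""
    labels = {}
    for c in sorted(set(assignment)):
        counts = Counter(tag for t, a in zip(tweets, assignment) if a == c for tag in t)
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        labels[c] = [tag for tag, _ in ranked[:top_n]]
    return labels


# Hypothetical tweets, each reduced to its set of recognized entities and verbs.
TWEETS = [
    {"Barack Obama", "win", "Election"},
    {"Mitt Romney", "debate"},
    {"Barack Obama", "Election"},
    {"Mitt Romney", "debate", "Election"},
    {"Barack Obama", "win"},
]
ASSIGNMENT = kmeans_jaccard(TWEETS, k=2)
LABELS = label_clusters(TWEETS, ASSIGNMENT)
```

To build a hierarchy, the same routine could be re-run on the tweets inside each cluster. Swapping `jaccard` for a wiki semantic distance measure would be the comparison the conclusion proposes as future work.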
