Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments
Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments Presentation Transcript

  • 1. Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments • Eva Zangerle, Wolfgang Gassler and Günther Specht
  • 2. Outline • Motivation • Approach • Ranking Methods • Evaluation • Future Directions • Conclusion
  • 3. Hashtags • Tags for tweets • (Manual) categorization of conversations • Follow streams of conversation • Indicator for a certain topic or audience
  • 4. Motivation • Only 20% of tweets contain hashtags • Hashtags can be chosen freely – #socinfo2011? #socinfo11? #socinfo? All of them? – Synonymous hashtags – Heterogeneity – Limited search capability – Which stream to follow?
  • 5. Motivation
  • 6. Motivation • Proposed solution: hashtag recommendations
  • 7. Goals • Recommend suitable hashtags while the user is entering a tweet • Encourage use of hashtags – Improve search capabilities – Better categorization • Fight heterogeneity – Avoid use of synonymous hashtags
  • 8. Our Approach in a Nutshell • Based on a set of existing tweets • Analysis of the entered tweet • Analysis of the dataset • Recommendations based on hashtags within similar messages
  • 9. Approach - Workflow • User enters message → Retrieve 500 most similar messages → Retrieve candidate set of hashtags → Ranking of hashtags → Top-k recommendations
  • 10. Crawled Dataset • Crawled July 2010 – April 2011 • 18,731,800 messages in total • 3,753,927 messages containing hashtags – about 20% – used as dataset for evaluation • 5,968,571 hashtags → average of 1.6 hashtags per message containing hashtags • 585,140 distinct hashtags – 502,172 hashtags occurred fewer than 5 times
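The ratios quoted on the dataset slide can be checked from the raw counts. The short sketch below only redoes that arithmetic; the constant and function names are illustrative and not taken from the presentation.

```python
# Counts quoted on the "Crawled Dataset" slide.
TOTAL_MESSAGES = 18_731_800
TAGGED_MESSAGES = 3_753_927      # messages containing at least one hashtag
TOTAL_HASHTAGS = 5_968_571       # hashtag occurrences
DISTINCT_HASHTAGS = 585_140
RARE_HASHTAGS = 502_172          # distinct hashtags seen fewer than 5 times


def dataset_ratios():
    """Derive the percentages and averages implied by the slide's counts."""
    return {
        "tagged_share_pct": round(100 * TAGGED_MESSAGES / TOTAL_MESSAGES, 1),
        "avg_tags_per_tagged_message": round(TOTAL_HASHTAGS / TAGGED_MESSAGES, 2),
        "rare_share_pct": round(100 * RARE_HASHTAGS / DISTINCT_HASHTAGS, 1),
    }


if __name__ == "__main__":
    for name, value in dataset_ratios().items():
        print(name, value)
```

The last ratio illustrates the long tail: roughly 86% of all distinct hashtags occur fewer than five times.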
  • 11. Long-Tail Distribution
  • 12. Hashtags per Tweet
  • 13. Candidate Set Generation • Find tweets most similar to the user's tweet • Cosine similarity of tf-idf weighted term vectors • Take the 500 most similar tweets • Extract hashtags from these tweets • Next step: ranking the hashtags
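The candidate-set step could be sketched roughly as follows. This is a minimal, brute-force illustration, not the authors' implementation: the tokenizer, the smoothed idf formula `log(n / df) + 1`, and all function names are assumptions, and a real system over millions of tweets would need an inverted index rather than a full scan.

```python
import math
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")


def tokenize(text):
    """Lowercase word tokens, with hashtags removed (assumed preprocessing)."""
    return re.findall(r"\w+", HASHTAG.sub(" ", text.lower()))


def extract_hashtags(text):
    """Return the lowercased hashtags of a tweet, in order of appearance."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def build_index(tweets):
    """Compute idf weights and one tf-idf vector (dict) per tweet."""
    docs = [Counter(tokenize(t)) for t in tweets]
    df = Counter(term for doc in docs for term in doc)
    n = len(docs)
    # Assumed idf variant; the slides do not specify one.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_set(query, tweets, k=500):
    """Return (similarity, hashtags) for the k tweets most similar to query."""
    idf, vectors = build_index(tweets)
    counts = Counter(tokenize(query))
    q = {term: tf * idf[term] for term, tf in counts.items() if term in idf}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(sim, extract_hashtags(tweets[i])) for sim, i in scored[:k] if sim > 0]
```

Calling `candidate_set("social informatics talk", tweets)` on a small list of tagged tweets returns the matching tweets' hashtags together with their similarity scores, which the ranking step can then consume.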
  • 14. Basic Ranking Methods • Input: set of candidate hashtags (from 500 similar tweets) • Output: ranked candidate list → top k shown • 1. SimRank – Use similarity measure of tweets for ranking (tf-idf cosine similarity) – The higher the similarity of the tweets, the higher the ranking of the corresponding hashtags • 2. TimeRank – Recency of usage of the hashtag – The more recently a hashtag has been used, the higher the ranking within the candidate hashtags
  • 15. Basic Ranking Methods • Input: set of candidate hashtags (from 500 similar tweets) • Output: ranked candidate list → top k shown • 3. RecCountRank – Count number of occurrences for each hashtag within candidate list – The more similar tweets feature the hashtag, the higher the rank of the hashtag • 4. PopRank – Global popularity of the hashtag within the whole dataset – The more popular a hashtag is overall, the higher its ranking
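The four basic rankings might be sketched as scoring functions over the candidate tweets. Several details here are assumptions, since the slides do not specify them: each candidate is represented as a hypothetical `(similarity, timestamp, hashtags)` tuple, SimRank and TimeRank take the maximum over all tweets carrying a hashtag, and ties are broken alphabetically.

```python
from collections import Counter


def sim_rank(candidates):
    """Score a hashtag by the highest similarity of any tweet carrying it."""
    scores = {}
    for sim, _timestamp, tags in candidates:
        for tag in tags:
            scores[tag] = max(scores.get(tag, 0.0), sim)
    return scores


def time_rank(candidates):
    """Score a hashtag by its most recent use (larger timestamp = newer)."""
    scores = {}
    for _sim, timestamp, tags in candidates:
        for tag in tags:
            scores[tag] = max(scores.get(tag, timestamp), timestamp)
    return scores


def rec_count_rank(candidates):
    """Score a hashtag by the number of candidate tweets that contain it."""
    return dict(Counter(tag for _s, _t, tags in candidates for tag in set(tags)))


def pop_rank(candidates, global_counts):
    """Score a hashtag by its occurrence count in the whole dataset."""
    tags = {tag for _s, _t, tags_ in candidates for tag in tags_}
    return {tag: global_counts.get(tag, 0) for tag in tags}


def top_k(scores, k=5):
    """Return the k best hashtags, highest score first (ties alphabetical)."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _score in ordered[:k]]
```

Each function returns a `{hashtag: score}` mapping, so any of them can be passed to `top_k` to obtain the list shown to the user.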
  • 16. Hybrid Ranking Methods • Based on the 4 basic ranking methods • $h(r_1, r_2) = \alpha \cdot r_1 + (1 - \alpha) \cdot r_2$ • Hybrid ranking computed for all possible combinations of basic ranking methods
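A hybrid ranking of this kind is a weighted combination of two basic rankings, with a weight (called `alpha` here, assumed to lie in [0, 1]) on the first and its complement on the second. The sketch below adds one step the slides do not mention: the two score sets are min-max normalized first, on the assumption that raw scores such as similarities and occurrence counts are not directly comparable.

```python
def normalize(scores):
    """Min-max normalize a {hashtag: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every hashtag as equally good.
        return {tag: 1.0 for tag in scores}
    return {tag: (score - low) / (high - low) for tag, score in scores.items()}


def hybrid(r1, r2, alpha=0.5):
    """Combine two rankings as alpha * r1 + (1 - alpha) * r2.

    Normalizing first is an assumption of this sketch, not something
    stated in the presentation.
    """
    n1, n2 = normalize(r1), normalize(r2)
    tags = set(n1) | set(n2)
    return {
        tag: alpha * n1.get(tag, 0.0) + (1 - alpha) * n2.get(tag, 0.0)
        for tag in tags
    }
```

With `alpha=1.0` the result reduces to the first ranking and with `alpha=0.0` to the second, so the weight controls how strongly each basic method influences the final order.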
  • 17. Evaluation • Randomly select tweet t from dataset → Remove hashtags from t → Use t as input for recommendation algorithm → Compute hashtag recommendations for t → Use proposed ranking methods → Compare top-k recommendations
  • 18. Evaluation • Dataset – 3,753,927 messages – 5,968,571 hashtags – 585,140 distinct hashtags • Test run – 10,000 randomly chosen tweets (max. 5 hashtags) – Retweets excluded
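The evaluation procedure (strip a tweet's hashtags, recommend, compare with what was removed) could be sketched as below. The metric definitions are the common ones and are assumed rather than taken from the slides: recall@k divides the hits by the number of removed hashtags, precision@k divides them by k. The `recommend` argument is a hypothetical callable that maps tweet text to a ranked list of hashtags.

```python
import re

HASHTAG = re.compile(r"#\w+")


def recall_at_k(recommended, actual, k=5):
    """Fraction of the removed hashtags that appear in the top k."""
    actual = set(actual)
    if not actual:
        return 0.0
    return len(set(recommended[:k]) & actual) / len(actual)


def precision_at_k(recommended, actual, k=5):
    """Fraction of the k recommendation slots filled by a removed hashtag."""
    if k <= 0:
        return 0.0
    return len(set(recommended[:k]) & set(actual)) / k


def evaluate(tweets, recommend, k=5):
    """Average recall@k and precision@k over all tweets that have hashtags."""
    recalls, precisions = [], []
    for tweet in tweets:
        actual = [tag.lower() for tag in HASHTAG.findall(tweet)]
        if not actual:
            continue  # nothing to compare against
        stripped = HASHTAG.sub("", tweet).strip()
        recommended = recommend(stripped)
        recalls.append(recall_at_k(recommended, actual, k))
        precisions.append(precision_at_k(recommended, actual, k))
    n = len(recalls)
    if n == 0:
        return 0.0, 0.0
    return sum(recalls) / n, sum(precisions) / n
```

Because the test tweets carry at most five hashtags and often only one or two, precision@5 under this definition is bounded well below 1 even for a perfect recommender, which is worth keeping in mind when reading precision figures.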
  • 19. Recall - Basic Methods • Top-5 recommendations enough?
  • 20. Recall@5 - Hybrid Methods
  • 21. Precision@5
  • 22. Development of Recall Values
  • 23. Future Directions • Social graph • User's timeline • Real-time recommendations • Real user tests
  • 24. Conclusion • Motivation • Hashtag recommendations • Simple, straightforward approach • Promising results