Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments

Published in: Technology


  1. Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments. Eva Zangerle, Wolfgang Gassler and Günther Specht
  2. Outline: • Motivation • Approach • Ranking Methods • Evaluation • Future Directions • Conclusion
  3. Hashtags: • Tags for tweets • (Manual) categorization of conversations • Follow streams of conversation • Indicator for a certain topic or audience
  4. Motivation: • Only 20% of tweets contain hashtags • Hashtags can be chosen freely – #socinfo2011? #socinfo11? #socinfo? all? – Synonymous hashtags – Heterogeneity – Search capability limited – Which stream to follow?
  5. Motivation
  6. Motivation: Proposed solution: hashtag recommendations
  7. Goals: • Recommend suitable hashtags while the user is entering a tweet • Encourage the use of hashtags – Improve search capabilities – Better categorization • Fight heterogeneity – Avoid the use of synonymous hashtags
  8. Our Approach in a Nutshell: • Based on a set of existing tweets • Analysis of the entered tweet • Analysis of the dataset • Recommendations based on hashtags within similar messages
  9. Approach - Workflow: User enters message → Retrieve 500 most similar messages → Retrieve candidate set of hashtags → Ranking of hashtags → Top-k recommendations
  10. Crawled Dataset: • Crawled July 2010 – April 2011 • 18,731,800 messages in total • 3,753,927 messages containing hashtags – about 20% – used as dataset for evaluation • 5,968,571 hashtags → average of 1.6 hashtags per hashtagged message • 585,140 distinct hashtags – 502,172 hashtags occurred fewer than 5 times
  11. Longtail Distribution
  12. Hashtags per Tweet
  13. Candidate Set Generation: • Find tweets most similar to the user's tweet • Cosine similarity of tf/idf weighted term vectors • Take 500 most similar tweets • Extract hashtags from these tweets • Next step: ranking the hashtags
  14. Basic Ranking Methods. Input: set of candidate hashtags (from 500 similar tweets). Output: ranked candidate list -> top k shown. 1. SimRank – Use similarity measure of tweets for ranking (tf/idf cosine similarity) – The higher the similarity of the tweets, the higher the ranking of the corresponding hashtags. 2. TimeRank – Recency of usage of the hashtag – The more recently a hashtag has been used, the higher the ranking within the candidate hashtags
  15. Basic Ranking Methods. Input: set of candidate hashtags (from 500 similar tweets). Output: ranked candidate list -> top k shown. 3. RecCountRank – Count number of occurrences for each hashtag within candidate list – The more similar tweets feature the hashtag, the higher the rank of the hashtag. 4. PopRank – Global popularity of the hashtag within the whole dataset – The more popular a hashtag is overall, the higher its ranking
  16. Hybrid Ranking Methods: • Based on 4 basic ranking methods • $h(r_1, r_2) = \alpha \cdot r_1 + (1 - \alpha) \cdot r_2$ • Hybrid ranking computed for all possible combinations of basic ranking methods
  17. Evaluation: Randomly select tweet t from dataset → Remove hashtags from t → Use t as input for recommendation algorithm → Compute hashtag recommendations for t → Use proposed ranking methods → Compare top-k recommendations
  18. Evaluation: • Dataset – 3,753,927 messages – 5,968,571 hashtags – 585,140 distinct hashtags • Test run – 10,000 randomly chosen tweets (max. 5 hashtags) – Retweets excluded
  19. Recall - Basic Methods: Top-5 recommendations enough?
  20. Recall@5 - Hybrid Methods
  21. Precision@5
  22. Development of Recall Values
  23. Future Directions: • Social Graph • User's Timeline • Realtime Recommendations • Real User Tests
  24. Conclusion: • Motivation • Hashtag Recommendations • Simple, straightforward approach • Promising results
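
The candidate set generation on slide 13 (tf/idf weighted term vectors, cosine similarity, hashtags taken from the most similar tweets) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the idf variant `log(n / df)`, and the names `tokenize`, `cosine` and `candidate_hashtags` are assumptions made for the example.

```python
import math
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")

def tokenize(text):
    """Lowercase word tokens; hashtags are removed first (assumed tokenizer)."""
    return re.findall(r"\w+", HASHTAG.sub(" ", text.lower()))

def cosine(a, b):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_hashtags(query, corpus, top_n=500):
    """Return (similarity, hashtags) for the top_n tweets most similar to query."""
    docs = [tokenize(t) for t in corpus]
    n = len(docs)
    # Document frequency: in how many tweets each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / d) for t, d in df.items()}

    def vec(tokens):
        # tf/idf weight; terms unseen in the corpus get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

    q = vec(tokenize(query))
    scored = [(cosine(q, vec(d)), tweet) for d, tweet in zip(docs, corpus)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only tweets with some overlap; extract their hashtags.
    return [(s, HASHTAG.findall(t.lower())) for s, t in scored[:top_n] if s > 0]
```

Calling `candidate_hashtags("great conference talk", tweets)` on a list of hashtagged tweets returns the similar tweets' hashtags in descending order of similarity; a production system would use an inverted index rather than scoring every tweet.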
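
The four basic ranking methods on slides 14 and 15 can be illustrated with a small sketch. The slides do not say how a hashtag's score is aggregated when it appears in several similar tweets; taking the maximum similarity (SimRank) and the latest timestamp (TimeRank) is an assumption here, as are the input layout `(similarity, timestamp, hashtags)` and all function names.

```python
from collections import Counter

def sim_rank(similar):
    """Score = highest similarity of any similar tweet using the hashtag (assumed)."""
    scores = {}
    for sim, _ts, tags in similar:
        for tag in tags:
            scores[tag] = max(scores.get(tag, 0.0), sim)
    return scores

def time_rank(similar):
    """Score = most recent timestamp at which the hashtag was used (assumed)."""
    scores = {}
    for _sim, ts, tags in similar:
        for tag in tags:
            scores[tag] = max(scores.get(tag, ts), ts)
    return scores

def rec_count_rank(similar):
    """Score = number of similar tweets that contain the hashtag."""
    return dict(Counter(tag for _s, _t, tags in similar for tag in set(tags)))

def pop_rank(similar, global_counts):
    """Score = global usage count of each candidate hashtag in the whole dataset."""
    candidates = {tag for _s, _t, tags in similar for tag in tags}
    return {tag: global_counts.get(tag, 0) for tag in candidates}

def top_k(scores, k=5):
    """Hashtags sorted by descending score; ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Each function maps the candidate hashtags to a score, so `top_k(sim_rank(similar))` yields the list that would be shown to the user under that method.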
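
The hybrid ranking on slide 16 combines two basic rankings r1 and r2 as a weighted sum, α · r1 + (1 − α) · r2. A sketch under stated assumptions: the slides do not say how scores from different methods are made comparable, so the min-max normalization below is an assumption, and `normalize`, `hybrid` and `all_pairs` are hypothetical helper names.

```python
from itertools import combinations

def normalize(scores):
    """Min-max scale scores to [0, 1] (assumed; the slides give no normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {tag: 1.0 for tag in scores}
    return {tag: (s - lo) / (hi - lo) for tag, s in scores.items()}

def hybrid(r1, r2, alpha=0.5):
    """Weighted combination alpha * r1 + (1 - alpha) * r2 over normalized scores."""
    n1, n2 = normalize(r1), normalize(r2)
    return {tag: alpha * n1[tag] + (1 - alpha) * n2.get(tag, 0.0) for tag in n1}

def all_pairs(methods):
    """All unordered pairs of basic ranking method names."""
    return list(combinations(methods, 2))
```

With four basic methods, `all_pairs(["SimRank", "TimeRank", "RecCountRank", "PopRank"])` yields six pairs, each of which can be evaluated over a range of `alpha` values.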
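
The evaluation procedure on slide 17 (strip the hashtags from a tweet, recommend hashtags for the remaining text, compare the top-k recommendations with the removed hashtags) can be sketched as below. The slides do not spell out the metric definitions; the usual recall@k and precision@k are assumed, with precision computed over the number of recommendations actually returned, and the function names are illustrative.

```python
import re

HASHTAG = re.compile(r"#\w+")

def split_tweet(tweet):
    """Return (text without hashtags, set of lowercased removed hashtags)."""
    truth = {h.lower() for h in HASHTAG.findall(tweet)}
    text = " ".join(HASHTAG.sub(" ", tweet).split())
    return text, truth

def recall_at_k(recommended, truth, k=5):
    """Share of the tweet's original hashtags found in the top-k recommendations."""
    if not truth:
        return 0.0
    return len(set(recommended[:k]) & truth) / len(truth)

def precision_at_k(recommended, truth, k=5):
    """Share of the top-k recommendations that are original hashtags (assumed definition)."""
    top = recommended[:k]
    if not top:
        return 0.0
    return len(set(top) & truth) / len(top)
```

Averaging both values over the randomly chosen test tweets gives figures comparable in spirit to the Recall@5 and Precision@5 slides.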