Recommending #-Tags in Twitter

  1. Recommending #-Tags in Twitter
      Eva Zangerle, Wolfgang Gassler and Günther Specht
  2. Outline
      - Motivation
      - Approach
      - Ranking Methods
      - Evaluation
      - Future Directions
  3. Hashtags
      - Tags for tweets
      - (Manual) categorization of conversations
      - Follow streams of conversation
  4. Motivation
      - Only 20% of tweets contain hashtags
      - Hashtags can be chosen freely
      - #umap2011? #umap11? #umap?
      - Synonymous hashtags
      - Heterogeneity
      - Search capability limited
  5. Motivation
      - Are hashtag recommendations possible?
  6. Goals
      - Recommendation of suitable hashtags while entering a tweet
      - Encourage use of hashtags
      - Improve search capabilities
      - Better categorization
      - Fight heterogeneity
      - Avoid use of synonymous hashtags
  7. Approach
      - First attempt
      - Crawl set of tweets containing hashtags
      - Analysis of dataset
      - Can it be done based on content?
      - Compare entered tweet to existing tweets
  8. Content-based Approach
  9. Crawled Dataset
      - Crawled July 2010 – February 2011
      - 16,034,195 messages in total
      - 3,209,281 messages containing hashtags (20%) -> used as dataset for evaluation
          - Top five contained in 8% of all messages containing hashtags (#jobs, #nowplaying, #zodiacfacts, #news and #fb)
  10. Hashtags per Tweet
  11. Hashtags per Tweet
      - RT @Bhupeshtweet: #Quad #loop-http://bit.ly/ciHX2U #retweet#India #Jobs #World #news #canada #ad #win #USA #tdf #oea #hacking #icantstop #sdcc #game
  12. Longtail Distribution
  13. Ranking Methods
      - Input: set of candidate hashtags (from 500 similar tweets)
      - Output: ranked candidate list -> top k shown
      - Similarity Rank
          - Use similarity measure of tweets for ranking (tf/idf cosine similarity)
          - The higher the similarity of the tweets, the higher the ranking of the corresponding hashtags
      - Overall Popularity Rank
          - Most popular hashtags over whole dataset
          - The more popular, the higher the ranking within the candidate hashtags
  14. Ranking Methods
      - Input: set of candidate hashtags (from 500 similar tweets)
      - Output: ranked candidate list -> top k shown
      - Recommendation Popularity Rank
          - Count number of occurrences of each hashtag within the candidate list
          - The more similar tweets feature the hashtag, the higher the rank of the hashtag
  15. Evaluation
  16. Evaluation
      - Dataset
          - 3,209,281 messages
          - 5,097,545 hashtags
          - 510,170 distinct hashtags
      - Test run
          - 10,000 randomly chosen tweets (max. 5 hashtags)
          - Retweets excluded
          - 30,000 test runs (3 ranking methods)
  17. Evaluation - Precision
      - Avg. number of hashtags per message in dataset
  18. Evaluation - Recall
      - Top-3 recommendations enough?
  19. What we showed…
      - Motivation for recommendation of hashtags
      - Content-based recommendations
      - Simple, straightforward approach
      - 40% Recall@3
      - … so it can be done!
  20. We've barely scratched the surface…
      - Sample tweet: { "coordinates": null, "favorited": false, "created_at": "Thu Jul 15 23:26:04 +0000 2010", "truncated": false, "text": "RT @ApeyBaby44: Labels r run by lawyer & accountants. http://tl.gd/2hkmas", "contributors": null, "id": 18639444000, "geo": null, "in_reply_to_user_id": null, "place": null, "in_reply_to_screen_name": null, "user": { "name": "DIGGZ", "profile_sidebar_border_color": "F2E195", "profile_background_tile": true, "profile_sidebar_fill_color": "FFF7CC", "created_at": "Fri Apr 03 03:16:01 +0000 2009", "profile_image_url": "http://a3.twimg.com/profile_images/1079346239/untitled_normal.JPG", "location": "ATL, NC, VA, NY all day!", "profile_link_color": "FF0000", "follow_request_sent": null, "url": "http://thisisseriousbiz.com", "favourites_count": 42, "contributors_enabled": false, "utc_offset": -18000, "id": 28489988, "profile_use_background_image": true, "profile_text_color": "0C3E53", "protected": false, "followers_count": 588, "lang": "en", "notifications": null, "time_zone": "Quito", "verified": false, "profile_background_color": "BADFCD", "geo_enabled": true, "description": "Half of Production duo SeriousBIZ circa 2008\r\n#teamSERIOUSBIZ\r\n#teamblackberry PIN 315442C9\r\n#teamfollowback", "friends_count": 477, "statuses_count": 6269, "profile_background_image_url": "http://a1.twimg.com/profile_background_images/118926256/_MG_43571.JPG", "following": null, "screen_name": "DIGGZSeriousBIZ" }, "source": "<a href=\"http://www.ubertwitter.com/bb/download.php\" rel=\"nofollow\">\u00dcberTwitter</a>", "in_reply_to_status_id": null }
      - Exploited only a small fraction of the available information
      - 90% are metadata
  21. Thank You!
      - #ideas?
  22. Future Challenges
      - User Interface
      - Social Graph
      - Realtime Recommendations
      - Synonymous Tags?
      - Ranking
      - Real User Tests
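
The "Ranking Methods" slides describe retrieving similar tweets by tf/idf cosine similarity and then ranking their hashtags in three ways. A minimal Python sketch of that idea follows; it is not the authors' implementation. The function names (`hashtags`, `tokens`, `cosine`, `recommend`), the tokenization, the smoothed idf formula, and the choice to score a hashtag in the Similarity Rank by the most similar tweet that carries it are all assumptions made for illustration. Only the default of 500 similar tweets and the three ranking names come from the slides.

```python
import math
import re
from collections import Counter

HASHTAG = re.compile(r"#\w+")
WORD = re.compile(r"\w+")


def hashtags(text):
    """Lowercased hashtags of a tweet, in order of appearance."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def tokens(text):
    """Lowercased word tokens with hashtags stripped (assumed preprocessing)."""
    return WORD.findall(HASHTAG.sub(" ", text).lower())


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query, corpus, method="similarity", n_similar=500, k=3):
    """Return up to k hashtags for `query`, drawn from the most similar tweets."""
    docs = [tokens(tweet) for tweet in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed idf (an assumption; the slides only say "tf/idf").
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

    def vector(doc):
        return {term: tf * idf.get(term, 0.0) for term, tf in Counter(doc).items()}

    q = vector(tokens(query))
    scored = sorted(((cosine(q, vector(doc)), i) for i, doc in enumerate(docs)), reverse=True)
    similar = [(sim, i) for sim, i in scored[:n_similar] if sim > 0]

    # Candidate hashtags: best similarity seen, and number of similar tweets using the tag.
    best_sim, votes = {}, Counter()
    for sim, i in similar:
        for tag in set(hashtags(corpus[i])):
            best_sim[tag] = max(best_sim.get(tag, 0.0), sim)
            votes[tag] += 1

    if method == "similarity":
        score = best_sim
    elif method == "overall_popularity":
        overall = Counter(tag for tweet in corpus for tag in hashtags(tweet))
        score = {tag: overall[tag] for tag in best_sim}
    elif method == "recommendation_popularity":
        score = dict(votes)
    else:
        raise ValueError(f"unknown ranking method: {method}")
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(score, key=lambda tag: (-score[tag], tag))[:k]
```

All three methods rank the same candidate set and differ only in the score attached to each candidate, so they can be compared on identical input, as in the test runs described on the "Evaluation" slide.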

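The "Evaluation" slides report precision and recall for the top-k recommendations (for example 40% Recall@3). The sketch below shows one common way to compute these measures. It assumes that each test tweet's original hashtags serve as ground truth and that precision is divided by k; the slides do not spell out either detail. The function names `precision_recall_at_k` and `evaluate` are hypothetical.

```python
def precision_recall_at_k(recommended, actual, k):
    """Precision@k and recall@k for one tweet.

    recommended: ranked list of hashtags; actual: hashtags the tweet really had.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = len(set(recommended[:k]) & set(actual))
    precision = hits / k  # assumption: divide by k even if fewer tags were returned
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


def evaluate(test_cases, k):
    """Average precision@k and recall@k over (recommended, actual) pairs."""
    if not test_cases:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(rec, act, k) for rec, act in test_cases]
    precision = sum(p for p, _ in pairs) / len(pairs)
    recall = sum(r for _, r in pairs) / len(pairs)
    return precision, recall
```

Under this definition, precision at k cannot exceed the number of hashtags a tweet has divided by k. That may be why the precision slide mentions the average number of hashtags per message.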