Recommending #-Tags in Twitter

Transcript

  • 1. Recommending #-Tags in Twitter
    Eva Zangerle, Wolfgang Gassler and Günther Specht
  • 2. Outline
    Motivation
    Approach
    Ranking Methods
    Evaluation
    Future Directions
  • 3. Hashtags
    Tags for tweets
    (Manual) categorization of conversations
    Follow streams of conversation
  • 4. Motivation
    Only 20% of tweets contain hashtags
    Hashtags can be chosen freely
    #umap2011? #umap11? #umap?
    Synonymous hashtags
    Heterogeneity
    Search capability limited
  • 5. Are hashtag recommendations possible?
    Motivation
  • 6. Goals
    Recommendation of suitable hashtags while entering a tweet
    Encourage use of hashtags
    Improve search capabilities
    Better categorization
    Fight heterogeneity
    Avoid use of synonymous hashtags
  • 7. Approach
    First Attempt
    Crawl set of tweets containing hashtags
    Analysis of dataset
    Can it be done based on content?
    Compare entered tweet to existing tweets
  • 8. Content-based Approach
  • 9. Crawled Dataset
    Crawled July 2010 – February 2011
    16,034,195 messages in total
    3,209,281 messages containing hashtags (20%) -> used as dataset for evaluation
    • Top five hashtags are contained in 8% of all messages containing hashtags (#jobs, #nowplaying, #zodiacfacts, #news and #fb)
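The dataset figures above depend on pulling hashtags out of raw tweet text. The following is a minimal sketch of that step, not the authors' crawler: the regular expression, the lowercasing, and the names `extract_hashtags` and `hashtag_stats` are assumptions made for illustration.

```python
import re
from collections import Counter

# Simplified hashtag pattern (an assumption): '#' followed by word characters.
# It does not try to skip '#' fragments that occur inside URLs.
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return the lowercased hashtags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]

def hashtag_stats(tweets):
    """Return (share of tweets with at least one hashtag, hashtag frequencies)."""
    counts = Counter()
    tagged = 0
    for text in tweets:
        tags = extract_hashtags(text)
        if tags:
            tagged += 1
            counts.update(tags)
    share = tagged / len(tweets) if tweets else 0.0
    return share, counts

# Toy data, invented for this example.
sample = [
    "Looking for #jobs in Vienna",
    "good morning everyone",
    "#nowplaying some jazz #Jobs",
    "no tags here",
    "still no tags",
]
share, counts = hashtag_stats(sample)
print(share)                  # 0.4
print(counts.most_common(1))  # [('jobs', 2)]
```

Lowercasing merges case variants such as #Jobs and #jobs; whether the original analysis did this is not stated on the slides.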
  • 10. Hashtags per Tweet
  • 11. Hashtags per Tweet
    RT @Bhupeshtweet: #Quad #loop-http://bit.ly/ciHX2U #retweet#India #Jobs #World #news #canada #ad #win #USA #tdf #oea #hacking #icantstop #sdcc #game
  • 12. Longtail Distribution
  • 13. Ranking Methods
    Input: Set of Candidate Hashtags (from 500 similar tweets)
    Output: Ranked Candidate List -> top k shown
    Similarity Rank
    Use similarity measure of tweets for ranking (tf/idf cosine similarity)
    The higher the similarity of the tweets, the higher the ranking of the corresponding hashtags
    Overall Popularity Rank
    Most popular hashtags over whole dataset
    The more popular, the higher the ranking within the candidate hashtags
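The two ranking methods above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the smoothed idf variant, the rule that a hashtag inherits the highest similarity of any tweet carrying it, and all function names are hypothetical. The corpus is modelled as a list of `(text, hashtags)` pairs.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens (an assumed, deliberately simple tokenizer).
    return re.findall(r"\w+", text.lower())

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dicts) for a list of token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # smoothed idf, an assumption
    vectors = [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similar_tweets(query, corpus, limit=500):
    """Return (similarity, hashtags) for the `limit` most similar corpus tweets."""
    vectors, idf = tfidf_vectors([tokenize(text) for text, _ in corpus])
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    scored = [(cosine(q, v), tags) for v, (_, tags) in zip(vectors, corpus)]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:limit]

def similarity_rank(neighbours, k=5):
    """Rank hashtags by the best similarity of any tweet that carries them."""
    best = {}
    for sim, tags in neighbours:
        for tag in tags:
            best[tag] = max(best.get(tag, 0.0), sim)
    return sorted(best, key=best.get, reverse=True)[:k]

def overall_popularity_rank(neighbours, global_counts, k=5):
    """Rank candidate hashtags by their frequency in the whole dataset."""
    candidates = {tag for _, tags in neighbours for tag in tags}
    return sorted(candidates, key=lambda t: global_counts.get(t, 0), reverse=True)[:k]

# Toy corpus, invented for this example.
corpus = [
    ("new python developer position open", ["jobs"]),
    ("listening to miles davis tonight", ["nowplaying", "jazz"]),
    ("python developer wanted in vienna", ["jobs", "hiring"]),
    ("breaking story on the election", ["news"]),
]
global_counts = Counter(tag for _, tags in corpus for tag in tags)
neighbours = similar_tweets("senior python developer wanted", corpus)
print(similarity_rank(neighbours))
print(overall_popularity_rank(neighbours, global_counts))
```

At the scale of the crawled dataset a linear scan like this would be too slow; an inverted index or an existing search engine would be the usual choice for retrieving the 500 most similar tweets.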
  • 14. Ranking Methods
    Input: Set of Candidate Hashtags (from 500 similar tweets)
    Output: Ranked Candidate List -> top k shown
    Recommendation Popularity Rank
    • Count number of occurrences for each hashtag within candidate list
    • The more similar tweets feature the hashtag, the higher the rank of the hashtag
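A minimal sketch of the recommendation popularity rank described above, assuming the input is the list of hashtag lists taken from the retrieved similar tweets. The function name is hypothetical, and counting each tweet at most once per hashtag is an assumption; the slide only says that occurrences within the candidate list are counted.

```python
from collections import Counter

def recommendation_popularity_rank(neighbour_hashtags, k=5):
    """Rank candidate hashtags by how many similar tweets feature them."""
    counts = Counter()
    for tags in neighbour_hashtags:
        counts.update(set(tags))  # a tweet counts once per hashtag (assumption)
    return [tag for tag, _ in counts.most_common(k)]

# Hashtags of four similar tweets, invented for this example.
neighbour_hashtags = [["jobs", "hiring"], ["jobs"], ["news"], ["jobs", "news"]]
print(recommendation_popularity_rank(neighbour_hashtags, k=2))  # ['jobs', 'news']
```

Unlike the overall popularity rank, this count is local to the retrieved tweets, so a hashtag that is rare in the whole dataset can still rank first when most similar tweets use it.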
  • 16. Evaluation
  • 17. Evaluation
    Dataset
    3,209,281 messages
    5,097,545 hashtags
    510,170 distinct hashtags
    Test run
    10,000 randomly chosen tweets (max. 5 hashtags)
    Retweets excluded
    30,000 test runs (3 ranking methods)
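The test-run selection above could look roughly like the following sketch. It assumes that retweets are recognised by a leading "RT @" and that "max. 5 hashtags" means test tweets carry between one and five hashtags; both readings, and the names `eligible` and `sample_test_tweets`, are assumptions.

```python
import random
import re

HASHTAG_RE = re.compile(r"#(\w+)")  # simplified hashtag pattern (assumption)

def is_retweet(text):
    # Assumption: retweets follow the "RT @user" prefix convention.
    return text.startswith("RT @")

def eligible(text, max_hashtags=5):
    """True for non-retweets that carry between 1 and max_hashtags hashtags."""
    n = len(HASHTAG_RE.findall(text))
    return not is_retweet(text) and 1 <= n <= max_hashtags

def sample_test_tweets(tweets, size, seed=0):
    """Draw up to `size` eligible tweets at random, reproducibly via `seed`."""
    pool = [t for t in tweets if eligible(t)]
    return random.Random(seed).sample(pool, min(size, len(pool)))

# Toy data, invented for this example.
tweets = [
    "Hiring a data engineer #jobs",
    "RT @someone: great talk #umap2011",
    "no hashtags at all",
    "#a #b #c #d #e #f too many tags",
]
print(sample_test_tweets(tweets, size=10))  # ['Hiring a data engineer #jobs']
```

Each sampled tweet would then have its hashtags removed, be passed to all three ranking methods, and have the recommendations compared with the removed hashtags, which matches the 30,000 test runs stated above (10,000 tweets × 3 methods).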
  • 18. Evaluation - Precision
    Avg. number of hashtags per message in dataset
  • 19. Evaluation - Recall
    Top-3 recommendations enough?
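Precision and recall at a cut-off k, as reported on the two evaluation slides, can be computed per test tweet as in this sketch. It uses the standard definitions; the exact averaging used in the original evaluation is not given on the slides, and the function name is hypothetical.

```python
def precision_recall_at_k(recommended, actual, k):
    """Precision and recall of the top-k recommendations for one tweet."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(actual))
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall

# One test tweet, invented for this example: two true hashtags,
# three recommendations, one of them correct.
p, r = precision_recall_at_k(["jobs", "news", "fb"], ["jobs", "hiring"], k=3)
print(round(p, 3), r)  # 0.333 0.5
```

Because most tweets carry only a few hashtags, precision necessarily falls as k grows, while recall can only rise or stay flat.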
  • 20. What we showed…
    Motivation for recommendation of hashtags
    Content-based recommendations
    Simple, straightforward approach
    40% Recall@3
    … so it can be done!
  • 21. { "coordinates": null, "favorited": false, "created_at": "Thu Jul 15 23:26:04 +0000 2010", "truncated": false, "text":
    "RT @ApeyBaby44: Labels r run by lawyer & accountants. http://tl.gd/2hkmas", "contributors": null, "id": 18639444000, "geo": null, "in_reply_to_user_id": null, "place": null, "in_reply_to_screen_name": null, "user": { "name": "DIGGZ", "profile_sidebar_border_color": "F2E195", "profile_background_tile": true, "profile_sidebar_fill_color": "FFF7CC", "created_at": "Fri Apr 03 03:16:01 +0000 2009", "profile_image_url": "http://a3.twimg.com/profile_images/1079346239/untitled_normal.JPG", "location": "ATL, NC, VA, NY all day!", "profile_link_color": "FF0000", "follow_request_sent": null, "url": "http://thisisseriousbiz.com", "favourites_count": 42, "contributors_enabled": false, "utc_offset": -18000, "id": 28489988, "profile_use_background_image": true, "profile_text_color": "0C3E53", "protected": false, "followers_count": 588, "lang": "en", "notifications": null, "time_zone": "Quito", "verified": false, "profile_background_color": "BADFCD", "geo_enabled": true, "description": "Half of Production duo SeriousBIZ circa 2008\r\n#teamSERIOUSBIZ\r\n#teamblackberry PIN 315442C9\r\n#teamfollowback", "friends_count": 477, "statuses_count": 6269, "profile_background_image_url": "http://a1.twimg.com/profile_background_images/118926256/_MG_43571.JPG", "following": null, "screen_name": "DIGGZSeriousBIZ" }, "source": "<a href=\"http://www.ubertwitter.com/bb/download.php\" rel=\"nofollow\">\u00dcberTwitter</a>", "in_reply_to_status_id": null }
    We've barely scratched the surface…
    Exploited only small fraction of available information
    90% are metadata
  • 22. Thank You!
    #ideas?
  • 23. Future Challenges
    User Interface
    Social Graph
    Realtime Recommendations
    Synonymous Tags?
    Ranking
    Real User Tests