Using Topic Models for Twitter hashtag recommendation

Presentation given at the Making Sense of Microposts Workshop at the World Wide Web Conference 2013.

  • Footer: Micropost -> Microposts
  • "… and that could be used …"; "Allow for effective search of tweets (through hashtags)"
  • Remove the full stops. Language dependent -> Language-dependent. Why? -> Why (for reasons of consistency).
  • Those 4000 keywords are used to get some meaningful tweets; otherwise the set was too big for training the algorithm. If you take a smaller sample than 4 days, then again you have too few coherent tweets to train the model. Those keywords don't become the most important keyword within a topic. Example: the keyword "president", where the topic was the fiscal cliff and political problems.
  • Maybe clarify how you sample the topic distribution? On the previous slide, maybe also clarify how you selected the topics?
  • an hashtag -> a hashtag. social graph -> social graph, … To suggest general keywords -> Suggests general keywords. Future work: other techniques to determine the topics? Bayesian inference, deep learning, … ;-)

    1. ELIS – Multimedia Lab. Using Topic Models for Twitter Hashtag Recommendation. Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle. Multimedia Lab, Ghent University – iMinds, Belgium; Reservoir Lab, Ghent University, Belgium; Image and Video Systems Lab, KAIST, South Korea.
    2. ELIS – Multimedia Lab. Using Topic Models for Twitter Hashtag Recommendation. Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle. Making Sense of Microposts Workshop @ World Wide Web Conference 2013. Introduction (1): Indexing, Search, Linking, General Topic, Memes, Grouping, Information retrieval.
    3. Introduction (2): ±10% of tweets contain a hashtag. 3% of the hashtags are used more than 5 times. Indexing, Search, Linking, General Topic, Memes, Grouping.
    4. Goal: Suggest keywords that resemble the general topic of a tweet and that could be used as a hashtag. Promote hashtags for effective indexing. Allow for effective search of tweets through hashtags. Reduce the use of sparse hashtags.
    5. Architectural overview: Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet.
    6. Basic filter. Clean up the tweet: URLs, special HTML entities, digits, punctuation, the hash character, … During training: remove tweets with just one word; remove retweets.
    7. Language identification. Why: We need to build a language-dependent topic model. Goal: Build an unsupervised classifier that discriminates between English and non-English tweets. How: Using Naive Bayes and the Expectation-Maximization algorithm + character n-gram features. Result: Evaluation on a test set of 1000 randomly selected tweets. Precision: 97.9% (Lui & Baldwin, LangID.py) vs. 97.0% (our algorithm). Recall: 91.8% (Lui & Baldwin, LangID.py) vs. 97.8% (our algorithm). F1: 94.8% (Lui & Baldwin, LangID.py) vs. 97.4% (our algorithm).
    8. Calculating the topic distribution. Idea: Find the general topic(s) of a tweet. How: Using Latent Dirichlet Allocation to find the topic distribution in an unsupervised manner. Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α=0.1, β=0.1. Example: “Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..” Topic distribution over topics 0, 1, 2, 3, …, 57, …, 199: [0.1; 0.0; 0.0; 0.0; …; 0.8; …; 0.05]. Topic 57: 1. Fiscal 2. Political 3. President …
    9. Hashtag suggestion (1). Idea: Suggest a number of hashtags based on the topic distribution of the tweet. How: Sample the topic distribution and suggest the top-ranked keywords. Example: “Yay, we got sixth period today” -> school, business, light, time, period. “Please RT!! Sign Bernie Sanders petition for the fiscal cliff! Http://..” -> fiscal, political, traffic, president, policy. “comfort, elegance, prettiness” -> little, good, love, relationship, god.
    10. Hashtag suggestion (2). Evaluation of 100 tweets. [Chart: percentage of tweets (%) versus number of correctly suggested hashtags (0 to 10), for 5 suggested hashtags and for 10 suggested hashtags.]
    11. Conclusions and Future Work. We built a hashtag recommendation system: suggests general keywords; unsupervised. In the future: use more context information (semantic web, social graph, …); adopt a hybrid approach between general and specific hashtags.
    12. #Questions @frederic_godin
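
The basic filter from slide 6 could be sketched in Python roughly as follows. The slides do not give the actual rules, so everything here is an assumption made for illustration: the regular expressions, the function names `clean_tweet` and `keep_for_training`, lowercasing, dropping non-ASCII letters, and treating a leading "RT" as the sign of a retweet.

```python
import html
import re

# Assumed patterns; the slides only list what is removed, not how.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, '#', '@', non-ASCII
RETWEET_RE = re.compile(r"^\s*RT\b")     # hypothetical retweet test


def clean_tweet(text: str) -> str:
    """Strip URLs, HTML entities, digits, punctuation and the hash character."""
    text = URL_RE.sub(" ", text)   # remove URLs while ':' and '/' are still present
    text = html.unescape(text)     # decode entities such as &amp; before filtering
    text = NON_LETTER_RE.sub(" ", text.lower())
    return " ".join(text.split())  # collapse runs of whitespace


def keep_for_training(text: str) -> bool:
    """Training-time rule: drop retweets and tweets with just one word."""
    if RETWEET_RE.match(text):
        return False
    return len(clean_tweet(text).split()) > 1


if __name__ == "__main__":
    tweet = "Please RT!! sign Bernie Sanders petition for the #fiscal cliff! http://.."
    print(clean_tweet(tweet))
    print(keep_for_training(tweet))
```

Removing URLs before punctuation matters in this sketch: once the colon and slashes are gone, a URL can no longer be told apart from ordinary words.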

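The F1 values reported for language identification on slide 7 can be checked against the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` is only chosen for this check and is not taken from the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on slide 7.
reported = {
    "Lui & Baldwin (LangID.py)": (0.979, 0.918),
    "Our algorithm": (0.970, 0.978),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {100 * f1_score(precision, recall):.1f}%")
```

Rounded to one decimal, this gives 94.8% and 97.4%, matching the F1 figures on the slide.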
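
Slide 9 describes hashtag suggestion as sampling the topic distribution and suggesting the top-ranked keywords, without saying how the sampling is done. The sketch below is one possible reading, not the authors' implementation: topics are drawn in proportion to their probability, and each draw contributes the best-ranked keyword of that topic that has not been suggested yet. The function name `suggest_hashtags`, the data layout, and the placeholder keywords for topics 0 and 199 are all hypothetical; only the probabilities and the three keywords of topic 57 come from slide 8.

```python
import random
from typing import Dict, List, Optional


def suggest_hashtags(
    topic_probs: Dict[int, float],
    topic_keywords: Dict[int, List[str]],
    n: int = 5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw topics by probability and collect their top-ranked unused keywords."""
    rng = rng or random.Random()
    # Only topics with positive probability and at least one keyword can be drawn.
    topics = [t for t, p in topic_probs.items() if p > 0 and topic_keywords.get(t)]
    weights = [topic_probs[t] for t in topics]
    next_rank = {t: 0 for t in topics}
    suggestions: List[str] = []

    while len(suggestions) < n and topics:
        topic = rng.choices(topics, weights=weights, k=1)[0]
        ranked = topic_keywords[topic]
        # Skip keywords that another topic already contributed.
        while next_rank[topic] < len(ranked) and ranked[next_rank[topic]] in suggestions:
            next_rank[topic] += 1
        if next_rank[topic] < len(ranked):
            suggestions.append(ranked[next_rank[topic]])
            next_rank[topic] += 1
        else:
            # Topic has no keywords left: stop drawing it.
            index = topics.index(topic)
            topics.pop(index)
            weights.pop(index)
    return suggestions


if __name__ == "__main__":
    # Probabilities from the example on slide 8; the elided topics are left out.
    probs = {0: 0.1, 57: 0.8, 199: 0.05}
    keywords = {
        57: ["fiscal", "political", "president"],        # from slide 8
        0: ["placeholder_a", "placeholder_b"],           # made-up placeholder
        199: ["placeholder_c", "placeholder_d"],         # made-up placeholder
    }
    print(suggest_hashtags(probs, keywords, n=5, rng=random.Random(2013)))
```

Because the draw is weighted, a dominant topic such as topic 57 supplies most of the suggestions, while less likely topics can still contribute a keyword now and then.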