Using Topic Models for Twitter hashtag recommendation

  1. Using Topic Models for Twitter Hashtag Recommendation. Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle (ELIS – Multimedia Lab). Affiliations: Multimedia Lab, Ghent University – iMinds, Belgium; Reservoir Lab, Ghent University, Belgium; Image and Video Systems Lab, KAIST, South Korea.
  2. Making Sense of Microposts Workshop @ World Wide Web Conference 2013. Introduction (1): Indexing; Search; Linking; General topic; Memes; Grouping; Information retrieval.
  3. Introduction (2): ±10% of tweets contain a hashtag; 3% of the hashtags are used more than 5 times. Indexing; Search; Linking; General topic; Memes; Grouping.
  4. Goal: Suggest keywords that resemble the general topic of a tweet and that could be used as a hashtag. Promote hashtags for effective indexing. Allow for effective search of tweets through hashtags. Reduce the use of sparse hashtags.
  5. Architectural overview: Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet.
  6. Basic filter. Clean up the tweet: URLs, special HTML entities, digits, punctuation, the hash character, … During training: remove tweets with just one word; remove retweets.
  7. Language identification. Why: We need to build a language-dependent topic model. Goal: Build an unsupervised classifier that discriminates between English and non-English tweets. How: Using Naive Bayes and the Expectation-Maximization algorithm + character n-gram features. Result: Evaluation on a test set of 1000 randomly selected tweets:
       Metric    | Lui & Baldwin (LangID.py) | Our algorithm
       Precision | 97.9%                     | 97.0%
       Recall    | 91.8%                     | 97.8%
       F1        | 94.8%                     | 97.4%
  8. Calculating the topic distribution. Idea: Find the general topic(s) of a tweet. How: Using Latent Dirichlet Allocation to find the topic distribution in an unsupervised manner. Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α=0.1, β=0.1. Example: “Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..” has topic distribution [0.1; 0.0; 0.0; 0.0; …; 0.8; …; 0.05] over topics 0, 1, 2, 3, …, 57, …, 199 (topic 57 has weight 0.8). Topic 57: 1. Fiscal 2. Political 3. President …
  9. Hashtag suggestion (1). Idea: Suggest a number of hashtags based on the topic distribution of the tweet. How: Sample the topic distribution and suggest the top-ranked keywords. Example: “Yay, we got sixth period today” → school, business, light, time, period. “Please RT!! Sign Bernie Sanders petition for the fiscal cliff! http://..” → fiscal, political, traffic, president, policy. “comfort, elegance, prettiness” → little, good, love, relationship, god.
  10. Hashtag suggestion (2). [Figure: percentage of tweets (%) versus number of correctly suggested hashtags (0 to 10), for 5 and 10 suggested hashtags. Evaluation of 100 tweets.]
  11. Conclusions and Future Work. We built a hashtag recommendation system: suggests general keywords; unsupervised. In the future: use more context information (semantic web, social graph, …); adopt a hybrid approach between general and specific hashtags.
  12. #Questions @frederic_godin
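
The basic filter on slide 6 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `clean_tweet` and `keep_for_training`, the regular expressions, the lowercasing step, and the rule that a retweet starts with "RT @" are all assumptions made for illustration.

```python
import html
import re

# Assumption: URLs start with http(s):// or www.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Assumption: a retweet is a tweet that starts with "RT @user".
RETWEET_RE = re.compile(r"^\s*RT\s+@", re.IGNORECASE)
# Runs of anything that is not a letter: punctuation, digits, '#', '_', spaces.
NON_LETTER_RE = re.compile(r"[\W\d_]+")


def clean_tweet(text):
    """Strip HTML entities, URLs, digits, punctuation and the hash character."""
    text = html.unescape(text)          # "&amp;" becomes "&"
    text = URL_RE.sub(" ", text)        # drop URLs
    text = text.lower()                 # assumption: case is not needed later
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())       # collapse whitespace


def keep_for_training(text):
    """Training-time filter: drop retweets and tweets with just one word."""
    if RETWEET_RE.match(text):
        return False
    return len(clean_tweet(text).split()) > 1
```

Applied to the example tweet from slide 8, `clean_tweet` would return the lowercase words of the tweet without the exclamation marks and the truncated URL.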

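Slide 9 says only that the topic distribution is sampled and the top-ranked keywords are suggested. One possible reading is sketched below; it is an assumption, not the authors' procedure. The function name `suggest_hashtags`, the dictionary inputs, and the rule of taking the next unused keyword from each sampled topic are hypothetical.

```python
import random


def suggest_hashtags(topic_dist, topic_keywords, n=5, seed=None):
    """Draw topics in proportion to their weight; take each topic's next best keyword.

    topic_dist:     dict mapping topic id -> weight (need not sum to 1)
    topic_keywords: dict mapping topic id -> keywords ranked best first
    """
    rng = random.Random(seed)
    topics = [t for t, p in topic_dist.items() if p > 0 and topic_keywords.get(t)]
    next_rank = {t: 0 for t in topics}   # position of the next unused keyword
    suggestions = []
    while len(suggestions) < n:
        # Only topics that still have unused keywords can be drawn.
        live = [t for t in topics if next_rank[t] < len(topic_keywords[t])]
        if not live:
            break
        weights = [topic_dist[t] for t in live]
        topic = rng.choices(live, weights=weights, k=1)[0]
        keyword = topic_keywords[topic][next_rank[topic]]
        next_rank[topic] += 1
        if keyword not in suggestions:   # avoid duplicate suggestions
            suggestions.append(keyword)
    return suggestions
```

With the weights from slide 8 (0.1 for topic 0, 0.8 for topic 57, 0.05 for topic 199), most draws would land on topic 57, so its keywords (fiscal, political, president) would tend to dominate the suggestions. Keyword lists for any other topic would have to be supplied by the caller, since the slides do not show them.
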
Editor's Notes

  • Footer: Micropost -> Microposts
  • … and that could be used …
  • Allow for effective search of tweets (through hashtags)
  • Remove the full stops
  • Language dependent -> Language-dependent
  • Why? -> Why (for reasons of consistency)
  • Those 4000 keywords are used to get some meaningful tweets. Otherwise the set was too big for training the algorithm. If you take a smaller sample than 4 days, then again you have too few coherent tweets to train the model. Those keywords don't become the most important keyword within a topic. Example: the keyword "president", where the topic was the fiscal cliff and political problems.
  • Maybe clarify how you sample the distribution of the topics? On the previous slide, maybe also clarify how you selected the topics?
  • an hashtag -> a hashtag
  • social graph -> social graph, …
  • To suggest general keywords -> Suggests general keywords
  • Future work: other techniques to determine topics? Bayesian inference, deep learning, … ;-)?