Using Topic Models for Twitter Hashtag Recommendation
ELIS – Multimedia Lab
Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle
Multimedia Lab, Ghent University – iMinds, Belgium
Reservoir Lab, Ghent University, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction (1)
Indexing
Search
Linking
General Topic
Memes
Grouping
Information retrieval
Introduction (2)
±10% of tweets contain a hashtag
3% of the hashtags are used more than 5 times
Indexing
Search
Linking
General Topic
Memes
Grouping
Goal
Suggest keywords that resemble the general topic of a tweet and that could be used as a hashtag
Promote hashtags for effective indexing
Allow for effective search of tweets through hashtags
Reduce the use of sparse hashtags
Architectural overview
Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet
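The stages in this overview can be chained as in the following minimal sketch. The function name `recommend_hashtags` and the four callables are hypothetical placeholders for the stages, not the authors' code. It also assumes that non-English tweets receive no suggestion, which the slides do not state.

```python
from typing import Callable, List, Optional, Sequence


def recommend_hashtags(
    tweet: str,
    basic_filter: Callable[[str], str],
    is_english: Callable[[str], bool],
    topic_distribution: Callable[[str], Sequence[float]],
    suggest: Callable[[Sequence[float]], List[str]],
) -> Optional[str]:
    """Run a tweet through the four stages and return the hashtagged tweet."""
    cleaned = basic_filter(tweet)
    if not is_english(cleaned):
        # Assumption: the topic model is English-only, so other tweets are skipped.
        return None
    hashtags = suggest(topic_distribution(cleaned))
    if not hashtags:
        return tweet
    return tweet + " " + " ".join("#" + tag for tag in hashtags)


# Usage with trivial stand-in stages (illustrative only).
result = recommend_hashtags(
    "hello world",
    basic_filter=str.lower,
    is_english=lambda text: True,
    topic_distribution=lambda text: [1.0],
    suggest=lambda dist: ["greeting"],
)
print(result)
```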
Basic filter
Clean up the tweet: remove URLs, special HTML entities, digits, punctuation, the hash character, …
During training:
Remove tweets with just one word
Remove retweets
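A minimal sketch of such a filter follows. The regular expressions, the function names and the rule that a retweet starts with "RT" are assumptions made for illustration; the slides do not specify how each item is detected. Only ASCII punctuation is handled.

```python
import re
import string

# Assumed patterns; the slides do not give the exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
ENTITY_RE = re.compile(r"&#?\w+;")          # e.g. &amp; or &#39;
DIGIT_RE = re.compile(r"\d+")
# Map every ASCII punctuation character (including '#') to a space.
PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation})


def basic_filter(tweet: str) -> str:
    """Strip URLs, HTML entities, digits and punctuation; collapse whitespace."""
    text = URL_RE.sub(" ", tweet)
    text = ENTITY_RE.sub(" ", text)
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return " ".join(text.split())


def is_retweet(tweet: str) -> bool:
    """Assumption: a retweet starts with the token 'RT'."""
    return re.match(r"\s*RT\b", tweet) is not None


def keep_for_training(tweet: str) -> bool:
    """Training-time rule: drop retweets and tweets with just one word."""
    if is_retweet(tweet):
        return False
    return len(basic_filter(tweet).split()) > 1


cleaned = basic_filter(
    "Please RT!! sign Bernie Sanders petition for the fiscal cliff! "
    "http://t.co/abc #fiscalcliff &amp; 2013"
)
print(cleaned)
```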
Language identification
Why: We need to build a language-dependent topic model
Goal: Build an unsupervised classifier that discriminates between English and non-English tweets
How: Naive Bayes and the Expectation-Maximization algorithm with character n-gram features
Result: Evaluation on a test set of 1000 randomly selected tweets

             Lui & Baldwin (LangID.py)   Our algorithm
Precision    97.9%                       97.0%
Recall       91.8%                       97.8%
F1           94.8%                       97.4%
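As a quick consistency check, the F1 values can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The helper name `f1_score` is ours and is not taken from the slides.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
reported = [
    ("Lui & Baldwin (LangID.py)", 0.979, 0.918),
    ("Our algorithm", 0.970, 0.978),
]
for name, precision, recall in reported:
    print(f"{name}: F1 = {f1_score(precision, recall):.1%}")
# prints 94.8% for Lui & Baldwin and 97.4% for our algorithm
```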
Calculating the topic distribution
Idea: Find the general topic(s) of a tweet
How: Using Latent Dirichlet Allocation to find the topic distribution in an unsupervised manner
Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α=0.1, β=0.1
Example: “Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..”
Topic index:   0    1    2    3    …  57   …  199
Probability:  [0.1; 0.0; 0.0; 0.0; …; 0.8; …; 0.05]
Topic 57:
1. Fiscal
2. Political
3. President
…
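To make the idea of a per-tweet topic distribution concrete, here is a small, self-contained LDA sketch. It uses collapsed Gibbs sampling, which is an assumption: the slides name Latent Dirichlet Allocation and the hyperparameters α=0.1 and β=0.1, but not the inference method or the library. The function name `train_lda` and the four toy documents are illustrative, and the example uses 2 topics rather than 200.

```python
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.1, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]               # words in doc d on topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)] # word v assigned to topic k
    topic_total = [0] * num_topics                             # words assigned to topic k
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed per-document topic distribution; each row sums to 1.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return theta, topic_word, vocab


docs = [
    "fiscal cliff petition president".split(),
    "president fiscal policy political".split(),
    "school period today school".split(),
    "sixth period school time".split(),
]
theta, topic_word, vocab = train_lda(docs, num_topics=2, iterations=100)
for row in theta:
    print([round(p, 2) for p in row])
```

Each printed row plays the role of the probability vector in the example above, with one entry per topic.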
Hashtag suggestion (1)
Idea: Suggest a number of hashtags based on the topic distribution of the tweet
How: Sample the topic distribution and suggest the top ranked keywords
Example:

Tweet                                                                      Suggested keywords
Yay, we got sixth period today                                             school business light time period
Please RT!! Sign Bernie Sanders petition for the fiscal cliff! http://..   fiscal political traffic president policy
comfort, elegance, prettiness                                              little good love relationship god
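One possible reading of "sample the topic distribution and suggest the top ranked keywords" is sketched below: draw a topic in proportion to its probability, take that topic's highest-ranked keyword not yet suggested, and repeat. This procedure is an assumption, since the slides do not describe the sampling in detail. The function name `suggest_hashtags` and the toy keyword lists are hypothetical.

```python
import random


def suggest_hashtags(topic_dist, topic_keywords, n=5, seed=None):
    """Sample topics by probability and collect their top-ranked unused keywords."""
    rng = random.Random(seed)
    weights = list(topic_dist)
    cursor = [0] * len(weights)   # next keyword rank to try for each topic
    suggestions = []
    while len(suggestions) < n and any(w > 0 for w in weights):
        t = rng.choices(range(len(weights)), weights=weights)[0]
        words = topic_keywords[t]
        # Skip keywords that were already suggested via another topic.
        while cursor[t] < len(words) and words[cursor[t]] in suggestions:
            cursor[t] += 1
        if cursor[t] == len(words):
            weights[t] = 0.0      # topic has no keywords left; stop sampling it
            continue
        suggestions.append(words[cursor[t]])
        cursor[t] += 1
    return suggestions


# Toy input: all probability mass on the second topic.
topic_keywords = [
    ["school", "business"],
    ["fiscal", "political", "president"],
]
print(suggest_hashtags([0.0, 1.0], topic_keywords, n=2))
```

With a distribution spread over several topics, the suggestions mix keywords from those topics, and the result depends on the random seed.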
Hashtag suggestion (2)
[Figure: percentage of tweets (%, axis from 0 to 35) versus number of correctly suggested hashtags (0 to 10), with one series for 5 suggested hashtags and one for 10 suggested hashtags. Evaluation of 100 tweets.]
Conclusions and Future Work
We built a hashtag recommendation system:
Suggests general keywords
Unsupervised
In the future:
Use more context information: semantic web, social graph, …
Adopt a hybrid approach between general and specific hashtags
#Questions @frederic_godin
Editor's Notes
Footer: Micropost -> Microposts
… and that could be used …
Allow for effective search of tweets (through hashtags)
Remove the full stops
Language dependent -> Language-dependent
Why? -> Why (for reasons of consistency)
Those 4000 keywords are used to get some meaningful tweets. Otherwise the set was too big for training the algorithm. If you take a smaller sample than 4 days, then again you have too few coherent tweets to train the model. Those keywords don't become the most important keyword within a topic. Ex. keyword: president. The topic was fiscal cliff and political problems.
Maybe clarify how you sample the topic distribution?
On the previous slide, maybe also clarify how you selected the topics?
an hashtag -> a hashtag
social graph -> social graph, …
To suggest general keywords -> Suggests general keywords
Future work: other techniques to determine topics? Bayesian inference, deep learning, … ;-)?