ELIS – Multimedia Lab
Using Topic Models for Twitter Hashtag Recommendation
Fréderic Godin, Viktor Slavkovikj, Wesley De Neve, Benjamin Schrauwen and Rik Van de Walle
Multimedia Lab, Ghent University – iMinds, Belgium
Reservoir Lab, Ghent University, Belgium
Image and Video Systems Lab, KAIST, South Korea
Making Sense of Microposts Workshop @ World Wide Web Conference 2013
Introduction (1)
Indexing
Search
Linking
General Topic
Memes
Grouping
Information retrieval
Introduction (2)
About 10% of tweets contain a hashtag
3% of the hashtags are used more than 5 times
Indexing
Search
Linking
General Topic
Memes
Grouping
Goal
Suggest keywords that resemble the general topic of a tweet and that could be used as a hashtag
Promote hashtags for effective indexing
Allow for effective search of tweets through hashtags
Reduce the use of sparse hashtags
Architectural overview
Tweet → Basic filter → Language identification → Topic distribution → Hashtag suggestion → Hashtagged tweet
Basic filter
Clean up the tweet: remove URLs, special HTML entities, digits, punctuation, the hash character, …
During training:
Remove tweets with just one word
Remove retweets
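
The filter steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the regular expressions, the function names `clean_tweet` and `training_corpus`, and the rule that a retweet starts with "RT" are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the slides do not specify the exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
ENTITY_RE = re.compile(r"&#?\w+;")      # HTML entities such as &amp; or &#39;
NON_LETTER_RE = re.compile(r"[\W\d_]+")  # digits, punctuation, '#', '@', ...


def clean_tweet(text):
    """Strip URLs, HTML entities, digits, punctuation and the hash character."""
    text = URL_RE.sub(" ", text)
    text = ENTITY_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


def is_retweet(text):
    """Assumption: a retweet is a tweet that starts with the marker 'RT'."""
    return re.match(r"\s*RT\b", text) is not None


def training_corpus(tweets):
    """Training-time filter: drop retweets and tweets with at most one word."""
    kept = []
    for raw in tweets:
        if is_retweet(raw):
            continue
        cleaned = clean_tweet(raw)
        if len(cleaned.split()) > 1:
            kept.append(cleaned)
    return kept
```

Note that a tweet such as "Please RT!! sign ..." is not treated as a retweet under this assumed rule, because "RT" does not open the tweet.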
Language identification
Why: We need to build a language-dependent topic model.
Goal: Build an unsupervised classifier that discriminates between English and non-English tweets.
How: Naive Bayes and the Expectation-Maximization algorithm, with character n-gram features.
Result: Evaluation on a test set of 1000 randomly selected tweets:

            Lui & Baldwin (LangID.py)   Our algorithm
Precision   97.9%                       97.0%
Recall      91.8%                       97.8%
F1          94.8%                       97.4%
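
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. The short sketch below recomputes the F1 row from the precision and recall rows; the function name `f1_score` is chosen for this example only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
results = {
    "Lui & Baldwin (LangID.py)": (0.979, 0.918),
    "Our algorithm": (0.970, 0.978),
}

for name, (p, r) in results.items():
    print(f"{name}: F1 = {100 * f1_score(p, r):.1f}%")
```

Both recomputed values round to the F1 scores reported on the slide (94.8% and 97.4%).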
Calculating the topic distribution
Idea: Find the general topic(s) of a tweet.
How: Use Latent Dirichlet Allocation to find the topic distribution in an unsupervised manner.
Training: 1.8 million tweets pre-filtered on 4000 keywords; 200 topics, α=0.1, β=0.1
Example: “Please RT!! sign Bernie Sanders petition for the fiscal cliff! http://..”

Topic:        0    1    2    3    …   57   …   199
Probability: [0.1; 0.0; 0.0; 0.0; … ; 0.8; … ; 0.05]

Topic 57:
1. Fiscal
2. Political
3. President
…
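
To make the idea of a per-tweet topic distribution concrete, here is a small sketch of Latent Dirichlet Allocation fitted with collapsed Gibbs sampling. The slides do not say which inference method or toolkit was used, so the sampler, the function name `lda_gibbs`, the toy corpus, and the choice of 2 topics (instead of 200) are assumptions for illustration; only α=0.1 and β=0.1 come from the slide.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.1, iterations=200, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Smoothed topic distribution per document.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Words of each topic, ranked by how often they were assigned to it.
    top_words = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -topic_word[t][v])]
        for t in range(num_topics)
    ]
    return theta, top_words


# Hypothetical, already-filtered toy tweets.
docs = [
    "fiscal cliff president petition".split(),
    "president political fiscal policy".split(),
    "school period teacher class".split(),
    "school class period today".split(),
]
theta, top_words = lda_gibbs(docs, num_topics=2)
```

Each row of `theta` plays the role of the probability vector shown in the example above, and each entry of `top_words` corresponds to a ranked list such as the one given for topic 57.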
Hashtag suggestion (1)
Idea: Suggest a number of hashtags based on the topic distribution of the tweet.
How: Sample the topic distribution and suggest the top-ranked keywords.
Example:

Tweet: Yay, we got sixth period today
Suggested: school, business, light, time, period

Tweet: Please RT!! Sign Bernie Sanders petition for the fiscal cliff! Http://..
Suggested: fiscal, political, traffic, president, policy

Tweet: comfort, elegance, prettiness
Suggested: little, good, love, relationship, god
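
The slide does not spell out how the topic distribution is sampled, so the following is only one plausible reading: repeatedly draw a topic with probability given by the tweet's topic distribution, and each time suggest the highest-ranked keyword of that topic that has not been suggested yet. The function name `suggest_hashtags` and the toy numbers are hypothetical.

```python
import random


def suggest_hashtags(theta, top_words, n=5, seed=0):
    """Draw topics according to theta; take each topic's next unused keyword."""
    rng = random.Random(seed)
    # Only keywords of topics with non-zero probability can ever be drawn.
    reachable = {w for t, words in enumerate(top_words) if theta[t] > 0
                 for w in words}
    target = min(n, len(reachable))
    next_rank = [0] * len(theta)  # per-topic position in the ranked list
    suggestions = []
    while len(suggestions) < target:
        t = rng.choices(range(len(theta)), weights=theta)[0]
        words = top_words[t]
        # Skip keywords that another topic already contributed.
        while next_rank[t] < len(words) and words[next_rank[t]] in suggestions:
            next_rank[t] += 1
        if next_rank[t] < len(words):
            suggestions.append(words[next_rank[t]])
            next_rank[t] += 1
    return suggestions


# Hypothetical three-topic model, loosely modelled on the slide's examples.
theta = [0.15, 0.8, 0.05]
top_words = [
    ["school", "period", "time"],
    ["fiscal", "political", "president", "policy"],
    ["love", "good", "god"],
]
hashtags = suggest_hashtags(theta, top_words, n=5)
```

Because topics are drawn at random, a dominant topic contributes most of the suggestions while less likely topics can still contribute a keyword, which would be consistent with an off-topic word such as "traffic" appearing among the suggestions for the fiscal cliff tweet.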
Hashtag suggestion (2)
[Figure: percentage of tweets (%), y-axis from 0 to 35, against the number of correctly suggested hashtags, x-axis from 0 to 10, for two settings: 5 hashtags and 10 hashtags. Evaluation of 100 tweets.]
Conclusions and Future Work
We built a hashtag recommendation system:
Suggests general keywords
Unsupervised
In the future:
Use more context information: semantic web, social graph, …
Adopt a hybrid approach between general and specific hashtags
#Questions @frederic_godin


Editor's Notes

  1. Footer: Micropost -> Microposts
  2. … and that could be used …; Allow for effective search of tweets (through hashtags)
  3. Remove the full stops; Language dependent -> Language-dependent; Why? -> Why (for reasons of consistency)
  4. Those 4000 keywords are used to get some meaningful tweets. Otherwise the set was too big for training the algorithm. If you take a smaller sample than 4 days, then again you have too few coherent tweets to train the model. Those keywords don't become the most important keyword within a topic. Ex.: keyword "president"; the topic was the fiscal cliff and political problems.
  5. Maybe clarify how you sample the topic distribution? On the previous slide, maybe also clarify how you selected the topics?
  6. an hashtag -> a hashtag; social graph -> social graph, …; To suggest general keywords -> Suggests general keywords; Future work: other techniques to determine topics? Bayesian inference, deep learning, … ;-)?