SlideShare a Scribd company logo
1 of 36
Download to read offline
Visualizing Topic Models
Ben Mabey
@bmabey
2
2
Latent Dirichlet Allocation
(LDA)
2
0 1 … k
doc a 0.25 0.14 … 0.02
doc b 0.01 0.30 … 0.09
… … … … 0.31
doc D 0.13 0.07 … 0.01
Document-Topic Distributions
Latent Dirichlet Allocation
(LDA)
2
0 1 … k
doc a 0.25 0.14 … 0.02
doc b 0.01 0.30 … 0.09
… … … … 0.31
doc D 0.13 0.07 … 0.01
Document-Topic Distributions
0 1 … k
bird 0.002 0.01 … 0.004
coffee 0.001 0.003 … 0.009
… … … … 0.031
work 0.002 0.006 … 0.021
Term-Topic Distributions
Latent Dirichlet Allocation
(LDA)
3
3
250k+ stories
July 2007 - May 2014
4
Game written by 14 year old passes Angry Birds as the top free iphone app
4
Topic P(T|D)
58 0.19
38 0.14
16 0.06
… …
Game written by 14 year old passes Angry Birds as the top free iphone app
4
Topic P(T|D)
58 0.19
38 0.14
16 0.06
… …
58 38 16
app game language
developer player code
mobile video game programming
user gaming java
app store developer programmer
Game written by 14 year old passes Angry Birds as the top free iphone app
5
Topic P(T|D)
mobile apps 0.19
38 0.14
16 0.06
… …
Table 2
58mobile apps 38video games 16programming
app game language
developer player code
application video game programming
user gaming java
app store developer programmer
mobile play programming language
mobile apps 38 16
app game language
developer player code
mobile video game programming
user gaming java
app store developer programmer
Game written by 14 year old passes Angry Birds as the top free iphone app
6
Topic P(T|D)
mobile apps 0.19
video games 0.14
16 0.06
… …
Table 2
58mobile apps 38video games 16programming
app game language
developer player code
application video game programming
user gaming java
app store developer programmer
mobile play programming language
mobile apps video games 16
app game language
developer player code
mobile video game programming
user gaming java
app store developer programmer
Game written by 14 year old passes Angry Birds as the top free iphone app
7
Topic P(T|D)
mobile apps 0.19
video games 0.14
programming 0.06
… …
Table 2
58mobile apps 38video games 16programming
app game language
developer player code
application video game programming
user gaming java
app store developer programmer
mobile play programming language
mobile apps video games programming
app game language
developer player code
mobile video game programming
user gaming java
app store developer programmer
Game written by 14 year old passes Angry Birds as the top free iphone app
8
Interpreting Topic Models
8
What	
  is	
  the	
  meaning	
  of	
  each	
  topic?
Interpreting Topic Models
8
What	
  is	
  the	
  meaning	
  of	
  each	
  topic?
How	
  prevalent	
  is	
  each	
  topic?
Interpreting Topic Models
8
What	
  is	
  the	
  meaning	
  of	
  each	
  topic?
How	
  prevalent	
  is	
  each	
  topic?
How	
  do	
  the	
  topics	
  relate	
  to	
  each	
  other?
Interpreting Topic Models
8
What	
  is	
  the	
  meaning	
  of	
  each	
  topic?
How	
  prevalent	
  is	
  each	
  topic?
How	
  do	
  the	
  topics	
  relate	
  to	
  each	
  other?
How	
  do	
  the	
  documents	
  relate	
  to	
  each	
  other?
Interpreting Topic Models
9
What	
  is	
  the	
  meaning	
  of	
  each	
  topic?	
  
How	
  prevalent	
  is	
  each	
  topic?	
  
How	
  do	
  the	
  topics	
  relate	
  to	
  each	
  other?	
  
How	
  do	
  the	
  documents	
  relate	
  to	
  each	
  other?
Interpreting Topic Models
LDAvis
10
https://github.com/cpsievert/LDAvis
pyLDAvis
11
https://github.com/bmabey/pyLDAvis
py
pyLDAvis
11
https://github.com/bmabey/pyLDAvis
py
BYOM
(bring your own model)
Demo Time!
12
Distinctiveness & Saliency
13
Termite: Visualization Techniques for Assessing Textual Topic Models
Jason Chuang, Christopher D. Manning and Jeffrey Heer. 2012
measure	
  how	
  much	
  information	
  a	
  term	
  conveys	
  about	
  topics
Distinctiveness & Saliency
14
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
Distinctiveness & Saliency
14
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
14
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
15
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
16
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
17
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
18
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
distinctiveness weighted by the
term's overall frequency
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
18
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
distinctiveness weighted by the
term's overall frequency
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
18
coding tech news video games distinctiveness P(w) saliency
game 10 10 50 0.03 0.28 0.01
apple 20 40 20 -0.16 0.32 -0.05
angry birds 1 1 30 0.25 0.13 0.03
python 50 5 10 0.17 0.26 0.05
TOTAL 81 56 110
P(T|game) 0.14 0.14 0.71
P(T|apple) 0.25 0.50 0.25
P(T|angry birds) 0.03 0.03 0.94
P(T|pyhton) 0.77 0.08 0.15
P(T) 0.33 0.23 0.45
distinctiveness weighted by the
term's overall frequency
computes the KL divergence between
the distribution of topics given a term and
the marginal distribution of topics
Distinctiveness & Saliency
19
measure	
  how	
  much	
  information	
  a	
  term	
  conveys	
  about	
  topics…
Distinctiveness & Saliency
19
measure	
  how	
  much	
  information	
  a	
  term	
  conveys	
  about	
  topics…
globally
Thank you!
Learn more at http://github.com/bmabey/pyLDAvis
Ben Mabey
@bmabey
http://nbviewer.ipython.org/github/bmabey/hacker_news_topic_modelling/

More Related Content

Viewers also liked

A Note on Expectation-Propagation for Latent Dirichlet Allocation
A Note on Expectation-Propagation for Latent Dirichlet AllocationA Note on Expectation-Propagation for Latent Dirichlet Allocation
A Note on Expectation-Propagation for Latent Dirichlet AllocationTomonari Masada
 
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Tomonari Masada
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentationSoojung Hong
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelTomonari Masada
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet AllocationMarco Righini
 
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...FrenchWeb.fr
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationTomonari Masada
 
Introduction au Data Mining et Méthodes Statistiques
Introduction au Data Mining et Méthodes StatistiquesIntroduction au Data Mining et Méthodes Statistiques
Introduction au Data Mining et Méthodes StatistiquesGiorgio Pauletto
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003Ajay Ohri
 
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...Soft Computing
 
Du datamining à la datascience
Du datamining à la datascienceDu datamining à la datascience
Du datamining à la datascienceSoft Computing
 
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation System
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation SystemLatent Dirichlet Allocation as a Twitter Hashtag Recommendation System
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation SystemShailly Saxena
 
Graphiques interactifs avec R
Graphiques interactifs avec RGraphiques interactifs avec R
Graphiques interactifs avec Rparisraddicts
 
CV et lettre de motivation Objectif 2017
CV et lettre de motivation Objectif 2017CV et lettre de motivation Objectif 2017
CV et lettre de motivation Objectif 2017REALIZ
 

Viewers also liked (19)

A Note on Expectation-Propagation for Latent Dirichlet Allocation
A Note on Expectation-Propagation for Latent Dirichlet AllocationA Note on Expectation-Propagation for Latent Dirichlet Allocation
A Note on Expectation-Propagation for Latent Dirichlet Allocation
 
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
Accelerating Collapsed Variational Bayesian Inference for Latent Dirichlet Al...
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...
[SEN#6] Le classement des Entreprises de Services du Numérique (ESN, ex SSII)...
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
Introduction au Data Mining et Méthodes Statistiques
Introduction au Data Mining et Méthodes StatistiquesIntroduction au Data Mining et Méthodes Statistiques
Introduction au Data Mining et Méthodes Statistiques
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
 
C4.5
C4.5C4.5
C4.5
 
Text mining
Text miningText mining
Text mining
 
LDA入門
LDA入門LDA入門
LDA入門
 
Du CRM à la DMP
Du CRM à la DMPDu CRM à la DMP
Du CRM à la DMP
 
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...
Salon Big Data 2015 : Big Data et Marketing Digital, Retours d’expérience en ...
 
Du datamining à la datascience
Du datamining à la datascienceDu datamining à la datascience
Du datamining à la datascience
 
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation System
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation SystemLatent Dirichlet Allocation as a Twitter Hashtag Recommendation System
Latent Dirichlet Allocation as a Twitter Hashtag Recommendation System
 
Graphiques interactifs avec R
Graphiques interactifs avec RGraphiques interactifs avec R
Graphiques interactifs avec R
 
CV et lettre de motivation Objectif 2017
CV et lettre de motivation Objectif 2017CV et lettre de motivation Objectif 2017
CV et lettre de motivation Objectif 2017
 

More from Turi, Inc.

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing VideoTuri, Inc.
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission RiskTuri, Inc.
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Turi, Inc.
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Turi, Inc.
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Turi, Inc.
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Turi, Inc.
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsTuri, Inc.
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataTuri, Inc.
 
Intelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsIntelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsTuri, Inc.
 
Text Analysis with Machine Learning
Text Analysis with Machine LearningText Analysis with Machine Learning
Text Analysis with Machine LearningTuri, Inc.
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab CreateTuri, Inc.
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesTuri, Inc.
 
Machine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinMachine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinTuri, Inc.
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data scienceTuri, Inc.
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Turi, Inc.
 
Introduction to Recommender Systems
Introduction to Recommender SystemsIntroduction to Recommender Systems
Introduction to Recommender SystemsTuri, Inc.
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringTuri, Inc.
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with DatoTuri, Inc.
 

More from Turi, Inc. (20)

Webinar - Analyzing Video
Webinar - Analyzing VideoWebinar - Analyzing Video
Webinar - Analyzing Video
 
Webinar - Patient Readmission Risk
Webinar - Patient Readmission RiskWebinar - Patient Readmission Risk
Webinar - Patient Readmission Risk
 
Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)Webinar - Know Your Customer - Arya (20160526)
Webinar - Know Your Customer - Arya (20160526)
 
Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)Webinar - Product Matching - Palombo (20160428)
Webinar - Product Matching - Palombo (20160428)
 
Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)Webinar - Pattern Mining Log Data - Vega (20160426)
Webinar - Pattern Mining Log Data - Vega (20160426)
 
Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)Webinar - Fraud Detection - Palombo (20160428)
Webinar - Fraud Detection - Palombo (20160428)
 
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge DatasetsScaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets
 
Pattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log DataPattern Mining: Extracting Value from Log Data
Pattern Mining: Extracting Value from Log Data
 
Intelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning ToolkitsIntelligent Applications with Machine Learning Toolkits
Intelligent Applications with Machine Learning Toolkits
 
Text Analysis with Machine Learning
Text Analysis with Machine LearningText Analysis with Machine Learning
Text Analysis with Machine Learning
 
Machine Learning with GraphLab Create
Machine Learning with GraphLab CreateMachine Learning with GraphLab Create
Machine Learning with GraphLab Create
 
Machine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive ServicesMachine Learning in Production with Dato Predictive Services
Machine Learning in Production with Dato Predictive Services
 
Machine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos GuestrinMachine Learning in 2016: Live Q&A with Carlos Guestrin
Machine Learning in 2016: Live Q&A with Carlos Guestrin
 
Scalable data structures for data science
Scalable data structures for data scienceScalable data structures for data science
Scalable data structures for data science
 
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
Introduction to Deep Learning for Image Analysis at Strata NYC, Sep 2015
 
Introduction to Recommender Systems
Introduction to Recommender SystemsIntroduction to Recommender Systems
Introduction to Recommender Systems
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Overview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature EngineeringOverview of Machine Learning and Feature Engineering
Overview of Machine Learning and Feature Engineering
 
SFrame
SFrameSFrame
SFrame
 
Building Personalized Data Products with Dato
Building Personalized Data Products with DatoBuilding Personalized Data Products with Dato
Building Personalized Data Products with Dato
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Visualzing Topic Models

  • 2. 2
  • 4. 2 0 1 … k doc a 0.25 0.14 … 0.02 doc b 0.01 0.30 … 0.09 … … … … 0.31 doc D 0.13 0.07 … 0.01 Document-Topic Distributions Latent Dirichlet Allocation (LDA)
  • 5. 2 0 1 … k doc a 0.25 0.14 … 0.02 doc b 0.01 0.30 … 0.09 … … … … 0.31 doc D 0.13 0.07 … 0.01 Document-Topic Distributions 0 1 … k bird 0.002 0.01 … 0.004 coffee 0.001 0.003 … 0.009 … … … … 0.031 work 0.002 0.006 … 0.021 Term-Topic Distributions Latent Dirichlet Allocation (LDA)
  • 6. 3
  • 8. 4 Game written by 14 year old passes Angry Birds as the top free iphone app
  • 9. 4 Topic P(T|D) 58 0.19 38 0.14 16 0.06 … … Game written by 14 year old passes Angry Birds as the top free iphone app
  • 10. 4 Topic P(T|D) 58 0.19 38 0.14 16 0.06 … … 58 38 16 app game language developer player code mobile video game programming user gaming java app store developer programmer Game written by 14 year old passes Angry Birds as the top free iphone app
  • 11. 5 Topic P(T|D) mobile apps 0.19 38 0.14 16 0.06 … … Table 2 58mobile apps 38video games 16programming app game language developer player code application video game programming user gaming java app store developer programmer mobile play programming language mobile apps 38 16 app game language developer player code mobile video game programming user gaming java app store developer programmer Game written by 14 year old passes Angry Birds as the top free iphone app
  • 12. 6 Topic P(T|D) mobile apps 0.19 video games 0.14 16 0.06 … … Table 2 58mobile apps 38video games 16programming app game language developer player code application video game programming user gaming java app store developer programmer mobile play programming language mobile apps video games 16 app game language developer player code mobile video game programming user gaming java app store developer programmer Game written by 14 year old passes Angry Birds as the top free iphone app
  • 13. 7 Topic P(T|D) mobile apps 0.19 video games 0.14 programming 0.06 … … Table 2 58mobile apps 38video games 16programming app game language developer player code application video game programming user gaming java app store developer programmer mobile play programming language mobile apps video games programming app game language developer player code mobile video game programming user gaming java app store developer programmer Game written by 14 year old passes Angry Birds as the top free iphone app
  • 15. 8 What  is  the  meaning  of  each  topic? Interpreting Topic Models
  • 16. 8 What  is  the  meaning  of  each  topic? How  prevalent  is  each  topic? Interpreting Topic Models
  • 17. 8 What  is  the  meaning  of  each  topic? How  prevalent  is  each  topic? How  do  the  topics  relate  to  each  other? Interpreting Topic Models
  • 18. 8 What  is  the  meaning  of  each  topic? How  prevalent  is  each  topic? How  do  the  topics  relate  to  each  other? How  do  the  documents  relate  to  each  other? Interpreting Topic Models
  • 19. 9 What  is  the  meaning  of  each  topic?   How  prevalent  is  each  topic?   How  do  the  topics  relate  to  each  other?   How  do  the  documents  relate  to  each  other? Interpreting Topic Models
  • 24. Distinctiveness & Saliency 13 Termite: Visualization Techniques for Assessing Textual Topic Models Jason Chuang, Christopher D. Manning and Jeffrey Heer. 2012 measure  how  much  information  a  term  conveys  about  topics
  • 25. Distinctiveness & Saliency 14 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45
  • 26. Distinctiveness & Saliency 14 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 27. Distinctiveness & Saliency 14 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 28. Distinctiveness & Saliency 15 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 29. Distinctiveness & Saliency 16 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 30. Distinctiveness & Saliency 17 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 31. Distinctiveness & Saliency 18 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 distinctiveness weighted by the term's overall frequency computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 32. Distinctiveness & Saliency 18 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 distinctiveness weighted by the term's overall frequency computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 33. Distinctiveness & Saliency 18 coding tech news video games distinctiveness P(w) saliency game 10 10 50 0.03 0.28 0.01 apple 20 40 20 -0.16 0.32 -0.05 angry birds 1 1 30 0.25 0.13 0.03 python 50 5 10 0.17 0.26 0.05 TOTAL 81 56 110 P(T|game) 0.14 0.14 0.71 P(T|apple) 0.25 0.50 0.25 P(T|angry birds) 0.03 0.03 0.94 P(T|pyhton) 0.77 0.08 0.15 P(T) 0.33 0.23 0.45 distinctiveness weighted by the term's overall frequency computes the KL divergence between the distribution of topics given a term and the marginal distribution of topics
  • 34. Distinctiveness & Saliency 19 measure  how  much  information  a  term  conveys  about  topics…
  • 35. Distinctiveness & Saliency 19 measure  how  much  information  a  term  conveys  about  topics… globally
  • 36. Thank you! Learn more at http://github.com/bmabey/pyLDAvis Ben Mabey @bmabey http://nbviewer.ipython.org/github/bmabey/hacker_news_topic_modelling/