LDAvis:
A method for visualizing and interpreting topics
Carson Sievert Iowa State University
Kenneth E. Shirley AT&T Labs Research
ILLVI 2014
I . I N T R O D U C T I O N
• LDAvis is a web-based interactive visualization of topics
estimated using Latent Dirichlet Allocation (LDA), built using
a combination of R and D3.
• LDAvis attempts to answer a few basic questions about a
fitted topic model:
A. What is the meaning of each topic?
B. How prevalent is each topic?
C. How do the topics relate to each other?
I I . R E L A T E D W O R K
LDA, Turbo Topics, lift, PMI
LDA
Latent Dirichlet allocation (LDA)
is an example of a topic model and
was first presented as a graphical
model for topic discovery by
David Blei, Andrew Ng, and
Michael I. Jordan in 2003.
Turbo Topics
Blei and Lafferty (2009) developed
“Turbo Topics”, a method of
identifying n-grams within
LDA-inferred topics.
The resulting output, however, is
still simply a ranked list containing
a mixture of terms and n-grams.
PMI
Pointwise Mutual Information (PMI):
PMI(x, y) = log [ P(x, y) / (P(x) × P(y)) ]
Under independence, P(x, y) = P(x) × P(y), so PMI = log 1 = 0.
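The independence case can be checked numerically. The sketch below uses the standard definition PMI(x, y) = log [P(x, y) / (P(x) × P(y))]; the probabilities are made-up illustrative values, not figures from the slides.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float) -> float:
    """Pointwise mutual information: log of the joint over the product of marginals."""
    return math.log(p_xy / (p_x * p_y))


# Independent events: the joint equals the product of the marginals, so PMI is 0.
independent = pmi(p_xy=0.25, p_x=0.5, p_y=0.5)

# Associated events: they co-occur more often than independence predicts, so PMI > 0.
associated = pmi(p_xy=0.4, p_x=0.5, p_y=0.5)

print(independent, associated)
```

A positive PMI means the two terms co-occur more often than chance would predict, and a negative PMI means they co-occur less often.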
Lift
Let φ_kw denote the probability of term
w ∈ {1, ..., V} for topic k ∈ {1, ..., K},
where V denotes the number of
terms in the vocabulary, and let p_w
denote the marginal probability of
term w in the corpus.
The lift of term w in topic k is the
ratio φ_kw / p_w.
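A minimal sketch of the lift computation, taking lift as the ratio of a term's probability within a topic to its marginal probability in the corpus. It assumes the marginal probability is estimated from raw corpus term counts; the vocabulary, counts, and topic probabilities are hypothetical values chosen for illustration.

```python
def lift(phi_k, term_counts):
    """Lift of each term in one topic: topic-specific probability / marginal probability."""
    total = sum(term_counts)
    marginal = [count / total for count in term_counts]  # p_w for each term
    return [phi_kw / p_w for phi_kw, p_w in zip(phi_k, marginal)]


vocab = ["the", "engine", "bike"]
term_counts = [900, 60, 40]   # hypothetical corpus-wide term frequencies
phi_k = [0.50, 0.20, 0.30]    # hypothetical probabilities of each term under topic k

lifts = lift(phi_k, term_counts)

# Rank terms by decreasing lift.
ranked = sorted(zip(vocab, lifts), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Ranking by lift pushes the globally common term "the" to the bottom even though it is the most probable term in the topic, which is the behavior lift is meant to provide.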
I I . T O P I C M O D E L V I S U A L I Z A T I O N
• A number of visualization systems for topic models have been
developed in recent years.
• However, their visualization elements are limited to bar charts or
word clouds of term probabilities for each topic, pie charts of
topic probabilities for each document, and/or various
bar charts or scatterplots related to document metadata.
• Chuang et al. (2012b) developed a tool called “Termite”,
which visualizes the set of topic-term distributions estimated
in LDA using a matrix layout.
• http://vis.stanford.edu/papers/termite
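• To select the keywords that represent a topic's content, Termite defines
a saliency measure: saliency(w) = P(w) × distinctiveness(w).
• Here w denotes a term, P(w) is the frequency of w, and distinctiveness(w)
measures how much w differs across topics: the more topics that contain w,
the lower its distinctiveness(w).
• To order the terms, Termite places words that frequently appear
consecutively, such as “social” and “networks”, next to each other, which
reinforces their meaning and makes them easier for users to understand.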
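The saliency measure can be sketched as follows. The slide does not give a formula for distinctiveness, so the sketch assumes the Kullback-Leibler form used in the Termite paper: the divergence between the topic distribution given the term, P(T|w), and the overall topic distribution, P(T). All probabilities are hypothetical.

```python
import math


def distinctiveness(p_topic_given_w, p_topic):
    """KL divergence between P(T|w) and P(T); assumed form, see lead-in."""
    return sum(
        ptw * math.log(ptw / pt)
        for ptw, pt in zip(p_topic_given_w, p_topic)
        if ptw > 0  # terms with zero probability contribute nothing
    )


def saliency(p_w, p_topic_given_w, p_topic):
    """saliency(w) = P(w) * distinctiveness(w)."""
    return p_w * distinctiveness(p_topic_given_w, p_topic)


p_topic = [0.5, 0.5]  # two equally prevalent topics (hypothetical)

# A frequent term spread evenly over both topics: distinctiveness is 0.
common = saliency(0.9, [0.5, 0.5], p_topic)

# A rarer term that occurs in only one topic: distinctiveness is log 2.
specific = saliency(0.1, [1.0, 0.0], p_topic)

print(common, specific)
```

Under this assumption, a term that appears in every topic in proportion to the topic sizes has zero saliency no matter how frequent it is.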
I I I . R E L E V A N C E O F T E R M S T O T O P I C S
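• Here we define relevance, our method for ranking terms
within topics, and we describe the results of a user study to
learn an optimal tuning parameter in the computation of
relevance.
• We define the relevance of term w to topic k given a weight
parameter λ (where 0 ≤ λ ≤ 1) as:
r(w, k | λ) = λ log(φ_kw) + (1 − λ) log(φ_kw / p_w)
where φ_kw is the probability of term w under topic k and p_w is
the marginal probability of term w in the corpus.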
• λ determines the weight given to the probability of term w
under topic k relative to its lift.
• Setting λ = 1 results in the familiar ranking of terms in
decreasing order of their topic-specific probability, and
setting λ = 0 ranks terms solely by their lift.
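The effect of λ on the ranking can be illustrated with a small sketch. It uses relevance in the form r(w, k | λ) = λ log(φ_kw) + (1 − λ) log(φ_kw / p_w), where φ_kw is the probability of term w under topic k and p_w is its marginal probability in the corpus. The function names and all numbers are hypothetical and chosen only to show how the order changes.

```python
import math


def relevance(phi_kw, p_w, lam):
    """Weighted sum of log topic-specific probability and log lift."""
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def top_terms(vocab, phi_k, p, lam, n=5):
    """Return the n terms with the highest relevance to the topic."""
    scored = [(term, relevance(phi, pw, lam)) for term, phi, pw in zip(vocab, phi_k, p)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:n]]


vocab = ["the", "engine", "bike", "carburetor"]
phi_k = [0.40, 0.25, 0.30, 0.05]   # hypothetical probabilities under topic k
p = [0.80, 0.10, 0.08, 0.001]      # hypothetical marginal probabilities

for lam in (1.0, 0.6, 0.0):
    print(lam, top_terms(vocab, phi_k, p, lam, n=4))
```

In this toy example, λ = 1 puts the common word "the" first, λ = 0 puts the very rare "carburetor" first, and an intermediate value such as λ = 0.6 balances the two.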
• We fit a 50-topic model to the 20 Newsgroups data and plotted
each term in the vocabulary (which has size V = 22,524) for a
given topic.
• The figure shows this plot for Topic 29, which occurred mostly in
documents posted to the “Motorcycles” Newsgroup, but also in
documents posted to the “Automobiles” Newsgroup and the
“Electronics” Newsgroup.
I I I . U S E R S T U D Y

• Data: 20 Newsgroups, 13,695 documents
• Some of the LDA-inferred topics occurred almost exclusively
(> 90% of occurrences) in a single Newsgroup, such as
Topic 38, which came from the documents posted to the
“Medicine” (or “sci.med”) Newsgroup.
• Other topics occurred in a wide variety of Newsgroups. One
would expect these “spread-out” topics to be harder to
interpret than the “pure” topics like Topic 38.
• We recruited 29 subjects among our colleagues (research
scientists at AT&T Labs with moderate familiarity with text
mining techniques and topic models).
• Each subject completed an online experiment consisting of 50
tasks, one for each topic k ∈ {1, ..., 50}.
• Task k was to read a list of five terms, ranked from 1 to 5 in
order of relevance to topic k, where λ ∈ (0, 1) was randomly
sampled to compute relevance.
• We expected the proportion of correct responses to be roughly
1/3 no matter the value of λ used to compute relevance.
• In fact, seven of the topics were correctly identified by all 29
users, and one topic was incorrectly identified by all users.
• We estimated a topic-specific intercept term to control for the
difficulty of each topic (not just due to the variety of its tokens,
but also to account for the inherent familiarity of each topic to
our subjects).
• The estimated effects of λ and λ² were statistically significant
(χ² p-value = 0.018).
• There was roughly a 67% baseline probability of correct
identification. As Figure 3 shows, for these topics, the
“optimal” value of λ was about 0.6.
• The proportion of correct responses was about 53% for λ near 0
and about 63% for λ near 1.
• We view this as evidence that ranking terms by relevance with
λ < 1 can improve topic interpretability.
I V . T H E L D A V I S S Y S T E M
Installation: pip install pyldavis
• pyLDAvis overview notebook:
http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
• Processing text with gensim:
http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb
• LDAvis chart for Free China (自由中國):
http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/master/LDA-practice-1.html#topic=0&lambda=1&term=
• l-LDAvis chart for Free China (自由中國):
http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/master/python-lldavis_test1-24-2.html
• The areas of the circles are proportional to the relative
prevalences of the topics in the corpus.
• By default, inter-topic distances are computed using
Jensen-Shannon divergence.
• By default, the set of inter-topic distances is scaled using
Principal Components.
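A minimal sketch of the inter-topic distance step, using the standard definition of Jensen-Shannon divergence (the average of the Kullback-Leibler divergences of each distribution from their midpoint). The three topic-term distributions are hypothetical, and the subsequent projection of the distance matrix to two dimensions is not shown.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by log 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical topic-term distributions over a three-term vocabulary.
topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

# Pairwise distance matrix between topics.
dist = [[jensen_shannon(a, b) for b in topics] for a in topics]
for row in dist:
    print([round(d, 4) for d in row])
```

Topics 0 and 1 have similar term distributions and end up close together, while topic 2 is far from both, which is what the relative positions of the circles in the topic map are meant to convey.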
V . D I S C U S S I O N
A. What is the meaning of each topic?
B. How prevalent is each topic?
C. How do the topics relate to each other?
• For future work, we anticipate performing a larger user study to
further understand how to facilitate topic interpretation in fitted
LDA models, considering other ranking methods in addition to relevance.
• We also note the need to visualize correlations between topics,
which can provide insight into what is happening on the
document level without actually displaying entire documents.
• Last, we seek a solution to the problem of visualizing a large
number of topics (say, from 100 to 500 topics) in a compact way.

More Related Content

What's hot

Computing with Directed Labeled Graphs
Computing with Directed Labeled GraphsComputing with Directed Labeled Graphs
Computing with Directed Labeled GraphsMarko Rodriguez
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
Marco Righini
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet Allocation
Quentin Pleplé
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
Liwei Ren任力偉
 
Author Topic Model
Author Topic ModelAuthor Topic Model
Author Topic Model
FReeze FRancis
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
Jeff Z. Pan
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
Nik Spirin
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Leonardo Di Donato
 
Transfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningTransfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine Learning
Sebastian Ruder
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
Steffen Staab
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
Soojung Hong
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
Yueshen Xu
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Lukas Galke
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
FedorNikolaev
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
Joaquin Vanschoren
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Johann Petrak
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
Jeff Z. Pan
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Isabelle Augenstein
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic Technologies
Francesco Osborne
 

What's hot (20)

Computing with Directed Labeled Graphs
Computing with Directed Labeled GraphsComputing with Directed Labeled Graphs
Computing with Directed Labeled Graphs
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Interactive Latent Dirichlet Allocation
Interactive Latent Dirichlet AllocationInteractive Latent Dirichlet Allocation
Interactive Latent Dirichlet Allocation
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Bytewise Approximate Match: Theory, Algorithms and Applications
Bytewise Approximate Match:  Theory, Algorithms and ApplicationsBytewise Approximate Match:  Theory, Algorithms and Applications
Bytewise Approximate Match: Theory, Algorithms and Applications
 
Author Topic Model
Author Topic ModelAuthor Topic Model
Author Topic Model
 
The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning The Maze of Deletion in Ontology Stream Reasoning
The Maze of Deletion in Ontology Stream Reasoning
 
Language Models for Information Retrieval
Language Models for Information RetrievalLanguage Models for Information Retrieval
Language Models for Information Retrieval
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Transfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine LearningTransfer Learning -- The Next Frontier for Machine Learning
Transfer Learning -- The Next Frontier for Machine Learning
 
Semantics reloaded
Semantics reloadedSemantics reloaded
Semantics reloaded
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Topic model an introduction
Topic model an introductionTopic model an introduction
Topic model an introduction
 
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
Evaluating the Impact of Word Embeddings on Similarity Scoring in Practical I...
 
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge GraphJoint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
Joint Word and Entity Embeddings for Entity Retrieval from Knowledge Graph
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
Semantics2018 Zhang,Petrak,Maynard: Adapted TextRank for Term Extraction: A G...
 
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
The Rise of Approximate Ontology Reasoning: Is It Mainstream Yet? --- Revisit...
 
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
Machine Reading Using Neural Machines (talk at Microsoft Research Faculty Sum...
 
Supporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic TechnologiesSupporting Springer Nature Editors by means of Semantic Technologies
Supporting Springer Nature Editors by means of Semantic Technologies
 

Similar to LDAvis

NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
Eugene Nho
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Jonathon Hare
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
kelbedweihy
 
Lecture1.pptx
Lecture1.pptxLecture1.pptx
Lecture1.pptx
jonathanG19
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
David De Roure
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
rusbase
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
Jonathan Stray
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptx
GowrySailaja
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Anubhav Jain
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
cscpconf
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
csandit
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
openseesdays
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linking
Sunny Kr
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Matthew Lease
 
Semantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalabilitySemantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalability
nvitucci
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
Barbara Starr
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentFaculty of Computer Science
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 

Similar to LDAvis (20)

NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
Multimodal Searching and Semantic Spaces: ...or how to find images of Dalmati...
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Lecture1.pptx
Lecture1.pptxLecture1.pptx
Lecture1.pptx
 
Towards Computational Research Objects
Towards Computational Research ObjectsTowards Computational Research Objects
Towards Computational Research Objects
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Word_Embeddings.pptx
Word_Embeddings.pptxWord_Embeddings.pptx
Word_Embeddings.pptx
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
 
Introduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKennaIntroduction to OpenSees by Frank McKenna
Introduction to OpenSees by Frank McKenna
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linking
 
Wi presentation
Wi presentationWi presentation
Wi presentation
 
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & OpportunitiesDeep Learning for Information Retrieval: Models, Progress, & Opportunities
Deep Learning for Information Retrieval: Models, Progress, & Opportunities
 
Semantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalabilitySemantic Web languages: Expressivity vs scalability
Semantic Web languages: Expressivity vs scalability
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
 
A Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual EntailmentA Distributed Architecture System for Recognizing Textual Entailment
A Distributed Architecture System for Recognizing Textual Entailment
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
 

Recently uploaded

somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
Howard Spence
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
Vladimir Samoylov
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
amekonnen
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
eCommerce Institute
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Sebastiano Panichella
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
Sebastiano Panichella
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
AwangAniqkmals
 
Gregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptxGregory Harris' Civics Presentation.pptx
Gregory Harris' Civics Presentation.pptx
gharris9
 
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
0x01 - Newton's Third Law:  Static vs. Dynamic Abusers0x01 - Newton's Third Law:  Static vs. Dynamic Abusers
0x01 - Newton's Third Law: Static vs. Dynamic Abusers
OWASP Beja
 
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdfSupercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Supercharge your AI - SSP Industry Breakout Session 2024-v2_1.pdf
Access Innovations, Inc.
 
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXOBitcoin Lightning wallet and tic-tac-toe game XOXO
Bitcoin Lightning wallet and tic-tac-toe game XOXO
Matjaž Lipuš
 
Obesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditionsObesity causes and management and associated medical conditions
Obesity causes and management and associated medical conditions
Faculty of Medicine And Health Sciences
 
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
Competition and Regulation in Professional Services – KLEINER – June 2024 OEC...
OECD Directorate for Financial and Enterprise Affairs
 
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Presentatie 4. Jochen Cremer - TU Delft 28 mei 2024
Dutch Power
 
Acorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutesAcorn Recovery: Restore IT infra within minutes
Acorn Recovery: Restore IT infra within minutes
IP ServerOne
 
Media as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern EraMedia as a Mind Controlling Strategy In Old and Modern Era
Media as a Mind Controlling Strategy In Old and Modern Era
faizulhassanfaiz1670
 
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Presentatie 8. Joost van der Linde & Daniel Anderton - Eliq 28 mei 2024
Dutch Power
 
Burning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdfBurning Issue Presentation By Kenmaryon.pdf
Burning Issue Presentation By Kenmaryon.pdf
kkirkland2
 
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdfBonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
Bonzo subscription_hjjjjjjjj5hhhhhhh_2024.pdf
khadija278284
 
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...Announcement of 18th IEEE International Conference on Software Testing, Verif...
Announcement of 18th IEEE International Conference on Software Testing, Verif...
Sebastiano Panichella
 

Recently uploaded (20)

somanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptxsomanykidsbutsofewfathers-140705000023-phpapp02.pptx
somanykidsbutsofewfathers-140705000023-phpapp02.pptx
 
Getting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control TowerGetting started with Amazon Bedrock Studio and Control Tower
Getting started with Amazon Bedrock Studio and Control Tower
 
Tom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issueTom tresser burning issue.pptx My Burning issue
Tom tresser burning issue.pptx My Burning issue
 
María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024María Carolina Martínez - eCommerce Day Colombia 2024
María Carolina Martínez - eCommerce Day Colombia 2024
 
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...Doctoral Symposium at the 17th IEEE International Conference on Software Test...
Doctoral Symposium at the 17th IEEE International Conference on Software Test...
 
International Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software TestingInternational Workshop on Artificial Intelligence in Software Testing
International Workshop on Artificial Intelligence in Software Testing
 
AWANG ANIQKMALBIN AWANG TAJUDIN B22080004 ASSIGNMENT 2 MPU3193 PHILOSOPHY AND...
LDAvis

  • 1. LDAvis: A method for visualizing and interpreting topics. Carson Sievert (Iowa State University), Kenneth E. Shirley (AT&T Labs Research). ILLVI 2014
  • 2. I. INTRODUCTION • LDAvis is a web-based interactive visualization of topics estimated using Latent Dirichlet Allocation (LDA), built using a combination of R and D3. • LDAvis attempts to answer a few basic questions about a fitted topic model:
  • 3. I. INTRODUCTION A. What is the meaning of each topic? B. How prevalent is each topic? C. How do the topics relate to each other?
  • 4. II. RELATED WORK • LDA • Turbo Topics • PMI • lift
  • 5. II. RELATED WORK (LDA) • Latent Dirichlet allocation (LDA) is an example of a topic model and was first presented as a graphical model for topic discovery by David Blei, Andrew Ng, and Michael I. Jordan in 2003.
  • 6. II. RELATED WORK (Turbo Topics) • Blei and Lafferty (2009) developed “Turbo Topics”, a method of identifying n-grams within LDA-inferred topics. • The resulting output is still simply a ranked list containing a mixture of terms and n-grams.
  • 7. II. RELATED WORK (Turbo Topics)
  • 8. II. RELATED WORK (PMI) • Pointwise Mutual Information (PMI): PMI(x, y) = log [ P(x, y) / (P(x) × P(y)) ]. • Under independence, P(x, y) = P(x) × P(y), so PMI = log 1 = 0.
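The independence case on this slide can be checked numerically. The following is a minimal sketch, assuming the standard definition PMI(x, y) = log [ P(x, y) / (P(x) P(y)) ]; the function name `pmi` and the probabilities are hypothetical values chosen for illustration, not figures from the slides.

```python
import math


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information: log of the joint over the product of marginals."""
    return math.log(p_xy / (p_x * p_y))


# Hypothetical probabilities, chosen only for illustration.
# Independent pair: the joint equals the product of the marginals, so PMI is 0.
independent = pmi(0.125, 0.25, 0.5)

# Associated pair: the joint is twice what independence predicts, so PMI is log 2.
associated = pmi(0.25, 0.25, 0.5)

print(independent, associated)
```

A positive PMI indicates that two terms co-occur more often than independence would predict, which is why PMI is used as a signal of topic coherence.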
  • 9. II. RELATED WORK (lift) • Let φ_kw denote the probability of term w ∈ {1, ..., V} for topic k ∈ {1, ..., K}, where V denotes the number of terms in the vocabulary, and let p_w denote the marginal probability of term w in the corpus. • The lift of term w in topic k is the ratio φ_kw / p_w.
  • 10. II. TOPIC MODEL VISUALIZATION • A number of visualization systems for topic models have been developed in recent years. • However, their visualization elements are limited to bar charts or word clouds of term probabilities for each topic, pie charts of topic probabilities for each document, and/or various bar charts or scatterplots related to document metadata.
  • 11. II. TOPIC MODEL VISUALIZATION • Chuang et al. (2012b) developed one such tool, called “Termite”, which visualizes the set of topic-term distributions estimated in LDA using a matrix layout. • http://vis.stanford.edu/papers/termite
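  • 12. II. TOPIC MODEL VISUALIZATION • To select the keywords that represent a topic's content, Termite defines a saliency measure: saliency(w) = P(w) × distinctiveness(w). • Here w is a keyword, P(w) is the frequency of w, and distinctiveness(w) measures how much w differs across topics: the more topics that contain w, the lower distinctiveness(w). • To order the keywords, Termite places words that frequently occur consecutively, such as “social” and “networks”, next to each other, which strengthens their meaning and makes the display easier for users to understand.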
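The saliency measure can be sketched in a few lines. The slide does not spell out distinctiveness(w); this sketch assumes the definition from the Termite paper, where distinctiveness is the Kullback-Leibler divergence between P(T | w) and the overall topic distribution P(T). The function names and all probabilities below are hypothetical.

```python
import math


def distinctiveness(p_topic_given_w, p_topic):
    """KL divergence between P(T | w) and P(T) (assumed definition)."""
    return sum(
        ptw * math.log(ptw / pt)
        for ptw, pt in zip(p_topic_given_w, p_topic)
        if ptw > 0
    )


def saliency(p_w, p_topic_given_w, p_topic):
    """saliency(w) = P(w) * distinctiveness(w)."""
    return p_w * distinctiveness(p_topic_given_w, p_topic)


# Hypothetical two-topic corpus with equally likely topics.
p_topic = [0.5, 0.5]

# A word spread evenly over both topics tells us nothing about the topic.
generic = saliency(0.10, [0.5, 0.5], p_topic)

# A word concentrated in one topic is distinctive, hence salient.
specific = saliency(0.10, [0.9, 0.1], p_topic)

print(generic, specific)
```

With equal frequency P(w), the word that appears in fewer topics receives the higher saliency, matching the description on the slide.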
  • 13. III. RELEVANCE OF TERMS TO TOPICS • Here we define relevance, our method for ranking terms within topics, and we describe the results of a user study to learn an optimal tuning parameter in the computation of relevance. • We define the relevance of term w to topic k given a weight parameter λ (where 0 ≤ λ ≤ 1) as r(w, k | λ) = λ log(φ_kw) + (1 − λ) log(φ_kw / p_w).
  • 14. III. RELEVANCE OF TERMS TO TOPICS • λ determines the weight given to the probability of term w under topic k relative to its lift. • Setting λ = 1 results in the familiar ranking of terms in decreasing order of their topic-specific probability, and setting λ = 0 ranks terms solely by their lift.
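The effect of λ on the ranking can be illustrated with a toy topic. This is a minimal sketch, assuming the relevance definition r(w, k | λ) = λ log(φ_kw) + (1 − λ) log(φ_kw / p_w) from the LDAvis paper; the function names, the three terms, and their probabilities are hypothetical and are not taken from the 20 Newsgroups model.

```python
import math


def relevance(phi_kw, p_w, lam):
    """Weighted sum of log probability and log lift (assumed definition)."""
    return lam * math.log(phi_kw) + (1 - lam) * math.log(phi_kw / p_w)


def rank_terms(phi_k, p, lam):
    """Return the terms of one topic sorted by decreasing relevance."""
    scores = {w: relevance(phi_k[w], p[w], lam) for w in phi_k}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical topic-specific probabilities (phi_k) and corpus-wide
# marginal probabilities (p) for three terms.
phi_k = {"the": 0.10, "engine": 0.05, "carburetor": 0.01}
p = {"the": 0.12, "engine": 0.01, "carburetor": 0.001}

by_probability = rank_terms(phi_k, p, 1.0)  # lambda = 1: probability only
by_lift = rank_terms(phi_k, p, 0.0)         # lambda = 0: lift only
blended = rank_terms(phi_k, p, 0.6)         # intermediate weighting

print(by_probability, by_lift, blended)
```

In this toy example, ranking by probability puts the common word first and ranking by lift puts the rarest word first, while an intermediate λ favors a term that is both reasonably frequent in the topic and specific to it.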
  • 15. III. RELEVANCE OF TERMS TO TOPICS • We fit a 50-topic model to the 20 Newsgroups data and plotted log(lift) against log(φ_kw) for each term in the vocabulary (which has size V = 22,524) for a given topic. • The figure shows this plot for Topic 29, which occurred mostly in documents posted to the “Motorcycles” Newsgroup, but also in documents posted to the “Automobiles” and “Electronics” Newsgroups.
  • 16.
  • 17. III. USER STUDY • 13,695 documents • 20 Newsgroups
  • 18. III. USER STUDY • Some of the LDA-inferred topics occurred almost exclusively (> 90% of occurrences) in a single Newsgroup, such as Topic 38, which came from documents posted to the “Medicine” (or “sci.med”) Newsgroup. • Other topics occurred in a wide variety of Newsgroups. One would expect these “spread-out” topics to be harder to interpret than “pure” topics like Topic 38.
  • 19. III. USER STUDY • We recruited 29 subjects among our colleagues (research scientists at AT&T Labs with moderate familiarity with text mining techniques and topic models). • Each subject completed an online experiment consisting of 50 tasks, one for each topic k (for k ∈ {1, ..., 50}). • Task k was to read a list of five terms, ranked from 1 to 5 in order of relevance to topic k, where λ ∈ (0, 1) was randomly sampled to compute relevance.
  • 20. III. USER STUDY • For an uninterpretable topic, we expected the proportion of correct responses to be roughly 1/3 no matter the value of λ used to compute relevance. • In fact, seven of the topics were correctly identified by all 29 users, and one topic was incorrectly identified by all users. • We estimated a topic-specific intercept term to control for the difficulty of each topic (not only because of the variety of its tokens, but also to account for how familiar each topic was to our subjects).
  • 21. III. USER STUDY • The estimated effects of λ and λ² were statistically significant (χ² p-value = 0.018). • There was roughly a 67% baseline probability of correct identification. As Figure 3 shows, for these topics, the “optimal” value of λ was about 0.6. • For λ near 0 and λ near 1, the estimated proportions of correct responses were about 53% and 63%, respectively. • We view this as evidence that ranking terms with λ < 1 can improve topic interpretability.
  • 22. III. USER STUDY
  • 23. IV. THE LDAVIS SYSTEM • Installation: pip install pyldavis
  • 24. IV. THE LDAVIS SYSTEM • http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
  • 25. IV. THE LDAVIS SYSTEM • Processing text with gensim: http://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/Gensim%20Newsgroup.ipynb • Free China (自由中國) ldavis chart: http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/master/LDA-practice-1.html#topic=0&lambda=1&term= • Free China (自由中國) l-ldavis chart: http://nbviewer.jupyter.org/github/effytseng/dh_ldavis/blob/master/python-lldavis_test1-24-2.html
  • 26. IV. THE LDAVIS SYSTEM • The areas of the circles are proportional to the relative prevalences of the topics in the corpus. • By default, inter-topic distances are computed using Jensen-Shannon divergence. • By default, the set of inter-topic distances is scaled using Principal Components.
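The first two defaults can be sketched without any plotting library. The following is a minimal illustration, not the LDAvis implementation: it computes pairwise Jensen-Shannon divergences between hypothetical topic-term distributions and converts a topic's prevalence into a circle radius so that area, rather than radius, is proportional to prevalence. The function names, the `scale` parameter, and the three toy topics are assumptions; the Principal Components scaling step is omitted.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def circle_radius(prevalence, scale=1.0):
    """Radius of a circle whose area is proportional to prevalence."""
    return scale * math.sqrt(prevalence / math.pi)


# Hypothetical topic-term distributions over a three-word vocabulary.
topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]

# Pairwise distance matrix that a 2-D scaling method would then project.
distances = [[jensen_shannon(a, b) for b in topics] for a in topics]

for row in distances:
    print([round(d, 4) for d in row])
```

Jensen-Shannon divergence is symmetric and bounded above by log 2, which makes it a convenient input for a two-dimensional layout. Similar topics (the first two here) end up closer together than dissimilar ones.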
  • 27. IV. THE LDAVIS SYSTEM
  • 28. IV. THE LDAVIS SYSTEM
  • 29. IV. THE LDAVIS SYSTEM
  • 30. V. DISCUSSION A. What is the meaning of each topic? B. How prevalent is each topic? C. How do the topics relate to each other?
  • 31. V. DISCUSSION • For future work, we anticipate performing a larger user study to further understand how to facilitate topic interpretation in fitted LDA models. • Beyond relevance, there is a need to visualize correlations between topics, which can provide insight into what is happening at the document level without actually displaying entire documents. • Last, we seek a solution to the problem of visualizing a large number of topics (say, 100 to 500 topics) in a compact way.