Topic Models. Claudia Wagner, Graz, 16.9.2010
Semantic Representation of Text (Griffiths, 2007)
Topic Models
Topic Models. Source: http://www.cs.umass.edu/~wallach/talks/priors.pdf
Topic Models (Steyvers, 2006). Three latent variables: (1) word distribution per topic (word-topic matrix); (2) topic distribution per document (topic-doc matrix); (3) topic assignment of each word.
Summary
Topic Models
pLSA (Hofmann, 1999). Plate diagram: number of documents; number of words; P(z | θ): topic distribution of a document; P(w | z): word distribution of a topic.
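Written out, the mixture encoded by the plate diagram is the standard pLSA decomposition (a reconstruction, not verbatim from the slide):

    P(d, w) = P(d) · Σ_z P(z | d) · P(w | z)

Each word w of document d is generated by first drawing a topic z from the document-specific topic distribution P(z | d), then drawing the word from the topic-specific word distribution P(w | z).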
Latent Dirichlet Allocation (LDA) (Blei, 2003). Plate diagram: number of documents; number of words; P(w | z, φ(z)); P(φ(z) | β).
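Spelled out, the generative process the plate diagram encodes (a standard reconstruction in the notation of the surrounding slides):

    1. For each topic z: draw a word distribution φ(z) ~ Dirichlet(β).
    2. For each document d: draw a topic distribution θ(d) ~ Dirichlet(α).
    3. For each word position i in document d: draw a topic z_i ~ Multinomial(θ(d)),
       then a word w_i ~ Multinomial(φ(z_i)).

Unlike pLSA, the document-specific θ(d) is itself a random draw from a Dirichlet prior, which gives LDA a proper generative model for unseen documents.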
Dirichlet Prior α. High α: each doc's topic distribution θ is a smooth mix of all topics, e.g. topic distribution of Doc1 = (1/3, 1/3, 1/3). Low α: each doc's topic distribution θ must favor few topics, e.g. topic distribution of Doc2 = (1, 0, 0).
Dirichlet Prior β. High β: each topic's word distribution φ is a smooth mix over many words, e.g. word distribution of Topic1 = (1/3, 1/3, 1/3). Low β: each topic's word distribution must favor few words, e.g. word distribution of Topic2 = (1, 0, 0).
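To see the effect of α directly, one can draw a few document-topic distributions from the prior. A minimal sketch using numpy (the α values are illustrative, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)

    # High alpha: draws land near the uniform mix (1/3, 1/3, 1/3).
    print(rng.dirichlet([10.0, 10.0, 10.0], size=3))

    # Low alpha: draws land near the corners of the simplex,
    # i.e. each document favors one topic, like (1, 0, 0).
    print(rng.dirichlet([0.1, 0.1, 0.1], size=3))

The same experiment with β describes word distributions of topics instead of topic distributions of documents.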
Matrix Representation of LDA. The word-document matrix (observed) factors into φ(z), the word-topic matrix (latent), and θ(d), the topic-document matrix (latent).
Statistical Inference and Parameter Estimation (Blei, 2003). Latent variables vs. observed variables and priors.
Statistical Inference and Parameter Estimation
Markov Chain Example. Source: http://en.wikipedia.org/wiki/Examples_of_Markov_chains
Markov Chain Example
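The linked Wikipedia article illustrates Markov chains with a two-state weather model; a minimal sketch of how repeated transitions converge to the stationary distribution (transition probabilities taken from that example, so treat them as illustrative):

    import numpy as np

    # Transition matrix: rows = current state (sunny, rainy), columns = next state.
    P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])

    x = np.array([1.0, 0.0])  # start: sunny with probability 1
    for _ in range(50):
        x = x @ P             # one step of the chain

    print(x)  # approx. (0.8333, 0.1667), the stationary distribution

This is the mechanism Gibbs sampling exploits: the sampler is a Markov chain whose stationary distribution is the posterior we want to sample from.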
Gibbs Sampling
Gibbs Sampling for LDA
Run Gibbs Sampling Example (1). Random start: each of the 24 word tokens is randomly assigned to topic 1 or topic 2 (assignments: 1 1 2 2 2 2 1 1 1 2 1 2 1 2 1 1 2 1 2 2 1 2 1 2). The resulting count matrices:

Word-topic counts: money: topic1 = 3, topic2 = 2; bank: topic1 = 3, topic2 = 6; loan: topic1 = 2, topic2 = 1; river: topic1 = 2, topic2 = 2; stream: topic1 = 2, topic2 = 1.

Doc-topic counts: doc1: topic1 = 4, topic2 = 4; doc2: topic1 = 4, topic2 = 4; doc3: topic1 = 4, topic2 = 4.
Gibbs Sampling for LDA. Probability that topic j is chosen for word w_i, conditioned on all other topic assignments of words in this document and all other observed variables. Count the number of times word token w_i was assigned to topic j across all docs, and the number of times topic j was already assigned to some word token in doc d_i. The result is unnormalized: to obtain a probability, divide the weight of assigning topic j to word w_i by the sum over all T topics.
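Written out, the (unnormalized) full conditional described above has the standard collapsed form (a reconstruction; C^WT and C^DT are the word-topic and doc-topic count matrices with the current assignment of w_i excluded; W is the vocabulary size, T the number of topics):

    P(z_i = j | z_-i, w) ∝ (C^WT[w_i, j] + β) / (Σ_w C^WT[w, j] + W·β)
                         · (C^DT[d_i, j] + α) / (Σ_t C^DT[d_i, t] + T·α)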
Run Gibbs Sampling
Run Gibbs Sampling Example (2). Select a word token (here: a token of "money" in doc1), remove its current topic assignment, and decrement the affected entries of the word-topic and doc-topic count matrices before resampling its topic from the full conditional.
Run Gibbs Sampling Example (2), continued. The token is resampled and assigned to topic 2; the counts are incremented accordingly. Updated word-topic counts: money: topic1 = 2, topic2 = 3 (bank, loan, river, stream unchanged). Updated doc-topic counts: doc1: topic1 = 3, topic2 = 5 (doc2, doc3 unchanged).
Run Gibbs Sampling Example (3). "Bank" is assigned to topic 2. The document factor of the conditional compares how often topic j was used in doc d_i with how often all other topics were used in doc d_i.
Summary: Run Gibbs Sampling
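As a concrete illustration of the procedure summarized above, a minimal collapsed Gibbs sampler for LDA (a sketch under the usual assumptions: symmetric priors, each document given as a list of word ids; not the exact code behind the slides):

    import numpy as np

    def gibbs_lda(docs, V, T, alpha=0.5, beta=0.1, iters=200, seed=0):
        """docs: list of documents, each a list of word ids in [0, V)."""
        rng = np.random.default_rng(seed)
        cwt = np.zeros((V, T))          # word-topic counts C^WT
        cdt = np.zeros((len(docs), T))  # doc-topic counts C^DT
        z = []                          # topic assignment of every token
        for d, doc in enumerate(docs):  # random start
            zd = rng.integers(T, size=len(doc))
            z.append(zd)
            for w, t in zip(doc, zd):
                cwt[w, t] += 1
                cdt[d, t] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    t = z[d][i]         # remove current assignment
                    cwt[w, t] -= 1
                    cdt[d, t] -= 1
                    # full conditional (the doc-side denominator is
                    # constant over topics and therefore omitted)
                    p = (cwt[w] + beta) / (cwt.sum(axis=0) + V * beta) \
                        * (cdt[d] + alpha)
                    t = rng.choice(T, p=p / p.sum())
                    z[d][i] = t         # record and count new assignment
                    cwt[w, t] += 1
                    cdt[d, t] += 1
        return cwt, cdt, z

From the final count matrices, φ and θ can be read off as on the parameter-estimation slide below.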
Gibbs Sampling Convergence. Black = topic 1, white = topic 2.
Gibbs Sampling Convergence
Gibbs Sampling Parameter Estimation. φ: the number of times word w_i was assigned to topic j, relative to the number of times any word was assigned to topic j; this is the predictive distribution of sampling a new token of word i from topic j. θ: the number of times topic j was assigned in doc d, relative to the number of times any topic was assigned in doc d; this is the predictive distribution of sampling a new token in document d from topic j.
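In symbols, the two predictive distributions described above (a standard reconstruction using the same count matrices and smoothing as before):

    φ[i, j] = (C^WT[i, j] + β) / (Σ_k C^WT[k, j] + W·β)   (probability of word i under topic j)
    θ[d, j] = (C^DT[d, j] + α) / (Σ_t C^DT[d, t] + T·α)   (probability of topic j in document d)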
Author-Topic (AT) Model (Rosen-Zvi, 2004)
AT-Model Algorithm. P(w | z, φ(z)); P(z | x, θ(x)).
AT Model Latent Variables: (1) author-topic assignment for each word; (2) topic distribution of each author, which determines which topics are used by which authors (count matrix C^AT); (3) word distribution of each topic (count matrix C^WT).
Matrix Representation of the Author-Topic Model. Source: http://www.ics.uci.edu/~smyth/kddpapers/UCI_KD-D_author_topic_preprint.pdf. The word-document data and the document-author assignments a_d are observed; θ(x) (topic distribution per author) and φ(z) (word distribution per topic) are latent.
Example (1). Random start: each word token is randomly assigned a topic and one of the document's authors. Author-topic counts: author1: topic1 = 4, topic2 = 0; author2: topic1 = 8, topic2 = 8; author3: topic1 = 0, topic2 = 4. Word-topic counts: money: topic1 = 3, topic2 = 2; bank: topic1 = 3, topic2 = 6; loan: topic1 = 2, topic2 = 1; river: topic1 = 2, topic2 = 2; stream: topic1 = 2, topic2 = 1.
Gibbs Sampling for the Author-Topic Model. Count the number of times author k was already assigned to topic j, and the number of times word token w_i was assigned to topic j across all docs.
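Written out, the full conditional assigns each word both a topic j and an author k drawn from the document's author set a_d (a reconstruction following Rosen-Zvi et al., 2004; C^AT is the author-topic count matrix):

    P(z_i = j, x_i = k | w_i = m, z_-i, x_-i) ∝ (C^WT[m, j] + β) / (Σ_m' C^WT[m', j] + W·β)
                                              · (C^AT[k, j] + α) / (Σ_t C^AT[k, t] + T·α)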
Problems of the AT Model
AT Model with Fictitious Authors
Predictive Power of Different Models (Rosen-Zvi, 2005). Experiment: training data: 1,557 papers; test data: 183 papers (102 of them single-authored). Test documents are chosen such that each author of a test document also appears as an author in the training set.
Author-Recipient-Topic (ART) Model (McCallum, 2004). P(z | x, a_d, θ(A,R)); P(w | z, φ(z)).
Gibbs Sampling for the ART Model. Random start: sample an author-recipient pair for each word; sample a topic for each word. Then compute, for each word w_i: the number of recipients of the message containing w_i; the number of times topic t was assigned to the author-recipient pair; the number of times the current word token was assigned to topic t; the number of times all other topics were assigned to the author-recipient pair; the number of times all other words were assigned to topic t; and the number of words times β.
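Put into a formula, the counts listed above combine exactly as in LDA, with the author-recipient pair (a, r) playing the role of the document (a reconstruction; the pair x_i = (a, r) is sampled over the message's recipients):

    P(z_i = t | x_i = (a, r), z_-i, w) ∝ (C^ART[(a, r), t] + α) / (Σ_t' C^ART[(a, r), t'] + T·α)
                                       · (C^WT[w_i, t] + β) / (Σ_w C^WT[w, t] + W·β)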
Labeled LDA (Ramage, 2009)
Group-Topic Model (Wang, 2005)
Group-Topic Model (Wang, 2005). Number of events (interactions between entities); number of entities.
CART Model (Pathak, 2008). Gibbs sampling alternates between updating latent communities c conditioned on the other variables, and updating recipient-topic tuples (r, z) for each word conditioned on the other variables.
Copycat Model (Dietz, 2007)
Copycat Model (Dietz, 2007). Figure: a citation graph in which the "cites" relation links document c to documents d1 and d2.
Copycat Model (Dietz, 2007)
Copycat Model (Dietz, 2007)
Citation Influence Model (Dietz, 2007). Innovation topic mixture ψ of a citing publication; distribution of citation influences; parameter of the coin flip that chooses whether topics are drawn from θ or from ψ.
References
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022.
Dietz, L., Bickel, S., and Scheffer, T. (2007). Unsupervised Prediction of Citation Influences. ICML 2007.
Griffiths, T. L., Steyvers, M., and Tenenbaum, J. B. (2007). Topics in Semantic Representation. Psychological Review, 114(2):211-244.
Hofmann, T. (1999). Probabilistic Latent Semantic Analysis. UAI 1999.
McCallum, A., Corrada-Emmanuel, A., and Wang, X. (2004). The Author-Recipient-Topic Model for Topic and Role Discovery in Social Networks. Technical Report UM-CS-2004-096, University of Massachusetts Amherst.
Pathak, N., DeLong, C., Banerjee, A., and Erickson, K. (2008). Social Topic Models for Community Extraction. SNA-KDD Workshop 2008.
Ramage, D., Hall, D., Nallapati, R., and Manning, C. D. (2009). Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora. EMNLP 2009.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., and Smyth, P. (2004). The Author-Topic Model for Authors and Documents. UAI 2004.
Rosen-Zvi, M., Griffiths, T., Steyvers, M., and Smyth, P. (2005). Learning Author-Topic Models from Text Corpora. Technical report.
Steyvers, M., and Griffiths, T. (2007). Probabilistic Topic Models. In Landauer, T., et al. (eds.), Handbook of Latent Semantic Analysis. Erlbaum.
Wang, X., Mohanty, N., and McCallum, A. (2005). Group and Topic Discovery from Relations and Text. LinkKDD 2005.