SlideShare a Scribd company logo
The Author-Topic Model
Presented by:
Darshan Ramakant Bhat
Freeze Francis
Mohammed Haroon D
by M Rosen-Zvi, et al. (UAI 2004)
Overview
● Motivation
● Model Formulation
○ Topic Model ( i.e LDA)
○ Author Model
○ Author Topic Model
● Parameter Estimation
○ Gibbs Sampling Algorithms
● Results and Evaluation
● Applications
Motivation
● joint author-topic modeling had received little attention.
● author modeling has tended to focus on problem of predicting an author given
the document.
● modelling the interests of authors is a fundamental problem raised by a large
corpus (eg: NIPS dataset).
● this paper introduced a generative model that simultaneously models the
content of documents and the interests of authors.
Notations
● Vocabulary set size : V
● Total unique authors : A
● Document d is identified as (wd
, ad
)
● Nd
: number of words in Document d.
● wd
: Vector of Nd
words (subset of vocabulary)
wd
= [w1d
w2d
…….. wNd d
]
● wid
: ith
word in document d
● Ad
: number of authors of Document d
● ad
: Vector of Ad
authors (subset of authors)
ad
= [a1d
a2d
…….. aAd d
]
● Corpus : Collection D documents
= { (w1
, a1
) , (w2
, a2
) ……. (wD
, aD
) }
Plate Notation for Graphical models
● Nodes are random variables
● Edges denote possible dependence
● Observed variables are shaded
● Plates denote replicated structure
Dirichlet distribution
● generalization of beta distribution into multiple dimensions
● Domain : X = (x1
, x2
….. xk
)
○ xi
∈ (0 , 1)
○ k
xi
= 1
○ k-dimensional multinomial distributions
● "distribution over multinomial distributions"
● Parameter : = ( 1
, 2
…… k
)
○ i
> 0 , ∀i
○ determines how the probability mass is distributed
○ Symmetric : 1
= 2
=…… k
● Each sample from a dirichlet is a multinomial distribution.
○ ↑ : denser samples
○ ↓ : sparse samples
Topic 2
(0,1,0)
Topic 1
(1,0,0)
Topic 3
(0,0,1)
.
(⅓,⅓,⅓ )
Fig : Domain of Dirichlet (k=3)
Fig : Symmetric Dirichlet distributions for k=3
pdf
samples
Model Formulation
Topic Model (LDA)
Model topics as distribution over words
Model documents as distribution over topics
Author Model
Model author as distribution over words
Author-Topic Model
Probabilistic model for both author and topics
Model topics as distribution over words
Model authors as distribution over topics
Topic Model : LDA
● Generative model
● Bag of word
● Mixed membership
● Each word in a document has a topic
assignment.
● Each document is associated with a
distribution over topics.
● Each topic is associated with a
distribution over words.
● α : Dirichlet prior parameter on the
per-document topic distributions.
● β : Dirichlet prior parameter on the
per-topic word distribution.
● w : observed word in document
● z : is the topic assigned for w
Fig : LDA model
Author Model
● Simple generative model
● Each author is associated with a
distribution over words.
● Each word has a author assignment.
● β : Dirichlet prior parameter on the
per-author word distributions.
● w : Observed word
● x : is the author chosen for word w
Fig : Author Model
Model Formulation
Topic Model (LDA)
Model topics as distribution over words
Model documents as distribution over topics
Author Model
Model author as distribution over words
Author-Topic Model
Probabilistic model for both author and topics
Model topics as distribution over words
Model authors as distribution over topics
Author-Topic
Model
● Each topic is associated with a
distribution over words.
● Each author is associated with a
distribution over topics.
● Each word in a document has an
author and a topic assignment.
● Topic distribution of document is a
mixture of its authors topic distribution
● α : Dirichlet prior parameter on the
per-author topic distributions.
● x : author chosen for word w
● z : is the topic assigned for w from
author x topic distribution
● w : observed word in document
Fig : Author Topic Model plate notation
Author-Topic
Model
Fig : Author Topic model
Special Cases of ATM
Topic Model (LDA):
Each document is written by a
unique author
● the document determines
the author.
● author assignment becomes
trivial.
● author’s topic distribution
can be viewed as
document’s topic
distribution.
Special Cases of ATM
Author Model:
each author writes about a unique topic
● the author determines the topic.
● topic assignment becomes trivial.
● topic’s word distribution can be
viewed as author’s word
distribution.
Variables to be inferred
Variables to be inferred
Variables to be inferred
Collapsed Gibbs Sampling - LDA
● The assignment variables contain information about and .
● Use this information to directly update the Topic assignment variable with the
next sample. This is Collapsed Gibbs Sampling.
Collapsed Gibbs Sampling - Author Model
● Need to infer Author - word distribution A
● WIth Collapsing GIbbs sampling, directly infer author word assignments
Author Topic Model - Collapsed Gibbs Sampling
● Need to infer T
and ϴA
● Instead through
collapsed Gibbs
Sampling directly infer
Topic and Author
assignments
● By finding the author -
topic joint distribution
conditioned on all other
variables
Example
AUTHOR 1 2 2 1 1
TOPIC 3 2 1 3 1
WORD epilepsy dynamic bayesian EEG model
TOPIC-1 TOPIC-2 TOPIC-3
epilepsy 1 0 35
dynamic 18 20 1
bayesian 42 15 0
EEG 0 5 20
model 10 8 1
...
TOPIC-1 TOPIC-2 TOPIC-3
AUTHOR-1 20 14 2
AUTHOR-2 20 21 5
AUTHOR-3 11 8 15
AUTHOR-4 5 15 14
...
Example
AUTHOR 1 2 ? 1 1
TOPIC 3 2 ? 3 1
WORD epilepsy dynamic bayesian EEG model
TOPIC-1 TOPIC-2 TOPIC-3
epilepsy 1 0 35
dynamic 18 20 1
bayesian 42(41) 15 0
EEG 0 5 20
model 10 8 1
...
TOPIC-1 TOPIC-2 TOPIC-3
AUTHOR-1 20 14 2
AUTHOR-2 20(19) 21 5
AUTHOR-3 11 8 15
AUTHOR-4 5 15 14
...
Author-Topic How much topic ‘j’ likes the word ‘bayesian’ How much author ‘k’ likes the topic
A1T1 41/97 20/36
A2T1 41/97 19/45
A1T2 15/105 14/36
A2T2 ... ...
A2T3 ... ...
A2T3 ... ...
● To draw new Author-Topic assignment (equivalently)
○ Roll K X ad
sided die with these probabilities
○ Assign the Author-Topic tuple to the word
AUTHOR 1 2 1 1 1
TOPIC 3 2 3 3 1
WORD epilepsy dynamic bayesian EEG model
Example (cont..)
● Update the probability ( T
and A
) tables with new values
● This implies increasing the popularity of topic in author and word in topic
Sample Output (CORA dataset)Topicprobabilities
Topics shown with 10 most probable words
Author :
Experimental Results
● Experiment is done on two types of dataset, papers are chosen from NIPS and
CiteSeer database
● Extremely common words are removed from the corpus, leading V=13,649
unique words in NIPS and V=30,799 in CiteSeer, 2037 authors in NIPS and
85,465 in CiteSeer
● NIPS contains many documents which are closely related to learning theory
and machine learning
● CiteSeer contains variety of topics from User Interface to Solar astrophysics
Examples of topic author distribution
Importance ofTopicprobabilityofarandomauthor
Evaluating the predictive power
● Predictive power of the model is measured quantitatively using perplexity.
● Perplexity is ability to predict the words on a new unseen documents
● Perplexity of set of d words in a document (wd
,ad
) is
● When p(wd
|ad
) is high perplexity ≈ 1, otherwise it will be a high positive
quantity.
Continued….
● Approximate equation to calculate the joint probability
Continued….
Applications
● It can be used in automatic reviewer recommendation for a paper review.
● Given an abstract of a paper and a list of the authors plus their known past
collaborators, generate a list of other highly likely authors for this abstract who
might serve as good reviewers.
References
● Rosen-Zvi, Michal, et al. "The author-topic model for authors and documents."
Proceedings of the 20th conference on Uncertainty in artificial intelligence.
AUAI Press, 2004.
● D. M. Blei, A. Y. Ng, and M. I. Jordan (2003). “Latent Dirichlet Allocation”.
Journal of Machine Learning Research, 3 : 993–1022.
● T. L. Griffiths, and M. Steyvers (2004). “Finding scientific topics”. Proceedings
of the National Academy of Sciences, 101 (suppl. 1), 5228–5235.
● https://github.com/arongdari/python-topic-model
● https://www.coursera.org/learn/ml-clustering-and-retrieval/home/welcome
Questions?

More Related Content

What's hot

балархай эгшиг
балархай эгшигбалархай эгшиг
балархай эгшигariunch
 
1 5.パラメトリックブートストラップ検定と確率分布
1 5.パラメトリックブートストラップ検定と確率分布1 5.パラメトリックブートストラップ検定と確率分布
1 5.パラメトリックブートストラップ検定と確率分布logics-of-blue
 
Hylbar shugaman programmuud хичээл 4
Hylbar shugaman programmuud хичээл 4Hylbar shugaman programmuud хичээл 4
Hylbar shugaman programmuud хичээл 4Urantuya Purevtseren
 
有意性と効果量について しっかり考えてみよう
有意性と効果量について しっかり考えてみよう有意性と効果量について しっかり考えてみよう
有意性と効果量について しっかり考えてみようKen Urano
 
мат бие даалт ньютон лейбницийн томъёо
мат бие даалт ньютон лейбницийн томъёомат бие даалт ньютон лейбницийн томъёо
мат бие даалт ньютон лейбницийн томъёоNBDNKWS Bujee Davaa
 
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)Shirou Maruyama
 
G社のNMT論文を読んでみた
G社のNMT論文を読んでみたG社のNMT論文を読んでみた
G社のNMT論文を読んでみたToshiaki Nakazawa
 
ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」Akifumi Eguchi
 
lecture15.2022- 2023.pdf
lecture15.2022- 2023.pdflecture15.2022- 2023.pdf
lecture15.2022- 2023.pdfssuserca5598
 
外国語教育研究における尺度の構成と妥当性検証
外国語教育研究における尺度の構成と妥当性検証外国語教育研究における尺度の構成と妥当性検証
外国語教育研究における尺度の構成と妥当性検証Yusaku Kawaguchi
 
トピックモデル勉強会: 第2章 Latent Dirichlet Allocation
トピックモデル勉強会: 第2章 Latent Dirichlet Allocationトピックモデル勉強会: 第2章 Latent Dirichlet Allocation
トピックモデル勉強会: 第2章 Latent Dirichlet AllocationHaruka Ozaki
 
AtCoder Beginner Contest 012 解説
AtCoder Beginner Contest 012 解説AtCoder Beginner Contest 012 解説
AtCoder Beginner Contest 012 解説AtCoder Inc.
 
非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2nd非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2ndMika Yoshimura
 

What's hot (20)

балархай эгшиг
балархай эгшигбалархай эгшиг
балархай эгшиг
 
1 5.パラメトリックブートストラップ検定と確率分布
1 5.パラメトリックブートストラップ検定と確率分布1 5.パラメトリックブートストラップ検定と確率分布
1 5.パラメトリックブートストラップ検定と確率分布
 
ARC#003D
ARC#003DARC#003D
ARC#003D
 
Hylbar shugaman programmuud хичээл 4
Hylbar shugaman programmuud хичээл 4Hylbar shugaman programmuud хичээл 4
Hylbar shugaman programmuud хичээл 4
 
有意性と効果量について しっかり考えてみよう
有意性と効果量について しっかり考えてみよう有意性と効果量について しっかり考えてみよう
有意性と効果量について しっかり考えてみよう
 
мат анализ 1
мат анализ 1мат анализ 1
мат анализ 1
 
мат бие даалт ньютон лейбницийн томъёо
мат бие даалт ньютон лейбницийн томъёомат бие даалт ньютон лейбницийн томъёо
мат бие даалт ньютон лейбницийн томъёо
 
бодит тоо
бодит тоободит тоо
бодит тоо
 
Олигополь пүүсийн үнэ бүрдэлт
Олигополь пүүсийн үнэ бүрдэлтОлигополь пүүсийн үнэ бүрдэлт
Олигополь пүүсийн үнэ бүрдэлт
 
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
文法圧縮入門:超高速テキスト処理のためのデータ圧縮(NLP2014チュートリアル)
 
Makro l 3
Makro l 3Makro l 3
Makro l 3
 
леdf
леdfлеdf
леdf
 
G社のNMT論文を読んでみた
G社のNMT論文を読んでみたG社のNMT論文を読んでみた
G社のNMT論文を読んでみた
 
ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」ぞくパタ最終回: 13章「共クラスタリング」
ぞくパタ最終回: 13章「共クラスタリング」
 
lecture15.2022- 2023.pdf
lecture15.2022- 2023.pdflecture15.2022- 2023.pdf
lecture15.2022- 2023.pdf
 
外国語教育研究における尺度の構成と妥当性検証
外国語教育研究における尺度の構成と妥当性検証外国語教育研究における尺度の構成と妥当性検証
外国語教育研究における尺度の構成と妥当性検証
 
トピックモデル勉強会: 第2章 Latent Dirichlet Allocation
トピックモデル勉強会: 第2章 Latent Dirichlet Allocationトピックモデル勉強会: 第2章 Latent Dirichlet Allocation
トピックモデル勉強会: 第2章 Latent Dirichlet Allocation
 
Cost analysis
Cost analysisCost analysis
Cost analysis
 
AtCoder Beginner Contest 012 解説
AtCoder Beginner Contest 012 解説AtCoder Beginner Contest 012 解説
AtCoder Beginner Contest 012 解説
 
非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2nd非線形データの次元圧縮 150905 WACODE 2nd
非線形データの次元圧縮 150905 WACODE 2nd
 

Similar to Author Topic Model

TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxKalpit Desai
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)Sihan Chen
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentationSoojung Hong
 
Diversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesDiversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesBryan Gummibearehausen
 
Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008Roman Stanchak
 
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...Aaron Li
 
Author paper identification problem final presentation
Author  paper identification problem final presentationAuthor  paper identification problem final presentation
Author paper identification problem final presentationPooja Mishra
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015rusbase
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingSujit Pal
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsClaudia Wagner
 
A Neural Probabilistic Language Model_v2
A Neural Probabilistic Language Model_v2A Neural Probabilistic Language Model_v2
A Neural Probabilistic Language Model_v2Jisoo Jang
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGcscpconf
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modellingcsandit
 

Similar to Author Topic Model (20)

Topic modelling
Topic modellingTopic modelling
Topic modelling
 
TopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptxTopicModels_BleiPaper_Summary.pptx
TopicModels_BleiPaper_Summary.pptx
 
graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Topic Models
Topic ModelsTopic Models
Topic Models
 
Diversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesDiversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News Stories
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008Survey of Generative Clustering Models 2008
Survey of Generative Clustering Models 2008
 
话题模型2
话题模型2话题模型2
话题模型2
 
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...
KDD 2014 Presentation (Best Research Paper Award): Alias Topic Modelling (Red...
 
Author paper identification problem final presentation
Author  paper identification problem final presentationAuthor  paper identification problem final presentation
Author paper identification problem final presentation
 
Lec1
Lec1Lec1
Lec1
 
Spatial LDA
Spatial LDASpatial LDA
Spatial LDA
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
 
Graph Techniques for Natural Language Processing
Graph Techniques for Natural Language ProcessingGraph Techniques for Natural Language Processing
Graph Techniques for Natural Language Processing
 
Probabilistic content models,
Probabilistic content models,Probabilistic content models,
Probabilistic content models,
 
Topic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic ModelsTopic Models - LDA and Correlated Topic Models
Topic Models - LDA and Correlated Topic Models
 
A Neural Probabilistic Language Model_v2
A Neural Probabilistic Language Model_v2A Neural Probabilistic Language Model_v2
A Neural Probabilistic Language Model_v2
 
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLINGA TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
A TEXT MINING RESEARCH BASED ON LDA TOPIC MODELLING
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
 

Recently uploaded

AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101vincent683379
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024Stephanie Beckett
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXUXDXConf
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyJohn Staveley
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 

Recently uploaded (20)

AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Transforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UXTransforming The New York Times: Empowering Evolution through UX
Transforming The New York Times: Empowering Evolution through UX
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 

Author Topic Model

  • 1. The Author-Topic Model Presented by: Darshan Ramakant Bhat Freeze Francis Mohammed Haroon D by M Rosen-Zvi, et al. (UAI 2004)
  • 2. Overview ● Motivation ● Model Formulation ○ Topic Model ( i.e LDA) ○ Author Model ○ Author Topic Model ● Parameter Estimation ○ Gibbs Sampling Algorithms ● Results and Evaluation ● Applications
  • 3. Motivation ● joint author-topic modeling had received little attention. ● author modeling has tended to focus on problem of predicting an author given the document. ● modelling the interests of authors is a fundamental problem raised by a large corpus (eg: NIPS dataset). ● this paper introduced a generative model that simultaneously models the content of documents and the interests of authors.
  • 4. Notations ● Vocabulary set size : V ● Total unique authors : A ● Document d is identified as (wd , ad ) ● Nd : number of words in Document d. ● wd : Vector of Nd words (subset of vocabulary) wd = [w1d w2d …….. wNd d ] ● wid : ith word in document d ● Ad : number of authors of Document d ● ad : Vector of Ad authors (subset of authors) ad = [a1d a2d …….. aAd d ] ● Corpus : Collection D documents = { (w1 , a1 ) , (w2 , a2 ) ……. (wD , aD ) }
  • 5. Plate Notation for Graphical models ● Nodes are random variables ● Edges denote possible dependence ● Observed variables are shaded ● Plates denote replicated structure
  • 6. Dirichlet distribution ● generalization of beta distribution into multiple dimensions ● Domain : X = (x1 , x2 ….. xk ) ○ xi ∈ (0 , 1) ○ k xi = 1 ○ k-dimensional multinomial distributions ● "distribution over multinomial distributions" ● Parameter : = ( 1 , 2 …… k ) ○ i > 0 , ∀i ○ determines how the probability mass is distributed ○ Symmetric : 1 = 2 =…… k ● Each sample from a dirichlet is a multinomial distribution. ○ ↑ : denser samples ○ ↓ : sparse samples Topic 2 (0,1,0) Topic 1 (1,0,0) Topic 3 (0,0,1) . (⅓,⅓,⅓ ) Fig : Domain of Dirichlet (k=3)
  • 7. Fig : Symmetric Dirichlet distributions for k=3 pdf samples
  • 8. Model Formulation Topic Model (LDA) Model topics as distribution over words Model documents as distribution over topics Author Model Model author as distribution over words Author-Topic Model Probabilistic model for both author and topics Model topics as distribution over words Model authors as distribution over topics
  • 9. Topic Model : LDA ● Generative model ● Bag of word ● Mixed membership ● Each word in a document has a topic assignment. ● Each document is associated with a distribution over topics. ● Each topic is associated with a distribution over words. ● α : Dirichlet prior parameter on the per-document topic distributions. ● β : Dirichlet prior parameter on the per-topic word distribution. ● w : observed word in document ● z : is the topic assigned for w Fig : LDA model
  • 10. Author Model ● Simple generative model ● Each author is associated with a distribution over words. ● Each word has a author assignment. ● β : Dirichlet prior parameter on the per-author word distributions. ● w : Observed word ● x : is the author chosen for word w Fig : Author Model
  • 11. Model Formulation Topic Model (LDA) Model topics as distribution over words Model documents as distribution over topics Author Model Model author as distribution over words Author-Topic Model Probabilistic model for both author and topics Model topics as distribution over words Model authors as distribution over topics
  • 12. Author-Topic Model ● Each topic is associated with a distribution over words. ● Each author is associated with a distribution over topics. ● Each word in a document has an author and a topic assignment. ● Topic distribution of document is a mixture of its authors topic distribution ● α : Dirichlet prior parameter on the per-author topic distributions. ● x : author chosen for word w ● z : is the topic assigned for w from author x topic distribution ● w : observed word in document Fig : Author Topic Model plate notation
  • 14. Special Cases of ATM Topic Model (LDA): Each document is written by a unique author ● the document determines the author. ● author assignment becomes trivial. ● author’s topic distribution can be viewed as document’s topic distribution.
  • 15. Special Cases of ATM Author Model: each author writes about a unique topic ● the author determines the topic. ● topic assignment becomes trivial. ● topic’s word distribution can be viewed as author’s word distribution.
  • 16. Variables to be inferred
  • 17. Variables to be inferred
  • 18. Variables to be inferred
  • 19. Collapsed Gibbs Sampling - LDA ● The assignment variables contain information about and . ● Use this information to directly update the Topic assignment variable with the next sample. This is Collapsed Gibbs Sampling.
  • 20. Collapsed Gibbs Sampling - Author Model ● Need to infer Author - word distribution A ● WIth Collapsing GIbbs sampling, directly infer author word assignments
  • 21. Author Topic Model - Collapsed Gibbs Sampling ● Need to infer T and ϴA ● Instead through collapsed Gibbs Sampling directly infer Topic and Author assignments ● By finding the author - topic joint distribution conditioned on all other variables
  • 22. Example AUTHOR 1 2 2 1 1 TOPIC 3 2 1 3 1 WORD epilepsy dynamic bayesian EEG model TOPIC-1 TOPIC-2 TOPIC-3 epilepsy 1 0 35 dynamic 18 20 1 bayesian 42 15 0 EEG 0 5 20 model 10 8 1 ... TOPIC-1 TOPIC-2 TOPIC-3 AUTHOR-1 20 14 2 AUTHOR-2 20 21 5 AUTHOR-3 11 8 15 AUTHOR-4 5 15 14 ...
  • 23. Example AUTHOR 1 2 ? 1 1 TOPIC 3 2 ? 3 1 WORD epilepsy dynamic bayesian EEG model TOPIC-1 TOPIC-2 TOPIC-3 epilepsy 1 0 35 dynamic 18 20 1 bayesian 42(41) 15 0 EEG 0 5 20 model 10 8 1 ... TOPIC-1 TOPIC-2 TOPIC-3 AUTHOR-1 20 14 2 AUTHOR-2 20(19) 21 5 AUTHOR-3 11 8 15 AUTHOR-4 5 15 14 ...
  • 24. Author-Topic How much topic ‘j’ likes the word ‘bayesian’ How much author ‘k’ likes the topic A1T1 41/97 20/36 A2T1 41/97 19/45 A1T2 15/105 14/36 A2T2 ... ... A2T3 ... ... A2T3 ... ...
  • 25. ● To draw new Author-Topic assignment (equivalently) ○ Roll K X ad sided die with these probabilities ○ Assign the Author-Topic tuple to the word AUTHOR 1 2 1 1 1 TOPIC 3 2 3 3 1 WORD epilepsy dynamic bayesian EEG model
  • 26. Example (cont..) ● Update the probability ( T and A ) tables with new values ● This implies increasing the popularity of topic in author and word in topic
  • 27. Sample Output (CORA dataset)Topicprobabilities Topics shown with 10 most probable words Author :
  • 28. Experimental Results ● Experiment is done on two types of dataset, papers are chosen from NIPS and CiteSeer database ● Extremely common words are removed from the corpus, leading V=13,649 unique words in NIPS and V=30,799 in CiteSeer, 2037 authors in NIPS and 85,465 in CiteSeer ● NIPS contains many documents which are closely related to learning theory and machine learning ● CiteSeer contains variety of topics from User Interface to Solar astrophysics
  • 29. Examples of topic author distribution
  • 31. Evaluating the predictive power ● Predictive power of the model is measured quantitatively using perplexity. ● Perplexity is ability to predict the words on a new unseen documents ● Perplexity of set of d words in a document (wd ,ad ) is ● When p(wd |ad ) is high perplexity ≈ 1, otherwise it will be a high positive quantity.
  • 32. Continued…. ● Approximate equation to calculate the joint probability
  • 34. Applications ● It can be used in automatic reviewer recommendation for a paper review. ● Given an abstract of a paper and a list of the authors plus their known past collaborators, generate a list of other highly likely authors for this abstract who might serve as good reviewers.
  • 35. References ● Rosen-Zvi, Michal, et al. "The author-topic model for authors and documents." Proceedings of the 20th conference on Uncertainty in artificial intelligence. AUAI Press, 2004. ● D. M. Blei, A. Y. Ng, and M. I. Jordan (2003). “Latent Dirichlet Allocation”. Journal of Machine Learning Research, 3 : 993–1022. ● T. L. Griffiths, and M. Steyvers (2004). “Finding scientific topics”. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228–5235. ● https://github.com/arongdari/python-topic-model ● https://www.coursera.org/learn/ml-clustering-and-retrieval/home/welcome