Practical Collapsed Stochastic Variational Inference

Arnim Bleier, arnim.bleier@gesis.org
CSS Workshop, GESIS, Köln, 16.12.2013
Classic Collapsed LDA Inference
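repeat
  for each document d do
    for each word i in d do
      n_{dk} = n_{dk} - z_{dik}
      n_{k w_{di}} = n_{k w_{di}} - z_{dik}
      z_{dik} \propto (n_{dk} + \alpha \pi_k) \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V \beta}
      n_{dk} = n_{dk} + z_{dik}
      n_{k w_{di}} = n_{k w_{di}} + z_{dik}
    end for
  end for
until stopping criterion is met.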
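The loop above can be sketched in plain Python. This is a minimal illustration, not the author's implementation: the names cvb0_sweep, init_state and alpha_pi are hypothetical, a symmetric smoothing constant beta is assumed, and the three-document corpus is a toy example.

```python
import random


def init_state(docs, K, V, seed=0):
    """Random soft assignments z and the expected counts they imply (toy setup)."""
    rng = random.Random(seed)
    z = []
    n_dk = [[0.0] * K for _ in docs]
    n_kw = [[0.0] * V for _ in range(K)]
    n_k = [0.0] * K
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            r = [rng.random() + 1e-3 for _ in range(K)]
            s = sum(r)
            r = [x / s for x in r]
            z[d].append(r)
            for k in range(K):
                n_dk[d][k] += r[k]
                n_kw[k][w] += r[k]
                n_k[k] += r[k]
    return z, n_dk, n_kw, n_k


def cvb0_sweep(docs, z, n_dk, n_kw, n_k, alpha_pi, beta, V):
    """One pass over all tokens; every argument except docs is updated in place."""
    K = len(n_k)
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            old = z[d][i]
            # Remove this token's current responsibilities from the counts.
            for k in range(K):
                n_dk[d][k] -= old[k]
                n_kw[k][w] -= old[k]
                n_k[k] -= old[k]
            # z_dik ∝ (n_dk + alpha*pi_k) * (n_kw + beta) / (n_k. + V*beta)
            new = [
                (n_dk[d][k] + alpha_pi[k]) * (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                for k in range(K)
            ]
            total = sum(new)
            new = [x / total for x in new]
            # Add the refreshed responsibilities back.
            for k in range(K):
                n_dk[d][k] += new[k]
                n_kw[k][w] += new[k]
                n_k[k] += new[k]
            z[d][i] = new


# Toy corpus: word ids in 0..V-1 (illustrative values only).
docs = [[0, 1, 0, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
K, V = 2, 4
z, n_dk, n_kw, n_k = init_state(docs, K, V)
for _ in range(20):  # a fixed sweep count stands in for the stopping criterion
    cvb0_sweep(docs, z, n_dk, n_kw, n_k, alpha_pi=[0.5, 0.5], beta=0.1, V=V)
```

Because each token is removed before its responsibilities are recomputed and added back, the total expected count stays equal to the number of tokens throughout.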
Stochastic Practical Collapsed Variational Inference
for the HDP
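repeat
  for each document d do
    for each word i in d do
      n_{dk} = n_{dk} - z_{dik}
      n_{k w_{di}} = n_{k w_{di}} - z_{dik}
      z_{dik} \propto (n_{dk} + \alpha \pi_k) \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V \beta}
      n_{dk} = n_{dk} + z_{dik}
      n_{k w_{di}} = n_{k w_{di}} + z_{dik}
    end for
    u_k = 1 + \sum_{d} E[\mathbb{1}(n_{dk} \ge 1)]
    v_k = \gamma + \sum_{l=k+1,\,d}^{T} E[\mathbb{1}(n_{dl} \ge 1)]
    \pi_k = \bar{\pi}_k \prod_{l=1}^{k-1} (1 - \bar{\pi}_l) ; with \bar{\pi}_k = \frac{u_k}{u_k + v_k}
  end for
until stopping criterion is met.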
Sato, I., Kurihara, K., Nakagawa, H.: Practical Collapsed Variational Bayes Inference for Hierarchical Dirichlet Process. KDD (2012)
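The stick-weight step on this slide can be sketched as follows. The function names and the argument name gamma are hypothetical. The helper prob_topic_used assumes the token assignments of a document are independent under the variational distribution, which is one way to evaluate the expectation that topic k is used at least once; the slide itself does not say how that expectation is computed.

```python
def prob_topic_used(z_d, k):
    """E[1(n_dk >= 1)] for one document, assuming independent token assignments."""
    p_unused = 1.0
    for z_i in z_d:
        p_unused *= 1.0 - z_i[k]
    return 1.0 - p_unused


def batch_stick_params(p_active, gamma):
    """u_k and v_k from p_active[d][k], the per-document usage expectations."""
    T = len(p_active[0])
    col = [sum(row[k] for row in p_active) for k in range(T)]
    u = [1.0 + col[k] for k in range(T)]
    # v_k collects the usage of all topics ranked after k.
    v = [gamma + sum(col[k + 1:]) for k in range(T)]
    return u, v


def stick_weights(u, v):
    """pi_k = pibar_k * prod_{l<k} (1 - pibar_l), with pibar_k = u_k / (u_k + v_k)."""
    pi, remaining = [], 1.0
    for u_k, v_k in zip(u, v):
        pibar = u_k / (u_k + v_k)
        pi.append(pibar * remaining)
        remaining *= 1.0 - pibar
    return pi
```

With a finite truncation level T the weights need not sum to one; the leftover mass belongs to the topics beyond the truncation.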
Stochastic Practical Collapsed Variational Inference
for the HDP
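repeat
  for each document d do
    for each word i in d do
      n_{dk} = n_{dk} - z_{dik}
      n_{k w_{di}} = n_{k w_{di}} - z_{dik}
      z_{dik} \propto (n_{dk} + \alpha \pi_k) \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V \beta}
      n_{dk} \leftarrow (1 - \rho^d_t) n_{dk} + \rho^d_t N_d z_k ; replaces n_{dk} = n_{dk} + z_{dik}
      n_{kw} \leftarrow (1 - \rho^c_t) n_{kw} + \rho^c_t N z_k [w_{di} = w] ; replaces n_{k w_{di}} = n_{k w_{di}} + z_{dik}
    end for
    u_k \leftarrow (1 - \rho^h_t) u_k + \rho^h_t (1 + D E[\mathbb{1}(n_{dk} \ge 1)]) ; replaces u_k = 1 + \sum_{d} E[\mathbb{1}(n_{dk} \ge 1)]
    v_k \leftarrow (1 - \rho^h_t) v_k + \rho^h_t (\gamma + D \sum_{l=k+1}^{T} E[\mathbb{1}(n_{dl} \ge 1)]) ; replaces v_k = \gamma + \sum_{l=k+1,\,d}^{T} E[\mathbb{1}(n_{dl} \ge 1)]
    \pi_k = \bar{\pi}_k \prod_{l=1}^{k-1} (1 - \bar{\pi}_l) ; with \bar{\pi}_k = \frac{u_k}{u_k + v_k}
  end for
until stopping criterion is met.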
Bleier, A.: Practical Collapsed Stochastic Variational Inference for the HDP. NIPS Topic Models Workshop (2013)
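The stochastic stick-parameter step can be sketched as a per-document blend. This is an illustrative reading of the slide, not the published code: stochastic_stick_update, p_active_d (the current document's expectations that each topic is used at least once), gamma and rho_h are hypothetical names.

```python
def stochastic_stick_update(u, v, p_active_d, D, gamma, rho_h):
    """Blend u and v (in place) toward the estimate implied by one document.

    The single document's contribution is scaled by D, the number of documents,
    so that it stands in for the sum over the whole corpus.
    """
    T = len(u)
    for k in range(T):
        tail = sum(p_active_d[k + 1:])  # usage of topics ranked after k
        u[k] = (1.0 - rho_h) * u[k] + rho_h * (1.0 + D * p_active_d[k])
        v[k] = (1.0 - rho_h) * v[k] + rho_h * (gamma + D * tail)


# Illustrative values only.
u, v = [1.0, 1.0], [1.0, 1.0]
stochastic_stick_update(u, v, p_active_d=[1.0, 0.0], D=10, gamma=1.0, rho_h=0.5)
```

With rho_h = 1 the update discards the old values entirely, and with rho_h = 0 it leaves them unchanged; a decaying step size moves between these extremes over time.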
Stochastic Variational Inference for LDA
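repeat
  for each document d do
    for each word i in d do
      n_{dk} = n_{dk} - z_{dik}
      n_{k w_{di}} = n_{k w_{di}} - z_{dik}
      z_{dik} \propto (n_{dk} + \alpha \pi_k) \frac{n_{k w_{di}} + \beta}{n_{k\cdot} + V \beta}
      n_{dk} \leftarrow (1 - \rho^d_t) n_{dk} + \rho^d_t N_d z_k ; replaces n_{dk} = n_{dk} + z_{dik}
      n_{kw} \leftarrow (1 - \rho^c_t) n_{kw} + \rho^c_t N z_k [w_{di} = w] ; replaces n_{k w_{di}} = n_{k w_{di}} + z_{dik}
    end for
  end for
until stopping criterion is met.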

Foulds, J., Boyles, L., Smyth, P., Welling, M.: Stochastic Collapsed Variational Bayesian Inference for LDA. KDD (2013)
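The two stochastic count updates can be sketched for a single token as below. The sketch covers only the blending step and takes the responsibilities z as given; blend_counts and its argument names are hypothetical, N_d is assumed to be the length of the current document and N the number of tokens in the corpus.

```python
def blend_counts(n_dk_d, n_kw, z, w, N_d, N, rho_d, rho_c):
    """Move the document and corpus counts (in place) toward one token's estimate.

    The token's responsibilities z are scaled up as if every token in the
    document (N_d) or in the corpus (N) looked like this one.
    """
    K = len(z)
    V = len(n_kw[0])
    for k in range(K):
        n_dk_d[k] = (1.0 - rho_d) * n_dk_d[k] + rho_d * N_d * z[k]
        for x in range(V):
            # The indicator [w_di = w] selects the column of the observed word.
            target = N * z[k] if x == w else 0.0
            n_kw[k][x] = (1.0 - rho_c) * n_kw[k][x] + rho_c * target


# Illustrative values only: 2 topics, 2 word types, a 4-token document and corpus.
n_dk_d = [2.0, 2.0]
n_kw = [[1.0, 1.0], [1.0, 1.0]]
blend_counts(n_dk_d, n_kw, z=[1.0, 0.0], w=0, N_d=4, N=4, rho_d=0.5, rho_c=0.5)
```

Since z sums to one, a blend that starts from counts totalling N_d (or N) keeps that total, so the counts stay on the scale of the data.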
Evaluation

Predictive performance of the algorithm on the Associated Press (TREC-1) data. For the evaluation, we compared perplexity against the number of documents seen for PCSVB0, SCVB0 and PCVB0. We trained the model on 80% of the documents. Each held-out document was split: 70% of its tokens were used to estimate the document parameters, and the remaining 30% were used to compute the perplexity.
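The held-out protocol can be sketched as follows. The slide does not say how tokens were assigned to the two parts, so the random shuffle here is an assumption, as are the names split_tokens and perplexity. The perplexity function uses the usual definition, the exponential of the negative mean log predictive probability, and expects the model's probability for each evaluation token.

```python
import math
import random


def split_tokens(doc, frac=0.7, seed=0):
    """Split one held-out document into an estimation part and an evaluation part."""
    rng = random.Random(seed)
    idx = list(range(len(doc)))
    rng.shuffle(idx)  # assumption: tokens are assigned to the parts at random
    cut = int(round(frac * len(doc)))
    estimation = [doc[i] for i in idx[:cut]]
    evaluation = [doc[i] for i in idx[cut:]]
    return estimation, evaluation


def perplexity(token_probs):
    """exp of the negative average log probability over the evaluation tokens."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


estimation, evaluation = split_tokens(list(range(10)))
uniform_ppl = perplexity([0.25] * 8)  # a uniform guess over 4 words
```

A model that assigns every evaluation token probability 1/4 has perplexity 4, which gives a quick sanity check on the scale of reported values.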
Step-size schedule: \rho_t = \frac{s}{(\tau + t)^{0.9}}
Configuration:
corpus-wide schedule \rho^c_t: s = 10, \tau = 1000
document schedule \rho^d_t: s = 1, \tau = 10
hyperparameter & stick weights \rho^h_t: s = 5, \tau = 100
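The schedule and the three configurations can be written down directly. The function name step_size and the dictionary keys are hypothetical; the values of s and tau and the exponent 0.9 are those listed on the slide.

```python
def step_size(t, s, tau, kappa=0.9):
    """rho_t = s / (tau + t) ** kappa, decreasing in the update counter t."""
    return s / (tau + t) ** kappa


# (s, tau) pairs from the configuration above.
SCHEDULES = {
    "corpus": (10, 1000),
    "document": (1, 10),
    "hyper": (5, 100),
}

# Step sizes at the first few updates for each schedule.
first_steps = {
    name: [step_size(t, s, tau) for t in range(3)]
    for name, (s, tau) in SCHEDULES.items()
}
```

A larger tau damps the early steps, while s sets the overall scale, so the corpus-wide counts change slowly per token and the document counts adapt quickly.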
