This is an introduction to topic modeling, covering tf-idf, LSA, pLSA, LDA, EM, and related material. There are surely some mistakes; please correct them with your wisdom. Thank you~
2. Outline
• Basic Concepts
• Application and Background
• Famous Researchers
• Language Model
• Vector Space Model (VSM)
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Latent Semantic Analysis/Indexing (LSA/LSI)
• Probabilistic Latent Semantic Indexing (pLSA/pLSI)
• Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)
3. Outline
• Latent Dirichlet Allocation (LDA)
• Conjugate Prior
• Poisson Distribution
• Variational Distribution and Variational Inference (VD & VI)
• Markov Chain Monte Carlo (MCMC)
• Metropolis-Hastings Sampling (MH)
• Gibbs Sampling and GS for LDA
• Bayesian Theory vs. Probability Theory
4. Concepts
• Latent Semantic Analysis
• Topic Model
• Text Mining
• Natural Language Processing
• Computational Linguistics
• Information Retrieval
• Dimension Reduction
• Expectation-Maximization (EM)
[Diagram: LSA/Topic Model (a latent factor model) sits at the intersection of Information Retrieval, Computational Linguistics, Natural Language Processing, Text Mining, Data Mining, Machine Learning (Dimension Reduction, EM), and Machine Translation.]

Aim: find the topic that a word or a document belongs to.
5. Application
• LFM (Latent Factor Model) has become a fundamental technique in modern search engines, recommender systems, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc.
• Search Engine
  - PageRank: how important is this web page?
  - LFM: how relevant is this web page?
  - LFM: how relevant is the user's query to one document?
• Recommender System
  - Opinion Extraction
  - Spam Detection
  - Tag Extraction
• Text Summarization
  - Abstract Generation
  - Twitter Topic Mining
Example text: "Steven Jobs has left us for about two years... Apple's price will fall down..."
6. Famous Researchers
• David Blei, Princeton, LDA
• ChengXiang Zhai, UIUC, Presidential Early Career Award
• W. Bruce Croft, UMass, Language Model
• Bing Liu, UIC, Opinion Mining
• John D. Lafferty, CMU, CRF & IBM
• Thomas Hofmann, Brown, pLSA
• Andrew McCallum, UMass, CRF & IBM
• Susan Dumais, Microsoft, LSI
7. Language Model
• Unigram Language Model == Zero-order Markov Chain
• Bigram Language Model == First-order Markov Chain
• N-gram Language Model == (N-1)-order Markov Chain
• Mixture-unigram Language Model
Unigram: $p(\mathbf{w} \mid M) = \prod_{w_i \in s} p(w_i \mid M)$

Bag of Words (BoW): no order, no grammar, only multiplicity.

Bigram: $p(\mathbf{w} \mid M) = \prod_{w_i \in s} p(w_i \mid w_{i-1}, M)$

Mixture of unigrams: $p(\mathbf{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n \mid z)$

[Plate diagrams: the unigram model repeats w (N words, M documents); the mixture-unigram model adds one topic node z per document.]
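A minimal sketch of unigram and bigram MLE on a hypothetical toy corpus (the corpus, tokenization, and boundary handling are all assumptions, not from the slides):

```python
from collections import Counter

# Hypothetical toy corpus; each document is a list of tokens.
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]

tokens = [w for d in docs for w in d]
unigram = Counter(tokens)                   # c(w)
bigram = Counter(zip(tokens, tokens[1:]))   # c(w_{i-1}, w_i); doc boundaries ignored here
N = len(tokens)

def p_unigram(w):
    """MLE unigram probability p(w|M) = c(w)/N."""
    return unigram[w] / N

def p_bigram(prev, w):
    """MLE bigram probability p(w|prev, M) = c(prev, w)/c(prev)."""
    return bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0

# Unigram (BoW) sentence probability: a product over words, order ignored.
s = ["the", "cat", "sat"]
p = 1.0
for w in s:
    p *= p_unigram(w)
print(p)
```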
8. Vector Space Model
• A document is represented as a vector of identifiers
• Identifier
  - Boolean: 0, 1
  - Term Count: how many times the term occurs
  - Term Frequency: how frequent the term is in this document
  - TF-IDF: how important the term is in the corpus (the most used)
• Relevance Ranking
• First used in SMART (Gerard Salton, Cornell; the Gerard Salton Award, SIGIR)
$d_j = (w_{1,j}, w_{2,j}, \dots, w_{t,j})$

$q = (w_{1,q}, w_{2,q}, \dots, w_{t,q})$

$\cos\theta = \dfrac{d_j \cdot q}{\|d_j\|\,\|q\|}$
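A quick sketch of the cosine relevance score on hypothetical count vectors:

```python
import numpy as np

# Hypothetical term vectors for document d_j and query q over t = 4 terms.
d_j = np.array([2.0, 0.0, 1.0, 3.0])
q   = np.array([1.0, 0.0, 0.0, 2.0])

# cos(theta) = (d_j . q) / (||d_j|| ||q||)
cos_theta = d_j @ q / (np.linalg.norm(d_j) * np.linalg.norm(q))
print(cos_theta)  # in [0, 1] for non-negative weights; higher means more relevant
```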
9. TF-IDF
• Mixture language model
  - A linear combination of certain distributions (e.g., Gaussian)
  - Better performance
• TF: Term Frequency
• IDF: Inverse Document Frequency
• TF-IDF
$tf_{ij} = \dfrac{n_{ij}}{\sum_k n_{kj}}$ (term $i$, document $j$; $n_{ij}$ is the count of $i$ in $j$)

$idf_i = \log\dfrac{N}{1 + |\{d \in D : t_i \in d\}|}$ ($N$ documents in the corpus)

$tfidf(t_i, d_j, D) = tf_{ij} \times idf_i$

TF: how important the term is in this document. IDF: how important (discriminative) the term is in this corpus.
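A small numpy sketch of these formulas on a hypothetical term-document count matrix (the smoothing follows the slide's idf with 1 + document frequency):

```python
import numpy as np

# Hypothetical counts: rows = terms, columns = documents; n[i, j] = count of term i in doc j.
n = np.array([[3, 0, 1],
              [0, 2, 0],
              [1, 1, 1]], dtype=float)
N_docs = n.shape[1]

tf = n / n.sum(axis=0, keepdims=True)   # tf_ij = n_ij / sum_k n_kj
df = (n > 0).sum(axis=1)                # |{d in D : t_i in d}|
idf = np.log(N_docs / (1.0 + df))       # slide's smoothed idf; can go negative for very common terms
tfidf = tf * idf[:, None]               # tfidf_ij = tf_ij * idf_i
print(tfidf)
```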
10. Latent Semantic Indexing
• Challenge (defects of VSM)
  - Comparing documents in the same concept space
  - Comparing documents across languages
  - Synonymy, e.g., buy - purchase, user - consumer
  - Polysemy, e.g., book (to reserve vs. a printed work), draw (to sketch vs. to pull)
• Key Idea
  - Dimensionality reduction of the word-document co-occurrence matrix
  - Construction of a latent semantic space
[Diagram: VSM maps word directly to document; LSI inserts a concept layer (also called aspect, topic, or latent factor) between word and document.]
11. Singular Value Decomposition
• LSI ~= SVD
  - $U$, $V$: orthogonal matrices
  - $\Sigma$: the diagonal matrix of the singular values of $N$
$N = U \Sigma V^T$

$N$: the $t \times d$ term-document matrix (entries: count, frequency, or TF-IDF); $U$: $t \times m$; $\Sigma$: $m \times m$; $V^T$: $m \times d$ (terms $\times$ documents).

Truncation keeps only the $k$ largest singular values, $k < m$ or $k \ll m$:

$N \approx U_k \Sigma_k V_k^T$, with $U_k$: $t \times k$, $\Sigma_k$: $k \times k$, $V_k^T$: $k \times d$.

(Words are treated as exchangeable.)
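A sketch of truncated SVD on a hypothetical term-document matrix, assuming numpy's SVD as the decomposition backend:

```python
import numpy as np

# Hypothetical t x d term-document matrix (counts, frequencies, or tf-idf).
N = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [2.0, 0.0, 1.0, 0.0],
              [0.0, 2.0, 0.0, 1.0]])

U, s, Vt = np.linalg.svd(N, full_matrices=False)

k = 2                                           # keep the k largest singular values, k << m
N_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # rank-k approximation: filled in, denoised

doc_vectors = np.diag(s[:k]) @ Vt[:k, :]        # documents in the k-dim latent space
print(np.round(N_k, 2))
print(doc_vectors.T)                            # one k-dim row per document
```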
12. Singular Value Decomposition
• The $k$ largest singular values
  - Distinguish the variance between words and documents to the greatest extent
• Discarding the lowest dimensions
  - Reduces noise
• Filling the matrix
  - Prediction & lower computational complexity
  - Enlarges the distinctiveness
• Decomposition
  - Concept, semantic, topic (aspect)
Related: (probabilistic) matrix factorization / factorization models; SVD has an analytic solution; all of this is unsupervised learning.
13. Probabilistic Latent Semantic Indexing
• pLSI Model
[Diagram: pLSI as a chain over documents $d_1 \dots d_M$, topics $z_1 \dots z_K$, and words $w_1 \dots w_N$, with probabilities $p(d)$, $p(z \mid d)$, $p(w \mid z)$ on the arcs.]
• Assumptions
  - Pairs (d, w) are assumed to be generated independently
  - Conditioned on z, w is generated independently of d
  - Words in a document are exchangeable
  - Documents are exchangeable
  - Latent topics z are independent
• Generative Process/Model

$p(d, w) = p(d)\,p(w \mid d) = p(d) \sum_{z \in Z} p(w, z \mid d) = p(d) \sum_{z \in Z} p(z \mid d)\,p(w \mid z)$

Both $p(z \mid d)$ (local, per document) and $p(w \mid z)$ (global) are multinomial distributions. (Loosely, the topic layer resembles one layer of a "deep neural network".)
14. Probabilistic Latent Semantic Indexing
Formulation 1 (asymmetric; plates: $d \to z \to w$, $N$ words, $M$ documents):

$p(w \mid d) = \sum_{z \in Z} p(z \mid d)\,p(w \mid z)$

Formulation 2 (symmetric; $z$ generates both $d$ and $w$):

$p(w, d) = \sum_{z \in Z} p(w, d, z) = \sum_{z \in Z} p(w \mid d, z)\,p(d, z) = \sum_{z \in Z} p(w \mid z)\,p(d \mid z)\,p(z)$

These are two ways to formulate pLSA; they are equivalent under Bayes' rule but lead to two different inference processes. Both are probabilistic graphical models, directed acyclic graphs (DAG); d is exchangeable.
15. Expectation-Maximization
• EM is a general algorithm for maximum-likelihood estimation (MLE) where the data are "incomplete" or contain latent variables: pLSA, GMM, HMM... (it cuts across domains)
• Deduction Process
  - $\theta$: the parameter to be estimated; $\theta^0$: initialized randomly; $\theta^n$: the current value; $\theta^{n+1}$: the next value
Objective: $\theta^{n+1} = \arg\max_{\theta}\,\bigl(L(\theta) - L(\theta^n)\bigr)$, where $L(\theta) = \log p(X \mid \theta)$ and the complete-data log-likelihood is $L_c(\theta) = \log p(X, H \mid \theta)$, with $H$ the latent variable.

$L_c(\theta) = \log p(X, H \mid \theta) = \log p(X \mid \theta) + \log p(H \mid X, \theta) = L(\theta) + \log p(H \mid X, \theta)$

$L(\theta) - L(\theta^n) = L_c(\theta) - L_c(\theta^n) + \log \dfrac{p(H \mid X, \theta^n)}{p(H \mid X, \theta)}$
16. Expectation-Maximization
Taking the expectation over $p(H \mid X, \theta^n)$:

$L(\theta) - L(\theta^n) = \sum_H p(H \mid X, \theta^n) L_c(\theta) - \sum_H p(H \mid X, \theta^n) L_c(\theta^n) + \sum_H p(H \mid X, \theta^n) \log \dfrac{p(H \mid X, \theta^n)}{p(H \mid X, \theta)}$

The last term is a Kullback-Leibler divergence (relative entropy), which is non-negative, so

$L(\theta) - L(\theta^n) \ge \sum_H p(H \mid X, \theta^n) L_c(\theta) - \sum_H p(H \mid X, \theta^n) L_c(\theta^n)$ (a lower bound).

Q-function: $Q(\theta; \theta^n) = E_{p(H \mid X, \theta^n)}[L_c(\theta)] = \sum_H L_c(\theta)\,p(H \mid X, \theta^n)$

E-step (expectation): compute $Q$; M-step (maximization): re-estimate $\theta$ by maximizing $Q$; iterate until convergence. How is EM used in pLSA?
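To make the E/M alternation concrete before turning to pLSA, a minimal sketch of EM for a two-component 1-D Gaussian mixture (unit variances assumed; the data and initialization are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 1-D data drawn from two Gaussian components.
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])      # theta^0: rough initialization
pi = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: posterior p(H|X, theta^n) -> responsibilities (the Q-function weights)
    dens = np.exp(-0.5 * (x[:, None] - mu) ** 2) * pi   # unit variance assumed
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate theta by maximizing Q
    mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    pi = r.mean(axis=0)

print(mu, pi)   # should approach (-2, 3) and (0.5, 0.5)
```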
17. EM in pLSA
$Q(\theta; \theta^n) = E_{p(H \mid X, \theta^n)}[L_c(\theta)] = \sum_{i=1}^{N} \sum_{j=1}^{M} n(w_i, d_j) \sum_{k=1}^{K} p(z_k \mid w_i, d_j) \log\bigl(p(w_i \mid z_k)\,p(z_k \mid d_j)\bigr)$

Here $p(z_k \mid w_i, d_j)$ is the posterior (random values at initialization), the log term comes from the likelihood function, and $n(w_i, d_j)$ is the count of word $w_i$ in document $d_j$ ($N$ words, $M$ documents, $K$ topics).

Constraints:
1. $\sum_{i=1}^{N} p(w_i \mid z_k) = 1$
2. $\sum_{k=1}^{K} p(z_k \mid d_j) = 1$

Lagrange multipliers:

$H = E[L_c] + \sum_{k=1}^{K} \tau_k \Bigl(1 - \sum_{i=1}^{N} p(w_i \mid z_k)\Bigr) + \sum_{j=1}^{M} \rho_j \Bigl(1 - \sum_{k=1}^{K} p(z_k \mid d_j)\Bigr)$

Setting the partial derivatives with respect to each independent variable to zero gives the M-step:

$p(w_i \mid z_k) = \dfrac{\sum_{j=1}^{M} n(w_i, d_j)\,p(z_k \mid w_i, d_j)}{\sum_{m=1}^{N} \sum_{j=1}^{M} n(w_m, d_j)\,p(z_k \mid w_m, d_j)}$

$p(z_k \mid d_j) = \dfrac{\sum_{i=1}^{N} n(w_i, d_j)\,p(z_k \mid w_i, d_j)}{n(d_j)}$

E-step (the $p(d_j)$ factors cancel, by the associative and distributive laws):

$p(z_k \mid w_i, d_j) = \dfrac{p(w_i \mid z_k)\,p(z_k \mid d_j)}{\sum_{l=1}^{K} p(w_i \mid z_l)\,p(z_l \mid d_j)}$
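A compact sketch of these pLSA updates on a hypothetical random count matrix; the dimensions, seed, and initialization are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical word-document count matrix n(w_i, d_j): N_w words x M docs, K topics.
n = rng.integers(0, 5, size=(6, 4)).astype(float)
N_w, M, K = n.shape[0], n.shape[1], 2

p_w_z = rng.dirichlet(np.ones(N_w), size=K).T    # p(w_i|z_k), columns sum to 1
p_z_d = rng.dirichlet(np.ones(K), size=M).T      # p(z_k|d_j), columns sum to 1

for _ in range(100):
    # E-step: p(z_k|w_i,d_j) proportional to p(w_i|z_k) p(z_k|d_j)
    post = p_w_z[:, None, :] * p_z_d.T[None, :, :]        # shape (N_w, M, K)
    post /= post.sum(axis=2, keepdims=True)
    # M-step
    nk = n[:, :, None] * post                             # n(w_i,d_j) p(z_k|w_i,d_j)
    p_w_z = nk.sum(axis=1) / nk.sum(axis=(0, 1))          # normalize over words
    p_z_d = (nk.sum(axis=0) / n.sum(axis=0)[:, None]).T   # divide by n(d_j)

print(np.round(p_z_d.T, 3))   # topic mixture per document
```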
18. Bayesian Theory vs. Probability Theory
• Bayesian Theory vs. Probability Theory
  - Estimate $\theta$ through the posterior vs. estimate $\theta$ through maximization of the likelihood
  - Bayesian theory → prior vs. probability theory → statistics
  - When the number of samples → ∞, Bayesian theory == probability theory
• Parameter Estimation
  - $p(\theta \mid D) \propto p(D \mid \theta)\,p(\theta)$ → what is $p(\theta)$? → conjugate prior → the likelihood is helpful, but its role is limited → otherwise?
• Non-parametric Bayesian Methods (complicated)
  - Kernel methods: I just know a little...
  - VSM → CF → MF → pLSA → LDA → non-parametric Bayesian → Deep Learning
19. Latent Dirichlet Allocation
• Latent Dirichlet Allocation (LDA)
  - David M. Blei, Andrew Y. Ng, Michael I. Jordan (Blei later received the ACM-Infosys Award)
  - Journal of Machine Learning Research, 2003; cited > 3000 times
  - A hierarchical Bayesian model; Bayesian pLSI
[Plate diagram: $\alpha \to \theta \to z \to w$, with $\beta$ feeding $w$; $N$ words per document, $M$ documents.]

Generative process of a document $d$ in a corpus according to LDA:
1. Choose $N \sim \mathrm{Poisson}(\xi)$; (why?)
2. For each document $d = \{w_1, w_2, \dots, w_N\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$; (why?)
3. For each of the $N$ words $w_n$ in $d$:
   a) choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$; (why?)
   b) choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on $z_n$. (why?)
20. Latent Dirichlet Allocation
• LDA (cont.)
[Plate diagram: $\alpha \to \theta \to z \to w$, with $K$ topic-word distributions $\varphi$ drawn from $\beta$.]

Generative process of a document $d$ in LDA:
1. Choose $N \sim \mathrm{Poisson}(\xi)$; (not important)
2. For each document $d = \{w_1, w_2, \dots, w_N\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$, with $\theta = (\theta_1, \theta_2, \dots, \theta_K)$, $|\theta| = K$, $K$ fixed, $\sum_{k=1}^{K} \theta_k = 1$; Dir and Multi are conjugate
3. For each of the $N$ words $w_n$ in $d$:
   a) choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$;
   b) choose a word $w_n$ from $p(w_n \mid z_n, \beta)$, a multinomial probability conditioned on $z_n$.

One word ↔ one topic; one document ↔ multiple topics. $\theta = (\theta_1, \theta_2, \dots, \theta_K)$; $z = (z_1, z_2, \dots, z_K)$; for each word $w_n$ there is a $z_n$.

In pLSA the number of $p(z \mid d)$ parameters grows linearly with the number of documents → overfitting; the Dirichlet priors act as regularization. LDA has $M + K$ Dirichlet-multinomial pairs.
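A sketch of this generative process with hypothetical $K$, $V$, $\xi$, and hyper-parameters (here the rows of $\beta$ are sampled once rather than estimated):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, xi = 3, 10, 8                        # hypothetical: K topics, V-word vocabulary, Poisson mean xi
alpha = np.full(K, 0.1)                    # Dirichlet hyper-parameter
beta = rng.dirichlet(np.ones(V), size=K)   # K topic-word multinomials (rows of beta)

def generate_document():
    N = rng.poisson(xi)                    # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)           # 2. theta ~ Dir(alpha), one per document
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)         # 3a. z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])       # 3b. w_n ~ p(w|z_n, beta)
        words.append(w)
    return words

print(generate_document())   # word ids of one synthetic document
```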
22. Conjugate Prior & Distributions
• Conjugate Prior
  - If the posterior $p(\theta \mid x)$ is in the same family as the prior $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood $p(x \mid \theta)$: $p(\theta \mid x) \propto p(x \mid \theta)\,p(\theta)$
• Distributions
  - Binomial Distribution ↔ Beta Distribution
  - Multinomial Distribution ↔ Dirichlet Distribution
• Binomial & Beta Distribution
  - Binomial: $\mathrm{Bin}(m \mid N, \theta) = \binom{N}{m} \theta^m (1-\theta)^{N-m}$: the likelihood
  - $\binom{N}{m} = \dfrac{N!}{(N-m)!\,m!}$
  - Beta:
$\mathrm{Beta}(\theta \mid a, b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$, where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\,dt$

Why do the prior and posterior need to be conjugate distributions?
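A tiny sketch of why conjugacy is convenient: with a Beta prior and binomial data, the posterior is available in closed form with no integration (the coin data and prior values are hypothetical):

```python
from scipy import stats

# Hypothetical coin: prior Beta(a, b); observe m heads and l tails.
a, b = 2.0, 2.0
m, l = 7, 3

# Conjugacy: the posterior is simply Beta(m + a, l + b).
posterior = stats.beta(m + a, l + b)
print(posterior.mean())   # (m + a) / (m + a + l + b) = 9/14 ~ 0.643
```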
23. Conjugate Prior & Distributions
$p(\theta \mid m, l, a, b) \propto \binom{m+l}{m}\,\theta^m (1-\theta)^l \times \dfrac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$

$p(\theta \mid m, l, a, b) = \dfrac{\Gamma(m+a+l+b)}{\Gamma(m+a)\,\Gamma(l+b)}\,\theta^{m+a-1}(1-\theta)^{l+b-1}$

A Beta distribution again! This is what makes parameter estimation easy.

• Multinomial & Dirichlet Distribution
  - $x$ is a multivariate indicator, e.g., $x = (0,0,1,0,0,0)$: the event $x_3$ happens
  - The probability distribution of $x$ in only one event: $p(x \mid \theta) = \prod_{k=1}^{K} \theta_k^{x_k}$, $\theta = (\theta_1, \theta_2, \dots, \theta_K)$
24. Conjugate Prior & Distributions
• Multinomial & Dirichlet Distribution (cont.)
  - $\mathrm{Mult}(m_1, m_2, \dots, m_K \mid \theta, N) = \dfrac{N!}{m_1!\,m_2! \cdots m_K!} \prod_{k=1}^{K} \theta_k^{m_k} = \binom{N}{m_1}\binom{N-m_1}{m_2}\binom{N-m_1-m_2}{m_3} \cdots \binom{N-\sum_{k=1}^{K-1} m_k}{m_K} \prod_{k=1}^{K} \theta_k^{m_k}$: the likelihood function of $\theta$

Mult is the exact probability distribution of $p(z_n \mid \theta_d)$ and $p(w_i \mid z_n)$. In Bayesian theory, we need to find a conjugate prior of $\theta$ for Mult, where $0 < \theta_k < 1$ and $\sum_{k=1}^{K} \theta_k = 1$: the Dirichlet distribution,

$\mathrm{Dir}(\theta \mid \alpha) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}$, where $\alpha$ is a vector and $\alpha_0 = \sum_k \alpha_k$.

Hyper-parameter: a parameter in a probability distribution function (pdf).
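The multinomial case is just as direct; a sketch with hypothetical counts and a symmetric prior:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])      # Dir(alpha) prior over theta, K = 3
counts = np.array([5, 2, 1])           # observed multinomial counts m_k

# Conjugacy: the posterior is Dir(alpha + counts).
alpha_post = alpha + counts
print(alpha_post / alpha_post.sum())   # posterior mean of theta
print(rng.dirichlet(alpha_post))       # one posterior sample of theta
```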
26. Poisson Distribution
• Why the Poisson distribution?
  - The number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns
  - For $\mathrm{Bin}(n, p)$, when $n$ is large and $p$ is small: $p(X = k) \approx \dfrac{\lambda^k e^{-\lambda}}{k!}$, $\lambda \approx np$
  - $\mathrm{Gamma}(x \mid \alpha) = \dfrac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}$ → $\mathrm{Gamma}(x \mid \alpha = k+1) = \dfrac{x^k e^{-x}}{k!}$ (since $\Gamma(k+1) = k!$)
  - (Poisson → discrete; Gamma → continuous)
• Poisson Distribution
  - $p(k \mid \lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
  - Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.
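A quick numeric check of the Binomial-to-Poisson approximation with hypothetical n and p:

```python
from math import comb, exp, factorial

# Bin(n, p) with n large, p small vs. Poisson(lambda = n p).
n, p = 1000, 0.004
lam = n * p
for k in range(8):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = lam**k * exp(-lam) / factorial(k)
    print(k, round(binom, 5), round(poisson, 5))   # the two columns nearly agree
```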
28. Solution for LDA
The most significant generative model in the machine learning community in the recent ten years.

$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \Bigl(\prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\,p(w_n \mid z_n, \beta)\Bigr) d\theta$

Rewriting in terms of the model parameters:

$p(\mathbf{w} \mid \alpha, \beta) = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \int \Bigl(\prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\Bigr) \Bigl(\prod_{n=1}^{N} \sum_{i=1}^{K} \prod_{j=1}^{V} (\theta_i \beta_{ij})^{w_n^j}\Bigr) d\theta$

$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)$; $\beta \in \mathbb{R}^{K \times V}$: what we need to solve for.

Variational inference (deterministic inference) vs. Gibbs sampling (stochastic inference). Why variational inference? → simplify the dependency structure. Why sampling? → approximate the statistical properties of the population with those of samples.
29. Variational Inference
• Variational Inference (inference through a variational distribution), VI
  - VI aims to use an approximating distribution that has a simpler dependency structure than that of the exact posterior distribution
$p(H \mid D) \approx Q(H)$: the true posterior distribution is approximated by the variational distribution. How dissimilar are $P$ and $Q$? → the Kullback-Leibler divergence (relative entropy):

$\mathrm{KL}(Q \| P) = \int Q(H) \log \dfrac{Q(H)}{P(H \mid D)}\,dH = \int Q(H) \log \dfrac{Q(H)}{P(H, D)}\,dH + \log P(D)$

$L = \int Q(H) \log P(H, D)\,dH - \int Q(H) \log Q(H)\,dH = \langle \log P(H, D) \rangle_{Q(H)} + \mathcal{H}(Q)$

where $\mathcal{H}(Q)$ is the entropy of $Q$ and $L$ is the lower bound: $\log P(D) = L + \mathrm{KL}(Q \| P)$.
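A small numerical sketch verifying the identity $\log P(D) = L + \mathrm{KL}(Q \| P)$ on a hypothetical 4-state discrete model (the joint table, evidence value, and Q are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy discrete model: joint P(H, D=d) over 4 latent states for one fixed observation d.
joint = rng.random(4)
joint *= 0.3 / joint.sum()       # scale so the evidence P(d) = sum_H P(H, d) = 0.3
evidence = joint.sum()
posterior = joint / evidence     # exact posterior P(H|d)

Q = rng.dirichlet(np.ones(4))    # an arbitrary variational distribution Q(H)

kl = np.sum(Q * np.log(Q / posterior))                       # KL(Q || P(H|d)) >= 0
elbo = np.sum(Q * np.log(joint)) - np.sum(Q * np.log(Q))     # <log P(H,d)>_Q + entropy(Q)

print(np.log(evidence), elbo + kl)   # identical: log P(d) = L + KL
```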
33. Variational Inference
• You can refer to the original paper for more.
• Variational EM Algorithm
  - Aim: $(\alpha^*, \beta^*) = \arg\max \prod_{d=1}^{M} p(\mathbf{w}_d \mid \alpha, \beta)$
  - Initialize $\alpha, \beta$
  - E-step: approximate the likelihood through variational inference (fit the per-document variational parameters)
  - M-step: maximize the resulting likelihood bound with respect to $\alpha, \beta$
  - Repeat until convergence
36. Markov Chain Monte Carlo
• MCMC Sampling
  - We construct a relationship between $p(x)$ and the MC transition process → the detailed balance condition
  - In a common MC with transition matrix $P$: if $\pi(i)\,p_{ij} = \pi(j)\,p_{ji}$ for all $i, j$, then $\pi(x)$ is the stationary distribution of this MC
  - Proof: $\sum_{i=1}^{\infty} \pi(i)\,p_{ij} = \sum_{i=1}^{\infty} \pi(j)\,p_{ji} = \pi(j)$ → $\pi$ is a solution of the equation $\pi P = \pi$ → done
  - For a common MC with proposal $q(i, j)$ (also written $q(j \mid i)$ or $q(i \to j)$), and for any probability distribution $p(x)$ (the dimension of $x$ is arbitrary) → transformation:

$p(i)\,q(i, j)\,\alpha(i, j) = p(j)\,q(j, i)\,\alpha(j, i)$, with $\alpha(i, j) = p(j)\,q(j, i)$ and $\alpha(j, i) = p(i)\,q(i, j)$

Here $Q'(i, j) = q(i, j)\,\alpha(i, j)$ and $Q'(j, i) = q(j, i)\,\alpha(j, i)$ are the corrected transition probabilities; the construction enforces detailed balance, a sufficient condition for stationarity.
37. Markov Chain Monte Carlo
• MCMC Sampling (cont.)
  - Step 1: initialize $X_0 = x_0$
  - Step 2: for $t = 0, 1, 2, \dots$:
    - $X_t = x_t$; sample $y$ from $q(x \mid x_t)$ ($y$ → the proposal distribution)
    - sample $u$ from $\mathrm{Uniform}[0, 1]$
    - if $u < \alpha(x_t, y) = p(y)\,q(x_t \mid y)$, accept the transition $x_t \to y$: $X_{t+1} = y$
    - else $X_{t+1} = x_t$
• Metropolis-Hastings Sampling
  - Step 1: initialize $X_0 = x_0$
  - Step 2: for $t = 0, 1, 2, \dots, n, n+1, n+2, \dots$:
    - $X_t = x_t$; sample $y$ from $q(x \mid x_t)$ (the proposal distribution) and accept it with the standard MH probability $\alpha(x_t, y) = \min\Bigl\{1, \dfrac{p(y)\,q(x_t \mid y)}{p(x_t)\,q(y \mid x_t)}\Bigr\}$ (the slide breaks off here; the min form scales up the plain-MCMC acceptance rate)
  - Samples drawn before convergence are discarded as the burn-in period
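A minimal Metropolis-Hastings sketch targeting a hypothetical two-mode density with a symmetric Gaussian proposal (so the q terms cancel in the acceptance ratio):

```python
import numpy as np

rng = np.random.default_rng(0)

def p(x):
    """Unnormalized target density: an equal mixture of N(3,1) and N(-3,1) (hypothetical)."""
    return np.exp(-0.5 * (x - 3) ** 2) + np.exp(-0.5 * (x + 3) ** 2)

x_t = 0.0                                          # Step 1: initialize X_0 = x_0
samples = []
for t in range(20000):                             # Step 2
    y = rng.normal(x_t, 2.0)                       # proposal q(y|x_t), symmetric Gaussian
    accept = min(1.0, p(y) / p(x_t))               # MH ratio; q terms cancel when symmetric
    if rng.uniform() < accept:                     # u ~ Uniform[0, 1]
        x_t = y                                    # X_{t+1} = y
    samples.append(x_t)                            # else X_{t+1} = x_t

samples = np.array(samples[2000:])                 # drop the burn-in period
print(samples.mean(), samples.std())               # approx. 0 and sqrt(1 + 9) = 3.16
```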