Topic Model (≈ 1/2 of Text Mining)
Yueshen Xu
xyshzjucs@zju.edu.cn
Middleware, CCNT, ZJU, 6/11/2014
Text Mining & NLP & ML
Yueshen Xu
Outline
• Basic Concepts
• Application and Background
• Famous Researchers
• Language Model
• Vector Space Model (VSM)
• Term Frequency-Inverse Document Frequency (TF-IDF)
• Latent Semantic Indexing (LSI)
• Probabilistic Latent Semantic Indexing (pLSI)
• Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)
Outline
• Latent Dirichlet Allocation (LDA)
• Conjugate Prior
• Poisson Distribution
• Variational Distribution and Variational Inference (VD & VI)
• Markov Chain Monte Carlo (MCMC)
• Metropolis-Hastings Sampling (MH)
• Gibbs Sampling and Gibbs Sampling for LDA
• Bayesian Theory vs. Probability Theory
Concepts
• Latent Semantic Analysis
• Topic Model
• Text Mining
• Natural Language Processing
• Computational Linguistics
• Information Retrieval
• Dimension Reduction
• Expectation-Maximization (EM)

[Figure: a diagram placing LSA/Topic Model at the overlap of Information Retrieval, Computational Linguistics, Natural Language Processing, Machine Translation, Text Mining, Data Mining, and Machine Learning (EM, Dimension Reduction).]

Aim: find the topic that a word or a document belongs to. Also known as a Latent Factor Model.
Application
• LFM has become a fundamental technique in modern search engines, recommender systems, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc.
• Search Engine
  – PageRank → How important is this web page?
  – LFM → How relevant is this web page? How relevant is the user's query to one document?
• Recommender System
  – Opinion Extraction
  – Spam Detection
  – Tag Extraction
• Text Summarization
  – Abstract Generation
  – Twitter Topic Mining
Example text: "Steve Jobs had left us for about two years … Apple's price will fall down …"
Famous Researchers
• David Blei (Princeton): LDA
• Chengxiang Zhai (UIUC): Presidential Early Career Award
• W. Bruce Croft (UMass): Language Model
• Bing Liu (UIC): Opinion Mining
• John D. Lafferty (CMU): CRF & IBM
• Thomas Hofmann (Brown): pLSA
• Andrew McCallum (UMass): CRF & IBM
• Susan Dumais (Microsoft): LSI
Language Model
• Unigram Language Model == zero-order Markov chain:

  $p(\vec{w}|M) = \prod_{w_i \in s} p(w_i|M)$

  Bag of Words (BoW): no order, no grammar, only multiplicity.
• Bigram Language Model == first-order Markov chain:

  $p(\vec{w}|M) = \prod_{w_i \in s} p(w_i|w_{i-1}, M)$

• N-gram Language Model == (N-1)-order Markov chain
• Mixture-unigram Language Model:

  $p(\vec{w}) = \sum_{z} p(z) \prod_{n=1}^{N} p(w_n|z)$

[Plate diagrams: the unigram model (word plate N, document plate M) and the mixture-unigram model, which adds one latent topic z per document.]
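As a minimal sketch of the unigram model above (toy corpus and words are made up for illustration), the MLE of $p(w|M)$ is just a relative frequency, and the BoW assumption makes a sentence's probability independent of word order:

```python
from collections import Counter

# Toy corpus (made up): estimate p(w|M) = count(w) / total by MLE.
corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)
total = sum(counts.values())

def p_unigram(word):
    # MLE estimate; 0.0 for unseen words (no smoothing in this sketch).
    return counts[word] / total

def p_sentence(sentence):
    # Under the BoW assumption, the sentence probability is a plain product.
    p = 1.0
    for w in sentence.split():
        p *= p_unigram(w)
    return p

assert abs(p_unigram("the") - 3/9) < 1e-12
# Exchangeability: any word order gives the same probability.
assert p_sentence("the cat sat") == p_sentence("sat the cat")
```
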
Vector Space Model
• A document is represented as a vector of identifiers
• Identifier
  – Boolean: 0, 1
  – Term Count: how many times … in this document
  – Term Frequency: how frequent … in this document
  – TF-IDF: how important … in the corpus → most used
• Relevance Ranking
• First used in SMART (Gerard Salton, Cornell; the SIGIR Gerard Salton Award is named after him)

  $d_j = (w_{1j}, w_{2j}, \ldots, w_{tj})$
  $q = (w_{1q}, w_{2q}, \ldots, w_{tq})$
  $\cos\theta = \dfrac{d_j \cdot q}{\lVert d_j \rVert \, \lVert q \rVert}$
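The cosine-similarity ranking above can be sketched in a few lines; the term weights below are made up for illustration:

```python
import math

# Cosine similarity between a document vector d and a query vector q:
# cos(theta) = (d . q) / (|d| |q|).
def cosine(d, q):
    dot = sum(a * b for a, b in zip(d, q))
    norm = math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

d1 = [1.0, 2.0, 0.0]   # weights for terms t1, t2, t3 in document 1 (made up)
d2 = [0.0, 0.0, 3.0]
q  = [1.0, 1.0, 0.0]

# d1 shares terms with the query, d2 shares none, so d1 ranks higher.
assert cosine(d1, q) > cosine(d2, q)
assert cosine(d2, q) == 0.0
```
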
TF-IDF
• Mixture language model
  – Linear combination of certain distributions (e.g., Gaussian)
  – Better performance
• TF: Term Frequency

  $tf_{ij} = \dfrac{n_{ij}}{\sum_k n_{kj}}$  (term $i$, document $j$; $n_{ij}$ is the count of $i$ in $j$)

  How important … in this document
• IDF: Inverse Document Frequency

  $idf_i = \log\dfrac{N}{1 + |\{d \in D : t_i \in d\}|}$  ($N$ documents in the corpus)

  How important … in the corpus
• TF-IDF

  $tfidf(t_i, d_j, D) = tf_{ij} \times idf_i$
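The tf and idf formulas above can be checked on a tiny made-up corpus (the three documents below are illustrative only):

```python
import math
from collections import Counter

# Toy 3-document corpus (made up) for tf_ij = n_ij / sum_k n_kj and
# idf_i = log(N / (1 + df_i)), matching the slide's formulas.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]
N = len(docs)

def tf(term, doc):
    c = Counter(doc)
    return c[term] / sum(c.values())

def idf(term):
    df = sum(1 for d in docs if term in d)   # document frequency
    return math.log(N / (1 + df))

def tfidf(term, doc):
    return tf(term, doc) * idf(term)

# "the" appears in every document, so its idf (and tf-idf) is lowest.
assert idf("the") < idf("cat") < idf("dog")
assert tfidf("the", docs[0]) < tfidf("dog", docs[1])
```
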
Latent Semantic Indexing
• Challenge (defects of VSM)
  – Compare documents in the same concept space
  – Compare documents across languages
  – Synonymy, ex: buy - purchase, user - consumer
  – Polysemy, ex: book - book, draw - draw
• Key Idea
  – Dimensionality reduction of the word-document co-occurrence matrix
  – Construction of a latent semantic space
VSM: Word ↔ Document
LSI: Word ↔ Concept (aspect, topic, latent factor) ↔ Document
Singular Value Decomposition
• LSI ≈ SVD
  – U, V: orthogonal matrices
  – Σ: the diagonal matrix with the singular values of N

  $N = U \Sigma V^T$

  N is the t × d term-document matrix (entries: count, frequency, or TF-IDF); U is t × m, Σ is m × m, $V^T$ is m × d. Words are exchangeable.
• Truncation: keep only the k largest singular values (k < m, or k << m), so U becomes t × k and Σ becomes k × k, giving a rank-k approximation of N.
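The truncation above can be sketched with NumPy (assumed available); the toy term-document counts are made up for illustration:

```python
import numpy as np

# Toy 4-term x 3-document count matrix (made up): factor N = U S V^T and
# keep only the k largest singular values, as in LSI.
N = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [0., 1., 2.]])

U, s, Vt = np.linalg.svd(N, full_matrices=False)
k = 2
N_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k approximation

# Singular values come out in descending order.
assert np.all(np.diff(s) <= 1e-12)
# Document vectors in the k-dimensional latent space: columns of diag(s_k) Vt_k.
doc_latent = np.diag(s[:k]) @ Vt[:k, :]
assert doc_latent.shape == (k, 3)
```

By the Eckart-Young theorem, this rank-k matrix is the best rank-k approximation of N in the Frobenius norm, which is why discarding the lowest dimensions loses as little variance as possible.
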
Singular Value Decomposition
• The k largest singular values
  – Distinguish the variance between words and documents to the greatest extent
• Discarding the lowest dimensions
  – Reduces noise
• Filling the matrix
  – Prediction & lower computational complexity
  – Enlarges distinctiveness
• Decomposition
  – Concept, semantic, topic (aspect)
(Probabilistic) Matrix Factorization / Factorization Model: analytic solution of SVD; unsupervised learning.
Probabilistic Latent Semantic Indexing
• pLSI Model

  [Graphical model: documents $d_1 \ldots d_M$ connect to topics $z_1 \ldots z_K$ via $p(z|d)$, and topics connect to words $w_1 \ldots w_N$ via $p(w|z)$; documents have prior $p(d)$.]

• Assumption
  – Pairs (d, w) are assumed to be generated independently
  – Conditioned on z, w is generated independently of d
  – Words in a document are exchangeable
  – Documents are exchangeable
  – Latent topics z are independent
• Generative Process/Model

  $p(d, w) = p(d)\,p(w|d) = p(d)\sum_{z \in Z} p(w, z|d) = p(d)\sum_{z \in Z} p(z|d)\,p(w|z)$

  Both $p(z|d)$ and $p(w|z)$ are multinomial distributions; $p(w|z)$ is global, $p(z|d)$ is local. The model can be viewed as one layer of a 'deep neural network'.
Probabilistic Latent Semantic Indexing
[Plate diagrams: (1) d → z → w, with word plate N and document plate M; (2) z → d and z → w.]

Formulation 1 (d is exchangeable):

$p(w|d) = \sum_{z \in Z} p(z|d)\,p(w|z)$

Formulation 2:

$p(w, d) = \sum_{z \in Z} p(w, d, z) = \sum_{z \in Z} p(w|d, z)\,p(d, z) = \sum_{z \in Z} p(w|z)\,p(d|z)\,p(z)$

These are two ways to formulate pLSA; they are equivalent under Bayes' rule, but lead to two different inference processes. pLSA is a probabilistic graphical model: a directed acyclic graph (DAG).
Expectation-Maximization
• EM is a general algorithm for maximum-likelihood estimation (MLE) where the data are 'incomplete' or contain latent variables: pLSA, GMM, HMM … (cross-domain)
• Deduction Process
  – θ: parameter to be estimated; $\theta^0$: initialized randomly; $\theta^n$: the current value; $\theta^{n+1}$: the next value

  Objective: $\theta^{n+1} = \arg\max_{\theta}\,\big(L(\theta) - L(\theta^n)\big)$

  $L(\theta) = \log p(X|\theta)$; with latent variable H, $L_c(\theta) = \log p(X, H|\theta)$

  $L_c(\theta) = \log p(X, H|\theta) = \log p(X|\theta) + \log p(H|X, \theta) = L(\theta) + \log p(H|X, \theta)$

  $L(\theta) - L(\theta^n) = L_c(\theta) - L_c(\theta^n) + \log\dfrac{p(H|X, \theta^n)}{p(H|X, \theta)}$
Expectation-Maximization
Taking the expectation over H with respect to $p(H|X, \theta^n)$:

$L(\theta) - L(\theta^n) = \sum_H p(H|X,\theta^n)\,L_c(\theta) - \sum_H p(H|X,\theta^n)\,L_c(\theta^n) + \sum_H p(H|X,\theta^n)\log\dfrac{p(H|X,\theta^n)}{p(H|X,\theta)}$

The last term is a Kullback-Leibler divergence (relative entropy), which is non-negative, so

$L(\theta) - L(\theta^n) \geq \sum_H p(H|X,\theta^n)\,L_c(\theta) - \sum_H p(H|X,\theta^n)\,L_c(\theta^n)$  (lower bound)

Q-function: $Q(\theta; \theta^n) = E_{p(H|X,\theta^n)}[L_c(\theta)] = \sum_H L_c(\theta)\,p(H|X,\theta^n)$

E-step (expectation): compute Q;
M-step (maximization): re-estimate θ by maximizing Q; repeat until convergence.
How is EM used in pLSA?
EM in pLSA
With documents $d_i$ ($i = 1 \ldots M$), vocabulary words $w_j$ ($j = 1 \ldots N$), and counts $n(d_i, w_j)$, the likelihood function is $L = \sum_i \sum_j n(d_i, w_j)\log p(w_j|d_i)$, and the Q-function becomes

$Q(\theta;\theta^n) = E_{p(H|X,\theta^n)}[L_c(\theta)] = \sum_{i=1}^{M}\sum_{j=1}^{N} n(d_i, w_j) \sum_{k=1}^{K} p(z_k|d_i, w_j)\,\log\big(p(w_j|z_k)\,p(z_k|d_i)\big)$

where the posterior $p(z_k|d_i, w_j)$ is computed from the current parameter values (random at initialization).

Constraints:
1. $\sum_{j=1}^{N} p(w_j|z_k) = 1$
2. $\sum_{k=1}^{K} p(z_k|d_i) = 1$

Lagrange multipliers $\tau_k, \rho_i$:

$H = E[L_c(\theta)] + \sum_{k=1}^{K}\tau_k\Big(1 - \sum_{j=1}^{N} p(w_j|z_k)\Big) + \sum_{i=1}^{M}\rho_i\Big(1 - \sum_{k=1}^{K} p(z_k|d_i)\Big)$

Setting the partial derivatives to zero (the independent variables are $p(w_j|z_k)$ and $p(z_k|d_i)$) gives the M-step:

$p(w_j|z_k) = \dfrac{\sum_{i=1}^{M} n(d_i, w_j)\,p(z_k|d_i, w_j)}{\sum_{m=1}^{N}\sum_{i=1}^{M} n(d_i, w_m)\,p(z_k|d_i, w_m)}$

$p(z_k|d_i) = \dfrac{\sum_{j=1}^{N} n(d_i, w_j)\,p(z_k|d_i, w_j)}{n(d_i)}$

E-step (Bayes' rule, using the associative and distributive laws):

$p(z_k|d_i, w_j) = \dfrac{p(w_j|z_k)\,p(z_k|d_i)\,p(d_i)}{p(d_i)\sum_{l=1}^{K} p(w_j|z_l)\,p(z_l|d_i)} = \dfrac{p(w_j|z_k)\,p(z_k|d_i)}{\sum_{l=1}^{K} p(w_j|z_l)\,p(z_l|d_i)}$
Bayesian Theory vs. Probability Theory
• Bayesian theory vs. probability theory
  – Estimate θ through the posterior vs. estimate θ through maximization of the likelihood
  – Bayesian theory → prior vs. probability theory → statistic
  – When the number of samples → ∞, Bayesian theory == probability theory
• Parameter Estimation
  – $p(\theta|D) \propto p(D|\theta)\,p(\theta)$ → what is $p(\theta)$? → conjugate prior → the likelihood is helpful, but its function is limited → otherwise?
• Non-parametric Bayesian methods (complicated)
  – Kernel methods: I just know a little...
  – VSM → CF → MF → pLSA → LDA → non-parametric Bayesian → deep learning
Latent Dirichlet Allocation
• Latent Dirichlet Allocation (LDA)
  – David M. Blei, Andrew Y. Ng, Michael I. Jordan (ACM-Infosys Awards)
  – Journal of Machine Learning Research, 2003, cited > 3000 times
  – Hierarchical Bayesian model; Bayesian pLSI

  [Plate diagram: α → θ → z → w ← β, with word plate N and document plate M; inference is iterative.]

Generative process of a document d in a corpus according to LDA:
1. Choose $N \sim \mathrm{Poisson}(\xi)$  → Why?
2. For each document $d = \{w_1, w_2, \ldots, w_N\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$  → Why?
3. For each of the N words $w_n$ in d:
   a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$  → Why?
   b) Choose a word $w_n$ from $p(w_n|z_n, \beta)$, a multinomial probability conditioned on $z_n$  → Why?
Latent Dirichlet Allocation
• LDA (cont.)

  [Plate diagram: α → θ → z → w, with the topic-word distributions $\varphi \sim \mathrm{Dir}(\beta)$ in a plate of size K.]

Generative process of a document d in LDA:
1. Choose $N \sim \mathrm{Poisson}(\xi)$  → not important
2. For each document $d = \{w_1, w_2, \ldots, w_N\}$: choose $\theta \sim \mathrm{Dir}(\alpha)$; $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$, K is fixed, $\sum_{k=1}^{K}\theta_k = 1$; the Dirichlet is the conjugate prior of the multinomial.
3. For each of the N words $w_n$ in d:
   a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
   b) Choose a word $w_n$ from $p(w_n|z_n, \beta)$, a multinomial probability conditioned on $z_n$

→ one word ↔ one topic; one document ↔ multiple topics; for each word $w_n$ there is a $z_n$.
pLSA: the number of $p(z|d)$ parameters grows linearly with the number of documents → overfitting; the Dirichlet prior acts as regularization. LDA has M + K Dirichlet-Multinomial pairs.
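The generative process above can be sketched directly with NumPy's samplers (K, V, ξ, and the hyperparameters below are made-up illustrative values):

```python
import numpy as np

# Toy LDA generative process: theta ~ Dir(alpha) per document, then per word
# a topic z_n ~ Multinomial(theta) and a word w_n ~ Multinomial(beta[z_n]).
rng = np.random.default_rng(1)
K, V = 3, 8
alpha = np.full(K, 0.5)
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

def generate_document(xi=20):
    N = rng.poisson(xi)                 # document length ~ Poisson(xi)
    theta = rng.dirichlet(alpha)        # topic mixture for this document
    words, topics = [], []
    for _ in range(N):
        z = rng.choice(K, p=theta)      # one topic per word
        w = rng.choice(V, p=beta[z])
        topics.append(int(z)); words.append(int(w))
    return words, topics

words, topics = generate_document()
assert all(0 <= w < V for w in words)
assert all(0 <= z < K for z in topics)
assert len(words) == len(topics)
```

Note how one document mixes multiple topics through θ, while each individual word is tied to exactly one topic, matching the slide's "one word ↔ one topic, one document ↔ multiple topics".
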
Conjugate Prior &
Distributions
• Conjugate Prior:
  – If the posterior p(θ|x) is in the same family as the prior p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood p(x|θ): p(θ|x) ∝ p(x|θ)p(θ)
• Distributions
  – Binomial Distribution ←→ Beta Distribution
  – Multinomial Distribution ←→ Dirichlet Distribution
• Binomial & Beta Distribution
  – Binomial → $\mathrm{Bin}(m|N,\theta) = C(m,N)\,\theta^m(1-\theta)^{N-m}$: the likelihood
  – $C(m,N) = \dfrac{N!}{(N-m)!\,m!}$
  – $\mathrm{Beta}(\theta|a,b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}$, where $\Gamma(a) = \int_0^{\infty} t^{a-1}e^{-t}\,dt$
Why do prior and posterior need to be conjugate distributions?
Conjugate Prior &
Distributions
$p(\theta|m, l, a, b) \propto C(m+l, m)\,\theta^m(1-\theta)^l \times \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}$

$p(\theta|m, l, a, b) = \dfrac{\Gamma(m+a+l+b)}{\Gamma(m+a)\Gamma(l+b)}\theta^{m+a-1}(1-\theta)^{l+b-1}$

A Beta distribution again! This is parameter estimation.
• Multinomial & Dirichlet Distribution
  – $\vec{x}$ is a multivariate indicator, e.g. $\vec{x} = (0,0,1,0,0,0)$: the event $x_3$ happens
  – The probability distribution of $\vec{x}$ in only one event: $p(\vec{x}|\theta) = \prod_{k=1}^{K}\theta_k^{x_k}$, $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$
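The Beta-Binomial conjugacy above can be checked numerically (the counts and prior below are made up): the likelihood times the prior has the same shape in θ as Beta(θ | m+a, l+b), so their ratio is a constant independent of θ:

```python
from math import gamma

# Beta pdf from the slide's formula.
def beta_pdf(theta, a, b):
    return gamma(a + b) / (gamma(a) * gamma(b)) * theta**(a - 1) * (1 - theta)**(b - 1)

m, l, a, b = 7, 3, 2.0, 2.0      # 7 successes, 3 failures, Beta(2,2) prior (made up)

def ratio_at(theta):
    # posterior pdf divided by (unnormalized) likelihood * prior
    unnormalized = theta**m * (1 - theta)**l * beta_pdf(theta, a, b)
    return beta_pdf(theta, m + a, l + b) / unnormalized

# The ratio is the same constant at different theta values: same distribution
# up to normalization, i.e. the posterior is Beta(m + a, l + b).
r1, r2 = ratio_at(0.6), ratio_at(0.3)
assert abs(r1 - r2) < 1e-6 * r1
```
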
Conjugate Prior &
Distributions
• Multinomial & Dirichlet Distribution (cont.)
  – $\mathrm{Mult}(m_1, m_2, \ldots, m_K|\theta, N) = \dfrac{N!}{m_1!\,m_2!\cdots m_K!}\prod_{k=1}^{K}\theta_k^{m_k}$: the likelihood function of θ
    (the coefficient can also be written $C_N^{m_1} C_{N-m_1}^{m_2} C_{N-m_1-m_2}^{m_3}\cdots C_{N-\sum_{k=1}^{K-1}m_k}^{m_K}$)
  – Mult is the exact probability distribution of $p(z_k|d_j)$ and $p(w_j|z_k)$.
  – In Bayesian theory, we need to find a conjugate prior of θ for Mult, where $0 < \theta_k < 1$ and $\sum_{k=1}^{K}\theta_k = 1$: the Dirichlet distribution

  $\mathrm{Dir}(\theta|\vec{\alpha}) = \dfrac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^{K}\theta_k^{\alpha_k - 1}$

  where $\vec{\alpha}$ is a vector and $\alpha_0 = \sum_k \alpha_k$. Hyper-parameter: a parameter in the probability distribution function (pdf).
Conjugate Prior &
Distributions
• Multinomial & Dirichlet Distribution (cont.)
  – $p(\theta|\vec{m}, \vec{\alpha}) \propto p(\vec{m}|\theta)\,p(\theta|\vec{\alpha}) \propto \prod_{k=1}^{K}\theta_k^{\alpha_k + m_k - 1}$  — a Dirichlet?

  $p(\theta|\vec{m}, \vec{\alpha}) = \mathrm{Dir}(\theta|\vec{m} + \vec{\alpha}) = \dfrac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)}\prod_{k=1}^{K}\theta_k^{\alpha_k + m_k - 1}$  — a Dirichlet!

  Why does this work out so neatly? The Gamma function Γ is a mysterious function.

  $p \sim \mathrm{Beta}(t|\alpha, \beta) \Rightarrow E[p] = \int_0^1 t \cdot \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} t^{\alpha-1}(1-t)^{\beta-1}\,dt = \dfrac{\alpha}{\alpha+\beta}$

  $p \sim \mathrm{Dir}(\theta|\vec{\alpha}) \Rightarrow E[p] = \Big(\dfrac{\alpha_1}{\sum_{i=1}^{K}\alpha_i}, \dfrac{\alpha_2}{\sum_{i=1}^{K}\alpha_i}, \ldots, \dfrac{\alpha_K}{\sum_{i=1}^{K}\alpha_i}\Big)$
Poisson Distribution
• Poisson Distribution
  – $p(k|\xi) = \dfrac{\xi^k e^{-\xi}}{k!}$
  – Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.
• Why a Poisson distribution?
  – The number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns
  – For Bin(n, p), when n is large and p is small: $p(X = k) \approx \dfrac{\xi^k e^{-\xi}}{k!}$, with $\xi \approx np$
  – $\mathrm{Gamma}(x|\alpha) = \dfrac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}$ → $\mathrm{Gamma}(x|\alpha = k+1) = \dfrac{x^k e^{-x}}{k!}$  (since $\Gamma(k+1) = k!$)
  (Poisson → discrete; Gamma → continuous)
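The binomial-to-Poisson approximation above is easy to check numerically (n and p below are made-up illustrative values with ξ = np = 2):

```python
from math import comb, exp, factorial

# For large n and small p, Bin(n, p) is close to Poisson(xi) with xi = n*p.
n, p = 1000, 0.002
xi = n * p                                   # xi = 2.0

def binom_pmf(k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    return xi**k * exp(-xi) / factorial(k)

# The two pmfs agree closely pointwise for small k.
for k in range(10):
    assert abs(binom_pmf(k) - poisson_pmf(k)) < 1e-3
```
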
Solution for LDA
• LDA (cont.)
  – α, β: corpus-level parameters
  – θ: document-level variable
  – z, w: word-level variables
  – Conditionally independent hierarchical model
  – Parametric Bayes model

β is a K × V matrix with entries $\beta_{ij} = p(w^j|z^i)$; and $p(z_i|\theta) = \theta_i$.

Solving process. Joint distribution:

$p(\theta, \vec{z}, \vec{w}|\alpha, \beta) = p(\theta|\alpha)\prod_{n=1}^{N} p(z_n|\theta)\,p(w_n|z_n, \beta)$

Marginal of a document (a multiple integral):

$p(\vec{w}|\alpha, \beta) = \int p(\theta|\alpha)\prod_{n=1}^{N}\sum_{z_n} p(z_n|\theta)\,p(w_n|z_n, \beta)\,d\theta$

Corpus likelihood:

$p(D|\alpha, \beta) = \prod_{d=1}^{M}\int p(\theta_d|\alpha)\prod_{n=1}^{N_d}\sum_{z_{dn}} p(z_{dn}|\theta_d)\,p(w_{dn}|z_{dn}, \beta)\,d\theta_d$
Solution for LDA
LDA is the most significant generative model in the machine-learning community in the recent ten years.

Rewriting the marginal in terms of the model parameters:

$p(\vec{w}|\alpha, \beta) = \int p(\theta|\alpha)\prod_{n=1}^{N}\sum_{z_n} p(z_n|\theta)\,p(w_n|z_n, \beta)\,d\theta = \dfrac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}\int\Big(\prod_{i=1}^{k}\theta_i^{\alpha_i - 1}\Big)\prod_{n=1}^{N}\prod_{i=1}^{k}\prod_{j=1}^{V}(\theta_i\beta_{ij})^{w_n^j}\,d\theta$

$\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_K)$; $\beta \in \mathbb{R}^{K\times V}$: what we need to solve for.

Two inference routes:
– Variational inference (deterministic inference). Why? → simplify the dependency structure.
– Gibbs sampling (stochastic inference). Why sampling? → approximate the statistical properties of the population with those of samples.
Variational Inference
• Variational Inference (inference through a variational distribution), VI
  – VI aims to use an approximating distribution that has a simpler dependency structure than that of the exact posterior distribution: $P(H|D) \approx Q(H)$, where P(H|D) is the true posterior and Q(H) is the variational distribution.
  – Dissimilarity between P and Q? → Kullback-Leibler divergence:

  $KL(Q\|P) = \int Q(H)\log\dfrac{Q(H)}{P(H|D)}\,dH = \int Q(H)\log\dfrac{Q(H)}{P(H, D)}\,dH + \log P(D)$

  $L \overset{def}{=} \int Q(H)\log P(H, D)\,dH - \int Q(H)\log Q(H)\,dH = \langle\log P(H, D)\rangle_{Q(H)} + \mathbb{H}[Q]$

  where $\mathbb{H}[Q]$ is the entropy of Q.
Variational Inference
$P(H|D) = p(\theta, \vec{z}|\vec{w}, \alpha, \beta)$, and the variational distribution is

$Q(H) = q(\theta, \vec{z}|\gamma, \phi) = q(\theta|\gamma)\,q(\vec{z}|\phi) = q(\theta|\gamma)\prod_{n=1}^{N} q(z_n|\phi_n)$

(θ and z are treated as independent, approximately, to facilitate computation.)

$(\gamma^*, \phi^*) = \arg\min D\big(q(\theta, \vec{z}|\gamma, \phi)\,\|\,p(\theta, \vec{z}|\vec{w}, \alpha, \beta)\big)$: but we don't know the exact analytical form of this KL.

$\log p(\vec{w}|\alpha, \beta) = \log\int\sum_{\vec{z}} p(\theta, \vec{z}, \vec{w}|\alpha, \beta)\,d\theta = \log\int\sum_{\vec{z}} p(\theta, \vec{z}, \vec{w}|\alpha, \beta)\dfrac{q(\theta, \vec{z})}{q(\theta, \vec{z})}\,d\theta$

$\geq \int\sum_{\vec{z}} q(\theta, \vec{z})\log\dfrac{p(\theta, \vec{z}, \vec{w}|\alpha, \beta)}{q(\theta, \vec{z})}\,d\theta = E_q[\log p(\theta, \vec{z}, \vec{w}|\alpha, \beta)] - E_q[\log q(\theta, \vec{z})] = L(\gamma, \phi; \alpha, \beta)$

$\log p(\vec{w}|\alpha, \beta) = L(\gamma, \phi; \alpha, \beta) + KL$ → minimizing KL == maximizing L
Variational Inference
$L(\gamma, \phi; \alpha, \beta) = E_q[\log p(\theta|\alpha)] + E_q[\log p(\vec{z}|\theta)] + E_q[\log p(\vec{w}|\vec{z}, \beta)] - E_q[\log q(\theta)] - E_q[\log q(\vec{z})]$

$E_q[\log p(\theta|\alpha)] = \sum_{i=1}^{K}(\alpha_i - 1)E_q[\log\theta_i] + \log\Gamma\Big(\sum_{i=1}^{K}\alpha_i\Big) - \sum_{i=1}^{K}\log\Gamma(\alpha_i)$

$E_q[\log\theta_i] = \psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)$

$E_q[\log p(\vec{z}|\theta)] = \sum_{n=1}^{N}\sum_{i=1}^{K} E_q[z_{ni}]\,E_q[\log\theta_i] = \sum_{n=1}^{N}\sum_{i=1}^{K}\phi_{ni}\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)\Big)$

$E_q[\log p(\vec{w}|\vec{z}, \beta)] = \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V} E_q[z_{ni}]\,w_n^j\log\beta_{ij} = \sum_{n=1}^{N}\sum_{i=1}^{K}\sum_{j=1}^{V}\phi_{ni}\,w_n^j\log\beta_{ij}$
Variational Inference
$E_q[\log q(\theta|\gamma)]$ has the same form as $E_q[\log p(\theta|\alpha)]$.

$E_q[\log q(\vec{z}|\phi)] = E_q\Big[\sum_{n=1}^{N}\sum_{i=1}^{k} z_{ni}\log\phi_{ni}\Big]$

Maximize L with respect to $\phi_{ni}$, with a Lagrange multiplier for $\sum_{i=1}^{K}\phi_{ni} = 1$:

$L_{[\phi_{ni}]} = \phi_{ni}\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)\Big) + \phi_{ni}\log\beta_{ij} - \phi_{ni}\log\phi_{ni} + \lambda\Big(\sum_{i=1}^{K}\phi_{ni} - 1\Big)$

Taking derivatives with respect to $\phi_{ni}$:

$\dfrac{\partial L}{\partial\phi_{ni}} = \Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)\Big) + \log\beta_{ij} - \log\phi_{ni} - 1 + \lambda = 0$

$\phi_{ni} \propto \beta_{ij}\exp\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^{K}\gamma_j\Big)\Big)$
Variational Inference
• You can refer to the original paper for more.
• Variational EM Algorithm
  – Aim: $(\alpha^*, \beta^*) = \arg\max\prod_{d=1}^{M} p(\vec{w}_d|\alpha, \beta)$
  – Initialize α, β
  – E-step: compute the variational parameters (γ, φ) through variational inference, giving a likelihood approximation
  – M-step: maximize the resulting lower bound on the likelihood with respect to α, β
  – Repeat until convergence
Markov Chain Monte Carlo
• MCMC → basics: Markov chain (first-order) → stationary distribution → the foundation of Gibbs sampling
• General: $P(X_{t+n} = x|X_1, X_2, \ldots, X_t) = P(X_{t+n} = x|X_t)$
• First-order: $P(X_{t+1} = x|X_1, X_2, \ldots, X_t) = P(X_{t+1} = x|X_t)$
• One-step transition probability matrix (rows indexed by the current state $X_m$, columns by the next state $X_{m+1}$):

  $P = \begin{pmatrix} p(1|1) & p(2|1) & \cdots & p(|S|\,|1) \\ p(1|2) & p(2|2) & \cdots & p(|S|\,|2) \\ \vdots & \vdots & & \vdots \\ p(1|\,|S|) & p(2|\,|S|) & \cdots & p(|S|\,|\,|S|) \end{pmatrix}$
Markov Chain Monte Carlo
• Markov Chain
  – Initialization probability: $\pi_0 = \{\pi_0(1), \pi_0(2), \ldots, \pi_0(|S|)\}$
  – $\pi_n = \pi_{n-1}P = \pi_{n-2}P^2 = \cdots = \pi_0 P^n$: the Chapman-Kolmogorov equation
  – Convergence theorem: under the premise of connectivity of P, $\lim_{n\to\infty} P_{ij}^n = \pi(j)$, with $\pi(j) = \sum_{i=1}^{|S|}\pi(i)P_{ij}$
  – $\lim_{n\to\infty}\pi_0 P^n = \begin{pmatrix}\pi(1) & \cdots & \pi(|S|) \\ \vdots & & \vdots \\ \pi(1) & \cdots & \pi(|S|)\end{pmatrix}$ → $\pi = \{\pi(1), \pi(2), \ldots, \pi(j), \ldots, \pi(|S|)\}$: the stationary distribution

  $X_0 \sim \pi_0(x) \to X_1 \sim \pi_1(x) \to \cdots \to X_n \sim \pi(x) \to X_{n+1} \sim \pi(x) \to X_{n+2} \sim \pi(x) \to \cdots$

  After convergence, every sample is drawn from the stationary distribution π(x).
Markov Chain Monte Carlo
• MCMC Sampling
  – We should construct the relationship between π(x) and the MC transition process → the Detailed Balance Condition
  – In a common MC, if for $\pi(x)$ and transition matrix P: $\pi(i)P_{ij} = \pi(j)P_{ji}$ for all i, j → π(x) is the stationary distribution of this MC (detailed balance is a sufficient condition)
  – Proof: $\sum_{i=1}^{\infty}\pi(i)P_{ij} = \sum_{i=1}^{\infty}\pi(j)P_{ji} = \pi(j)$ → $\pi P = \pi$ → π is the solution of the equation $\pi P = \pi$ → done
  – For a common MC with proposal q(i, j) (also written q(j|i) or q(i→j)), and for any probability distribution p(x) (the dimension of x is arbitrary), introduce an acceptance probability α(i, j) so that

  $p(i)\,q(i, j)\,\alpha(i, j) = p(j)\,q(j, i)\,\alpha(j, i)$

  Choosing $\alpha(i, j) = p(j)\,q(j, i)$ and $\alpha(j, i) = p(i)\,q(i, j)$ satisfies this, turning q into a new kernel $Q'(i, j) = q(i, j)\,\alpha(i, j)$ that obeys detailed balance.
Markov Chain Monte Carlo
• MCMC Sampling (cont.)
  Step 1: Initialize $X_0 = x_0$
  Step 2: for t = 0, 1, 2, …
    $X_t = x_t$; sample y from $q(x|x_t)$ (y ∈ domain of definition)
    sample u from Uniform[0, 1]
    if $u < \alpha(x_t, y) = p(y)\,q(x_t|y)$ ⇒ accept: $X_{t+1} = y$
    else $X_{t+1} = x_t$
• Metropolis-Hastings Sampling
  Step 1: Initialize $X_0 = x_0$
  Step 2: for t = 0, 1, 2, … n, n+1, n+2 … (the first n steps are the burn-in period, before convergence)
    $X_t = x_t$; sample y from $q(x|x_t)$ (y ∈ domain of definition)
    sample u from Uniform[0, 1]
    if $u < \alpha(x_t, y) = \min\Big\{\dfrac{p(y)\,q(x_t|y)}{p(x_t)\,q(y|x_t)}, 1\Big\}$ ⇒ accept: $X_{t+1} = y$
    else $X_{t+1} = x_t$

Gibbs Sampling
MH is not suitable with regard to high-dimensional variables.
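The MH loop above can be sketched for a one-dimensional target (a standard normal with the normalizer dropped, chosen here only for illustration); a symmetric random-walk proposal makes q cancel in the acceptance ratio:

```python
import random, math

random.seed(0)

def p(x):
    # Unnormalized target density: standard normal without the constant.
    return math.exp(-0.5 * x * x)

x = 0.0
samples = []
for t in range(20000):
    y = x + random.gauss(0.0, 1.0)       # propose y ~ q(.|x), symmetric
    alpha = min(p(y) / p(x), 1.0)        # q(x|y)/q(y|x) = 1 cancels
    if random.random() < alpha:
        x = y                            # accept
    if t >= 2000:                        # discard the burn-in period
        samples.append(x)

mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# After burn-in, the chain should roughly match N(0, 1).
assert abs(mean) < 0.1
assert abs(var - 1.0) < 0.2
```
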
• Gibbs Sampling (two dimensions, starting from $(x_1, y_1)$)
  – Take A$(x_1, y_1)$ and B$(x_1, y_2)$ on the same vertical line:
    $p(x_1, y_1)\,p(y_2|x_1) = p(x_1)\,p(y_1|x_1)\,p(y_2|x_1)$
    $p(x_1, y_2)\,p(y_1|x_1) = p(x_1)\,p(y_2|x_1)\,p(y_1|x_1)$
    → $p(x_1, y_1)\,p(y_2|x_1) = p(x_1, y_2)\,p(y_1|x_1)$, i.e. $p(A)\,p(y_2|x_1) = p(B)\,p(y_1|x_1)$
  – Similarly, for C$(x_2, y_1)$ on the same horizontal line: $p(A)\,p(x_2|y_1) = p(C)\,p(x_1|y_1)$

Gibbs Sampling
• Gibbs Sampling (cont.)
  – We can construct the transition probability matrix Q accordingly:
    $Q(A \to B) = p(y_B|x_1)$, if $x_A = x_B = x_1$
    $Q(A \to C) = p(x_C|y_1)$, if $y_A = y_C = y_1$
    $Q(A \to D) = 0$, otherwise
  – Detailed balance condition: $p(X)\,Q(X \to Y) = p(Y)\,Q(Y \to X)$ ✓
• Gibbs Sampling (in two dimensions)
  Step 1: Initialize $X_0 = x_0$, $Y_0 = y_0$
  Step 2: for t = 0, 1, 2, …
    1. $y_{t+1} \sim p(y|x_t)$;
    2. $x_{t+1} \sim p(x|y_{t+1})$
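The two-dimensional scheme above can be sketched for a bivariate normal with correlation ρ (an illustrative target chosen because both full conditionals are known normals: $x|y \sim N(\rho y, 1-\rho^2)$ and $y|x \sim N(\rho x, 1-\rho^2)$):

```python
import random, math

random.seed(0)
rho = 0.8
sd = math.sqrt(1.0 - rho * rho)          # conditional std dev

x, y = 0.0, 0.0
xs, ys = [], []
for t in range(20000):
    y = random.gauss(rho * x, sd)        # step 1: y_{t+1} ~ p(y|x_t)
    x = random.gauss(rho * y, sd)        # step 2: x_{t+1} ~ p(x|y_{t+1})
    if t >= 2000:                        # discard burn-in
        xs.append(x); ys.append(y)

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
corr = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n * sx * sy)
# The empirical correlation should be close to rho.
assert abs(corr - rho) < 0.1
```

Note that every proposal is accepted: the conditional updates automatically satisfy the detailed balance condition constructed above, which is why Gibbs sampling needs no accept/reject step.
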
Gibbs Sampling
• Gibbs Sampling (in n dimensions)
  Step 1: Initialize $X_0 = x_0 = \{x_i : i = 1, 2, \ldots, n\}$
  Step 2: for t = 0, 1, 2, …
    1. $x_1^{(t+1)} \sim p(x_1|x_2^{(t)}, x_3^{(t)}, \ldots, x_n^{(t)})$;
    2. $x_2^{(t+1)} \sim p(x_2|x_1^{(t+1)}, x_3^{(t)}, \ldots, x_n^{(t)})$;
    3. …
    4. $x_j^{(t+1)} \sim p(x_j|x_1^{(t+1)}, \ldots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \ldots, x_n^{(t)})$;
    5. …
    6. $x_n^{(t+1)} \sim p(x_n|x_1^{(t+1)}, x_2^{(t+1)}, \ldots, x_{n-1}^{(t+1)})$
  Each coordinate is resampled conditioned on the freshly updated (t+1) values before it and the old (t) values after it.
Gibbs Sampling for LDA
• Gibbs Sampling in LDA
  – $\mathrm{Dir}(\vec{p}|\vec{\alpha}) = \dfrac{1}{\Delta(\vec{\alpha})}\prod_{k=1}^{V} p_k^{\alpha_k - 1}$, where $\Delta(\vec{\alpha})$ is the normalization factor: $\Delta(\vec{\alpha}) = \int\prod_{k=1}^{V} p_k^{\alpha_k - 1}\,d\vec{p}$
  – $p(\vec{z}_m|\vec{\alpha}) = \int p(\vec{z}_m|\vec{\theta})\,p(\vec{\theta}|\vec{\alpha})\,d\vec{\theta} = \int\prod_{k=1}^{K}\theta_k^{n_k}\,\mathrm{Dir}(\vec{\theta}|\vec{\alpha})\,d\vec{\theta}$
    $= \int\prod_{k=1}^{K}\theta_k^{n_k}\cdot\dfrac{1}{\Delta(\vec{\alpha})}\prod_{k=1}^{K}\theta_k^{\alpha_k - 1}\,d\vec{\theta} = \dfrac{1}{\Delta(\vec{\alpha})}\int\prod_{k=1}^{K}\theta_k^{n_k + \alpha_k - 1}\,d\vec{\theta} = \dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$
  – $p(\vec{z}|\vec{\alpha}) = \prod_{m=1}^{M} p(\vec{z}_m|\vec{\alpha}) = \prod_{m=1}^{M}\dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$ →

  $p(\vec{w}, \vec{z}|\vec{\alpha}, \vec{\beta}) = \prod_{k=1}^{K}\dfrac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})}\prod_{m=1}^{M}\dfrac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$
Gibbs Sampling for LDA
• Gibbs Sampling in LDA
  – $p(\vec{\theta}_m|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\theta}_m|\vec{n}_{m,\neg i} + \vec{\alpha})$,  $p(\vec{\varphi}_k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\varphi}_k|\vec{n}_{k,\neg i} + \vec{\beta})$
  – $p(z_i = k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) \propto p(z_i = k, w_i = t, \vec{\theta}_m, \vec{\varphi}_k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = E[\theta_{mk}]\cdot E[\varphi_{kt}] = \hat{\theta}_{mk}\cdot\hat{\varphi}_{kt}$
  – $\hat{\theta}_{mk} = \dfrac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K}\big(n_{m,\neg i}^{(k)} + \alpha_k\big)}$,  $\hat{\varphi}_{kt} = \dfrac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V}\big(n_{k,\neg i}^{(t)} + \beta_t\big)}$
  – $p(z_i = k|\vec{z}_{\neg i}, \vec{w}) \propto \dfrac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^{K}\big(n_{m,\neg i}^{(k)} + \alpha_k\big)} \times \dfrac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^{V}\big(n_{k,\neg i}^{(t)} + \beta_t\big)}$
  – $z_i^{(t+1)} \sim p(z_i = k|\vec{z}_{\neg i}, \vec{w})$, for each word position i
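The collapsed Gibbs update above can be sketched on a made-up toy corpus (the word ids, K, V, and hyperparameters are illustrative only): remove a word's topic from the count tables, sample a new topic from the conditional, and add it back:

```python
import random

random.seed(0)
docs = [[0, 0, 1, 2], [2, 3, 3, 1], [0, 1, 1, 2]]   # toy word ids per document
K, V = 2, 4
alpha, beta = 0.5, 0.1                              # symmetric hyperparameters

# Count tables: document-topic n_mk, topic-word n_kt, topic totals n_k.
n_mk = [[0] * K for _ in docs]
n_kt = [[0] * V for _ in range(K)]
n_k = [0] * K
z = [[random.randrange(K) for _ in d] for d in docs]
for m, d in enumerate(docs):
    for i, t in enumerate(d):
        k = z[m][i]
        n_mk[m][k] += 1; n_kt[k][t] += 1; n_k[k] += 1

for it in range(200):
    for m, d in enumerate(docs):
        for i, t in enumerate(d):
            k = z[m][i]                  # remove this word's current topic
            n_mk[m][k] -= 1; n_kt[k][t] -= 1; n_k[k] -= 1
            # p(z_i=k | z_not_i, w) proportional to (n_mk+alpha)*(n_kt+beta)/(n_k+V*beta)
            weights = [(n_mk[m][j] + alpha) * (n_kt[j][t] + beta) / (n_k[j] + V * beta)
                       for j in range(K)]
            k = random.choices(range(K), weights=weights)[0]
            z[m][i] = k                  # add it back under the new topic
            n_mk[m][k] += 1; n_kt[k][t] += 1; n_k[k] += 1

# Posterior point estimate for document 0's topic mixture theta.
theta0 = [(n_mk[0][j] + alpha) / (len(docs[0]) + K * alpha) for j in range(K)]
assert abs(sum(theta0) - 1.0) < 1e-9
assert all(c >= 0 for row in n_kt for c in row)
```

The per-document denominator $\sum_k n_{m,\neg i}^{(k)} + \alpha_k$ is the same for every k, so it can be dropped from `weights`, which is why only the θ numerator and the full φ ratio appear there.
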
Q&A

Language Technology Enhanced Learningtelss09
ย 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learningfridolin.wild
ย 
LSA and PLSA
LSA and PLSALSA and PLSA
LSA and PLSAYu Ting Chen
ย 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMsDaniel Perez
ย 
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)Yusuke Oda
ย 
Learning for semantic parsing using statistical syntactic parsing techniques
Learning for semantic parsing using statistical syntactic parsing techniquesLearning for semantic parsing using statistical syntactic parsing techniques
Learning for semantic parsing using statistical syntactic parsing techniquesUKM university
ย 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...Universitat Politรจcnica de Catalunya
ย 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligencevini89
ย 
Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-feiTianlu Wang
ย 
(Hierarchical) Topic Modeling_Yueshen Xu
(Hierarchical) Topic Modeling_Yueshen Xu(Hierarchical) Topic Modeling_Yueshen Xu
(Hierarchical) Topic Modeling_Yueshen XuYueshen Xu
ย 
Non parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataNon parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataYueshen Xu
ย 
Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersFeynman Liang
ย 
Learning to summarize using coherence
Learning to summarize using coherenceLearning to summarize using coherence
Learning to summarize using coherenceContent Savvy
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...mathsjournal
ย 
An Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextAn Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextFulvio Rotella
ย 
Text prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language ModelText prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language ModelANIRUDHMALODE2
ย 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Bhaskar Mitra
ย 

Similar to Topic model an introduction (20)

Language Technology Enhanced Learning
Language Technology Enhanced LearningLanguage Technology Enhanced Learning
Language Technology Enhanced Learning
ย 
The Geometry of Learning
The Geometry of LearningThe Geometry of Learning
The Geometry of Learning
ย 
LSA and PLSA
LSA and PLSALSA and PLSA
LSA and PLSA
ย 
Introduction to Tree-LSTMs
Introduction to Tree-LSTMsIntroduction to Tree-LSTMs
Introduction to Tree-LSTMs
ย 
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)
Tree-based Translation Models (ใ€ŽๆฉŸๆขฐ็ฟป่จณใ€ยง6.2-6.3)
ย 
Learning for semantic parsing using statistical syntactic parsing techniques
Learning for semantic parsing using statistical syntactic parsing techniquesLearning for semantic parsing using statistical syntactic parsing techniques
Learning for semantic parsing using statistical syntactic parsing techniques
ย 
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
End-to-end Speech Recognition with Recurrent Neural Networks (D3L6 Deep Learn...
ย 
Artificial Intelligence
Artificial IntelligenceArtificial Intelligence
Artificial Intelligence
ย 
Lecture14 xing fei-fei
Lecture14 xing fei-feiLecture14 xing fei-fei
Lecture14 xing fei-fei
ย 
50120140503004
5012014050300450120140503004
50120140503004
ย 
(Hierarchical) Topic Modeling_Yueshen Xu
(Hierarchical) Topic Modeling_Yueshen Xu(Hierarchical) Topic Modeling_Yueshen Xu
(Hierarchical) Topic Modeling_Yueshen Xu
ย 
Non parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete dataNon parametric bayesian learning in discrete data
Non parametric bayesian learning in discrete data
ย 
Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencoders
ย 
Learning to summarize using coherence
Learning to summarize using coherenceLearning to summarize using coherence
Learning to summarize using coherence
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
ย 
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
OPTIMIZING SIMILARITY THRESHOLD FOR ABSTRACT SIMILARITY METRIC IN SPEECH DIAR...
ย 
An Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from TextAn Approach to Automated Learning of Conceptual Graphs from Text
An Approach to Automated Learning of Conceptual Graphs from Text
ย 
Text prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language ModelText prediction based on Recurrent Neural Network Language Model
Text prediction based on Recurrent Neural Network Language Model
ย 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
ย 

More from Yueshen Xu

Context aware service recommendation
Context aware service recommendationContext aware service recommendation
Context aware service recommendationYueshen Xu
ย 
Course review for ir class ๆœฌ็ง‘่ฏพไปถ
Course review for ir class ๆœฌ็ง‘่ฏพไปถCourse review for ir class ๆœฌ็ง‘่ฏพไปถ
Course review for ir class ๆœฌ็ง‘่ฏพไปถYueshen Xu
ย 
Semantic web ๆœฌ็ง‘่ฏพไปถ
Semantic web ๆœฌ็ง‘่ฏพไปถSemantic web ๆœฌ็ง‘่ฏพไปถ
Semantic web ๆœฌ็ง‘่ฏพไปถYueshen Xu
ย 
Recommender system slides for undergraduate
Recommender system slides for undergraduateRecommender system slides for undergraduate
Recommender system slides for undergraduateYueshen Xu
ย 
ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ
 ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ
ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถYueshen Xu
ย 
Text classification ๆœฌ็ง‘่ฏพไปถ
Text classification ๆœฌ็ง‘่ฏพไปถText classification ๆœฌ็ง‘่ฏพไปถ
Text classification ๆœฌ็ง‘่ฏพไปถYueshen Xu
ย 
Thinking in clustering yueshen xu
Thinking in clustering yueshen xuThinking in clustering yueshen xu
Thinking in clustering yueshen xuYueshen Xu
ย 
Text clustering (information retrieval, in chinese)
Text clustering (information retrieval, in chinese)Text clustering (information retrieval, in chinese)
Text clustering (information retrieval, in chinese)Yueshen Xu
ย 
(Hierarchical) topic modeling
(Hierarchical) topic modeling (Hierarchical) topic modeling
(Hierarchical) topic modeling Yueshen Xu
ย 
่š็ฑป (Clustering)
่š็ฑป (Clustering)่š็ฑป (Clustering)
่š็ฑป (Clustering)Yueshen Xu
ย 
Yueshen xu cv
Yueshen xu cvYueshen xu cv
Yueshen xu cvYueshen Xu
ย 
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†Yueshen Xu
ย 
Learning to recommend with user generated content
Learning to recommend with user generated contentLearning to recommend with user generated content
Learning to recommend with user generated contentYueshen Xu
ย 
Social recommender system
Social recommender systemSocial recommender system
Social recommender systemYueshen Xu
ย 
Summary on the Conference of WISE 2013
Summary on the Conference of WISE 2013Summary on the Conference of WISE 2013
Summary on the Conference of WISE 2013Yueshen Xu
ย 
Acoustic modeling using deep belief networks
Acoustic modeling using deep belief networksAcoustic modeling using deep belief networks
Acoustic modeling using deep belief networksYueshen Xu
ย 
Summarization for dragon star program
Summarization for dragon  star programSummarization for dragon  star program
Summarization for dragon star programYueshen Xu
ย 
Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Yueshen Xu
ย 
Aggregation computation over distributed data streams
Aggregation computation over distributed data streamsAggregation computation over distributed data streams
Aggregation computation over distributed data streamsYueshen Xu
ย 
Analysis on tcp ip protocol stack
Analysis on tcp ip protocol stackAnalysis on tcp ip protocol stack
Analysis on tcp ip protocol stackYueshen Xu
ย 

More from Yueshen Xu (20)

Context aware service recommendation
Context aware service recommendationContext aware service recommendation
Context aware service recommendation
ย 
Course review for ir class ๆœฌ็ง‘่ฏพไปถ
Course review for ir class ๆœฌ็ง‘่ฏพไปถCourse review for ir class ๆœฌ็ง‘่ฏพไปถ
Course review for ir class ๆœฌ็ง‘่ฏพไปถ
ย 
Semantic web ๆœฌ็ง‘่ฏพไปถ
Semantic web ๆœฌ็ง‘่ฏพไปถSemantic web ๆœฌ็ง‘่ฏพไปถ
Semantic web ๆœฌ็ง‘่ฏพไปถ
ย 
Recommender system slides for undergraduate
Recommender system slides for undergraduateRecommender system slides for undergraduate
Recommender system slides for undergraduate
ย 
ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ
 ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ
ๆŽจ่็ณป็ปŸ ๆœฌ็ง‘่ฏพไปถ
ย 
Text classification ๆœฌ็ง‘่ฏพไปถ
Text classification ๆœฌ็ง‘่ฏพไปถText classification ๆœฌ็ง‘่ฏพไปถ
Text classification ๆœฌ็ง‘่ฏพไปถ
ย 
Thinking in clustering yueshen xu
Thinking in clustering yueshen xuThinking in clustering yueshen xu
Thinking in clustering yueshen xu
ย 
Text clustering (information retrieval, in chinese)
Text clustering (information retrieval, in chinese)Text clustering (information retrieval, in chinese)
Text clustering (information retrieval, in chinese)
ย 
(Hierarchical) topic modeling
(Hierarchical) topic modeling (Hierarchical) topic modeling
(Hierarchical) topic modeling
ย 
่š็ฑป (Clustering)
่š็ฑป (Clustering)่š็ฑป (Clustering)
่š็ฑป (Clustering)
ย 
Yueshen xu cv
Yueshen xu cvYueshen xu cv
Yueshen xu cv
ย 
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†
ๅพๆ‚ฆ็”ก็ฎ€ๅŽ†
ย 
Learning to recommend with user generated content
Learning to recommend with user generated contentLearning to recommend with user generated content
Learning to recommend with user generated content
ย 
Social recommender system
Social recommender systemSocial recommender system
Social recommender system
ย 
Summary on the Conference of WISE 2013
Summary on the Conference of WISE 2013Summary on the Conference of WISE 2013
Summary on the Conference of WISE 2013
ย 
Acoustic modeling using deep belief networks
Acoustic modeling using deep belief networksAcoustic modeling using deep belief networks
Acoustic modeling using deep belief networks
ย 
Summarization for dragon star program
Summarization for dragon  star programSummarization for dragon  star program
Summarization for dragon star program
ย 
Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)Aggregation computation over distributed data streams(the final version)
Aggregation computation over distributed data streams(the final version)
ย 
Aggregation computation over distributed data streams
Aggregation computation over distributed data streamsAggregation computation over distributed data streams
Aggregation computation over distributed data streams
ย 
Analysis on tcp ip protocol stack
Analysis on tcp ip protocol stackAnalysis on tcp ip protocol stack
Analysis on tcp ip protocol stack
ย 

Recently uploaded

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
ย 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
ย 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
ย 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
ย 
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...amitlee9823
ย 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
ย 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
ย 
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...amitlee9823
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
ย 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
ย 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
ย 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
ย 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
ย 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
ย 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
ย 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
ย 
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Callshivangimorya083
ย 

Recently uploaded (20)

Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
ย 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
ย 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
ย 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
ย 
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: ๐Ÿ“ 7737669865 ๐Ÿ“ High Profile Model Escorts | Bangalore...
ย 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
ย 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
ย 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
ย 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
ย 
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call ๐Ÿ‘— 7737669865 ๐Ÿ‘— Top Class Call Girl Ser...
ย 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
ย 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
ย 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
ย 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
ย 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
ย 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
ย 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
ย 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
ย 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
ย 
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 โ˜Žโœ”๐Ÿ‘Œโœ” Whatsapp Hard And Sexy Vip Call
ย 

Topic model an introduction

  • 1. Topic Model (≈ 1/2 Text Mining)
    Yueshen Xu, xyshzjucs@zju.edu.cn
    Middleware, CCNT, ZJU, 6/11/2014
    Text Mining & NLP & ML
  • 2. Outline
    - Basic Concepts
    - Application and Background
    - Famous Researchers
    - Language Model
    - Vector Space Model (VSM)
    - Term Frequency-Inverse Document Frequency (TF-IDF)
    - Latent Semantic Indexing (LSI)
    - Probabilistic Latent Semantic Indexing (pLSA)
    - Expectation-Maximization Algorithm (EM) & Maximum-Likelihood Estimation (MLE)
  • 3. Outline (cont.)
    - Latent Dirichlet Allocation (LDA)
    - Conjugate Prior
    - Poisson Distribution
    - Variational Distribution and Variational Inference (VD & VI)
    - Markov Chain Monte Carlo (MCMC)
    - Metropolis-Hastings Sampling (MH)
    - Gibbs Sampling and GS for LDA
    - Bayesian Theory vs. Probability Theory
  • 4. Concepts
    - Latent Semantic Analysis
    - Topic Model
    - Text Mining
    - Natural Language Processing
    - Computational Linguistics
    - Information Retrieval
    - Dimension Reduction
    - Expectation-Maximization (EM)
    [Diagram: LSA/Topic Model at the intersection of Information Retrieval, Computational Linguistics, Natural Language Processing (machine translation), Text Mining, Data Mining, and Machine Learning (dimension reduction, EM)]
    Aim: find the topic that a word or a document belongs to (Latent Factor Model)
  • 5. Application
    - LFM has become a fundamental technique in modern search engines, recommender systems, tag extraction, blog clustering, Twitter topic mining, news (text) summarization, etc.
    - Search Engine
      - PageRank → how important is this web page?
      - LFM → how relevant is this web page?
      - LFM → how relevant is the user's query to one document?
    - Recommender System: opinion extraction, spam detection, tag extraction
    - Text Summarization: abstract generation, Twitter topic mining
      Example text: "Steven Jobs had left us for about two years... the Apple's price will fall down..."
  • 6. Famous Researchers
    - David Blei, Princeton, LDA
    - ChengXiang Zhai, UIUC, Presidential Early Career Award
    - W. Bruce Croft, UMass, Language Model
    - Bing Liu, UIC, Opinion Mining
    - John D. Lafferty, CMU, CRF & IBM
    - Thomas Hofmann, Brown, pLSA
    - Andrew McCallum, UMass, CRF & IBM
    - Susan Dumais, Microsoft, LSI
  • 7. Language Model
    - Unigram Language Model == Zero-order Markov Chain:
      p(w|M) = Π_{w_i ∈ s} p(w_i|M)
      Bag of Words (BoW): no order, no grammar, only multiplicity
    - Bigram Language Model == First-order Markov Chain:
      p(w|M) = Π_{w_i ∈ s} p(w_i|w_{i-1}, M)
    - N-gram Language Model == (N-1)-order Markov Chain
    - Mixture-unigram Language Model [plate diagram: word w over N, documents M, with topic z]:
      p(w) = Σ_z p(z) Π_{n=1}^{N} p(w_n|z)
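The unigram bag-of-words model on this slide can be sketched in a few lines. This is a toy illustration only: the corpus and the function names `train_unigram` / `sentence_prob` are mine, and no smoothing is applied.

```python
from collections import Counter

def train_unigram(corpus_tokens):
    """Estimate p(w|M) by maximum likelihood over a list of tokens."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_prob(sentence_tokens, model):
    """p(s|M) = product of p(w_i|M); word order is ignored (bag of words)."""
    p = 1.0
    for w in sentence_tokens:
        p *= model.get(w, 0.0)  # unseen words get probability 0 (no smoothing)
    return p

tokens = "the cat sat on the mat".split()
model = train_unigram(tokens)
```

Because only multiplicity matters, `sentence_prob(["the", "cat"], model)` equals `sentence_prob(["cat", "the"], model)`, which is exactly the BoW assumption above.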
  • 8. Vector Space Model
    - A document is represented as a vector of identifiers:
      d_j = (w_{1j}, w_{2j}, ..., w_{tj}), q = (w_{1q}, w_{2q}, ..., w_{tq})
    - Identifier:
      - Boolean: 0, 1
      - Term Count: how many times...
      - Term Frequency: how frequent... in this document
      - TF-IDF: how important... in the corpus → most used
    - Relevance ranking: cos θ_j = (d_j · q) / (|d_j| |q|)
    - First used in SMART (Gerard Salton, Cornell); Gerard Salton Award (SIGIR)
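The cosine relevance ranking above can be sketched as follows (the vectors are illustrative toy TF values, not from the slides):

```python
import math

def cosine(d, q):
    """cos(theta) between two equal-length vectors d and q."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm = math.sqrt(sum(di * di for di in d)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0

# Rank two documents against a query in a 3-term vocabulary.
query = [1.0, 1.0, 0.0]
doc_a = [2.0, 2.0, 0.0]   # same direction as the query -> cosine 1
doc_b = [0.0, 0.0, 3.0]   # orthogonal to the query -> cosine 0
```

Because cosine depends only on direction, `doc_a` scores 1.0 even though its magnitude differs from the query; that scale-invariance is why it is preferred over a raw dot product for relevance ranking.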
  • 9. TF-IDF
    - Mixture language model: a linear combination of a certain distribution (Gaussian); better performance
    - TF, term frequency (how important ... in this document):
      tf_ij = n_ij / Σ_k n_kj   (term i, document j; n_ij = count of i in j)
    - IDF, inverse document frequency (how important ... in this corpus):
      idf_i = log( N / (1 + |{d ∈ D : t_i ∈ d}|) )   (N documents in the corpus)
    - TF-IDF: tf-idf(t_i, d_j, D) = tf_ij × idf_i
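A direct transcription of the two formulas above, using the slide's variant with 1 + document frequency in the denominator (the tiny corpus is mine):

```python
import math
from collections import Counter

def tf(term, doc):
    """Term frequency: n_ij / sum_k n_kj."""
    counts = Counter(doc)
    return counts[term] / sum(counts.values())

def idf(term, corpus):
    """Inverse document frequency, in the slide's form log(N / (1 + df))."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + df))

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

corpus = [["a", "b"], ["b", "c"], ["c", "d"]]
```

With this variant, "b" (present in 2 of 3 documents) gets idf = log(3/3) = 0, so its TF-IDF weight vanishes: common terms carry little discriminative information, which is the whole point of the IDF factor.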
  • 10. Latent Semantic Indexing
    - Challenge (defects of VSM):
      - Comparing documents in the same concept space
      - Comparing documents across languages
      - Synonymy, e.g., buy vs. purchase, user vs. consumer
      - Polysemy, e.g., book vs. book, draw vs. draw
    - Key Idea:
      - Dimensionality reduction of the word-document co-occurrence matrix
      - Construction of a latent semantic space
    - VSM: word ↔ document; LSI: word ↔ concept (aspect, topic, latent factor) ↔ document
  • 11. Singular Value Decomposition
    - LSI ~= SVD: N = U Σ V^T
      - U, V: orthogonal matrices; Σ: the diagonal matrix of the singular values of N
    - Full decomposition: N (t × d, terms × documents; entries are counts, frequencies, or TF-IDF) = U (t × m) Σ (m × m) V^T (m × d)
    - Truncated to the k largest singular values (k < m or k << m): N ≈ U_k (t × k) Σ_k (k × k) V_k^T (k × d)
    - Word: exchangeability
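The truncation above can be sketched with NumPy on a toy term-document matrix (the matrix values and variable names are mine):

```python
import numpy as np

# Toy term-document matrix N (t = 4 terms, d = 3 documents); entries could be
# counts, frequencies, or TF-IDF weights.
N = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

U, s, Vt = np.linalg.svd(N, full_matrices=False)  # N = U @ diag(s) @ Vt

k = 2  # keep only the k largest singular values
N_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of N

# Documents mapped into the k-dimensional latent semantic space.
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T  # one row per document
```

Comparing the rows of `doc_vectors` (e.g., by cosine) now measures similarity in the concept space rather than over raw terms, which is how LSI addresses synonymy.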
  • 12. Singular Value Decomposition (cont.)
    - The k largest singular values distinguish the variance between words and documents to the greatest extent
    - Discarding the lowest dimensions reduces noise
    - Filling the matrix: prediction & lower computational complexity; enlarges the distinctiveness
    - Decomposition: concept, semantic, topic (aspect)
    - (Probabilistic) Matrix Factorization / Factorization Model: analytic solution of SVD; unsupervised learning
  • 13. Probabilistic Latent Semantic Indexing
    - pLSI model [diagram: documents d_1..d_M linked to latent topics z_1..z_K linked to words w_1..w_N, via p(d), p(z|d), p(w|z)]
    - Assumptions:
      - Pairs (d, w) are assumed to be generated independently
      - Conditioned on z, w is generated independently of d
      - Words in a document are exchangeable
      - Documents are exchangeable
      - Latent topics z are independent
    - Generative process/model: p(d, w) = p(d) p(w|d) = p(d) Σ_{z∈Z} p(w, z|d) = p(d) Σ_{z∈Z} p(z|d) p(w|z)
    - Both p(z|d) (local, per document) and p(w|z) (global, shared across the corpus) are multinomial distributions; one layer of a 'deep neural network'
  • 14. Probabilistic Latent Semantic Indexing (cont.)
    - Formulation 1 [plate diagram: d → z → w over N words, M documents]:
      p(w|d) = Σ_{z∈Z} p(z|d) p(w|z)
    - Formulation 2 [plate diagram: z → d, z → w over N words, M documents]:
      p(d, w) = Σ_{z∈Z} p(w, d, z) = Σ_{z∈Z} p(w|d, z) p(d, z) = Σ_{z∈Z} p(w|z) p(d|z) p(z)
    - These are two ways to formulate pLSA; they are equivalent in Bayes' rule but lead to two different inference processes
    - Probabilistic graphical model: a directed acyclic graph (DAG); d: exchangeability
  • 15. Expectation-Maximization
    - EM is a general algorithm for maximum-likelihood estimation (MLE) where the data are 'incomplete' or contain latent variables: pLSA, GMM, HMM, ... (cross-domain)
    - Derivation (θ: parameter to be estimated; θ_0: initialized randomly; θ_n: the current value; θ_{n+1}: the next value)
    - Objective: θ_{n+1} = argmax_θ [L(θ) - L(θ_n)]
    - Incomplete-data log-likelihood: L(θ) = log p(X|θ); complete-data log-likelihood: L_c(θ) = log p(X, H|θ), where H is the latent variable
    - Since p(X, H|θ) = p(X|θ) p(H|X, θ): L_c(θ) = log p(X|θ) + log p(H|X, θ) = L(θ) + log p(H|X, θ)
    - Hence: L(θ) - L(θ_n) = L_c(θ) - L_c(θ_n) + log [ p(H|X, θ_n) / p(H|X, θ) ]
  • 16. Expectation-Maximization (cont.)
    - Taking the expectation with respect to p(H|X, θ_n):
      L(θ) - L(θ_n) = Σ_H p(H|X, θ_n) L_c(θ) - Σ_H p(H|X, θ_n) L_c(θ_n) + Σ_H p(H|X, θ_n) log [ p(H|X, θ_n) / p(H|X, θ) ]
    - The last term is a Kullback-Leibler divergence (relative entropy), which is non-negative, so:
      L(θ) - L(θ_n) ≥ Σ_H p(H|X, θ_n) L_c(θ) - Σ_H p(H|X, θ_n) L_c(θ_n)   (lower bound)
    - Q-function: Q(θ; θ_n) = E_{p(H|X, θ_n)}[L_c(θ)] = Σ_H L_c(θ) p(H|X, θ_n)
    - E-step (expectation): compute Q; M-step (maximization): re-estimate θ by maximizing Q; iterate until convergence
    - How is EM used in pLSA?
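The E/M alternation derived above can be made concrete on the classic two-coin mixture problem (the data, the initialization, and the 50-iteration budget are illustrative choices of mine, not from the slides):

```python
import math

# Two-coin mixture: each entry is the number of heads in n = 10 flips of one
# of two biased coins, but which coin was used is hidden (the latent H).
flips = [5, 9, 8, 4, 7]
n = 10

def binom_pmf(k, n, p):
    """Binomial likelihood of k heads in n flips with bias p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

theta_a, theta_b = 0.6, 0.5  # initialization theta_0
for _ in range(50):
    # E-step: responsibilities p(coin | data, theta_n), the weights inside Q
    heads_a = tails_a = heads_b = tails_b = 0.0
    for k in flips:
        wa = binom_pmf(k, n, theta_a)
        wb = binom_pmf(k, n, theta_b)
        r_a = wa / (wa + wb)       # posterior that coin A produced this row
        heads_a += r_a * k
        tails_a += r_a * (n - k)
        heads_b += (1 - r_a) * k
        tails_b += (1 - r_a) * (n - k)
    # M-step: re-estimate theta by maximizing Q (expected-count MLE)
    theta_a = heads_a / (heads_a + tails_a)
    theta_b = heads_b / (heads_b + tails_b)
```

Each pass increases the lower bound from the previous slide, and the estimates separate: coin A absorbs the heads-heavy rows and converges to a high bias, coin B to a lower one.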
  • 17. EM in pLSA
๏ฐ The likelihood function is $\sum_i \sum_j n(d_i,w_j)\log p(w_j|d_i)$, with $n(d_i,w_j)$ the word count; the posterior $p(z_k|d_i,w_j)$ is given random values at initialization. The Q-function:
$$Q(\theta;\theta^n) = E_{p(H|X,\theta^n)}[L_c(\theta)] = \sum_{i=1}^N \sum_{j=1}^M n(d_i,w_j) \sum_{k=1}^K p(z_k|d_i,w_j)\log\left(p(w_j|z_k)\,p(z_k|d_i)\right)$$
๏ฐ Constraints: 1. $\sum_{j=1}^M p(w_j|z_k) = 1$; 2. $\sum_{k=1}^K p(z_k|d_i) = 1$ ๏ƒ  Lagrange multipliers $\tau$, $\rho$:
$$H = E[L_c] + \sum_{k=1}^K \tau_k\Big(1 - \sum_{j=1}^M p(w_j|z_k)\Big) + \sum_{i=1}^N \rho_i\Big(1 - \sum_{k=1}^K p(z_k|d_i)\Big)$$
๏ฐ Setting the partial derivatives with respect to the independent variables to 0 gives the M-step:
$$p(w_j|z_k) = \frac{\sum_{i=1}^N n(d_i,w_j)\,p(z_k|d_i,w_j)}{\sum_{m=1}^M \sum_{i=1}^N n(d_i,w_m)\,p(z_k|d_i,w_m)}, \qquad p(z_k|d_i) = \frac{\sum_{j=1}^M n(d_i,w_j)\,p(z_k|d_i,w_j)}{n(d_i)}$$
๏ฐ E-step (by Bayes' rule, using the associative and distributive laws):
$$p(z_k|d_i,w_j) = \frac{p(d_i)\,p(z_k|d_i)\,p(w_j|z_k)}{p(d_i)\sum_{l=1}^K p(w_j|z_l)\,p(z_l|d_i)} = \frac{p(w_j|z_k)\,p(z_k|d_i)}{\sum_{l=1}^K p(w_j|z_l)\,p(z_l|d_i)}$$
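The E-step and M-step above can be sketched in a few lines of numpy. This is a minimal illustration of the update equations, not an optimized implementation; the function and variable names are my own, not from the slides.

```python
import numpy as np

def plsa_em(n_dw, K, iters=50, seed=0):
    """EM for pLSA on a document-word count matrix n_dw (M x V).
    Returns p(z|d) as an M x K matrix and p(w|z) as a K x V matrix."""
    rng = np.random.default_rng(seed)
    M, V = n_dw.shape
    # random initialization of the two multinomial families
    p_z_d = rng.random((M, K)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((K, V)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(iters):
        # E-step: p(z|d,w) proportional to p(z|d) p(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # M x K x V
        post /= post.sum(1, keepdims=True)
        # M-step: re-estimate from expected counts n(d,w) p(z|d,w)
        nz = n_dw[:, None, :] * post
        p_w_z = nz.sum(0); p_w_z /= p_w_z.sum(1, keepdims=True)
        p_z_d = nz.sum(2); p_z_d /= p_z_d.sum(1, keepdims=True)
    return p_z_d, p_w_z
```

Every M-step row normalization corresponds to one of the Lagrange-multiplier constraints on the slide.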
  • 18. Bayesian Theory v.s. Probability Theory
๏ฐ Bayesian theory v.s. probability theory
๏ฎ Estimate $\theta$ through the posterior v.s. estimate $\theta$ through maximization of the likelihood
๏ฎ Bayesian theory ๏ƒ  prior v.s. probability theory ๏ƒ  statistics
๏ฎ When the number of samples $\to \infty$, Bayesian theory == probability theory
๏ฐ Parameter estimation
๏ฎ $p(\theta|D) \propto p(D|\theta)\,p(\theta)$ ๏ƒ  how to choose $p(\theta)$? ๏ƒ  conjugate prior ๏ƒ  the likelihood is helpful, but its role is limited ๏ƒ  otherwise?
๏ฐ Non-parametric Bayesian methods (complicated)
๏ฎ Kernel methods: I just know a little...
๏ฎ VSM ๏ƒ  CF ๏ƒ  MF ๏ƒ  pLSA ๏ƒ  LDA ๏ƒ  non-parametric Bayesian ๏ƒ  deep learning
  • 19. Latent Dirichlet Allocation
๏ฐ Latent Dirichlet Allocation (LDA)
๏ฎ David M. Blei, Andrew Y. Ng, Michael I. Jordan (ACM-Infosys Awards)
๏ฎ Journal of Machine Learning Research, 2003, cited > 3000 times
๏ฎ Hierarchical Bayesian model; a Bayesian version of pLSI
๏ฐ Graphical model: hyper-parameters $\alpha$, $\beta$; per-document $\theta$; per-word $z$, $w$; plates of size $N$ (words) and $M$ (documents, the iterated dimension)
๏ฐ Generative process of a document $d$ in a corpus according to LDA:
๏ƒ˜ Choose $N \sim \mathrm{Poisson}(\xi)$ ๏ƒ  Why?
๏ƒ˜ For each document $d = \{w_1, w_2, \dots, w_n\}$, choose $\theta \sim \mathrm{Dir}(\alpha)$ ๏ƒ  Why?
๏ƒ˜ For each of the $N$ words $w_n$ in $d$:
a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$ ๏ƒ  Why?
b) Choose a word $w_n$ from $p(w_n|z_n,\beta)$, a multinomial probability conditioned on $z_n$ ๏ƒ  Why?
  • 20. Latent Dirichlet Allocation
๏ฐ LDA (cont.): the smoothed model adds a $K$-row topic-word matrix $\varphi$ with Dirichlet prior $\beta$, giving a Dirichlet-multinomial structure on both the document-topic and topic-word sides
๏ฐ Generative process of a document $d$ in LDA:
๏ƒ˜ Choose $N \sim \mathrm{Poisson}(\xi)$ ๏ƒ  not important
๏ƒ˜ For each document $d = \{w_1, w_2, \dots, w_n\}$, choose $\theta \sim \mathrm{Dir}(\alpha)$, where $\theta = (\theta_1, \theta_2, \dots, \theta_K)$, $|\theta| = K$ is fixed, $\sum_{k=1}^K \theta_k = 1$, and Dir is the conjugate prior of Multi
๏ƒ˜ For each of the $N$ words $w_n$ in $d$:
a) Choose a topic $z_n \sim \mathrm{Multinomial}(\theta)$
b) Choose a word $w_n$ from $p(w_n|z_n,\beta)$, a multinomial probability conditioned on $z_n$ ๏ƒ  one word ๏ƒŸ๏ƒ  one topic; one document ๏ƒŸ๏ƒ  multiple topics
๏ฐ $\theta = (\theta_1, \dots, \theta_K)$, $z = (z_1, \dots, z_K)$: for each word $w_n$ there is a $z_n$
๏ฐ In pLSA the number of $p(z|d)$ parameters grows linearly with the number of documents ๏ƒ  overfitting; the Dirichlet priors act as regularization, leaving only the $M+K$ Dirichlet-multinomial structures
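The generative story above can be simulated directly; numpy's `Generator` exposes the Poisson, Dirichlet, and categorical draws the slide lists. A minimal sketch (the function name and toy parameters are illustrative):

```python
import numpy as np

def generate_document(alpha, beta, xi, rng):
    """Simulate one document from the LDA generative story:
    N ~ Poisson(xi), theta ~ Dir(alpha), then for each word draw a
    topic z_n ~ Mult(theta) and a word w_n ~ Mult(beta[z_n]).
    alpha: length-K Dirichlet parameter; beta: K x V topic-word matrix."""
    N = rng.poisson(xi)                       # document length
    theta = rng.dirichlet(alpha)              # per-document topic mixture
    zs = rng.choice(len(alpha), size=N, p=theta)
    ws = [rng.choice(beta.shape[1], p=beta[z]) for z in zs]
    return zs, ws
```

Note the hierarchy: $\theta$ is drawn once per document, while $z_n$ and $w_n$ are drawn once per word.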
  • 21. Latent Dirichlet Allocation
  • 22. Conjugate Prior & Distributions
๏ฐ Conjugate prior:
๏ฎ If the posterior $p(\theta|x)$ is in the same family as the prior $p(\theta)$, the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior of the likelihood $p(x|\theta)$: $p(\theta|x) \propto p(x|\theta)\,p(\theta)$
๏ฐ Conjugate pairs
๏ฎ Binomial distribution โ†โ†’ Beta distribution
๏ฎ Multinomial distribution โ†โ†’ Dirichlet distribution
๏ฐ Binomial & Beta distribution
๏ฎ Binomial ๏ƒ  $\mathrm{Bin}(m|N,\theta) = C(m,N)\,\theta^m(1-\theta)^{N-m}$: the likelihood, with $C(m,N) = N!/((N-m)!\,m!)$
๏ฎ $\mathrm{Beta}(\theta|a,b) = \dfrac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$, where $\Gamma(a) = \int_0^\infty t^{a-1}e^{-t}\,dt$
๏ฐ Why do prior and posterior need to be conjugate distributions?
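The Beta density above is easy to sanity-check numerically: `math.gamma` implements the $\Gamma$ integral, and a crude trapezoidal sum over $[0,1]$ should come out close to 1. A minimal sketch:

```python
from math import gamma

def beta_pdf(theta, a, b):
    """Beta density from the slide:
    Gamma(a+b) / (Gamma(a) Gamma(b)) * theta^(a-1) * (1-theta)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) \
        * theta ** (a - 1) * (1 - theta) ** (b - 1)

# crude trapezoidal check that Beta(2, 3) integrates to 1 over [0, 1]
n = 1000
total = sum(beta_pdf(i / n, 2, 3) for i in range(1, n)) / n \
        + (beta_pdf(0, 2, 3) + beta_pdf(1, 2, 3)) / (2 * n)
```

For integer arguments $\Gamma(a) = (a-1)!$, which is why the normalizer generalizes the binomial coefficient.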
  • 23. Conjugate Prior & Distributions
๏ฐ Posterior of the Beta-Binomial pair ($m$ successes, $l$ failures):
$$p(\theta|m,l,a,b) \propto C(m+l,m)\,\theta^m(1-\theta)^l \times \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\theta^{a-1}(1-\theta)^{b-1}$$
$$p(\theta|m,l,a,b) = \frac{\Gamma(m+a+l+b)}{\Gamma(m+a)\,\Gamma(l+b)}\,\theta^{m+a-1}(1-\theta)^{l+b-1}$$
A Beta distribution again! ๏ƒ  parameter estimation
๏ฐ Multinomial & Dirichlet distribution
๏ฎ $\vec{x}$ is a multivariate indicator, e.g., $\vec{x} = (0,0,1,0,0,0)$: the event $x_3$ happens
๏ฎ The probability distribution of $\vec{x}$ in only one event: $p(\vec{x}|\theta) = \prod_{k=1}^K \theta_k^{x_k}$, with $\theta = (\theta_1, \theta_2, \dots, \theta_K)$
  • 24. Conjugate Prior & Distributions
๏ฐ Multinomial & Dirichlet distribution (cont.)
๏ฎ $\mathrm{Mult}(m_1, m_2, \dots, m_K|\boldsymbol{\theta}, N) = \dfrac{N!}{m_1!\,m_2!\cdots m_K!}\prod_{k=1}^K \theta_k^{m_k}$ (the multinomial coefficient is the product of binomial coefficients $C_N^{m_1} C_{N-m_1}^{m_2}\cdots$): the likelihood function of $\theta$
๏ฎ Mult is the exact probability distribution of $p(z_k|d_j)$ and $p(w_j|z_k)$
๏ฎ In Bayesian theory, we need to find a conjugate prior of $\theta$ for Mult, where $0 < \theta_k < 1$ and $\sum_{k=1}^K \theta_k = 1$ ๏ƒ  the Dirichlet distribution:
$$\mathrm{Dir}(\theta|\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)}\prod_{k=1}^K \theta_k^{\alpha_k - 1}, \qquad \alpha_0 = \sum_k \alpha_k$$
where $\boldsymbol{\alpha}$ is a vector of hyper-parameters: parameters in the probability density function (pdf) of the prior
  • 25. Conjugate Prior & Distributions
๏ฐ Multinomial & Dirichlet distribution (cont.)
๏ฎ $p(\theta|\boldsymbol{m},\boldsymbol{\alpha}) \propto p(\boldsymbol{m}|\theta)\,p(\theta|\boldsymbol{\alpha}) \propto \prod_{k=1}^K \theta_k^{\alpha_k + m_k - 1}$ ๏ƒ  a Dirichlet again:
$$p(\theta|\boldsymbol{m},\boldsymbol{\alpha}) = \mathrm{Dir}(\theta|\boldsymbol{m}+\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)\cdots\Gamma(\alpha_K + m_K)}\prod_{k=1}^K \theta_k^{\alpha_k + m_k - 1}$$
Why? ๏ƒ  the Gamma function $\Gamma$ is a mysterious function
๏ฎ Expectations: $p \sim \mathrm{Beta}(t|\alpha,\beta)$ ๏ƒ  $E[p] = \int_0^1 t \times \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,t^{\alpha-1}(1-t)^{\beta-1}\,dt = \dfrac{\alpha}{\alpha+\beta}$
๏ฎ $p \sim \mathrm{Dir}(\theta|\alpha)$ ๏ƒ  $E[p] = \left(\dfrac{\alpha_1}{\sum_{i=1}^K \alpha_i}, \dfrac{\alpha_2}{\sum_{i=1}^K \alpha_i}, \dots, \dfrac{\alpha_K}{\sum_{i=1}^K \alpha_i}\right)$
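The payoff of conjugacy is that posterior updates reduce to adding counts to the prior parameters, as a small sketch shows (function names are illustrative, assuming the Beta-Binomial and Dirichlet-Multinomial pairs from the slides):

```python
def beta_binomial_update(m, l, a, b):
    """Beta(a, b) prior + Binomial likelihood with m successes and
    l failures -> Beta(m + a, l + b) posterior.
    Returns the posterior parameters and the posterior mean."""
    a_post, b_post = m + a, l + b
    return a_post, b_post, a_post / (a_post + b_post)

def dirichlet_update(counts, alpha):
    """Dir(alpha) prior + multinomial counts m -> Dir(alpha + m).
    Returns the posterior parameters and the posterior mean vector."""
    post = [a + m for a, m in zip(alpha, counts)]
    s = sum(post)
    return post, [p / s for p in post]
```

Both posterior means are exactly the expectation formulas on this slide, applied to the updated parameters.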
  • 26. Poisson Distribution
๏ฐ Why the Poisson distribution?
๏ฎ The number of births per hour during a given day; the number of particles emitted by a radioactive source in a given time; the number of cases of a disease in different towns
๏ฎ For $\mathrm{Bin}(n,p)$, when $n$ is large and $p$ is small ๏ƒ  $p(X=k) \approx \dfrac{\xi^k e^{-\xi}}{k!}$, $\xi \approx np$
๏ฎ $\mathrm{Gamma}(x|\alpha) = \dfrac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)}$ ๏ƒ  $\mathrm{Gamma}(x|\alpha = k+1) = \dfrac{x^k e^{-x}}{k!}$, since $\Gamma(k+1) = k!$ (Poisson ๏ƒ  discrete; Gamma ๏ƒ  continuous)
๏ฐ Poisson distribution
๏ฎ $p(k|\xi) = \dfrac{\xi^k e^{-\xi}}{k!}$
๏ฎ Many experimental situations occur in which we observe the counts of events within a set unit of time, area, volume, length, etc.
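The "large $n$, small $p$" approximation on this slide can be checked directly with the stdlib; a minimal sketch comparing $\mathrm{Bin}(1000, 0.002)$ against $\mathrm{Poisson}(\xi = np = 2)$:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    # exact Binomial(n, p) probability of k successes
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, xi):
    # Poisson probability p(k | xi) = xi^k e^(-xi) / k!
    return xi ** k * exp(-xi) / factorial(k)
```

For these parameters the two pmfs agree to about three decimal places at every $k$.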
  • 27. Solution for LDA
๏ฐ LDA (cont.)
๏ฎ $\alpha, \beta$: corpus-level parameters
๏ฎ $\theta$: document-level variable
๏ฎ $z, w$: word-level variables
๏ฎ Conditionally independent hierarchical model; parametric Bayes model
๏ฐ Solving process, with $p(z_i|\boldsymbol{\theta}) = \theta_i$ and $\beta$ the $K \times V$ matrix of word probabilities $\beta_{ij} = p(w^j|z^i)$:
$$p(\theta, \boldsymbol{z}, \boldsymbol{w}|\alpha,\beta) = p(\theta|\alpha)\prod_{n=1}^N p(z_n|\theta)\,p(w_n|z_n,\beta)$$
$$p(\boldsymbol{w}|\alpha,\beta) = \int p(\theta|\alpha)\prod_{n=1}^N \sum_{z_n} p(z_n|\theta)\,p(w_n|z_n,\beta)\,d\theta \quad \text{(a multiple integral)}$$
$$p(\boldsymbol{D}|\alpha,\beta) = \prod_{d=1}^M \int p(\theta_d|\alpha)\prod_{n=1}^{N_d}\sum_{z_{dn}} p(z_{dn}|\theta_d)\,p(w_{dn}|z_{dn},\beta)\,d\theta_d$$
  • 28. Solution for LDA
๏ฐ The most significant generative model in the machine learning community in the recent ten years
๏ฐ Rewriting $p(\boldsymbol{w}|\alpha,\beta) = \int p(\theta|\alpha)\prod_{n=1}^N \sum_{z_n} p(z_n|\theta)\,p(w_n|z_n,\beta)\,d\theta$ in terms of the model parameters:
$$p(\boldsymbol{w}|\alpha,\beta) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)}\int \prod_{i=1}^k \theta_i^{\alpha_i - 1}\prod_{n=1}^N \sum_{i=1}^k \prod_{j=1}^V (\theta_i\,\beta_{ij})^{w_n^j}\,d\theta$$
๏ฐ What we need to solve: $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_K)$; $\beta \in R^{K \times V}$
๏ฐ Two inference routes: variational inference (deterministic inference) and Gibbs sampling (stochastic inference)
๏ฎ Why variational inference? ๏ƒ  simplify the dependency structure
๏ฎ Why sampling? ๏ƒ  approximate the statistical properties of the population with those of the samples
  • 29. Variational Inference
๏ฐ Variational inference (VI): inference through a variational distribution
๏ฎ VI uses an approximating distribution with a simpler dependency structure than that of the exact posterior distribution: $P(H|D) \approx Q(H)$, with $P(H|D)$ the true posterior and $Q(H)$ the variational distribution
๏ฐ Dissimilarity between $P$ and $Q$? ๏ƒ  Kullback-Leibler divergence:
$$KL(Q\|P) = \int Q(H)\log\frac{Q(H)}{P(H|D)}\,dH = \int Q(H)\log\frac{Q(H)\,P(D)}{P(H,D)}\,dH = \int Q(H)\log\frac{Q(H)}{P(H,D)}\,dH + \log P(D)$$
๏ฐ Define the lower bound:
$$L \overset{def}{=} \int Q(H)\log P(H,D)\,dH - \int Q(H)\log Q(H)\,dH = \langle \log P(H,D)\rangle_{Q(H)} + \mathcal{H}(Q)$$
where $\mathcal{H}(Q)$ is the entropy of $Q$
  • 30. Variational Inference
๏ฐ For LDA: $P(H|D) = p(\theta,z|\boldsymbol{w},\alpha,\beta)$; the variational distribution, with $\theta$ and $z$ treated as (approximately) independent to facilitate computation:
$$Q(H) = q(\theta,z|\gamma,\phi) = q(\theta|\gamma)\,q(z|\phi) = q(\theta|\gamma)\prod_{n=1}^N q(z_n|\phi_n)$$
๏ฐ $(\gamma^*, \phi^*) = \arg\min D\big(q(\theta,z|\gamma,\phi)\,\|\,p(\theta,z|\boldsymbol{w},\alpha,\beta)\big)$: but we don't know the exact analytical form of this KL
๏ฐ Instead, bound the log-likelihood:
$$\log p(w|\alpha,\beta) = \log\int\sum_z p(\theta,z,w|\alpha,\beta)\,d\theta = \log\int\sum_z \frac{p(\theta,z,w|\alpha,\beta)}{q(\theta,z)}\,q(\theta,z)\,d\theta$$
$$\geq \int\sum_z q(\theta,z)\log\frac{p(\theta,z,w|\alpha,\beta)}{q(\theta,z)}\,d\theta = E_q[\log p(\theta,z,w|\alpha,\beta)] - E_q[\log q(\theta,z)] = L(\gamma,\phi;\alpha,\beta)$$
๏ฐ $\log p(w|\alpha,\beta) = L(\gamma,\phi;\alpha,\beta) + KL$ ๏ƒ  minimizing the KL == maximizing $L$
  • 31. Variational Inference
๏ฐ Expanding the lower bound:
$$L(\gamma,\phi;\alpha,\beta) = E_q[\log p(\theta|\alpha)] + E_q[\log p(z|\theta)] + E_q[\log p(w|z,\beta)] - E_q[\log q(\theta)] - E_q[\log q(z)]$$
$$E_q[\log p(\theta|\alpha)] = \sum_{i=1}^K (\alpha_i - 1)E_q[\log\theta_i] + \log\Gamma\Big(\sum_{i=1}^K \alpha_i\Big) - \sum_{i=1}^K \log\Gamma(\alpha_i)$$
$$E_q[\log\theta_i] = \psi(\gamma_i) - \psi\Big(\sum_{j=1}^K \gamma_j\Big)$$
$$E_q[\log p(z|\theta)] = \sum_{n=1}^N\sum_{i=1}^K E_q[z_{ni}]\,E_q[\log\theta_i] = \sum_{n=1}^N\sum_{i=1}^K \phi_{ni}\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^K \gamma_j\Big)\Big)$$
$$E_q[\log p(w|z,\beta)] = \sum_{n=1}^N\sum_{i=1}^K\sum_{j=1}^V E_q[z_{ni}]\,w_n^j\log\beta_{ij} = \sum_{n=1}^N\sum_{i=1}^K\sum_{j=1}^V \phi_{ni}\,w_n^j\log\beta_{ij}$$
  • 32. Variational Inference
๏ฐ $E_q[\log q(\theta|\gamma)]$ has the same form as $E_q[\log p(\theta|\alpha)]$, and $E_q[\log q(z|\phi)] = E_q\Big[\sum_{n=1}^N\sum_{i=1}^k z_{ni}\log\phi_{ni}\Big]$
๏ฐ Maximize $L$ with respect to $\phi_{ni}$, adding a Lagrange multiplier for the constraint $\sum_j \phi_{nj} = 1$:
$$L_{[\phi_{ni}]} = \phi_{ni}\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^K \gamma_j\Big)\Big) + \phi_{ni}\log\beta_{ij} - \phi_{ni}\log\phi_{ni} + \lambda\Big(\sum_{j=1}^K \phi_{nj} - 1\Big)$$
๏ฐ Taking the derivative with respect to $\phi_{ni}$ and setting it to zero:
$$\frac{\partial L}{\partial \phi_{ni}} = \psi(\gamma_i) - \psi\Big(\sum_{j=1}^K \gamma_j\Big) + \log\beta_{ij} - \log\phi_{ni} - 1 + \lambda = 0$$
$$\Rightarrow \phi_{ni} \propto \beta_{ij}\exp\Big(\psi(\gamma_i) - \psi\Big(\sum_{j=1}^K \gamma_j\Big)\Big)$$
  • 33. Variational Inference
๏ฐ You can refer to more in the original paper ๏Š๏Œ
๏ฐ Variational EM algorithm
๏ฎ Aim: $(\alpha^*, \beta^*) = \arg\max \prod_{d=1}^M p(\boldsymbol{w}_d|\alpha,\beta)$
๏ฎ Initialize $\alpha, \beta$
๏ฎ E-step: for each document, compute the variational parameters $\gamma, \phi$ through variational inference to approximate the likelihood
๏ฎ M-step: maximize the resulting lower bound on the likelihood with respect to $\alpha, \beta$
๏ฎ End until convergence
  • 34. Markov Chain Monte Carlo
๏ฐ MCMC ๏ƒ  basics: Markov chain (first-order) ๏ƒ  stationary distribution ๏ƒ  foundation of Gibbs sampling
๏ฐ General: $P(X_{t+n} = x|X_1, X_2, \dots, X_t) = P(X_{t+n} = x|X_t)$
๏ฐ First-order: $P(X_{t+1} = x|X_1, X_2, \dots, X_t) = P(X_{t+1} = x|X_t)$
๏ฐ One-step transition probability matrix (rows: current state $X_m$; columns: next state $X_{m+1}$):
$$P = \begin{pmatrix} p(1|1) & p(2|1) & \cdots & p(|S|\,|1) \\ p(1|2) & p(2|2) & \cdots & p(|S|\,|2) \\ \vdots & \vdots & & \vdots \\ p(1|\,|S|) & p(2|\,|S|) & \cdots & p(|S|\,|\,|S|) \end{pmatrix}$$
  • 35. Markov Chain Monte Carlo
๏ฐ Markov chain
๏ฎ Initialization probability: $\pi_0 = \{\pi_0(1), \pi_0(2), \dots, \pi_0(|S|)\}$
๏ฎ $\pi_n = \pi_{n-1}P = \pi_{n-2}P^2 = \cdots = \pi_0 P^n$: Chapman-Kolmogorov equation
๏ฎ Convergence theorem for ergodic Markov chains: under the premise of connectivity (irreducibility) of $P$, $\lim_{n\to\infty} P_{ij}^n = \pi(j)$, with $\pi(j) = \sum_{i=1}^{|S|}\pi(i)P_{ij}$
๏ฎ $\lim_{n\to\infty} P^n = \begin{pmatrix} \pi(1) & \cdots & \pi(|S|) \\ \vdots & & \vdots \\ \pi(1) & \cdots & \pi(|S|) \end{pmatrix}$ ๏ƒ  $\pi = \{\pi(1), \pi(2), \dots, \pi(j), \dots, \pi(|S|)\}$ is the stationary distribution
๏ฎ $X_0 \sim \pi_0(x) \to X_1 \sim \pi_1(x) \to \cdots \to X_n \sim \pi(x) \to X_{n+1} \sim \pi(x) \to X_{n+2} \sim \pi(x) \to \cdots$: after convergence, every subsequent sample is drawn from the stationary distribution
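The convergence above can be sketched by simply iterating $\pi_{t+1} = \pi_t P$ until the distribution stops changing (function name illustrative; assumes a row-stochastic $P$):

```python
import numpy as np

def stationary_distribution(P, n_steps=200):
    """Iterate pi <- pi P (Chapman-Kolmogorov) from a uniform start;
    for an ergodic chain this converges to the stationary distribution."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_steps):
        pi = pi @ P
    return pi
```

For a 2-state chain with $P = \begin{pmatrix}0.9 & 0.1\\ 0.5 & 0.5\end{pmatrix}$ the fixed point of $\pi = \pi P$ is $(5/6, 1/6)$, which the iteration recovers.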
  • 36. Markov Chain Monte Carlo
๏ฐ MCMC sampling
๏ฎ We should construct the relationship between the target $\pi(x)$ and the MC transition process ๏ƒ  detailed balance condition
๏ฎ In a common MC, if for $\pi(x)$ and transition matrix $P$: $\pi(i)P_{ij} = \pi(j)P_{ji}$ for all $i, j$ ๏ƒ  $\pi(x)$ is the stationary distribution of this MC (a sufficient, though not necessary, condition)
๏ฎ Proof: $\sum_{i=1}^\infty \pi(i)P_{ij} = \sum_{i=1}^\infty \pi(j)P_{ji} = \pi(j)$ ๏ƒ  $\pi P = \pi$ ๏ƒ  $\pi$ is the solution of the equation $\pi P = \pi$ ๏ƒ  done
๏ฎ For a common MC with proposal $q(i,j)$ (also written $q(j|i)$ or $q(i \to j)$) and any probability distribution $p(x)$ (the dimension of $x$ is arbitrary) ๏ƒ  transformation: introduce acceptance probabilities
$$p(i)\,q(i,j)\,\alpha(i,j) = p(j)\,q(j,i)\,\alpha(j,i), \qquad \alpha(i,j) = p(j)\,q(j,i),\quad \alpha(j,i) = p(i)\,q(i,j)$$
so that the new kernels $Q'(i,j) = q(i,j)\,\alpha(i,j)$ and $Q'(j,i) = q(j,i)\,\alpha(j,i)$ satisfy detailed balance
  • 37. Markov Chain Monte Carlo
๏ฐ MCMC sampling (cont.)
Step 1: Initialize $X_0 = x_0$
Step 2: for $t = 0, 1, 2, \dots$
  $X_t = x_t$; sample $y$ from $q(x|x_t)$ ($y \in$ domain of definition)
  sample $u$ from Uniform[0,1]
  if $u < \alpha(x_t, y) = p(y)\,q(x_t|y)$ ๏ƒ  $x_t \to y$, i.e., $X_{t+1} = y$
  else $X_{t+1} = x_t$
๏ฐ Metropolis-Hastings sampling
Step 1: Initialize $X_0 = x_0$
Step 2: for $t = 0, 1, 2, \dots, n, n+1, n+2, \dots$ (the iterations before convergence form the burn-in period)
  $X_t = x_t$; sample $y$ from $q(x|x_t)$ ($y \in$ domain of definition)
  • 38. Gibbs Sampling
  sample $u$ from Uniform[0,1]
  if $u < \alpha(x_t, y) = \min\left\{\dfrac{p(y)\,q(x_t|y)}{p(x_t)\,q(y|x_t)}, 1\right\}$ ๏ƒ  $x_t \to y$, i.e., $X_{t+1} = y$
  else $X_{t+1} = x_t$
๏ฐ MH is not well suited to high-dimensional variables ๏ƒ  Gibbs sampling
๏ฐ Gibbs sampling (two dimensions), with points $A(x_1,y_1)$, $B(x_1,y_2)$, $C(x_2,y_1)$, $D$:
$$p(x_1,y_1)\,p(y_2|x_1) = p(x_1)\,p(y_1|x_1)\,p(y_2|x_1), \qquad p(x_1,y_2)\,p(y_1|x_1) = p(x_1)\,p(y_2|x_1)\,p(y_1|x_1)$$
$$\Rightarrow p(x_1,y_1)\,p(y_2|x_1) = p(x_1,y_2)\,p(y_1|x_1), \quad \text{i.e., } p(A)\,p(y_2|x_1) = p(B)\,p(y_1|x_1)$$
Similarly, $p(A)\,p(x_2|y_1) = p(C)\,p(x_1|y_1)$
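The MH acceptance rule above can be sketched with a random-walk proposal; since a Gaussian step is symmetric, $q(x_t|y) = q(y|x_t)$ and the ratio reduces to $p(y)/p(x_t)$. A minimal sketch (names and the log-density interface are my own choices):

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian
    proposal; accepts y with probability min{p(y)/p(x_t), 1}."""
    rng = np.random.default_rng(seed)
    x, out = x0, []
    for _ in range(n_samples):
        y = x + step * rng.standard_normal()
        # accept/reject in log space: log u < log p(y) - log p(x_t)
        if np.log(rng.random()) < log_p(y) - log_p(x):
            x = y
        out.append(x)
    return np.array(out)
```

Targeting a standard normal via `log_p = lambda x: -0.5 * x * x` and discarding a burn-in prefix, the retained samples have mean near 0 and standard deviation near 1.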
  • 39. Gibbs Sampling
๏ฐ Gibbs sampling (cont.)
๏ฎ We can construct the transition probability matrix $Q$ accordingly (for points $A(x_1,y_1)$, $B(x_1,y_2)$, $C(x_2,y_1)$, $D$):
  $Q(A \to B) = p(y_B|x_1)$, if $x_A = x_B = x_1$
  $Q(A \to C) = p(x_C|y_1)$, if $y_A = y_C = y_1$
  $Q(A \to D) = 0$, else
๏ฎ Detailed balance condition: $p(X)\,Q(X \to Y) = p(Y)\,Q(Y \to X)$ โˆš
๏ฐ Gibbs sampling (in two dimensions)
Step 1: Initialize $X_0 = x_0$, $Y_0 = y_0$
Step 2: for $t = 0, 1, 2, \dots$
  1. $y_{t+1} \sim p(y|x_t)$;
  2. $x_{t+1} \sim p(x|y_{t+1})$
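The two-dimensional scheme above can be sketched on a bivariate standard normal with correlation $\rho$, where both full conditionals are one-dimensional normals: $x|y \sim \mathcal{N}(\rho y, 1-\rho^2)$ and symmetrically for $y|x$. A minimal illustration (the target and names are my choices, not from the slides):

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for a 2-D standard normal with correlation rho:
    alternately draw x ~ p(x|y) and y ~ p(y|x)."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)   # conditional standard deviation
    x = y = 0.0
    out = []
    for _ in range(n_samples):
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        out.append((x, y))
    return np.array(out)
```

After a burn-in prefix the empirical correlation of the chain approaches $\rho$.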
  • 40. Gibbs Sampling
๏ฐ Gibbs sampling (in $n$ dimensions)
Step 1: Initialize $X_0 = x_0 = \{x_i : i = 1, 2, \dots, n\}$
Step 2: for $t = 0, 1, 2, \dots$
  1. $x_1^{(t+1)} \sim p(x_1|x_2^{(t)}, x_3^{(t)}, \dots, x_n^{(t)})$;
  2. $x_2^{(t+1)} \sim p(x_2|x_1^{(t+1)}, x_3^{(t)}, \dots, x_n^{(t)})$;
  3. ...
  4. $x_j^{(t+1)} \sim p(x_j|x_1^{(t+1)}, \dots, x_{j-1}^{(t+1)}, x_{j+1}^{(t)}, \dots, x_n^{(t)})$;
  5. ...
  6. $x_n^{(t+1)} \sim p(x_n|x_1^{(t+1)}, x_2^{(t+1)}, \dots, x_{n-1}^{(t+1)})$
  • 41. Gibbs Sampling for LDA
๏ฐ Gibbs sampling in LDA
๏ฎ $\mathrm{Dir}(\vec{p}|\vec{\alpha}) = \dfrac{1}{\Delta(\vec{\alpha})}\prod_{k=1}^V p_k^{\alpha_k - 1}$, where $\Delta(\vec{\alpha})$ is the normalization factor: $\Delta(\vec{\alpha}) = \int \prod_{k=1}^V p_k^{\alpha_k - 1}\,d\vec{p}$
$$p(\vec{z}_m|\vec{\alpha}) = \int p(\vec{z}_m|\vec{\theta})\,p(\vec{\theta}|\vec{\alpha})\,d\vec{\theta} = \int \prod_{k=1}^V \theta_k^{n_k}\,\mathrm{Dir}(\vec{\theta}|\vec{\alpha})\,d\vec{\theta} = \frac{1}{\Delta(\vec{\alpha})}\int \prod_{k=1}^V \theta_k^{n_k + \alpha_k - 1}\,d\vec{\theta} = \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$$
$$p(\vec{z}|\vec{\alpha}) = \prod_{m=1}^M p(\vec{z}_m|\vec{\alpha}) = \prod_{m=1}^M \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})} \;\Longrightarrow\; p(\vec{w},\vec{z}|\vec{\alpha},\vec{\beta}) = \prod_{k=1}^K \frac{\Delta(\vec{n}_k + \vec{\beta})}{\Delta(\vec{\beta})}\prod_{m=1}^M \frac{\Delta(\vec{n}_m + \vec{\alpha})}{\Delta(\vec{\alpha})}$$
  • 42. Gibbs Sampling for LDA
๏ฐ Gibbs sampling in LDA
๏ฎ $p(\vec{\theta}_m|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\theta}_m|\vec{n}_{m,\neg i} + \vec{\alpha})$, $\quad p(\vec{\varphi}_k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = \mathrm{Dir}(\vec{\varphi}_k|\vec{n}_{k,\neg i} + \vec{\beta})$
$$p(z_i = k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) \propto p(z_i = k, w_i = t, \vec{\theta}_m, \vec{\varphi}_k|\vec{z}_{\neg i}, \vec{w}_{\neg i}) = E[\theta_{mk}]\cdot E[\varphi_{kt}] = \hat{\theta}_{mk}\cdot\hat{\varphi}_{kt}$$
$$\hat{\theta}_{mk} = \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^K \big(n_{m,\neg i}^{(k)} + \alpha_k\big)}, \qquad \hat{\varphi}_{kt} = \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^V \big(n_{k,\neg i}^{(t)} + \beta_t\big)}$$
$$p(z_i = k|\vec{z}_{\neg i}, \vec{w}) \propto \frac{n_{m,\neg i}^{(k)} + \alpha_k}{\sum_{k=1}^K \big(n_{m,\neg i}^{(k)} + \alpha_k\big)} \times \frac{n_{k,\neg i}^{(t)} + \beta_t}{\sum_{t=1}^V \big(n_{k,\neg i}^{(t)} + \beta_t\big)}$$
$$z_i^{(t+1)} \sim p(z_i = k|\vec{z}_{\neg i}, \vec{w}), \quad k = 1 \dots K$$
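The full conditional above drives the collapsed Gibbs sampler: remove word $i$'s current topic from the counts, resample $z_i$ from the product of the two smoothed count ratios, and put the counts back. A minimal sketch with symmetric priors (names, defaults, and the list-of-word-ids input format are my own choices):

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampler for LDA; docs is a list of word-id lists.
    Returns point estimates of theta (M x K) and phi (K x V)."""
    rng = np.random.default_rng(seed)
    n_mk = np.zeros((len(docs), K))   # doc-topic counts
    n_kt = np.zeros((K, V))           # topic-word counts
    n_k = np.zeros(K)                 # topic totals
    z = [rng.integers(K, size=len(d)) for d in docs]
    for m, d in enumerate(docs):
        for i, t in enumerate(d):
            k = z[m][i]
            n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
    for _ in range(iters):
        for m, d in enumerate(docs):
            for i, t in enumerate(d):
                k = z[m][i]           # remove word i from the counts
                n_mk[m, k] -= 1; n_kt[k, t] -= 1; n_k[k] -= 1
                # full conditional: (n_mk + alpha)(n_kt + beta)/(n_k + V beta)
                p = (n_mk[m] + alpha) * (n_kt[:, t] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[m][i] = k           # add it back with the new topic
                n_mk[m, k] += 1; n_kt[k, t] += 1; n_k[k] += 1
    theta = (n_mk + alpha) / (n_mk.sum(1, keepdims=True) + K * alpha)
    phi = (n_kt + beta) / (n_k[:, None] + V * beta)
    return theta, phi
```

The returned `theta` and `phi` are the smoothed posterior-mean estimates $\hat{\theta}_{mk}$ and $\hat{\varphi}_{kt}$ built from the final counts.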
  • 43. Q&A