1. Machine Learning
A Probabilistic Perspective
Chapter 27
Latent variable models for
discrete data
Keywords : topic model, LDA, graph structure
Kyoto University, Okuno lab.
M1 Ikemiya Yukara
2. Introduction
• Discrete data and Continuous data
• Latent variable model
• Text analysis
[Figure: a document represented as a binary bag-of-words vector (0 0 1 0 1 1 1 0 1 ...), with latent topics such as foods, computer, sports]

Bag of words : word order is ignored (e.g. "this is a pen").
Topic model : topics generate words, e.g. a latent "animal" topic generates words such as pig, dog, cat.
7. Mixture models
• Simplest model
• Model of count vectors
Model of the words in document i, where the latent topic $q_i = k$ selects topic k's word distribution $\mathbf{b}_k$:

$p(\mathbf{y}_{i,1:L_i} \mid q_i = k) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{b}_k)$

[Figure: a latent topic k with its word distribution $\mathbf{b}_k$ over words such as pig, dog, cat]

Model of the count vector $\mathbf{n}_i$ of the words in document i:

$p(\mathbf{n}_i \mid q_i = k, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{b}_k), \qquad L_i = \sum_v n_{iv}$

If $L_i$ is unknown, use independent Poissons:

$p(\mathbf{n}_i \mid q_i = k) = \prod_v \mathrm{Poi}(n_{iv} \mid \lambda_{kv})$
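To make the generative process concrete, here is a minimal NumPy sketch of the mixture of categoricals above; the sizes (K, V, N, L) and the uniform mixing weights are illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10          # illustrative: 3 topics, vocabulary of 10 words
N, L = 5, 20          # 5 documents of 20 words each
pi = np.full(K, 1.0 / K)                  # mixing weights over topics
B = rng.dirichlet(np.ones(V), size=K)     # B[k] = b_k, topic k's word distribution

docs, topics = [], []
for i in range(N):
    q_i = rng.choice(K, p=pi)              # latent topic of document i
    y_i = rng.choice(V, size=L, p=B[q_i])  # words y_{i,1:L} ~ Cat(b_{q_i})
    topics.append(q_i)
    docs.append(y_i)

# Equivalent count-vector view: n_i ~ Mu(L, b_{q_i})
n = np.stack([np.bincount(y_i, minlength=V) for y_i in docs])
print(topics, n[0])
```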
8. Exponential family PCA (ePCA)
• Probabilistic PCA (PPCA) <Chap. 12>
• Categorical PCA
• Model of count vectors
Probabilistic PCA (continuous data):

$p(\mathbf{y}_i \mid \boldsymbol{\theta}) = \int \mathcal{N}(\mathbf{y}_i \mid \mathbf{W}\mathbf{z}_i, \sigma^2 \mathbf{I})\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\, d\mathbf{z}_i$

(likelihood x prior of latent variables)

Change the likelihood from continuous data to discrete or count data!

Categorical PCA:

$p(\mathbf{y}_{i,1:L_i} \mid \mathbf{z}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

where $\mathbf{W} \in \mathbb{R}^{V \times K}$ is the weight matrix, $\mathbf{z}_i \in \mathbb{R}^K$, and $\mathcal{S}$ is the softmax.

Model of count vectors:

$p(\mathbf{n}_i \mid \mathbf{z}_i, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

Purpose : a more flexible model.
Idea : change the latent variable from discrete ($q_i$) to continuous ($\mathbf{z}_i$).
What is it doing? : dimension reduction from V (words) to K (topics).
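A minimal sketch of the categorical PCA likelihood (sizes are illustrative assumptions); the softmax turns the natural parameters $\mathbf{W}\mathbf{z}_i$ into a word distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    a = a - a.max()               # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

K, V, L = 2, 10, 20                  # illustrative sizes
W = rng.normal(size=(V, K))          # weight matrix, V x K
z_i = rng.normal(size=K)             # continuous latent vector for document i
p_i = softmax(W @ z_i)               # S(W z_i): word probabilities
y_i = rng.choice(V, size=L, p=p_i)   # y_il ~ Cat(S(W z_i))
n_i = np.bincount(y_i, minlength=V)  # count vector: n_i ~ Mu(L, S(W z_i))
```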
9. mPCA and LDA
• Multinomial PCA (mPCA)
• LDA
In ePCA, $\mathbf{W}\mathbf{z}_i$ represents the natural parameters of the exponential family.

– The natural parameter : a vector of log odds, $\mathbf{W}\mathbf{z}_i$
– The dual parameter : a probability vector, $\mathbf{B}\boldsymbol{\pi}_i$

Multinomial PCA (mPCA):

$\boldsymbol{\pi}_i \sim \mathrm{Dir}(\alpha \mathbf{1}_K), \qquad p(\mathbf{n}_i \mid \boldsymbol{\pi}_i, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{B}\boldsymbol{\pi}_i)$

LDA:

$p(\mathbf{y}_{i,1:L_i} \mid \boldsymbol{\pi}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{B}\boldsymbol{\pi}_i)$

[Figure: B is a topic-by-word matrix (topics 1..K, words 1..V); $\boldsymbol{\pi}_i$ is a probability vector over the K topics, and $\mathbf{B}\boldsymbol{\pi}_i$ is a probability vector over the V words (e.g. multinomial)]
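The contrast between the two parameterizations in a short NumPy sketch (sizes illustrative): ePCA must push $\mathbf{W}\mathbf{z}$ through a softmax, while $\mathbf{B}\boldsymbol{\pi}_i$ is already a probability vector because it is a convex combination of the columns of B.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 10

# ePCA: natural parameters (log odds) -> need a softmax to normalize
W, z = rng.normal(size=(V, K)), rng.normal(size=K)
eta = W @ z
p_epca = np.exp(eta) / np.exp(eta).sum()

# mPCA / LDA: dual parameters -> B pi is already a probability vector
B = rng.dirichlet(np.ones(V), size=K).T  # V x K, column k is b_k
pi = rng.dirichlet(np.ones(K))           # topic distribution of one document
p_mpca = B @ pi                          # mixture of the columns of B
assert np.isclose(p_mpca.sum(), 1.0)
```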
10. Latent Dirichlet allocation (LDA)
• Main purpose
• Advantage
• Dirichlet distribution
Unsupervised discovery of topics.
LDA can handle ambiguity (polysemy).
- To play ball
- To play the cornet
- Shakespeare’s play
$\sum_i \theta_i = 1, \qquad 0 \le \theta_i \le 1$

[Figure: Dirichlet distributions over the 3-dimensional probability simplex]
11. Latent Dirichlet allocation (LDA)
• Full model
$\boldsymbol{\pi}_i \mid \boldsymbol{\alpha} \sim \mathrm{Dir}(\alpha \mathbf{1}_K), \qquad \boldsymbol{\pi}_i \equiv \{\pi_{i,1},\ldots,\pi_{i,K}\}$
$q_{il} \mid \boldsymbol{\pi}_i \sim \mathrm{Cat}(\boldsymbol{\pi}_i)$
$\mathbf{b}_k \mid \gamma \sim \mathrm{Dir}(\gamma \mathbf{1}_V), \qquad \mathbf{b}_k \equiv \{b_{k,1},\ldots,b_{k,V}\}$
$y_{il} \mid q_{il} = k, \mathbf{B} \sim \mathrm{Cat}(\mathbf{b}_k)$

[Plate diagram: $\alpha \to \boldsymbol{\pi}_i \to q_{i,l} \to y_{i,l} \leftarrow \mathbf{B} \leftarrow \gamma$, with $l = 1..L_i$ nested inside the document plate $i = 1..N$]

where
$\boldsymbol{\pi}_i$ : topic distribution of document i
$q_{i,l}$ : topic of the l-th word in document i
$\mathbf{b}_k$ : word distribution of topic k
$y_{i,l}$ : the l-th word in document i
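A minimal sketch of this generative process (the hyperparameters $\alpha$, $\gamma$ and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N, L = 3, 10, 5, 20   # illustrative sizes
alpha, gamma = 1.0, 1.0     # symmetric Dirichlet hyperparameters

B = rng.dirichlet(gamma * np.ones(V), size=K)  # b_k | gamma ~ Dir(gamma 1_V)
docs = []
for i in range(N):
    pi_i = rng.dirichlet(alpha * np.ones(K))   # pi_i | alpha ~ Dir(alpha 1_K)
    q_i = rng.choice(K, size=L, p=pi_i)        # q_il | pi_i ~ Cat(pi_i)
    y_i = np.array([rng.choice(V, p=B[q]) for q in q_i])  # y_il ~ Cat(b_{q_il})
    docs.append(y_i)
```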
,
12. Latent Dirichlet allocation (LDA)
[Figure: unsupervised discovery of topics; two word distributions $\mathbf{b}_1$ and $\mathbf{b}_2$ over words 1-3, giving a dimension reduction from 3 (words) to 2 (topics)]
13. Evaluation of LDA
• Perplexity
– Evaluation as a language model
Perplexity of a language model q on N test documents:

$\mathrm{perplexity}(p_{\mathrm{emp}}, q) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \sum_{l=1}^{L_i} \log q(y_{i,l}) \right)$
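A small sketch of this evaluation, assuming the model is given as a per-word log-probability function; as a sanity check, a uniform unigram model over V words has perplexity exactly V.

```python
import numpy as np

def perplexity(test_docs, log_q):
    """test_docs: list of word-id arrays; log_q(y) -> log probability of word y."""
    avg = np.mean([np.mean([log_q(y) for y in doc]) for doc in test_docs])
    return np.exp(-avg)

# Toy check against a uniform unigram model over V words:
V = 10
docs = [np.random.default_rng(0).integers(V, size=20) for _ in range(5)]
print(perplexity(docs, lambda y: np.log(1.0 / V)))  # prints 10.0 (= V)
```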
14. Extensions of LDA
• Correlated topic model
• Dynamic topic model
Correlated topic model : replace the Dirichlet prior on topic proportions with a logistic normal, so that topics (e.g. business, finance, animal) can be correlated:

$\mathbf{z}_i \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \boldsymbol{\pi}_i = \mathcal{S}(\mathbf{z}_i)$

($\boldsymbol{\Sigma}$ captures the correlation; the softmax $\mathcal{S}$ does the normalization.)

Dynamic topic model : let each topic's word distribution drift over time, e.g. the topic "neuroscience" is dominated by "nerve" in the 1900s and "calcium receptor" in the 2000s:

$\mathbf{b}_{t,k} \mid \mathbf{b}_{t-1,k} \sim \mathcal{N}(\mathbf{b}_{t-1,k}, \sigma^2 \mathbf{I}_V)$
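A minimal sketch of the correlated topic model's logistic-normal draw; the covariance matrix here is an illustrative assumption that makes topics 1 and 2 co-occur.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
mu = np.zeros(K)
Sigma = np.array([[ 1.0,  0.8, -0.5],   # illustrative: topics 1 and 2 correlated
                  [ 0.8,  1.0, -0.5],
                  [-0.5, -0.5,  1.0]])

z_i = rng.multivariate_normal(mu, Sigma)  # z_i ~ N(mu, Sigma)
pi_i = np.exp(z_i) / np.exp(z_i).sum()    # pi_i = S(z_i): softmax normalization
```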
15. Extensions of LDA
• LDA-HMM
– HMM generates syntactically correct sentences,
but not semantically plausible ones.
HMM : function (syntactic) words
LDA : content (semantic) words

$A^{\mathrm{HMM}}$ : the state-state HMM transition matrix
$B^{\mathrm{HMM}}$ : the state-word HMM emission matrix
17. LVMs for graph-structured data
1. Discover some "interesting structure" in the graph,
such as clusters and communities.
2. Predict which links might occur in the future
(e.g. who will make friends with whom).
18. LVMs for graph-structured data
• Stochastic block model
$q_i \sim \mathrm{Cat}(\boldsymbol{\pi}), \qquad \eta_{a,b} \sim \mathrm{Beta}(\alpha, \beta)$

$p(R_{i,j} = r \mid q_i = a, q_j = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b})$

where $R$ is the adjacency matrix over the nodes and $\eta_{a,b}$ is the probability of connecting group a to group b.
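A minimal sketch of sampling a graph from the stochastic block model (group count, node count and Beta hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 30                 # illustrative: 3 groups, 30 nodes
pi = np.full(K, 1.0 / K)
alpha, beta = 1.0, 1.0
eta = rng.beta(alpha, beta, size=(K, K))  # eta[a, b] ~ Beta(alpha, beta)

q = rng.choice(K, size=N, p=pi)           # q_i ~ Cat(pi)
R = rng.random((N, N)) < eta[q[:, None], q[None, :]]  # R_ij ~ Ber(eta[q_i, q_j])
```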
19. LVMs for graph-structured data
• Mixed membership stochastic block model
– Lift the restriction that each node belongs to only
one cluster
Instead of a hard assignment $q_i \in \{1,\ldots,K\}$, each node gets a membership vector $\boldsymbol{\pi}_i \in S_K$ (the probability simplex):

$\boldsymbol{\pi}_i \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad q_{i \to j} \sim \mathrm{Cat}(\boldsymbol{\pi}_i), \qquad q_{i \leftarrow j} \sim \mathrm{Cat}(\boldsymbol{\pi}_j)$

$p(R_{i,j} = r \mid q_{i \to j} = a, q_{i \leftarrow j} = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b})$

[Figure: a who-likes-whom graph labeled by hand, with the inferred mixed-membership vectors $\boldsymbol{\pi}_i$]
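The same sketch with mixed membership: each directed edge draws its own sender and receiver clusters from the nodes' membership vectors (sizes and the Dirichlet hyperparameter are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 30
alpha = np.full(K, 0.3)          # illustrative Dirichlet hyperparameter
eta = rng.beta(1.0, 1.0, size=(K, K))

Pi = rng.dirichlet(alpha, size=N)           # pi_i ~ Dir(alpha), one per node
R = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(N):
        a = rng.choice(K, p=Pi[i])          # q_{i->j} ~ Cat(pi_i)
        b = rng.choice(K, p=Pi[j])          # q_{i<-j} ~ Cat(pi_j)
        R[i, j] = rng.random() < eta[a, b]  # R_ij ~ Ber(eta[a, b])
```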
21. LVMs for relational data
$R \subseteq T_1 \times T_2 \times \cdots \times T_K$ : relation over entity types $T_1, \ldots, T_K$

Example: $R_1 : T_1 \times T_1 \times T_2 \to \{0,1\}$, a 3d binary matrix with $T_1$ = proteins and $T_2$ = chemicals, where $R_1(i,j,k) = 1$ means: protein i interacts with protein j when chemical k is present.

Extend the stochastic block model : each entity of each type gets a cluster assignment $q_i^t \in \{1,\ldots,K_t\}$, and

$p(R_1(i,j,k) = 1 \mid q_i^1 = a, q_j^1 = b, q_k^2 = c) = \eta_{a,b,c}$
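A minimal sketch of the three-way extension, assuming illustrative sizes for the protein/chemical example above:

```python
import numpy as np

rng = np.random.default_rng(0)

K1, K2 = 3, 2                 # illustrative cluster counts for each type
N1, N2 = 20, 5                # 20 proteins (T1), 5 chemicals (T2)

q1 = rng.integers(K1, size=N1)  # q_i^1: cluster of each protein
q2 = rng.integers(K2, size=N2)  # q_k^2: cluster of each chemical
eta = rng.beta(1.0, 1.0, size=(K1, K1, K2))

# R[i, j, k] ~ Ber(eta[q1[i], q1[j], q2[k]]): 3d binary relation
R = rng.random((N1, N1, N2)) < eta[q1[:, None, None],
                                   q1[None, :, None],
                                   q2[None, None, :]]
```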
22. Infinite relational model (IRM)
• Idea
– Using a Dirichlet process
• Inference
– Variational Bayes
– Collapsed Gibbs sampling
Let the number of clusters $K_t$ for each type be (potentially) infinite.

We just sketch some interesting applications.
23. Applications of IRM
• Learning ontologies
– Organization of knowledge
What is "disease"? What does it do?

Semantic network:
T1 : 135 concepts (e.g. "disease", "diagnostic procedure", "animal")
T2 : 49 predicates (e.g. "affects", "prevents")
$R_1 : T_1 \times T_1 \times T_2 \to \{0,1\}$

Result : the system found 14 concept clusters and 21 predicate clusters (e.g. "biological functions affect organisms").
24. Summary
• Topic model
– Latent Dirichlet allocation (LDA)
• Graph structure
– Stochastic block model
• Relational data
– Infinite relational model