1. Machine Learning
A Probabilistic Perspective
Chapter 27
Latent variable models for
discrete data
Keywords : topic model, LDA, graph structure
Kyoto University, Okuno lab.
M1 Ikemiya Yukara
2. Introduction
• Discrete data and Continuous data
• Latent variable model
• Text analysis
[Figure: a document represented as a binary bag-of-words vector (0 0 1 0 1 1 1 0 1 ...), with latent topics such as foods, computer, sports]

Bag of words : word order is ignored (e.g. "this is a pen").
Topic model : topics generate words, e.g. a latent "animal" topic generates words such as pig, dog, cat.
7. Mixture models
• Simplest model
• Model of count vectors
Model of the words in document i, where the latent topic $q_i = k$ selects topic k's word distribution $\mathbf{b}_k$:

$p(\mathbf{y}_{i,1:L_i} \mid q_i = k) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{b}_k)$

[Figure: a latent topic k with its word distribution $\mathbf{b}_k$ over words such as pig, dog, cat]

Model of the count vector $\mathbf{n}_i$ of the words in document i:

$p(\mathbf{n}_i \mid q_i = k, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{b}_k), \qquad L_i = \sum_v n_{iv}$

If $L_i$ is unknown, use independent Poissons:

$p(\mathbf{n}_i \mid q_i = k) = \prod_v \mathrm{Poi}(n_{iv} \mid \lambda_{kv})$
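To make the generative process concrete, here is a minimal NumPy sketch of the mixture of categoricals above; the sizes (K, V, N, L) and the uniform mixing weights are illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

K, V = 3, 10          # illustrative: 3 topics, vocabulary of 10 words
N, L = 5, 20          # 5 documents of 20 words each
pi = np.full(K, 1.0 / K)                  # mixing weights over topics
B = rng.dirichlet(np.ones(V), size=K)     # B[k] = b_k, topic k's word distribution

docs, topics = [], []
for i in range(N):
    q_i = rng.choice(K, p=pi)              # latent topic of document i
    y_i = rng.choice(V, size=L, p=B[q_i])  # words y_{i,1:L} ~ Cat(b_{q_i})
    topics.append(q_i)
    docs.append(y_i)

# Equivalent count-vector view: n_i ~ Mu(L, b_{q_i})
n = np.stack([np.bincount(y_i, minlength=V) for y_i in docs])
print(topics, n[0])
```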
8. Exponential family PCA (ePCA)
• Probabilistic PCA (PPCA) <Chap. 12>
• Categorical PCA
• Model of count vectors
Probabilistic PCA (continuous data):

$p(\mathbf{y}_i \mid \boldsymbol{\theta}) = \int \mathcal{N}(\mathbf{y}_i \mid \mathbf{W}\mathbf{z}_i, \sigma^2 \mathbf{I})\, \mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)\, d\mathbf{z}_i$

(likelihood x prior of latent variables)

Change the likelihood from continuous data to discrete or count data!

Categorical PCA:

$p(\mathbf{y}_{i,1:L_i} \mid \mathbf{z}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

where $\mathbf{W} \in \mathbb{R}^{V \times K}$ is the weight matrix, $\mathbf{z}_i \in \mathbb{R}^K$, and $\mathcal{S}$ is the softmax.

Model of count vectors:

$p(\mathbf{n}_i \mid \mathbf{z}_i, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathcal{S}(\mathbf{W}\mathbf{z}_i))$

Purpose : a more flexible model.
Idea : change the latent variable from discrete ($q_i$) to continuous ($\mathbf{z}_i$).
What is it doing? : dimension reduction from V (words) to K (topics).
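A minimal sketch of the categorical PCA likelihood (sizes are illustrative assumptions); the softmax turns the natural parameters $\mathbf{W}\mathbf{z}_i$ into a word distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    a = a - a.max()               # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum()

K, V, L = 2, 10, 20                  # illustrative sizes
W = rng.normal(size=(V, K))          # weight matrix, V x K
z_i = rng.normal(size=K)             # continuous latent vector for document i
p_i = softmax(W @ z_i)               # S(W z_i): word probabilities
y_i = rng.choice(V, size=L, p=p_i)   # y_il ~ Cat(S(W z_i))
n_i = np.bincount(y_i, minlength=V)  # count vector: n_i ~ Mu(L, S(W z_i))
```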
9. mPCA and LDA
• Multinomial PCA (mPCA)
• LDA
In ePCA, $\mathbf{W}\mathbf{z}_i$ represents the natural parameters of the exponential family.

– The natural parameter : a vector of log odds, $\mathbf{W}\mathbf{z}_i$
– The dual parameter : a probability vector, $\mathbf{B}\boldsymbol{\pi}_i$

Multinomial PCA (mPCA):

$\boldsymbol{\pi}_i \sim \mathrm{Dir}(\alpha \mathbf{1}_K), \qquad p(\mathbf{n}_i \mid \boldsymbol{\pi}_i, L_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{B}\boldsymbol{\pi}_i)$

LDA:

$p(\mathbf{y}_{i,1:L_i} \mid \boldsymbol{\pi}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{B}\boldsymbol{\pi}_i)$

[Figure: B is a topic-by-word matrix (topics 1..K, words 1..V); $\boldsymbol{\pi}_i$ is a probability vector over the K topics, and $\mathbf{B}\boldsymbol{\pi}_i$ is a probability vector over the V words (e.g. multinomial)]
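The contrast between the two parameterizations in a short NumPy sketch (sizes illustrative): ePCA must push $\mathbf{W}\mathbf{z}$ through a softmax, while $\mathbf{B}\boldsymbol{\pi}_i$ is already a probability vector because it is a convex combination of the columns of B.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V = 3, 10

# ePCA: natural parameters (log odds) -> need a softmax to normalize
W, z = rng.normal(size=(V, K)), rng.normal(size=K)
eta = W @ z
p_epca = np.exp(eta) / np.exp(eta).sum()

# mPCA / LDA: dual parameters -> B pi is already a probability vector
B = rng.dirichlet(np.ones(V), size=K).T  # V x K, column k is b_k
pi = rng.dirichlet(np.ones(K))           # topic distribution of one document
p_mpca = B @ pi                          # mixture of the columns of B
assert np.isclose(p_mpca.sum(), 1.0)
```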
10. Latent Dirichlet allocation (LDA)
• Main purpose
• Advantage
• Dirichlet distribution
Unsupervised discovery of topics.
LDA can handle ambiguity (polysemy).
- To play ball
- To play the cornet
- Shakespeare’s play
$\sum_i \theta_i = 1, \qquad 0 \le \theta_i \le 1$

[Figure: Dirichlet distributions over the 3-dimensional probability simplex]
11. Latent Dirichlet allocation (LDA)
• Full model
$\boldsymbol{\pi}_i \mid \boldsymbol{\alpha} \sim \mathrm{Dir}(\alpha \mathbf{1}_K), \qquad \boldsymbol{\pi}_i \equiv \{\pi_{i,1},\ldots,\pi_{i,K}\}$
$q_{il} \mid \boldsymbol{\pi}_i \sim \mathrm{Cat}(\boldsymbol{\pi}_i)$
$\mathbf{b}_k \mid \gamma \sim \mathrm{Dir}(\gamma \mathbf{1}_V), \qquad \mathbf{b}_k \equiv \{b_{k,1},\ldots,b_{k,V}\}$
$y_{il} \mid q_{il} = k, \mathbf{B} \sim \mathrm{Cat}(\mathbf{b}_k)$

[Plate diagram: $\alpha \to \boldsymbol{\pi}_i \to q_{i,l} \to y_{i,l} \leftarrow \mathbf{B} \leftarrow \gamma$, with $l = 1..L_i$ nested inside the document plate $i = 1..N$]

where
$\boldsymbol{\pi}_i$ : topic distribution of document i
$q_{i,l}$ : topic of the l-th word in document i
$\mathbf{b}_k$ : word distribution of topic k
$y_{i,l}$ : the l-th word in document i
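A minimal sketch of this generative process (the hyperparameters $\alpha$, $\gamma$ and the sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, N, L = 3, 10, 5, 20   # illustrative sizes
alpha, gamma = 1.0, 1.0     # symmetric Dirichlet hyperparameters

B = rng.dirichlet(gamma * np.ones(V), size=K)  # b_k | gamma ~ Dir(gamma 1_V)
docs = []
for i in range(N):
    pi_i = rng.dirichlet(alpha * np.ones(K))   # pi_i | alpha ~ Dir(alpha 1_K)
    q_i = rng.choice(K, size=L, p=pi_i)        # q_il | pi_i ~ Cat(pi_i)
    y_i = np.array([rng.choice(V, p=B[q]) for q in q_i])  # y_il ~ Cat(b_{q_il})
    docs.append(y_i)
```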
,
12. Latent Dirichlet allocation (LDA)
[Figure: unsupervised discovery of topics; two word distributions $\mathbf{b}_1$ and $\mathbf{b}_2$ over words 1-3, giving a dimension reduction from 3 (words) to 2 (topics)]
13. Evaluation of LDA
• Perplexity
– Evaluation as a language model
Perplexity of a language model q on N test documents:

$\mathrm{perplexity}(p_{\mathrm{emp}}, q) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \sum_{l=1}^{L_i} \log q(y_{i,l}) \right)$
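A small sketch of this evaluation, assuming the model is given as a per-word log-probability function; as a sanity check, a uniform unigram model over V words has perplexity exactly V.

```python
import numpy as np

def perplexity(test_docs, log_q):
    """test_docs: list of word-id arrays; log_q(y) -> log probability of word y."""
    avg = np.mean([np.mean([log_q(y) for y in doc]) for doc in test_docs])
    return np.exp(-avg)

# Toy check against a uniform unigram model over V words:
V = 10
docs = [np.random.default_rng(0).integers(V, size=20) for _ in range(5)]
print(perplexity(docs, lambda y: np.log(1.0 / V)))  # prints 10.0 (= V)
```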
14. Extensions of LDA
• Correlated topic model
• Dynamic topic model
Correlated topic model : replace the Dirichlet prior on topic proportions with a logistic normal, so that topics (e.g. business, finance, animal) can be correlated:

$\mathbf{z}_i \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \boldsymbol{\pi}_i = \mathcal{S}(\mathbf{z}_i)$

($\boldsymbol{\Sigma}$ captures the correlation; the softmax $\mathcal{S}$ does the normalization.)

Dynamic topic model : let each topic's word distribution drift over time, e.g. the topic "neuroscience" is dominated by "nerve" in the 1900s and "calcium receptor" in the 2000s:

$\mathbf{b}_{t,k} \mid \mathbf{b}_{t-1,k} \sim \mathcal{N}(\mathbf{b}_{t-1,k}, \sigma^2 \mathbf{I}_V)$
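A minimal sketch of the correlated topic model's logistic-normal draw; the covariance matrix here is an illustrative assumption that makes topics 1 and 2 co-occur.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
mu = np.zeros(K)
Sigma = np.array([[ 1.0,  0.8, -0.5],   # illustrative: topics 1 and 2 correlated
                  [ 0.8,  1.0, -0.5],
                  [-0.5, -0.5,  1.0]])

z_i = rng.multivariate_normal(mu, Sigma)  # z_i ~ N(mu, Sigma)
pi_i = np.exp(z_i) / np.exp(z_i).sum()    # pi_i = S(z_i): softmax normalization
```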
15. Extensions of LDA
• LDA-HMM
– HMM generates syntactically correct sentences,
but not semantically plausible ones.
HMM : function (syntactic) words
LDA : content (semantic) words

$A^{\mathrm{HMM}}$ : the state-state HMM transition matrix
$B^{\mathrm{HMM}}$ : the state-word HMM emission matrix
17. LVMs for graph-structured data
1. Discover some "interesting structure" in the graph,
such as clusters and communities.
2. Predict which links might occur in the future
(e.g. who will make friends with whom).
18. LVMs for graph-structured data
• Stochastic block model
$q_i \sim \mathrm{Cat}(\boldsymbol{\pi}), \qquad \eta_{a,b} \sim \mathrm{Beta}(\alpha, \beta)$

$p(R_{i,j} = r \mid q_i = a, q_j = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b})$

where $R$ is the adjacency matrix over the nodes and $\eta_{a,b}$ is the probability of connecting group a to group b.
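A minimal sketch of sampling a graph from the stochastic block model (group count, node count and Beta hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 30                 # illustrative: 3 groups, 30 nodes
pi = np.full(K, 1.0 / K)
alpha, beta = 1.0, 1.0
eta = rng.beta(alpha, beta, size=(K, K))  # eta[a, b] ~ Beta(alpha, beta)

q = rng.choice(K, size=N, p=pi)           # q_i ~ Cat(pi)
R = rng.random((N, N)) < eta[q[:, None], q[None, :]]  # R_ij ~ Ber(eta[q_i, q_j])
```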
19. LVMs for graph-structured data
• Mixed membership stochastic block model
– Lift the restriction that each node belongs to only
one cluster
Instead of a hard assignment $q_i \in \{1,\ldots,K\}$, each node gets a membership vector $\boldsymbol{\pi}_i \in S_K$ (the probability simplex):

$\boldsymbol{\pi}_i \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad q_{i \to j} \sim \mathrm{Cat}(\boldsymbol{\pi}_i), \qquad q_{i \leftarrow j} \sim \mathrm{Cat}(\boldsymbol{\pi}_j)$

$p(R_{i,j} = r \mid q_{i \to j} = a, q_{i \leftarrow j} = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b})$

[Figure: a who-likes-whom graph labeled by hand, with the inferred mixed-membership vectors $\boldsymbol{\pi}_i$]
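The same sketch with mixed membership: each directed edge draws its own sender and receiver clusters from the nodes' membership vectors (sizes and the Dirichlet hyperparameter are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

K, N = 3, 30
alpha = np.full(K, 0.3)          # illustrative Dirichlet hyperparameter
eta = rng.beta(1.0, 1.0, size=(K, K))

Pi = rng.dirichlet(alpha, size=N)           # pi_i ~ Dir(alpha), one per node
R = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(N):
        a = rng.choice(K, p=Pi[i])          # q_{i->j} ~ Cat(pi_i)
        b = rng.choice(K, p=Pi[j])          # q_{i<-j} ~ Cat(pi_j)
        R[i, j] = rng.random() < eta[a, b]  # R_ij ~ Ber(eta[a, b])
```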
21. LVMs for relational data
$R \subseteq T_1 \times T_2 \times \cdots \times T_K$ : relation over entity types $T_1, \ldots, T_K$

Example: $R_1 : T_1 \times T_1 \times T_2 \to \{0,1\}$, a 3d binary matrix with $T_1$ = proteins and $T_2$ = chemicals, where $R_1(i,j,k) = 1$ means: protein i interacts with protein j when chemical k is present.

Extend the stochastic block model : each entity of each type gets a cluster assignment $q_i^t \in \{1,\ldots,K_t\}$, and

$p(R_1(i,j,k) = 1 \mid q_i^1 = a, q_j^1 = b, q_k^2 = c) = \eta_{a,b,c}$
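A minimal sketch of the three-way extension, assuming illustrative sizes for the protein/chemical example above:

```python
import numpy as np

rng = np.random.default_rng(0)

K1, K2 = 3, 2                 # illustrative cluster counts for each type
N1, N2 = 20, 5                # 20 proteins (T1), 5 chemicals (T2)

q1 = rng.integers(K1, size=N1)  # q_i^1: cluster of each protein
q2 = rng.integers(K2, size=N2)  # q_k^2: cluster of each chemical
eta = rng.beta(1.0, 1.0, size=(K1, K1, K2))

# R[i, j, k] ~ Ber(eta[q1[i], q1[j], q2[k]]): 3d binary relation
R = rng.random((N1, N1, N2)) < eta[q1[:, None, None],
                                   q1[None, :, None],
                                   q2[None, None, :]]
```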
22. Infinite relational model (IRM)
• Idea
– Using a Dirichlet process
• Inference
– Variational Bayes
– Collapsed Gibbs sampling
Let the number of clusters $K_t$ for each type be (potentially) infinite.

We just sketch some interesting applications.
23. Applications of IRM
• Learning ontologies
– Organization of knowledge
What is "disease"? What does it do?

Semantic network:
T1 : 135 concepts (e.g. "disease", "diagnostic procedure", "animal")
T2 : 49 predicates (e.g. "affects", "prevents")
$R_1 : T_1 \times T_1 \times T_2 \to \{0,1\}$

Result : the system found 14 concept clusters and 21 predicate clusters (e.g. "biological functions affect organisms").
24. Summary
• Topic model
– Latent Dirichlet allocation (LDA)
• Graph structure
– Stochastic block model
• Relational data
– Infinite relational model