2. “Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.”
“LSA assumes that words that are close in meaning will occur in similar pieces of text.”
4. LSA
Documents: D = {d_1, ..., d_n}
Vocabulary: W = {w_1, ..., w_m}
Term-document matrix: N_{n×m} = (n(d_i, w_j))_{ij}
5. N_{n\times m} =
\begin{pmatrix}
n(d_1, w_1) & n(d_1, w_2) & \cdots & n(d_1, w_m) \\
n(d_2, w_1) & n(d_2, w_2) & \cdots & n(d_2, w_m) \\
\vdots & \vdots & \ddots & \vdots \\
n(d_n, w_1) & n(d_n, w_2) & \cdots & n(d_n, w_m)
\end{pmatrix}
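As a rough illustration, here is a plain-Python sketch that builds such a count matrix from a few made-up documents (the toy corpus and variable names are assumptions, not from the deck) and reports the fraction of non-zero entries; a tiny toy corpus will not show the extreme sparsity of real abstract collections, but the computation is the same and ties into the sparseness point below.

```python
from collections import Counter

# Toy corpus (hypothetical example documents, just to illustrate n(d_i, w_j)).
docs = [
    "latent semantic analysis maps terms and documents to concepts",
    "the aspect model associates a latent class with each observation",
    "term document matrices from abstracts are extremely sparse",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})   # W = {w_1, ..., w_m}
counts = [Counter(doc) for doc in tokenized]

# N[i][j] = n(d_i, w_j): how often word w_j occurs in document d_i.
N = [[c[w] for w in vocab] for c in counts]

nonzero = sum(1 for row in N for x in row if x > 0)
print(f"{len(N)} x {len(vocab)} matrix, "
      f"{nonzero / (len(N) * len(vocab)):.1%} non-zero entries")
```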
Sparseness: a typical term-document matrix derived from short articles, text summaries, or abstracts may have only a small fraction of non-zero entries (typically well below 1%).
Underestimation: one has to account for synonyms in order not to underestimate the true similarity of documents.
Overestimation: one has to deal with polysemes to avoid overestimating the true similarity between documents by counting common terms that are used in different meanings.
11. Expectation Maximization (EM) algorithm
Expectation (E) step: posterior probabilities are computed for the latent variables, based on the current estimates of the parameters.
Maximization (M) step: parameters are updated based on the so-called expected complete-data log-likelihood, which depends on the posterior probabilities computed in the E-step.
E-step:
P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\, P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l)\, P(z_l \mid d_i)}

M-step:
P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{i=1}^{N} n(d_i, w_m)\, P(z_k \mid d_i, w_m)}

P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{n(d_i)}
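A compact numpy sketch of one way these E- and M-step updates can be implemented; the random toy counts, array names, and sizes below are assumptions for illustration, not taken from the deck.

```python
import numpy as np

# Toy pLSA EM sketch (hypothetical setup, purely illustrative).
rng = np.random.default_rng(0)
N_docs, M_words, K = 4, 6, 2
n = rng.integers(0, 5, size=(N_docs, M_words)).astype(float)   # n(d_i, w_j)

p_w_z = rng.random((M_words, K)); p_w_z /= p_w_z.sum(axis=0)   # P(w_j | z_k)
p_z_d = rng.random((K, N_docs)); p_z_d /= p_z_d.sum(axis=0)    # P(z_k | d_i)

for _ in range(50):
    # E-step: P(z_k | d_i, w_j) ∝ P(w_j | z_k) P(z_k | d_i), normalized over k.
    post = p_w_z[None, :, :] * p_z_d.T[:, None, :]              # shape (N, M, K)
    post /= post.sum(axis=2, keepdims=True) + 1e-12

    # M-step: re-estimate P(w_j | z_k) and P(z_k | d_i).
    nw = n[:, :, None] * post                                   # n(d_i, w_j) P(z_k | d_i, w_j)
    p_w_z = nw.sum(axis=0)
    p_w_z /= p_w_z.sum(axis=0, keepdims=True) + 1e-12           # normalize over words j
    p_z_d = nw.sum(axis=1).T
    p_z_d /= n.sum(axis=1)[None, :] + 1e-12                     # divide by n(d_i)
```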
12. SVD
U = (P(d_i \mid z_k))_{i,k}
V = (P(w_j \mid z_k))_{j,k}
\Sigma = \mathrm{diag}(P(z_k))_k
P = U \Sigma V^{\top}
P(d_i, w_j) = \sum_{k=1}^{K} P(z_k)\, P(d_i \mid z_k)\, P(w_j \mid z_k)
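A small numpy check of this correspondence: with toy probabilities (my own, purely illustrative), the mixture sum and the matrix product U Σ V^T give the same joint matrix.

```python
import numpy as np

# Sketch checking the SVD-style factorization of the pLSA joint (toy values).
rng = np.random.default_rng(1)
N_docs, M_words, K = 3, 5, 2

p_z = rng.random(K); p_z /= p_z.sum()                 # Σ = diag(P(z_k))
U = rng.random((N_docs, K)); U /= U.sum(axis=0)       # U[i, k] = P(d_i | z_k)
V = rng.random((M_words, K)); V /= V.sum(axis=0)      # V[j, k] = P(w_j | z_k)

P_svd = U @ np.diag(p_z) @ V.T                        # P = U Σ V^T
P_mix = np.einsum("k,ik,jk->ij", p_z, U, V)           # Σ_k P(z_k) P(d_i|z_k) P(w_j|z_k)
assert np.allclose(P_svd, P_mix)
print(P_svd.sum())                                    # joint P(d, w) sums to 1
```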
14. Aspect versus Cluster
P(w_j \mid d_i) = \sum_{k=1}^{K} P\{c(d_i) = c_k\}\, P(w_j \mid c_k)

P\{c(d_i) = c_k\} = \frac{P(c_k) \prod_{j=1}^{M} P(w_j \mid c_k)^{n(d_i, w_j)}}{\sum_{l=1}^{K} P(c_l) \prod_{j=1}^{M} P(w_j \mid c_l)^{n(d_i, w_j)}}
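A brief sketch of how the cluster posterior above can be evaluated for a single document; the toy values are assumptions, and the log-space computation is just a standard way to keep the product over words numerically stable.

```python
import numpy as np

# Cluster posterior P{c(d_i) = c_k} for one document (toy values, my own setup).
rng = np.random.default_rng(2)
K, M = 2, 6
p_c = rng.random(K); p_c /= p_c.sum()                   # P(c_k)
p_w_c = rng.random((M, K)); p_w_c /= p_w_c.sum(axis=0)  # P(w_j | c_k)
n_d = rng.integers(0, 4, size=M)                        # n(d_i, w_j) for document d_i

# log P(c_k) + Σ_j n(d_i, w_j) log P(w_j | c_k), then normalize over k.
log_post = np.log(p_c) + (n_d[:, None] * np.log(p_w_c)).sum(axis=0)
post = np.exp(log_post - log_post.max())
post /= post.sum()
print(post)                                             # P{c(d_i) = c_k}, sums to 1
```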
15. Aspect versus Cluster
Perplexity on hold-out counts n′(d_i, w_j):
P = \exp\left[ - \frac{\sum_{i,j} n'(d_i, w_j) \log P(w_j \mid d_i)}{\sum_{i,j} n'(d_i, w_j)} \right]

Tempered E-step (with inverse temperature β):
\tilde{P}(z_k; d_i, w_j) = \frac{\left[ P(z_k \mid d_i)\, P(w_j \mid z_k) \right]^{\beta}}{\sum_{l} \left[ P(z_l \mid d_i)\, P(w_j \mid z_l) \right]^{\beta}}
1. Set β ← 1 and perform EM with early stopping.
2. Decrease β ← ηβ (with η < 1) and perform one TEM iteration.
3. As long as the performance on hold-out data improves (non-negligibly), continue TEM iterations at this value of β; otherwise go to step 2.
4. Perform stopping on β, i.e., stop when decreasing β does not yield further improvements.
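A minimal sketch of the two quantities this schedule relies on, the tempered posterior and the perplexity on hold-out counts n′; the toy arrays, the single fixed β value, and the variable names are assumptions for illustration, not the deck's code.

```python
import numpy as np

# Toy tempered E-step and hold-out perplexity (hypothetical values).
rng = np.random.default_rng(3)
N_docs, M_words, K = 4, 6, 2
n_prime = rng.integers(0, 3, size=(N_docs, M_words)).astype(float)  # hold-out counts n'(d_i, w_j)

p_w_z = rng.random((M_words, K)); p_w_z /= p_w_z.sum(axis=0)        # P(w_j | z_k)
p_z_d = rng.random((K, N_docs)); p_z_d /= p_z_d.sum(axis=0)         # P(z_k | d_i)
beta = 0.9

# Tempered E-step: raise the joint factors to the power beta, normalize over k.
post = (p_w_z[None, :, :] * p_z_d.T[:, None, :]) ** beta            # shape (N, M, K)
post /= post.sum(axis=2, keepdims=True)
assert np.allclose(post.sum(axis=2), 1.0)

# Hold-out perplexity: P = exp(-Σ n' log P(w|d) / Σ n').
p_w_d = p_z_d.T @ p_w_z.T                                           # P(w_j | d_i) = Σ_k P(z_k|d_i) P(w_j|z_k)
perplexity = np.exp(-(n_prime * np.log(p_w_d)).sum() / n_prime.sum())
print(f"hold-out perplexity: {perplexity:.2f}")
```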