7. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
Wikipedia, MEDLINE LDA
Wikipedia "universal corpus"
12. Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW2008)
13. ?
Unlike normal documents, these text & Web segments are usually noisier, less topic-focused, and much shorter, that is, they consist of from a dozen words to a few sentences. Because of the short length, they do not provide enough word co-occurrence or shared context for a good similarity measure.
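The sparsity problem in the quote above can be seen in a small sketch. The two snippets below are hypothetical examples, not taken from the paper: they are about the same subject but share no words, so a plain bag-of-words cosine similarity is zero.

```python
import math
from collections import Counter


def cosine(a_tokens, b_tokens):
    """Cosine similarity between two bag-of-words count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short snippets on the same subject with no shared words.
s1 = "stock market crash".split()
s2 = "shares plunge on wall street".split()
word_sim = cosine(s1, s2)

# The same comparison once the snippets are long enough to overlap.
d1 = "stock market crash as shares plunge on wall street".split()
d2 = "shares plunge on wall street after stock sell off".split()
long_sim = cosine(d1, d2)
```

With so few words per text, related snippets often have nothing in common at the word level, which is the gap the hidden-topic features are meant to fill.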
16. LDA(model) -> MaxEnt(classifier)
LDA sparse text
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 3:993–1022, 2003.
SVM ME
SVM
SVM ( )
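One way the hand-off from the LDA model to the classifier could look is sketched below. The slide only states the pipeline LDA(model) -> MaxEnt(classifier); the cutoff, the step size, the topic names, and the idea of repeating a topic token in proportion to its probability are all assumptions made for illustration. LDA inference and the classifier itself are not shown.

```python
def topic_features(topic_probs, cutoff=0.05, step=0.1):
    """Turn a topic distribution into repeated pseudo-word tokens.

    topic_probs maps a topic name to its probability for one short text.
    Topics below `cutoff` are dropped; each remaining topic is repeated
    roughly once per `step` of probability mass, and at least once.
    Both parameters are hypothetical defaults.
    """
    feats = []
    for topic, p in sorted(topic_probs.items()):
        if p < cutoff:
            continue
        repeats = max(1, int(round(p / step)))
        feats.extend([topic] * repeats)
    return feats


def augment(tokens, topic_probs):
    """Append topic pseudo-words to the original word tokens."""
    return list(tokens) + topic_features(topic_probs)


# Hypothetical inferred topic distribution for one sparse snippet.
snippet = "shares plunge on wall street".split()
inferred = {"topic:finance": 0.62, "topic:travel": 0.31, "topic:sports": 0.02}
features = augment(snippet, inferred)
```

The augmented token list can then be fed to any bag-of-words classifier, so two snippets with no words in common can still share topic tokens.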
38. API
Get
1.
2. (redis)
3.
[Topic1:prob,Topic2:prob,]
JSON
39. PLDA
Collaborative Filtering for Orkut Communities: Discovery of User Latent Behavior. Wen-Yen Chen et al., WWW 2009.
http://www.cs.ucsb.edu/~wychen/publications/fp365-chen.pdf
…
The role of semantic history on online generative topic modeling. L AlSumait, D Barbará, C Domeniconi.
http://www.ise.gmu.edu/~carlotta/publications/Siam_SemOLDA.pdf
LDA …
R LDA author facebook/data
LDA
Not-So-Latent Dirichlet Allocation: Collapsed Gibbs Sampling Using Human Judgments
ePluribus: Ethnicity on Social Networks.
http://www.facebook.com/data#!/data?v=app_4949752878