[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametric Topic Model for Labeled Data

2012/07/28
Nakatani Shuyo @ Cybozu Labs, Inc
twitter : @shuyo

LDA (Latent Dirichlet Allocation) [Blei+ 03]
• Unsupervised Topic Model
  – Each word has an unobserved topic
• Parametric
  – The topic size K is given in advance




(Figure: LDA, via Wikipedia)
Labeled LDA [Ramage+ 09]

• Supervised Topic Model
  – Each document has an observed label
• Parametric




(Figure: Labeled LDA, via [Ramage+ 09])
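Generative Process for L-LDA
• $\boldsymbol{\beta}_k \sim \mathrm{Dir}(\boldsymbol{\eta})$
• $\Lambda_k^{(d)} \sim \mathrm{Bernoulli}(\Phi_k)$
  – topics corresponding to observed labels
• $\boldsymbol{\theta}^{(d)} \sim \mathrm{Dir}(\boldsymbol{\alpha}^{(d)})$
  – where $\boldsymbol{\alpha}^{(d)} = (\alpha_k)_{k : \Lambda_k^{(d)} = 1}$, i.e. restricted to labeled parameters
• $z_i^{(d)} \sim \mathrm{Multi}(\boldsymbol{\theta}^{(d)})$
• $w_i^{(d)} \sim \mathrm{Multi}(\boldsymbol{\beta}_{z_i^{(d)}})$

(Figure via [Ramage+ 09])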
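The generative process above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code from llda.py: the function name `generate_llda_document`, the sizes `K` and `V`, and the hyperparameter values are all assumptions made for the example.

```python
import numpy as np

def generate_llda_document(labels, beta, alpha, n_words, rng):
    """Sample one document from the L-LDA generative process (sketch).

    labels : indices k with Lambda_k = 1 (the observed labels of the document)
    beta   : (K, V) array, row k is the word distribution of topic k
    alpha  : (K,) array of Dirichlet hyperparameters
    """
    labels = np.asarray(labels)
    # theta is drawn only over the labeled topics (alpha restricted to labels)
    theta = rng.dirichlet(alpha[labels])
    # z_i ~ Multi(theta), mapped back to global topic indices
    z = labels[rng.choice(len(labels), size=n_words, p=theta)]
    # w_i ~ Multi(beta_{z_i})
    V = beta.shape[1]
    w = np.array([rng.choice(V, p=beta[k]) for k in z])
    return z, w

rng = np.random.default_rng(0)
K, V = 4, 10                                    # hypothetical sizes
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # beta_k ~ Dir(eta)
alpha = np.full(K, 0.5)
z, w = generate_llda_document([0, 2], beta, alpha, n_words=20, rng=rng)
```

Every topic assignment in `z` is one of the document's labels, which is exactly the restriction that makes L-LDA supervised.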
Pros/Cons of L-LDA
• Pros
  – Easy to implement


• Cons
  – The label-topic correspondence must be specified manually
     • Its performance depends on that correspondence

(Figure via [Ramage+ 09])

         ※) My implementation is here : https://github.com/shuyo/iir/blob/master/lda/llda.py
DP-MRM [Kim+ 12]
  – Dirichlet Process with Mixed Random Measures

• Supervised Topic Model
• Nonparametric
  – K is not the topic size, but the label size
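
(Figure: graphical model of DP-MRM over $\alpha$, $H$, $G_0^k$, $G_j$, $\theta_{ji}$, $x_{ji}$, $\lambda_j$, $r_j$, $\beta$, $\gamma_k$, $\eta$, with plates $K$, $N_j$, $D$)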
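Generative Process for DP-MRM
• $H = \mathrm{Dir}(\beta)$
• $G_0^k \sim \mathrm{DP}(\gamma_k, H)$
  – Each label has a random measure as topic space
• $\lambda_j \sim \mathrm{Dir}(\boldsymbol{r}_j \eta)$ where $\boldsymbol{r}_j = \bigl(I(k \in \mathrm{label}(j))\bigr)_k$
• $G_j \sim \mathrm{DP}\bigl(\alpha, \sum_{k \in \mathrm{label}(j)} \lambda_{jk} G_0^k\bigr)$
  – mixed random measures
• $\theta_{ji} \sim G_j$, $x_{ji} \sim F(\theta_{ji}) = \mathrm{Multi}(\theta_{ji})$

(Figure: graphical model of DP-MRM, same as on the previous slide)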
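Stick Breaking Process
• $v_l^k \sim \mathrm{Beta}(1, \gamma_k)$, $\pi_l^k = v_l^k \prod_{d=0}^{l-1} (1 - v_d^k)$
• $\phi_l^k \sim H$, $G_0^k = \sum_{l=0}^{\infty} \pi_l^k \delta_{\phi_l^k}$
• $\lambda_j \sim \mathrm{Dir}(\boldsymbol{r}_j \eta)$, $w_{jt} \sim \mathrm{Beta}(1, \alpha)$, $\pi_{jt} = w_{jt} \prod_{d=0}^{t-1} (1 - w_{jd})$
• $k_{jt} \sim \mathrm{Multi}(\lambda_j)$, $\psi_{jt} \sim G_0^{k_{jt}}$, $G_j = \sum_{t=0}^{\infty} \pi_{jt} \delta_{\psi_{jt}}$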
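As a rough illustration of the stick-breaking construction above, the sketch below samples one document from a truncated version of the model. The truncation levels `L` and `T`, the sizes `K` and `V`, the hyperparameter values, and the helper name `stick_breaking` are assumptions for the example; the infinite sums are cut off by forcing the last stick to take all remaining mass.

```python
import numpy as np

def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights: pi_l = v_l * prod_{d<l} (1 - v_d)."""
    v = rng.beta(1.0, concentration, size=truncation)
    v[-1] = 1.0  # truncation assumption: the last stick takes the remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(0)
K, V, L, T = 3, 8, 20, 15          # labels, vocabulary, truncation levels (hypothetical)
beta, gamma, alpha, eta = 0.5, 1.0, 1.0, 1.0

# Label-level measures G0k = sum_l pi_lk * delta(phi_lk), with phi_lk ~ H = Dir(beta)
pi = np.array([stick_breaking(gamma, L, rng) for _ in range(K)])   # shape (K, L)
phi = rng.dirichlet(np.full(V, beta), size=(K, L))                  # shape (K, L, V)

# Document j with labels {0, 2}: lambda_j ~ Dir(r_j * eta), nonzero on its labels only
doc_labels = np.array([0, 2])
lam = rng.dirichlet(np.full(len(doc_labels), eta))

# Document-level sticks pi_jt; each atom psi_jt is drawn from G0 of label k_jt
pi_j = stick_breaking(alpha, T, rng)
k_jt = doc_labels[rng.choice(len(doc_labels), size=T, p=lam)]
l_jt = np.array([rng.choice(L, p=pi[k]) for k in k_jt])

# Words: theta_ji ~ G_j (pick an atom t), then x_ji ~ Multi(theta_ji)
t_ji = rng.choice(T, size=30, p=pi_j)
x = np.array([rng.choice(V, p=phi[k_jt[t], l_jt[t]]) for t in t_ji])
```

Because `k_jt` is drawn only from the document's labels, every atom of G_j comes from a label-specific measure G0k of a label the document actually has.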
Chinese Restaurant Franchise
• $t_{ji}$ : table index of the $i$-th term in the $j$-th document
• $k_{jt}$, $l_{jt}$ : dish indices on the $t$-th table of the $j$-th document
  – (Figure note) In a normal HDP, this layer consists of only a single DP $G_0$
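The seating scheme can be sketched as a prior-only simulation, assuming the standard Chinese restaurant rule (an existing table or dish is chosen in proportion to its count, a new one in proportion to the concentration parameter). This is not the Gibbs sampler from [Kim+ 12], which also weighs in the word likelihood; the names `crp_draw` and `seat_document` and all parameter values are hypothetical.

```python
import numpy as np

def crp_draw(counts, concentration, rng):
    """Return an existing index with probability proportional to its count,
    or len(counts) (meaning "new") with probability proportional to concentration."""
    weights = np.append(np.asarray(counts, dtype=float), concentration)
    return int(rng.choice(len(weights), p=weights / weights.sum()))

def seat_document(n_words, doc_labels, lam, dish_counts, alpha, gamma, rng):
    """Seat the words of one document; dish_counts[k] is shared across documents."""
    table_counts, k_jt, l_jt, t_ji = [], [], [], []
    for _ in range(n_words):
        t = crp_draw(table_counts, alpha, rng)
        if t == len(table_counts):                      # open a new table
            # the new table first picks a label k_jt ~ Multi(lambda_j) ...
            k = int(doc_labels[rng.choice(len(doc_labels), p=lam)])
            # ... then a dish l_jt from the restaurant of that label
            l = crp_draw(dish_counts[k], gamma[k], rng)
            if l == len(dish_counts[k]):
                dish_counts[k].append(0)                # new dish for label k
            dish_counts[k][l] += 1
            table_counts.append(0)
            k_jt.append(k)
            l_jt.append(l)
        table_counts[t] += 1
        t_ji.append(t)
    return t_ji, k_jt, l_jt

rng = np.random.default_rng(0)
dish_counts = {k: [] for k in range(3)}                 # one dish list per label
gamma = [1.0, 1.0, 1.0]
t_ji, k_jt, l_jt = seat_document(
    50, np.array([0, 2]), np.array([0.5, 0.5]), dish_counts,
    alpha=1.0, gamma=gamma, rng=rng)
```

The only difference from a normal HDP in this sketch is the extra label draw: a new table chooses among several label-specific dish lists instead of a single shared one.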
Inference (1)



• Sampling 𝑡
Inference (2)
• Sampling 𝑘 and 𝑙
Experiments
• DP-MRM learns a probabilistic label-topic correspondence automatically.

(Figure via [Kim+ 12])
(Figure via [Kim+ 12])

• L-LDA can also make predictions for single-labeled documents if a common
  second label is assigned to every document.
References
• [Kim+ ICML2012] Dirichlet Process with Mixed
  Random Measures : A Nonparametric Topic
  Model for Labeled Data
• [Ramage+ EMNLP2009] Labeled LDA : A
  supervised topic model for credit attribution in
  multi-labeled corpora
• [Blei+ 2003] Latent Dirichlet Allocation
