Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Expectation propagation for latent Dirichlet allocation

62 views

Published on

Expectation propagation for latent Dirichlet allocation
This is a revised version of http://www.cis.nagasaki-u.ac.jp/~masada/2012121001.pdf

Published in: Engineering
  • Be the first to comment

  • Be the first to like this

Expectation propagation for latent Dirichlet allocation

  1. 1. Expectation-Propagation for Latent Dirichlet Allocation Tomonari MASADA @ Nagasaki University November 22, 2018 1 This manuscript contains a derivation of the update formulae of expectation-propagation (EP) presented in the following paper: Thomas Minka and John Lafferty. Expectation-Propagation for the Generative Aspect Model. in Proc. of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352-359, 2002.1 2 A joint distribution for document d in LDA can be written as below. p(x, θd|α) = p(θd|α)p(x|θd) = p(θd|α) nd i=1 p(xdi|θd) = p(θd|α) nd i=1 k θdkφkxdi = p(θd|α) w k θdkφkw ndw , (1) where p(θd|α) is a Dirichlet prior distribution. We approximate p(w|θd) = k θdkφkw by tw(θd) = sw k θβwk dk . Then we obtain an approximated joint distribution as follows: p(x, θd|α) ≈ p(θd|α) w tw(θd)ndw = p(θd|α) w sw k θβwk dk ndw = Γ( k αk) k Γ(αk) w sndw w k θ αk−1+ w βwkndw dk (2) Let γdk = αk + w βwkndw. That is, p(xd, θd|α) ≈ Γ( k αk) k Γ(αk) w sndw w k θγdk−1 dk . (3) An approximated posterior q(θd) can be obtained as follows: q(θd) = Γ( k αk) k Γ(αk) w sndw w k θγdk−1 dk Γ( k αk) k Γ(αk) w sndw w k θγdk−1 dk dθd = k θγdk−1 dk k θγdk−1 dk dθd = Γ( k γdk) k Γ(γdk) k θγdk−1 dk . (4) 1http://research.microsoft.com/en-us/um/people/minka/papers/aspect/ 1
  2. 2. 3 For each word token in document d, we remove its contribution to w tw(θd)ndw = w sw k θβwk dk ndw . When the token is a token of word w, this corresponds to a division by sw k θβwk dk . Note that we here consider a removal of word token, not a removal of word type. Then we obtain an ‘old’ posterior after this division as follows: qw (θd) = Γ k(γdk − βwk) k Γ(γdk − βwk) k θγdk−βwk−1 dk = Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk , (5) where γ w dk ≡ γdk − βwk. By combining p(w|θd) = k θdkΦkw with qw (θd), we obtain a something similar to joint distribution as follows: ( k θdkφkw) · Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk . Therefore, by integrating θd out, we obtain a something similar to evidence as follows: k θdkφkw Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk dθd = k φkwγ w dk k γ w dk . Consequently, we obtain a new posterior as follows: ˜q(θd) = k γ w dk k φkwγ w dk k θdkφkw Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk = ˆγd w k φkwγ w dk k θdkφkw Γ( ˆγd w ) k Γ(γ w dk ) k θ γ w dk −1 dk , (6) where ˆγd w ≡ k γ w dk . 4 Based on Eq. (6), we calculate E˜q[θdk] and E˜q[θ2 dk]. E˜q[θdk] = θdk ˜q(θd)dθd = θdk ˆγd w k φkwγ w dk k θdkφkw Γ( ˆγd w ) k Γ(γ w dk ) k θ γ w dk −1 dk dθd = ˆγd w k φkwγ w dk · Γ( ˆγd w ) k Γ(γ w dk ) θdk k θdkφkw k θ γ w dk −1 dk dθd = ˆγd w k φkwγ w dk · Γ( ˆγd w ) k Γ(γ w dk ) θ2 dkφkw k θ γ w dk −1 dk dθd + k =k θdkθdk φk w k θ γ w dk −1 dk dθd = ˆγd w k φkwγ w dk · Γ( ˆγd w ) k Γ(γ w dk ) φkw θ2 dk k θ γ w dk −1 dk dθd + k =k φk w θdkθdk k θ γ w dk −1 dk dθd = ˆγd w k φkwγ w dk · Γ( ˆγd w ) k Γ(γ w dk ) φkw θ γ w dk +1 dk l=k θ γ w dl −1 dl dθd + k =k φk w θ γ w dk dk θ γ w dk dk l=k∧l=k θ γ w dl −1 dl dθd It should be noted that k Γ(γdk) Γ( k γdk) = k θγdk−1 dk dθd holds for any γd = (γd1, . . . , γdK). This is how we obtain the normalizing constant of Dirichlet distribution. Therefore, θ γ w dk +1 dk l=k θ γ w dl −1 dl dθd = 2
  3. 3. Γ(γ w dk +2) l=k Γ(γ w dl ) Γ( ˆγd w+2) . Further, θ γ w dk dk θ γ w dk dk l=k∧l=k θ γ w dl −1 dl dθd = Γ(γ w dk +1)Γ(γ w dk +1) l=k∧l=k Γ(γ w dl ) Γ( ˆγd w+2) . With these results, we can continue the formula rearrangement as follows: ∴ E˜q[θdk] = ˆγd w k φkwγ w dk · Γ( ˆγd w ) k Γ(γ w dk ) φkw Γ(γ w dk + 2) l=k Γ(γ w dl ) Γ( ˆγd w + 2) + k =k φk w Γ(γ w dk + 1)Γ(γ w dk + 1) l=k∧l=k Γ(γ w dl ) Γ( ˆγd w + 2) = ˆγd w k φkwγ w dk φkw Γ( ˆγd w ) k Γ(γ w dk ) · Γ(γ w dk + 2) l=k Γ(γ w dl ) Γ( ˆγd w + 2) + k =k φk w Γ( ˆγd w ) k Γ(γ w dk ) · Γ(γ w dk + 1)Γ(γ w dk + 1) l=k∧l=k Γ(γ w dl ) Γ( ˆγd w + 2) = ˆγd w k φkwγ w dk φkw γ w dk (γ w dk + 1) ˆγd w ( ˆγd w + 1) + k =k φk w γ w dk γ w dk ˆγd w ( ˆγd w + 1) = ˆγd w k φkwγ w dk · γ w dk ˆγd w · φkw + k φkwγ w dk ˆγd w + 1 (7) E˜q[θ2 dk] = θ2 dk ˜q(θd)dθd = θ2 dk ˆγd w k φkwγ w dk k θdkφkw Γ( ˆγd w ) k Γ(γ w dk ) k θ γ w dk −1 dk dθd = ˆγd w k φkwγ w dk φkw γ w dk (γ w dk + 1)(γ w dk + 2) ˆγd w ( ˆγd w + 1)( ˆγd w + 2) + k =k φk w γ w dk (γ w dk + 1)γ w dk ˆγd w ( ˆγd w + 1)( ˆγd w + 2) = ˆγd w k φkwγ w dk · γ w dk ˆγd w · γ w dk + 1 ˆγd w + 1 · 2φkw + k φkwγ w dk ˆγd w + 2 . (8) We determine a Dirichlet distribution Dir(γd) so that the mean γdk ˆγd and the average second moment2 1 K k γdk(γdk+1) ˆγd(ˆγd+1) are equal to those of ˜q(θd), where ˆγd = k γdk. This procedure is called moment matching. To be precise, we obtain γd by solving the following equations: γdk ˆγd = E˜q[θdk] for each k, and (9) 1 K k γdk(γdk + 1) ˆγd(ˆγd + 1) = 1 K k E˜q[θ2 dk] . (10) 2http://www.ucl.ac.uk/statistics/research/pdfs/135.zip 3
  4. 4. We can obtain γdk as below. γdk = E˜q[θdk]ˆγd k E˜q[θdk]ˆγd(E˜q[θdk]ˆγd + 1) ˆγd(ˆγd + 1) = k E˜q[θ2 dk] k E˜q[θdk](E˜q[θdk]ˆγd + 1) = k E˜q[θ2 dk](ˆγd + 1) k E˜q[θdk]2 ˆγd + k E˜q[θdk] = k E˜q[θ2 dk]ˆγd + k E˜q[θ2 dk] k E˜q[θdk]2 − E˜q[θ2 dk] ˆγd = k E˜q[θ2 dk] − E˜q[θdk] ∴ ˆγd = k(E˜q[θ2 dk] − E˜q[θdk]) k(E˜q[θdk]2 − E˜q[θ2 dk]) (11) ∴ γdk = k(E˜q[θ2 dk] − E˜q[θdk]) k(E˜q[θdk]2 − E˜q[θ2 dk]) · E˜q[θdk] (12) 5 We now have a Dirichlet distribution that approxiates the posterior ˜q(θd) in Eq. (6), i.e., ˜q(θd) = k γ w dk k φkwγ w dk k θdkφkw Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk . To obtain a density function of Dirichlet distribution from Eq. (6), we replace the term ( k θdkφkw) by tw(θd) = sw k θ βwk dk . We set the resulting function equal to the density function of Dir(γd) and obtain the following equation: k γ w dk k φkwγ w dk sw k θ βwk dk Γ( k γ w dk ) k Γ(γ w dk ) k θ γ w dk −1 dk = Γ( k γdk) k Γ(γdk) k θ γdk−1 dk . (13) Consequently, sw and βwk are determined as follows: βwk = γdk − γ w dk , (14) sw = k φkwγ w dk k γ w dk Γ( k γdk) k Γ(γdk) k Γ(γ w dk ) Γ( k γ w dk ) . (15) 4

×