SlideShare a Scribd company logo
A Note on TopicRNN
Tomonari MASADA @ Nagasaki University
July 13, 2017
1 Model
TopicRNN is a generative model proposed by [1], whose generative story for a particular document x1:T
is given as below.
1. Draw a topic vector θ ∼ N(0, I).
2. Given word y1:t−1, for the tth word yt in the document,
(a) Compute hidden state ht = fW (xt, ht−1), where we let xt yt−1.
(b) Draw stop word indicator lt ∼ Bernoulli(σ(Γ ht)), with σ the sigmoid function.
(c) Draw word yt ∼ p(yt|ht, θ, lt, B), where
p(yt = i|ht, θ, lt, B) ∝ exp(vi ht + (1 − lt)bi θ) .
2 Lower bound
The log marginal likelihood of the word sequence y1:T and the stop word indicators l1:T is
log p(y1:T , l1:T |h1:T ) = log p(θ)
T
t=1
p(yt|ht, lt, θ; W)p(lt|ht; Γ)dθ (1)
A lower bound can be obtained as follows:
log p(y1:T , l1:T |h1:T ) = log p(θ)
T
t=1
p(yt|ht, lt, θ; W)p(lt|ht; Γ)dθ
= log q(θ)
p(θ)
T
t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ)
q(θ)
dθ
≥ q(θ) log
p(θ)
T
t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ)
q(θ)
dθ
= q(θ) log p(θ)dθ +
T
t=1
q(θ) log p(yt|ht, lt, θ; W)dθ +
T
t=1
q(θ) log p(lt|ht; Γ)dθ − q(θ) log q(θ)dθ
L(y1:T , l1:T |q(θ), Θ) (2)
3 Approximate posterior
The form of q(θ) is chosen to be an inference network using a feed-forward neural network. Each expec-
tation in Eq. (2) is approximated with the samples from q(θ|Xc), where Xc denotes the term-frequency
representation of y1:T excluding stop words. The density of the approximate posterior q(θ|Xc) is specified
as follows:
q(θ|Xc) = N(θ; µ(Xc), diag(σ2
(Xc))), (3)
µ(Xc) = W1g(Xc) + a1, (4)
log σ(Xc) = W2g(Xc) + a2, (5)
where g(·) denotes the feed-forward neural network. Eq. (3) gives the reparameterization of θk as θk =
µk(Xc) + kσk(Xc) for k = 1, . . . , K, where k is a sample from the standard normal distribution N(0, 1).
1
4 Monte Carlo integration
We can now rewrite each term of the lower bound L(y1:T , l1:T |q(θ), Θ) in Eq. (2) as below, where the θ(s)
s
denote the samples drawn from the approximate posterior q(θ|Xc).
The first term:
q(θ) log p(θ)dθ ≈
1
S
S
s=1
log p(θ(s)
) =
1
S
S
s=1
K
k=1
log
1
√
2π
exp −
θ
(s)
k
2
2
= −
K log(2π)
2
−
1
2
K
k=1
s θ
(s)
k
2
S
(6)
Each addend of the second term:
q(θ) log p(yt|ht, lt, θ; W)dθ ≈
1
S
S
s=1
log
exp(vyt
ht + (1 − lt)byt
θ(s)
)
C
j=1 exp(vj ht + (1 − lt)bj θ(s))
= vyt
ht + (1 − lt)byt
S
s=1 θ(s)
S
−
1
S
S
s=1
log
C
j=1
exp vj ht + (1 − lt)bj θ(s)
(7)
Each addend of the third term:
q(θ) log p(lt|ht; Γ)dθ = lt log(σ(Γ ht)) + (1 − lt) log(1 − σ(Γ ht)) (8)
The fourth term:
q(θ) log q(θ)dθ ≈
1
S
S
s=1
K
k=1
log
1
2πσ2
k(Xc)
exp −
(θ
(s)
k − µk(Xc))2
2σ2
k(Xc)
= −
K log(2π)
2
−
K
k=1
log(σk(Xc)) −
1
S
S
s=1
K
k=1
θ
(s)
k − µk(Xc)
2
2σ2
k(Xc)
(9)
5 Objective to be maximized
Each of the s samples (i.e., θ(s)
for s = 1, . . . , S) is obtained as θ(s)
= µ(Xc)+ (s)
◦σ(Xc) via the reparam-
eterization, where the
(s)
k s are drawn from the standard normal, and ◦ is the element-wise multiplication.
Consequently, the lower bound L(y1:T , l1:T |q(θ), Θ) to be maximized is obtained as follows:
L(y1:T , l1:T |q(θ), Θ) = −
1
2
K
k=1
s µk(Xc) +
(s)
k σk(Xc)
2
S
+
T
t=1
vyt
ht +
1
S
S
s=1
T
t=1
(1 − lt)byt
µ(Xc) + (s)
◦ σ(Xc)
−
T
t=1
1
S
S
s=1
log
C
j=1
exp vj ht + (1 − lt)bj µ(Xc) + (s)
◦ σ(Xc)
+
T
t=1
lt log(σ(Γ ht)) + (1 − lt) log(1 − σ(Γ ht))
+
K
k=1
log(σk(Xc)) + const. (10)
References
[1] Adji Bousso Dieng, Chong Wang, Jianfeng Gao, and John Paisley. TopicRNN: A Recurrent Neural
Network with Long-Range Semantic Dependency. ICLR, 2017.
2

More Related Content

What's hot

Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
MeetupDataScienceRoma
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
lecture 4
lecture 4lecture 4
lecture 4sajinsc
 
A One-Pass Triclustering Approach: Is There any Room for Big Data?
A One-Pass Triclustering Approach: Is There any Room for Big Data?A One-Pass Triclustering Approach: Is There any Room for Big Data?
A One-Pass Triclustering Approach: Is There any Room for Big Data?
Dmitrii Ignatov
 
Goldberg-Coxeter construction for 3- or 4-valent plane maps
Goldberg-Coxeter construction for 3- or 4-valent plane mapsGoldberg-Coxeter construction for 3- or 4-valent plane maps
Goldberg-Coxeter construction for 3- or 4-valent plane maps
Mathieu Dutour Sikiric
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)Shane Nicklas
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
VjekoslavKovac1
 
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsBayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Matt Moores
 
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
Matt Moores
 
Kumegawa russia
Kumegawa russiaKumegawa russia
Kumegawa russia
Kazuki Kumegawa
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)
Alexander Litvinenko
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
Rakuten Group, Inc.
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning tree
oneous
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
Alexander Litvinenko
 
On the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equationOn the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equationCemal Ardil
 
Fdtd ppt for mine
Fdtd ppt   for mineFdtd ppt   for mine
Fdtd ppt for mine
AnimikhGoswami
 
2.6 all pairsshortestpath
2.6 all pairsshortestpath2.6 all pairsshortestpath
2.6 all pairsshortestpath
Krish_ver2
 
A Commutative Alternative to Fractional Calculus on k-Differentiable Functions
A Commutative Alternative to Fractional Calculus on k-Differentiable FunctionsA Commutative Alternative to Fractional Calculus on k-Differentiable Functions
A Commutative Alternative to Fractional Calculus on k-Differentiable FunctionsMatt Parker
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Rakuten Group, Inc.
 

What's hot (20)

Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
Quantum Machine Learning and QEM for Gaussian mixture models (Alessandro Luongo)
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
lecture 4
lecture 4lecture 4
lecture 4
 
A One-Pass Triclustering Approach: Is There any Room for Big Data?
A One-Pass Triclustering Approach: Is There any Room for Big Data?A One-Pass Triclustering Approach: Is There any Room for Big Data?
A One-Pass Triclustering Approach: Is There any Room for Big Data?
 
Goldberg-Coxeter construction for 3- or 4-valent plane maps
Goldberg-Coxeter construction for 3- or 4-valent plane mapsGoldberg-Coxeter construction for 3- or 4-valent plane maps
Goldberg-Coxeter construction for 3- or 4-valent plane maps
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
Specific Finite Groups(General)
Specific Finite Groups(General)Specific Finite Groups(General)
Specific Finite Groups(General)
 
On maximal and variational Fourier restriction
On maximal and variational Fourier restrictionOn maximal and variational Fourier restriction
On maximal and variational Fourier restriction
 
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsBayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse Problems
 
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
R package 'bayesImageS': a case study in Bayesian computation using Rcpp and ...
 
Kumegawa russia
Kumegawa russiaKumegawa russia
Kumegawa russia
 
Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)Low-rank tensor approximation (Introduction)
Low-rank tensor approximation (Introduction)
 
Faster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select DictionariesFaster Practical Block Compression for Rank/Select Dictionaries
Faster Practical Block Compression for Rank/Select Dictionaries
 
Prim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning treePrim's Algorithm on minimum spanning tree
Prim's Algorithm on minimum spanning tree
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
On the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equationOn the-approximate-solution-of-a-nonlinear-singular-integral-equation
On the-approximate-solution-of-a-nonlinear-singular-integral-equation
 
Fdtd ppt for mine
Fdtd ppt   for mineFdtd ppt   for mine
Fdtd ppt for mine
 
2.6 all pairsshortestpath
2.6 all pairsshortestpath2.6 all pairsshortestpath
2.6 all pairsshortestpath
 
A Commutative Alternative to Fractional Calculus on k-Differentiable Functions
A Commutative Alternative to Fractional Calculus on k-Differentiable FunctionsA Commutative Alternative to Fractional Calculus on k-Differentiable Functions
A Commutative Alternative to Fractional Calculus on k-Differentiable Functions
 
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group TestingFast Identification of Heavy Hitters by Cached and Packed Group Testing
Fast Identification of Heavy Hitters by Cached and Packed Group Testing
 

Similar to A Note on TopicRNN

On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
VjekoslavKovac1
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
Akira Tanimoto
 
Hybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity MarketHybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity Markettomoyukiichiba
 
Tele4653 l1
Tele4653 l1Tele4653 l1
Tele4653 l1Vin Voro
 
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMSSOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
Tahia ZERIZER
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
Daisuke Yoneoka
 
A Note on BPTT for LSTM LM
A Note on BPTT for LSTM LMA Note on BPTT for LSTM LM
A Note on BPTT for LSTM LM
Tomonari Masada
 
Univariate Financial Time Series Analysis
Univariate Financial Time Series AnalysisUnivariate Financial Time Series Analysis
Univariate Financial Time Series Analysis
Anissa ATMANI
 
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Chiheb Ben Hammouda
 
Presentation OCIP2014
Presentation OCIP2014Presentation OCIP2014
Presentation OCIP2014
Fabian Froehlich
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
Alexander Litvinenko
 
Métodos computacionales para el estudio de modelos epidemiológicos con incer...
Métodos computacionales para el estudio de modelos  epidemiológicos con incer...Métodos computacionales para el estudio de modelos  epidemiológicos con incer...
Métodos computacionales para el estudio de modelos epidemiológicos con incer...
Facultad de Informática UCM
 
Tele4653 l5
Tele4653 l5Tele4653 l5
Tele4653 l5Vin Voro
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Alexander Litvinenko
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
The Statistical and Applied Mathematical Sciences Institute
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty Quantification
Alexander Litvinenko
 
2014 spring crunch seminar (SDE/levy/fractional/spectral method)
2014 spring crunch seminar (SDE/levy/fractional/spectral method)2014 spring crunch seminar (SDE/levy/fractional/spectral method)
2014 spring crunch seminar (SDE/levy/fractional/spectral method)Zheng Mengdi
 
22nd BSS meeting poster
22nd BSS meeting poster 22nd BSS meeting poster
22nd BSS meeting poster
Samuel Gbari
 

Similar to A Note on TopicRNN (20)

On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Hybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity MarketHybrid Atlas Models of Financial Equity Market
Hybrid Atlas Models of Financial Equity Market
 
Tele4653 l1
Tele4653 l1Tele4653 l1
Tele4653 l1
 
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMSSOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
SOLVING BVPs OF SINGULARLY PERTURBED DISCRETE SYSTEMS
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
residue
residueresidue
residue
 
A Note on BPTT for LSTM LM
A Note on BPTT for LSTM LMA Note on BPTT for LSTM LM
A Note on BPTT for LSTM LM
 
Univariate Financial Time Series Analysis
Univariate Financial Time Series AnalysisUnivariate Financial Time Series Analysis
Univariate Financial Time Series Analysis
 
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
Seminar Talk: Multilevel Hybrid Split Step Implicit Tau-Leap for Stochastic R...
 
Presentation OCIP2014
Presentation OCIP2014Presentation OCIP2014
Presentation OCIP2014
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Métodos computacionales para el estudio de modelos epidemiológicos con incer...
Métodos computacionales para el estudio de modelos  epidemiológicos con incer...Métodos computacionales para el estudio de modelos  epidemiológicos con incer...
Métodos computacionales para el estudio de modelos epidemiológicos con incer...
 
Tele4653 l5
Tele4653 l5Tele4653 l5
Tele4653 l5
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Response Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty QuantificationResponse Surface in Tensor Train format for Uncertainty Quantification
Response Surface in Tensor Train format for Uncertainty Quantification
 
2014 spring crunch seminar (SDE/levy/fractional/spectral method)
2014 spring crunch seminar (SDE/levy/fractional/spectral method)2014 spring crunch seminar (SDE/levy/fractional/spectral method)
2014 spring crunch seminar (SDE/levy/fractional/spectral method)
 
22nd BSS meeting poster
22nd BSS meeting poster 22nd BSS meeting poster
22nd BSS meeting poster
 
Mid term solution
Mid term solutionMid term solution
Mid term solution
 

More from Tomonari Masada

Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説
Tomonari Masada
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Tomonari Masada
 
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic ModelingContext-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
Tomonari Masada
 
A note on the density of Gumbel-softmax
A note on the density of Gumbel-softmaxA note on the density of Gumbel-softmax
A note on the density of Gumbel-softmax
Tomonari Masada
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用
Tomonari Masada
 
Expectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationExpectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocation
Tomonari Masada
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic Modeling
Tomonari Masada
 
A note on variational inference for the univariate Gaussian
A note on variational inference for the univariate GaussianA note on variational inference for the univariate Gaussian
A note on variational inference for the univariate Gaussian
Tomonari Masada
 
Document Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior DistributionsDocument Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior Distributions
Tomonari Masada
 
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka CompositionLDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
Tomonari Masada
 
A Note on ZINB-VAE
A Note on ZINB-VAEA Note on ZINB-VAE
A Note on ZINB-VAE
Tomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
Tomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
Tomonari Masada
 
Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28
Tomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
Tomonari Masada
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
Tomonari Masada
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
Tomonari Masada
 
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
Tomonari Masada
 
A Note on PCVB0 for HDP-LDA
A Note on PCVB0 for HDP-LDAA Note on PCVB0 for HDP-LDA
A Note on PCVB0 for HDP-LDA
Tomonari Masada
 
ChronoSAGE: Diversifying Topic Modeling Chronologically
ChronoSAGE: Diversifying Topic Modeling ChronologicallyChronoSAGE: Diversifying Topic Modeling Chronologically
ChronoSAGE: Diversifying Topic Modeling Chronologically
Tomonari Masada
 

More from Tomonari Masada (20)

Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説
 
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic ModelingContext-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
 
A note on the density of Gumbel-softmax
A note on the density of Gumbel-softmaxA note on the density of Gumbel-softmax
A note on the density of Gumbel-softmax
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用
 
Expectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationExpectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocation
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic Modeling
 
A note on variational inference for the univariate Gaussian
A note on variational inference for the univariate GaussianA note on variational inference for the univariate Gaussian
A note on variational inference for the univariate Gaussian
 
Document Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior DistributionsDocument Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior Distributions
 
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka CompositionLDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
 
A Note on ZINB-VAE
A Note on ZINB-VAEA Note on ZINB-VAE
A Note on ZINB-VAE
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
 
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
The detailed derivation of the derivatives in Table 2 of Marginalized Denoisi...
 
A Note on PCVB0 for HDP-LDA
A Note on PCVB0 for HDP-LDAA Note on PCVB0 for HDP-LDA
A Note on PCVB0 for HDP-LDA
 
ChronoSAGE: Diversifying Topic Modeling Chronologically
ChronoSAGE: Diversifying Topic Modeling ChronologicallyChronoSAGE: Diversifying Topic Modeling Chronologically
ChronoSAGE: Diversifying Topic Modeling Chronologically
 

Recently uploaded

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 

Recently uploaded (20)

Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTESAdjusting OpenMP PageRank : SHORT REPORT / NOTES
Adjusting OpenMP PageRank : SHORT REPORT / NOTES
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 

A Note on TopicRNN

  • 1. A Note on TopicRNN Tomonari MASADA @ Nagasaki University July 13, 2017 1 Model TopicRNN is a generative model proposed by [1], whose generative story for a particular document x1:T is given as below. 1. Draw a topic vector θ ∼ N(0, I). 2. Given word y1:t−1, for the tth word yt in the document, (a) Compute hidden state ht = fW (xt, ht−1), where we let xt yt−1. (b) Draw stop word indicator lt ∼ Bernoulli(σ(Γ ht)), with σ the sigmoid function. (c) Draw word yt ∼ p(yt|ht, θ, lt, B), where p(yt = i|ht, θ, lt, B) ∝ exp(vi ht + (1 − lt)bi θ) . 2 Lower bound The log marginal likelihood of the word sequence y1:T and the stop word indicators l1:T is log p(y1:T , l1:T |h1:T ) = log p(θ) T t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ)dθ (1) A lower bound can be obtained as follows: log p(y1:T , l1:T |h1:T ) = log p(θ) T t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ)dθ = log q(θ) p(θ) T t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ) q(θ) dθ ≥ q(θ) log p(θ) T t=1 p(yt|ht, lt, θ; W)p(lt|ht; Γ) q(θ) dθ = q(θ) log p(θ)dθ + T t=1 q(θ) log p(yt|ht, lt, θ; W)dθ + T t=1 q(θ) log p(lt|ht; Γ)dθ − q(θ) log q(θ)dθ L(y1:T , l1:T |q(θ), Θ) (2) 3 Approximate posterior The form of q(θ) is chosen to be an inference network using a feed-forward neural network. Each expec- tation in Eq. (2) is approximated with the samples from q(θ|Xc), where Xc denotes the term-frequency representation of y1:T excluding stop words. The density of the approximate posterior q(θ|Xc) is specified as follows: q(θ|Xc) = N(θ; µ(Xc), diag(σ2 (Xc))), (3) µ(Xc) = W1g(Xc) + a1, (4) log σ(Xc) = W2g(Xc) + a2, (5) where g(·) denotes the feed-forward neural network. Eq. (3) gives the reparameterization of θk as θk = µk(Xc) + kσk(Xc) for k = 1, . . . , K, where k is a sample from the standard normal distribution N(0, 1). 1
  • 2. 4 Monte Carlo integration We can now rewrite each term of the lower bound L(y1:T , l1:T |q(θ), Θ) in Eq. (2) as below, where the θ(s) s denote the samples drawn from the approximate posterior q(θ|Xc). The first term: q(θ) log p(θ)dθ ≈ 1 S S s=1 log p(θ(s) ) = 1 S S s=1 K k=1 log 1 √ 2π exp − θ (s) k 2 2 = − K log(2π) 2 − 1 2 K k=1 s θ (s) k 2 S (6) Each addend of the second term: q(θ) log p(yt|ht, lt, θ; W)dθ ≈ 1 S S s=1 log exp(vyt ht + (1 − lt)byt θ(s) ) C j=1 exp(vj ht + (1 − lt)bj θ(s)) = vyt ht + (1 − lt)byt S s=1 θ(s) S − 1 S S s=1 log C j=1 exp vj ht + (1 − lt)bj θ(s) (7) Each addend of the third term: q(θ) log p(lt|ht; Γ)dθ = lt log(σ(Γ ht)) + (1 − lt) log(1 − σ(Γ ht)) (8) The fourth term: q(θ) log q(θ)dθ ≈ 1 S S s=1 K k=1 log 1 2πσ2 k(Xc) exp − (θ (s) k − µk(Xc))2 2σ2 k(Xc) = − K log(2π) 2 − K k=1 log(σk(Xc)) − 1 S S s=1 K k=1 θ (s) k − µk(Xc) 2 2σ2 k(Xc) (9) 5 Objective to be maximized Each of the s samples (i.e., θ(s) for s = 1, . . . , S) is obtained as θ(s) = µ(Xc)+ (s) ◦σ(Xc) via the reparam- eterization, where the (s) k s are drawn from the standard normal, and ◦ is the element-wise multiplication. Consequently, the lower bound L(y1:T , l1:T |q(θ), Θ) to be maximized is obtained as follows: L(y1:T , l1:T |q(θ), Θ) = − 1 2 K k=1 s µk(Xc) + (s) k σk(Xc) 2 S + T t=1 vyt ht + 1 S S s=1 T t=1 (1 − lt)byt µ(Xc) + (s) ◦ σ(Xc) − T t=1 1 S S s=1 log C j=1 exp vj ht + (1 − lt)bj µ(Xc) + (s) ◦ σ(Xc) + T t=1 lt log(σ(Γ ht)) + (1 − lt) log(1 − σ(Γ ht)) + K k=1 log(σk(Xc)) + const. (10) References [1] Adji Bousso Dieng, Chong Wang, Jianfeng Gao, and John Paisley. TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency. ICLR, 2017. 2