A Note on TopicRNN
Tomonari MASADA @ Nagasaki University
July 13, 2017
1 Model
TopicRNN is a generative model proposed by [1]. Its generative story for a particular document $y_{1:T}$ is as follows (a short code sketch of the sampling process is given after the list).
1. Draw a topic vector $\theta \sim \mathcal{N}(0, I)$.
2. Given the preceding words $y_{1:t-1}$, for the $t$-th word $y_t$ in the document,
   (a) Compute the hidden state $h_t = f_W(x_t, h_{t-1})$, where we let $x_t = y_{t-1}$.
   (b) Draw the stop word indicator $l_t \sim \mathrm{Bernoulli}\big(\sigma(\Gamma^\top h_t)\big)$, with $\sigma$ the sigmoid function.
   (c) Draw the word $y_t \sim p(y_t \mid h_t, \theta, l_t, B)$, where
   \[
   p(y_t = i \mid h_t, \theta, l_t, B) \propto \exp\big(v_i^\top h_t + (1 - l_t)\, b_i^\top \theta\big).
   \]
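To make the generative story concrete, here is a minimal sampling sketch (not the authors' implementation). The transition function f_W, the word vectors V (rows $v_i$), the topic-word matrix B (rows $b_i$), Gamma, and the initial state h0 and input x0 are assumed given; all names are illustrative only.

import numpy as np

def generate_document(T, K, f_W, V, B, Gamma, h0, x0, rng=None):
    # Sample a document of length T following the generative story above (sketch).
    rng = np.random.default_rng() if rng is None else rng
    C = V.shape[0]                                   # vocabulary size
    theta = rng.standard_normal(K)                   # 1. theta ~ N(0, I)
    h, x, words = h0, x0, []
    for t in range(T):
        h = f_W(x, h)                                # 2(a). h_t = f_W(x_t, h_{t-1})
        p_stop = 1.0 / (1.0 + np.exp(-(Gamma @ h)))  # sigmoid(Gamma^T h_t)
        l = rng.binomial(1, p_stop)                  # 2(b). stop word indicator l_t
        logits = V @ h + (1 - l) * (B @ theta)       # 2(c). v_i^T h_t + (1 - l_t) b_i^T theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        y = rng.choice(C, p=probs)                   # draw y_t from the softmax
        words.append(y)
        x = y                                        # next input x_{t+1} = y_t
    return theta, words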
2 Lower bound
The log marginal likelihood of the word sequence $y_{1:T}$ and the stop word indicators $l_{1:T}$ is
\[
\log p(y_{1:T}, l_{1:T} \mid h_{1:T})
= \log \int p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)\, d\theta . \tag{1}
\]
A lower bound can be obtained as follows:
\begin{align*}
\log p(y_{1:T}, l_{1:T} \mid h_{1:T})
&= \log \int p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)\, d\theta \\
&= \log \int q(\theta)\, \frac{p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)}{q(\theta)}\, d\theta \\
&\geq \int q(\theta) \log \frac{p(\theta) \prod_{t=1}^{T} p(y_t \mid h_t, l_t, \theta; W)\, p(l_t \mid h_t; \Gamma)}{q(\theta)}\, d\theta \quad \text{(by Jensen's inequality)} \\
&= \int q(\theta) \log p(\theta)\, d\theta
 + \sum_{t=1}^{T} \int q(\theta) \log p(y_t \mid h_t, l_t, \theta; W)\, d\theta
 + \sum_{t=1}^{T} \int q(\theta) \log p(l_t \mid h_t; \Gamma)\, d\theta
 - \int q(\theta) \log q(\theta)\, d\theta \\
&\equiv \mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta), \tag{2}
\end{align*}
where $\Theta$ denotes the model parameters.
3 Approximate posterior
The approximate posterior $q(\theta)$ is implemented as an inference network, i.e., a feed-forward neural network. Each expectation in Eq. (2) is approximated with samples from $q(\theta \mid X_c)$, where $X_c$ denotes the term-frequency representation of $y_{1:T}$ with stop words excluded. The density of the approximate posterior $q(\theta \mid X_c)$ is specified as follows:
\begin{align*}
q(\theta \mid X_c) &= \mathcal{N}\big(\theta;\, \mu(X_c),\, \mathrm{diag}(\sigma^2(X_c))\big), \tag{3} \\
\mu(X_c) &= W_1 g(X_c) + a_1, \tag{4} \\
\log \sigma(X_c) &= W_2 g(X_c) + a_2, \tag{5}
\end{align*}
where $g(\cdot)$ denotes the feed-forward neural network. Eq. (3) gives the reparameterization of $\theta_k$ as $\theta_k = \mu_k(X_c) + \epsilon_k \sigma_k(X_c)$ for $k = 1, \ldots, K$, where $\epsilon_k$ is a sample from the standard normal distribution $\mathcal{N}(0, 1)$.
4 Monte Carlo integration
We can now rewrite each term of the lower bound $\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)$ in Eq. (2) as below, where the $\theta^{(s)}$ for $s = 1, \ldots, S$ denote samples drawn from the approximate posterior $q(\theta \mid X_c)$.
The first term:
\[
\int q(\theta) \log p(\theta)\, d\theta
\approx \frac{1}{S} \sum_{s=1}^{S} \log p\big(\theta^{(s)}\big)
= \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \log \left\{ \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{\big(\theta_k^{(s)}\big)^2}{2} \right) \right\}
= -\frac{K \log(2\pi)}{2} - \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{s} \big(\theta_k^{(s)}\big)^2}{S} \tag{6}
\]
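As a small numerical sketch of Eq. (6) (an illustration, not reference code), with the samples stored in an array theta of shape (S, K):

import numpy as np

def log_prior_term(theta):
    # theta: samples theta^(s) from q(theta | Xc), shape (S, K)
    S, K = theta.shape
    # Eq. (6): -K log(2 pi)/2 - (1/2) sum_k mean_s (theta_k^(s))^2
    return -0.5 * K * np.log(2.0 * np.pi) - 0.5 * np.sum(np.mean(theta ** 2, axis=0))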
Each addend of the second term:
\begin{align*}
\int q(\theta) \log p(y_t \mid h_t, l_t, \theta; W)\, d\theta
&\approx \frac{1}{S} \sum_{s=1}^{S} \log \frac{\exp\big(v_{y_t}^\top h_t + (1 - l_t)\, b_{y_t}^\top \theta^{(s)}\big)}{\sum_{j=1}^{C} \exp\big(v_j^\top h_t + (1 - l_t)\, b_j^\top \theta^{(s)}\big)} \\
&= v_{y_t}^\top h_t + (1 - l_t)\, b_{y_t}^\top \frac{\sum_{s=1}^{S} \theta^{(s)}}{S}
- \frac{1}{S} \sum_{s=1}^{S} \log \sum_{j=1}^{C} \exp\big(v_j^\top h_t + (1 - l_t)\, b_j^\top \theta^{(s)}\big) \tag{7}
\end{align*}
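Eq. (7) can be sketched with a log-sum-exp for numerical stability; V (rows $v_j$) and B (rows $b_j$) are assumed given, and all names are illustrative.

import numpy as np
from scipy.special import logsumexp

def word_likelihood_term(h_t, y_t, l_t, V, B, theta):
    # h_t: hidden state; y_t: observed word index; l_t: stop word indicator (0 or 1)
    # V: (C, H) word vectors, B: (C, K) topic-word matrix, theta: (S, K) samples
    logits = V @ h_t + (1 - l_t) * (theta @ B.T)          # shape (S, C)
    # Eq. (7): average over samples of the log-softmax probability of y_t
    return np.mean(logits[:, y_t] - logsumexp(logits, axis=1))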
Each addend of the third term does not depend on $\theta$, so the integral is exact and no Monte Carlo approximation is needed:
\[
\int q(\theta) \log p(l_t \mid h_t; \Gamma)\, d\theta
= l_t \log \sigma(\Gamma^\top h_t) + (1 - l_t) \log\big(1 - \sigma(\Gamma^\top h_t)\big) \tag{8}
\]
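For completeness, a tiny sketch of Eq. (8) (illustrative names):

import numpy as np

def stop_word_term(h_t, l_t, Gamma):
    p = 1.0 / (1.0 + np.exp(-(Gamma @ h_t)))              # sigmoid(Gamma^T h_t)
    return l_t * np.log(p) + (1 - l_t) * np.log(1 - p)    # Eq. (8)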
The fourth term:
\begin{align*}
\int q(\theta) \log q(\theta)\, d\theta
&\approx \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \log \left\{ \frac{1}{\sqrt{2\pi \sigma_k^2(X_c)}} \exp\left( -\frac{\big(\theta_k^{(s)} - \mu_k(X_c)\big)^2}{2 \sigma_k^2(X_c)} \right) \right\} \\
&= -\frac{K \log(2\pi)}{2} - \sum_{k=1}^{K} \log \sigma_k(X_c)
- \frac{1}{S} \sum_{s=1}^{S} \sum_{k=1}^{K} \frac{\big(\theta_k^{(s)} - \mu_k(X_c)\big)^2}{2 \sigma_k^2(X_c)} \tag{9}
\end{align*}
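A matching sketch of Eq. (9), with mu and sigma taken to be the inference network outputs for Xc (illustrative names):

import numpy as np

def log_q_term(theta, mu, sigma):
    # theta: (S, K) samples; mu, sigma: (K,) outputs of the inference network
    K = mu.shape[0]
    return (-0.5 * K * np.log(2.0 * np.pi)
            - np.sum(np.log(sigma))
            - np.mean(np.sum((theta - mu) ** 2 / (2.0 * sigma ** 2), axis=1)))   # Eq. (9)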
5 Objective to be maximized
Each of the $S$ samples (i.e., $\theta^{(s)}$ for $s = 1, \ldots, S$) is obtained as $\theta^{(s)} = \mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)$ via the reparameterization, where the $\epsilon_k^{(s)}$ are drawn from the standard normal distribution and $\circ$ denotes element-wise multiplication. Consequently, the lower bound $\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)$ to be maximized is obtained as follows:
\begin{align*}
\mathcal{L}(y_{1:T}, l_{1:T} \mid q(\theta), \Theta)
&= -\frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{s} \big(\mu_k(X_c) + \epsilon_k^{(s)} \sigma_k(X_c)\big)^2}{S}
+ \sum_{t=1}^{T} v_{y_t}^\top h_t
+ \frac{1}{S} \sum_{s=1}^{S} \sum_{t=1}^{T} (1 - l_t)\, b_{y_t}^\top \big(\mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)\big) \\
&\quad - \sum_{t=1}^{T} \frac{1}{S} \sum_{s=1}^{S} \log \sum_{j=1}^{C} \exp\Big( v_j^\top h_t + (1 - l_t)\, b_j^\top \big(\mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)\big) \Big) \\
&\quad + \sum_{t=1}^{T} \Big\{ l_t \log \sigma(\Gamma^\top h_t) + (1 - l_t) \log\big(1 - \sigma(\Gamma^\top h_t)\big) \Big\}
+ \sum_{k=1}^{K} \log \sigma_k(X_c) + \mathrm{const.} \tag{10}
\end{align*}
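The pieces above can be assembled into a single Monte Carlo estimate of Eq. (10). The following is a sketch under the same assumptions as before (illustrative names; the hidden states $h_t$ are assumed precomputed by the RNN), not the authors' implementation.

import numpy as np
from scipy.special import logsumexp

def elbo(H, y, l, V, B, Gamma, mu, sigma, S=1, rng=None):
    # H: (T, Hdim) hidden states, y: (T,) word indices, l: (T,) stop word indicators
    # V: (C, Hdim), B: (C, K), Gamma: (Hdim,), mu/sigma: (K,) inference network outputs
    rng = np.random.default_rng() if rng is None else rng
    K = mu.shape[0]
    eps = rng.standard_normal((S, K))
    theta = mu + eps * sigma                               # reparameterized samples, (S, K)

    # Eq. (6) minus Eq. (9); the K log(2 pi)/2 constants cancel
    neg_kl = (-0.5 * np.sum(np.mean(theta ** 2, axis=0))
              + np.sum(np.log(sigma))
              + np.mean(np.sum((theta - mu) ** 2 / (2.0 * sigma ** 2), axis=1)))

    word_term, stop_term = 0.0, 0.0
    for t in range(len(y)):
        logits = V @ H[t] + (1 - l[t]) * (theta @ B.T)     # (S, C)
        word_term += np.mean(logits[:, y[t]] - logsumexp(logits, axis=1))   # Eq. (7)
        p = 1.0 / (1.0 + np.exp(-(Gamma @ H[t])))
        stop_term += l[t] * np.log(p) + (1 - l[t]) * np.log(1 - p)          # Eq. (8)

    return neg_kl + word_term + stop_term                  # Eq. (10) up to an additive constant

In practice, gradients of this estimate with respect to the model and inference network parameters are taken through the reparameterized samples, which is why Eq. (10) is written in terms of $\mu(X_c) + \epsilon^{(s)} \circ \sigma(X_c)$.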
References
[1] Adji Bousso Dieng, Chong Wang, Jianfeng Gao, and John Paisley. TopicRNN: A Recurrent Neural
Network with Long-Range Semantic Dependency. ICLR, 2017.