LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition
Tomonari Masada¹ and Atsuhiro Takasu²

¹ Nagasaki University, 1-14 Bunkyo-machi, Nagasaki, Japan
masada@nagasaki-u.ac.jp
² National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan
takasu@nii.ac.jp
Abstract. This paper proposes a method of scoring sequences generated by a recurrent neural network (RNN) for automatic Tanka composition. Our method scores sequences based on topic assignments provided by latent Dirichlet allocation (LDA). Generated sequences can be scored with their RNN output probabilities. However, the sequences with large output probabilities tend to share many of the same subsequences and therefore lack diversity. In contrast, our method uses topic assignments given by LDA. We first perform an inference for LDA over the same set of sequences as that used for training the RNN. The inferred per-topic word probabilities are then used to obtain topic assignments of word tokens in the sequences generated by the trained RNN. When a sequence contains many word tokens assigned to the same topic, we give it a high score; this is an entropy-based scoring. In an experiment where we scored Japanese Tanka poems generated by an RNN, the top-ranked sequences selected by our method contained a wider variety of subsequences than those selected with the RNN output probability.
Keywords: topic modeling, sequence generation, recurrent neural network, automatic poetry composition
1 Introduction
The recurrent neural network (RNN) is a class of artificial neural networks that has achieved remarkable results in many important applications. Sequence generation is one such application [11][6]. By adopting a gated architecture such as LSTM [9] or GRU [3][4], an RNN can learn dependencies among word tokens appearing at distant positions, and can thus generate sequences exhibiting such long-range dependencies.
When we use sequence generation in realistic situations, we may need to sift the sequences generated by the RNN to retain only useful ones. Such sifting can give each generated sequence a score representing its usefulness for the application under consideration. While the quality of generated sequences can also be improved by modifying the RNN architecture [5][13], we here consider a scoring method that can be tuned and applied separately from the RNN. In particular, this paper proposes a scoring method that promotes diversity among the subsequences appearing in highly scored sequences.
We could score the generated sequences by using their output probabilities in the RNN. However, this method is likely to give high scores to sequences containing subsequences that are popular among the sequences used for training the RNN. Consequently, the high-scoring sequences tend to look alike and show only limited diversity. In contrast, our scoring method uses latent Dirichlet allocation (LDA) [2]. Topic models like LDA can extract diverse topics from training documents, each topic being represented as a probability distribution defined over vocabulary words. Using the per-topic word probabilities LDA provides, we can assign high scores to sequences containing many words strongly relevant to some particular topic. This scoring is expected to yield top-ranked sequences that individually show a strong relevance to a particular topic and together cover a wide variety of topics.
We performed an evaluation experiment by generating Japanese Tanka poems with an RNN using LSTM or GRU modules. Tanka poems basically have a 5-7-5-7-7 syllabic structure and thus consist of five parts. Every Tanka poem used in the experiment was given in Hiragana characters with no voicing marks and was represented as a sequence of non-overlapping character bigrams, which were regarded as vocabulary words. After training the RNN under different settings, we chose the best setting in terms of validation perplexity. We then generated random sequences with the best-trained RNN. The generated sequences were scored either by their output probabilities in the RNN or by our LDA-based method. The result shows that our LDA-based method selected more diverse sequences in the sense that a wider variety of subsequences appeared as the five parts of the top-ranked Tanka poems.
The rest of the paper is structured as follows. Section 2 describes the method in detail. The result of the evaluation experiment is given in Section 3. Section 4 discusses previous studies. Section 5 concludes the paper with future work.
2 Method
2.1 Preprocessing of Tanka Poems
Sequence generation is one of the important applications of RNNs [11][6]. This paper considers the generation of Japanese Tanka poems. We assume that
all Tanka poems for training RNN are given in Hiragana characters with no
voicing marks. Tanka poems have a 5-7-5-7-7 syllabic structure and thus consist
of five subsequences, which are called parts in this paper. We give an example of a Tanka poem taken from The Tale of Genji in Fig. 1.¹ Note that the Tanka poem in Fig. 1 has a 6-7-5-7-7 syllabic structure: while the first part of the standard structure consists of five syllables, that of this example consists of six. In this manner, small deviations from the standard 5-7-5-7-7 structure are often observed, and the preprocessing of Tanka poems needs to handle this kind of deviation. The details of the preprocessing are given below.

¹ In this paper, a romanization called Kunrei-shiki is used for Hiragana.

Fig. 1. A Tanka poem from The Tale of Genji.
First, we put a spacing character '_' between each neighboring pair of parts and also at the tail of the poem. Further, the first Hiragana character of each of the five parts is "uppercased," i.e., marked as distinct from the same character appearing at other positions. The above example is then converted to:
MO no o mo hu ni _ TA ti ma hu he ku mo _ A ra nu mi no _ SO te u ti hu ri
si _ KO ko ro si ri ki ya _
Second, we represent each Tanka poem as a sequence of non-overlapping character bigrams. We use non-overlapping bigrams because we regard character bigrams as the vocabulary words composing each Tanka poem. For the parts of even length only, we apply an additional modification: the special bigram '(_ _)' is put at the tail of the part in place of the spacing character '_'. Consequently, the non-overlapping bigram sequence corresponding to the above example is obtained as follows:
(MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra) (nu mi)
(no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki) (ya _)
Note that the even-length part 'MO no o mo hu ni _' is converted into the bigram sequence '(MO no) (o mo) (hu ni) (_ _)' by this additional modification. Finally, we put tokens of the special words 'BOS' and 'EOS' at the head and the tail of each sequence, respectively. The final output of preprocessing for the above example is as follows:
above example is as follows:
BOS (MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra)
(nu mi) (no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki)
(ya _) EOS
The sequences preprocessed in this manner were used for training RNN and also
for training LDA.
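The preprocessing steps above can be sketched as follows. This is a minimal Python sketch, not the authors' code; the function name and the list-of-syllable-lists input format are our own assumptions.

```python
def preprocess_tanka(parts):
    """Convert a tanka, given as five lists of romanized syllables,
    into the non-overlapping bigram sequence described in Section 2.1."""
    bigrams = ["BOS"]
    for part in parts:
        # "Uppercase" the first syllable to mark the start of the part.
        syllables = [part[0].upper()] + list(part[1:])
        if len(syllables) % 2 == 0:
            # Even-length part: pair the syllables, then append the
            # special bigram (_ _) in place of the spacing character.
            syllables += ["_", "_"]
        else:
            # Odd-length part: the spacing character pairs with the
            # last syllable.
            syllables += ["_"]
        for i in range(0, len(syllables), 2):
            bigrams.append((syllables[i], syllables[i + 1]))
    bigrams.append("EOS")
    return bigrams
```

Applied to the example above, this yields BOS, (MO no), (o mo), (hu ni), (_ _), ..., (ya _), EOS.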
Fig. 2. Validation perplexities obtained by GRU-RNN with three hidden layers of various sizes.
2.2 Tanka Poem Generation by RNN
We downloaded 179,225 Tanka poems from the web site of the International Research Center for Japanese Studies.² In this set, 3,631 different non-overlapping bigrams were found to appear; the vocabulary size was therefore 3,631. Among the 179,225 Tanka poems, 143,550 were used for training the RNN and also for training LDA, and 35,675 were used for validation, i.e., for tuning free parameters.
We implemented the RNN with PyTorch³ using LSTM or GRU modules. The number of hidden layers was three, because larger numbers of layers did not give better results in terms of validation set perplexity. The dropout probability was 0.5 for all layers. The weights were clipped to [−0.25, 0.25]. RMSprop [12] was used for optimization with a learning rate of 0.002. The mini-batch size was 200. The best RNN architecture was chosen by the following procedure.
Fig. 2 presents the validation set perplexity computed in the course of training
of GRU-RNN with three hidden layers. The hidden layer size h was set to 400,
600, 800, or 1,000. When h = 400 (dashed line), the worst result was obtained.
The results for the three other cases showed no significant difference. The running
time was the shortest when h = 600 (solid line) among these three cases. While we obtained similar results for LSTM-RNN, the results of GRU-RNN were slightly better. Therefore, we selected the GRU-RNN with three hidden layers of size 600 as the model for generating Tanka poems.
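The training setup described above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the authors' implementation; the class and function names are our own, and details such as initialization and the training loop over epochs are omitted.

```python
import torch
import torch.nn as nn

VOCAB_SIZE, HIDDEN, LAYERS = 3631, 600, 3  # bigram vocabulary, Sec. 2.2

class TankaRNN(nn.Module):
    """GRU language model over bigram tokens, three layers of size 600."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.gru = nn.GRU(HIDDEN, HIDDEN, num_layers=LAYERS,
                          dropout=0.5, batch_first=True)
        self.out = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, tokens, hidden=None):
        h, hidden = self.gru(self.embed(tokens), hidden)
        return self.out(h), hidden

model = TankaRNN()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.002)

def train_step(batch):
    # batch: LongTensor of shape (batch, seq_len); the paper uses
    # mini-batches of 200 sequences. Predict each next token.
    logits, _ = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, VOCAB_SIZE), batch[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Clip the weights to [-0.25, 0.25] after each update, as in the paper.
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-0.25, 0.25)
    return loss.item()
```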
2.3 Topic Modeling for Sequence Scoring
This paper proposes a new method for scoring the sequences generated by RNN.
We use latent Dirichlet allocation (LDA) [2], the best-known topic model, for
² http://tois.nichibun.ac.jp/database/html2/waka/menu.html
³ http://pytorch.org/
scoring. LDA is a Bayesian probabilistic model of documents that models differences in the semantic content of documents as differences in the mixing proportions of topics. Each topic is in turn modeled as a probability distribution defined over vocabulary words. The details of LDA are given below.
We first introduce notation. We denote the number of documents, the vocabulary size, and the number of topics by D, V, and K, respectively. By performing an inference for LDA, e.g., variational Bayesian inference [2] or collapsed Gibbs sampling [7], over the training document set, we can estimate two groups of parameters: θdk and φkv, for d = 1, . . . , D, v = 1, . . . , V, and k = 1, . . . , K.
The parameter θdk is the probability of the topic k in the document d. In-
tuitively, this group of parameters represents the importance of each topic in
each document. Technically, the parameter vector θd = (θd1, . . . , θdK) gives the
mixing proportions of K different topics in the document d. Each topic is in turn
represented as a probability distribution defined over words. The parameter φkv
is the probability of the word v in the topic k. Intuitively, φkv represents the rel-
evance of the word v to the topic k. For example, in autumn, people talk about
fallen leaves more often than about blooming flowers. Such topic relevancy of
vocabulary words is represented by the parameter vector φk = (φk1, . . . , φkV).
When an inference for LDA estimates many word tokens in the document
d as strongly relevant to the topic k, i.e., as having a large probability in the
topic k, θdk is also estimated to be large. If the estimated value of θdk is far larger than those of the θdk′ 's for k′ ≠ k, we can say that the document d is exclusively related to the topic k. We use LDA in this manner to determine whether a given sequence is exclusively related to some particular topic when we score the Tanka poems generated by the RNN.
In our experiment, the inference was performed by collapsed Gibbs sampling
(CGS) [7]. CGS iteratively updates the topic assignments of all word tokens in
every training document. This assignment can be viewed as a clustering of word
tokens, where the number of clusters is K. To explain the details of CGS, we introduce some more notation. Let xdi denote the vocabulary word appearing
as the i-th word token of the document d and zdi the topic to which the i-th
word token of the document d is assigned. CGS updates zdi based on the topic
probabilities given as follows:
p(zdi = k|X, Z¬di
) ∝ n¬di
dk + α ×
n¬di
kv + β
n¬di
k + V β
for k = 1, . . . , K. (1)
This is the conditional probability that the i-th word token of the document d is assigned to the topic k, given all observed word tokens X and all topic assignments except the one we are about to update; the latter is denoted by Z¬di in Eq. (1), where ¬di means the exclusion of the i-th word token of the document d, and v denotes the word xdi. It should hold that Σk p(zdi = k | X, Z¬di) = 1. In Eq. (1), ndk is the number of word tokens in the document d that are assigned to the topic k, nkv is the number of the tokens of the word v that are assigned to the topic k in the entire document set, and nk is defined as nk ≡ Σv nkv, summing over v = 1, . . . , V. α is the hyperparameter of the symmetric Dirichlet prior for the θd's, and β is that of the symmetric Dirichlet prior for the φk's [7][1]. In CGS, we compute the
Table 1. An Example of Topic Words
(HA na) (ha na) (HA ru) (no ) (hi su) (U ku) (ni ) (YA ma) (hu )
(sa to) (ru ) (NI ho) (U me) (ha ru) (ta ti) (HU ru) (SA ki) (U no)
(tu ki) (no ) (NA ka) (ni ) (TU ki) (A ka) (KU mo) (te ) (ka ke)
(wo ) (so ra) (ki yo) (ha ) (KA ke) (ka ri) (hi no) (yu ku) (KO yo)
(su ) (to ki) (HO to) (ko ye) (NA ki) (NA ku) (KO ye) (HI to) (ho to)
(ni ) (ni na) (su ka) (ku ) (MA tu) (HA tu) (ku ra) (MA ta) (ya ma)
(ka se) (ku ) (A ki) (ni ) (no ha) (se ) (KO ro) (so hu) (no )
(HU ki) (O to) (HU ku) (tu ka) (sa mu) (WO ki) (MA tu) (mo u) (U ra)
(KI mi) (no ) (KA yo) (ni ) (te ) (MA tu) (wo ) (ha ) (TI yo)
(ma tu) (to se) (tu yo) (I ku) (YO ro) (mu ) (TI to) (no ma) (ka ta)
probabilities of K topics by Eq. (1) at every word token and sample a topic
based on these probabilities. The sampled topic is set as a new value of zdi. We
repeat this sampling hundreds of times over the entire training document set.
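A compact sketch of the CGS updates described above, written in plain Python/NumPy under the assumption that documents are given as lists of word ids; the function name and data layout are our own.

```python
import numpy as np

def collapsed_gibbs(docs, V, K, alpha, beta, n_iters=200, rng=None):
    """Collapsed Gibbs sampling for LDA, following Eq. (1).

    docs: list of token lists (word ids in 0..V-1).
    Returns final topic assignments and the count matrices n_dk, n_kv."""
    rng = rng or np.random.default_rng(0)
    n_dk = np.zeros((len(docs), K))   # tokens in doc d assigned to topic k
    n_kv = np.zeros((K, V))           # tokens of word v assigned to topic k
    n_k = np.zeros(K)                 # total tokens assigned to topic k
    # Initialize topic assignments randomly and fill in the counts.
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = z[d][i]
                # Remove the current assignment (the "not-di" counts).
                n_dk[d, k] -= 1; n_kv[k, v] -= 1; n_k[k] -= 1
                # Eq. (1): p(z_di = k) proportional to
                # (n_dk + alpha) * (n_kv + beta) / (n_k + V*beta).
                p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    return z, n_dk, n_kv
```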
When performing CGS in our evaluation experiment, we used the same set of Tanka poems as that used for training the RNN. Therefore, D = 143,550 and V = 3,631, as given in Subsection 2.2. Since the ordering of word tokens is irrelevant to LDA, each non-overlapping bigram sequence was regarded as a bag of bigrams. K was set to 50, because other numbers gave no significant improvement. The
hyperparameters α and β were tuned by grid search based on validation set
perplexity. It should be noted that CGS for LDA can be run independently of
the training of RNN. Table 1 gives an example of the 20 top-ranked topic words
in terms of φkv for five of the 50 topics. Each row corresponds to a different topic. The five topics represent blooming flowers, the autumn moon, singing birds, blowing wind, and thy reign, respectively, from top to bottom. For example, in the
topic corresponding to the top row in Table 1, the words “ha na” (flowers), “ha
ru” (spring), “ni ho” (the first two Hiragana characters of the word “ni ho hi,”
which means fragrance), and “u me” (plum blossom) have large probabilities.
In the topic corresponding to the second row, the words “tu ki” (moon), “ku
mo” (cloud), “ka ke” (waning of the moon), and “ko yo” (the first two Hiragana
characters of the word “ko yo hi,” which means tonight) have large probabilities.
2.4 Topic-Based Scoring
Our sequence scoring uses the φk's, i.e., the per-topic word probabilities, learned by CGS from the training data set. CGS provides φkv as φkv ≡ (nkv + β) / (nk + V β), where nkv and nk are defined in Subsection 2.3. Based on these φk's, we can assign each word token of unseen documents, i.e., documents not used for the inference, to one among the K topics. This procedure is called fold-in [1]. In our case, the bigram sequences generated by the RNN are the unseen documents. The fold-in procedure runs CGS over each sequence while keeping the φk's unchanged. We then compute topic probabilities by counting the assignments to each topic. For example, when the length of a given bigram sequence is 18 and the fold-in procedure assigns 12 of the 18 bigrams to some particular topic, the probability of that topic is (12 + α) / (18 + Kα). In the experiment, we obtained eight topic assignments for each sequence, one every 50 iterations after the 100th iteration of the fold-in procedure. The mean of these eight probability vectors was used to compute the entropy −Σk θ̄0k log θ̄0k, where θ̄0k is the averaged probability of the topic k for the sequence under consideration. We call this the topic entropy. Smaller topic entropies are regarded as better, because they correspond to situations where more tokens are assigned to the same topic. In other words, we aim to select sequences showing topic consistency.
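The fold-in scoring can be sketched as follows. This is our own minimal NumPy sketch of the procedure, with hypothetical function and parameter names; phi is the K × V matrix of per-topic word probabilities learned by CGS.

```python
import numpy as np

def topic_entropy(doc, phi, alpha, n_burn=100, n_samples=8, gap=50, rng=None):
    """Fold-in scoring of Section 2.4: Gibbs-sample topic assignments for
    one unseen bigram sequence with phi held fixed, average the topic
    proportions over n_samples samples taken every `gap` iterations after
    `n_burn` burn-in iterations, and return the entropy of the averaged
    proportions (lower = more topically consistent)."""
    rng = rng or np.random.default_rng(0)
    K = phi.shape[0]
    N = len(doc)
    z = rng.integers(K, size=N)                     # random initial topics
    n_k = np.bincount(z, minlength=K).astype(float)  # per-topic counts
    theta_sum = np.zeros(K)
    taken, it = 0, 0
    while taken < n_samples:
        it += 1
        for i, v in enumerate(doc):
            n_k[z[i]] -= 1
            # Sampling probability with phi fixed: (n_k + alpha) * phi[k, v].
            p = (n_k + alpha) * phi[:, v]
            z[i] = rng.choice(K, p=p / p.sum())
            n_k[z[i]] += 1
        if it > n_burn and (it - n_burn) % gap == 0:
            theta_sum += (n_k + alpha) / (N + K * alpha)
            taken += 1
    theta = theta_sum / n_samples
    return -np.sum(theta * np.log(theta))
```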
3 Evaluation
The evaluation experiment compared our scoring method to a method based on the RNN output probability. The output probability is obtained as follows. We generate a random sequence with the RNN by starting from the special word BOS and drawing words one by one until we draw the special word EOS. The output probability of the generated sequence is the product of the output probabilities of all its tokens, where the probability of each token is the output probability at the corresponding step of the sequence generation. Our LDA-based method was compared to this probability-based method.
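This generation-and-scoring loop can be sketched as follows, with a hypothetical next_token_probs callable standing in for the trained RNN; the function names are our own.

```python
import math
import random

def generate_and_score(next_token_probs, bos, eos, rng=None, max_len=100):
    """Sample a sequence from BOS until EOS, accumulating the log output
    probability of each drawn token. next_token_probs maps the current
    prefix to a dict of {token: probability} and stands in for the RNN."""
    rng = rng or random.Random(0)
    seq = [bos]
    log_prob = 0.0
    while seq[-1] != eos and len(seq) < max_len:
        probs = next_token_probs(seq)
        tokens, weights = zip(*probs.items())
        tok = rng.choices(tokens, weights=weights)[0]  # draw the next token
        log_prob += math.log(probs[tok])               # accumulate its log prob
        seq.append(tok)
    return seq, log_prob
```

Summing log probabilities rather than multiplying raw probabilities avoids numerical underflow for long sequences.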
We first investigate the difference between the top-ranked Tanka poems obtained by the two compared scoring methods. Table 2 presents an example of the top five Tanka poems selected by our method in the left column and those selected based on the RNN output probability in the right column. To obtain these top-ranked poems, we first generated 100,000 Tanka poems with the GRU-RNN. Since the number of poems containing grammatically incorrect parts was large, a grammar check was required. However, we could not find any good grammar checking tool for archaic Japanese. Therefore, as an approximation, we regarded the poems containing at least one part appearing in no training Tanka poem as grammatically incorrect.⁴ Performing the grammar check in a more appropriate manner is left as future work. After removing the grammatically incorrect poems, we assigned the remaining poems a score based on each scoring method. Table 2 presents the resulting top five Tanka poems for each method.
Table 2 shows that when we use RNN output probability (right column), it is
difficult to achieve topic consistency. The fourth poem contains the words “sa ku
⁴ This grammar check eventually reduces our search space to the set of random combinations of the parts appearing in the training set. However, even when we generate Tanka poems simply by randomly shuffling the parts of existing Tanka poems, we need some criterion to judge whether an obtained poem is good or not, and to propose such a criterion is nothing but to propose a method for Tanka poem generation. The data set used in our experiment contains around 48,000 unique subsequences that can be used as the first or third part of a Tanka poem and around 215,000 unique subsequences that can be used as the second, fourth, or fifth part. It is far from trivial to select, from all possible combinations of these parts (i.e., from ≈ 48,000² × 215,000³ combinations), only those that are good in some definite sense.
Table 2. Top five Tanka poems selected by the compared methods. Each poem is shown as its five parts separated by ' / '.

(rank) | Topic entropy | Output probability
1 | si ku re yu ku / ka tu ra ki ya ma no / i ro hu ka ki / mo mi ti no i ro ni / si ku re hu ri ke ri | o ho a ra ki no / mo ri no si ta ku sa / ka mi na tu ki / mo ri no si ta ku sa / ku ti ni ke ru ka na
2 | ti ha ya hu ru / yu hu hi no ya ma no / ka mi na hi no / mi mu ro no ya ma no / mo mi ti wo so mi ru | hi sa ka ta no / ku mo wi ni mi yu ru / tu ki ka ke no / tu ki ka ke wa ta ru / a ma no ka ku ya ma
3 | ka he ru ya ma / hu mo to no mi ti ha / ki ri ko me te / hu ka ku mo mu su hu / wo ti no ya ma ka se | ti ha ya hu ru / ka mo no ka ha na mi / ta ti ka he ri / ki ru hi to mo na ki / a hu sa ka no se ki
4 | o ho ka ta mo / wa su ru ru ko to mo / ka yo he to mo / o mo hu ko ko ro ni / o mo hu ha ka ri so | ta ka sa ko no / wo no he no sa ku ra / sa ki ni ke ri / mi ne no ma tu ya ma / yu ki hu ri ni ke ri
5 | u ti na hi ku / ko ro mo te sa mu ki / o ku ya ma ni / u tu ro hi ni ke ri / a ki no ha tu ka se | ta ka sa ko no / wo no he no sa ku ra / na ka mu re ha / a ri a ke no tu ki ni / a ki ka se so hu ku
ra” (cherry blossom) and “yu ki” (snow). The fifth one contains the words “sa
ku ra” (cherry blossom) and “a ki ka se” (autumn wind). The poems top-ranked
based on RNN output probability are likely to contain the words expressing
different seasons. This is a fatal error for Tanka composition. In contrast, the
first poem selected by our method contains the words “si ku re” (drizzling rain)
and “mo mi ti” (autumnal tints). Because drizzling rain is a shower observed in
late autumn or in early winter, the word “mo mi ti” fits well within this context.
Further, in the third poem, the words “ya ma” and “hu mo to” are observed. The
former means mountain, and the latter means the foot of the mountain. This
poem also shows topic consistency. However, a slight weakness can be observed in the poems selected by our method: the same word tends to be used twice or more. For example, the word "o mo hu" (think, imagine) appears twice in the fourth poem. While a refrain is often observed in Tanka poems, future work may introduce an improvement here.
Next, we investigated the diversity of the Tanka poems selected by the two methods. We picked the 200 top-ranked poems given by each scoring method. We then split all the poems into five parts to obtain 200 × 5 = 1,000 parts in total. Since
Table 3. Ten most frequent parts in the top 200 Tanka poems
Topic entropy Output probability
hi sa ka ta no (9) a ri a ke no tu ki no (13)
ko ro mo te sa mu ki (8) hi sa ka ta no (12)
a ri a ke no tu ki ni (8) ta ka sa ko no (12)
ho to to ki su (7) ho to to ki su (11)
a ri a ke no tu ki (6) a si hi ki no (10)
a ki ka se so hu ku (6) a ma no ka ku ya ma (8)
sa ku ra ha na (6) a ri a ke no tu ki ni (7)
ka mi na tu ki (6) na ri ni ke ri (7)
ta na ha ta no (5) sa ku ra ha na (7)
a ki no yo no tu ki (5) ya ma ho to to ki su (7)
the resulting set of 1,000 parts included duplicates, we grouped the 1,000 parts
by their identity and counted duplicates. Table 3 presents the ten most frequent
parts for each scoring method. When we used RNN output probability for scoring
(right column), “a ri a ke no tu ki no” appeared 13 times, “hi sa ka ta no” 12
times, “ta ka sa ko no” 12 times, and so on, among the 1,000 parts coming from
the 200 top-ranked poems. In contrast, when we used the LDA-based scoring (left column), "hi sa ka ta no" appeared nine times, "ko ro mo te sa mu ki" eight times, "a ri a ke no tu ki ni" eight times, and so on. That is, there were fewer duplicates for our method. Further, while only 678 of the 1,000 parts were unique when the RNN output probability was used for scoring, 806 were unique when our scoring method was used. It can be said that our method explores a larger diversity.
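This counting procedure can be sketched as follows; a minimal sketch with hypothetical names, assuming each poem is given as a list of its five part strings.

```python
from collections import Counter

def part_diversity(poems):
    """Pool the five parts of each selected poem, then report the ten
    most frequent parts and the number of unique parts."""
    parts = [p for poem in poems for p in poem]
    counts = Counter(parts)
    return counts.most_common(10), len(counts)
```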
4 Previous Studies
There already exist many proposals for automatic poetry composition using RNNs. However, since LDA is the key component of our method, we focus here on the proposals that use topic modeling. Yan et al. [14] utilize LDA for Chinese poetry composition. However, LDA is only used for obtaining word similarities, not for exploring topical diversity, and the poem generation is not achieved with an RNN. Subsequent work by the same author [15] proposes a method called iterative polishing for recomposing the poems generated by a well-designed RNN architecture; however, LDA is no longer used.
A combination of topic modeling and RNNs can also be found in proposals not related to automatic poetry composition. Dieng et al. [5] propose TopicRNN, a model that modifies the output word probabilities of an RNN using long-range semantic information about each document captured by an LDA-like mechanism. However, when sequences are generated with TopicRNN, we need to choose one of the existing documents as a seed, which means that we can only generate a sequence similar to the chosen document. While TopicRNN has this limitation, it provides a valuable guide for future work. Our method detaches sequence selection from sequence generation: only after generating sequences with the RNN do we perform a selection. It may be better to directly generate sequences having some desirable character regarding their topical content. Xing et al. [13] also suggest a similar research direction.

Fig. 3. Correlation between the topic entropy and the log output probability.
Hall et al. [8] utilize the entropy of topic probabilities for studying the history of ideas. However, their entropy measures how many different topics are covered by a particular conference in a particular year, so larger entropies are better; their evaluation direction is opposite to ours. Our scoring regards poems having topic consistency as important. Since LDA can extract diverse topics, Tanka poems can be highly scored by our method for a wide variety of reasons: some poems are topically consistent in describing the autumn moon, and others in describing blooming flowers. As a result, our method could eventually select Tanka poems showing diversity. The RNN output probability could not address such topical diversity, because it is indifferent to the topic consistency within each Tanka poem.
5 Conclusions
This paper proposed a method for scoring the sequences generated by an RNN. The proposed method, based on topic entropy, was compared to scoring based on the RNN output probability and provided more diverse Tanka poems. An immediate direction for future work is to devise a method for combining these two scores. Fig. 3 plots the per-token topic entropies (horizontal axis) obtained by our method against the per-token log output probabilities (vertical axis) of 10,000 sequences generated by the GRU-RNN. Since lower topic entropies are better, the upper left part contains the sequences that are better in terms of both scores. As depicted in Fig. 3, the two scores show only a moderate negative correlation; the correlation coefficient is around −0.41. It may be a meaningful task to seek an effective combination of the two scores for a better sequence selection.
In this paper, we only considered obtaining better sequences by screening generated sequences. However, the same goal can also be pursued by modifying the architecture of the RNN. As discussed in Section 4, Dieng et al. [5] incorporate an idea from topic modeling into the RNN architecture. It is an interesting direction for future work to propose an RNN architecture that can generate sequences diverse in their topics. With respect to the evaluation, we only considered the diversity of subsequences of generated poems. It is also important future work to apply other evaluation metrics such as BLEU [10][15] or even human subjective evaluation.
Acknowledgments
This work was supported by Grant-in-Aid for Scientific Research (B) 15H02789.
References
1. Asuncion, A., Welling, M., Smyth, P., and Teh, Y. W.: On smoothing and inference
for topic models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence (UAI ’09), 27–34 (2009)
2. Blei, D. M., Ng, A. Y., and Jordan, M. I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
3. Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y.: On the proper-
ties of neural machine translation: Encoder-decoder approaches. arXiv preprint,
arXiv:1409.1259 (2014)
4. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y.: Empirical evaluation of gated
recurrent neural networks on sequence modeling. arXiv preprint, arXiv:1412.3555
(2014)
5. Dieng, A. B., Wang, C., Gao, J., and Paisley, J.: TopicRNN: A recurrent neural
network with long-range semantic dependency. arXiv preprint, arXiv:1611.01702
(2016)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint,
arXiv:1308.0850 (2013)
7. Griffiths, T. L. and Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. U. S. A. 101, Suppl. 1, 5228–5235 (2004)
8. Hall, D., Jurafsky, D., and Manning, C. D.: Studying the history of ideas using topic
models. In Proceedings of Conference on Empirical Methods in Natural Language
Processing (EMNLP ’08), 363–371 (2008)
9. Hochreiter, S. and Schmidhuber, J.: Long short-term memory. Neural Comput. 9,
8, 1735–1780 (1997)
10. Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J.: BLEU: a method for automatic
evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics (ACL ’02), 311–318 (2002)
11. Sutskever, I., Martens, J., and Hinton, G.: Generating text with recurrent neural
networks. In Proceedings of the 28th International Conference on Machine Learning
(ICML ’11), 1017–1024 (2011)
12. Tieleman, T. and Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a run-
ning average of its recent magnitude. COURSERA: Neural Networks for Machine
Learning, 4 (2012)
13. Xing, C., Wu, W., Wu, Y., Liu, J., Huang, Y., Zhou, M., and Ma, W.-Y.: Topic
aware neural response generation. In Proceedings of the 31st AAAI Conference on
Artificial Intelligence (AAAI ’17), 3351–3357 (2017)
14. Yan, R., Jiang, H., Lapata, M., Lin, S.-D., Lv, X., Li, X.: i, Poet: Automatic
Chinese poetry composition through a generative summarization framework under
constrained optimization. In Proceedings of the 23rd International Joint Conference
on Artificial Intelligence (IJCAI ’13), 2197–2203 (2013)
15. Yan, R.: i, Poet: Automatic poetry composition through recurrent neural networks
with iterative polishing schema. In Proceedings of the 25th International Joint Con-
ference on Artificial Intelligence (IJCAI ’16), 2238–2244 (2016)