LDA-Based Scoring of Sequences Generated by
RNN for Automatic Tanka Composition
Tomonari Masada1 and Atsuhiro Takasu2
1 Nagasaki University, 1-14 Bunkyo-machi, Nagasaki, Japan
masada@nagasaki-u.ac.jp
2 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan
takasu@nii.ac.jp
Abstract. This paper proposes a method of scoring sequences gener-
ated by recurrent neural network (RNN) for automatic Tanka compo-
sition. Our method gives sequences a score based on topic assignments
provided by latent Dirichlet allocation (LDA). Sequences generated by RNN can be scored with their output probabilities. However, the sequences having large output probabilities are likely to share many of the same subsequences and thus lack diversity. In contrast, our method uses the topic assignments given by LDA. We first perform an inference for LDA over the same set of sequences as that used for
training RNN. The inferred per-topic word probabilities are then used
to obtain topic assignments of word tokens in the sequences generated
by a well-trained RNN. When a sequence contains many word tokens as-
signed to the same topic, we give it a high score. This is an entropy-based
scoring. The result of an experiment, where we scored Japanese Tanka
poems generated by RNN, shows that the top-ranked sequences selected
by our method were likely to contain a wider variety of subsequences
than those selected with RNN output probability.
Keywords: topic modeling, sequence generation, recurrent neural net-
work, automatic poetry composition
1 Introduction
Recurrent neural networks (RNNs) are a class of artificial neural networks that have achieved remarkable results in many important applications. Sequence generation is one such application [11][6]. By adopting a special
architecture like LSTM [9] or GRU [3][4], RNN can learn dependencies among
word tokens appearing at distant positions. It is thus possible to use RNN for
generating sequences showing dependencies among distant word tokens.
When we use sequence generation in realistic situations, we may sift the se-
quences generated by RNN to obtain only useful ones. Such sifting may give each
generated sequence a score representing its usefulness relative to the application
under consideration. While an improvement of the quality of sequences gener-
ated by RNN can also be achieved by modifying the architecture of RNN [5][13],
we here consider a scoring method that can be tuned and applied separately
from RNN. In particular, this paper proposes a method for scoring sequences to
achieve a diversity of subsequences appearing in highly scored sequences.
We may score the generated sequences by using their output probabilities
in RNN. However, this method is likely to give high scores to the sequences containing subsequences that are popular among the sequences used for training RNN. Consequently, the highly scored sequences tend to look alike and to show
only a limited diversity. In contrast, our scoring method uses latent Dirichlet allo-
cation (LDA) [2]. Topic models like LDA can extract diverse topics from training
documents. Each topic is represented as a probability distribution defined over
vocabulary words. By using the per-topic word probabilities LDA provides, we
can assign high scores to the sequences containing many words having a strong
relevance to some particular topic. This scoring is expected to give top-ranked
sequences that individually show a strong relevance to some particular topic and
together show a wide variety of topics.
We performed an evaluation experiment by generating Japanese Tanka poems with an RNN using LSTM or GRU modules. Tanka poems basically have a
5-7-5-7-7 syllabic structure and thus consist of five parts. Every Tanka poem
used in the experiment was given in Hiragana characters with no voicing marks
and was represented as a sequence of non-overlapping character bigrams, which
were regarded as vocabulary words. After training RNN under different set-
tings, we chose the best setting in terms of validation perplexity. We then gener-
ated random sequences by using the best-trained RNN. The generated sequences
were scored by their output probabilities in RNN or by using our LDA-based
method. The result shows that our LDA-based method could select more diverse
sequences in the sense that a wider variety of subsequences were obtained as the
five parts of the top-ranked Tanka poems.
The rest of the paper is structured as follows. Section 2 describes the method in detail. The result of the evaluation experiment is given in Section 3. Section 4 discusses previous studies. Section 5 concludes the paper by discussing future work.
2 Method
2.1 Preprocessing of Tanka Poems
Sequence generation is one of the important applications of RNN [11][6].
This paper considers the generation of Japanese Tanka poems. We assume that
all Tanka poems for training RNN are given in Hiragana characters with no
voicing marks. Tanka poems have a 5-7-5-7-7 syllabic structure and thus consist
of five subsequences, which are called parts in this paper. We give an example of
a Tanka poem taken from The Tale of Genji in Fig. 1 (in this paper, a romanization called Kunrei-shiki is used for Hiragana). Note that the Tanka poem in Fig. 1 has a 6-7-5-7-7 syllabic structure: while the first part of the standard syllabic structure consists of five syllables, that of this example consists of six. In this manner, small deviations from the standard 5-7-5-7-7 structure are often observed, and the preprocessing of Tanka poems needs to accommodate them. The details of the preprocessing are given below.

Fig. 1. A Tanka poem from The Tale of Genji.
First, we put a spacing character '_' between each neighboring pair of parts
and also at the tail of the poem. Further, the first Hiragana character of each of
the five parts is “uppercased,” i.e., marked as distinct from the same character
appearing at other positions. The above example is then converted to:
MO no o mo hu ni _ TA ti ma hu he ku mo _ A ra nu mi no _ SO te u ti hu ri
si _ KO ko ro si ri ki ya _
Second, we represent each Tanka poem as a sequence of non-overlapping char-
acter bigrams. Since we would like to regard character bigrams as vocabulary
words composing each Tanka poem, non-overlapping bigrams are used. How-
ever, only for the parts of even length, we apply an additional modification: the special bigram '(_ _)' is put at the tail of each part of even length in place of the spacing character '_'. Consequently, the non-overlapping bigram sequence
corresponding to the above example is obtained as follows:
(MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra) (nu mi)
(no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki) (ya _)
Note that the even-length part 'MO no o mo hu ni _' is converted into the bigram sequence '(MO no) (o mo) (hu ni) (_ _)' by the additional modifica-
tion. Finally, we put tokens of the special words ‘BOS’ and ‘EOS’ at the head and
the tail of each sequence, respectively. The final output of preprocessing for the
above example is as follows:
BOS (MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra)
(nu mi) (no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki)
(ya _) EOS
The sequences preprocessed in this manner were used for training RNN and also
for training LDA.
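The preprocessing described above can be summarized in a short sketch. This is a minimal illustration and not the authors' code; it assumes each poem is already split into its parts and romanized in Kunrei-shiki, with '_' as the spacing character.

# Minimal sketch of the preprocessing in Subsection 2.1 (illustrative only).
# A poem is assumed to be given as a list of parts, each part being a list of
# romanized Hiragana syllables.
def preprocess(parts):
    tokens = []
    for part in parts:
        syllables = list(part)
        syllables[0] = syllables[0].upper()   # mark the first syllable of each part
        syllables.append("_")                 # spacing character after every part
        if len(part) % 2 == 0:                # even-length part: add a second '_' so
            syllables.append("_")             # that the special bigram (_ _) closes it
        tokens.extend(syllables)
    # non-overlapping character bigrams are the vocabulary words
    bigrams = ["(" + tokens[i] + " " + tokens[i + 1] + ")"
               for i in range(0, len(tokens), 2)]
    return ["BOS"] + bigrams + ["EOS"]

example = [["mo", "no", "o", "mo", "hu", "ni"],
           ["ta", "ti", "ma", "hu", "he", "ku", "mo"],
           ["a", "ra", "nu", "mi", "no"],
           ["so", "te", "u", "ti", "hu", "ri", "si"],
           ["ko", "ko", "ro", "si", "ri", "ki", "ya"]]
print(" ".join(preprocess(example)))   # reproduces the sequence shown above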
Fig. 2. Validation perplexities obtained by GRU-RNN with three hidden layers of
various sizes.
2.2 Tanka Poem Generation by RNN
We downloaded 179,225 Tanka poems from the web site of the International Research Center for Japanese Studies (http://tois.nichibun.ac.jp/database/html2/waka/menu.html). 3,631 different non-overlapping bigrams were found
to appear in this set. Therefore, the vocabulary size was 3,631. Among 179,225
Tanka poems, 143,550 were used for training RNN and also for training LDA,
and 35,675 were for validation, i.e., for tuning free parameters.
We implemented the RNN with PyTorch (http://pytorch.org/), using LSTM or GRU modules. The number of hidden layers was three, because larger numbers of layers did not give better results in terms of validation set perplexity. The dropout probability
was 0.5 for all layers. The weights were clipped to [−0.25, 0.25]. RMSprop [12]
was used for optimization with the learning rate 0.002. The mini-batch size was
200. The best RNN architecture was chosen by the following procedure.
Fig. 2 presents the validation set perplexity computed in the course of training
of GRU-RNN with three hidden layers. The hidden layer size h was set to 400,
600, 800, or 1,000. When h = 400 (dashed line), the worst result was obtained.
The results for the three other cases showed no significant difference. The running
time was the shortest when h = 600 (solid line) among these three cases. While
we obtained similar results also for LSTM-RNN, the results of GRU-RNN were
slightly better. Therefore, we have selected the GRU-RNN with three hidden
layers of size 600 as the model for generating Tanka poems.
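For concreteness, the following is a minimal PyTorch sketch of a language model with the settings reported above (three GRU layers of size 600, dropout 0.5, RMSprop with learning rate 0.002, mini-batches of 200). The embedding size, the class and function names, and the interpretation of the clipping as weight clipping are our assumptions, not the authors' implementation.

import torch
import torch.nn as nn

# Sketch of a GRU language model over the bigram vocabulary (size 3,631).
class TankaRNN(nn.Module):
    def __init__(self, vocab_size=3631, emb_size=600, hidden_size=600, num_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.rnn = nn.GRU(emb_size, hidden_size, num_layers,
                          dropout=0.5, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, h=None):
        y, h = self.rnn(self.embed(x), h)     # (batch, seq_len, hidden_size)
        return self.out(y), h                 # logits over the bigram vocabulary

model = TankaRNN()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.002)
criterion = nn.CrossEntropyLoss()

def train_step(batch):                        # batch: (200, seq_len) tensor of token ids
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():                     # the paper reports clipping to [-0.25, 0.25];
        for p in model.parameters():          # applied here to the weights as an assumption
            p.clamp_(-0.25, 0.25)
    return loss.item()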
2.3 Topic Modeling for Sequence Scoring
This paper proposes a new method for scoring the sequences generated by RNN.
We use latent Dirichlet allocation (LDA) [2], the best-known topic model, for scoring. LDA is a Bayesian probabilistic model of documents and can model
the difference in semantic contents of documents as the difference in mixing
proportions of topics. Each topic is in turn modeled as a probability distribution
defined over vocabulary words. The details of LDA are given below.
We first introduce notations. We denote the number of documents, the vocab-
ulary size, and the number of topics by D, V , and K, respectively. By performing
an inference for LDA, e.g. variational Bayesian inference [2] or collapsed Gibbs
sampling [7], over the training document set, we can estimate the two groups of
parameters: θdk and φkv, for d = 1, . . . , D, v = 1, . . . , V , and k = 1, . . . , K.
The parameter θdk is the probability of the topic k in the document d. In-
tuitively, this group of parameters represents the importance of each topic in
each document. Technically, the parameter vector θd = (θd1, . . . , θdK) gives the
mixing proportions of K different topics in the document d. Each topic is in turn
represented as a probability distribution defined over words. The parameter φkv
is the probability of the word v in the topic k. Intuitively, φkv represents the rel-
evance of the word v to the topic k. For example, in autumn, people talk about
fallen leaves more often than about blooming flowers. Such topic relevancy of
vocabulary words is represented by the parameter vector φk = (φk1, . . . φkV ).
When an inference for LDA estimates many word tokens in the document
d as strongly relevant to the topic k, i.e., as having a large probability in the
topic k, θdk is also estimated as large. If the estimated value of θdk is far larger
than those of the θdk′'s for the other topics k′ ≠ k, we can say that the document d is exclusively related to the topic k. We use LDA in this manner to judge whether a given sequence is exclusively related to some particular topic when we score Tanka poems generated by RNN.
In our experiment, the inference was performed by collapsed Gibbs sampling
(CGS) [7]. CGS iteratively updates the topic assignments of all word tokens in
every training document. This assignment can be viewed as a clustering of word
tokens, where the number of clusters is K. For explaining the details of CGS, we
introduce some more notations. Let xdi denote the vocabulary word appearing
as the i-th word token of the document d and zdi the topic to which the i-th
word token of the document d is assigned. CGS updates zdi based on the topic
probabilities given as follows:
p(z_{di} = k \mid X, Z^{\neg di}) \propto (n_{dk}^{\neg di} + \alpha) \times \frac{n_{kv}^{\neg di} + \beta}{n_{k}^{\neg di} + V\beta}, \quad k = 1, \ldots, K.   (1)
This is the conditional probability that the i-th word token of the document d is
assigned to the topic k when given all observed word tokens X and their topic
assignments except the assignment we are going to update, which is denoted
by Z¬di
in Eq. (1), where ¬di means the exclusion of the i-th word token of
the document d. It should hold that Σk p(zdi = k | X, Z¬di) = 1. In Eq. (1), ndk is the number of word tokens in the document d that are assigned to the topic k, nkv is the number of tokens of the word v (with v = xdi) that are assigned to the topic k in the entire document set, and nk is defined as nk ≡ Σv nkv, the sum over all vocabulary words; the superscript ¬di on these counts indicates that the token being updated is excluded. α is the hyperparameter of the symmetric Dirichlet prior for the θd's, and β is that of the symmetric Dirichlet prior for the φk's [7][1]. In CGS, we compute the probabilities of the K topics by Eq. (1) at every word token and sample a topic based on these probabilities. The sampled topic is set as the new value of zdi. We repeat this sampling hundreds of times over the entire training document set.

Table 1. An Example of Topic Words

(HA na) (ha na) (HA ru) (no _) (hi su) (U ku) (ni _) (YA ma) (hu _) (sa to) (ru _) (NI ho) (U me) (ha ru) (ta ti) (HU ru) (SA ki) (U no)
(tu ki) (no _) (NA ka) (ni _) (TU ki) (A ka) (KU mo) (te _) (ka ke) (wo _) (so ra) (ki yo) (ha _) (KA ke) (ka ri) (hi no) (yu ku) (KO yo)
(su _) (to ki) (HO to) (ko ye) (NA ki) (NA ku) (KO ye) (HI to) (ho to) (ni _) (ni na) (su ka) (ku _) (MA tu) (HA tu) (ku ra) (MA ta) (ya ma)
(ka se) (ku _) (A ki) (ni _) (no ha) (se _) (KO ro) (so hu) (no _) (HU ki) (O to) (HU ku) (tu ka) (sa mu) (WO ki) (MA tu) (mo u) (U ra)
(KI mi) (no _) (KA yo) (ni _) (te _) (MA tu) (wo _) (ha _) (TI yo) (ma tu) (to se) (tu yo) (I ku) (YO ro) (mu _) (TI to) (no ma) (ka ta)
When performing CGS in our evaluation experiment, we used the same set
of Tanka poems as that used for training RNN. Therefore, D = 143,550 and V = 3,631 as given in Subsection 2.2. Since the ordering of word tokens is irrelevant
to LDA, each non-overlapping bigram sequence was regarded as a bag of bigrams.
K was set to 50, because other numbers gave no significant improvements. The
hyperparameters α and β were tuned by grid search based on validation set
perplexity. It should be noted that CGS for LDA can be run independently of
the training of RNN. Table 1 gives an example of the 20 top-ranked topic words
in terms of φkv for five among 50 topics. Each row corresponds to a different
topic. The five topics represent blooming flowers, the autumn moon, singing birds, blowing wind, and thy reign, respectively, from top to bottom. For example, in the
topic corresponding to the top row in Table 1, the words “ha na” (flowers), “ha
ru” (spring), “ni ho” (the first two Hiragana characters of the word “ni ho hi,”
which means fragrance), and “u me” (plum blossom) have large probabilities.
In the topic corresponding to the second row, the words “tu ki” (moon), “ku
mo” (cloud), “ka ke” (waning of the moon), and “ko yo” (the first two Hiragana
characters of the word “ko yo hi,” which means tonight) have large probabilities.
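To make the update of Eq. (1) concrete, here is a minimal sketch of collapsed Gibbs sampling for LDA. K = 50 follows the paper, but the values of α and β and the number of iterations are placeholders (the paper tunes α and β by grid search); function and variable names are ours.

import numpy as np

# Minimal sketch of collapsed Gibbs sampling for LDA following Eq. (1).
# docs: list of documents, each a list of word ids in {0, ..., V-1}.
def collapsed_gibbs(docs, V, K=50, alpha=0.1, beta=0.01, n_iter=500, seed=0):
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))          # tokens in doc d assigned to topic k
    n_kv = np.zeros((K, V))                  # tokens of word v assigned to topic k
    n_k = np.zeros(K)                        # total tokens assigned to topic k
    z = [rng.integers(K, size=len(doc)) for doc in docs]   # random initialization
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, v in enumerate(doc):
                k = z[d][i]                  # remove the current assignment (the ¬di counts)
                n_dk[d, k] -= 1; n_kv[k, v] -= 1; n_k[k] -= 1
                p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)  # Eq. (1)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                  # set the sampled topic as the new value of z_di
                n_dk[d, k] += 1; n_kv[k, v] += 1; n_k[k] += 1
    phi = (n_kv + beta) / (n_k[:, None] + V * beta)   # per-topic word probabilities
    return z, phi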
2.4 Topic-Based Scoring
Our sequence scoring uses the φk's, i.e., the per-topic word probabilities, learned by CGS from the training data set. CGS provides φkv as φkv ≡ (nkv + β)/(nk + V β), where
nkv and nk are defined in Subsection 2.3. Based on these φk’s, we can assign
each word token of unseen documents, i.e., the documents not used for inference,
to one among the K topics. This procedure is called fold-in [1]. In our case, bi-
gram sequences generated by RNN are unseen documents. The fold-in procedure
ran CGS over each sequence by keeping the φk’s unchanged. We then computed
topic probabilities by counting the assignments to each topic. For example, when
the length of a given bigram sequence is 18, and the fold-in procedure assigns 12
among the 18 bigrams to some particular topic, the probability of that topic is
(12 + α)/(18 + Kα). In the experiment, we obtained eight topic assignments every 50 iterations after the 100th iteration of the fold-in procedure for each sequence. The mean of the corresponding eight topic probability estimates was used to compute the entropy −Σk θ̄0k log θ̄0k, where θ̄0k is the averaged probability of the topic k for the sequence under consideration. We call this the topic entropy. Smaller topic entropies are regarded as better, because they correspond to situations where more tokens are assigned to the same topic. In other words, we aim to select sequences showing topic consistency.
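A minimal sketch of the fold-in and topic-entropy computation described above, reusing the per-topic word probabilities phi from the CGS sketch in Subsection 2.3. The value of α is a placeholder, while the sampling schedule follows the numbers reported above (eight samples, one every 50 iterations after the 100th).

import numpy as np

# Fold-in over one generated bigram sequence with phi kept fixed, followed by
# the topic-entropy score (smaller is better).
def topic_entropy(seq, phi, alpha=0.1, n_burnin=100, n_samples=8, lag=50, seed=0):
    rng = np.random.default_rng(seed)
    K = phi.shape[0]
    z = rng.integers(K, size=len(seq))
    n_k = np.bincount(z, minlength=K).astype(float)
    theta_samples = []
    for it in range(n_burnin + n_samples * lag):
        for i, v in enumerate(seq):
            n_k[z[i]] -= 1                       # fold-in update: phi is not changed
            p = (n_k + alpha) * phi[:, v]
            z[i] = rng.choice(K, p=p / p.sum())
            n_k[z[i]] += 1
        if it >= n_burnin and (it - n_burnin) % lag == 0:
            theta_samples.append((n_k + alpha) / (len(seq) + K * alpha))
    theta_bar = np.mean(theta_samples, axis=0)   # averaged topic probabilities
    return -np.sum(theta_bar * np.log(theta_bar))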
3 Evaluation
The evaluation experiment compared our scoring method to the method based on
RNN output probability. The output probability in RNN is obtained as follows.
We can generate a random sequence with RNN by starting from the special word
BOS and then randomly drawing words one by one until we draw the special
word EOS. The output probability of the generated sequence is the product of
the output probabilities of all tokens, where the probability of each token is
the output probability at each moment during the sequence generation. Our
LDA-based method was compared to this probability-based method.
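As an illustration, the following sketch samples one sequence from a trained model of the kind described in Subsection 2.2 and accumulates the log of the product of the per-token output probabilities; the model interface, the special-token ids, and the maximum length are assumptions.

import torch
import torch.nn.functional as F

# Sample one sequence starting from BOS until EOS and return its token ids
# together with the log output probability of the whole sequence.
@torch.no_grad()
def sample_sequence(model, bos_id, eos_id, max_len=30):
    token, h = torch.tensor([[bos_id]]), None
    ids, log_prob = [], 0.0
    for _ in range(max_len):
        logits, h = model(token, h)
        probs = F.softmax(logits[0, -1], dim=-1)
        next_id = torch.multinomial(probs, 1).item()
        log_prob += torch.log(probs[next_id]).item()
        if next_id == eos_id:
            break
        ids.append(next_id)
        token = torch.tensor([[next_id]])
    return ids, log_prob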
We first investigate the difference between the top-ranked Tanka poems obtained by the two scoring methods. Table 2 presents an example of the top five Tanka poems selected by our method ("Topic entropy") and of those selected based on RNN output probability ("Output probability"). To obtain these top-ranked po-
ems, we first generated 100,000 Tanka poems with GRU-RNN. Since the number
of poems containing grammatically incorrect parts was large, a grammar check was required. However, we could not find any good grammar-checking tool for archaic Japanese. Therefore, as an approximation, we regarded the poems containing at least one part appearing in no training Tanka poem as grammatically incorrect.4 It remains future work to perform a grammar check in a more appropriate manner. After removing the grammatically incorrect poems, we assigned each remaining poem a score with each scoring method. Table 2 presents the resulting top five Tanka poems for each method.
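The approximate grammar check described above amounts to a simple membership test; a sketch follows, where the poem and part representations are assumptions for illustration.

# Keep a generated poem only if each of its five parts appears in some training
# Tanka poem. Poems are assumed to be lists of their five parts (strings).
def filter_generated(generated_poems, training_poems):
    training_parts = {part for poem in training_poems for part in poem}
    return [poem for poem in generated_poems
            if all(part in training_parts for part in poem)]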
4 This grammar check eventually reduces our search space to the set of random combinations of the parts appearing in the training set. However, even when we generate Tanka poems simply by randomly shuffling the parts appearing in the existing Tanka poems, we need some criterion to say whether an obtained poem is good or not. To propose such a criterion is nothing but to propose a method for Tanka poem generation. The data set used in our experiment contains around 48,000 unique subsequences that can be used as the first or third part of a Tanka poem and around 215,000 unique subsequences that can be used as the second, fourth, or fifth part. It is far from a trivial task to select, from all possible combinations of these parts (i.e., from ≈ 48,000² × 215,000³ combinations), only those that are good in some definite sense.

Table 2. Top five Tanka poems selected by the compared methods (the five parts of each poem are separated by '/')

rank 1
  Topic entropy:      si ku re yu ku / ka tu ra ki ya ma no / i ro hu ka ki / mo mi ti no i ro ni / si ku re hu ri ke ri
  Output probability: o ho a ra ki no / mo ri no si ta ku sa / ka mi na tu ki / mo ri no si ta ku sa / ku ti ni ke ru ka na
rank 2
  Topic entropy:      ti ha ya hu ru / yu hu hi no ya ma no / ka mi na hi no / mi mu ro no ya ma no / mo mi ti wo so mi ru
  Output probability: hi sa ka ta no / ku mo wi ni mi yu ru / tu ki ka ke no / tu ki ka ke wa ta ru / a ma no ka ku ya ma
rank 3
  Topic entropy:      ka he ru ya ma / hu mo to no mi ti ha / ki ri ko me te / hu ka ku mo mu su hu / wo ti no ya ma ka se
  Output probability: ti ha ya hu ru / ka mo no ka ha na mi / ta ti ka he ri / ki ru hi to mo na ki / a hu sa ka no se ki
rank 4
  Topic entropy:      o ho ka ta mo / wa su ru ru ko to mo / ka yo he to mo / o mo hu ko ko ro ni / o mo hu ha ka ri so
  Output probability: ta ka sa ko no / wo no he no sa ku ra / sa ki ni ke ri / mi ne no ma tu ya ma / yu ki hu ri ni ke ri
rank 5
  Topic entropy:      u ti na hi ku / ko ro mo te sa mu ki / o ku ya ma ni / u tu ro hi ni ke ri / a ki no ha tu ka se
  Output probability: ta ka sa ko no / wo no he no sa ku ra / na ka mu re ha / a ri a ke no tu ki ni / a ki ka se so hu ku

Table 2 shows that when we use RNN output probability (the "Output probability" poems), it is difficult to achieve topic consistency. The fourth poem contains the words "sa ku ra" (cherry blossom) and "yu ki" (snow). The fifth one contains the words "sa ku ra" (cherry blossom) and "a ki ka se" (autumn wind). The poems top-ranked
based on RNN output probability are likely to contain the words expressing
different seasons. This is a fatal error for Tanka composition. In contrast, the
first poem selected by our method contains the words “si ku re” (drizzling rain)
and “mo mi ti” (autumnal tints). Because drizzling rain is a shower observed in
late autumn or in early winter, the word “mo mi ti” fits well within this context.
Further, in the third poem, the words “ya ma” and “hu mo to” are observed. The
former means mountain, and the latter means the foot of the mountain. This
poem also shows topic consistency. However, a slight weakness can be observed
in the poems selected by our method. The same word is likely to be used twice
or more. For example, the word “o mo hu” (think, imagine) is repeated twice in
the fourth poem. While a refrain is often observed in Tanka poems, some future
work may introduce an improvement here.
Next, we investigated the diversity of the Tanka poems selected by the two methods. We took the 200 top-ranked poems given by each scoring method. We then split all poems into their five parts to obtain 200 × 5 = 1,000 parts in total. Since the resulting set of 1,000 parts included duplicates, we grouped the 1,000 parts by their identity and counted the duplicates. Table 3 presents the ten most frequent parts for each scoring method. When we used RNN output probability for scoring (right column), "a ri a ke no tu ki no" appeared 13 times, "hi sa ka ta no" 12 times, "ta ka sa ko no" 12 times, and so on, among the 1,000 parts coming from the 200 top-ranked poems. In contrast, when we used the LDA-based scoring (left column), "hi sa ka ta no" appeared nine times, "ko ro mo te sa mu ki" eight times, "a ri a ke no tu ki ni" eight times, and so on. That is, there were fewer duplicates for our method. Further, while only 678 among the 1,000 parts were unique when RNN output probability was used for scoring, 806 were unique when our scoring method was used. It can be said that our method explores a larger diversity.

Table 3. Ten most frequent parts in the top 200 Tanka poems

Topic entropy                 Output probability
hi sa ka ta no (9)            a ri a ke no tu ki no (13)
ko ro mo te sa mu ki (8)      hi sa ka ta no (12)
a ri a ke no tu ki ni (8)     ta ka sa ko no (12)
ho to to ki su (7)            ho to to ki su (11)
a ri a ke no tu ki (6)        a si hi ki no (10)
a ki ka se so hu ku (6)       a ma no ka ku ya ma (8)
sa ku ra ha na (6)            a ri a ke no tu ki ni (7)
ka mi na tu ki (6)            na ri ni ke ri (7)
ta na ha ta no (5)            sa ku ra ha na (7)
a ki no yo no tu ki (5)       ya ma ho to to ki su (7)
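The duplicate counting described above can be sketched as follows; the representation of poems as lists of five parts is an assumption for illustration.

from collections import Counter

# Split the 200 top-ranked poems of one scoring method into their five parts,
# then report the ten most frequent parts and the number of unique parts.
def part_diversity(top_poems):
    parts = [part for poem in top_poems for part in poem]   # 200 x 5 = 1,000 parts
    counts = Counter(parts)
    return counts.most_common(10), len(counts)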
4 Previous Study
There already exist many proposals for automatic poetry composition using RNN. However, LDA is a key component of our method. Therefore, we focus on the proposals using topic modeling. Yan et al. [14] utilize LDA for Chinese poetry composition. However, LDA is only used for obtaining word similarities, not for exploring topical diversity. Further, the poem generation is not achieved with RNN. The subsequent work by the same author [15] proposes a method called iterative polishing for recomposing the poems generated by a well-designed RNN architecture. However, LDA is no longer used.
The combination of topic modeling and RNN can be found in the proposals
not related to automatic poetry composition. Dieng et al. [5] propose a combination of RNN and topic modeling. Their model, called TopicRNN, modifies the output word probabilities of RNN with the long-range semantic information of each document captured by an LDA-like mechanism. However, when sequences are generated with TopicRNN, we need to choose one document among the existing documents as a seed. This means that we can only generate a sequence similar to the chosen document. While TopicRNN has this limitation, it provides a valuable guide for future work. Our method detaches sequence selection from sequence generation: only after generating sequences with RNN do we perform a selection. It may be better to directly generate sequences having some desirable character regarding their topical contents. Xing et al. [13] also suggest a similar research direction.
Hall et al. [8] utilize the entropy of topic probabilities for studying the history of ideas. However, the entropy is used for measuring how many different topics are covered in a particular conference held in a particular year. Therefore, larger entropies are better, and their evaluation direction is opposite to ours. Our scoring regards the poems having topic consistency as important. Since LDA can extract diverse topics, Tanka poems can be highly scored by our method for a wide variety of reasons. Some poems are topically consistent in describing the autumn moon, and others in describing blooming flowers. As a result, our method could eventually select Tanka poems showing diversity. RNN output probability could not address such topical diversity, because it was indifferent to the topic consistency within each Tanka poem.
5 Conclusions
This paper proposed a method for scoring the sequences generated by RNN. The
proposed method based on topic entropy was compared to the scoring based on
RNN output probability and could provide more diverse Tanka poems. An im-
mediate future work is to devise a method for combining these two scores. Fig. 3
plots per-token topic entropies (horizontal axis) obtained by our method and
per-token log output probabilities (vertical axis) of 10,000 sequences generated
by GRU-RNN. Since lower topic entropies are better, the upper-left part contains
the sequences better in terms of both scores. As depicted in Fig. 3, the two scores
only show a moderate negative correlation. The correlation coefficient is around
−0.41. It may be a meaningful task to seek an effective combination of the two
scores for a better sequence selection.

Fig. 3. Correlation between the topic entropy and the log output probability.
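As a small illustration of comparing the two scores, the following sketch computes their Pearson correlation from per-sequence score arrays; the inputs are placeholders for the scores computed with the earlier sketches.

import numpy as np

# Correlation between per-token topic entropies and per-token log output
# probabilities of a set of generated sequences (Fig. 3 reports about -0.41).
def score_correlation(topic_entropies, log_output_probs):
    return np.corrcoef(topic_entropies, log_output_probs)[0, 1]

print(score_correlation([0.5, 1.2, 2.0, 0.8], [-3.1, -4.0, -5.2, -3.5]))  # toy usage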
In this paper, we only considered a method for obtaining better sequences by screening generated sequences. However, the same goal can also be achieved by modifying the architecture of RNN. As discussed in Section 4, Dieng et al. [5] incorporate an idea from topic modeling into the architecture of RNN. It is an
interesting future work to propose an architecture of RNN that can generate
sequences diverse in their topics. With respect to the evaluation, we only con-
sidered the diversity of subsequences of generated poems. It is also an important
future work to apply other evaluation metrics like BLEU [10][15] or even human
subjective evaluation.
Acknowledgments
This work was supported by Grant-in-Aid for Scientific Research (B) 15H02789.
References
1. Asuncion, A., Welling, M., Smyth, P., and Teh, Y. W.: On smoothing and inference
for topic models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in
Artificial Intelligence (UAI ’09), 27–34 (2009)
2. Blei, D. M., Ng, A. Y., and Jordan, M. I.: Latent dirichlet allocation. J. Mach.
Learn. Res. 3, 993–1022 (2003)
3. Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y.: On the proper-
ties of neural machine translation: Encoder-decoder approaches. arXiv preprint,
arXiv:1409.1259 (2014)
4. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y.: Empirical evaluation of gated
recurrent neural networks on sequence modeling. arXiv preprint, arXiv:1412.3555
(2014)
5. Dieng, A. B., Wang, C., Gao, J., and Paisley, J.: TopicRNN: A recurrent neural
network with long-range semantic dependency. arXiv preprint, arXiv:1611.01702
(2016)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint,
arXiv:1308.0850 (2013)
7. Griffiths, T. L. and Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. U.
S. A. 101, Suppl. 1, 5228–35 (2004)
8. Hall, D., Jurafsky, D., and Manning, C. D.: Studying the history of ideas using topic
models. In Proceedings of Conference on Empirical Methods in Natural Language
Processing (EMNLP ’08), 363–371 (2008)
9. Hochreiter, S. and Schmidhuber, J.: Long short-term memory. Neural Comput. 9,
8, 1735–1780 (1997)
10. Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J.: BLEU: a method for automatic
evaluation of machine translation. In Proceedings of the 40th Annual Meeting on
Association for Computational Linguistics (ACL ’02), 311–318 (2002)
11. Sutskever, I., Martens, J., and Hinton, G.: Generating text with recurrent neural
networks. In Proceedings of the 28th International Conference on Machine Learning
(ICML ’11), 1017–1024 (2011)
12. Tieleman, T. and Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a run-
ning average of its recent magnitude. COURSERA: Neural Networks for Machine
Learning, 4 (2012)
13. Xing, C., Wu, W., Wu, Y., Liu, J., Huang, Y., Zhou, M., and Ma, W.-Y.: Topic
aware neural response generation. In Proceedings of the 31st AAAI Conference on
Artificial Intelligence (AAAI ’17), 3351–3357 (2017)
14. Yan, R., Jiang, H., Lapata, M., Lin, S.-D., Lv, X., Li, X.: i, Poet: Automatic
Chinese poetry composition through a generative summarization framework under
constrained optimization. In Proceedings of the 23rd International Joint Conference
on Artificial Intelligence (IJCAI ’13), 2197–2203 (2013)
15. Yan, R.: i, Poet: Automatic poetry composition through recurrent neural networks
with iterative polishing schema. In Proceedings of the 25th International Joint Con-
ference on Artificial Intelligence (IJCAI ’16), 2238–2244 (2016)

More Related Content

What's hot

Performance Improvement Of Bengali Text Compression Using Transliteration And...
Performance Improvement Of Bengali Text Compression Using Transliteration And...Performance Improvement Of Bengali Text Compression Using Transliteration And...
Performance Improvement Of Bengali Text Compression Using Transliteration And...IJERA Editor
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksSDL
 
Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Tomonari Masada
 
referát.doc
referát.docreferát.doc
referát.docbutest
 
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...IOSR Journals
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis kevig
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkViet-Trung TRAN
 

What's hot (7)

Performance Improvement Of Bengali Text Compression Using Transliteration And...
Performance Improvement Of Bengali Text Compression Using Transliteration And...Performance Improvement Of Bengali Text Compression Using Transliteration And...
Performance Improvement Of Bengali Text Compression Using Transliteration And...
 
Fast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural NetworksFast and Accurate Preordering for SMT using Neural Networks
Fast and Accurate Preordering for SMT using Neural Networks
 
Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...Supplementary material for my following paper: Infinite Latent Process Decomp...
Supplementary material for my following paper: Infinite Latent Process Decomp...
 
referát.doc
referát.docreferát.doc
referát.doc
 
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
Signal Processing Approach for Recognizing Identical Reads From DNA Sequencin...
 
Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis Indexing of Arabic documents automatically based on lexical analysis
Indexing of Arabic documents automatically based on lexical analysis
 
A Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural NetworkA Vietnamese Language Model Based on Recurrent Neural Network
A Vietnamese Language Model Based on Recurrent Neural Network
 

Similar to LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition

Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Robin Gutell
 
Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural NetworksSangwoo Mo
 
Low-Rank Neighbor Embedding for Single Image Super-Resolution
Low-Rank Neighbor Embedding for Single Image Super-ResolutionLow-Rank Neighbor Embedding for Single Image Super-Resolution
Low-Rank Neighbor Embedding for Single Image Super-Resolutionjohn236zaq
 
EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
EMNLP 2014: Opinion Mining with Deep Recurrent Neural NetworkEMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
EMNLP 2014: Opinion Mining with Deep Recurrent Neural NetworkPeinan ZHANG
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachFindwise
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSijnlc
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSkevig
 
Sanskrit parser Project Report
Sanskrit parser Project ReportSanskrit parser Project Report
Sanskrit parser Project ReportLaxmi Kant Yadav
 
Optimizing Near-Synonym System
Optimizing Near-Synonym SystemOptimizing Near-Synonym System
Optimizing Near-Synonym SystemSiyuan Zhou
 
ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ijnlc
 
Bliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmBliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmCSCJournals
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...cscpconf
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...csandit
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 

Similar to LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition (20)

Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011Gutell 116.rpass.bibm11.pp618-622.2011
Gutell 116.rpass.bibm11.pp618-622.2011
 
Recursive Neural Networks
Recursive Neural NetworksRecursive Neural Networks
Recursive Neural Networks
 
P-6
P-6P-6
P-6
 
p138-jiang
p138-jiangp138-jiang
p138-jiang
 
Low-Rank Neighbor Embedding for Single Image Super-Resolution
Low-Rank Neighbor Embedding for Single Image Super-ResolutionLow-Rank Neighbor Embedding for Single Image Super-Resolution
Low-Rank Neighbor Embedding for Single Image Super-Resolution
 
EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
EMNLP 2014: Opinion Mining with Deep Recurrent Neural NetworkEMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network
 
Extractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised ApproachExtractive Document Summarization - An Unsupervised Approach
Extractive Document Summarization - An Unsupervised Approach
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
 
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONSNEURAL DISCOURSE MODELLING OF CONVERSATIONS
NEURAL DISCOURSE MODELLING OF CONVERSATIONS
 
Sanskrit parser Project Report
Sanskrit parser Project ReportSanskrit parser Project Report
Sanskrit parser Project Report
 
2224d_final
2224d_final2224d_final
2224d_final
 
Optimizing Near-Synonym System
Optimizing Near-Synonym SystemOptimizing Near-Synonym System
Optimizing Near-Synonym System
 
ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text ANN Based POS Tagging For Nepali Text
ANN Based POS Tagging For Nepali Text
 
Rna seq pipeline
Rna seq pipelineRna seq pipeline
Rna seq pipeline
 
Sequence alignment
Sequence alignmentSequence alignment
Sequence alignment
 
Bliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection AlgorithmBliss: A New Read Overlap Detection Algorithm
Bliss: A New Read Overlap Detection Algorithm
 
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
Enhancing the Performance of Sentiment Analysis Supervised Learning Using Sen...
 
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
ENHANCING THE PERFORMANCE OF SENTIMENT ANALYSIS SUPERVISED LEARNING USING SEN...
 
Sequence Alignment
Sequence AlignmentSequence Alignment
Sequence Alignment
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 

More from Tomonari Masada

Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説Tomonari Masada
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Tomonari Masada
 
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic ModelingContext-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic ModelingTomonari Masada
 
A note on the density of Gumbel-softmax
A note on the density of Gumbel-softmaxA note on the density of Gumbel-softmax
A note on the density of Gumbel-softmaxTomonari Masada
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用Tomonari Masada
 
Expectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationExpectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationTomonari Masada
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingTomonari Masada
 
A note on variational inference for the univariate Gaussian
A note on variational inference for the univariate GaussianA note on variational inference for the univariate Gaussian
A note on variational inference for the univariate GaussianTomonari Masada
 
Document Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior DistributionsDocument Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior DistributionsTomonari Masada
 
A Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationA Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationTomonari Masada
 
Topic modeling with Poisson factorization (2)
Topic modeling with Poisson factorization (2)Topic modeling with Poisson factorization (2)
Topic modeling with Poisson factorization (2)Tomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelTomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationTomonari Masada
 
Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Tomonari Masada
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationTomonari Masada
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...Tomonari Masada
 

More from Tomonari Masada (20)

Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説Learning Latent Space Energy Based Prior Modelの解説
Learning Latent Space Energy Based Prior Modelの解説
 
Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説Denoising Diffusion Probabilistic Modelsの重要な式の解説
Denoising Diffusion Probabilistic Modelsの重要な式の解説
 
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic ModelingContext-dependent Token-wise Variational Autoencoder for Topic Modeling
Context-dependent Token-wise Variational Autoencoder for Topic Modeling
 
A note on the density of Gumbel-softmax
A note on the density of Gumbel-softmaxA note on the density of Gumbel-softmax
A note on the density of Gumbel-softmax
 
トピックモデルの基礎と応用
トピックモデルの基礎と応用トピックモデルの基礎と応用
トピックモデルの基礎と応用
 
Expectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocationExpectation propagation for latent Dirichlet allocation
Expectation propagation for latent Dirichlet allocation
 
Mini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic ModelingMini-batch Variational Inference for Time-Aware Topic Modeling
Mini-batch Variational Inference for Time-Aware Topic Modeling
 
A note on variational inference for the univariate Gaussian
A note on variational inference for the univariate GaussianA note on variational inference for the univariate Gaussian
A note on variational inference for the univariate Gaussian
 
Document Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior DistributionsDocument Modeling with Implicit Approximate Posterior Distributions
Document Modeling with Implicit Approximate Posterior Distributions
 
A Note on ZINB-VAE
A Note on ZINB-VAEA Note on ZINB-VAE
A Note on ZINB-VAE
 
A Note on Latent LSTM Allocation
A Note on Latent LSTM AllocationA Note on Latent LSTM Allocation
A Note on Latent LSTM Allocation
 
A Note on TopicRNN
A Note on TopicRNNA Note on TopicRNN
A Note on TopicRNN
 
Topic modeling with Poisson factorization (2)
Topic modeling with Poisson factorization (2)Topic modeling with Poisson factorization (2)
Topic modeling with Poisson factorization (2)
 
Poisson factorization
Poisson factorizationPoisson factorization
Poisson factorization
 
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic ModelA Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
A Simple Stochastic Gradient Variational Bayes for the Correlated Topic Model
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28Word count in Husserliana Volumes 1 to 28
Word count in Husserliana Volumes 1 to 28
 
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet AllocationA Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
A Simple Stochastic Gradient Variational Bayes for Latent Dirichlet Allocation
 
FDSE2015
FDSE2015FDSE2015
FDSE2015
 
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
A derivation of the sampling formulas for An Entity-Topic Model for Entity Li...
 

Recently uploaded

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...ttt fff
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 

Recently uploaded (20)

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
毕业文凭制作#回国入职#diploma#degree美国加州州立大学北岭分校毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#de...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 

LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition

  • 1. LDA-Based Scoring of Sequences Generated by RNN for Automatic Tanka Composition Tomonari Masada1 and Atsuhiro Takasu2 1 Nagasaki University, 1-14 Bunkyo-machi, Nagasaki, Japan masada@nagasaki-u.ac.jp 2 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo, Japan takasu@nii.ac.jp Abstract. This paper proposes a method of scoring sequences gener- ated by recurrent neural network (RNN) for automatic Tanka compo- sition. Our method gives sequences a score based on topic assignments provided by latent Dirichlet allocation (LDA). A scoring of sequences generated by RNN can be achieved with their output probabilities. How- ever, the sequences having large probabilities are likely to share much the same subsequences and thus are doomed to be deprived of diversity. In contrast, our method uses topic assignments given by LDA. We first per- form an inference for LDA over the same set of sequences as that used for training RNN. The inferred per-topic word probabilities are then used to obtain topic assignments of word tokens in the sequences generated by a well-trained RNN. When a sequence contains many word tokens as- signed to the same topic, we give it a high score. This is an entropy-based scoring. The result of an experiment, where we scored Japanese Tanka poems generated by RNN, shows that the top-ranked sequences selected by our method were likely to contain a wider variety of subsequences than those selected with RNN output probability. Keywords: topic modeling, sequence generation, recurrent neural net- work, automatic poetry composition 1 Introduction Recurrent neural network (RNN) is a class of artificial neural network that has been providing remarkable achievements in many important applications. Se- quence generation is one among such applications [11][6]. By adopting a special architecture like LSTM [9] or GRU [3][4], RNN can learn dependencies among word tokens appearing at distant positions. It is thus possible to use RNN for generating sequences showing dependencies among distant word tokens. When we use sequence generation in realistic situations, we may sift the se- quences generated by RNN to obtain only useful ones. Such sifting may give each generated sequence a score representing its usefulness relative to the application under consideration. While an improvement of the quality of sequences gener- ated by RNN can also be achieved by modifying the architecture of RNN [5][13],
we here consider a scoring method that can be tuned and applied separately from RNN. In particular, this paper proposes a method for scoring sequences to achieve a diversity of subsequences appearing in highly scored sequences.

We may score the generated sequences by using their output probabilities in RNN. However, this method is likely to give high scores to the sequences containing subsequences popular among the sequences used for training RNN. Consequently, the highly scored sequences are doomed to look alike and to show only a limited diversity. In contrast, our scoring method uses latent Dirichlet allocation (LDA) [2]. Topic models like LDA can extract diverse topics from training documents. Each topic is represented as a probability distribution defined over vocabulary words. By using the per-topic word probabilities LDA provides, we can assign high scores to the sequences containing many words having a strong relevance to some particular topic. This scoring is expected to give top-ranked sequences that individually show a strong relevance to some particular topic and together show a wide variety of topics.

We performed an evaluation experiment by generating Japanese Tanka poems with RNN using an LSTM or GRU module. Tanka poems basically have a 5-7-5-7-7 syllabic structure and thus consist of five parts. Every Tanka poem used in the experiment was given in Hiragana characters with no voicing marks and was represented as a sequence of non-overlapping character bigrams, which were regarded as vocabulary words. After training RNN under different settings, we chose the best setting in terms of validation perplexity. We then generated random sequences by using the best-trained RNN. The generated sequences were scored by their output probabilities in RNN or by using our LDA-based method. The result shows that our LDA-based method could select more diverse sequences, in the sense that a wider variety of subsequences were obtained as the five parts of the top-ranked Tanka poems.

The rest of the paper is structured as follows. Section 2 describes the method in detail. The result of the evaluation experiment is given in Section 3. Section 4 discusses previous studies. Section 5 concludes the paper by giving future work.

2 Method

2.1 Preprocessing of Tanka Poems

Sequence generation is one of the important applications of RNN [11][6]. This paper considers the generation of Japanese Tanka poems. We assume that all Tanka poems for training RNN are given in Hiragana characters with no voicing marks. Tanka poems have a 5-7-5-7-7 syllabic structure and thus consist of five subsequences, which are called parts in this paper. We give an example of a Tanka poem taken from The Tale of Genji in Fig. 1 (in this paper, a romanization called Kunrei-shiki is used for Hiraganas). Note that the Tanka poem in Fig. 1 has a 6-7-5-7-7 syllabic structure. While the first part of the standard syllabic structure consists of five syllables, that of this example consists of six. In this manner, a small deviation from the standard 5-7-5-7-7 structure is often observed.
Fig. 1. A Tanka poem from The Tale of Genji.

When performing a preprocessing of Tanka poems, we need to consider this kind of deviation. The details of preprocessing are given below.

First, we put a spacing character '_' between each neighboring pair of parts and also at the tail of the poem. Further, the first Hiragana character of each of the five parts is "uppercased," i.e., marked as distinct from the same character appearing at other positions. The above example is then converted to:

MO no o mo hu ni _ TA ti ma hu he ku mo _ A ra nu mi no _ SO te u ti hu ri si _ KO ko ro si ri ki ya _

Second, we represent each Tanka poem as a sequence of non-overlapping character bigrams. Since we would like to regard character bigrams as vocabulary words composing each Tanka poem, non-overlapping bigrams are used. However, only for the parts of even length, we apply an additional modification: the special bigram '(_ _)' is put at the tail of each even-length part in place of the spacing character '_'. Consequently, the non-overlapping bigram sequence corresponding to the above example is obtained as follows:

(MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra) (nu mi) (no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki) (ya _)

Note that the even-length part 'MO no o mo hu ni _' is converted into the bigram sequence '(MO no) (o mo) (hu ni) (_ _)' by this additional modification. Finally, we put tokens of the special words 'BOS' and 'EOS' at the head and the tail of each sequence, respectively. The final output of preprocessing for the above example is as follows:

BOS (MO no) (o mo) (hu ni) (_ _) (TA ti) (ma hu) (he ku) (mo _) (A ra) (nu mi) (no _) (SO te) (u ti) (hu ri) (si _) (KO ko) (ro si) (ri ki) (ya _) EOS

The sequences preprocessed in this manner were used for training RNN and also for training LDA.
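As an illustration of this preprocessing, the following is a minimal sketch in Python. It assumes each poem is given as a list of its romanized parts; the function name and input format are ours, not taken from the authors' code.

```python
def to_bigram_sequence(parts):
    """Convert a Tanka poem, given as a list of parts (each a space-separated
    string of romanized Hiragana characters), into the non-overlapping
    bigram representation described in Section 2.1."""
    tokens = []
    for part in parts:
        chars = part.split()
        # Mark the first character of the part as distinct ("uppercasing").
        chars[0] = chars[0].upper()
        if len(chars) % 2 == 1:
            # Odd-length part: the trailing spacing character '_' completes the last bigram.
            chars.append("_")
            tokens += ["(%s %s)" % (chars[i], chars[i + 1])
                       for i in range(0, len(chars), 2)]
        else:
            # Even-length part: the special bigram '(_ _)' replaces the spacing character.
            tokens += ["(%s %s)" % (chars[i], chars[i + 1])
                       for i in range(0, len(chars), 2)]
            tokens.append("(_ _)")
    return ["BOS"] + tokens + ["EOS"]

# The example from The Tale of Genji used above:
poem = ["mo no o mo hu ni", "ta ti ma hu he ku mo", "a ra nu mi no",
        "so te u ti hu ri si", "ko ko ro si ri ki ya"]
print(" ".join(to_bigram_sequence(poem)))
```

Running this on the example reproduces the bigram sequence shown above, including the '(_ _)' padding of the even-length first part.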
Fig. 2. Validation perplexities obtained by GRU-RNN with three hidden layers of various sizes.

2.2 Tanka Poem Generation by RNN

We downloaded 179,225 Tanka poems from the web site of the International Research Center for Japanese Studies (http://tois.nichibun.ac.jp/database/html2/waka/menu.html). 3,631 different non-overlapping bigrams were found to appear in this set. Therefore, the vocabulary size was 3,631. Among the 179,225 Tanka poems, 143,550 were used for training RNN and also for training LDA, and 35,675 were used for validation, i.e., for tuning free parameters.

We implemented RNN with PyTorch (http://pytorch.org/) by using LSTM or GRU modules. The number of hidden layers was three, because larger numbers of layers could not give better results in terms of validation set perplexity. The dropout probability was 0.5 for all layers. The weights were clipped to [−0.25, 0.25]. RMSprop [12] was used for optimization with the learning rate 0.002. The mini-batch size was 200.

The best RNN architecture was chosen by the following procedure. Fig. 2 presents the validation set perplexity computed in the course of training of GRU-RNN with three hidden layers. The hidden layer size h was set to 400, 600, 800, or 1,000. When h = 400 (dashed line), the worst result was obtained. The results for the three other cases showed no significant difference. The running time was the shortest when h = 600 (solid line) among these three cases. While we obtained similar results also for LSTM-RNN, the results of GRU-RNN were slightly better. Therefore, we have selected the GRU-RNN with three hidden layers of size 600 as the model for generating Tanka poems.
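As a rough illustration of the reported setting (three GRU layers of size 600, dropout 0.5, weight clipping to [−0.25, 0.25], RMSprop with learning rate 0.002), a word-level language model over the bigram vocabulary could be set up in PyTorch as sketched below. The class name and the embedding size are our assumptions; the paper does not report these details.

```python
import torch
import torch.nn as nn

class TankaRNN(nn.Module):
    """GRU language model over the non-overlapping bigram vocabulary (V = 3,631)."""
    def __init__(self, vocab_size=3631, embed_size=600, hidden_size=600,
                 num_layers=3, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.gru = nn.GRU(embed_size, hidden_size, num_layers,
                          dropout=dropout, batch_first=True)
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, tokens, hidden=None):
        emb = self.drop(self.embed(tokens))      # (batch, seq_len, embed_size)
        output, hidden = self.gru(emb, hidden)   # (batch, seq_len, hidden_size)
        return self.out(self.drop(output)), hidden

model = TankaRNN()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.002)
criterion = nn.CrossEntropyLoss()

def train_step(inputs, targets):
    """One mini-batch update; inputs and targets are (batch, seq_len) LongTensors,
    with targets shifted one position ahead of inputs."""
    optimizer.zero_grad()
    logits, _ = model(inputs)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        for p in model.parameters():
            p.clamp_(-0.25, 0.25)   # the paper reports clipping the weights to this range
    return loss.item()
```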
2.3 Topic Modeling for Sequence Scoring

This paper proposes a new method for scoring the sequences generated by RNN. We use latent Dirichlet allocation (LDA) [2], the best-known topic model, for scoring. LDA is a Bayesian probabilistic model of documents and can model the difference in semantic contents of documents as the difference in mixing proportions of topics. Each topic is in turn modeled as a probability distribution defined over vocabulary words. The details of LDA are given below.

We first introduce notations. We denote the number of documents, the vocabulary size, and the number of topics by D, V, and K, respectively. By performing an inference for LDA, e.g., variational Bayesian inference [2] or collapsed Gibbs sampling [7], over the training document set, we can estimate the two groups of parameters θ_dk and φ_kv, for d = 1, ..., D, v = 1, ..., V, and k = 1, ..., K.

The parameter θ_dk is the probability of the topic k in the document d. Intuitively, this group of parameters represents the importance of each topic in each document. Technically, the parameter vector θ_d = (θ_d1, ..., θ_dK) gives the mixing proportions of the K different topics in the document d. Each topic is in turn represented as a probability distribution defined over words. The parameter φ_kv is the probability of the word v in the topic k. Intuitively, φ_kv represents the relevance of the word v to the topic k. For example, in autumn, people talk about fallen leaves more often than about blooming flowers. Such topic relevancy of vocabulary words is represented by the parameter vector φ_k = (φ_k1, ..., φ_kV).

When an inference for LDA estimates many word tokens in the document d as strongly relevant to the topic k, i.e., as having a large probability in the topic k, θ_dk is also estimated as large. If the estimated value of θ_dk is far larger than those of the θ_dk′'s for k′ ≠ k, we can say that the document d is exclusively related to the topic k. LDA is used in this manner to know whether a given sequence is exclusively related to some particular topic or not when we score Tanka poems generated by RNN.

In our experiment, the inference was performed by collapsed Gibbs sampling (CGS) [7]. CGS iteratively updates the topic assignments of all word tokens in every training document. This assignment can be viewed as a clustering of word tokens, where the number of clusters is K. For explaining the details of CGS, we introduce some more notations. Let x_di denote the vocabulary word appearing as the i-th word token of the document d, and z_di the topic to which the i-th word token of the document d is assigned. CGS updates z_di based on the topic probabilities given as follows:

p(z_di = k | X, Z^{¬di}) ∝ (n_dk^{¬di} + α) × (n_kv^{¬di} + β) / (n_k^{¬di} + V β)   for k = 1, ..., K,   (1)

where v denotes the word x_di whose topic is being resampled. This is the conditional probability that the i-th word token of the document d is assigned to the topic k when given all observed word tokens X and their topic assignments except the assignment we are going to update, which is denoted by Z^{¬di} in Eq. (1), where ¬di means the exclusion of the i-th word token of the document d. It should hold that Σ_k p(z_di = k | X, Z^{¬di}) = 1. In Eq. (1), n_dk is the number of the word tokens in the document d that are assigned to the topic k, n_kv is the number of the tokens of the word v that are assigned to the topic k in the entire document set, and n_k is defined as n_k ≡ Σ_{v=1}^{V} n_kv. α is the hyperparameter of the symmetric Dirichlet prior for the θ_d's, and β is that of the symmetric Dirichlet prior for the φ_k's [7][1]. In CGS, we compute the probabilities of the K topics by Eq. (1) at every word token and sample a topic based on these probabilities. The sampled topic is set as the new value of z_di. We repeat this sampling hundreds of times over the entire training document set.
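To make Eq. (1) concrete, one sweep of collapsed Gibbs sampling over the training set can be sketched as follows. This is a plain implementation of the standard CGS update under the notation above, not the authors' code; docs is assumed to be a list of lists of word (bigram) ids, and the count arrays are assumed to be consistent with the current assignments z.

```python
import numpy as np

def gibbs_sweep(docs, z, n_dk, n_kv, n_k, alpha, beta, rng):
    """One CGS sweep: resample the topic assignment z[d][i] of every word token
    according to Eq. (1), updating the count arrays n_dk, n_kv, n_k in place."""
    K, V = n_kv.shape
    for d, doc in enumerate(docs):
        for i, v in enumerate(doc):
            k_old = z[d][i]
            # Remove the current token from the counts (the "excluding di" statistics).
            n_dk[d, k_old] -= 1
            n_kv[k_old, v] -= 1
            n_k[k_old] -= 1
            # Eq. (1): p(z_di = k) is proportional to (n_dk + alpha)(n_kv + beta)/(n_k + V*beta).
            p = (n_dk[d] + alpha) * (n_kv[:, v] + beta) / (n_k + V * beta)
            k_new = rng.choice(K, p=p / p.sum())
            # Put the token back with the newly sampled topic.
            z[d][i] = k_new
            n_dk[d, k_new] += 1
            n_kv[k_new, v] += 1
            n_k[k_new] += 1
```

Repeating such sweeps a few hundred times over the 143,550 training poems, with α and β tuned on the validation set, yields the counts from which the per-topic word probabilities used below are computed.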
When performing CGS in our evaluation experiment, we used the same set of Tanka poems as that used for training RNN. Therefore, D = 143,550 and V = 3,631, as given in Subsection 2.2. Since the ordering of word tokens is irrelevant to LDA, each non-overlapping bigram sequence was regarded as a bag of bigrams. K was set to 50, because other numbers gave no significant improvements. The hyperparameters α and β were tuned by grid search based on validation set perplexity. It should be noted that CGS for LDA can be run independently of the training of RNN.

Table 1. An example of topic words (one topic per row).

Row 1: (HA na) (ha na) (HA ru) (no _) (hi su) (U ku) (ni _) (YA ma) (hu _) (sa to) (ru _) (NI ho) (U me) (ha ru) (ta ti) (HU ru) (SA ki) (U no)
Row 2: (tu ki) (no _) (NA ka) (ni _) (TU ki) (A ka) (KU mo) (te _) (ka ke) (wo _) (so ra) (ki yo) (ha _) (KA ke) (ka ri) (hi no) (yu ku) (KO yo)
Row 3: (su _) (to ki) (HO to) (ko ye) (NA ki) (NA ku) (KO ye) (HI to) (ho to) (ni _) (ni na) (su ka) (ku _) (MA tu) (HA tu) (ku ra) (MA ta) (ya ma)
Row 4: (ka se) (ku _) (A ki) (ni _) (no ha) (se _) (KO ro) (so hu) (no _) (HU ki) (O to) (HU ku) (tu ka) (sa mu) (WO ki) (MA tu) (mo u) (U ra)
Row 5: (KI mi) (no _) (KA yo) (ni _) (te _) (MA tu) (wo _) (ha _) (TI yo) (ma tu) (to se) (tu yo) (I ku) (YO ro) (mu _) (TI to) (no ma) (ka ta)

Table 1 gives an example of the top-ranked topic words in terms of φ_kv for five among the 50 topics. Each row corresponds to a different topic. The five topics represent blooming flowers, the autumn moon, singing birds, blowing wind, and thy reign, respectively from top to bottom. For example, in the topic corresponding to the top row in Table 1, the words “ha na” (flowers), “ha ru” (spring), “ni ho” (the first two Hiragana characters of the word “ni ho hi,” which means fragrance), and “u me” (plum blossom) have large probabilities. In the topic corresponding to the second row, the words “tu ki” (moon), “ku mo” (cloud), “ka ke” (waning of the moon), and “ko yo” (the first two Hiragana characters of the word “ko yo hi,” which means tonight) have large probabilities.

2.4 Topic-Based Scoring

Our sequence scoring uses the φ_k's, i.e., the per-topic word probabilities, learned by CGS from the training data set. CGS provides φ_kv as φ_kv ≡ (n_kv + β) / (n_k + V β), where n_kv and n_k are defined in Subsection 2.3. Based on these φ_k's, we can assign each word token of unseen documents, i.e., the documents not used for inference, to one among the K topics. This procedure is called fold-in [1]. In our case, the bigram sequences generated by RNN are unseen documents. The fold-in procedure ran CGS over each sequence while keeping the φ_k's unchanged. We then computed topic probabilities by counting the assignments to each topic. For example, when the length of a given bigram sequence is 18, and the fold-in procedure assigns 12 among the 18 bigrams to some particular topic, the probability of that topic is (12 + α) / (18 + Kα). In the experiment, we obtained eight topic assignments, one every 50 iterations after the 100th iteration of the fold-in procedure, for each sequence. The mean of these eight probabilities was used to compute the entropy

− Σ_{k=1}^{K} θ̄_{0k} log θ̄_{0k},

where θ̄_{0k} is the averaged probability of the topic k for the sequence under consideration. We call this quantity the topic entropy. Smaller topic entropies are regarded as better, because they correspond to the situations where more tokens are assigned to the same topic. In other words, we aim to select the sequences showing a topic consistency.
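A minimal sketch of the fold-in and topic-entropy computation just described, assuming the per-topic word probabilities phi have been estimated as above; the sampling schedule (eight samples, one every 50 iterations after the 100th) follows the text, and the function name is ours.

```python
import numpy as np

def topic_entropy(seq, phi, alpha, n_samples=8, burn_in=100, lag=50, seed=0):
    """Score one generated bigram sequence (a list of word ids) by the entropy
    of its averaged topic distribution, keeping phi fixed (fold-in)."""
    rng = np.random.default_rng(seed)
    K = phi.shape[0]
    N = len(seq)
    z = rng.integers(K, size=N)             # random initial topic assignments
    counts = np.bincount(z, minlength=K)    # per-topic counts for this sequence
    theta_sum = np.zeros(K)
    taken, it = 0, 0
    while taken < n_samples:
        it += 1
        for i, v in enumerate(seq):
            counts[z[i]] -= 1
            # Document-side counts change; the per-topic word probabilities stay fixed.
            p = (counts + alpha) * phi[:, v]
            z[i] = rng.choice(K, p=p / p.sum())
            counts[z[i]] += 1
        if it > burn_in and (it - burn_in) % lag == 0:
            theta_sum += (counts + alpha) / (N + K * alpha)
            taken += 1
    theta_bar = theta_sum / n_samples
    return -np.sum(theta_bar * np.log(theta_bar))   # smaller = more topic-consistent
```

Generated sequences are then ranked in increasing order of this entropy, so that poems whose bigrams concentrate on a single topic come first.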
3 Evaluation

The evaluation experiment compared our scoring method to the method based on RNN output probability. The output probability in RNN is obtained as follows. We can generate a random sequence with RNN by starting from the special word BOS and then randomly drawing words one by one until we draw the special word EOS. The output probability of the generated sequence is the product of the output probabilities of all of its tokens, where the probability of each token is the output probability at each moment during the sequence generation. Our LDA-based method was compared to this probability-based method.

We first investigate the difference of the top-ranked Tanka poems obtained by the two compared scoring methods. Table 2 presents an example of the top five Tanka poems selected by our method in the left column and those selected based on RNN output probability in the right column. To obtain these top-ranked poems, we first generated 100,000 Tanka poems with GRU-RNN. Since the number of poems containing grammatically incorrect parts was large, a grammar check was required. However, we could not find any good grammar check tool for archaic Japanese. Therefore, as an approximation, we regard the poems containing at least one part appearing in no training Tanka poem as grammatically incorrect (see the footnote below). It is left as future work to perform a grammar check in a more appropriate manner. After removing the poems that are grammatically incorrect, we assign to the remaining poems a score based on each scoring method.

Footnote: This grammar check eventually reduces our search space to the set of random combinations of the parts appearing in the training set. However, even when we generate Tanka poems simply by randomly shuffling the parts appearing in the existing Tanka poems, we need some criterion to say whether an obtained poem is good or not. To propose such a criterion is nothing but to propose a method for Tanka poem generation. The data set used in our experiment contains around 48,000 unique subsequences that can be used in the first or third part of a Tanka poem and around 215,000 unique subsequences that can be used in the second, fourth, or fifth part. It is far from a trivial task to select from all possible combinations of these parts (i.e., from ≈ 48,000² × 215,000³ combinations) only those that are good in some definite sense.
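A sketch of random sequence generation from BOS to EOS and of the accumulation of the log output probability, using a model of the kind sketched in Section 2.2; the helper name and the token-to-id mapping vocab are our assumptions.

```python
import torch

def sample_sequence(model, vocab, max_len=60):
    """Draw one random sequence from the RNN, starting at BOS and stopping at EOS,
    and return the token ids together with the log output probability of the sequence."""
    bos, eos = vocab["BOS"], vocab["EOS"]
    tokens, log_prob = [bos], 0.0
    hidden = None
    inp = torch.tensor([[bos]])
    with torch.no_grad():
        for _ in range(max_len):
            logits, hidden = model(inp, hidden)
            probs = torch.softmax(logits[0, -1], dim=-1)
            next_id = torch.multinomial(probs, 1).item()   # random draw of the next word
            log_prob += torch.log(probs[next_id]).item()   # product of probabilities, in log space
            tokens.append(next_id)
            if next_id == eos:
                break
            inp = torch.tensor([[next_id]])
    return tokens, log_prob
```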
Table 2 presents the resulting top five Tanka poems for each method. Table 2 shows that when we use RNN output probability (right column), it is difficult to achieve topic consistency.

Table 2. Top five Tanka poems selected by the compared methods (parts separated by "/").

Rank 1
  Topic entropy: si ku re yu ku / ka tu ra ki ya ma no / i ro hu ka ki / mo mi ti no i ro ni / si ku re hu ri ke ri
  Output probability: o ho a ra ki no / mo ri no si ta ku sa / ka mi na tu ki / mo ri no si ta ku sa / ku ti ni ke ru ka na
Rank 2
  Topic entropy: ti ha ya hu ru / yu hu hi no ya ma no / ka mi na hi no / mi mu ro no ya ma no / mo mi ti wo so mi ru
  Output probability: hi sa ka ta no / ku mo wi ni mi yu ru / tu ki ka ke no / tu ki ka ke wa ta ru / a ma no ka ku ya ma
Rank 3
  Topic entropy: ka he ru ya ma / hu mo to no mi ti ha / ki ri ko me te / hu ka ku mo mu su hu / wo ti no ya ma ka se
  Output probability: ti ha ya hu ru / ka mo no ka ha na mi / ta ti ka he ri / ki ru hi to mo na ki / a hu sa ka no se ki
Rank 4
  Topic entropy: o ho ka ta mo / wa su ru ru ko to mo / ka yo he to mo / o mo hu ko ko ro ni / o mo hu ha ka ri so
  Output probability: ta ka sa ko no / wo no he no sa ku ra / sa ki ni ke ri / mi ne no ma tu ya ma / yu ki hu ri ni ke ri
Rank 5
  Topic entropy: u ti na hi ku / ko ro mo te sa mu ki / o ku ya ma ni / u tu ro hi ni ke ri / a ki no ha tu ka se
  Output probability: ta ka sa ko no / wo no he no sa ku ra / na ka mu re ha / a ri a ke no tu ki ni / a ki ka se so hu ku

The fourth poem contains the words “sa ku ra” (cherry blossom) and “yu ki” (snow). The fifth one contains the words “sa ku ra” (cherry blossom) and “a ki ka se” (autumn wind). The poems top-ranked based on RNN output probability are thus likely to contain words expressing different seasons. This is a fatal error for Tanka composition. In contrast, the first poem selected by our method contains the words “si ku re” (drizzling rain) and “mo mi ti” (autumnal tints). Because drizzling rain is a shower observed in late autumn or in early winter, the word “mo mi ti” fits well within this context. Further, in the third poem, the words “ya ma” and “hu mo to” are observed. The former means mountain, and the latter means the foot of the mountain. This poem also shows a topic consistency. However, a slight weakness can be observed in the poems selected by our method: the same word is likely to be used twice or more. For example, the word “o mo hu” (think, imagine) is repeated twice in the fourth poem. While a refrain is often observed in Tanka poems, some future work may introduce an improvement here.

Next, we investigated the diversity of the Tanka poems selected by both methods. We picked up the 200 top-ranked poems given by each scoring method. We then split all poems into five parts to obtain 200 × 5 = 1,000 parts in total.
Since the resulting set of 1,000 parts included duplicates, we grouped the 1,000 parts by their identity and counted the duplicates. Table 3 presents the ten most frequent parts for each scoring method.

Table 3. Ten most frequent parts in the top 200 Tanka poems.

Topic entropy                   Output probability
hi sa ka ta no (9)              a ri a ke no tu ki no (13)
ko ro mo te sa mu ki (8)        hi sa ka ta no (12)
a ri a ke no tu ki ni (8)       ta ka sa ko no (12)
ho to to ki su (7)              ho to to ki su (11)
a ri a ke no tu ki (6)          a si hi ki no (10)
a ki ka se so hu ku (6)         a ma no ka ku ya ma (8)
sa ku ra ha na (6)              a ri a ke no tu ki ni (7)
ka mi na tu ki (6)              na ri ni ke ri (7)
ta na ha ta no (5)              sa ku ra ha na (7)
a ki no yo no tu ki (5)         ya ma ho to to ki su (7)

When we used RNN output probability for scoring (right column), “a ri a ke no tu ki no” appeared 13 times, “hi sa ka ta no” 12 times, “ta ka sa ko no” 12 times, and so on, among the 1,000 parts coming from the 200 top-ranked poems. In contrast, when we used the LDA-based scoring (left column), “hi sa ka ta no” appeared nine times, “ko ro mo te sa mu ki” eight times, “a ri a ke no tu ki ni” eight times, and so on. That is, there were fewer duplicates for our method. Further, we also observed that while only 678 among the 1,000 parts were unique when RNN output probability was used for scoring, 806 were unique when our scoring method was used. It can be said that our method explores a larger diversity.
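The diversity statistics above amount to counting duplicated parts among the 200 × 5 = 1,000 parts; a sketch, assuming each selected poem is stored as a list of its five parts (our own representation):

```python
from collections import Counter

def part_diversity(top_poems):
    """top_poems: the 200 top-ranked poems, each given as a list of its five parts.
    Returns the ten most frequent parts and the number of unique parts."""
    parts = [part for poem in top_poems for part in poem]   # 200 * 5 = 1,000 parts
    freq = Counter(parts)
    return freq.most_common(10), len(freq)
```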
4 Previous Study

There already exist many proposals of automatic poetry composition using RNN. However, LDA is a key component in our method. Therefore, we focus on the proposals using topic modeling. Yan et al. [14] utilizes LDA for Chinese poetry composition. However, LDA is only used for obtaining word similarities, not for exploring topical diversity. Further, the poem generation is not achieved with RNN. The subsequent work by the same author [15] proposes a method called iterative polishing for recomposition of the poems generated by a well-designed RNN architecture. However, LDA is no longer used.

The combination of topic modeling and RNN can be found in proposals not related to automatic poetry composition. Dieng et al. [5] proposes a combination of RNN and topic modeling. The model, called TopicRNN, modifies the output word probabilities of RNN by the long-range semantic information of each document captured by an LDA-like mechanism. However, when sequences are generated with TopicRNN, we need to choose one document among the existing documents as a seed. This means that we can only generate a sequence similar to the chosen document. While TopicRNN has this limitation, it provides a valuable guide for future work. Our method detaches sequence selection from sequence generation: only after generating sequences by RNN can we perform a selection. It may be better to directly generate the sequences having some desirable character regarding their topical contents. Xing et al. [13] also suggests a similar research direction.

Hall et al. [8] utilizes the entropy of topic probabilities for studying the history of ideas. However, the entropy is used for measuring how many different topics are covered in a particular conference held in a particular year. Therefore, larger entropies are better. Their evaluation direction is opposite to ours. Our scoring regards the poems having topic consistency as important. Since LDA can extract diverse topics, Tanka poems can be highly scored by our method for a wide variety of reasons. Some poems are topically consistent in describing the autumn moon, and others in describing blooming flowers. As a result, our method could eventually select Tanka poems showing a diversity. RNN output probability could not address such topical diversity, because it was indifferent to the topic consistency within each Tanka poem.

5 Conclusions

This paper proposed a method for scoring the sequences generated by RNN. The proposed method based on topic entropy was compared to the scoring based on RNN output probability and could provide more diverse Tanka poems. An immediate future work is to devise a method for combining these two scores.

Fig. 3. Correlation between the topic entropy and the log output probability.

Fig. 3 plots the per-token topic entropies (horizontal axis) obtained by our method and the per-token log output probabilities (vertical axis) of 10,000 sequences generated by GRU-RNN. Since smaller topic entropies are better, the upper left part contains the sequences that are better in terms of both scores. As depicted in Fig. 3, the two scores show only a moderate negative correlation. The correlation coefficient is around −0.41. It may be a meaningful task to seek an effective combination of the two scores for a better sequence selection.
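As a small illustration of the comparison behind Fig. 3, and of one possible way (not proposed in the paper) to combine the two scores via a z-score sum:

```python
import numpy as np

def compare_and_combine(topic_entropies, log_probs):
    """topic_entropies: per-token topic entropies (smaller is better);
    log_probs: per-token log output probabilities (larger is better)."""
    e = np.asarray(topic_entropies, dtype=float)
    lp = np.asarray(log_probs, dtype=float)
    r = np.corrcoef(e, lp)[0, 1]          # the paper reports a value of roughly -0.41
    combined = -(e - e.mean()) / e.std() + (lp - lp.mean()) / lp.std()
    ranking = np.argsort(-combined)       # sequence indices, best first
    return r, ranking
```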
In this paper, we only consider the method for obtaining better sequences by screening generated sequences. However, the same thing can also be achieved by modifying the architecture of RNN. As discussed in Section 4, Dieng et al. [5] incorporates an idea from topic modeling into the architecture of RNN. It is an interesting future work to propose an architecture of RNN that can generate sequences diverse in their topics. With respect to the evaluation, we only considered the diversity of subsequences of the generated poems. It is also an important future work to apply other evaluation metrics like BLEU [10][15] or even human subjective evaluation.

Acknowledgments

This work was supported by Grant-in-Aid for Scientific Research (B) 15H02789.

References

1. Asuncion, A., Welling, M., Smyth, P., and Teh, Y. W.: On smoothing and inference for topic models. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI '09), 27–34 (2009)
2. Blei, D. M., Ng, A. Y., and Jordan, M. I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
3. Cho, K., van Merrienboer, B., Bahdanau, D., and Bengio, Y.: On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint, arXiv:1409.1259 (2014)
4. Chung, J., Gulcehre, C., Cho, K., and Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint, arXiv:1412.3555 (2014)
5. Dieng, A. B., Wang, C., Gao, J., and Paisley, J.: TopicRNN: A recurrent neural network with long-range semantic dependency. arXiv preprint, arXiv:1611.01702 (2016)
6. Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint, arXiv:1308.0850 (2013)
7. Griffiths, T. L. and Steyvers, M.: Finding scientific topics. Proc. Natl. Acad. Sci. U. S. A. 101, Suppl. 1, 5228–5235 (2004)
8. Hall, D., Jurafsky, D., and Manning, C. D.: Studying the history of ideas using topic models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), 363–371 (2008)
9. Hochreiter, S. and Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 8, 1735–1780 (1997)
10. Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J.: BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), 311–318 (2002)
11. Sutskever, I., Martens, J., and Hinton, G.: Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML '11), 1017–1024 (2011)
12. Tieleman, T. and Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4 (2012)
13. Xing, C., Wu, W., Wu, Y., Liu, J., Huang, Y., Zhou, M., and Ma, W.-Y.: Topic aware neural response generation. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI '17), 3351–3357 (2017)
14. Yan, R., Jiang, H., Lapata, M., Lin, S.-D., Lv, X., and Li, X.: i, Poet: Automatic Chinese poetry composition through a generative summarization framework under constrained optimization. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI '13), 2197–2203 (2013)
15. Yan, R.: i, Poet: Automatic poetry composition through recurrent neural networks with iterative polishing schema. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI '16), 2238–2244 (2016)