This is a char-level RNN model that writes Chinese poetry, serving as a baseline for my machine poetry project. An online survey was conducted to evaluate 150 poems written by this model, and about 65% of respondents believed the poems were written by a human.
2. Tang Dynasty
• In ancient Chinese history, the Tang
Dynasty, or Tang Empire (618-907
A.D.), is widely regarded as a golden
age.
• Its military and international
political power dominated
East Asia.
• With hundreds of years of advances
in agriculture and without
discrimination against traders and
merchants, the empire reached
unparalleled prosperity.
• Its capital, Chang'an, was the largest
and most populous metropolis in
the world.
• On the streets of Chang'an there were
Japanese students learning Chinese
culture, Arabian merchants buying
silk and tea and selling horses and
ivory, beautiful girls dancing to
Persian music, and Indian missionaries
teaching Buddhism ......
https://en.wikipedia.org/wiki/Tang_dynasty
https://en.wikipedia.org/wiki/Chang'an
https://en.wikipedia.org/wiki/Japanese_missions_to_Tang_China
https://en.wikipedia.org/wiki/Silk_Road#Tang_dynasty_reopens_the_route
3. Tang Poets
• Confident of being the "Central
Land", the Tang Dynasty fostered
an inclusive and tolerant
atmosphere of cultural diversity
and literary creation.
• In such a society, countless
great poets created their
masterpieces: Li Bai, Du Fu, Bai Juyi,
Wang Changling, Meng Haoran, Li
Shangyin, Cen Shen, Wang
Zhihuan ... Their names are tied
to the zenith of Chinese culture.
• Some poems were composed at royal
banquets, where the emperor would
lead with the 1st sentence, the prime
minister would add the 2nd, and the
other ministers would follow.
• However, more poems were
written by lower-level officials
struggling to realize their political
ideals, by soldiers and officers defending
the frontier for years or for their
whole lives, and by talented youths
depressed at not being selected to serve
the government and missing their families.
That is where masterpieces usually come from.
4. • Most Tang poems have a multiple of
4 sentences, such as 4, 8, or 12 sentences,
with 5 or 7 syllables in each one. Some
poems have 6 or 10 sentences. Their
structures are generally similar to
each other.
• To present the author's complex
emotional variations within such a
limited length, and to create an artistic
conception that readers can feel,
the poets must condense their
expression and make use of each
character very carefully.
• Finally, in most cases each character
can represent one word, so a
char-level model matches the capability
of a word-based model.
5. • Ancient Chinese also shows
significant polysemy: the same word
can be used and understood in
multiple ways in different
contexts.
• In the case of poetry, the extreme
succinctness amplifies this effect even
further. I would say the words (chars) in
ancient Chinese poems carry a kind of
"perplexity", probably significantly larger
than that of English words.
• 画蛇添足 adding feet when drawing a snake
• 足矣 it is enough (to do something)
• 不足为外人道也 not worth telling
outsiders
• 民不足而可治者自古及今未之尝闻 from ancient
times to the present, no one has ever heard of a
country being well governed while its people lack enough
• Remember that what an RNN
language model does is
basically narrow down
the scope of possible phrases,
neglecting most word
combinations while giving support
to a few. Eventually, its perplexity
decreases to a point where it cannot
discriminate any further.
• The clear structure, the fact that
one word can be condensed
into one char, and the fact that each
char can have multiple reasonable
interpretations all make
the char-level RNN better suited
to generating ancient Chinese
poems than English poems.
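For reference, per-char perplexity can be computed from the probabilities the model assigns to the chars that actually come next (a toy sketch, not this project's code):

```python
import numpy as np

def perplexity(probs_of_targets):
    """exp of the average negative log-probability the model assigns to
    the actual next chars; lower means the model is less 'surprised'."""
    p = np.asarray(probs_of_targets, dtype=float)
    return float(np.exp(-np.mean(np.log(p))))

# a model that always spreads its belief evenly over 4 candidate chars
# is exactly as 'perplexed' as a uniform 4-way guess
uniform_ppl = perplexity([0.25] * 10)
```

A perfect model that assigns probability 1.0 to every target char reaches the minimum perplexity of 1.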
6. Recurrent Neural Network (RNN)
• An RNN is a kind of neural network
that applies the same operation
iteratively, which looks much like a
dynamic process.
• y_pred_t = f(h_t, x_t)
  h_{t+1} = g(h_t, x_t)
• f and g are functions containing
many parameters, and those parameters
do not change across steps.
• In our language model, we simply use x_t
and h_t to predict the next char,
like the PTB example in TensorFlow's
tutorial.
We use a 2-layer LSTM in this project.
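The recurrence can be illustrated with a toy NumPy sketch (the weight names and sizes below are made up for illustration; the actual project uses a 2-layer LSTM, not this vanilla RNN form):

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, V = 8, 16, 32                      # toy input / hidden / vocab sizes
Wxh, Whh = rng.normal(size=(D, H)), rng.normal(size=(H, H))
Wxy, Why = rng.normal(size=(D, V)), rng.normal(size=(H, V))

def rnn_step(h, x):
    """One step of the recurrence; the same weights are reused at every t."""
    y_pred = x @ Wxy + h @ Why           # y_pred_t = f(h_t, x_t)
    h_next = np.tanh(x @ Wxh + h @ Whh)  # h_{t+1}  = g(h_t, x_t)
    return y_pred, h_next

h = np.zeros(H)
for t in range(5):                       # unrolling: f and g never change
    y_pred, h = rnn_step(h, rng.normal(size=D))
```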
7. T   0 1 2 3 4 5 6 7 8 9 10 11 12 13
X_i < 白 日 依 山 尽 ， 黄 河 入 海 流 。 >
Y_i 白 日 依 山 尽 ， 黄 河 入 海 流 。 > >
T   0 1 2 3 4 5 6 7 8 9 10 11 12 13
X_j < 对 酒 当 歌 ， 人 生 几 何 。 > > >
Y_j 对 酒 当 歌 ， 人 生 几 何 。 > > > >
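The table above corresponds to this construction (a sketch working on raw chars; the real pipeline maps chars to ids first, and max_len here plays the role of the bucket length):

```python
def make_example(poem, max_len):
    """Build one (X, Y) training pair as in the table: X is '<' + poem,
    Y is the same sequence shifted left by one char, and both are
    padded up to max_len with the ending signal '>'."""
    x = ('<' + poem + '>').ljust(max_len, '>')
    y = (poem + '>').ljust(max_len, '>')
    return x, y
```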
• The dataset is Complete Tang
Poems《全唐诗》, which contains
about 43,030 poems with a
vocabulary of 6,109 chars.
• 7,859 poems are not used for
training because they are too long
(more than 80 chars, more than 10
lines), but we use them as
validation data after training
finishes.
• 520 poems are discarded because of
noisy content or missing lines.
We select the
ending signal
">" as the
padding for
both X and Y,
• Following the method of that blog, every
poem has "<" added at the beginning and ">"
at the end during training.
• During generation, the only input is the
beginning signal "<", so the real
content of the poem is created entirely by
the model.
• Each time the model generates only one
char, and that new char is used as the next
step's input. (Note that you also need to
carry the final_state of the LSTM cell over
to initialize the next step's initial_state.)
• When the model generates ">", the
generation terminates.
while that
blog uses a
rather odd
way to process
the ending of Y.
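The generation procedure can be sketched as follows (a minimal sketch: step_fn, char_to_id, and id_to_char are hypothetical stand-ins for the trained model and its vocabulary, not names from this project):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_poem(step_fn, char_to_id, id_to_char, max_len=80):
    """Generate one char at a time; each new char becomes the next input,
    and the RNN state is carried across steps (final_state is fed into the
    next step's initial_state). Generation starts from '<' and stops when
    the model emits '>'."""
    state = None                    # step_fn treats None as the zero state
    char_id = char_to_id['<']
    chars = []
    for _ in range(max_len):
        probs, state = step_fn(char_id, state)  # state reused next step
        char_id = int(rng.choice(len(probs), p=probs))
        if id_to_char[char_id] == '>':
            break
        chars.append(id_to_char[char_id])
    return ''.join(chars)
```

Dropping the state hand-off here would make every step forget all chars before the most recent one, which is exactly the pitfall noted above.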
8. Bucket-and-padding
• Since poem length varies
within our training data,
we use the bucket-and-padding
method. And since our dataset is
not large, we suggest using a
small bucket size (16) to
minimize the impact of the
manually added padding.
• When calculating the perplexity,
you need to account for the
padded chars that were
added manually.
• We count the occurrences of the
padding char within each batch
and divide by the batch size, as
a correction to the sequence
length.
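The correction rule above can be sketched like this (function and argument names are illustrative, not the project's code; the bookkeeping follows the stated rule literally):

```python
import numpy as np

def corrected_perplexity(sum_neg_log_prob, y_batch, pad_id):
    """Perplexity with the manually padded chars discounted: count the
    padding char's occurrences in the batch, divide by the batch size,
    and subtract that from the (padded) sequence length."""
    batch_size, num_steps = y_batch.shape
    n_pad = int((y_batch == pad_id).sum())
    effective_steps = num_steps - n_pad / batch_size  # corrected length
    n_real_tokens = batch_size * effective_steps
    return float(np.exp(sum_neg_log_prob / n_real_tokens))
```

Without this correction the loss would be averaged over padding positions too, making the perplexity look artificially low.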
9. • Comparing this dataset with the Penn
Tree Bank (PTB) dataset, we find their
vocabulary sizes and total numbers of
words are of similar magnitude.
• So we take the parameters
from TensorFlow's PTB example as our
initial guess.
• However, the PTB example
flattens the whole corpus into one single
string and reshapes that long string
into two dimensions (batch_size,
total_steps) by simple truncation.
• Then it takes one batch of shape
(batch_size, num_steps) from the 2D array
for each update step. The shape
(batch_size, num_steps) is the shape over
which the RNN graph is defined.
                 vocabulary   total words
Tang Poetry           6,109     1,721,809
Penn Tree Bank       10,000       929,589
• Under this scheme, it must keep track
of each step's final_state and
substitute it into the next step's
initial_state.
• In our training stage, each poem
is independent of the other poems, so
every batch can simply initialize its
initial_state from the zero state.
• However, during the generation stage,
since we can only generate one char
at a time, we still need to carry
the last step's final_state over to the next
step's initial_state. Otherwise, the
influence of all the previous chars
(except the last one) would be lost entirely.
Comparing with the PTB
example
10. RNN Model
• The model is generally similar to
TensorFlow's PTB model.
• The differences are:
• PTB assumes the two neighboring
batches are literally consecutive text;
• here, each batch is
independent of the next batch.
• PTB builds its RNN graph through an
explicit for loop;
• here, we call tf.nn.dynamic_rnn to
do the same work.
• Finally, in the placeholders for X and Y,
the sequence-length dimension is set to None
to allow variable length.
12. We decreased the
training perplexity by
48.5% and the validation
perplexity by 22.1%
with fewer epochs.
[Figure: training-perplexity curves under our configuration vs. that blog's configuration, both beginning from 6111. Ours uses a small batch size of 16 and Adam with a continuously decreasing learning rate; the blog's uses a large batch size of 64. Annotations mark where the learning rate begins to decrease and where training stops at the 2nd convergence.]
13. Difference between a
creative generative model
& a non-creative model
• In most machine learning tasks, such as
image recognition or sentiment
prediction, we prepare a
validation dataset and use the model's
performance on the validation data to
choose parameters, determine when
to stop, and evaluate the model.
• The key objective of these kinds of
models is to generalize well, since
these models are going to be used on
other, unseen data.
• However, in our case, we do not
require our model to generalize to
another dataset. We just want the model
to learn its training dataset as well as
it can.
• Also, the process of writing a poem
already contains a random procedure:
the model outputs a P.D.F., we
draw a random number, and the next
char is selected by the random
number's index over the C.D.F.
• So we decide the total number of training
epochs not based on the validation
perplexity, but based on the convergence
of the training perplexity.
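The C.D.F. sampling step can be sketched as follows (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def sample_next_char(pdf, rng):
    """Draw u ~ Uniform[0, 1) and return the index where u lands on the
    C.D.F. of the model's output distribution."""
    cdf = np.cumsum(pdf)
    u = rng.random()
    # side='right' skips any leading zero-probability entries; the min()
    # guards against floating-point round-off in the last C.D.F. value
    return min(int(np.searchsorted(cdf, u, side='right')), pdf.size - 1)
```

This is equivalent to NumPy's Generator.choice with p=pdf, but it makes the random-number-over-C.D.F. step explicit.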
14. Evaluation
• There are many rules a poem
must follow to count as a proper Tang
poem. Writing such a masterpiece
is already beyond the literary ability
of most educated modern Chinese.
• Evaluating poems against that
kind of requirement would need
experienced experts as judges.
• Since our objective is to provide a
poem template that inspires people
to improve it through a minor
amount of work, so that the
time spent writing a poem can be
significantly reduced (to no
more than 20 minutes), we finally
decided to conduct an online survey.
• https://wj.qq.com/s/1851532/88d3
18. Evaluation
• We designed the survey around
150 generated poems in total. Each respondent
randomly reads 2 of them.
• Q1-2: Give scores (1-5) to the 2 poems separately.
• Q3: Who do you think wrote the above poems?
(The correct answer is "none of the above".)
• Q4: How many consecutive sentences can you
find that look very good to you? Answer for the
better of the two poems.
• Q5: Do you think it is possible that at least one of
the above poems could be improved into an
acceptable poem through only a minor amount of
manual modification?
• Q6: Which aspects do you think the above poems
are good at? (multiple choice)
• Q7: From which sentence does the poem
begin to become unacceptable as a good poem?
Answer for the better of the two poems. (optional)
• So far, we have only received 70 responses to the survey.
• Some of the respondents already knew that
artificial intelligence such as RNNs can
write poems or music.
• About 64.6% of respondents think the poems were
written by a human rather than by an AI.
• For the consecutive sentences (only 10 results):
• 37.2% of respondents cannot find any lines that look
very good to them,
• 41.9% can find 2 consecutive
sentences that are very good,
• 18.6% can find 4 very good consecutive
sentences,
• 2.3% find more than 4 consecutive good sentences.
• 55.7% of respondents believe it would be easy to
improve at least one of the model-generated
poems into an acceptable one.
• On the question of what the poems are good at,
31.3% of respondents think the poems are not good
at all.