Tang poetry inspiration machine using char-level RNN
Chengeng Ma
Stony Brook University
2018/01/30
Tang Dynasty
• In ancient Chinese history, the Tang Dynasty or Tang Empire (618-907 A.D.) is widely regarded as the greatest era of all.
• Its military and international political power dominated Eastern Asia.
• With centuries of agricultural advances and without discrimination against traders and merchants, the empire reached unparalleled prosperity.
• Its capital, Chang'an, was the largest and most populous metropolis in the world.
• On the streets of Chang'an there were Japanese students learning Chinese culture, Arabian merchants buying silk and tea and selling horses and ivory, girls dancing to Persian music, and Indian missionaries teaching Buddhism ......
https://en.wikipedia.org/wiki/Tang_dynasty
https://en.wikipedia.org/wiki/Chang'an
https://en.wikipedia.org/wiki/Japanese_missions_to_Tang_China
https://en.wikipedia.org/wiki/Silk_Road#Tang_dynasty_reopens_the_route
Tang Poets
• Confident in being the "Central Land", the Tang Dynasty fostered an inclusive and tolerant atmosphere of cultural diversity and literary creation.
• In such a society, countless great poets created their masterpieces: Li Bai, Du Fu, Bai Juyi, Wang Changling, Meng Haoran, Li Shangyin, Cen Shen, Wang Zhihuan ... Their names are tied to the zenith of Chinese culture.
• Some poems were composed at royal banquets, where the emperor led with the 1st sentence, the prime minister made up the 2nd, and the other ministers followed.
• However, more poems were written by lower-level officials struggling to realize their political ideals, by soldiers and officers defending the frontier for years or for their whole lives, and by talented youths depressed at not being selected to serve the government and missing their families. That is where the masterpieces usually come from.
• Most Tang poems have a multiple of 4 sentences (4, 8, or 12), with 5 or 7 syllables in each one. Some poems have 6 or 10 sentences. Their structures are generally similar to each other.
• Within this limited length, to present the author's complicated emotional variations and create an artistic conception that readers can feel, the poets must condense their expression and use each character very carefully.
• As a result, in most cases each character represents one word, making a char-level model roughly equal in capability to a word-based model.
• Ancient Chinese also shows a significant degree of polysemy: the same word can be used and understood in multiple ways in different contexts.
• In the case of poetry, the extreme succinctness amplifies this effect even more. I would say the words (chars) in ancient Chinese poems carry a kind of "perplexity", probably significantly larger than that of English words.
• 画蛇添足 adding feet when drawing snakes
• 足矣 it’s enough (to do something)
• 不足为外人道也 not worth mentioning to outsiders
• 民不足而可治者自古及今未之尝闻 from ancient times to the present, no one has ever heard of a state being well governed while its people lack enough to live on
• Recall that what an RNN language model does is essentially narrow down the scope of possible phrases, neglecting most word combinations and giving support to only a few. Eventually its perplexity decreases to a point where it cannot distinguish any further.
• The clear structure, the fact that one word can be condensed into one char, and the fact that each char can carry multiple reasonable interpretations all make a char-level RNN better suited to generating ancient Chinese poems than English poems.
Recurrent Neural Network (RNN)
• An RNN is a kind of neural network that applies the same operation iteratively, which looks quite like a dynamical process.
• y_pred_t = f(h_t, x_t)
  h_{t+1} = g(h_t, x_t)
• f and g are functions that contain many parameters, and those parameters do not change across steps.
• In our language model, we use x_t and h_t to predict the next char, like the PTB example in TensorFlow's tutorial.
We use a 2-layer LSTM in this project.
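As an illustration only (not the author's code), the recurrence above can be written as a short loop in which the same parameterized functions are reused at every step; in practice f and g are realized by the 2-layer LSTM cell sketched in the RNN Model section below.

```python
# Illustrative sketch of the recurrence: the same f and g (with fixed parameters)
# are applied at every time step; only the hidden state h changes.
def run_rnn(xs, h0, f, g):
    h, ys = h0, []
    for x_t in xs:
        ys.append(f(h, x_t))   # y_pred_t = f(h_t, x_t)
        h = g(h, x_t)          # h_{t+1} = g(h_t, x_t)
    return ys, h
```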
T    0  1  2  3  4  5  6  7  8  9  10 11 12 13
X_i  <  白 日 依 山 尽 ,  黄 河 入 海 流 。 >
Y_i  白 日 依 山 尽 ,  黄 河 入 海 流 。 >  >

T    0  1  2  3  4  5  6  7  8  9  10 11 12 13
X_j  <  对 酒 当 歌 ,  人 生 几 何 。 >  >  >
Y_j  对 酒 当 歌 ,  人 生 几 何 。 >  >  >  >
• The dataset is Complete Tang Poems《全唐诗》, which contains about 43,030 poems with a vocabulary size of 6,109 chars.
• 7,859 poems are not used for training because they are too long (more than 80 chars, i.e., more than 10 lines), but we use them as validation data after training finishes.
• 520 poems are discarded because of noisy information or missing lines.
We select the ending signal ">" as the padding for both X and Y, while that blog uses a rather strange way to process the ending of Y.
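A minimal sketch (our reading of the method, not the author's exact code) of how such (X, Y) pairs can be built to match the table above; the helper name make_pair is hypothetical:

```python
def make_pair(poem, bucket_len):
    """Wrap a poem with '<' and '>', shift by one char to get Y,
    and pad both X and Y with the ending signal '>' up to bucket_len."""
    seq = '<' + poem + '>'
    x = list(seq)
    y = list(seq[1:]) + ['>']            # Y is X shifted left by one position
    x += ['>'] * (bucket_len - len(x))   # pad X with '>'
    y += ['>'] * (bucket_len - len(y))   # pad Y with '>'
    return x, y

# Reproduces the first row of the table (bucket length 14 here, just for the example):
x_i, y_i = make_pair('白日依山尽,黄河入海流。', 14)
```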
• Following the method of this blog, every poem has "<" added at the beginning and ">" added at the end during training.
• During generation, the only input is the beginning signal "<", so the real content of the poem is created entirely by the model.
• Each time the model generates only one char, and that new char is used as the next step's input. (Note that you also need to pass the final_state of the LSTM cell on to initialize the next step's initial_state.)
• When the model generates ">", generation terminates.
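Below is a minimal sketch of this generation loop under TF 1.x. The attribute names (model.x, model.probs, model.initial_state, model.final_state) and the id2char/char2id dictionaries are assumptions for illustration, not the author's code:

```python
import numpy as np

def generate_poem(sess, model, id2char, char2id, max_len=100):
    x = np.array([[char2id['<']]])               # the only real input is '<'
    state = sess.run(model.initial_state, {model.x: x})   # zero state for a batch of 1
    poem = []
    for _ in range(max_len):
        feed = {model.x: x}
        # feed the previous step's final_state into this step's initial_state
        for layer, (c, h) in enumerate(model.initial_state):
            feed[c] = state[layer].c
            feed[h] = state[layer].h
        probs, state = sess.run([model.probs, model.final_state], feed)
        p = probs[0, -1]                         # P.D.F. over the vocabulary
        # draw a uniform random number and locate its index on the C.D.F.
        idx = min(int(np.searchsorted(np.cumsum(p), np.random.rand())), len(p) - 1)
        ch = id2char[idx]
        if ch == '>':                            # ending signal terminates generation
            break
        poem.append(ch)
        x = np.array([[idx]])                    # the new char becomes the next input
    return ''.join(poem)
```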
Bucket-and-padding
• Since the length of a poem varies within our training data, we use the bucket-and-padding method. And since our dataset is not large, we suggest using a small bucket size (16) to minimize the impact of the manual modification (padding); see the sketch after this list.
• When calculating the perplexity,
you need to consider the
padded chars that you have
added manually.
• We count the appearance of the
padded char within each batch
and divide it by the batch size, as
a correction to the sequence
length.
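A sketch of both steps under our reading of the slides (not the author's implementation). We interpret "bucket size 16" as groups of 16 poems of similar length, each group padded to its longest poem; the perplexity correction follows the description above, and whether to also exclude the one genuine ending ">" per poem is left as a detail. make_pair is the helper sketched earlier.

```python
import numpy as np

def make_buckets(poems, bucket_size=16):
    """Group poems of similar length and pad within each group (assumed scheme)."""
    poems = sorted(poems, key=len)
    buckets = []
    for i in range(0, len(poems), bucket_size):
        group = poems[i:i + bucket_size]
        max_len = max(len(p) for p in group) + 2          # room for '<' and '>'
        buckets.append([make_pair(p, max_len) for p in group])
    return buckets

def corrected_perplexity(loss_per_step_sum, x_batch, pad_id, num_steps):
    """loss_per_step_sum: cross-entropy summed over time steps, averaged over the batch.
    Correct the step count for the '>' chars added as padding."""
    batch_size = x_batch.shape[0]
    pad_per_seq = np.sum(x_batch == pad_id) / batch_size  # avg padded positions per poem
    effective_steps = num_steps - pad_per_seq             # corrected sequence length
    return np.exp(loss_per_step_sum / effective_steps)
```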
Comparing with the PTB example
• Comparing this dataset with the Penn Tree Bank (PTB) dataset, we find that their vocabulary sizes and total numbers of words are of similar magnitude.
• So we decide to take the parameters from TensorFlow's PTB example as our initial guess.
• However, the PTB example flattens the whole corpus into one single string and reshapes that long string into two dimensions (batch_size, total_steps) by simple truncation.
• It then takes one batch of shape (batch_size, num_steps) from the 2D array for each update step; (batch_size, num_steps) is the shape over which the RNN graph is defined.
                vocabulary   total words
Tang Poetry          6,109     1,721,809
Penn Tree Bank      10,000       929,589
• Under this assumption, it must keep track of each step's final_state and substitute it into the next step's initial_state.
• In our training stage, each poem is independent of the other poems, so every batch can simply initialize its initial_state from the zero state.
• However, during the generation stage, since we can only generate one char at a time, we still need to pass the last step's final_state into the next step's initial_state. Otherwise, the influence of all the previous chars (except the last one) would be lost.
RNN Model
• The model is generally similar to TensorFlow's PTB model.
• The differences are:
• PTB assumes that two neighboring batches are literally consecutive; here each batch is independent of the next batch.
• PTB builds its RNN graph through an explicit for loop; here we call tf.nn.dynamic_rnn to do the same work.
• Finally, for the placeholders of X & Y, the sequence-length dimension is set to None to allow variable length.
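A minimal TF 1.x sketch of such a graph. The sizes are assumptions (hidden_size=650 borrows the PTB medium configuration as the stated initial guess; vocab_size=6111 assumes the 6,109 chars plus "<" and ">"); this is an illustration, not the author's exact code.

```python
import tensorflow as tf

vocab_size, hidden_size, num_layers = 6111, 650, 2

x = tf.placeholder(tf.int32, [None, None], name='x')   # (batch_size, variable seq_len)
y = tf.placeholder(tf.int32, [None, None], name='y')
batch_size = tf.shape(x)[0]

embedding = tf.get_variable('embedding', [vocab_size, hidden_size])
inputs = tf.nn.embedding_lookup(embedding, x)

# 2-layer LSTM; each training batch starts from the zero state (poems are independent)
cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(hidden_size) for _ in range(num_layers)])
initial_state = cell.zero_state(batch_size, tf.float32)

# tf.nn.dynamic_rnn unrolls over the (variable) time dimension for us
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state)

logits = tf.layers.dense(outputs, vocab_size)           # project onto the vocabulary
probs = tf.nn.softmax(logits)                           # P.D.F. used during generation
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
```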
Training vs Generating
We decreased the training perplexity by 48.5% and the validation perplexity by 22.1% with fewer epochs.
[Training/validation perplexity curves, both beginning from 6111: our configuration (Adam with a continuously decreasing learning rate, small batch size 16; annotations mark where the learning rate begins to decrease and where training stops at the 2nd convergence) vs. that blog's configuration (large batch size 64).]
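Continuing the model sketch above (which defines loss), the optimizer setup might look like this. Adam with a continuously decreasing learning rate and batch size 16 come from the slides; the concrete decay schedule and starting rate are assumptions.

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
# continuously decreasing learning rate (the schedule here is illustrative)
learning_rate = tf.train.exponential_decay(
    1e-3, global_step, decay_steps=1000, decay_rate=0.97)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)

batch_size = 16   # small batch/bucket size used when batching the poems
```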
Difference between a creative generating model & a non-creative model
• In most machine learning tasks, like image recognition or sentiment prediction, we prepare a validation dataset and use the model's performance on it to choose parameters, decide where to stop, and evaluate the model.
• The key objective of these models is to generalize well, since they are going to be used on other, unseen data.
• However, in our case we do not require the model to generalize to another dataset. We just want the model to learn its training dataset as well as it can.
• Moreover, the process of writing a poem already contains a random procedure: the model outputs a P.D.F., we draw a random number, and the next char is selected by the random number's index over the C.D.F.
• So we decide the total number of training epochs not based on the validation perplexity, but based on the convergence of the training perplexity.
Evaluation
• There are many rules a poem must follow to count as a Tang poem. Writing such a masterpiece is already beyond the literary level of most educated modern Chinese speakers.
• Evaluating poems against those requirements would need experienced experts to judge.
• Since our objective is to provide a poem template that inspires people to improve it with a minor amount of work, so that the time needed to write a poem can be significantly reduced (to no more than about 20 minutes), we finally decided to conduct an online survey.
• https://wj.qq.com/s/1851532/88d3
Evaluation
• We designed the survey by generating 150 poems in total. Each respondent randomly reads 2 of them.
• Q1-2: Give scores (1-5) to the 2 poems separately.
• Q3: Who do you think wrote the above poems? (the correct answer is "none of the above")
• Q4: How many consecutive sentences can you find that look very good to you? Answer for the better of the two.
• Q5: Do you think at least one of the above poems could be improved into an acceptable poem with only a minor amount of manual modification?
• Q6: Which aspects do you think the above poems are good at? (multiple choice)
• Q7: From which sentence does the poem begin to become unacceptable as a good poem? Answer for the better of the two. (optional)
• At present, we have only received 70 responses to the survey.
• Some respondents already knew that artificial intelligence such as RNNs can write poems or music.
• About 64.6% of respondents think the poems were written by a human rather than an AI.
• For the consecutive sentences (only 10 results):
• 37.2% of respondents could not find any lines that looked very good to them,
• 41.9% could find 2 consecutive sentences that were very good,
• 18.6% could find 4 very good consecutive sentences,
• 2.3% found more than 4 consecutive good sentences.
• 55.7% of respondents believe it will be easy to improve at least one of the model-generated poems into an acceptable one.
• For the question of what the poems are good at, 31.3% of respondents think the poems are not good at all.