● Modified from an open-source Python RNN example using a GRU, word by word
● Training data: 44 MB of Reddit posts
● 64 characters per chunk
● 724205 chunks
● Training on a Microsoft Azure NC6 instance
● Approximately 40 hours (~2.5x faster than CPU training)
● Source code: https://github.com/belugon/rnn-tutorial-gru-lstm
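The chunk count above is consistent with simply slicing the corpus into fixed-length character windows (724,205 chunks x 64 chars ≈ 46 MB). A minimal sketch of that preprocessing step, assuming non-overlapping chunks and a plain-text corpus already loaded as a string:

```python
def make_chunks(text, chunk_len=64):
    """Split raw text into fixed-length character chunks,
    dropping any short remainder at the end."""
    return [text[i:i + chunk_len]
            for i in range(0, len(text) - chunk_len + 1, chunk_len)]

# Example: a 130-character string yields two full 64-character chunks;
# the trailing 2 characters are dropped.
chunks = make_chunks("a" * 130)
```

Whether the original tutorial used overlapping windows or a stride other than 64 is not stated here; this sketch assumes the simplest non-overlapping case.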
Example Language Model Output
● If you are respace, moved out the volucth and he's the guy lise the nice defens.
● I can liss in some better fun whan it complist up vlcceed cute your custogying.
● I see those be your uwwellem would taken like gone stock to long stratempyen
the whole reyer raluting reals e-anitary enfriends.
● Expimination is much they far of the fantaria with our rards of strend people
will always all doing now.
● Friend on some such a girl guts, not eep to the already form of about no plan
Eyladual producting from inesting right no det an rritur.
Parameter       abs-min       abs-max    abs-average   max/min     # of int bits
E (128x128)      0.000020777   5.568969    0.242639    2.68E+05       >=3
U (8x128x128)    0.000006318   6.535724    0.459153    1.03E+06       >=3
V (8x128x128)    0.000002306   8.086514    0.562632    3.51E+06       >=4
W (128x128)      0.000025967  94.84794     7.578607    3.65E+06       >=7
b (128x128)      0.001117974   6.040035    0.871756    5.40E+03       >=3
c (1x128)       12.89209300   50.80365    33.714424    3.94E+00       >=6
We need an 8-bit signed integer (1 sign bit + 7 integer bits, driven by W's abs-max of 94.8) to fully represent the integer range of our parameters.
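The "# of int bits" column follows directly from each parameter's abs-max: it is the number of bits needed to hold the integer part of the largest magnitude. A small sketch of that calculation (function name is ours):

```python
import math

def int_bits_needed(abs_max):
    """Unsigned integer bits needed so that the integer part of any
    magnitude up to abs_max fits (sign bit counted separately)."""
    return max(1, math.ceil(math.log2(abs_max)))

# W has abs-max 94.84794, so ceil(log2(94.84794)) = 7 integer bits;
# adding the sign bit gives the 8-bit signed integer range quoted above.
```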
● Network parameters truncated to a 16-bit fixed-point representation
● 1 sign bit + 7 integer bits + 8 fraction bits
● Representable nonzero range: -127.996 to -0.004 and 0.004 to 127.996
● 15.3% of parameters have absolute value between 1 and 95: these need the 7-bit
integer part + 1 sign bit plus the 8-bit fraction
● 84.7% of parameters have absolute value between 0 and 1: representable by the
8-bit fraction alone
● 0.64% of parameters are truncated to zero because they fall below the 0.004 cutoff
X. Chang, B. Martini, and E. Culurciello, "Recurrent Neural Networks Hardware
Implementation on FPGA," arXiv preprint, v4, 2016.
M. Sundermeyer, R. Schlüter, and H. Ney, "LSTM Neural Networks for Language
Modeling," in INTERSPEECH, 2012.