Generating Sequences with Deep LSTMs & RNNs in Julia

Outlines what LSTMs & RNNs are and explores how they can be used, with several demos included. Shows how they can be implemented in the Julia language.


  1. 1. Generating Sequences using Deep LSTMs & RNNs Andre Pemmelaar @QuantixResearch Julia Tokyo Meetup - April 2015
  2. 2. About Me Andre Pemmelaar • 5-yrs Financial System Solutions • 12-yrs Buy-Side Finance • 7-yrs Japanese Gov’t Bond Options Market Maker (JGBs) • 5-yrs Statistical Arbitrage (Global Equities) • Low-latency & quantitative algorithms • Primarily use a mixture of basic statistics and machine learning (Java, F#, Python, R) • Using Julia for most of my real work (90%) since July 2014 • Can be reached at @QuantixResearch
  3. 3. Why my interest in LSTMs & RNNs • In my field, finance, so much of the work involves sequence models. • Most deep learning models are not built for use with sequences; you have to jury-rig them to make them work. • RNNs and LSTMs are specifically designed to work with sequence data. • Sequence models can be combined with reinforcement learning to produce some very nice results (more on this and a demo later). • They have begun producing amazing results: • Better initialization procedures • Use of rectified linear units for RNNs and “memory cells” in LSTMs
  4. 4. So what is a Recurrent Neural Network?
  5. 5. In a word … Feedback
  6. 6. What are Recurrent Neural Networks? 1. In their simplest form (RNNs), they are just neural networks with a feedback loop. 2. The previous time step’s hidden layer and final outputs are fed back into the network as part of the input to the next time step’s hidden layers. @QuantixResearch
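A minimal sketch of that feedback loop in plain Julia (current 1.x syntax; the weight names and sizes are illustrative, not RecurrentNN.jl's API):

     # Vanilla RNN cell: the previous hidden state h is fed back together
     # with the current input x at every time step.
     struct RNNCell
         Wxh::Matrix{Float64}   # input  -> hidden
         Whh::Matrix{Float64}   # hidden -> hidden (the feedback loop)
         Why::Matrix{Float64}   # hidden -> output
         bh::Vector{Float64}
         by::Vector{Float64}
     end

     RNNCell(nin, nhid, nout) = RNNCell(0.01randn(nhid, nin), 0.01randn(nhid, nhid),
                                        0.01randn(nout, nhid), zeros(nhid), zeros(nout))

     # One time step: returns the new hidden state and the output.
     function rnn_step(c::RNNCell, x::Vector{Float64}, h::Vector{Float64})
         hnew = tanh.(c.Wxh * x .+ c.Whh * h .+ c.bh)   # previous h enters here
         y    = c.Why * hnew .+ c.by
         return hnew, y
     end

     # Unroll over a sequence: the hidden state carries information forward.
     function run_sequence(c::RNNCell, xs::Vector{Vector{Float64}})
         h  = zeros(length(c.bh))
         ys = Vector{Float64}[]
         for x in xs
             h, y = rnn_step(c, x, h)
             push!(ys, y)
         end
         return ys
     end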
  7. 7. Why Generate Sequences? • To improve classification? • To create synthetic training data? • Practical tasks like speech synthesis? • To simulate situations? • To understand the data. This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
  8. 8. This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
  9. 9. This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
  10. 10. This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
  11. 11. This slide is from “Generating Sequences with Recurrent Neural Networks” - Alex Graves
  12. 12. Some great examples: Alex Graves (formerly at the University of Toronto, now part of the Google DeepMind team) has a great example of generating handwriting using an LSTM: • 3 inputs: Δx, Δy, pen up/down • 121 output units • 20 two-dimensional Gaussians for x,y = 40 means (linear) + 40 std. devs (exp) + 20 correlations (tanh) + 20 weights (softmax) • 1 sigmoid for up/down • 3 hidden layers, 400 LSTM cells in each • 3.6M weights total • Trained with RMSProp, learning rate 0.0001, momentum 0.9 • Error clipped during backward pass (lots of numerical problems) • Trained overnight on a fast multicore CPU
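The 121 output units parameterize a mixture of 20 bivariate Gaussians plus a pen up/down probability. A hedged sketch of that mapping and of sampling one pen offset from it (the index layout and function names are assumptions for illustration, not Graves' code):

     # Split a raw 121-unit output vector into the mixture parameters described above.
     function mixture_params(y::Vector{Float64})
         @assert length(y) == 121
         K = 20
         μ = reshape(y[1:2K], 2, K)                 # 40 means, used linearly
         σ = exp.(reshape(y[2K+1:4K], 2, K))        # 40 std devs, exp keeps them positive
         ρ = tanh.(y[4K+1:5K])                      # 20 correlations in (-1, 1)
         w = exp.(y[5K+1:6K]); w ./= sum(w)         # 20 mixture weights (softmax)
         pen = 1 / (1 + exp(-y[end]))               # 1 sigmoid for pen up/down
         return μ, σ, ρ, w, pen
     end

     # Draw one (Δx, Δy, pen) triple from the predicted mixture.
     function sample_point(y::Vector{Float64})
         μ, σ, ρ, w, pen = mixture_params(y)
         k = something(findfirst(cumsum(w) .>= rand()), length(w))  # pick a component
         z1, z2 = randn(), randn()
         dx = μ[1, k] + σ[1, k] * z1
         dy = μ[2, k] + σ[2, k] * (ρ[k] * z1 + sqrt(1 - ρ[k]^2) * z2)
         return dx, dy, rand() < pen
     end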
  13. 13. handwriting demo http://www.cs.toronto.edu/~graves/handwriting.html
  14. 14. Some great examples: Andrej Karpathy (now at Stanford University) has a great example of generating characters using an LSTM: • 51 inputs (unique characters) • 2 hidden layers, 20 LSTM cells in each • Trained with RMSProp, learning rate 0.0001, momentum 0.9 • Error clipped during backward pass
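A hedged sketch of the sampling loop a character-level model like this drives; `predict` stands in for the trained network's forward pass and is assumed to return unnormalized scores over the character vocabulary:

     # Numerically stable softmax over a vector of scores.
     function softmax(z::Vector{Float64})
         e = exp.(z .- maximum(z))
         return e ./ sum(e)
     end

     # Generate n characters, one at a time, by sampling from the predicted
     # next-character distribution (optionally sharpened with a temperature).
     function sample_text(predict::Function, vocab::Vector{Char}, seed::Char, n::Int;
                          temperature::Float64 = 1.0)
         out = Char[seed]
         for _ in 1:n
             p = softmax(predict(out[end]) ./ temperature)
             r, c, idx = rand(), 0.0, length(vocab)
             for (i, pi) in enumerate(p)
                 c += pi
                 if r <= c
                     idx = i
                     break
                 end
             end
             push!(out, vocab[idx])
         end
         return join(out)
     end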
  15. 15. Character generation demo http://cs.stanford.edu/people/karpathy/recurrentjs/
  16. 16. Some great examples: @hardmaru (Tokyo, Japan) has a great example of an RNN + reinforcement learning on one of the pole-balancing tasks: • Uses a recurrent neural network • Uses genetic algorithms to train the network • The demo does the inverted double pendulum balancing task, which I suspect is quite hard even for humans • All done in Javascript, which makes for some great demos!
  17. 17. Pole balancing demo http://otoro.net/ml/pendulum-esp-mobile/index.html
  18. 18. RecurrentNN.jl
  19. 19. RecurrentNN.jl • My first public package (Yay!!) • Based on Andrej Karpathy’s implementation in recurrentjs • https://github.com/Andy-P/RecurrentNN.jl • Implements both Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks • Allows one to compose arbitrary network architectures using graph.jl • Makes use of RMSProp (a variant of stochastic gradient descent)
  20. 20. graph.jl • Has functionality to construct arbitrary expression graphs over which the library can perform automatic differentiation • Similar to what you may find in Theano for Python, or in Torch • The basic idea is to allow the user to compose neural networks, then call backprop() and have it all work with the solver • https://github.com/Andy-P/RecurrentNN/src/graph.jl
  21. 21. graph.jl
     type Graph
         backprop::Array{Function,1}
         doBackprop::Bool
         function Graph(backPropNeeded::Bool)
             new(Array(Function,0), backPropNeeded)
         end
     end

     function sigmoid(g::Graph, m::NNMatrix)
         …
         if g.doBackprop
             push!(g.backprop,
                 function ()
                     …
                     @inbounds m.dw[i,j] += out.w[i,j] * (1. - out.w[i,j]) * out.dw[i,j]
                 end)
         end
         return out
     end
     During the forward pass we build up an array of anonymous functions to calculate each of the gradients.
  22. 22. graph.jl (repeats the Graph type and sigmoid code from the previous slide, then adds the backward pass)
     …
     # use built-up graph of backprop functions
     # to compute backprop (set .dw fields in matrices)
     for i = length(g.backprop):-1:1
         g.backprop[i]()
     end
     Then we loop backwards through the array, calling each of the functions to propagate the gradients backwards through the network.
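The pattern on these two slides, boiled down: every forward operation pushes a closure onto the graph that knows how to push gradients back to its inputs, and backprop simply replays those closures in reverse. A self-contained sketch in current Julia syntax (the slide code uses the older pre-1.0 type/Array(Function,0) style):

     # Tiny tape of backprop closures, in the spirit of graph.jl.
     mutable struct Tape
         backprop::Vector{Function}
     end
     Tape() = Tape(Function[])

     mutable struct Node
         w::Float64    # value
         dw::Float64   # gradient, filled in during the backward pass
     end

     # Forward sigmoid that also records how to propagate its gradient.
     function sigmoid!(g::Tape, m::Node)
         out = Node(1 / (1 + exp(-m.w)), 0.0)
         push!(g.backprop, () -> (m.dw += out.w * (1 - out.w) * out.dw))
         return out
     end

     function backprop!(g::Tape, loss_node::Node)
         loss_node.dw = 1.0                   # seed the gradient at the output
         for i in length(g.backprop):-1:1     # replay the closures in reverse
             g.backprop[i]()
         end
     end

     # Usage: the forward pass builds the tape, the backward pass fills in gradients.
     g = Tape()
     x = Node(0.3, 0.0)
     y = sigmoid!(g, x)
     backprop!(g, y)
     x.dw                                     # = σ(0.3) * (1 - σ(0.3))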
  23. 23. solver.jl
     function step(solver::Solver, model::Model, …)
         …
         for k = 1:length(modelMatices)
             @inbounds m = modelMatices[k]   # mat ref
             @inbounds s = solver.stepcache[k]
             for i = 1:m.n
                 for j = 1:m.d

                     # rmsprop adaptive learning rate
                     @inbounds mdwi = m.dw[i,j]
                     @inbounds s.w[i,j] = s.w[i,j] * solver.decayrate + (1.0 - solver.decayrate) * mdwi^2

                     # gradient clip
                     …

                     # update and regularize
                     @inbounds m.w[i,j] += - stepsize * mdwi / sqrt(s.w[i,j] + solver.smootheps) - regc * m.w[i,j]
                 end
             end
         end
         …
     end
     Now that we have calculated each of the gradients, we can call the solver to loop through and update each of the weights based on the gradients stored during the backprop pass. RMSProp uses an adaptive learning rate for each individual parameter.
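A hedged, self-contained version of the per-parameter update inside step(), including the gradient clipping the slide elides (hyperparameter names and default values here are illustrative):

     # RMSProp update for one weight matrix `w` with gradients `dw`.
     # `cache` is the running average of squared gradients (same shape as w).
     function rmsprop_update!(w::Matrix{Float64}, dw::Matrix{Float64}, cache::Matrix{Float64};
                              stepsize = 0.001, decayrate = 0.999, clipval = 5.0,
                              smootheps = 1e-8, regc = 1e-6)
         for i in eachindex(w)
             g = dw[i]
             cache[i] = decayrate * cache[i] + (1 - decayrate) * g^2           # per-parameter scale
             g = clamp(g, -clipval, clipval)                                   # gradient clip
             w[i] += -stepsize * g / sqrt(cache[i] + smootheps) - regc * w[i]  # update + regularize
             dw[i] = 0.0                                                       # reset for the next pass
         end
         return w
     end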
  24. 24. solver.jl Examples of RMSProp vs other optimization algorithms: http://imgur.com/a/Hqolp
  25. 25. example.jl • Based on I. Sutskever et al., “Generating Text with Recurrent Neural Networks”, ICML 2011 • Closely follows Andrej Karpathy’s example • Reads in about 1,400 English sentences from Paul Graham’s essays on what makes a successful start-up • Learns to predict the next character from the previous character • Uses perplexity as the cost function • Takes about 8-12 hrs to get a good model (need to anneal the learning rate) • Letter embedding = 6, hidden units = 100 (note: the example default is set to 5 & [20,20])
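A sketch of the perplexity cost mentioned above, computed from the probability the model assigned to each character that actually occurred (how those probabilities are collected from the network is an assumption here):

     # Perplexity of a sequence given the model's per-step probability of the true character.
     function perplexity(probs_of_truth::Vector{Float64})
         n = length(probs_of_truth)
         logprob = sum(log2.(probs_of_truth))   # total log2-likelihood of the sequence
         return 2.0^(-logprob / n)              # 2^(average negative log2-probability)
     end

     # A model that always gives the true character probability 1/51 over a
     # 51-character vocabulary scores perplexity 51, i.e. no better than uniform guessing.
     perplexity(fill(1/51, 100))   # ≈ 51.0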
  26. 26. sample output -1hr • be bet sroud thir an • the to be startups dalle a boticast that co thas as tame goudtent wist • the dase mede dosle on astasing sandiry if the the op • that the dor slous seof the pos to they wame mace thas theming obs and secofcagires morlillers dure t • you i it stark to fon'te nallof the they coulker imn to suof imas to ge thas int thals le withe the t
  27. 27. sample output -5hrs • you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro • but startups you month • work of have not end a will araing thec sow about startup maunost matate thinkij the show that's but • you dire prefor reple take stane to of conwe that there cimh the don't than high breads them one gro • but cashe the sowe the mont pecipest fitlid just • Argmax: it's the startups the the seem the startups the the seem the startups the the seem the startups the
  28. 28. sample output -10hrs • and if will be dismiss we can all they have to be a demo every looking • you stall the right take to grow fast, you won't back • new rectionally not a lot of that the initial single of optimizing money you don't prosperity don't pl • when you she have to probably as one there are on the startup ideas week • the startup need of to a company is the doesn't raise in startups who confident is that doesn't usual
  29. 29. What’s not yet so great about this package?
  30. 30. What’s not yet so great about this package? Garbage collection • Tried to keep close to the original implementation to make regression testing easier • Karpathy’s version frequently uses JS’ push to build arrays of matrices • This is appropriate in Javascript but creates a lot of GC pressure in Julia • The likely fix is to create the arrays only once and then update them in place on each pass (version 0.2!) Model types • Models need some kind of interface that the solver can call to get the collection of matrices • At the moment that is implemented in the collectNNMat() function • Could be tightened up by making this part of the initialization of the models
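A sketch of the kind of change the "likely fix" points at: allocate the work arrays once and overwrite them in place on every pass instead of rebuilding them with push! (names below are illustrative, not the planned version 0.2 API):

     using Random: randn!   # in-place randn, standing in for real per-step work

     # Allocating fresh matrices on every pass creates garbage the GC must collect:
     function forward_alloc(nsteps::Int, nhid::Int)
         states = Matrix{Float64}[]
         for t in 1:nsteps
             push!(states, randn(nhid, 1))   # new allocation on every time step
         end
         return states
     end

     # Preallocate once, then overwrite in place on each pass: far less GC pressure.
     function forward_inplace!(states::Vector{Matrix{Float64}})
         for t in eachindex(states)
             randn!(states[t])               # reuse the existing buffer
         end
         return states
     end

     states = [zeros(100, 1) for _ in 1:50]  # allocated once, reused on every pass
     forward_inplace!(states)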
  31. 31. Thank you! Andre Pemmelaar @QuantixResearch Julia Tokyo Meetup - April 2015
