
Deep learning Tutorial - Part II


Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI-based applications growing, and companies like IBM, Google, Microsoft, and NVIDIA investing heavily in computing and software applications, it is time to understand Deep Learning better!

In this lecture, we will get an introduction to Autoencoders and Recurrent Neural Networks and review the state of the art in hardware and software architectures. Working demos will be presented in Keras, a popular Python deep learning library running on a Theano backend. This is a preview of the QuantUniversity Deep Learning Workshop to be offered in 2017.



  1. Deep Learning: An Introduction, Part II • Location: QuantUniversity Meetup, January 19th 2017, Boston MA • Copyright 2016 QuantUniversity LLC. • Presented by: Sri Krishnamurthy, CFA, CAP • www.QuantUniversity.com • sri@quantuniversity.com
  2. Slides and Code will be available at: http://www.analyticscertificate.com/DeepLearning
  3. - Analytics Advisory services - Custom training programs - Architecture assessments, advice and audits
  4. Sri Krishnamurthy, Founder and CEO • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and consultancy for financial analytics • Prior experience at MathWorks, Citigroup and Endeca, plus 25+ financial services and energy customers • Regular columnist for Wilmott Magazine • Author of the forthcoming book “Financial Modeling: A Case Study Approach,” to be published by Wiley • Chartered Financial Analyst and Certified Analytics Professional • Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
  5. Quantitative Analytics and Big Data Analytics Onboarding • Trained more than 500 students in quantitative methods, data science and big data technologies using MATLAB, Python and R • Launched the Analytics Certificate Program in September ▫ New cohort in March 2017 • Coming soon: Deep Learning and Cognitive Computing Certificate!
  6. Events of Interest • February 2017 ▫ Apache Spark Lecture – Feb 3rd ▫ Deep Learning Workshop – Boston – March 27-28 ▫ Anomaly Detection Workshop – Boston – April 24-25 • March 2017 ▫ Deep Learning Workshop – New York (Date TBD)
  7. Recap • Neural Networks 101 • Multi-Layer Perceptron • Convolutional Neural Networks
  8. Agenda for today • AutoEncoders • Recurrent Neural Networks ▫ LSTM
  9. Machine Learning • Unsupervised algorithms ▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities between observations and assigns them to different buckets => clustering, etc. ▫ Create a transformed representation of the original data => PCA (Diagram: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1)
  10. Machine Learning • Supervised algorithms ▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set ▫ If y is numeric => prediction ▫ If y is categorical => classification (Diagram: x1, x2, x3, … → Model F(X) → y)
  11. Motivation¹: Autoencoders 1. http://ai.stanford.edu/~quocle/tutorial2.pdf
  12. https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/
  13. Autoencoder • The goal is for the reconstruction x̂ to approximate the input x • Interesting applications such as ▫ Data compression ▫ Visualization ▫ Pre-training neural networks
  14. Demo in Keras¹ 1. https://blog.keras.io/building-autoencoders-in-keras.html 2. https://keras.io/models/model/
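
A minimal sketch in the spirit of the Keras blog post linked above (the 784-dimensional MNIST input and the 32-dimensional code follow that post; treat the exact settings as illustrative rather than the live demo):

    from keras.layers import Input, Dense
    from keras.models import Model

    # Encoder: compress a flattened 28x28 MNIST image into a 32-dim code
    input_img = Input(shape=(784,))
    encoded = Dense(32, activation='relu')(input_img)

    # Decoder: reconstruct the 784-dim input from the code
    decoded = Dense(784, activation='sigmoid')(encoded)

    # Training the model to reproduce its own input forces the 32-dim
    # bottleneck to learn a compressed representation
    autoencoder = Model(input_img, decoded)
    autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
    # autoencoder.fit(x_train, x_train, epochs=50, batch_size=256,
    #                 shuffle=True, validation_data=(x_test, x_test))

Here x_train and x_test would be the flattened, 0-1 scaled MNIST arrays.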
  15. Autoencoders¹ • Pretraining step: train a sequence of shallow autoencoders, greedily one layer at a time, using unsupervised data • Fine-tuning step 1: train the last layer using supervised data • Fine-tuning step 2: use backpropagation to fine-tune the entire network using supervised data 1. http://ai.stanford.edu/~quocle/tutorial2.pdf
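
To make the three steps concrete, here is a hedged sketch of greedy layer-wise pretraining followed by the two fine-tuning passes; the layer sizes, optimizer, epoch counts, and the synthetic stand-in data are all assumptions, not details from the slides:

    import numpy as np
    from keras.layers import Input, Dense
    from keras.models import Model, Sequential
    from keras.utils import to_categorical

    # Synthetic stand-in data; in practice x_train is your unsupervised set
    np.random.seed(0)
    x_train = np.random.rand(1000, 256)
    y_train = to_categorical(np.random.randint(10, size=1000), 10)

    def pretrain_layer(x, n_hidden):
        # Train a shallow autoencoder on x; return the trained encoder
        # weights and the encoded version of x (input for the next layer)
        inp = Input(shape=(x.shape[1],))
        enc = Dense(n_hidden, activation='relu')
        code = enc(inp)
        out = Dense(x.shape[1], activation='sigmoid')(code)
        ae = Model(inp, out)
        ae.compile(optimizer='adam', loss='mse')
        ae.fit(x, x, epochs=5, batch_size=128, verbose=0)
        return enc.get_weights(), Model(inp, code).predict(x)

    # Pretraining: greedily train one shallow autoencoder per layer
    w1, h1 = pretrain_layer(x_train, 128)
    w2, _ = pretrain_layer(h1, 64)

    # Supervised network initialized with the pretrained weights
    net = Sequential([
        Dense(128, activation='relu', input_shape=(256,)),
        Dense(64, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    net.layers[0].set_weights(w1)
    net.layers[1].set_weights(w2)

    # Fine-tuning step 1: train only the last (supervised) layer
    net.layers[0].trainable = False
    net.layers[1].trainable = False
    net.compile(optimizer='adam', loss='categorical_crossentropy')
    net.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)

    # Fine-tuning step 2: unfreeze and backpropagate through everything
    net.layers[0].trainable = True
    net.layers[1].trainable = True
    net.compile(optimizer='adam', loss='categorical_crossentropy')
    net.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)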
  16. Supervised learning: cross-sectional ▫ Observations are independent ▫ Given X1, …, Xi, predict Y ▫ CNNs
  17. Supervised learning: sequential ▫ Observations are sequentially ordered ▫ Given O1, …, OT, predict OT+1 (Diagram: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal)
  18. Time series modeling in Keras using MLPs • Given: X1, X2, X3, …, XN • Convert the univariate time series dataset to a cross-sectional dataset; with a lookback of 1, the series X1, …, X15 becomes:

    X    Y
    X1   X2
    X2   X3
    X3   X4
    X4   X5
    X5   X6
    X6   X7
    X7   X8
    X8   X9
    X9   X10
    X10  X11
    X11  X12
    X12  X13
    X13  X14
    X14  X15
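
A small helper makes this windowing step concrete (the function name and details here are my own sketch, not code from the demo):

    import numpy as np

    def create_dataset(series, lookback=1):
        # Slide a window of length `lookback` over the series: each row
        # of X holds the previous `lookback` values, y holds the next one
        X, y = [], []
        for i in range(len(series) - lookback):
            X.append(series[i:i + lookback])
            y.append(series[i + lookback])
        return np.array(X), np.array(y)

    series = np.arange(1, 16)           # stands in for X1..X15 above
    X, y = create_dataset(series, lookback=1)
    # X[0] == [1], y[0] == 2  ->  the (X1 -> X2) row of the table above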
  19. Sample data • Monthly data • Computational Intelligence in Forecasting • Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download (Figure: line plot of the monthly series, values roughly 0 to 1,800 over about 106 months)
  20. Keras • Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. • Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility). • Supports both convolutional networks and recurrent networks, as well as combinations of the two. • Supports arbitrary connectivity schemes (including multi-input and multi-output training). • Runs seamlessly on CPU and GPU.
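
For a flavor of the API, a minimal Sequential model looks like this (the layer sizes are arbitrary, chosen only for illustration):

    from keras.models import Sequential
    from keras.layers import Dense

    # Define, compile, and inspect a tiny fully connected network
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    model.summary()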
  21. Multi-Layer Perceptron • Use 72 observations for training and 36 for testing • Lookback of 1 and of 10 • The longer the lookback, the larger the network (Diagram: a hidden layer of size 8 feeding an output of size 1)
  22. Demo • Lookback = 1: Train Score: 1972.20 MSE (44.41 RMSE), Test Score: 3001.77 MSE (54.79 RMSE) • Lookback = 10: Train Score: 2631.49 MSE (51.30 RMSE), Test Score: 4166.64 MSE (64.55 RMSE)
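
A sketch of how such a demo fits together, reusing the create_dataset helper from above; `series` is assumed to hold the monthly CIF values as a 1-D numpy array, and the epoch count and batch size are my assumptions rather than the demo's exact settings:

    from keras.models import Sequential
    from keras.layers import Dense

    lookback = 1
    X_train, y_train = create_dataset(series[:72], lookback)  # 72 training
    X_test, y_test = create_dataset(series[72:], lookback)    # 36 testing

    model = Sequential()
    model.add(Dense(8, input_dim=lookback, activation='relu'))  # "size 8"
    model.add(Dense(1))                                         # "size 1"
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X_train, y_train, epochs=200, batch_size=2, verbose=0)
    print('Test MSE:', model.evaluate(X_test, y_test, verbose=0))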
  23. Recurrent Neural Networks¹ • Has 3 types of parameters ▫ W – input-to-hidden weights ▫ U – hidden-to-hidden weights ▫ V – hidden-to-label weights • W, U and V are all shared across time steps 1. http://ai.stanford.edu/~quocle/tutorial2.pdf
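
In this notation, the recurrence the network computes at each time step is the standard one (my formulation, consistent with the tutorial cited above, not copied from the slide):

    h_t = f(W x_t + U h_{t-1})
    y_t = g(V h_t)

where x_t is the input at step t, h_t the hidden state, and f and g are nonlinearities (for example tanh and softmax); because W, U and V are reused at every step, only the hidden state changes over time.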
  24. Where can Recurrent Neural Networks be used?¹ 1. Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification). 2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words). 3. Sequence input (e.g. sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment). 4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French). 5. Synced sequence input and output (e.g. video classification, where we wish to label each frame of the video). 1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
  25. Sample applications • Andrej Karpathy’s article ▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/ • Handwriting generation demo ▫ http://www.cs.toronto.edu/~graves/handwriting.html
  26. Recurrent Neural Networks • A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor.¹ • Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs because of the loops in the network. 1. http://colah.github.io/posts/2015-08-Understanding-LSTMs/
  27. Back Propagation Through Time (BPTT)¹ • BPTT begins by unfolding the recurrent neural network through time, so that each time step becomes one layer of an ordinary feed-forward network. • Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order. 1. https://en.wikipedia.org/wiki/Backpropagation_through_time
  28. Addressing the problem of vanishing/exploding gradients • Backpropagation through time (BPTT) for RNNs is difficult because of the vanishing/exploding gradient problem: the gradient becomes extremely small or extremely large toward the earliest time steps of the unrolled network. • LSTM RNNs address this: instead of plain neurons, LSTMs use memory cells.¹ 1. http://deeplearning.net/tutorial/lstm.html
  29. Demo – IMDB Dataset • Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). • Reviews have been preprocessed, and each review is encoded as a sequence of word indices (integers). • For convenience, words are indexed by overall frequency in the dataset, so that, for instance, the integer "3" encodes the 3rd most frequent word in the data. • The 2011 paper (linked below) reported approximately 88% accuracy. • See ▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py ▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/ ▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
  30. Network • The 5,000 most frequent words are kept, and each is mapped to a 32-dimensional embedding vector • Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded • LSTM layer with 100 output dimensions • Accuracy: 84.08%
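
A hedged sketch of this network, closely following the machinelearningmastery.com example linked on the previous slide (Keras 2 argument names; the epoch count and batch size are assumptions):

    from keras.datasets import imdb
    from keras.preprocessing import sequence
    from keras.models import Sequential
    from keras.layers import Embedding, LSTM, Dense

    top_words, max_len = 5000, 500
    (X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=top_words)
    # Truncate reviews longer than 500 words; zero-pad shorter ones
    X_train = sequence.pad_sequences(X_train, maxlen=max_len)
    X_test = sequence.pad_sequences(X_test, maxlen=max_len)

    model = Sequential()
    model.add(Embedding(top_words, 32, input_length=max_len))  # 32-dim vectors
    model.add(LSTM(100))                                       # 100 output dims
    model.add(Dense(1, activation='sigmoid'))                  # pos/neg label
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(X_train, y_train, epochs=3, batch_size=64)
    print(model.evaluate(X_test, y_test, verbose=0))  # [loss, accuracy]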
  31. Using RNNs for the CIF forecasting problem • Use 72 observations for training and 36 for testing • Lookback of 1 (Figure: line plot of the monthly CIF series)
  32. Result • Lookback = 1: Train Score: 50.54 RMSE, Test Score: 65.34 RMSE • Lookback = 10: Train Score: 41.65 RMSE, Test Score: 90.68 RMSE
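
A sketch of the LSTM variant of the earlier MLP demo, again reusing `series` and the create_dataset helper; the LSTM layer size and training settings here are my assumptions:

    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    lookback = 1
    X_train, y_train = create_dataset(series[:72], lookback)
    X_test, y_test = create_dataset(series[72:], lookback)
    # Keras LSTMs expect input shaped [samples, timesteps, features]
    X_train = X_train.reshape((X_train.shape[0], lookback, 1))
    X_test = X_test.reshape((X_test.shape[0], lookback, 1))

    model = Sequential()
    model.add(LSTM(4, input_shape=(lookback, 1)))  # small LSTM, assumed size
    model.add(Dense(1))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(X_train, y_train, epochs=100, batch_size=1, verbose=0)
    print('Test MSE:', model.evaluate(X_test, y_test, verbose=0))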
  33. Approach using Microsoft’s Cognitive Toolkit ▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2 ▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/
  34. Q&A
  35. Thank you! Members & Sponsors! • Contact: Sri Krishnamurthy, CFA, CAP, Founder and CEO, QuantUniversity LLC. • srikrishnamurthy • www.QuantUniversity.com • Disclaimer: Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
