Location:
QuantUniversity Meetup
January 19th, 2017
Boston, MA
Deep Learning: An Introduction
Part II
Copyright 2016 QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and Code will be available at:
http://www.analyticscertificate.com/DeepLearning
- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits
• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup, and
Endeca, and with 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Launched the Analytics Certificate
Program in September
▫ New Cohort in March 2017
• Coming soon: Deep Learning and
Cognitive computing Certificate!
• February 2017
▫ Apache Spark Lecture – Feb 3rd
• March 2017
▫ Deep Learning Workshop – Boston – March 27-28
▫ Deep Learning Workshop – New York (Date TBD)
• April 2017
▫ Anomaly Detection Workshop – Boston – April 24-25
Events of Interest
• Neural Networks 101
• Multi-Layer Perceptron
• Convolutional Neural Networks
Recap
• AutoEncoders
• Recurrent Neural Networks
▫ LSTM
Agenda for today
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA
Machine Learning
(Diagram: Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1)
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a
given dataset:
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
(Diagram: x1, x2, x3, … → Model F(X) → y)
• Motivation1:
Autoencoders
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/
• The goal is for the reconstruction 𝑥̂ to approximate the input x
• Interesting applications include
▫ Data compression
▫ Visualization
▫ Pre-training neural networks
Autoencoder
Demo in Keras1
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
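Below is a minimal single-layer autoencoder in the spirit of the Keras blog post cited above; the 784-dimensional input and 32-dimensional encoding are assumptions borrowed from that post's MNIST example, and the Keras 1.x-era API current at the time of this talk is assumed throughout.

```python
# Minimal autoencoder sketch (assumptions: 784-d input, 32-d code,
# Keras 1.x-style API, following the cited Keras blog post).
from keras.layers import Input, Dense
from keras.models import Model

input_dim, encoding_dim = 784, 32

x = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(x)        # compress
decoded = Dense(input_dim, activation='sigmoid')(encoded)  # reconstruct

autoencoder = Model(x, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# Train the network to reproduce its own input (target = input):
# autoencoder.fit(x_train, x_train, batch_size=256)
```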
• Pretraining step: Train a sequence of shallow autoencoders, greedily
one layer at a time, using unsupervised data.
• Fine-tuning step 1: Train the last layer using supervised data
• Fine-tuning step 2: Use backpropagation to fine-tune the entire
network using supervised data
Autoencoders1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
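A hedged sketch of those three steps using Keras Dense layers follows; the layer sizes (784 → 128 → 10), the softmax output, and the data variables (x_unlabeled, x_labeled, y_labeled) are illustrative assumptions, not taken from the slide.

```python
# Sketch of greedy layer-wise pretraining plus fine-tuning.
# Sizes and data variables are assumptions for illustration.
from keras.layers import Input, Dense
from keras.models import Model, Sequential

# Pretraining: fit a shallow autoencoder for the first hidden layer on
# unlabeled data (repeat on the encoded output for deeper layers).
x = Input(shape=(784,))
h = Dense(128, activation='relu', name='enc1')(x)
ae1 = Model(x, Dense(784, activation='sigmoid')(h))
ae1.compile(optimizer='rmsprop', loss='mse')
# ae1.fit(x_unlabeled, x_unlabeled, ...)

# Fine-tuning step 1: copy the pretrained weights, freeze them,
# and train only the supervised output layer.
clf = Sequential()
clf.add(Dense(128, activation='relu', input_dim=784))
clf.layers[0].set_weights(ae1.get_layer('enc1').get_weights())
clf.layers[0].trainable = False
clf.add(Dense(10, activation='softmax'))
clf.compile(optimizer='sgd', loss='categorical_crossentropy')
# clf.fit(x_labeled, y_labeled, ...)

# Fine-tuning step 2: unfreeze everything and backpropagate end to end.
clf.layers[0].trainable = True
clf.compile(optimizer='sgd', loss='categorical_crossentropy')  # recompile after changing trainable
# clf.fit(x_labeled, y_labeled, ...)
```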
Supervised learning
Cross-sectional
▫ Observations are independent
▫ Given X1, …, Xi, predict Y
▫ Example: CNNs
Supervised learning
Sequential
▫ Observations are sequentially ordered
▫ Given O1, …, OT, predict OT+1
▫ Example sequence of labeled observations: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal
• Given: X1, X2, X3, …, XN
• Convert the univariate time series into a cross-sectional dataset
Time series modeling in Keras using MLPs
Original series: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15

With a lookback of 1, each value becomes the predictor of the next:

X     Y
X1    X2
X2    X3
X3    X4
X4    X5
X5    X6
X6    X7
X7    X8
X8    X9
X9    X10
X10   X11
X11   X12
X12   X13
X13   X14
X14   X15
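The conversion in the table above can be written as a small helper; this sketch follows the style of the machinelearningmastery.com tutorial cited later in this deck.

```python
import numpy as np

def create_dataset(series, look_back=1):
    """Convert a univariate series into (X, Y) rows: X holds the
    previous `look_back` values and Y the value that follows."""
    X, Y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        Y.append(series[i + look_back])
    return np.array(X), np.array(Y)

# With look_back=1: [x1, x2, x3, x4] -> X=[[x1],[x2],[x3]], Y=[x2, x3, x4]
```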
• Monthly data
• Computational Intelligence in Forecasting (CIF) competition
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download
Sample data
(Figure: line plot of the monthly sample series, observations 1–108, values ranging from 0 to about 1,800.)
• Keras is a high-level neural networks library, written in Python and
capable of running on top of either TensorFlow or Theano. It was
developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity,
minimalism, and extensibility).
• Supports both convolutional networks and recurrent networks, as
well as combinations of the two.
• Supports arbitrary connectivity schemes (including multi-input and
multi-output training).
• Runs seamlessly on CPU and GPU.
Keras
• Use 72 observations for training and 36 for testing
• Lookback of 1 and 10
• The longer the lookback, the larger the network
Multi-Layer Perceptron
(Diagram: hidden layer of size 8, output layer of size 1)
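A minimal sketch of the network just described (hidden layer of size 8, linear output of size 1); the relu activation, optimizer, and training settings are assumptions rather than slide content.

```python
from keras.models import Sequential
from keras.layers import Dense

look_back = 1  # or 10; the input width grows with the lookback

model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                          # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
# train_X, train_Y = create_dataset(train, look_back)  # helper sketched earlier
# model.fit(train_X, train_Y, batch_size=2, verbose=0)
```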
Demo
Lookback = 1: Train Score: 1972.20 MSE (44.41 RMSE); Test Score: 3001.77 MSE (54.79 RMSE)
Lookback = 10: Train Score: 2631.49 MSE (51.30 RMSE); Test Score: 4166.64 MSE (64.55 RMSE)
• Has 3 types of parameters
▫ W – input-to-hidden weights
▫ U – hidden-to-hidden weights
▫ V – hidden-to-label weights
• W, U, and V are shared across all time steps
Recurrent Neural Networks1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
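Concretely, the shared parameters enter the standard RNN recurrence below; the tanh and softmax activations are common choices assumed here, not taken from the slide:

h_t = tanh(W x_t + U h_{t-1})        (hidden state update)
ŷ_t = softmax(V h_t)                 (label prediction)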
Where can Recurrent Neural Networks be used?1
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image
classification).
2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
3. Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive
or negative sentiment).
4. Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in
English and then outputs a sentence in French).
5. Synced sequence input and output (e.g. video classification where we wish to label each frame of
the video).
• Andrej Karpathy’s article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Handwriting generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html
Sample applications
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of
the same network, each passing a message to a successor. 1
• Backpropagation (computing the gradient with respect to all parameters
of the network), the procedure used to propagate errors and update
weights, needs to be modified for RNNs due to the loops in the network
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• BPTT begins by unfolding a recurrent neural network through time
as shown in the figure.
• Training then proceeds in a manner similar to training a feed-forward
neural network with backpropagation, except that the training patterns
are visited in sequential order.
Back Propagation through time (BPTT)1
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
• Backpropagation through time (BPTT) for RNNs is difficult due to the
vanishing/exploding gradient problem: the gradient becomes extremely
small or extremely large toward the earliest time steps of the unrolled
network.
• LSTM RNNs address this: instead of simple neurons, LSTMs use
memory cells1
Addressing the problem of Vanishing/Exploding gradient
http://deeplearning.net/tutorial/lstm.html
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment
(positive/negative).
• Reviews have been preprocessed, and each review is encoded as a sequence of
word indexes (integers).
• For convenience, words are indexed by overall frequency in the dataset, so that
for instance the integer "3" encodes the 3rd most frequent word in the data.
• The 2011 paper (linked below) reported approximately 88% accuracy
• See
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-
networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
Demo – IMDB Dataset
Network
The 5,000 most frequent words are chosen and each is mapped to a 32-dimensional vector
Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
LSTM layer with 100 output dimensions
Accuracy: 84.08%
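A sketch matching the network just described; it follows the Keras imdb_lstm example and the machinelearningmastery.com tutorial cited on the previous slide (the argument names are the Keras 1.x ones current at the time of this talk).

```python
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
# Keras 1.x argument name; newer releases call it num_words
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)  # truncate >500, pad <500
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-d word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=64)
```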
• Use 72 observations for training and 36 for testing
• Lookback of 1
Using RNNs for the CIF forecasting problem
(Figure: line plot of the CIF monthly series, observations 1–108, values ranging from 0 to about 1,800.)
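A hedged sketch of an LSTM applied to the same lookback-1 dataset; the 4-unit layer size follows the machinelearningmastery.com LSTM time-series tutorial and, like the `train` array, is an assumption rather than something stated on the slide.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

look_back = 1
# train: the 72 training observations; create_dataset is sketched earlier
train_X, train_Y = create_dataset(train, look_back)
# LSTM layers expect input shaped [samples, time steps, features]
train_X = np.reshape(train_X, (train_X.shape[0], look_back, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))  # small recurrent layer
model.add(Dense(1))                             # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(train_X, train_Y, batch_size=1, verbose=0)
```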
Result
Lookback = 1: Train Score: 50.54 RMSE; Test Score: 65.34 RMSE
Lookback = 10: Train Score: 41.65 RMSE; Test Score: 90.68 RMSE
• Approach using Microsoft’s Cognitive Toolkit
▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/
Q&A
Thank you!
Members & Sponsors!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
