Location:
QuantUniversity Meetup
January 19th, 2017
Boston, MA
Deep Learning: An Introduction
Part II
Copyright 2016 QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
www.QuantUniversity.com
sri@quantuniversity.com
Slides and Code will be available at:
http://www.analyticscertificate.com/DeepLearning
- Analytics Advisory services
- Custom training programs
- Architecture assessments, advice and audits
• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup, and
Endeca, and with 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
Quantitative Analytics and Big Data Analytics Onboarding
• Trained more than 500 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Launched the Analytics Certificate
Program in September
▫ New Cohort in March 2017
• Coming soon: Deep Learning and
Cognitive computing Certificate!
• February 2017
▫ Apache Spark Lecture – Feb 3rd
• March 2017
▫ Deep Learning Workshop – Boston – March 27-28
▫ Deep Learning Workshop – New York (Date TBD)
• April 2017
▫ Anomaly Detection Workshop – Boston – April 24-25
Events of Interest
• Neural Networks 101
• Multi-Layer Perceptron
• Convolutional Neural Networks
Recap
• AutoEncoders
• Recurrent Neural Networks
▫ LSTM
Agenda for today
• Unsupervised Algorithms
▫ Given a dataset with variables 𝑥𝑖, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering, etc.
▫ Create a transformed representation of the original data => PCA
Machine Learning
(Diagram: Obs1, Obs2, Obs3, … → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1)
• Supervised Algorithms
▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a
given dataset:
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
(Diagram: x1, x2, x3, … → Model F(X) → y)
• Motivation1:
Autoencoders
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
https://blog.google/products/google-plus/saving-you-bandwidth-through-machine-learning/
• The goal is for the reconstruction 𝑥̂ to approximate the input x
• Interesting applications include
▫ Data compression
▫ Visualization
▫ Pre-training neural networks
Autoencoder
Demo in Keras1
1. https://blog.keras.io/building-autoencoders-in-keras.html
2. https://keras.io/models/model/
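Below is a minimal single-layer autoencoder in the spirit of the Keras blog post cited above; the 784-dimensional input and 32-dimensional encoding are assumptions borrowed from that post's MNIST example, and the Keras 1.x-era API current at the time of this talk is assumed throughout.

```python
# Minimal autoencoder sketch (assumptions: 784-d input, 32-d code,
# Keras 1.x-style API, following the cited Keras blog post).
from keras.layers import Input, Dense
from keras.models import Model

input_dim, encoding_dim = 784, 32

x = Input(shape=(input_dim,))
encoded = Dense(encoding_dim, activation='relu')(x)        # compress
decoded = Dense(input_dim, activation='sigmoid')(encoded)  # reconstruct

autoencoder = Model(x, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
# Train the network to reproduce its own input (target = input):
# autoencoder.fit(x_train, x_train, batch_size=256)
```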
• Pretraining step: Train a sequence of shallow autoencoders, greedily
one layer at a time, using unsupervised data.
• Fine-tuning step 1: Train the last layer using supervised data
• Fine-tuning step 2: Use backpropagation to fine-tune the entire
network using supervised data
Autoencoders1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
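A hedged sketch of those three steps using Keras Dense layers follows; the layer sizes (784 → 128 → 10), the softmax output, and the data variables (x_unlabeled, x_labeled, y_labeled) are illustrative assumptions, not taken from the slide.

```python
# Sketch of greedy layer-wise pretraining plus fine-tuning.
# Sizes and data variables are assumptions for illustration.
from keras.layers import Input, Dense
from keras.models import Model, Sequential

# Pretraining: fit a shallow autoencoder for the first hidden layer on
# unlabeled data (repeat on the encoded output for deeper layers).
x = Input(shape=(784,))
h = Dense(128, activation='relu', name='enc1')(x)
ae1 = Model(x, Dense(784, activation='sigmoid')(h))
ae1.compile(optimizer='rmsprop', loss='mse')
# ae1.fit(x_unlabeled, x_unlabeled, ...)

# Fine-tuning step 1: copy the pretrained weights, freeze them,
# and train only the supervised output layer.
clf = Sequential()
clf.add(Dense(128, activation='relu', input_dim=784))
clf.layers[0].set_weights(ae1.get_layer('enc1').get_weights())
clf.layers[0].trainable = False
clf.add(Dense(10, activation='softmax'))
clf.compile(optimizer='sgd', loss='categorical_crossentropy')
# clf.fit(x_labeled, y_labeled, ...)

# Fine-tuning step 2: unfreeze everything and backpropagate end to end.
clf.layers[0].trainable = True
clf.compile(optimizer='sgd', loss='categorical_crossentropy')  # recompile after changing trainable
# clf.fit(x_labeled, y_labeled, ...)
```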
Supervised learning
Cross-sectional
▫ Observations are independent
▫ Given X1, …, Xi, predict Y
▫ Example: CNNs
Supervised learning
Sequential
▫ Observations are sequentially ordered
▫ Given O1, …, OT, predict OT+1
▫ Example sequence of labeled observations: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal
• Given: X1, X2, X3, …, XN
• Convert the univariate time series into a cross-sectional dataset
Time series modeling in Keras using MLPs
Original series: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15

With a lookback of 1, each value becomes the predictor of the next:

X     Y
X1    X2
X2    X3
X3    X4
X4    X5
X5    X6
X6    X7
X7    X8
X8    X9
X9    X10
X10   X11
X11   X12
X12   X13
X13   X14
X14   X15
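The conversion in the table above can be written as a small helper; this sketch follows the style of the machinelearningmastery.com tutorial cited later in this deck.

```python
import numpy as np

def create_dataset(series, look_back=1):
    """Convert a univariate series into (X, Y) rows: X holds the
    previous `look_back` values and Y the value that follows."""
    X, Y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        Y.append(series[i + look_back])
    return np.array(X), np.array(Y)

# With look_back=1: [x1, x2, x3, x4] -> X=[[x1],[x2],[x3]], Y=[x2, x3, x4]
```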
• Monthly data
• Computational Intelligence in Forecasting (CIF) competition
• Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download
Sample data
(Figure: line plot of the monthly sample series, observations 1–108, values ranging from 0 to about 1,800.)
• Keras is a high-level neural networks library, written in Python and
capable of running on top of either TensorFlow or Theano. It was
developed with a focus on enabling fast experimentation.
• Allows for easy and fast prototyping (through total modularity,
minimalism, and extensibility).
• Supports both convolutional networks and recurrent networks, as
well as combinations of the two.
• Supports arbitrary connectivity schemes (including multi-input and
multi-output training).
• Runs seamlessly on CPU and GPU.
Keras
• Use 72 observations for training and 36 for testing
• Lookback of 1 and 10
• The longer the lookback, the larger the network
Multi-Layer Perceptron
(Diagram: hidden layer of size 8, output layer of size 1)
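A minimal sketch of the network just described (hidden layer of size 8, linear output of size 1); the relu activation, optimizer, and training settings are assumptions rather than slide content.

```python
from keras.models import Sequential
from keras.layers import Dense

look_back = 1  # or 10; the input width grows with the lookback

model = Sequential()
model.add(Dense(8, input_dim=look_back, activation='relu'))  # hidden layer, size 8
model.add(Dense(1))                                          # linear output, size 1
model.compile(loss='mean_squared_error', optimizer='adam')
# train_X, train_Y = create_dataset(train, look_back)  # helper sketched earlier
# model.fit(train_X, train_Y, batch_size=2, verbose=0)
```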
Demo
Lookback = 1: Train Score: 1972.20 MSE (44.41 RMSE); Test Score: 3001.77 MSE (54.79 RMSE)
Lookback = 10: Train Score: 2631.49 MSE (51.30 RMSE); Test Score: 4166.64 MSE (64.55 RMSE)
• Has 3 types of parameters
▫ W – input-to-hidden weights
▫ U – hidden-to-hidden weights
▫ V – hidden-to-label weights
• W, U, and V are shared across all time steps
Recurrent Neural Networks1
1. http://ai.stanford.edu/~quocle/tutorial2.pdf
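Concretely, the shared parameters enter the standard RNN recurrence below; the tanh and softmax activations are common choices assumed here, not taken from the slide:

h_t = tanh(W x_t + U h_{t-1})        (hidden state update)
ŷ_t = softmax(V h_t)                 (label prediction)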
Where can Recurrent Neural Networks be used?1
1. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image
classification).
2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
3. Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive
or negative sentiment).
4. Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in
English and then outputs a sentence in French).
5. Synced sequence input and output (e.g. video classification where we wish to label each frame of
the video).
• Andrej Karpathy’s article
▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Handwriting generation demo
▫ http://www.cs.toronto.edu/~graves/handwriting.html
Sample applications
Recurrent Neural Networks
• A recurrent neural network can be thought of as multiple copies of
the same network, each passing a message to a successor. 1
• Backpropagation (computing the gradient with respect to all parameters
of the network), the procedure used to propagate errors and update
weights, needs to be modified for RNNs due to the loops in the network
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• BPTT begins by unfolding a recurrent neural network through time
as shown in the figure.
• Training then proceeds in a manner similar to training a feed-forward
neural network with backpropagation, except that the training patterns
are visited in sequential order.
Back Propagation through time (BPTT)1
1. https://en.wikipedia.org/wiki/Backpropagation_through_time
• Backpropagation through time (BPTT) for RNNs is difficult due to the
vanishing/exploding gradient problem: the gradient becomes extremely
small or extremely large toward the earliest time steps of the unrolled
network.
• LSTM RNNs address this: instead of simple neurons, LSTMs use
memory cells1
Addressing the problem of Vanishing/Exploding gradient
http://deeplearning.net/tutorial/lstm.html
• Dataset of 25,000 movie reviews from IMDB, labeled by sentiment
(positive/negative).
• Reviews have been preprocessed, and each review is encoded as a sequence of
word indexes (integers).
• For convenience, words are indexed by overall frequency in the dataset, so that
for instance the integer "3" encodes the 3rd most frequent word in the data.
• The 2011 paper (linked below) reported approximately 88% accuracy
• See
▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-
networks-python-keras/
▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
Demo – IMDB Dataset
Network
The 5,000 most frequent words are chosen and each is mapped to a 32-dimensional vector
Sequences are restricted to 500 words: longer reviews are truncated, shorter ones are padded
LSTM layer with 100 output dimensions
Accuracy: 84.08%
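A sketch matching the network just described; it follows the Keras imdb_lstm example and the machinelearningmastery.com tutorial cited on the previous slide (the argument names are the Keras 1.x ones current at the time of this talk).

```python
from keras.datasets import imdb
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

top_words, max_len = 5000, 500
# Keras 1.x argument name; newer releases call it num_words
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=top_words)
X_train = sequence.pad_sequences(X_train, maxlen=max_len)  # truncate >500, pad <500
X_test = sequence.pad_sequences(X_test, maxlen=max_len)

model = Sequential()
model.add(Embedding(top_words, 32, input_length=max_len))  # 32-d word vectors
model.add(LSTM(100))                                       # 100 output dimensions
model.add(Dense(1, activation='sigmoid'))                  # positive/negative
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=64)
```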
• Use 72 observations for training and 36 for testing
• Lookback of 1
Using RNNs for the CIF forecasting problem
(Figure: line plot of the CIF monthly series, observations 1–108, values ranging from 0 to about 1,800.)
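A hedged sketch of an LSTM applied to the same lookback-1 dataset; the 4-unit layer size follows the machinelearningmastery.com LSTM time-series tutorial and, like the `train` array, is an assumption rather than something stated on the slide.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

look_back = 1
# train: the 72 training observations; create_dataset is sketched earlier
train_X, train_Y = create_dataset(train, look_back)
# LSTM layers expect input shaped [samples, time steps, features]
train_X = np.reshape(train_X, (train_X.shape[0], look_back, 1))

model = Sequential()
model.add(LSTM(4, input_shape=(look_back, 1)))  # small recurrent layer
model.add(Dense(1))                             # one-step-ahead forecast
model.compile(loss='mean_squared_error', optimizer='adam')
# model.fit(train_X, train_Y, batch_size=1, verbose=0)
```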
Result
Lookback = 1: Train Score: 50.54 RMSE; Test Score: 65.34 RMSE
Lookback = 10: Train Score: 41.65 RMSE; Test Score: 90.68 RMSE
• Approach using Microsoft’s Cognitive Toolkit
▫ https://gallery.cortanaintelligence.com/Tutorial/Forecasting-Short-Time-Series-with-LSTM-Neural-Networks-2
▫ https://www.microsoft.com/en-us/research/product/cognitive-toolkit/model-gallery/
Q&A
Thank you!
Members & Sponsors!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
