Deep learning with Keras

Interest in Deep Learning has been growing in the past few years. With advances in software and hardware technologies, Neural Networks are making a resurgence. With interest in AI based applications growing, and companies like IBM, Google, Microsoft, NVidia investing heavily in computing and software applications, it is time to understand Deep Learning better!

In this workshop, we will discuss the basics of Neural Networks and discuss how Deep Learning Neural networks are different from conventional Neural Network architectures. We will review a bit of mathematics that goes into building neural networks and understand the role of GPUs in Deep Learning. We will also get an introduction to Autoencoders, Convolutional Neural Networks, Recurrent Neural Networks and understand the state-of-the-art in hardware and software architectures. Functional Demos will be presented in Keras, a popular Python package with a backend in Theano and Tensorflow.

Published in: Data & Analytics

  1. 1. Location: ODSC 2017 5/4/2017 Deep Learning with Keras 2016 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.analyticscertificate.com
  2. 2. 2 Slides and Code will be available at: http://www.analyticscertificate.com/ODSC2017
  3. 3. - Analytics Advisory services - Custom training & certificate programs - Fintech and Energy Analytics and Infrastructure
  4. 4. Sri Krishnamurthy, Founder and CEO
        • Founder of QuantUniversity LLC. and www.analyticscertificate.com
        • Advisory and consultancy for financial analytics
        • Prior experience at MathWorks, Citigroup and Endeca, and with 25+ financial services and energy customers
        • Regular columnist for Wilmott Magazine
        • Author of the forthcoming book “Financial Modeling: A case study approach”, published by Wiley
        • Chartered Financial Analyst and Certified Analytics Professional
        • Teaches analytics in the Babson College MBA program and at Northeastern University, Boston
  5. 5. 5 Quantitative Analytics and Big Data Analytics Onboarding • Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Launching the Analytics Certificate Program in Summer and a Fintech Certification program in Fall
  6. 6. Events of Interest
        • May 2017
          ▫ Sponsoring the CFA Fintech Conference in Boston
          ▫ QuantUniversity Chicago Meetup
            ▪ Deep Learning – May 18th - https://www.meetup.com/QuantUniversity-Meetup-Chicago
          ▫ Deep Learning Workshop – May 30th, 31st
            ▪ Chicago & Online: http://www.analyticscertificate.com/DeepLearning
        • June 2017
          ▫ Machine Learning Workshop – June 8th, 9th
            ▪ New York & Online: http://www.analyticscertificate.com/MachineLearning
          ▫ Anomaly Detection Workshop – June 18th, 19th
            ▪ Boston & Online: http://www.analyticscertificate.com/Anomaly
  7. 7. Summer 2017: http://www.analyticscertificate.com
  8. 8. 8 • Boston • New York • Chicago • Washington DC (Coming soon) QuantUniversity meetups
  9. 9. 9
  10. 10. Machine Learning
        • Unsupervised Algorithms
          ▫ Given a dataset with variables 𝑥𝑖, build a model that captures the similarities between observations and assigns them to different buckets => Clustering, etc.
          ▫ Create a transformed representation of the original data => PCA
        [Diagram: Obs1, Obs2, Obs3, etc. -> Model -> Obs1: Class 1, Obs2: Class 2, Obs3: Class 1]
  11. 11. Machine Learning
        • Supervised Algorithms
          ▫ Given a set of variables 𝑥𝑖, predict the value of another variable 𝑦 in a given data set
          ▫ If y is numeric => Prediction
          ▫ If y is categorical => Classification
        [Diagram: x1, x2, x3… -> Model F(X) -> y]
  12. 12. 12 Start with labeled pairs (Xi, Yi) ( ,“kitten”),( ,“puppy”) …
  13. 13. 13 Success: predict new examples ( ,?)
  14. 14. 14 https://commons.wikimedia.org/wiki/Neural_network “kitten” “puppy” “has fur?” “pointy ears?” “dangerously cute?”
  15. 15. 15
  16. 16. 16 http://stackoverflow.com/questions/40537503/deep-neural-networks-precision-for-image-recognition-float-or-double
  17. 17. 17 http://stackoverflow.com/questions/40537503/deep-neural-networks-precision-for-image-recognition-float-or-double Weighted sum
  18. 18. 18 http://stackoverflow.com/questions/40537503/deep-neural-networks-precision-for-image-recognition-float-or-double Non-linear “activation” function
  19. 19. 19 http://stackoverflow.com/questions/40537503/deep-neural-networks-precision-for-image-recognition-float-or-double Learning = “find good weights”
  20. 20. 20 http://stackoverflow.com/questions/40537503/deep-neural-networks-precision-for-image-recognition-float-or-double Learning = “find good weights” How? Gradient descent!
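The three ideas on the preceding slides (weighted sum, non-linear activation, learning by gradient descent) can be sketched for a single neuron. This is a minimal illustration and not code from the workshop notebooks: the sigmoid activation, the cross-entropy gradient, the learning rate, and the toy kitten/puppy features are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Non-linear activation that squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Weighted sum of the inputs followed by the activation function."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def train(samples, learning_rate=0.5, epochs=2000):
    """Find good weights with per-sample gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            # For a sigmoid output with cross-entropy loss, the gradient
            # with respect to the weighted sum is (prediction - label).
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical toy data: (has fur?, pointy ears?) -> 1 = kitten, 0 = puppy
TOY_SAMPLES = [([1.0, 1.0], 1), ([1.0, 0.0], 0)]
weights, bias = train(TOY_SAMPLES)
kitten_score = predict(weights, bias, [1.0, 1.0])
puppy_score = predict(weights, bias, [1.0, 0.0])
```

After training, `kitten_score` should be close to 1 and `puppy_score` close to 0, because the second feature alone separates the two toy classes.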
  21. 21. Neural nets were tried in the 1980s. What changed?
        1. Our labeled datasets were thousands of times too small.
        2. Our computers were millions of times too slow.
        3. We initialized the weights in a stupid way.
        4. We used the wrong type of non-linearity.
        - Geoff Hinton, https://youtu.be/IcOMKXAw5VA?t=21m29s
  22. 22. 23 http://www.rsipvision.com/exploring-deep-learning/
  23. 23. 24 http://www.asimovinstitute.org/neural-network-zoo/
  24. 24. 25
  25. 25. 26 https://research.googleblog.com/2014/09/building-deeper-understanding-of-images.html
  26. 26. 27 https://research.googleblog.com/2014/09/building-deeper-understanding-of-images.html
  27. 27. 28 Computer-Aided Diagnosis with Deep Learning Architecture: Applications to Breast Lesions in US Images and Pulmonary Nodules in CT Scans http://www.nature.com/articles/srep24454/figures/1
  28. 28. 29 Towards End-to-End Speech Recognition with Recurrent Neural Networks http://www.jmlr.org/proceedings/papers/v32/graves14.pdf
  29. 29. 30 https://www.technologyreview.com/s/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/
  30. 30. 31 https://research.googleblog.com/2014/11/a-picture-is-worth-thousand-coherent.html
  31. 31. 32 http://cs.umd.edu/~miyyer/data/deepqa.pdf
  32. 32. 33
  33. 33. 34 http://blog.ventureradar.com/2016/03/11/10-hot-startups-using-artificial-intelligence-in-cyber-security/
  34. 34. 35 https://www.youtube.com/watch?v=H4V6NZLNu-c
  35. 35. 36 https://www.engadget.com/2016/03/12/watch-alphago-vs-lee-sedol-round-3-live-right-now/
  36. 36. 37 https://www.youtube.com/watch?v=kMMbW96nMW8
  37. 37. 38
  38. 38. 39 How is deep learning special? Given (lots of) data, DNNs learn useful input representations. D. Erhan et al. ‘09 http://www.iro.umontreal.ca/~lisa/publications2/index.php/publications/show/247
  39. 39. 40
  40. 40. 41 Hardware
  41. 41. 42 Data http://www.theneweconomy.com/strategy/big-data-is-not-without-its-problems
  42. 42. 43 New Approaches http://deeplearning.net/reading-list/
  43. 43. 44
  44. 44. Theano
        • Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
        • Performs efficient symbolic differentiation
        • Leverages NVIDIA GPUs (claimed to be up to 140x faster than CPU)
        • Developed by University of Montreal researchers and is open source
        • Works on Windows/Linux/Mac OS
        • See https://arxiv.org/abs/1605.02688
  45. 45. Demo
        • GPU vs CPU
          ▫ Theano Test
          ▫ See Theano Test.ipynb
  46. 46. Theano
        • Logistic Regression
        See Theano-Logistic Regression.ipynb
  47. 47. 48 MLP
  48. 48. 49 Convolutional Neural Networks Convolution
  49. 49. 50 Max pooling
  50. 50. 51 Convolutional Neural Networks See Theano-Conv-Net.ipynb
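The convolution and max pooling slides name the two core operations of a convolutional network. The sketch below shows both on a tiny grid in plain Python. It is an illustration only and is not taken from Theano-Conv-Net.ipynb: the stride of 1, the absence of padding, the 2x2 pooling window, and the example values are assumptions. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_rows) for dj in range(k_cols))
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        pooled.append(row)
    return pooled


# Hypothetical 5x5 input and 2x2 kernel, chosen only for illustration.
IMAGE = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 0],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
KERNEL = [[1, 0], [0, 1]]

feature_map = conv2d_valid(IMAGE, KERNEL)  # 4x4 feature map
pooled = max_pool2d(feature_map)           # 2x2 map after pooling
```

The 5x5 input shrinks to 4x4 after the 2x2 convolution and to 2x2 after pooling, which is the downsampling effect that pooling layers provide.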
  51. 51. Keras
        • Keras is a high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation.
        • Allows for easy and fast prototyping (through total modularity, minimalism, and extensibility).
        • Supports both convolutional networks and recurrent networks, as well as combinations of the two.
        • Supports arbitrary connectivity schemes (including multi-input and multi-output training).
        • Runs seamlessly on CPU and GPU.
  52. 52. Demo
        • Keras Examples
          ▫ Testing Keras: See KerasPython.ipynb
          ▫ Mlp-1 layer
          ▫ Running Convolutional NN on Keras with a Theano Backend
            ▪ See Keras-conv-example-mnist.ipynb
  53. 53. 54
  54. 54. 55 • Motivation1: Autoencoders 1. http://ai.stanford.edu/~quocle/tutorial2.pdf
  55. 55. Autoencoder
        • The goal is for the reconstruction x̃ to approximate the input x
        • Interesting applications such as
          ▫ Data compression
          ▫ Visualization
          ▫ Pre-training neural networks
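The autoencoder goal stated above (make the reconstruction approximate the input) can be sketched with the smallest possible model: a linear encoder that compresses each 2-number input to a single code, and a linear decoder that expands it back. This is an assumption-laden illustration, not the Keras demo referenced on the next slide: the linear layers, squared-error loss, starting weights, learning rate, and toy data are all chosen for the example.

```python
def reconstruct(x, encoder, decoder):
    """Encode x to a single number, then decode back to the input size."""
    code = sum(w * xi for w, xi in zip(encoder, x))
    return [d * code for d in decoder], code


def mean_loss(data, encoder, decoder):
    """Average squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        x_hat, _ = reconstruct(x, encoder, decoder)
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(data)


def train_autoencoder(data, steps=2000, learning_rate=0.02):
    """Full-batch gradient descent on the reconstruction error."""
    encoder = [0.5, 0.5]  # arbitrary starting weights (assumption)
    decoder = [0.5, 0.5]
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0, 0.0]
        grad_dec = [0.0, 0.0]
        for x in data:
            x_hat, code = reconstruct(x, encoder, decoder)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            back = sum(e * d for e, d in zip(err, decoder))
            for j in range(2):
                grad_dec[j] += 2 * err[j] * code / n
                grad_enc[j] += 2 * back * x[j] / n
        encoder = [w - learning_rate * g for w, g in zip(encoder, grad_enc)]
        decoder = [d - learning_rate * g for d, g in zip(decoder, grad_dec)]
    return encoder, decoder


# Toy data lying on the line x2 = 2 * x1, so one number can describe each point.
DATA = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
initial_loss = mean_loss(DATA, [0.5, 0.5], [0.5, 0.5])
encoder, decoder = train_autoencoder(DATA)
final_loss = mean_loss(DATA, encoder, decoder)
```

Because the toy data has only one degree of freedom, the one-number code is enough for the reconstruction error to shrink toward zero, which is the data-compression use case in miniature.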
  56. 56. 57 Demo in Keras1 1. https://blog.keras.io/building-autoencoders-in-keras.html 2. https://keras.io/models/model/
  57. 57. Supervised learning: Cross-sectional
        ▫ Observations are independent
        ▫ Given X1, ..., Xi, predict Y
        ▫ CNNs
  58. 58. Supervised learning: Sequential
        ▫ Sequentially ordered
        ▫ Given O1, ..., OT, predict OT+1
        Example labeled sequence: 1 Normal, 2 Normal, 3 Abnormal, 4 Normal, 5 Abnormal
  59. 59. Time series modeling in Keras using MLPs
        • Given: X1, X2, X3, ..., XN
        • Convert the univariate time series dataset to a cross-sectional dataset
        Series: X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 X15
        X     Y
        X1    X2
        X2    X3
        X3    X4
        X4    X5
        X5    X6
        X6    X7
        X7    X8
        X8    X9
        X9    X10
        X10   X11
        X11   X12
        X12   X13
        X13   X14
        X14   X15
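The conversion from a univariate series to (X, Y) rows can be sketched as a sliding window. The function name `make_supervised` and the `lookback` parameter name are hypothetical; the demo notebook may structure this differently.

```python
def make_supervised(series, lookback=1):
    """Turn a univariate series into (inputs, targets) rows.

    Each input row holds `lookback` consecutive values and the target
    is the value that immediately follows them.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return inputs, targets


# Stand-in for X1..X15 from the slide.
series = list(range(1, 16))
inputs_1, targets_1 = make_supervised(series, lookback=1)
inputs_10, targets_10 = make_supervised(series, lookback=10)
```

With `lookback=1` the 15-point series yields the 14 (X, Y) pairs shown in the table. A longer lookback produces wider input rows and fewer of them.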
  60. 60. Sample data
        • Monthly data
        • Computational Intelligence in Forecasting
        • Source: http://irafm.osu.cz/cif/main.php?c=Static&page=download
        [Chart: sample series; y-axis from 0 to 1800, x-axis observation index from 1 to 106]
  61. 61. Multi-Layer Perceptron
        • Use 72 observations for training and 36 for testing
        • Lookback: 1, 10
        • The longer the lookback, the larger the network
        [Diagram: network with layers labeled Size 8 and Size 1]
  62. 62. Demo
        Results in slide order, for the panels labeled Lookback = 1 and Lookback = 10:
        Train Score: 1972.20 MSE (44.41 RMSE)
        Test Score: 3001.77 MSE (54.79 RMSE)
        Train Score: 2631.49 MSE (51.30 RMSE)
        Test Score: 4166.64 MSE (64.55 RMSE)
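The scores above report each error twice: RMSE is the square root of MSE, which puts the error back in the units of the series. A short sketch of both metrics, with a check that the reported pairs are consistent; the function names are hypothetical and the sample inputs are made up for the example.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


# (MSE, RMSE) pairs as reported on the slide.
REPORTED = [(1972.20, 44.41), (3001.77, 54.79), (2631.49, 51.30), (4166.64, 64.55)]
max_gap = max(abs(math.sqrt(m) - r) for m, r in REPORTED)

# Made-up values to exercise the two functions.
example_mse = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Every reported RMSE matches the square root of its MSE to within rounding, so `max_gap` stays below 0.01.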
  63. 63. Recurrent Neural Networks (source: http://ai.stanford.edu/~quocle/tutorial2.pdf)
        • Has 3 types of parameters
          ▫ W – Input to hidden weights
          ▫ U – Hidden to hidden weights
          ▫ V – Hidden to label weights
        • W, U and V are all shared across time steps
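The role of the three parameter sets can be sketched as a forward pass that reuses the same W, U and V at every time step. This is a minimal sketch under stated assumptions: the tanh activation, the absence of bias terms, the layer sizes, and the weight values are illustrative and are not taken from the cited tutorial.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_forward(inputs, W, U, V, h0):
    """Run a simple RNN over a sequence, reusing W, U, V at every step.

    W: input-to-hidden, U: hidden-to-hidden, V: hidden-to-label weights.
    """
    hidden = h0
    outputs = []
    for x in inputs:
        pre_activation = [a + b for a, b in zip(matvec(W, x), matvec(U, hidden))]
        hidden = [math.tanh(p) for p in pre_activation]
        outputs.append(matvec(V, hidden))
    return outputs, hidden


# Hypothetical sizes: 1 input feature, 2 hidden units, 1 output.
W = [[0.5], [-0.3]]
U = [[0.1, 0.2], [0.0, 0.1]]
V = [[1.0, -1.0]]
sequence = [[1.0], [0.5], [-1.0]]
outputs, final_hidden = rnn_forward(sequence, W, U, V, h0=[0.0, 0.0])
```

The loop produces one output per time step, and the hidden state carried from step to step is what lets earlier inputs influence later outputs.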
  64. 64. Where can Recurrent Neural Networks be used? (source: http://karpathy.github.io/2015/05/21/rnn-effectiveness/)
        1. Vanilla mode of processing without RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
        2. Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
        3. Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
        4. Sequence input and sequence output (e.g. machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
        5. Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).
  65. 65. 66 • Andrej Karpathy’s article ▫ http://karpathy.github.io/2015/05/21/rnn-effectiveness/ • Hand writing generation demo ▫ http://www.cs.toronto.edu/~graves/handwriting.html Sample applications
  66. 66. Recurrent Neural Networks
        • A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor (source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/).
        • Backpropagation (computing the gradient with respect to all parameters of the network), the process used to propagate errors and update weights, needs to be modified for RNNs because of their loops.
  67. 67. Back Propagation through time (BPTT) (source: https://en.wikipedia.org/wiki/Backpropagation_through_time)
        • BPTT begins by unfolding a recurrent neural network through time as shown in the figure.
        • Training then proceeds in a manner similar to training a feed-forward neural network with backpropagation, except that the training patterns are visited in sequential order.
  68. 68. Addressing the problem of Vanishing/Exploding gradient
        • Backpropagation through time (BPTT) for RNNs is difficult due to a problem known as the vanishing/exploding gradient, i.e., the gradient becomes extremely small or extremely large toward the beginning or end of the network.
        • This is addressed by LSTM RNNs. Instead of neurons, LSTMs use memory cells (source: http://deeplearning.net/tutorial/lstm.html).
  69. 69. Demo – IMDB Dataset
        • Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative).
        • Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers).
        • For convenience, words are indexed by overall frequency in the dataset, so that for instance the integer "3" encodes the 3rd most frequent word in the data.
        • The 2011 paper (see below) had approximately 88% accuracy
        • See
          ▫ https://github.com/fchollet/keras/blob/master/examples/imdb_lstm.py
          ▫ http://machinelearningmastery.com/sequence-classification-lstm-recurrent-neural-networks-python-keras/
          ▫ http://ai.stanford.edu/~amaas/papers/wvSent_acl2011.pdf
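The frequency-rank encoding described above can be sketched in a few lines. This is a simplified illustration of the idea only: it does not reproduce the reserved indexes or offsets that the Keras IMDB loader applies, ties are broken alphabetically as an assumption, and the three toy reviews are made up.

```python
from collections import Counter


def build_index(reviews):
    """Map each word to its frequency rank (1 = most frequent)."""
    counts = Counter(word for review in reviews for word in review.split())
    # Sort by descending count; break ties alphabetically (assumption).
    ranked = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(review, index):
    """Replace each known word with its integer rank."""
    return [index[word] for word in review.split() if word in index]


TOY_REVIEWS = [
    "the movie was great",
    "the movie was bad",
    "the plot was thin",
]
word_index = build_index(TOY_REVIEWS)
encoded = encode("the movie was great", word_index)
```

In this toy corpus "movie" is the 3rd most frequent word, so it is encoded as the integer 3, matching the convention described in the bullet above.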
  70. 70. Network
        • The 5000 most frequent words are kept, and each is mapped to a 32-dimensional vector
        • Sequences are restricted to 500 words: longer reviews are cut off, shorter ones are padded
        • LSTM layer with 100 output dimensions
        • Accuracy: 84.08%
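Two details of this network can be worked through in plain Python: fixing every review to 500 tokens, and counting trainable parameters. The helper below pads and truncates at the front, which mirrors the default behavior of the Keras `pad_sequences` utility. The function name and the pad value of 0 are assumptions. The parameter count assumes an embedding layer, one LSTM layer, and a single-unit dense output layer; the output layer is not stated on the slide.

```python
def pad_or_truncate(sequence, max_len=500, pad_value=0):
    """Force a sequence to exactly max_len items.

    Long sequences keep their last max_len items; short ones are
    left-padded with pad_value.
    """
    if len(sequence) >= max_len:
        return sequence[-max_len:]
    return [pad_value] * (max_len - len(sequence)) + sequence


short_example = pad_or_truncate([1, 2, 3], max_len=5)
long_example = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_len=5)

# Parameter count for the architecture on the slide.
VOCAB_SIZE, EMBED_DIM, LSTM_UNITS = 5000, 32, 100
embedding_params = VOCAB_SIZE * EMBED_DIM
# An LSTM has 4 gates, each with input weights, recurrent weights and a bias.
lstm_params = 4 * (LSTM_UNITS * (EMBED_DIM + LSTM_UNITS) + LSTM_UNITS)
# Assumed single sigmoid output unit: one weight per LSTM unit plus a bias.
dense_params = LSTM_UNITS * 1 + 1
total_params = embedding_params + lstm_params + dense_params
```

Under these assumptions the embedding layer accounts for 160,000 of the 213,301 parameters, so most of the model's capacity sits in the word vectors rather than in the LSTM.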
  71. 71. 72
  72. 72. Key takeaways from modeling Deep Neural Networks
        • Neural networks are resource intensive
          ▫ Typically require huge dedicated hardware (RAM, GPUs)
        • The parameter space is huge: hundreds of thousands of parameters
          ▫ Tuning is important
        • Architecture choice is important:
          ▫ See http://www.asimovinstitute.org/neural-network-zoo/
  73. 73. What is Spark? Lightning-fast cluster computing
        • Apache Spark™ is a fast and general engine for large-scale data processing.
        • Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
  74. 74. Why Spark? Generality
        • Combine SQL, streaming, and complex analytics.
        • Spark powers a stack of high-level tools including:
          1. Spark Streaming: processing real-time data streams
          2. Spark SQL and DataFrames: support for structured data and relational queries
          3. MLlib: built-in machine learning library
          4. GraphX: Spark’s new API for graph processing
  75. 75. Deep Learning + Apache Spark?
        • Investment: Enterprises have invested significantly in Big Data infrastructure
        • GPUs: Require specialized hardware; niche use cases
        • Can enterprises reuse existing infrastructure for deep learning applications?
        • What use cases in deep learning can leverage Apache Spark?
  76. 76. Efforts on using Deep Learning Frameworks with Spark
        • Databricks – Platform for running Spark applications
        • BigDL – Intel’s library for deep learning on existing data frameworks.
        • TensorFlowOnSpark – Yahoo’s Distributed Deep Learning on Big Data Clusters
        • The Rest:
          ▫ SparkNet – AMPLab’s framework for training deep networks in Spark
          ▫ DeepLearning4J – Uses data parallelism to train on separate neural networks
          ▫ DeepDist – Lightning-fast deep learning on Spark via parallel stochastic gradient updates
  77. 77. Databricks (https://databricks.com/blog/2016/12/21/deep-learning-on-databricks.html)
        • Deploying trained models to make predictions on data stored in Spark RDDs or DataFrames
          ▪ Inception model: https://www.tensorflow.org/tutorials/image_recognition
          ▪ Each prediction requires about 4.8 billion operations
          ▪ Parallelizing with Spark helps scale operations
  78. 78. Databricks (https://databricks.com/blog/2016/12/21/deep-learning-on-databricks.html)
        • Distributed model training
          ▪ Use deep learning libraries like TensorFlow to test different model hyperparameters on each worker
          ▪ Task parallelism
  79. 79. Databricks (https://github.com/databricks/tensorframes)
        • TensorFrames
          ▪ Experimental TensorFlow binding for Scala and Apache Spark.
          ▪ TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs.
          ▪ TensorFrames is available as a Spark package.
  80. 80. Intel’s BigDL library (https://www.oreilly.com/ideas/deep-learning-for-apache-spark)
        • BigDL is an open source, distributed deep learning library for Apache Spark that has feature parity with existing popular deep learning frameworks like Torch and Caffe
        • BigDL is a standalone Spark package
  81. 81. Intel’s BigDL library (https://www.oreilly.com/ideas/deep-learning-for-apache-spark)
        • BigDL uses the Intel Math Kernel Library (MKL), a fast math library for Intel and compatible processors, to facilitate multi-threaded programming in each Spark task.
        • The MKL library helps train larger models efficiently across a cluster (using distributed synchronous, mini-batch SGD)
        • Key value proposition:
          ▫ “The typical deep learning pipeline that involves data preprocessing and preparation on a Spark cluster and model training on a server with multiple GPUs, now involves a simple Spark library that runs on the same cluster used for data preparation and storage.”
  82. 82. TensorFlowOnSpark, CaffeOnSpark – Yahoo’s Distributed Deep Learning
        • Existing DL frameworks often require setting up separate clusters for deep learning, forcing us to create multiple programs for a machine learning pipeline
        https://github.com/yahoo/TensorFlowOnSpark
        http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-tensorflowonspark-distributed-deep
  83. 83. TensorFlowOnSpark, CaffeOnSpark – Yahoo’s Distributed Deep Learning
        • TensorFlowOnSpark supports all types of TensorFlow programs, enabling both asynchronous and synchronous training and inferencing. It supports model parallelism and data parallelism.
        https://github.com/yahoo/TensorFlowOnSpark
        http://yahoohadoop.tumblr.com/post/157196317141/open-sourcing-tensorflowonspark-distributed-deep
  84. 84. SparkNet (https://arxiv.org/pdf/1511.06051v1.pdf)
        • Developed at UC Berkeley’s AMPLab
        • SparkNet is built on top of Spark and Caffe.
        • Not much activity in the last year: https://github.com/amplab/SparkNet
        • SparkNet's parallelized stochastic gradient descent (SGD) algorithm requires minimal communication between nodes
  85. 85. DeepLearning4J (https://deeplearning4j.org/spark.html#how)
        • Deeplearning4j (DL4J) leverages Spark clusters for fast, distributed, in-memory training of DL models that were developed in Scala or Java
        • A centralized DL model iteratively averages the parameters produced by separate neural nets.
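The parameter-averaging idea in the last bullet can be sketched without Spark or DL4J. In this illustration each "worker" is a plain function call that trains a copy of the model on its own shard, and a central step averages the results. The linear model, the data, the learning rate, and all function names are assumptions made for the example; this is not DL4J's API.

```python
def local_sgd(params, shard, learning_rate=0.05, passes=20):
    """One worker: refine a copy of the model on its own data shard."""
    w, b = params
    for _ in range(passes):
        for x, y in shard:
            error = (w * x + b) - y
            w -= learning_rate * error * x
            b -= learning_rate * error
    return [w, b]


def average_parameters(worker_params):
    """Central step: average each parameter across the workers."""
    count = len(worker_params)
    return [sum(values) / count for values in zip(*worker_params)]


def train_with_averaging(shards, rounds=100):
    """Alternate between independent local training and averaging."""
    params = [0.0, 0.0]
    for _ in range(rounds):
        worker_params = [local_sgd(params, shard) for shard in shards]
        params = average_parameters(worker_params)
    return params


# Toy data generated from y = 3x + 1, split across two hypothetical workers.
SHARDS = [[(0.0, 1.0), (1.0, 4.0)], [(2.0, 7.0), (3.0, 10.0)]]
slope, intercept = train_with_averaging(SHARDS)
```

On this noiseless toy data the averaged model approaches a slope of 3 and an intercept of 1. In a real cluster the list comprehension over shards would be replaced by work distributed across executors.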
  86. 86. DeepDist (http://deepdist.com/)
        • Leverages Spark and asynchronous SGD to accelerate deep learning training from HDFS/Spark data
        • DeepDist fetches the model from the master and calls gradient(). After computing gradients on the data partitions, gradient updates are sent back to the server. On the server, the master model is updated by descent() using the updates from the nodes.
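The fetch, gradient(), descent() cycle described above can be sketched as a serial simulation. The names `gradient` and `descent` come from the slide, but the signatures, the linear model, the data, and the learning rate are assumptions for illustration and are not DeepDist's actual API. Real DeepDist workers run asynchronously on Spark partitions, which this single-threaded loop does not model.

```python
def gradient(model, partition):
    """Worker side: mean squared-error gradient on one data partition."""
    w, b = model
    grad_w = grad_b = 0.0
    for x, y in partition:
        error = (w * x + b) - y
        grad_w += error * x
        grad_b += error
    size = len(partition)
    return [grad_w / size, grad_b / size]


def descent(model, update, learning_rate=0.05):
    """Master side: apply one gradient update to the master model."""
    return [p - learning_rate * g for p, g in zip(model, update)]


def train(partitions, rounds=1000):
    """Serial stand-in for the asynchronous worker/master exchange."""
    master = [0.0, 0.0]
    for _ in range(rounds):
        for partition in partitions:
            snapshot = list(master)                 # worker fetches the model
            update = gradient(snapshot, partition)  # worker computes gradients
            master = descent(master, update)        # master applies the update
    return master


# Toy data generated from y = 3x + 1, split into two hypothetical partitions.
PARTITIONS = [[(0.0, 1.0), (1.0, 4.0)], [(2.0, 7.0), (3.0, 10.0)]]
slope, intercept = train(PARTITIONS)
```

Unlike parameter averaging, workers here send gradients rather than trained weights, so the master model changes after every partition's update.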
  87. 87. Efforts on using Deep Learning Frameworks with Spark
        • Databricks – Platform for running Spark applications
        • BigDL – Intel’s library for deep learning on existing data frameworks.
        • TensorFlowOnSpark – Yahoo’s Distributed Deep Learning on Big Data Clusters
        • The Rest:
          ▫ SparkNet – AMPLab’s framework for training deep networks in Spark
          ▫ DeepLearning4J – Uses data parallelism to train on separate neural networks
          ▫ DeepDist – Lightning-fast deep learning on Spark via parallel stochastic gradient updates
  88. 88. QuantUniversity’s Analytics for a Cause Initiative
        • QuantUniversity has started a new initiative to help students and unemployed professionals interested in fintech and data science roles attend our workshops for free or at reduced cost.
        • If you or someone you know is interested in attending our workshops for free or at a significantly discounted price, apply for a scholarship here
        • If you want to join us in supporting this initiative through a sponsorship, please contact us. We are on a mission to democratize analytics education and we seek your support in making it possible!
  89. 89. 90 Q&A
  90. 90. Thank you! Check out our programs at: www.analyticscertificate.com Sri Krishnamurthy, CFA, CAP Founder and CEO QuantUniversity LLC. srikrishnamurthy www.QuantUniversity.com Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
