Using Deep Learning to do Real-Time Scoring in Practical Applications
http://www.meetup.com/SF-Bay-ACM/events/227480571/
(see also YouTube for a recording of the presentation)

The talk will cover a brief review of neural network basics and the following types of neural network deep learning:

* autocorrelational - unsupervised learning for extracting features. The speaker will describe how additional layers build complexity in the feature extraction.
* convolutional - how to detect shift-invariant patterns in various data sources. Horizontal shift-invariant detection applies to signals such as speech or IoT data. Horizontal and vertical shift invariance applies to images or videos, for faces or self-driving cars.
* real-time scoring - details of applying deep net systems for continuous or real-time scoring
* reinforcement learning or Q-Learning - such as learning how to play Atari video games
* continuous space word models - such as word2vec, skip-gram training, NLP understanding and translation


  1. Using Deep Learning to do Real-Time Scoring in Practical Applications
     SFbayACM Data Science SIG, Monday, 1/25/2016
     By Greg Makowski, www.Linkedin.com/in/GregMakowski, greg@LigaDATA.com
     Community @ http://Kamanja.org Try out
     © 2016 LigaData, Inc. All Rights Reserved.
  2. Deep Learning - Outline
     •  Big Picture of 2016 Technology
     •  Neural Net Basics
     •  Deep Network Configurations for Practical Applications
        –  Auto-Encoder (i.e. data compression or Principal Components Analysis)
        –  Convolutional (shift invariance in time or space for voice, image or IoT)
        –  Real Time Scoring and Lambda Architecture
        –  Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja)
        –  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
        –  Continuous Space Word Models (i.e. word2vec)
  3. David Clearley
  4. Is Deep Learning Hype?
     Is this just a “buzzword of the day or year?”
     Is this improvement at the normal pace?
  5. Is Deep Learning Hype?
     Is this just a “buzzword of the day or year?”
     Is this improvement at the normal pace?
     NO! Not only a buzzword. This is a leap in the rate of improvement!
     So What? Show me…
  6. Deep Learning Caused about an 18% / Year Reduction in Error in Speech Recognition (Nuance)
     “not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come. It is no overstatement to say that DNNs were the single largest contributor to innovation across many of our products in recent years.”
     http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/
  7. Deep Learning Caused about an 18% / Year Reduction in Error in Speech Recognition (Nuance)
     [Chart x-axis: 2008, 09, 10, 11, 12, 13, 14, 15]
     What if Moore’s Law changed from 2X to 4X over the last 7 years, because of a new technology advance?
     “not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come. It is no overstatement to say that DNNs were the single largest contributor to innovation across many of our products in recent years.”
     http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/
  8. Neural Net training is 10+ times faster on GPUs
     The gaming market is pushing for faster GPU speeds
     https://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market
     https://developer.nvidia.com/cudnn
  9. Deep Learning - Outline (same as slide 2)
  10. Advantages of a Net over Regression
      A Regression Solution: “Linear”, fit one line
      [Figure: scatter plot of target values for data points, with source field values graphed by “field 1” and “field 2”. Showing ONE target field, with values of $ or c]
      https://en.wikipedia.org/wiki/Regression_analysis
  11. Advantages of a Net over Regression
      A Neural Net Solution: “Non-Linear”, several regions which are not adjacent
      Hidden nodes can be line or circle
      [Figure: the same scatter plot of $ and c target values by “field 1” and “field 2”]
      https://en.wikipedia.org/wiki/Artificial_neural_network
  12. A Comparison of a Neural Net and Regression
      A logistic regression formula:
         Y = f(a0 + a1*X1 + a2*X2 + a3*X3)
         a* are coefficients
      Backpropagation, cast in a similar form:
         H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
         H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
         :
         Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
         O1 = f(w12 + w13*H1 + .... + w15*Hn)
         On = ....
         w* are weights, AKA coefficients
         I1..In are input nodes or input variables.
         H1..Hn are hidden nodes, which extract features of the data.
         O1..On are the outputs, which group disjoint categories.
      Look at the ratio of training records vs. free parameters (complexity, regularization)
      The dot product equals cosine similarity when the vectors are normalized to unit length; it is used broadly
      Tensors are arrays of N dimensions (a matrix is the 2-dimensional case)
      [Diagram labels: regression with a0..a3, X1..X3, Y; net with Input 1, I2, I3, Bias, H1, Hidden 2, Output, w1..w3]
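The two forms on the slide above can be compared side by side in a few lines. This is a minimal sketch, assuming `f` is the logistic (sigmoid) function and that every node has its own bias weight; the function names and the example weights are made up for illustration and are not from the talk.

```python
import math

def sigmoid(z):
    """The squashing function f used by both models in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, coeffs):
    """Y = f(a0 + a1*X1 + a2*X2 + ...); coeffs[0] is the intercept a0."""
    return sigmoid(coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x)))

def net_forward(x, hidden_weights, output_weights):
    """One hidden layer. Each weight row is [bias, w1, w2, ...], so every
    hidden node and every output node is itself a small logistic regression."""
    hidden = [logistic_regression(x, row) for row in hidden_weights]
    return [logistic_regression(hidden, row) for row in output_weights]

# Hypothetical inputs and weights, chosen only to show the shapes involved.
inputs = [0.5, -1.0, 2.0]
hidden_w = [[0.1, 0.4, -0.2, 0.3], [-0.3, 0.2, 0.5, -0.1]]  # 2 hidden nodes
output_w = [[0.0, 1.0, -1.0]]                               # 1 output node

y_regression = logistic_regression(inputs, [0.0, 0.0, 0.0, 0.0])
y_net = net_forward(inputs, hidden_w, output_w)
```

With all-zero coefficients the regression returns exactly 0.5, which is a quick sanity check that the sigmoid is wired correctly.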
  13. Think of Separating Land vs. Water
      •  1 line, Regression (more errors)
      •  5 Hidden Nodes in a Neural Network
      •  Decision Tree, 12 splits (more elements, less computation)
      Different algorithms use different Basis Functions:
      •  One line
      •  Many horizontal & vertical lines
      •  Many diagonal lines
      •  Circles
      Q) What is too detailed? “Memorizing high tide boundary” and applying it at all times
  14. Deep Learning - Outline (same as slide 2)
      http://deeplearning.net/
      http://www.kdnuggets.com/
      http://www.analyticbridge.com/
  15. Leading up to an Auto Encoder
      •  Supervised Learning
         –  Regression (one layer, one line, one dot-product)
            •  50 inputs → 1 output
         –  Possible nets:
            •  256 → 120 → 1
            •  256 → 120 → 5 (trees, regression, SVM & most algs are limited to 1 output)
            •  256 → 120 → 60 → 1 (can try 2 hidden layers, 3 sets of weights)
            •  256 → 180 → 120 → 60 → 1 (start getting into training stability problems, with 1990’s training processes)
      •  Unsupervised Learning
         –  Clustering (traditional unsupervised):
            •  60 inputs (no output target); produce 1-2 new (cluster ID & distance)
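To make the "ratio of training records vs. free parameters" concrete, the layer configurations listed above can be turned into weight counts. This is a small sketch assuming fully connected layers with one bias weight per node; `count_weights` is a hypothetical helper name, not something from the talk.

```python
def count_weights(layer_sizes):
    """Free parameters of a fully connected net, e.g. [256, 120, 1].
    Each node in a layer gets one weight per node in the previous layer
    plus one bias weight (an assumption of this sketch)."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# The four example nets from the slide.
configs = [
    [256, 120, 1],
    [256, 120, 5],
    [256, 120, 60, 1],
    [256, 180, 120, 60, 1],
]
weights_per_config = {tuple(c): count_weights(c) for c in configs}
```

Under these assumptions the counts come to 30,961, 31,445, 38,161 and 75,301 weights respectively, so the deepest net has roughly 2.4 times the free parameters of the shallowest.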
  16. Auto Encoder (like data compression)
      Relate input to output, through compressed middle
      At each step of training, only train the black connections
      •  Step 1, Train 1st Hidden Layer (Tensor)
      •  Step 2, Train 2nd Hidden Layer (Tensor)
      Called “Auto Encoder” because input values = target values
      Unsupervised, there are no additional target values
      “Data Compression” because it compresses 256 numbers into 180 numbers
      [Diagram: Input and Output (same as input values) layers of 256 nodes, with hidden layers of 180 and 120 nodes]
  17. Auto Encoder (like data compression)
      Relate input to output, through compressed middle
      •  Supervised Learning
         –  Regression, Tree or Net: 50 inputs → 1 output
         –  Possible nets:
            •  256 → 120 → 1
            •  256 → 120 → 5 (trees, regressions, SVM and most are limited to 1 output)
            •  256 → 120 → 60 → 1
            •  256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or may not finish)
      •  Unsupervised Learning
         –  Clustering (traditional unsupervised):
            •  60 inputs (no target); produce 1-2 new (cluster ID & distance)
         –  Unsupervised training of a net, assign (target record == input record): AUTO-ENCODING
         –  Train net in stages,
            •  256 → 180 → 256 → 120 → → 120 → → 120 →
            •  Add supervised layer to forecast 10 target categories → 10
      Because of symmetry, only need to update mirrored weights once
      The BREAKTHROUGH provided by DEEP LEARNING
      4 hidden layers w/ unsupervised training, 1 layer at end w/ supervised training
      https://en.wikipedia.org/wiki/Deep_learning
  18. Auto Encoder (like data compression)
      With Supervised Layers on Top
      •  Unsupervised output: like cluster output, only large values are a match (not distance)
      •  Train Supervised Layers on Top: regular Back Propagation, using unsupervised nodes as input
      •  Target specific to the problem: Fraud risk * $; Cat, dog, human, other
      [Diagram: layer sizes 256, 180, 120, 120, :, 50, then 1, 2, 10 or…]
  19. Auto Encoder
      How it can be generally used to solve problems
      •  Add supervised layer to forecast 10 target categories
         –  4 hidden layers trained with unsupervised training,
         –  1 new layer, trained with supervised learning → 10
      •  Outlier detection
         –  The “activation” at each of the 120 output nodes indicates the “match” to that cluster or compressed feature
         –  When scoring new records, can detect outliers with a process like:
            If (max_output_match < 0.333) then suspected outlier
      •  How is it like PCA?
         –  Individual hidden nodes in the same layer are “different” or “orthogonal”
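The outlier rule on the slide above is a one-line test over the final-layer activations. A minimal sketch follows; the 0.333 threshold is the slide's, while the function name and the example activation vectors are hypothetical.

```python
def is_suspected_outlier(final_activations, threshold=0.333):
    """Flag a record when no final node matches it strongly.
    final_activations: the activation of each final (compressed feature) node
    for one scored record, where larger values mean a better match."""
    max_output_match = max(final_activations)
    return max_output_match < threshold

# Made-up activations: the first record matches one node well, the second none.
normal_record = [0.05, 0.91, 0.12]
odd_record = [0.10, 0.20, 0.30]
flags = [is_suspected_outlier(normal_record), is_suspected_outlier(odd_record)]
```

In practice the threshold would be tuned against records already known to be surprising or risky, as the fraud detection slide suggests.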
  20. Fraud Detection Example using Deep Learning – auto encoders
      •  Unsupervised Learning of Normal Behavior (Outlier Detection)
         –  May want to preprocess transaction data, in the context of the person’s past normal behavior
            •  0..1, where 1 is the most SURPRISING for that person to act
            •  0..1, where 1 is the most RISKY of fraud
            •  General, descriptive attributes that can be used for interactions
            •  Filter out from the training data the most surprising & risky
            •  Want the net to learn “normal” records
         –  Train 5-10 layers deep, end up with 50 to 100+ nodes at end
         –  Score records on membership in final nodes
            •  Transactions that are far from all final nodes are candidates for outliers
            •  Validate with existing surprising & risky. Add application post-processing
      •  Supervised Learning
         –  Add two layers on top, train to predict normal vs. surprising/risky labeled data (if it is available)
  21. Deep Learning - Outline (same as slide 2)
  22. Internet of Things (IoT) is heavily signal data
      http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data
  23. Convolutional Neural Net (CNN)
      Enables detecting shift invariant patterns
      In Speech and Image applications, patterns vary by size and can be shifted right or left
      Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.
      Neural Nets can be explicitly trained to provide an FFT (Fast Fourier Transform) to convert data from the time domain to the frequency domain, but typically an explicit FFT is used
      Internet of Things Signal Data
  24. Convolutional Neural Net (CNN)
      Enables detecting shift invariant patterns
      In Speech and Image applications, patterns vary by size and can be shifted right or left
      Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.
      Solution: use a sliding convolution to detect the pattern
      CNN can use very long observational windows, up to 400 ms, long context
  25. Convolution – Shift Horizontal
      •  SAME 25 WEIGHTS FEED INTO EACH OUTPUT
      •  Backpropagation weight update is averaged
      •  Otherwise NO convolution and HUGE complexity!
      Max pooling layer output = 1.2
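The horizontal-shift idea can be shown on a 1-D signal: the same small weight pattern is slid across every position, and max pooling keeps the best match wherever it occurred. This is a minimal sketch with 3 shared weights instead of the slide's 25, linear outputs, and made-up signals; none of the names come from the talk.

```python
def conv1d(signal, weights, bias=0.0):
    """Slide ONE shared weight pattern across the signal.
    Every output position uses exactly the same weights."""
    k = len(weights)
    return [bias + sum(w * s for w, s in zip(weights, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]

def max_pool(values):
    """Keep only the strongest response, discarding where it happened."""
    return max(values)

pattern = [1.0, -1.0, 1.0]  # hypothetical learned detector
signal_early = [0.0, 0.0, 1.0, -1.0, 1.0, 0.0, 0.0, 0.0]
signal_late = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, 0.0]  # same shape, shifted right

pooled_early = max_pool(conv1d(signal_early, pattern))
pooled_late = max_pool(conv1d(signal_late, pattern))
```

Both pooled outputs are 3.0 even though the pattern sits at different positions, which is the shift invariance the slide describes.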
  26. Convolution
      https://en.wikipedia.org/wiki/Convolution
  27. Convolution – Shift Horizontal & Vertical
      Max pooling layer output = 0.8
  28. Convolution – 3 Weight Patterns, Shifted 2D
      •  Hidden Layer 1 Output Sections Per Convolution 3(10x3) – detection layer
      •  Hidden Layer 1 Weights Per Conv. Pattern
      •  Input pixels, audio, video or IoT signal (14 x 7)
      •  Convolutions can be over 3+ dimensions (video frames, time invariance)
      •  Max pooling layer outputs = 0.8, 1.0, 0.9
  29. Convolution Neural Net (CNN)
      Same Low Level Features can support different output
      Previous Slides Showed Training this Hidden 1 Layer
      Same Training process for later hidden layers, one at a time
      Think of fraud detection higher level node patterns
      http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning
  30. Convolution Neural Net: from LeNet-5
      Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, Nov 1998
      Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner
      Director Facebook, AI Research http://yann.lecun.com/
      Can do some size invariance, but it adds to the layers
  31. Convolution Neural Net (CNN)
      •  How is a CNN trained differently than a typical back propagation (BP) network?
         –  Parts of the training which are the same:
            •  Present input record
            •  Forward pass through the network
            •  Back propagate error (i.e. per epoch)
         –  Different parts of training:
            •  Some connections are CONSTRAINED to the same value
               –  The connections for the same pattern, sliding over all input space
            •  Error updates are averaged and applied equally to the one set of weight values
            •  End up with the same pattern detector feeding many nodes at the next level
      http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf
      Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, 2009
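The "error updates are averaged and applied equally to the one set of weight values" step can be sketched for a 1-D convolution. This assumes linear outputs, that `output_errors[i]` is the loss gradient at output position `i`, and plain gradient descent; the function name and learning rate are illustrative assumptions rather than the method used in the talk.

```python
def shared_weight_update(weights, signal, output_errors, learning_rate=0.1):
    """Update ONE shared weight pattern from the errors at every position.
    Weight j touches signal[i + j] at output position i, so its gradient is
    the average over positions of output_errors[i] * signal[i + j]."""
    n_positions = len(output_errors)
    updated = []
    for j, w in enumerate(weights):
        avg_grad = sum(output_errors[i] * signal[i + j]
                       for i in range(n_positions)) / n_positions
        updated.append(w - learning_rate * avg_grad)
    return updated

# Two shared weights sliding over a 3-sample signal give 2 output positions.
new_weights = shared_weight_update([0.0, 0.0], [1.0, 2.0, 3.0], [1.0, 1.0])
```

Because a single averaged update is applied to the one weight vector, every position keeps using an identical pattern detector after each training step.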
  32. The Mammalian Visual Cortex is Hierarchical
      (The Brain is a Deep Neural Net - Yann LeCun)
      http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
  33. Convolution Neural Net (CNN)
      Facebook example
      https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/
  34. Convolution Neural Net (CNN)
      Yahoo + Stanford example – find a face in a pic, even upside down
      http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html
  35. Convolutional Neural Nets (CNN)
      Robotic Grasp Detection (IoT)
      http://pjreddie.com/media/files/papers/grasp_detection_1.pdf
  36. Deep Learning - Outline (same as slide 2)
  37. Real Time Scoring
      Neural Net Optimizations
      •  Auto-Encoding nets
         –  Can grow to millions of connections, and start to get computationally expensive
         –  Can reduce connections by 5% to 25+% with pruning & retraining
            •  Train with increased regularization settings
            •  Drop connections with near zero weights, then retrain
            •  Drop nodes with fan in connections which don’t get used much later, such as in your predictive problem
            •  Perform sensitivity analysis – delete possible input fields
      •  Convolutional Neural Nets
         –  With large enough data, can even skip the FFT preprocessing step
         –  Can use wider than 10ms audio sampling rates for speed up
      •  Implement other preprocessing as lookup tables (i.e. Bayesian Priors)
      •  Use cloud computing, do not limit to device computing
      •  Large models don’t fit → use model or data parallelism to train
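The "drop connections with near zero weights, then retrain" step can be sketched as a simple magnitude threshold over a weight matrix. The threshold value, the helper name and the example weights below are assumptions for illustration; the retraining pass is not shown.

```python
def prune_small_weights(weights, threshold=0.01):
    """Zero out connections whose weight magnitude is below the threshold.
    weights: list of rows (one row of incoming weights per node).
    Returns the pruned matrix and the fraction of connections now at zero."""
    pruned = [[0.0 if abs(w) < threshold else w for w in row] for row in weights]
    total = sum(len(row) for row in weights)
    kept = sum(1 for row in pruned for w in row if w != 0.0)
    return pruned, 1.0 - kept / total

# Made-up layer with two strong and two near-zero connections.
layer = [[0.5, 0.001], [-0.002, -0.7]]
pruned_layer, fraction_removed = prune_small_weights(layer)
```

A real-time scorer would then skip the zeroed connections (for example with a sparse representation), which is where the speed-up comes from.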
  38. Real Time Scoring – Enterprise App Architecture uses Lambda Architecture – for both Batch and Real Time
      •  First architecture to really define how batch and stream processing can work together
      •  Founded on the concepts of immutability and re-computation, with human fault tolerance
      •  Pre-computes the results of batch & real-time processes as a set of views, & query layer merges the views
      https://en.wikipedia.org/wiki/Lambda_architecture
      [Diagram labels: New Data A, New Data B; Batch Layer (Master Data, Batch Processing, Batch Views); Speed Layer (Speed Processing, Real-time Analytics, Speed Views); Query, Merged Views]
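The "query layer merges the views" bullet can be illustrated with two small dictionaries. This sketch assumes both views hold additive counts per key (for example, transactions per account), with the batch view covering older data and the speed view covering only records that arrived since the last batch run; the names and numbers are hypothetical.

```python
def merged_view(batch_view, speed_view):
    """Combine a pre-computed batch view with the recent speed view.
    Assumes values are additive counts and the two views cover
    non-overlapping time ranges."""
    merged = dict(batch_view)  # copy, so the batch view stays immutable
    for key, count in speed_view.items():
        merged[key] = merged.get(key, 0) + count
    return merged

batch = {"account_a": 10, "account_b": 5}  # recomputed periodically
speed = {"account_b": 2, "account_c": 1}   # updated as new events stream in
answer = merged_view(batch, speed)
```

When the next batch run finishes, the speed view for that period can be discarded and the query result stays the same.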
  39. Real Time Scoring: Lambda Architecture With Kamanja
      [Diagram labels: All New Data (Kafka, MQ, Files, AVRO/Flume); Batch Layer (Master Data, Batch Processing, Batch Views; HDFS, Spark, Map Reduce); Speed Layer (Speed Processing, Real-time Analytics, Speed Views; Spark Streaming, Storm); Serving Layer (Merged Views, Query; Cassandra, HBase, Druid, Elephant DB, Impala); Decisioning Layer (Continuous Decisioning, Decisioning Data, Model Data, Continuous Feedback, Action Queue; Kafka, MQ; PMML, Java, Scala, Python, Deep Learning)]
  40. Kamanja Technology Stack
      •  Compute Fabric: Cloud, EC2, Internal Cloud
      •  Security: Kerberos
      •  Real Time Streaming: Kafka, MQ, Spark*
      •  LigaData
      •  Data Store: HBase, Cassandra, InfluxDB, HDFS (Create adaptors to integrate others)
      •  Resource Management: Zookeeper, Yarn*
      •  High Level Languages / Abstractions: PMML Producers, MLlib
      •  Real Time Computing: Kamanja (PMML, Java, Scala)
  41. Deep Learning Tools
  42. Deep Learning Tools
      By Google, 600 DL proj: Speech, Google Photos, Translation, Gmail, Search
      Rajat Monga, Tech Lead & Manager for TensorFlow
  43. Deep Learning Tools
      Python code to make up data in two dimensions and then fit it
      https://www.tensorflow.org/versions/0.6.0/get_started/index.html
  44. Deep Learning Tools
      www.Kamanja.org
  45. Deep Learning - Outline (same as slide 2)
  46. Reinforcement Learning (RL)
      •  Different than supervised and unsupervised learning
      •  Q) Can the network figure out how to take one or more actions NOW, to achieve a reward or payout (potentially far-off, i.e. T steps in the FUTURE)?
      •  Need to solve the credit assignment problem
         –  There is no teacher and very little labeled data
         –  Need to learn the best POLICY that will achieve the best outcome
         –  Assume no knowledge of the process model or reward function
      •  Next guess =
         –  Linear combination of ((current guess) and (the new reward info just collected)), weighted by the learning rate
      http://www.humphreysheil.com/blog/gorila-google-reinforcement-learning-architecture
      http://robotics.ai.uiuc.edu/~scandido/?Developing_Reinforcement_Learning_from_the_Bellman_Equation
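The "next guess" rule above matches the shape of the standard tabular Q-learning update, sketched here. The discount factor, the dictionary-based table and the state and action names are assumptions added for the example; the slide itself only specifies a linear combination weighted by the learning rate.

```python
def q_update(q, state, action, reward, next_state, actions,
             learning_rate=0.1, discount=0.9):
    """One tabular Q-learning step.
    next guess = (1 - lr) * current guess + lr * new reward info, where the
    new info is the reward plus the discounted best value of the next state."""
    current_guess = q.get((state, action), 0.0)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    new_info = reward + discount * best_next
    q[(state, action)] = (1 - learning_rate) * current_guess + learning_rate * new_info
    return q[(state, action)]

# Hypothetical two-action problem, starting with an empty table (all zeros).
q_table = {}
moves = ["left", "right"]
first = q_update(q_table, "s0", "right", reward=1.0, next_state="s1", actions=moves)
second = q_update(q_table, "s0", "right", reward=1.0, next_state="s1", actions=moves)
```

Repeating the same rewarded step moves the estimate from 0.1 to 0.19, creeping toward the true value; this is how credit gradually propagates back to earlier actions.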
  47. Deep Reinforcement Learning (RL), Q-Learning
      Think in terms of IoT…
      •  Device agent measures, infers user’s action
      •  Maximizes future reward, recommends to user or system
      http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
      https://en.wikipedia.org/wiki/Reinforcement_learning
      https://en.wikipedia.org/wiki/Q-learning
  48. Deep Reinforcement Learning, Q-Learning
      (Think about IoT possibilities)
      Use last 4 screen shots
      http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
  49. Deep Reinforcement Learning, Q-Learning
      Use 4 screen shots
      Actions: shift right fast, shift right, stay, shift left, shift left fast
      IoT challenge: How to replace game score with IoT score?
      http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
  50. Deep Reinforcement Learning, Q-Learning
      Games w/ best Q-learning: Video Pinball, Breakout, Star Gunner, Crazy Climber, Gopher
      http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
  51. Deep Learning - Outline (same as slide 2)
  53. 53. Continuous Space Word Models (word2vec)
•  Before (a predictive “Bag of Words” model):
–  One row per document, paragraph or web page
–  Binary word space: 10k to 200k columns, one per word or phrase
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 …. “This word space model is ….”
–  The “Bag of words model” relates input record to a target category
•  New:
–  One row per word (word2vec), possibly per sentence (sent2vec)
–  Continuous word space: 100 to 300 columns, continuous values
.01 .05 .02 .00 .00 .68 .01 .01 .35 ... .00 → “King”
.00 .00 .05 .01 .49 .52 .00 .11 .84 ... .01 → “Queen”
–  The deep net training resulted in an Emergent Property:
•  Numeric geometry location relates to concept space
•  “King” - “man” + “woman” = “Queen” (math to change gender relation)
•  “USA” - “Washington DC” + “England” = “London” (math for capital relation)
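The contrast between the binary bag-of-words row and the continuous word vectors can be sketched in a few lines. The 3-dimensional vectors below are made-up toy values (real word2vec vectors have 100 to 300 learned dimensions), and `bag_of_words` and `analogy` are hypothetical helper names chosen for this illustration.

```python
import math

# Toy vectors, invented for illustration: (royalty, maleness, femaleness).
VECTORS = {
    "king":  (0.9, 0.9, 0.1),
    "queen": (0.9, 0.1, 0.9),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.1, 0.9),
    "apple": (0.1, 0.5, 0.5),
}


def bag_of_words(document, vocabulary):
    """Binary row: one column per vocabulary word, 1 if the word occurs."""
    words = set(document.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding inputs."""
    query = tuple(x - y + z
                  for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))
```

With these toy values, `analogy("king", "man", "woman")` lands on "queen" because the vectors were constructed so that the gender offset is shared. In a trained model that regularity is not designed in; it emerges from training.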
  54. 54. Continuous Space Word Models (word2vec)
 How to SCALE to larger vocabularies? http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
  56. 56. Training Continuous Space Word Models
•  How to Train These Models?
–  Raw data: “This example sentence shows the word2vec model training.”
–  Training data (target word marked with underscores, other words as input; “the” is pruned)
“This example _sentence_ shows word2vec”
“example sentence _shows_ word2vec model”
“sentence shows _word2vec_ model training”
–  The context of the 2 to 5 prior and following words predicts the middle word
–  Deep Net model architecture, data compression to 300 continuous nodes
•  50k binary word input vector → ... → 300 → ... → 50k word target vector
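The training windows on this slide can be generated with a short sliding-window routine. This is a sketch under assumptions: `context_windows` is a hypothetical helper, the stop-word list contains only "the" as in the slide's example, and the window uses 2 words on each side of the target.

```python
def context_windows(sentence, half_window=2, stop_words=("the",)):
    """Yield (context_words, target_word) pairs, target in the middle."""
    # Strip trailing punctuation and prune stop words such as "the".
    tokens = [w.strip(".,") for w in sentence.split()]
    tokens = [w for w in tokens if w and w.lower() not in stop_words]

    pairs = []
    for i in range(half_window, len(tokens) - half_window):
        before = tokens[i - half_window:i]
        after = tokens[i + 1:i + 1 + half_window]
        pairs.append((before + after, tokens[i]))
    return pairs


pairs = context_windows(
    "This example sentence shows the word2vec model training.")
for context, target in pairs:
    print(context, "->", target)
```

Each pair would then be encoded as a binary input vector over the vocabulary (the context words) and a target vector (the middle word), matching the 50k → 300 → 50k shape described on the slide.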
  57. 57. Training Continuous Space Word Models
•  Use Pre-Trained Models
–  https://code.google.com/p/word2vec/
–  Trained on 100 billion words from Google News
–  300 dim vectors for 3 million words and phrases
•  Questions on re-use:
–  What if I want to train to add client terms or docs?
–  What about stability (keeping past training) vs. plasticity (learning new content)?
  58. 58. Training Continuous Space Word Models http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
  59. 59. Applying Continuous Space Word Models
http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf
State of the art in machine translation: Sequence to Sequence Learning with Neural Networks, NIPS 2014
•  Language translation
•  Document summary
•  Generate text captions for pictures
.01 .05 .89 .00 .05 .62 .00 .34
  60. 60. “Greg’s Guts” on Deep Learning
•  Some claim the need for preprocessing and knowledge representation has ended
–  For most of the signal processing applications → yes, simplify
–  I am VERY READY TO COMPETE in other applications, continuing
•  expressing explicit domain knowledge, using lookup data for context
•  optimizing business value calculations
•  Deep Learning gets big advantages from big data
–  Why? Better populating high dimensional space combination subsets
–  Unsupervised feature extraction reduces need for large labeled data
•  However, “regular sized data” gets a big boost as well
–  The “ratio of free parameters” (i.e. neurons) to training set records
–  For regressions or regular nets, want 5-10 times as many records
–  Regularization and weight drop out reduce this pressure
–  Especially when only training “the next auto encoding layer”
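The 5-10 times rule of thumb above can be turned into quick arithmetic. This sketch makes an assumption the slide does not spell out: it counts free parameters as weights plus biases of a fully connected net. The layer sizes and the helper names `count_parameters` and `records_needed` are hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus one bias per node, for a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def records_needed(layer_sizes, low_ratio=5, high_ratio=10):
    """Range of training records suggested by the 5-10x rule of thumb."""
    params = count_parameters(layer_sizes)
    return params * low_ratio, params * high_ratio


# Hypothetical net: 50 inputs, 10 hidden nodes, 1 output.
params = count_parameters([50, 10, 1])
low, high = records_needed([50, 10, 1])
print(params, low, high)
```

For the hypothetical 50-10-1 net this gives 521 parameters, so roughly 2,605 to 5,210 records. Training only the next auto-encoding layer shrinks the parameter count being fit at any one time, which is one reading of why the slide says that eases the pressure.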
  61. 61. Deep Learning Summary – IT’S EXCITING!
•  Discussed Deep Learning architectures
–  Auto Encoder, convolutional, reinforcement learning, continuous word
•  Real Time speed up
–  Train model, reduce complexity, retrain
–  Simplify preprocessing with lookup tables
–  Use cloud computing, do not be limited to device computing
–  Lambda architecture like Kamanja, to combine real time and batch
•  Applications
–  Fraud detection
–  Signal Data: IoT, Speech, Images
–  Control System models (like Atari game playing, IoT)
–  Language Models
https://www.quora.com/Why-is-deep-learning-in-such-demand-now
  62. 62. © 2016 LigaData, Inc. All Rights Reserved. Using Deep Learning to do Real-Time Scoring in Practical Applications SFbayACM Data Science Meetup Monday 1/25/2016 By Greg Makowski www.Linkedin.com/in/GregMakowski greg@LigaDATA.com Community @ http://Kamanja.org Try out
