© 2016 LigaData, Inc. All Rights Reserved.
Using Deep Learning to
do Real-Time Scoring in
Practical Applications
SFbayACM Data Science SIG, Monday, 1/25/2016
By Greg Makowski
www.Linkedin.com/in/GregMakowski
greg@LigaDATA.com
Community @ http://Kamanja.org (try it out)
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
David Cearley
Is Deep Learning Hype?
Is this just a “buzzword of the day or year?”
Is this improvement at the normal pace?
Is Deep Learning Hype?
Is this just a “buzzword of the day or year?”
Is this improvement at the normal pace?
NO !
Not only a buzzword
This is a leap in the rate of improvement!
So What? Show me…
http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/
Deep Learning Caused about an 18% / Year Reduction
in Error in Speech Recognition (Nuance)
not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come.
It is no overstatement to say
that DNNs were the single
largest contributor to
innovation across many of
our products in recent years.
http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/
2008, 09, 10, 11, 12, 13, 14, 15
Deep Learning Caused about an 18% / Year Reduction
in Error in Speech Recognition (Nuance)
What if Moore's Law had changed from 2X to 4X over the last 7 years because of a new technology advance!
not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come.
It is no overstatement to say
that DNNs were the single
largest contributor to
innovation across many of
our products in recent years.
Neural Net training is 10+ times faster on GPUs

The gaming market is pushing for faster GPU speeds
https://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market
https://developer.nvidia.com/cudnn
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
Advantages of a Net over Regression

[Scatter plot: one target field with values of $ or c, graphed by "field 1" and "field 2"; each point is a data record placed by its source field values.]

A Regression Solution is "Linear": fit one line.

https://en.wikipedia.org/wiki/Regression_analysis
Advantages of a Net over Regression

[Same scatter plot of $ and c points, graphed by "field 1" and "field 2".]

A Neural Net Solution is "Non-Linear": several regions which are not adjacent. Hidden nodes can be a line or a circle.

https://en.wikipedia.org/wiki/Artificial_neural_network
A Comparison of a Neural Net and Regression
A logistic regression formula:
Y = f(a0 + a1*X1 + a2*X2 + a3*X3)
a* are coefficients

A backpropagation net, cast in a similar form:
H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
:
Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
O1 = f(w12 + w13*H1 + .... + w15*Hn)
On = ....

w* are weights, AKA coefficients
I1..In are input nodes or input variables.
H1..Hn are hidden nodes, which extract features of the data.
O1..On are the outputs, which group disjoint categories.

Look at the ratio of training records vs. free parameters (complexity, regularization); a minimal sketch of these formulas follows this slide.
[Network diagram: the regression has inputs X1, X2, X3 with coefficients a1, a2, a3 plus intercept a0 feeding output Y. The net has inputs I1, I2, I3 plus a bias, with weights w1, w2, w3 feeding hidden nodes H1 and H2, which feed the output.]
The dot product (normalized, it is cosine similarity) is used broadly.
Tensors are matrices generalized to N dimensions.
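To make the comparison above concrete, here is a minimal NumPy sketch of both forward passes. The sizes (3 inputs, 2 hidden nodes, 1 output) and the sigmoid used for f(...) are illustrative assumptions, not the exact setup on the slide.

```python
import numpy as np

def f(z):
    # Sigmoid squashing function, standing in for the f(...) in the formulas above
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(3)                      # one record: X1..X3 (or I1..I3)

# Logistic regression: one dot product, one output
a0, a = 0.1, rng.normal(size=3)        # intercept a0 and coefficients a1..a3
y_regression = f(a0 + a @ x)

# One-hidden-layer net: a dot product per hidden node, then one more for the output
W_hidden = rng.normal(size=(2, 3))     # weights from 3 inputs to 2 hidden nodes
b_hidden = np.zeros(2)                 # bias terms (the w0, w4 in the slide notation)
h = f(W_hidden @ x + b_hidden)         # H1, H2: extracted features
W_out, b_out = rng.normal(size=2), 0.0
y_net = f(W_out @ h + b_out)           # O1

print(y_regression, y_net)
```

The hidden layer is what buys the non-linear, non-adjacent regions shown two slides earlier; the regression is limited to the single dot product.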
Think of Separating Land vs. Water
[Comparison figure: the same land-vs.-water boundary fit three ways.]
•  Regression: 1 line (more errors)
•  Neural Network: 5 hidden nodes
•  Decision Tree: 12 splits (more elements, less computation)

Different algorithms use different Basis Functions:
•  One line
•  Many horizontal & vertical lines
•  Many diagonal lines
•  Circles

Q) What is too detailed? "Memorizing the high tide boundary" and applying it at all times.
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
http://deeplearning.net/
http://www.kdnuggets.com/
http://www.analyticbridge.com/
Leading up to an Auto Encoder 

•  Supervised Learning
–  Regression (one layer, one line, one dot-product)
•  50 inputs → 1 output
–  Possible nets:
•  256 → 120 → 1
•  256 → 120 → 5 (trees, regression, SVM & most algorithms are limited to 1 output)
•  256 → 120 → 60 → 1 (can try 2 hidden layers, 3 sets of weights)
•  256 → 180 → 120 → 60 → 1 (start getting into training stability problems with 1990s training processes)
•  Unsupervised Learning
–  Clustering (traditional unsupervised):
•  60 inputs (no output target); produce 1-2 new fields (cluster ID & distance)
Auto Encoder (like data compression)

Relate input to output, through compressed middle
At each step of training, only train the black connections.

[Network diagrams. Step 1: train the 1st hidden layer (tensor), 256 inputs → 180 → 256 outputs (same as the input values). Step 2: train the 2nd hidden layer (tensor), taking the 180 compressed values → 120 → 180.]

Called an "Auto Encoder" because the input values = the target values.
Unsupervised: there are no additional target values.
"Data Compression" because 256 numbers are compressed into 180 numbers.
(A minimal sketch of this staged training follows this slide.)
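A minimal NumPy sketch of the staged idea above: train one compression layer at a time, with the targets equal to the inputs. The layer sizes (256 → 180 → 120), plain gradient descent on squared reconstruction error, and the tanh activation are illustrative assumptions, not the exact procedure on the slides.

```python
import numpy as np

def train_autoencoder_layer(X, n_hidden, lr=0.01, epochs=50, seed=0):
    """Train one compression layer: inputs -> n_hidden -> inputs, with targets == inputs."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W_enc = rng.normal(scale=0.01, size=(n_in, n_hidden))
    W_dec = rng.normal(scale=0.01, size=(n_hidden, n_in))
    for _ in range(epochs):
        H = np.tanh(X @ W_enc)        # compressed representation
        X_hat = H @ W_dec             # reconstruction of the inputs
        err = X_hat - X               # unsupervised: the target record == the input record
        # Gradient descent on mean squared reconstruction error
        W_dec -= lr * H.T @ err / len(X)
        W_enc -= lr * X.T @ ((err @ W_dec.T) * (1 - H ** 2)) / len(X)
    return W_enc

# Step 1: train the 1st hidden layer, 256 -> 180 -> 256
X = np.random.default_rng(1).random((1000, 256))   # toy data: 1000 records, 256 inputs
W1 = train_autoencoder_layer(X, 180)
H1 = np.tanh(X @ W1)
# Step 2: train the 2nd hidden layer on the compressed output, 180 -> 120 -> 180
W2 = train_autoencoder_layer(H1, 120)
H2 = np.tanh(H1 @ W2)                               # 120 compressed features per record
```

Each stage only trains one new set of weights, which is what keeps the deep stack stable compared to training all layers at once.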
Auto Encoder (like data compression)

Relate input to output, through compressed middle
•  Supervised Learning
–  Regression, Tree or Net: 50 inputs → 1 output
–  Possible nets:
•  256 → 120 → 1
•  256 → 120 → 5 (trees, regressions, SVMs and most algorithms are limited to 1 output)
•  256 → 120 → 60 → 1
•  256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or training may not finish; this is where the BREAKTHROUGH provided by DEEP LEARNING comes in)
•  Unsupervised Learning
–  Clustering (traditional unsupervised):
•  60 inputs (no target); produce 1-2 new fields (cluster ID & distance)
–  Unsupervised training of a net: assign (target record == input record), i.e. AUTO-ENCODING
–  Train the net in stages:
•  256 → 180 → 256
•  → 120 →
•  → 120 →
•  → 120 →
(because of symmetry, mirrored weights only need to be updated once)
•  Add a supervised layer to forecast 10 target categories: → 10
–  4 hidden layers w/ unsupervised training, 1 layer at the end w/ supervised training
https://en.wikipedia.org/wiki/Deep_learning
Auto Encoder (like data compression)

With Supervised Layers on Top
Unsupervised output: like cluster output, only large values are a match (not distance).

Train supervised layers on top with regular back propagation, using the unsupervised nodes as input.

[Diagram: the unsupervised stack 256 → 180 → 120 → 120 → … feeds new supervised layers (e.g. 50 nodes, then 1, 2, 10 or … outputs). The target is specific to the problem: fraud risk * $, or cat, dog, human, other.]
Auto Encoder

How it can be generally used to solve problems
•  Add a supervised layer to forecast 10 target categories
–  4 hidden layers trained with unsupervised training,
–  1 new layer, trained with supervised learning
→ 10
•  Outlier detection
•  The "activation" at each of the 120 output nodes indicates the "match" to that cluster or compressed feature
•  When scoring new records, outliers can be detected with a rule like the one below (see the sketch after this slide)
if (max_output_match < 0.333) then suspected outlier
•  How is it like PCA?
–  Individual hidden nodes in the same layer are "different" or "orthogonal"
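A minimal sketch of that outlier rule, assuming the final-layer activations come from something like the `H2` matrix in the earlier auto-encoder sketch (one row per record, one column per compressed feature); the 0.333 threshold is the one quoted on the slide.

```python
import numpy as np

def flag_outliers(activations, threshold=0.333):
    """Flag records whose strongest 'match' to any compressed feature is weak."""
    max_match = np.abs(activations).max(axis=1)   # best activation per record
    return max_match < threshold                  # True => suspected outlier

# Example, using the H2 features (120 activations per record) from the earlier sketch:
# suspects = flag_outliers(H2)
# records flagged True are far from all learned "normal" patterns
```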
Fraud Detection Example using

Deep Learning – auto encoders
•  Unsupervised Learning of Normal Behavior (Outlier Detection)
–  May want to preprocess transaction data - in the context of the
person’s past normal behavior
•  0..1, where 1 is the most SURPRISING for that person to act
•  0..1, where 1 is the most RISKY of fraud
•  General, descriptive attributes that can be used for interactions
•  Filter out from the training data – the most surprising & risky
•  Want the net to learn "normal" records
–  Train 5-10 layers deep, end up with 50 to 100+ nodes at end
–  Score records on membership in final nodes
•  Transactions that are far from all final nodes are candidates for outliers
•  Validate with existing surprising & risky. Add application post-processing
•  Supervised Learning
–  Add two layers on top, train to predict normal vs. surprising/risky
labeled data (if it is available)
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
Internet of Things (IoT) data is heavily signal data
http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data
Convolutional Neural Net (CNN)

Enables detecting shift invariant patterns
In Speech and Image applications, patterns vary by size and can be shifted right or left.
Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.
Neural Nets can be explicitly trained to perform an FFT (Fast Fourier Transform)
to convert data from the time domain to the frequency domain – but typically an explicit FFT is used.
Internet of
Things Signal
Data
Convolutional Neural Net (CNN)

Enables detecting shift invariant patterns
In Speech and Image applications, patterns vary by size and can be shifted right or left.
Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.
Solution: use a sliding convolution to detect the pattern.
A CNN can use very long observation windows, up to 400 ms, for long context.
Convolution – Shift Horizontal
•  SAME 25 WEIGHTS FEED INTO EACH OUTPUT
•  Backpropagation weight update is averaged
•  Otherwise NO convolution and HUGE complexity!
Max pooling
Layer output = 1.2
Convolution
https://en.wikipedia.org/wiki/Convolution
Convolution – Shift Horizontal & Vertical
Max pooling
Layer output
= 0.8
Convolution – 3 Weight Patterns, Shifted 2D
[Diagram labels:
•  Hidden Layer 1 output sections per convolution: 3 × (10x3) – the detection layer
•  Hidden Layer 1 weights per convolution pattern
•  Input pixels, audio, video or IoT signal (14 x 7); convolutions can be over 3+ dimensions (video frames, time invariance)
•  Max pooling layer outputs = 0.8, 1.0, 0.9]
Convolution Neural Net (CNN)

Same Low Level Features can support different outputs
http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning
The previous slides showed training this Hidden Layer 1. The same training process is used for later hidden layers, one at a time. Think of fraud detection: higher level node patterns.
Convolution Neural Net: 

from LeNet-5
Gradient-Based Learning Applied to Document Recognition
Proceedings of the IEEE, Nov 1998
Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner
Director, Facebook AI Research
http://yann.lecun.com/
Can do some size invariance, but it adds to the layers
Convolution Neural Net (CNN)
•  How is a CNN trained differently than a typical back
propagation (BP) network?
–  Parts of the training which are the same:
•  Present input record
•  Forward pass through the network
•  Back propagate error (i.e. per epoch)
–  Different parts of training:
•  Some connections are CONSTRAINED to the same value
–  The connections for the same pattern, sliding over all of the input space
•  Error updates are averaged and applied equally to the one set of weight values
•  End up with the same pattern detector feeding many nodes at the next level (see the sketch after this slide)
http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf
Convolutional Deep Belief Networks for Scalable
Unsupervised Learning of Hierarchical Representations, 2009
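A minimal NumPy sketch of the weight-sharing constraint described above: one 5x5 pattern slides over the whole input, and the per-position weight gradients are averaged into the single shared weight set. The sizes, the random "error signal", and the learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((14, 7))          # input signal: pixels, audio frames, or IoT samples
kernel = rng.normal(size=(5, 5))     # ONE shared 5x5 pattern (the "same 25 weights")

def convolve(img, k):
    """Slide the same weights over every position (valid convolution)."""
    rows = img.shape[0] - k.shape[0] + 1
    cols = img.shape[1] - k.shape[1] + 1
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(img[i:i+5, j:j+5] * k)
    return out

feature_map = convolve(image, kernel)   # 10 x 3 detection layer for this pattern
pooled = feature_map.max()              # max pooling: keep the strongest match

# Backprop sketch: every output position has an error gradient, but there is only
# ONE weight set, so the per-position weight gradients are averaged before updating.
grad_out = rng.normal(size=feature_map.shape)      # stand-in for the upstream error
grad_kernel = np.zeros_like(kernel)
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        grad_kernel += grad_out[i, j] * image[i:i+5, j:j+5]
grad_kernel /= feature_map.size                    # averaged update for the shared weights
kernel -= 0.01 * grad_kernel
```

Sharing one weight set across all positions is exactly what keeps the complexity from exploding while giving the shift invariance the slides describe.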
The Mammalian Visual Cortex is Hierarchical

(The Brain is a Deep Neural Net - Yann LeCun)
http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
Convolution Neural Net (CNN)

Facebook example
https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/
Convolution Neural Net (CNN)

Yahoo + Stanford example – find a face in a pic, even upside down
http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html
Convolutional Neural Nets (CNN)

Robotic Grasp Detection (IoT)
http://pjreddie.com/media/files/papers/grasp_detection_1.pdf
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
Real Time Scoring

Neural Net Optimizations
•  Auto-Encoding nets
–  Can grow to millions of connections, and start to get computationally expensive
–  Can reduce connections by 5% to 25+% with pruning & retraining
•  Train with increased regularization settings
•  Drop connections with near zero weights, then retrain (see the sketch after this slide)
•  Drop nodes whose fan-in connections don't get used much later, such as in your predictive problem
•  Perform sensitivity analysis – delete possible input fields
•  Convolutional Neural Nets
–  With large enough data, can even skip the FFT preprocessing step
–  Can use wider than 10ms audio sampling rates for speed up
•  Implement other preprocessing as lookup tables (i.e. Bayesian
Priors)
•  Use cloud computing, do not limit to device computing
•  Large models don't fit → use model or data parallelism to train
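A minimal sketch of the prune-and-retrain idea above: zero out the connections whose weights are near zero, keep a mask, and retrain the survivors. The weight matrix, the 10% drop fraction, and the retraining loop are illustrative assumptions.

```python
import numpy as np

def prune_small_weights(W, drop_fraction=0.10):
    """Zero out the smallest-magnitude fraction of connections; return pruned weights and a keep-mask."""
    cutoff = np.quantile(np.abs(W), drop_fraction)   # e.g. drop the smallest 10%
    mask = np.abs(W) > cutoff
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 180))              # stands in for a trained layer's weights
W_pruned, keep = prune_small_weights(W, 0.10)

# During retraining, re-apply the mask after every update so dropped connections stay dropped:
# W_pruned = (W_pruned - lr * grad) * keep
```

Fewer live connections means fewer multiply-adds per scored record, which is where the real-time speedup comes from.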
Real Time Scoring – Enterprise App Architecture uses

Lambda Architecture – for both Batch and Real Time
•  First architecture to really define how batch and stream processing can work together
•  Founded on the concepts of immutability and re-computation, with human fault tolerance
•  Pre-computes the results of batch & real-time processes as a set of views, & query layer
merges the views
https://en.wikipedia.org/wiki/Lambda_architecture
[Lambda architecture diagram: New Data A and New Data B feed both a Batch Layer (Master Data → Batch Processing → Batch Views) and a Speed Layer (Speed Processing → Speed Views → Real-time Analytics); a Query layer merges the Batch and Speed Views into Merged Views.]
Real Time Scoring

Lambda Architecture With Kamanja

[Architecture diagram labels: Decisioning Layer, Batch Layer, Speed Layer, Serving Layer; Master Data, Batch Processing, Speed Processing, Real-time Analytics, Continuous Decisioning, Continuous Feedback; Batch Views, Speed Views, Merged Views, Query, Action Queue, Model Data, Decisioning Data; All New Data arriving via Kafka, MQ, Files, AVRO/Flume; technologies: HDFS, Spark, MapReduce, Spark Streaming, Storm, Cassandra, HBase, Druid, Elephant DB, Impala; models in PMML, Java, Scala, Python, Deep Learning.]
Kamanja Technology Stack
•  Compute Fabric: Cloud, EC2, Internal Cloud
•  Security: Kerberos
•  Real Time Streaming: Kafka, MQ, Spark*, LigaData
•  Data Store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
•  Resource Management: Zookeeper, Yarn*
•  High Level Languages / Abstractions: PMML Producers, MLlib
•  Real Time Computing: Kamanja (PMML, Java, Scala)
Deep Learning Tools
Deep Learning Tools
By Google, 600 DL projects
Speech
Google Photos
Translation
Gmail
Search
Rajat Monga, Tech Lead & Manager for TensorFlow
Deep Learning Tools
https://www.tensorflow.org/versions/0.6.0/get_started/index.html
Python code to make up data in two dimensions and then fit it (see the sketch below)
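For reference, the kind of code that getting-started page shows: generate points along a line and fit W and b by gradient descent. This is a paraphrase in the TensorFlow 0.x-era API (Session, initialize_all_variables); current TensorFlow versions use a different, eager API, so treat it as illustrative rather than copy-paste.

```python
import numpy as np
import tensorflow as tf

# Make up data in two dimensions: points along y = 0.1 * x + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = 0.1 * x_data + 0.3

# Model: y = W * x + b, with W and b learned by gradient descent
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

loss = tf.reduce_mean(tf.square(y - y_data))
train = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

sess = tf.Session()
sess.run(tf.initialize_all_variables())
for step in range(201):
    sess.run(train)
    if step % 50 == 0:
        print(step, sess.run(W), sess.run(b))   # W converges toward ~0.1, b toward ~0.3
```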
Deep Learning Tools
www.Kamanja.org
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring and Lambda Architecture
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
Reinforcement Learning (RL)
•  Different than supervised and unsupervised learning
•  Q) Can the network figure out how to take one or more
actions NOW, to achieve a reward or payout (potentially
far-off, i.e. T steps in the FUTURE)?
•  Need to solve the credit assignment problem
–  There is no teacher and very little labeled data
–  Need to learn the best POLICY that will achieve the best outcome
–  Assume no knowledge of the process model or reward function
•  Next guess =
–  a linear combination of (the current guess) and
(the new reward info just collected), weighted by the learning rate (see the sketch after this slide)
http://www.humphreysheil.com/blog/gorila-google-reinforcement-learning-architecture
http://robotics.ai.uiuc.edu/~scandido/?Developing_Reinforcement_Learning_from_the_Bellman_Equation
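A minimal sketch of that update rule as tabular Q-learning: the "next guess" blends the current guess with the newly observed reward plus discounted future value, weighted by the learning rate. The state/action sizes and the toy environment are hypothetical.

```python
import numpy as np

n_states, n_actions = 10, 5
Q = np.zeros((n_states, n_actions))      # current guesses of long-term reward per (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.1    # learning rate, discount factor, exploration rate

def step(state, action):
    """Hypothetical environment: returns (reward, next_state)."""
    return np.random.randn(), (state + action) % n_states

rng = np.random.default_rng(0)
state = 0
for _ in range(10000):
    # Explore sometimes; otherwise act on the current best guess (the learned policy)
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    reward, next_state = step(state, action)
    # Next guess = (1 - alpha) * current guess + alpha * (new reward info just collected)
    target = reward + gamma * Q[next_state].max()
    Q[state, action] += alpha * (target - Q[state, action])
    state = next_state
```

The deep RL systems on the following slides replace the Q table with a deep net that maps the last few screen shots to a Q value per action.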
Deep Reinforcement Learning (RL),
Q-Learning
http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
https://en.wikipedia.org/wiki/Reinforcement_learning
https://en.wikipedia.org/wiki/Q-learning
Think in terms of IoT: a device agent measures and infers the user's action,
maximizes future reward, and recommends to the user or system.
Deep Reinforcement Learning, Q-Learning

(Think about IoT possibilities)
http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf
David Silver, Google DeepMind
Use
last 4
screen
shots
Deep Reinforcement Learning, Q-Learning
http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
Use 4
screen
shots
Use 4 screen shots
IoT challenge: How to replace game
score with IoT score?
Shift right fast
shift right
stay
shift left
shift left fast
Deep Reinforcement Learning, Q-Learning
http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind
Games w/ best Q-learning
Video Pinball
Breakout
Star Gunner
Crazy Climber
Gopher
•  Big Picture of 2016 Technology
•  Neural Net Basics
•  Deep Network Configurations for Practical Applications
–  Auto-Encoder (i.e. data compression or Principal Components
Analysis)
–  Convolutional (shift invariance in time or space for voice, image or IoT)
–  Real Time Scoring
–  Deep Net libraries and tools (Theano, Torch, TensorFlow, ...
Kamanja)
–  Reinforcement Learning, Q-Learning (i.e. beat people at Atari games,
IoT)
–  Continuous Space Word Models (i.e. word2vec)
Deep Learning - Outline
Continuous Space Word Models (word2vec)
•  Before (a predictive “Bag of Words” model):
–  One row per document, paragraph or web page
–  Binary word space: 10k to 200k columns, one per word or phrase
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 …. "This word space model is …."
–  The “Bag of words model” relates input record to a target category
Continuous Space Word Models (word2vec)
•  Before (a predictive “Bag of Words” model):
–  One row per document, paragraph or web page
–  Binary word space: 10k to 200k columns, one per word or phrase
0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 …. “This word space model is ….”
–  The “Bag of words model” relates input record to a target category
•  New:
–  One row per word (word2vec), possibly per sentence (sent2vec)
–  Continuous word space: 100 to 300 columns, continuous values
.01 .05 .02 .00 .00 .68 .01 .01 .35 ... .00 → "King"
.00 .00 .05 .01 .49 .52 .00 .11 .84 ... .01 → "Queen"
–  The deep net training resulted in an Emergent Property (see the sketch after this slide):
•  Numeric geometry location relates to concept space
•  "King" – "man" + "woman" = "Queen" (math to change the gender relation)
•  "Washington DC" – "USA" + "England" = "London" (math for the capital relation)
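A minimal sketch of that vector arithmetic using gensim and the pre-trained Google News vectors mentioned two slides later. The file name, local path, and exact vocabulary tokens are assumptions; any word2vec-format vector file works.

```python
from gensim.models import KeyedVectors

# Pre-trained 300-dimensional Google News vectors (path and file name are assumptions)
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# "king" - "man" + "woman" should land near "queen" in the continuous word space
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Capital-city relation: subtract the country, add another country, land near its capital
# (the exact tokens available depend on the vocabulary in the pre-trained file)
print(vectors.most_similar(positive=["London", "USA"], negative=["England"], topn=3))
```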
Continuous Space Word Models (word2vec)

How to SCALE to larger vocabularies?
http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
Training Continuous Space Word Models
•  How to Train These Models?
–  Raw data: “This example sentence shows the word2vec model
training.”
Training Continuous Space Word Models
•  How to Train These Models?
–  Raw data: “This example sentence shows the word2vec model
training.”
–  Training data (with target values underscored, and other words as
input)
“This example sentence shows word2vec” (prune “the”)
“example sentence shows word2vec model”
“sentence shows word2vec model training”
–  The context of the 2 to 5 prior and following words predicts the middle word
–  Deep Net model architecture, data compression to 300 continuous nodes
•  50k binary word input vector → ... → 300 → ... → 50k word target vector
(see the sketch after this slide for constructing the context/target training pairs)
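A minimal sketch of building (context → target) training pairs the way the slide describes, with a window of 2 words on each side. The tokenization and the one-word stop list are deliberately simple assumptions; this illustrates the data layout, not the full word2vec training.

```python
def context_target_pairs(sentence, window=2, stop_words={"the"}):
    """Yield (context words, middle target word) pairs for CBOW-style training."""
    words = [w for w in sentence.lower().split() if w not in stop_words]
    for i, target in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        if context:
            yield context, target

sentence = "This example sentence shows the word2vec model training."
for context, target in context_target_pairs(sentence):
    print(context, "->", target)
# Each pair becomes one training record: the surrounding words are the inputs,
# and the middle word is the target the net learns to predict.
```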
Training Continuous Space Word Models
•  Use Pre-Trained Models https://code.google.com/p/word2vec/
–  Trained on 100 billion words from Google News
–  300 dim vectors for 3 million words and phrases
–  https://code.google.com/p/word2vec/
•  Questions on re-use:
–  What if I want to train to add client terms or docs?
–  What about stability (keeping past training) vs. plasticity (learning new content)?
Training Continuous Space Word Models
http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
Applying Continuous Space Word Models
http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf
State of the art in machine translation
Sequence to Sequence Learning with Neural Networks, NIPS 2014
Language translation
Document summary
Generate text captions for pictures
[Sample word vector values: .01 .05 .89 .00 .05 .62 .00 .34]
“Greg’s Guts” on Deep Learning
•  Some claim the need for preprocessing and knowledge
representation has ended
–  For most of the signal processing applications → yes, simplify
–  I am VERY READY TO COMPETE in other applications, continuing
•  expressing explicit domain knowledge – using lookup data for context
•  optimizing business value calculations
•  Deep Learning gets big advantages from big data
–  Why? Big data better populates the subsets of combinations in high-dimensional space
–  Unsupervised feature extraction reduces need for large labeled data
•  However, “regular sized data” gets a big boost as well
–  The “ratio of free parameters” (i.e. neurons) to training set records
–  For regressions or regular nets, want 5-10 times as many records
–  Regularization and weight dropout reduce this pressure
–  Especially when only training “the next auto encoding layer”
Deep Learning Summary – IT'S EXCITING!
•  Discussed Deep Learning architectures
–  Auto Encoder, convolutional, reinforcement learning, continuous word
•  Real Time speed up
–  Train model, reduce complexity, retrain
–  Simplify preprocessing with lookup tables
–  Use cloud computing, do not be limited to device computing
–  Lambda architecture like Kamanja, to combine real time and batch
•  Applications
–  Fraud detection
–  Signal Data: IoT, Speech, Images
–  Control System models (like Atari game playing, IoT)
–  Language Models
https://www.quora.com/Why-is-deep-learning-in-such-demand-now
© 2016 LigaData, Inc. All Rights Reserved.
Using Deep Learning to
do Real-Time Scoring in
Practical Applications
SFbayACM Data Science Meetup Monday 1/25/2016
By Greg Makowski
www.Linkedin.com/in/GregMakowski
greg@LigaDATA.com
Community @ http://Kamanja.org (try it out)

More Related Content

What's hot

Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
Renārs Liepiņš
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
Anirudh Koul
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
Grigory Sapunov
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
Ha Phuong
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
Asim Jalis
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
Thomas da Silva Paula
 
Deep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachDeep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles Approach
Maurizio Calo Caligaris
 
ODSC West
ODSC WestODSC West
ODSC West
Intel Nervana
 
(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning
Amazon Web Services
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlow
S N
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
Roelof Pieters
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
inside-BigData.com
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
Josh Patterson
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
Databricks
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
Amazon Web Services
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
Te-Yen Liu
 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
Sri Ambati
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
Poo Kuan Hoong
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
David Khosid
 
Deep learning at nmc devin jones
Deep learning at nmc devin jones Deep learning at nmc devin jones
Deep learning at nmc devin jones
Ido Shilon
 

What's hot (20)

Deep Learning and Reinforcement Learning
Deep Learning and Reinforcement LearningDeep Learning and Reinforcement Learning
Deep Learning and Reinforcement Learning
 
Squeezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile PhonesSqueezing Deep Learning Into Mobile Phones
Squeezing Deep Learning Into Mobile Phones
 
Deep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image ProcessingDeep Learning Cases: Text and Image Processing
Deep Learning Cases: Text and Image Processing
 
Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)Deep Learning And Business Models (VNITC 2015-09-13)
Deep Learning And Business Models (VNITC 2015-09-13)
 
Neural Networks and Deep Learning
Neural Networks and Deep LearningNeural Networks and Deep Learning
Neural Networks and Deep Learning
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
Deep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles ApproachDeep Learning Primer: A First-Principles Approach
Deep Learning Primer: A First-Principles Approach
 
ODSC West
ODSC WestODSC West
ODSC West
 
(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning(BDT311) Deep Learning: Going Beyond Machine Learning
(BDT311) Deep Learning: Going Beyond Machine Learning
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlow
 
Deep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog DetectorDeep Learning as a Cat/Dog Detector
Deep Learning as a Cat/Dog Detector
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
Distributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNetDistributed Deep Learning on AWS with Apache MXNet
Distributed Deep Learning on AWS with Apache MXNet
 
Machine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis IntroductionMachine Learning, Deep Learning and Data Analysis Introduction
Machine Learning, Deep Learning and Data Analysis Introduction
 
H2O Deep Learning at Next.ML
H2O Deep Learning at Next.MLH2O Deep Learning at Next.ML
H2O Deep Learning at Next.ML
 
Machine Learning and Deep Learning with R
Machine Learning and Deep Learning with RMachine Learning and Deep Learning with R
Machine Learning and Deep Learning with R
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
Deep learning at nmc devin jones
Deep learning at nmc devin jones Deep learning at nmc devin jones
Deep learning at nmc devin jones
 

Similar to Using Deep Learning to do Real-Time Scoring in Practical Applications

Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
Databricks
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
Julien SIMON
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Databricks
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
Shivaji Dutta
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
Eran Shlomo
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Jen Aman
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
Balázs Hidasi
 
Machine Learning in Action
Machine Learning in ActionMachine Learning in Action
Machine Learning in Action
Amazon Web Services
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
Amanda Mackay (she/her)
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
NVIDIA Taiwan
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
PyData
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
Data Con LA
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Herman Wu
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
Ivo Andreev
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
DataStax Academy
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Wee Hyong Tok
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
Travis Oliphant
 
MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
Amazon Web Services
 

Similar to Using Deep Learning to do Real-Time Scoring in Practical Applications (20)

Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
The deep learning tour - Q1 2017
The deep learning tour - Q1 2017 The deep learning tour - Q1 2017
The deep learning tour - Q1 2017
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
Deep learning: the future of recommendations
Deep learning: the future of recommendationsDeep learning: the future of recommendations
Deep learning: the future of recommendations
 
Machine Learning in Action
Machine Learning in ActionMachine Learning in Action
Machine Learning in Action
 
Amazon Deep Learning
Amazon Deep LearningAmazon Deep Learning
Amazon Deep Learning
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr TeterwakLearn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
Learn to Build an App to Find Similar Images using Deep Learning- Piotr Teterwak
 
Explore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and SnappydataExplore big data at speed of thought with Spark 2.0 and Snappydata
Explore big data at speed of thought with Spark 2.0 and Snappydata
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
C* Summit 2013: Real-time Analytics using Cassandra, Spark and Shark by Evan ...
 
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI ProjectsDiscovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
Discovering Your AI Super Powers - Tips and Tricks to Jumpstart your AI Projects
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
MXNet Workshop
MXNet WorkshopMXNet Workshop
MXNet Workshop
 

More from Greg Makowski

Understanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptx
Greg Makowski
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptx
Greg Makowski
 
A Successful Hiring Process for Data Scientists
A Successful Hiring Process for Data ScientistsA Successful Hiring Process for Data Scientists
A Successful Hiring Process for Data Scientists
Greg Makowski
 
Kdd 2019: Standardizing Data Science to Help Hiring
Kdd 2019:  Standardizing Data Science to Help HiringKdd 2019:  Standardizing Data Science to Help Hiring
Kdd 2019: Standardizing Data Science to Help Hiring
Greg Makowski
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and software
Greg Makowski
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Greg Makowski
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
Greg Makowski
 
SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24
Greg Makowski
 
How to Create 80% of a Big Data Pilot Project
How to Create 80% of a Big Data Pilot ProjectHow to Create 80% of a Big Data Pilot Project
How to Create 80% of a Big Data Pilot Project
Greg Makowski
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Greg Makowski
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Greg Makowski
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
Greg Makowski
 
Three case studies deploying cluster analysis
Three case studies deploying cluster analysisThree case studies deploying cluster analysis
Three case studies deploying cluster analysis
Greg Makowski
 
Linked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 BLinked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 B
Greg Makowski
 
The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)
Greg Makowski
 
The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)
Greg Makowski
 

More from Greg Makowski (16)

Understanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptxUnderstanding Hallucinations in LLMs - 2023 09 29.pptx
Understanding Hallucinations in LLMs - 2023 09 29.pptx
 
Future of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptxFuture of AI - 2023 07 25.pptx
Future of AI - 2023 07 25.pptx
 
A Successful Hiring Process for Data Scientists
A Successful Hiring Process for Data ScientistsA Successful Hiring Process for Data Scientists
A Successful Hiring Process for Data Scientists
 
Kdd 2019: Standardizing Data Science to Help Hiring
Kdd 2019:  Standardizing Data Science to Help HiringKdd 2019:  Standardizing Data Science to Help Hiring
Kdd 2019: Standardizing Data Science to Help Hiring
 
Tales from an ip worker in consulting and software
Tales from an ip worker in consulting and softwareTales from an ip worker in consulting and software
Tales from an ip worker in consulting and software
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Production model lifecycle management 2016 09
Production model lifecycle management 2016 09Production model lifecycle management 2016 09
Production model lifecycle management 2016 09
 
SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24SFbayACM ACM Data Science Camp 2015 10 24
SFbayACM ACM Data Science Camp 2015 10 24
 
How to Create 80% of a Big Data Pilot Project
How to Create 80% of a Big Data Pilot ProjectHow to Create 80% of a Big Data Pilot Project
How to Create 80% of a Big Data Pilot Project
 
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
Powering Real­time Decision Engines in Finance and Healthcare using Open Sour...
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
Three case studies deploying cluster analysis
Three case studies deploying cluster analysisThree case studies deploying cluster analysis
Three case studies deploying cluster analysis
 
Linked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 BLinked In Slides 2009 02 24 B
Linked In Slides 2009 02 24 B
 
The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)The 360º Leader (Section 2 of 6)
The 360º Leader (Section 2 of 6)
 
The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)The 360º Leader (Section 1 of 6)
The 360º Leader (Section 1 of 6)
 

Recently uploaded

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfEnhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
Jay Das
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 

Recently uploaded (20)

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...

Using Deep Learning to do Real-Time Scoring in Practical Applications

  • 1. © 2016 LigaData, Inc. All Rights Reserved. Using Deep Learning to do Real-Time Scoring in Practical Applications SFbayACM Data Science SIG, Monday, 1/25/2016 By Greg Makowski www.Linkedin.com/in/GregMakowski greg@LigaDATA.com Community @ http://Kamanja.org Try out
• 2. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
  • 4. Is Deep Learning Hype? Is this just a “buzzword of the day or year?” Is this improvement at the normal pace?
• 5. Is Deep Learning Hype? Is this just a “buzzword of the day or year?” Is this improvement at the normal pace? NO! Not only a buzzword; this is a leap in the rate of improvement! So What? Show me…
• 6. http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/ Deep Learning Caused about an 18% / Year Reduction in Error in Speech Recognition (Nuance). Not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come. It is no overstatement to say that DNNs were the single largest contributor to innovation across many of our products in recent years.
• 7. http://whatsnext.nuance.com/in-the-labs/what-is-deep-machine-learning/ 2008, 09, 10, 11, 12, 13, 14, 15. Deep Learning Caused about an 18% / Year Reduction in Error in Speech Recognition (Nuance). What if Moore’s Law changed from 2X to 4X over the last 7 years, because of a new technology advance! Not only did DNNs drive error rates down at once, … they promise a lot of potential for the years to come. It is no overstatement to say that DNNs were the single largest contributor to innovation across many of our products in recent years.
• 8. Neural Net training is 10+ times faster on GPUs
The gaming market is pushing for faster GPU speeds https://jonpeddie.com/publications/whitepapers/an-analysis-of-the-gpu-market https://developer.nvidia.com/cudnn
• 9. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
• 10. Advantages of a Net over Regression. A Regression Solution: “Linear”, fit one line. Chart: target values ($ or c) for data points with source field values graphed by “field 1” and “field 2”; showing ONE target field, with values of $ or c. https://en.wikipedia.org/wiki/Regression_analysis
• 11. Advantages of a Net over Regression. A Neural Net Solution: “Non-Linear”, several regions which are not adjacent; hidden nodes can be a line or a circle. (Same chart of $ and c target values over “field 1” and “field 2”.) https://en.wikipedia.org/wiki/Artificial_neural_network
• 12. A Comparison of a Neural Net and Regression. A logistic regression formula: Y = f(a0 + a1*X1 + a2*X2 + a3*X3), where the a* are coefficients. Backpropagation, cast in a similar form: H1 = f(w0 + w1*I1 + w2*I2 + w3*I3); H2 = f(w4 + w5*I1 + w6*I2 + w7*I3); … Hn = f(w8 + w9*I1 + w10*I2 + w11*I3); O1 = f(w12 + w13*H1 + .... + w15*Hn); On = .... The w* are weights, AKA coefficients; I1..In are input nodes or input variables; H1..Hn are hidden nodes, which extract features of the data; O1..On are the outputs, which group disjoint categories. Look at the ratio of training records vs. free parameters (complexity, regularization). A dot product is cosine similarity, used broadly. Tensors are matrices of N dimensions.
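A minimal NumPy sketch of the two forward passes above; the sizes, weights and input record are illustrative placeholders, not values from the slides.

```python
import numpy as np

def f(x):
    """Logistic activation, the f(...) in the slide's formulas."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes: 3 inputs, 2 hidden nodes, 1 output.
I = np.array([0.5, -1.2, 0.3])        # input record I1..I3
a = np.array([0.1, 0.4, -0.3, 0.2])   # regression coefficients a0..a3

# Logistic regression: a single dot product through one non-linearity.
y_reg = f(a[0] + a[1:] @ I)

# The net cast in the same form: each hidden node Hi is its own "regression"
# on the inputs, and the output is a regression on H1..Hn.
W_h = np.array([[0.2, -0.5, 0.1],     # weights into H1
                [-0.3, 0.4, 0.6]])    # weights into H2
b_h = np.array([0.05, -0.1])          # hidden bias terms
W_o = np.array([0.7, -0.2])           # weights into O1
b_o = 0.1

H = f(b_h + W_h @ I)                  # hidden layer = feature extractors
y_net = f(b_o + W_o @ H)              # output layer
print(y_reg, y_net)
```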
• 13. Think of Separating Land vs. Water. 1 line, Regression (more errors); 5 hidden nodes in a Neural Network. Different algorithms use different basis functions: one line; many horizontal & vertical lines; many diagonal lines; circles. Decision tree: 12 splits (more elements, less computation). Q) What is too detailed? “Memorizing the high tide boundary” and applying it at all times.
• 14. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline http://deeplearning.net/ http://www.kdnuggets.com/ http://www.analyticbridge.com/
  • 15. Leading up to an Auto Encoder 
• Supervised Learning – Regression (one layer, one line, one dot-product) • 50 inputs → 1 output – Possible nets: • 256 → 120 → 1 • 256 → 120 → 5 (trees, regression, SVM & most algs are limited to 1 output) • 256 → 120 → 60 → 1 (can try 2 hidden layers, 3 sets of weights) • 256 → 180 → 120 → 60 → 1 (start getting into training stability problems, with 1990’s training processes) • Unsupervised Learning – Clustering (traditional unsupervised): • 60 inputs (no output target); produce 1-2 new (cluster ID & distance)
  • 16. Auto Encoder (like data compression)
Relate input to output through a compressed middle. At each step of training, only train the black connections: Step 1, train the 1st hidden layer (tensor), 256 → 180 → 256; Step 2, train the 2nd hidden layer (tensor). Called an “Auto Encoder” because the input values = the target values; unsupervised, there are no additional target values. “Data compression” because 256 numbers are compressed into 180 numbers.
  • 17. Auto Encoder (like data compression)
Relate input to output, through a compressed middle. • Supervised Learning – Regression, Tree or Net: 50 inputs → 1 output – Possible nets: • 256 → 120 → 1 • 256 → 120 → 5 (trees, regressions, SVM and most are limited to 1 output) • 256 → 120 → 60 → 1 • 256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or training may not finish; layer-wise training is the BREAKTHROUGH provided by DEEP LEARNING) • Unsupervised Learning – Clustering (traditional unsupervised): • 60 inputs (no target); produce 1-2 new (cluster ID & distance) – Unsupervised training of a net: assign (target record == input record), AUTO-ENCODING – Train the net in stages, one layer at a time: first 256 → 180 → 256, then compress the 180-node code toward 120, and so on (4 hidden layers with unsupervised training). Because of symmetry, only need to update the mirrored weights once. – Add a supervised layer at the end to forecast 10 target categories (→ 10), trained with supervised learning. https://en.wikipedia.org/wiki/Deep_learning
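A minimal sketch of greedy layer-wise auto-encoder training using tf.keras; the layer sizes (256 → 180 → 120 and a 10-category supervised layer) come from the slides, while the data, labels, epochs and other hyperparameters are stand-in assumptions.

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(10000, 256).astype("float32")   # stand-in for real records

def train_autoencoder_layer(data, n_hidden, epochs=5):
    """Train one code layer: data -> n_hidden -> data (input == target)."""
    inp = tf.keras.Input(shape=(data.shape[1],))
    code = tf.keras.layers.Dense(n_hidden, activation="sigmoid")(inp)
    recon = tf.keras.layers.Dense(data.shape[1], activation="sigmoid")(code)
    auto = tf.keras.Model(inp, recon)
    auto.compile(optimizer="adam", loss="mse")
    auto.fit(data, data, epochs=epochs, batch_size=128, verbose=0)
    return tf.keras.Model(inp, code)                # keep only the encoder half

# Greedy layer-wise pretraining: 256 -> 180, then 180 -> 120, one layer at a time.
enc1 = train_autoencoder_layer(X, 180)
codes1 = enc1.predict(X, verbose=0)
enc2 = train_autoencoder_layer(codes1, 120)
codes2 = enc2.predict(codes1, verbose=0)

# Optional supervised layer on top (e.g. 10 target categories), as on the slide.
y = np.random.randint(0, 10, size=len(X))           # placeholder labels
clf = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, activation="softmax", input_shape=(120,))])
clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
clf.fit(codes2, y, epochs=5, batch_size=128, verbose=0)
```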
  • 18. Auto Encoder (like data compression)
With Supervised Layers on Top. The unsupervised output is like cluster output: only large values are a match (not a distance). Train the supervised layers on top with regular back propagation, using the unsupervised nodes as input (e.g. 256 → 180 → 120 → 120 → 50 → output). The target is specific to the problem: fraud risk * $, or cat, dog, human, other, with 1, 2, 10 or more output nodes.
  • 19. Auto Encoder
How it can be generally used to solve problems. • Add a supervised layer to forecast 10 target categories – 4 hidden layers trained with unsupervised training, – 1 new layer, trained with supervised learning → 10 • Outlier detection • The “activation” at each of the 120 output nodes indicates the “match” to that cluster or compressed feature • When scoring new records, can detect outliers with a process like: if (max_output_match < 0.333) then suspected outlier (see the sketch below) • How is it like PCA? – Individual hidden nodes in the same layer are “different” or “orthogonal”
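A minimal sketch of the outlier rule above; the activations array is a stand-in for the top unsupervised layer’s outputs when scoring new records, and 0.333 is the slide’s example threshold.

```python
import numpy as np

def score_outliers(activations, threshold=0.333):
    """
    activations: shape (n_records, 120), the top unsupervised layer outputs
    (each node is roughly a cluster / compressed feature). A record whose best
    "match" is weak is a suspected outlier:
        if (max_output_match < threshold) then suspected outlier.
    """
    best_match = activations.max(axis=1)
    return best_match < threshold

acts = np.random.rand(5, 120)            # stand-in for real scored activations
print(score_outliers(acts))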
  • 20. Fraud Detection Example using
Deep Learning – auto encoders. • Unsupervised Learning of Normal Behavior (Outlier Detection) – May want to preprocess transaction data in the context of the person’s past normal behavior (see the sketch after this slide): • 0..1, where 1 is the most SURPRISING way for that person to act • 0..1, where 1 is the most RISKY for fraud • General, descriptive attributes that can be used for interactions • Filter the most surprising & risky records out of the training data – want the net to learn “normal” records – Train 5-10 layers deep, end up with 50 to 100+ nodes at the end – Score records on membership in the final nodes • Transactions that are far from all final nodes are candidates for outliers • Validate against existing surprising & risky cases. Add application post-processing • Supervised Learning – Add two layers on top; train to predict normal vs. surprising/risky labeled data (if it is available)
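The slide does not give a formula for the 0..1 “surprising for that person” preprocessing; one simple, assumed reading is an empirical-percentile distance from the person’s own history, sketched below with made-up transaction amounts.

```python
import numpy as np

def surprise_score(history, new_value):
    """
    0..1 surprise for one field of a new transaction, relative to this
    person's own past values (1.0 = most surprising for THIS person).
    Assumption: distance of the empirical percentile from the median.
    """
    history = np.asarray(history, dtype=float)
    pct = (history < new_value).mean()          # empirical percentile in [0, 1]
    return abs(pct - 0.5) * 2.0                 # ~0 near typical, ~1 at extremes

past_amounts = [12.50, 8.99, 15.00, 9.75, 11.20, 14.10]
print(surprise_score(past_amounts, 10.50))      # typical amount -> low surprise
print(surprise_score(past_amounts, 480.00))     # extreme amount -> ~1.0
```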
• 21. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
  • 22. Internet of Things (IoT) is heavily signal data http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data
  • 23. Convolutional Neural Net (CNN)
Enables detecting shift-invariant patterns. In speech and image applications, patterns vary in size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern. Neural nets can be explicitly trained to provide an FFT (Fast Fourier Transform) to convert data from the time domain to the frequency domain – but typically an explicit FFT is used. Internet of Things signal data.
  • 24. Convolutional Neural Net (CNN)
Enables detecting shift-invariant patterns. In speech and image applications, patterns vary in size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern. Solution: use a sliding convolution to detect the pattern (a small sketch follows below). CNNs can use very long observational windows, up to 400 ms, for long context.
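A minimal NumPy sketch of the sliding-convolution idea: the same weight pattern is applied at every offset and max-pooled, so the same bump is detected whether it appears early or late in the signal. The pattern and signals are illustrative.

```python
import numpy as np

pattern = np.array([1.0, 2.0, 1.0])          # one learned weight pattern (shared)

def sliding_convolution(signal, weights):
    """Apply the SAME weights at every offset (shift invariance), then max-pool."""
    n = len(weights)
    responses = np.array([signal[i:i + n] @ weights
                          for i in range(len(signal) - n + 1)])
    return responses, responses.max()          # max pooling over all shifts

# The same bump appears at different offsets; the max-pooled response is the same.
early = np.array([0, 1, 2, 1, 0, 0, 0, 0, 0], dtype=float)
late  = np.array([0, 0, 0, 0, 0, 1, 2, 1, 0], dtype=float)
print(sliding_convolution(early, pattern)[1])   # 6.0
print(sliding_convolution(late, pattern)[1])    # 6.0 -> shifted, same detection
```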
• 25. Convolution – Shift Horizontal • The SAME 25 WEIGHTS FEED INTO EACH OUTPUT • The backpropagation weight update is averaged • Otherwise there is NO convolution and HUGE complexity! Max pooling layer output = 1.2
  • 27. Convolution – Shift Horizontal & Vertical Max pooling Layer output = 0.8
• 28. Convolution – 3 Weight Patterns, Shifted. Figure: input pixels, audio, video or IoT signal (14 x 7); hidden layer 1 has one set of weights per convolution pattern and 3 (10x3) output sections per convolution (the detection layer); max pooling layer outputs of 0.8, 1.0 and 0.9. Convolutions can be over 3+ dimensions (video frames, time invariance).
• 29. Convolutional Neural Net (CNN): the same low-level features can support different outputs. http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning The previous slides showed training this Hidden 1 layer; the same training process applies to later hidden layers, one at a time. Think of fraud detection as higher-level node patterns.
• 30. Convolutional Neural Net: from LeNet-5. Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, Nov 1998. Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner (LeCun is now Director, Facebook AI Research). http://yann.lecun.com/ Can do some size invariance, but it adds to the layers.
  • 31.
• 32. Convolutional Neural Net (CNN) • How is a CNN trained differently from a typical back propagation (BP) network? – Parts of the training that are the same: • Present an input record • Forward pass through the network • Back propagate error (i.e. per epoch) – Different parts of training: • Some connections are CONSTRAINED to the same value – the connections for the same pattern, sliding over all of the input space • Error updates are averaged and applied equally to the one set of weight values (see the sketch below) • End up with the same pattern detector feeding many nodes at the next level. http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, 2009
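A minimal sketch of the “constrained weights, averaged updates” point: each position produces its own gradient for its copy of the shared pattern, and because the copies are tied, the gradients are averaged and applied once. All numbers are illustrative.

```python
import numpy as np

# One shared 3-value pattern applied at 4 positions of the input.
shared_weights = np.array([0.5, -0.2, 0.1])
per_position_grads = np.array([[0.04, -0.01, 0.02],   # gradient at position 0
                               [0.02,  0.00, 0.01],   # gradient at position 1
                               [0.03, -0.02, 0.00],   # gradient at position 2
                               [0.01,  0.01, 0.01]])  # gradient at position 3

learning_rate = 0.1
avg_grad = per_position_grads.mean(axis=0)            # average over positions
shared_weights -= learning_rate * avg_grad            # one update for all copies
print(shared_weights)
```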
  • 33. The Mammalian Visual Cortex is Hierarchical
(The Brain is a Deep Neural Net - Yann LeCun) http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
  • 34. Convolution Neural Net (CNN)
 Facebook example https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/
  • 35. Convolution Neural Net (CNN)
 Yahoo + Stanford example – find a face in a pic, even upside down http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html
  • 36. Convolutional Neural Nets (CNN)
 Robotic Grasp Detection (IoT) http://pjreddie.com/media/files/papers/grasp_detection_1.pdf
• 37. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
  • 38. Real Time Scoring
Neural Net Optimizations • Auto-encoding nets – Can grow to millions of connections and become computationally expensive – Can reduce connections by 5% to 25+% with pruning & retraining (see the sketch below) • Train with increased regularization settings • Drop connections with near-zero weights, then retrain • Drop nodes whose fan-in connections are not used much later, such as in your predictive problem • Perform sensitivity analysis – delete possible input fields • Convolutional neural nets – With large enough data, can even skip the FFT preprocessing step – Can use wider than 10 ms audio sampling rates for a speed-up • Implement other preprocessing as lookup tables (i.e. Bayesian priors) • Use cloud computing, do not limit to device computing • Large models don’t fit → use model or data parallelism to train
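A minimal sketch of magnitude-based pruning of near-zero connections; the weight tensor and drop fraction are assumptions, and retraining with the mask applied is only indicated in a comment.

```python
import numpy as np

def prune_small_weights(W, drop_fraction=0.10):
    """
    Zero out the smallest-magnitude connections (the slide suggests 5% to 25+%
    can be dropped, followed by retraining). Returns the pruned matrix and a
    mask used to keep those connections at zero while retraining.
    """
    threshold = np.quantile(np.abs(W), drop_fraction)
    mask = np.abs(W) > threshold
    return W * mask, mask

W = np.random.randn(180, 256) * 0.05        # stand-in for a trained weight tensor
W_pruned, keep_mask = prune_small_weights(W, drop_fraction=0.20)
print(f"connections kept: {keep_mask.mean():.0%}")
# Retrain with the mask applied after each update, e.g. W = (W - lr * grad) * keep_mask
```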
  • 39. © 2016 LigaData, Inc. All Rights Reserved. 39 Real Time Scoring – Enterprise App Architecture uses
Lambda Architecture – for both Batch and Real Time • First architecture to really define how batch and stream processing can work together • Founded on the concepts of immutability and re-computation, with human fault tolerance • Pre-computes the results of batch & real-time processes as a set of views, and the query layer merges the views (a small sketch of this merge follows below). https://en.wikipedia.org/wiki/Lambda_architecture Diagram components: new data A and B; a batch layer (master data, batch processing, batch views); a speed layer (speed processing, speed views); merged views served to the query for real-time analytics.
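A minimal sketch (not Kamanja’s API) of the Lambda idea: the query layer merges a precomputed batch view with a speed view that covers events since the last batch run. The customer id, field names and counts are placeholders.

```python
# Batch view: recomputed from the immutable master data set (e.g. nightly).
batch_view = {"cust_42": {"txn_count": 1200, "as_of": "2016-01-24"}}
# Speed view: incremental counts from the stream since the last batch run.
speed_view = {"cust_42": {"txn_count": 3}}

def query(customer_id):
    """Merged view served to real-time analytics."""
    batch = batch_view.get(customer_id, {}).get("txn_count", 0)
    recent = speed_view.get(customer_id, {}).get("txn_count", 0)
    return batch + recent

print(query("cust_42"))            # 1203
```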
• 40. © 2016 LigaData, Inc. All Rights Reserved. Real Time Scoring: Lambda Architecture with Kamanja. Diagram components: all new data arrives via Kafka, MQ, files and AVRO/Flume; a batch layer (HDFS, Spark, MapReduce) over the master data produces batch views; a speed layer (Spark Streaming, Storm) produces speed views; a Kamanja continuous-decisioning layer (PMML, Java, Scala, Python, Deep Learning, with decisioning data, model data and continuous feedback) feeds an action queue via Kafka/MQ; a serving layer (stores such as Cassandra, HBase, Druid, Elephant DB, Impala) merges batch and speed views for real-time analytics.
• 41. © 2016 LigaData, Inc. All Rights Reserved. Kamanja Technology Stack: Compute Fabric (Cloud, EC2, Internal Cloud); Security (Kerberos); Real Time Streaming (Kafka, MQ, Spark*); LigaData Data Store (HBase, Cassandra, InfluxDB, HDFS; create adaptors to integrate others); Resource Management (Zookeeper, Yarn*); High Level Languages / Abstractions (PMML Producers, MLlib); Real Time Computing (Kamanja: PMML, Java, Scala).
• 43. Deep Learning Tools: by Google, 600 deep learning projects (Speech, Google Photos, Translation, Gmail, Search). Rajat Monga, Tech Lead & Manager for TensorFlow.
• 46. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring and Lambda Architecture – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
• 47. Reinforcement Learning (RL) • Different from supervised and unsupervised learning • Q) Can the network figure out how to take one or more actions NOW, to achieve a reward or payout (potentially far off, i.e. T steps in the FUTURE)? • Need to solve the credit assignment problem – There is no teacher and very little labeled data – Need to learn the best POLICY that will achieve the best outcome – Assume no knowledge of the process model or reward function • Next guess = a linear combination of (the current guess) and (the new reward info just collected), weighted by the learning rate (see the Q-learning update sketch below). http://www.humphreysheil.com/blog/gorila-google-reinforcement-learning-architecture http://robotics.ai.uiuc.edu/~scandido/?Developing_Reinforcement_Learning_from_the_Bellman_Equation
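A minimal tabular Q-learning update matching the “next guess” rule above: new estimate = (1 - alpha) * current estimate + alpha * (reward + discounted best future value). The states, actions, reward and hyperparameters are illustrative placeholders.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.9                      # learning rate, discount factor
Q = defaultdict(float)                        # Q[(state, action)] -> value estimate
actions = ["left", "stay", "right"]

def update(state, action, reward, next_state):
    """One Q-learning update after the agent acts and observes the outcome."""
    best_future = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = ((1 - alpha) * Q[(state, action)]
                          + alpha * (reward + gamma * best_future))

# One simulated step: the agent acted, observed a reward, and a new state.
update(state="s0", action="right", reward=1.0, next_state="s1")
print(Q[("s0", "right")])
```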
• 48. Deep Reinforcement Learning (RL), Q-Learning. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind. https://en.wikipedia.org/wiki/Reinforcement_learning https://en.wikipedia.org/wiki/Q-learning Think in terms of IoT: a device agent measures and infers the user’s action, maximizes future reward, and recommends to the user or system.
  • 49. Deep Reinforcement Learning, Q-Learning
 (Think about IoT possibilities) http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind Use last 4 screen shots
• 50. Deep Reinforcement Learning, Q-Learning. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind. Use the last 4 screen shots as input; the outputs are the actions (shift left fast, shift left, stay, shift right, shift right fast). IoT challenge: how to replace the game score with an IoT score?
• 51. Deep Reinforcement Learning, Q-Learning. http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind. Games with the best Q-learning results: Video Pinball, Breakout, Star Gunner, Crazy Climber, Gopher.
• 52. • Big Picture of 2016 Technology • Neural Net Basics • Deep Network Configurations for Practical Applications – Auto-Encoder (i.e. data compression or Principal Components Analysis) – Convolutional (shift invariance in time or space for voice, image or IoT) – Real Time Scoring – Deep Net libraries and tools (Theano, Torch, TensorFlow, ... Kamanja) – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT) – Continuous Space Word Models (i.e. word2vec) Deep Learning - Outline
  • 53. Continuous Space Word Models (word2vec) •  Before (a predictive “Bag of Words” model): –  One row per document, paragraph or web page –  Binary word space: 10k to 200k columns, one per word or phrase 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 …. “This word space model is ….” –  The “Bag of words model” relates input record to a target category
• 54. Continuous Space Word Models (word2vec) • Before (a predictive “Bag of Words” model): – One row per document, paragraph or web page – Binary word space: 10k to 200k columns, one per word or phrase, e.g. 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 …. for “This word space model is ….” – The “bag of words” model relates an input record to a target category • New: – One row per word (word2vec), possibly per sentence (sent2vec) – Continuous word space: 100 to 300 columns, continuous values, e.g. .01 .05 .02 .00 .00 .68 .01 .01 .35 ... .00 → “King”; .00 .00 .05 .01 .49 .52 .00 .11 .84 ... .01 → “Queen” – The deep net training resulted in an emergent property: • Numeric geometry location relates to concept space • “King” – “man” + “woman” = “Queen” (math to change the gender relation) • “USA” – “Washington DC” + “England” = “London” (math for the capital relation) (a small analogy sketch follows below)
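A small sketch of the analogy arithmetic using gensim and the pre-trained Google News vectors mentioned later in the deck; the file name/path is an assumption, and the exact words available depend on the model’s vocabulary and casing.

```python
from gensim.models import KeyedVectors

# Assumed local path to the pre-trained Google News word2vec file (word2vec
# binary format); any vectors in that format will work.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# "king" - "man" + "woman" ~= "queen": vector arithmetic in the 300-dim space.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```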
  • 55. Continuous Space Word Models (word2vec)
 How to SCALE to larger vocabularies? http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
  • 56. Training Continuous Space Word Models •  How to Train These Models? –  Raw data: “This example sentence shows the word2vec model training.”
• 57. Training Continuous Space Word Models • How to Train These Models? – Raw data: “This example sentence shows the word2vec model training.” – Training data (with the target word marked on the slide, and the other words as input): “This example sentence shows word2vec” (prune “the”); “example sentence shows word2vec model”; “sentence shows word2vec model training” – The context of the 2 to 5 prior and following words predicts the middle word (a small sketch of building such pairs follows below) – Deep net model architecture, data compression to 300 continuous nodes: • 50k binary word input vector → ... → 300 → ... → 50k word target vector
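A minimal sketch of building (context → target) training pairs with a ±2 word window, as described above; real word2vec training adds steps such as subsampling frequent words and negative sampling, which this omits.

```python
# Build (context words -> middle/target word) pairs from one raw sentence.
sentence = "this example sentence shows the word2vec model training".split()
window = 2                                   # 2 prior and 2 following words

pairs = []
for i, target in enumerate(sentence):
    context = [sentence[j]
               for j in range(max(0, i - window),
                              min(len(sentence), i + window + 1))
               if j != i]
    pairs.append((context, target))

for context, target in pairs[:3]:
    print(context, "->", target)
```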
• 58. Training Continuous Space Word Models • Use Pre-Trained Models https://code.google.com/p/word2vec/ – Trained on 100 billion words from Google News – 300-dim vectors for 3 million words and phrases – https://code.google.com/p/word2vec/ • Questions on re-use: – What if I want to train further, to add client terms or docs? – What about stability (keeping past training) vs. plasticity (learning new content)?
  • 59. Training Continuous Space Word Models http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
• 60. Applying Continuous Space Word Models http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf State of the art in machine translation: Sequence to Sequence Learning with Neural Networks, NIPS 2014. Language translation, document summary, generating text captions for pictures.
• 61. “Greg’s Guts” on Deep Learning • Some claim the need for preprocessing and knowledge representation has ended – For most of the signal processing applications → yes, simplify – I am VERY READY TO COMPETE in other applications, continuing to • express explicit domain knowledge – using lookup data for context • optimize business value calculations • Deep Learning gets big advantages from big data – Why? Better populating of combination subsets in high-dimensional space – Unsupervised feature extraction reduces the need for large labeled data • However, “regular sized data” gets a big boost as well – The ratio of free parameters (i.e. neurons) to training set records – For regressions or regular nets, want 5-10 times as many records as parameters – Regularization and weight drop-out reduce this pressure – Especially when only training “the next auto-encoding layer”
• 62. Deep Learning Summary – IT’S EXCITING! • Discussed Deep Learning architectures – Auto encoder, convolutional, reinforcement learning, continuous word models • Real-time speed up – Train model, reduce complexity, retrain – Simplify preprocessing with lookup tables – Use cloud computing, do not be limited to device computing – Lambda architecture like Kamanja, to combine real time and batch • Applications – Fraud detection – Signal data: IoT, speech, images – Control system models (like Atari game playing, IoT) – Language models https://www.quora.com/Why-is-deep-learning-in-such-demand-now
  • 63. © 2016 LigaData, Inc. All Rights Reserved. Using Deep Learning to do Real-Time Scoring in Practical Applications SFbayACM Data Science Meetup Monday 1/25/2016 By Greg Makowski www.Linkedin.com/in/GregMakowski greg@LigaDATA.com Community @ http://Kamanja.org Try out