Deep Learning: Evolution of ML 
from Statistical to Brain-like 
Computing 
Dr. Dobbs Conference Keynote 
20th Nov 2014, Bangalore. 
Dr. Vijay Srinivas Agneeswaran, 
Director, Big-data Labs, 
Impetus
Contents 
Introduction to Artificial Neural Networks 
Deep learning networks 
Towards deep learning 
From ANNs to DLNs. 
Basics of DLNs. 
Related Approaches. 
Distributed DLNs: Challenges 
Introduction to Spark 
Distributed DLNs over Spark 
Copyright @Impetus Technologies, 2014
Deep Learning: Evolution Timeline 
Introduction to Artificial Neural Networks (ANNs) 
Perceptron 
Introduction to Artificial Neural Networks (ANNs) 
Sigmoid Neuron 
• Small change in input = small change in behaviour. 
• Output of a sigmoid neuron with inputs x, weights w and bias b: 
$$\sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$$
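The contrast between a perceptron and a sigmoid neuron can be sketched in a few lines of Scala. This is only an illustration: the object name `NeuronSketch` and the weights (-2, -2) with bias 3 are assumptions chosen for the example, not values taken from the slides. With these values the perceptron behaves like a NAND gate, while the sigmoid neuron gives a smooth output in (0, 1).

```scala
object NeuronSketch {
  // Weighted input z = w . x + b
  def weightedInput(w: Seq[Double], x: Seq[Double], b: Double): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

  // Perceptron: hard threshold, so the output jumps between 0 and 1
  def perceptron(w: Seq[Double], x: Seq[Double], b: Double): Int =
    if (weightedInput(w, x, b) > 0) 1 else 0

  // Sigmoid neuron: smooth output, small input changes give small output changes
  def sigmoid(w: Seq[Double], x: Seq[Double], b: Double): Double =
    1.0 / (1.0 + math.exp(-weightedInput(w, x, b)))

  def main(args: Array[String]): Unit = {
    // Illustrative weights and bias (assumption, not from the slides)
    val w = Seq(-2.0, -2.0)
    val b = 3.0
    val inputs = Seq(Seq(0.0, 0.0), Seq(0.0, 1.0), Seq(1.0, 0.0), Seq(1.0, 1.0))
    for (x <- inputs)
      println(s"$x -> perceptron ${perceptron(w, x, b)}, sigmoid ${sigmoid(w, x, b)}")
  }
}
```

Nudging one input slightly (for example from 1.0 to 1.01) changes the sigmoid output only slightly, whereas the perceptron output can flip outright near the threshold.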
Introduction to Artificial Neural Networks 
(ANNs): Back Propagation 
What is this? 
NAND Gate! 
http://zerkpage.tripod.com/ann.htm 
initialize network weights (often small random values) 
do 
  forEach training example ex 
    prediction = neural-net-output(network, ex) // forward pass 
    actual = teacher-output(ex) 
    compute error (prediction - actual) at the output units 
    compute delta(wh) for all weights from hidden layer to output layer // backward pass 
    compute delta(wi) for all weights from input layer to hidden layer // backward pass continued 
    update network weights 
until all examples classified correctly or another stopping criterion satisfied 
return the network
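A runnable version of this pseudocode might look like the sketch below. It is a minimal illustration under stated assumptions, not the implementation used in the talk: the names `BackpropSketch`, `Net` and `train`, the 2-2-1 layout, the squared-error loss, the NAND training set, and the fixed epoch count used as the stopping criterion are all choices made for the example.

```scala
import scala.util.Random

object BackpropSketch {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // One hidden layer; wH(j)(i) links input i to hidden unit j, wO(j) links hidden unit j to the output.
  final class Net(nIn: Int, nHidden: Int, rng: Random) {
    // initialize network weights with small random values
    val wH: Array[Array[Double]] = Array.fill(nHidden, nIn)(rng.nextDouble() - 0.5)
    val bH: Array[Double] = Array.fill(nHidden)(0.0)
    val wO: Array[Double] = Array.fill(nHidden)(rng.nextDouble() - 0.5)
    var bO: Double = 0.0

    def hidden(x: Array[Double]): Array[Double] =
      Array.tabulate(nHidden)(j => sigmoid(bH(j) + (0 until nIn).map(i => wH(j)(i) * x(i)).sum))

    def output(h: Array[Double]): Double =
      sigmoid(bO + h.indices.map(j => wO(j) * h(j)).sum)

    def predict(x: Array[Double]): Double = output(hidden(x))

    def trainExample(x: Array[Double], target: Double, lr: Double): Unit = {
      val h = hidden(x)                        // forward pass
      val y = output(h)
      val deltaO = (y - target) * y * (1 - y)  // error signal at the output unit
      // backward pass: hidden deltas are computed before the output weights change
      val deltaH = Array.tabulate(nHidden)(j => deltaO * wO(j) * h(j) * (1 - h(j)))
      for (j <- 0 until nHidden) {
        wO(j) -= lr * deltaO * h(j)
        bH(j) -= lr * deltaH(j)
        for (i <- 0 until nIn) wH(j)(i) -= lr * deltaH(j) * x(i)
      }
      bO -= lr * deltaO
    }
  }

  // NAND truth table as a toy training set (assumption for illustration)
  val data: Seq[(Array[Double], Double)] = Seq(
    (Array(0.0, 0.0), 1.0), (Array(0.0, 1.0), 1.0),
    (Array(1.0, 0.0), 1.0), (Array(1.0, 1.0), 0.0))

  def meanSquaredError(net: Net): Double =
    data.map { case (x, t) => math.pow(net.predict(x) - t, 2) }.sum / data.size

  // Returns the mean squared error before and after training.
  def train(epochs: Int, lr: Double, seed: Long): (Double, Double) = {
    val net = new Net(2, 2, new Random(seed))
    val before = meanSquaredError(net)
    for (_ <- 1 to epochs; (x, t) <- data) net.trainExample(x, t, lr)
    (before, meanSquaredError(net))
  }
}
```

Calling `BackpropSketch.train(2000, 0.5, 42L)` returns the error before and after training, which makes it easy to check that the weight updates actually reduce the error.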
The network to identify the individual digits 
from the input image 
http://neuralnetworksanddeeplearning.com/chap1.html 
Different Shallow Architectures 
• Fixed basis functions + weighted sum: linear predictor 
• Template matchers + weighted sum: kernel machines 
• Simple trainable basis functions + weighted sum: ANN, radial basis functions 
Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines, (L. 
Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007. 
ANNs for Face Recognition? 
DLN for Face Recognition 
http://theanalyticsstore.com/deep-learning/ 
Deep Learning Networks: Learning 
• No general learning algorithm (no-free-lunch theorem by Wolpert, 1996). 
• Learning algorithms for specific tasks – perception, control, prediction, planning, reasoning, language understanding. 
• Limitations of BP – local minima, optimization challenges for non-convex objective functions. 
• Hinton’s deep belief networks as a stack of RBMs. 
• LeCun’s energy-based learning for DBNs.
Deep Belief Networks 
• This is a deep neural network 
composed of multiple layers of 
latent variables (hidden units or 
feature detectors) 
• Can be viewed as a stack of 
RBMs 
• Hinton along with his student 
proposed that these networks 
can be trained greedily one 
layer at a time 
• Boltzmann Machine is a 
specific energy model with 
linear energy function. 
http://www.iro.umontreal.ca/~lisa/twiki/pub/Public/DeepBeliefNetworks/DBNs.png 
Other DL Networks: Auto Encoders (Auto-associators 
or Diabolo Network) 
• The aim of an auto encoder network is to learn a compressed representation for a set of data 
• It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function) 
• A denoising auto encoder avoids simply learning the identity function by randomly corrupting the input, which the auto encoder must then reconstruct or denoise 
• Best applied when there is structure in the data 
• Applications: dimensionality reduction, feature selection
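The corruption step of a denoising auto encoder can be sketched as follows. This is a hedged illustration only: the names `DenoisingSketch`, `corrupt` and `reconstructionError`, and the choice of masking noise (zeroing components at random), are assumptions for the example. The key point is that the error is measured against the clean input, not the corrupted one.

```scala
import scala.util.Random

object DenoisingSketch {
  // Masking noise: each input component is set to 0 with probability `rate`.
  def corrupt(x: Vector[Double], rate: Double, rng: Random): Vector[Double] =
    x.map(v => if (rng.nextDouble() < rate) 0.0 else v)

  // Squared reconstruction error against the clean input, which is the training target.
  def reconstructionError(clean: Vector[Double], reconstructed: Vector[Double]): Double =
    clean.zip(reconstructed).map { case (c, r) => (c - r) * (c - r) }.sum
}
```

During training, the network would receive `corrupt(x, rate, rng)` as input while `x` itself stays the target.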
Why Deep Learning Networks are Brain-like? 
• Statistical approach of traditional ML – SVMs or kernel approaches. 
  • Not applicable in deep learning networks. 
• Human brain – trophic factors. 
• Traditional ML – a lot of data munging and representational issues (feature abstractor) before the classifier can kick in. 
• Deep learning – allows the system to learn representations as well, naturally. 
Success stories of DLNs 
• Android voice recognition system – based on DLNs. Improves accuracy by 25% compared to state-of-art. 
• Microsoft Skype Translate software and digital assistant Cortana. 
• 1.2 million images, 1000 classes (ImageNet data) – error rate of 15.3%, better than state of art at 26.1%.
Success stories of DLNs (continued) 
• Senna system – PoS tagging, chunking, NER, semantic role labeling, syntactic parsing. Comparable F1 score with state-of-art with huge speed advantage (5 days vs. a few hours). 
• DLNs vs. TF-IDF: 1 million documents, relevance search. 3.2 ms vs. 1.2 s. 
• Robot navigation.
Potential Applications of DLNs 
• Speech recognition/enhancement 
• Video sequencing 
• Emotion recognition (video/audio) 
• Malware detection 
• Robotics – navigation 
• Multi-modal learning (text and image) 
• Natural Language Processing 
Available resources 
• Deeplearning4j – open source implementation of Jeffrey Dean’s distributed deep learning paper. 
• Theano: Python library of math functions. 
  • Efficient use of GPUs transparently. 
• Hinton’s courses on Coursera: https://www.coursera.org/instructor/~154 
Challenges in Realizing DLNs 
• Large no. of training examples – high accuracy. 
  • Large no. of parameters can also improve accuracy. 
• Inherently sequential nature – freeze up one layer for learning. 
• GPUs to improve training speed. 
  • Limitations – CPU-to-GPU data transfers. 
• Distributed DLNs – Jeffrey Dean’s work.
Distributed DLNs 
• Motivation 
• Scalable, low latency training 
• Parallelize training data and learn fast 
• Jeffrey Dean’s work DistBelief 
• Pseudo-centralized realization 
What is Spark? 
• Spark provides a computing abstraction that generalizes MapReduce. 
• More powerful set of operations than just map and reduce – group by, order by, sort, reduce by key, sample, union, etc. 
• Provides an efficient execution environment based on distributed shared memory – keeps the working set of data in memory. 
• Shark provides a Hive Query Language (HQL) interface over Spark.
What is Spark? Data Flow in Hadoop 
What is Spark? Data Flow in Spark 
Real world use-case example: HITS algorithm 
The Hub score and Authority score for a node are calculated with the following algorithm: 
• Start with each node having a hub score and authority score of 1, i.e. auth(p) = 1 and hub(p) = 1. 
• Run the Authority Update Rule: update each node's Authority score to be equal to the sum of the Hub scores of each node that points to it. That is, a node is given a high authority score by being linked to by pages that are recognized as Hubs for information. 
• Run the Hub Update Rule: update each node's Hub score to be equal to the sum of the Authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. 
• Normalize the values by dividing each Hub score by the square root of the sum of the squares of all Hub scores, and dividing each Authority score by the square root of the sum of the squares of all Authority scores. 
• Repeat from the second step as necessary. 
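The steps above can be sketched on a single machine with plain Scala collections. This is an illustrative sketch, not the Hadoop or Spark implementation discussed in the talk: the names `HitsSketch`, `Scores` and `hits`, and the fixed iteration count, are assumptions for the example.

```scala
object HitsSketch {
  type Scores = Map[String, Double]

  // Divide every score by the square root of the sum of squared scores.
  private def normalize(scores: Scores): Scores = {
    val norm = math.sqrt(scores.values.map(v => v * v).sum)
    if (norm == 0.0) scores else scores.map { case (n, v) => n -> v / norm }
  }

  // edges are (source, target) pairs; returns (hub scores, authority scores).
  def hits(nodes: Seq[String], edges: Seq[(String, String)], iterations: Int): (Scores, Scores) = {
    // Step 1: every node starts with hub = 1 and auth = 1
    var hub: Scores = nodes.map(n => n -> 1.0).toMap
    var auth: Scores = nodes.map(n => n -> 1.0).toMap
    for (_ <- 1 to iterations) {
      // Authority update: sum of hub scores of the nodes pointing to p
      auth = nodes.map(p => p -> edges.filter(_._2 == p).map(e => hub(e._1)).sum).toMap
      // Hub update: sum of authority scores of the nodes p points to
      hub = nodes.map(p => p -> edges.filter(_._1 == p).map(e => auth(e._2)).sum).toMap
      // Normalization step
      auth = normalize(auth)
      hub = normalize(hub)
    }
    (hub, auth)
  }
}
```

For a tiny graph where A and B both link to C, `HitsSketch.hits(Seq("A", "B", "C"), Seq(("A", "C"), ("B", "C")), 5)` gives C all of the authority and splits the hub score evenly between A and B.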
Solve HITS algorithm using Hadoop MR 
HDFS Storage 
Step 1: auth(p) = 1 and hub(p) = 1 
Step 2: Run Authority Update Rule, auth(p) = X 
Step 3: Run Hub Update Rule, hub(p) = Y 
Step 4: Normalize hub(p) and auth(p) 
Diagram labels: Read, Write, Flow
Solve HITS algorithm using Spark 
HDFS Storage 
Step 1: auth(p) = 1 and hub(p) = 1 
Step 2: Run Authority Update Rule, auth(p) = X 
Step 3: Run Hub Update Rule, hub(p) = Y 
Step 4: Normalize hub(p) and auth(p) 
Diagram labels: Read, Write, Flow
Spark 
Transformations/Actions | Description 
map(function f1) | Pass each element of the RDD through f1 in parallel and return the resulting RDD. 
filter(function f2) | Select elements of the RDD that return true when passed through f2. 
flatMap(function f3) | Similar to map, but f3 returns a sequence to facilitate mapping a single input to multiple outputs. 
union(RDD r1) | Returns the union of the RDD r1 with the self RDD. 
sample(flag, p, seed) | Returns a randomly sampled (with seed) p percentage of the RDD. 
groupByKey(noTasks) | Can only be invoked on key-value paired data – returns data grouped by key. No. of parallel tasks is given as an argument (default is 8). 
reduceByKey(function f4, noTasks) | Aggregates the result of applying f4 on elements with the same key. No. of parallel tasks is the second argument. 
join(RDD r2, noTasks) | Joins RDD r2 with self – computes all possible pairs for a given key. 
groupWith(RDD r3, noTasks) | Joins RDD r3 with self and groups by key. 
sortByKey(flag) | Sorts the self RDD in ascending or descending order based on flag. 
reduce(function f5) | Aggregates the result of applying function f5 on all elements of the self RDD. 
collect() | Returns all elements of the RDD as an array. 
count() | Counts the no. of elements in the RDD. 
take(n) | Gets the first n elements of the RDD. 
first() | Equivalent to take(1). 
saveAsTextFile(path) | Persists the RDD in a file in HDFS or another Hadoop-supported file system at the given path. 
saveAsSequenceFile(path) | Persists the RDD as a Hadoop sequence file. Can be invoked only on key-value paired RDDs that implement the Hadoop Writable interface or equivalent. 
foreach(function f6) | Runs f6 in parallel on elements of the self RDD. 
[MZ12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael 
J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory 
cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and 
Implementation (NSDI'12). USENIX Association, Berkeley, CA, USA, 2-2.
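Several of these operations have close analogues on ordinary Scala collections, which gives a feel for the programming model without a cluster. The sketch below is a local analogy only and does not use Spark: the names `CollectionOpsSketch`, `wordCounts` and `totalWords` are assumptions for the example, and `groupBy` followed by a per-key sum stands in for `reduceByKey`, which plain collections do not provide.

```scala
object CollectionOpsSketch {
  // Word count over local collections, mirroring flatMap / filter / map / reduceByKey.
  def wordCounts(lines: Seq[String]): Map[String, Int] = {
    val words = lines.flatMap(_.split(" "))   // flatMap: one line maps to many words
    val nonEmpty = words.filter(_.nonEmpty)   // filter: drop empty tokens
    val pairs = nonEmpty.map(w => (w, 1))     // map: build key-value pairs
    // groupBy plus a per-key sum plays the role of reduceByKey here
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2).sum }
  }

  // Total number of words, an aggregate over all counts.
  def totalWords(lines: Seq[String]): Int = wordCounts(lines).values.sum
}
```

On an RDD the same pipeline would be distributed across partitions, while the shape of the code stays much the same.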
Berkeley Big-data Analytics Stack (BDAS) 
Spark: Use Cases 
Ooyala 
• Uses Cassandra for video data personalization. 
• Pre-compute aggregates vs. on-the-fly queries. 
• Moved to Spark for ML and computing views. 
• Moved to Shark for on-the-fly queries – C* OLAP aggregate queries take 130 secs on Cassandra, 60 ms in Spark. 
Conviva 
• Uses Hive for repeatedly running ad-hoc queries on video data. 
• Optimized ad-hoc queries using Spark RDDs – found Spark is 30 times faster than Hive. 
• ML for connection analysis and video streaming optimization. 
Yahoo 
• Advertisement targeting: 30K nodes on Hadoop Yarn. 
• Hadoop – batch processing; Spark – iterative processing; Storm – on-the-fly processing. 
• Content recommendation – collaborative filtering.
Spark Use Cases: Spark is good for linear algebra, optimization and N-body problems. 
Computations/Operations 
• Giant 1 (simple stats) is perfect for Hadoop 1.0. 
• Giants 2 (linear algebra), 3 (N-body), 4 (optimization): Spark from UC Berkeley is efficient. 
  • Logistic regression, kernel SVMs, conjugate gradient descent, collaborative filtering, Gibbs sampling, alternating least squares. 
  • Example is the social group-first approach for consumer churn analysis [2]. 
• Interactive/on-the-fly data processing – Storm. 
• OLAP – data cube operations: Dremel/Drill. 
• Data sets – not embarrassingly parallel? 
  • Deep learning: artificial neural networks/deep belief networks. 
  • Machine vision from Google [3]. 
  • Speech analysis from Microsoft. 
• Giant 5 – graph processing – GraphLab, Pregel, Giraph. 
[1] National Research Council. Frontiers in Massive Data Analysis . Washington, DC: The National Academies Press, 2013. 
[2] Richter, Yossi ; Yom-Tov, Elad ; Slonim, Noam: Predicting Customer Churn in Mobile Networks through Analysis of Social 
Groups. In: Proceedings of SIAM International Conference on Data Mining, 2010, S. 732-741 
[3] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio 
Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, Andrew Y. Ng: Large Scale Distributed Deep Networks. NIPS 2012:
Some Spark(ling) examples 
Scala code (serial) 
var count = 0 
for (i <- 1 to 100000) { 
  val x = Math.random * 2 - 1 
  val y = Math.random * 2 - 1 
  if (x*x + y*y < 1) count += 1 
} 
println("Pi is roughly " + 4 * count / 100000.0) 
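Sample random points in the square [-1, 1] x [-1, 1] and count how many fall inside the unit circle (a fraction of roughly PI/4). 
Hence, you get an approximate value for PI. 
Based on PS/PC = AS/AC = 4/PI, so PI = 4 * (PC/PS), where PS and PC are the number of points in the square and in the circle, and AS and AC are their areas.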
Some Spark(ling) examples 
Spark code (parallel) 
val spark = new SparkContext(<Mesos master>) 
var count = spark.accumulator(0) 
for (i <- spark.parallelize(1 to 100000, 12)) { 
  val x = Math.random * 2 - 1 
  val y = Math.random * 2 - 1 
  if (x*x + y*y < 1) count += 1 
} 
// Read the accumulated total on the driver through the accumulator's value 
println("Pi is roughly " + 4 * count.value / 100000.0) 
Notable points: 
1. Spark context created – talks to the Mesos* master. 
2. Count becomes a shared variable – an accumulator. 
3. The for loop runs over an RDD – parallelize breaks the Scala range object (1 to 100000) into 12 slices. 
4. The for loop over the RDD is translated into a call to the RDD's foreach method. 
* Mesos is an Apache incubated clustering system – http://mesosproject.org
Logistic Regression in Spark: Serial Code 
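// Read data file and convert it into Point objects 
// (parsePoint, D and ITERATIONS are assumed to be defined elsewhere; 
// Vector is assumed to be Spark's legacy spark.util.Vector helper) 
val lines = scala.io.Source.fromFile("data.txt").getLines() 
// getLines() returns a one-shot Iterator, so materialize the points before looping over them repeatedly 
val points = lines.map(x => parsePoint(x)).toList 
// Run logistic regression 
var w = Vector.random(D) 
for (i <- 1 to ITERATIONS) { 
  val gradient = Vector.zeros(D) 
  for (p <- points) { 
    val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y 
    gradient += scale * p.x 
  } 
  w -= gradient 
} 
println("Result: " + w)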
Logistic Regression in Spark 
// Read data file and transform it into Point objects 
val spark = new SparkContext(<Mesos master>) 
val lines = spark.hdfsTextFile("hdfs://.../data.txt") 
val points = lines.map(x => parsePoint(x)).cache() 
// Run logistic regression 
var w = Vector.random(D) 
for (i <- 1 to ITERATIONS) { 
  val gradient = spark.accumulator(Vector.zeros(D)) 
  for (p <- points) { 
    val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y 
    gradient += scale * p.x 
  } 
  w -= gradient.value 
} 
println("Result: " + w)
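The same update rule can be tried without a cluster using the self-contained sketch below. It is an illustration under stated assumptions rather than the code from the talk: `LogisticRegressionSketch`, `Point`, `dot`, `train` and `predict` are names invented for the example, labels are assumed to be +1 or -1, weights start at zero instead of random values so that runs are reproducible, and the first coordinate of each point is a constant 1 acting as a bias term.

```scala
object LogisticRegressionSketch {
  // y is the label, assumed to be +1.0 or -1.0
  final case class Point(x: Vector[Double], y: Double)

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (ai, bi) => ai * bi }.sum

  def train(points: Seq[Point], dims: Int, iterations: Int): Vector[Double] = {
    var w = Vector.fill(dims)(0.0) // zeros instead of random values, for reproducibility
    for (_ <- 1 to iterations) {
      // Per-point gradient contributions (the "map" step), summed up (the "reduce" step)
      val gradient = points
        .map { p =>
          val scale = (1.0 / (1.0 + math.exp(-p.y * dot(w, p.x))) - 1.0) * p.y
          p.x.map(_ * scale)
        }
        .reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
      w = w.zip(gradient).map { case (wi, gi) => wi - gi }
    }
    w
  }

  def predict(w: Vector[Double], x: Vector[Double]): Double =
    if (dot(w, x) >= 0) 1.0 else -1.0
}
```

The map-then-reduce shape of the gradient computation is what the accumulator in the Spark version distributes across workers.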
Deep Learning on Spark 
• Fully distributed deep learning network implementation on Spark. 
• Spark would handle the parallelism, synchronization, distribution, and failover. 
• The input data set is in HDFS; intermediate data is in the local file system. 
• Publish/subscribe message passing framework built on top of Apache Spark using the Akka framework.
Conclusions 
• ANN to Distributed Deep Learning 
• Key ideas in deep learning 
• Need for distributed realizations. 
• DistBelief, deeplearning4j etc. 
• Our work on large scale distributed deep learning 
• Deep learning leads us from statistics based 
machine learning towards brain inspired AI. 
Thank You! 
Mail • vijay.sa@impetus.co.in 
LinkedIn • http://in.linkedin.com/in/vijaysrinivasagneeswaran 
Blogs • blogs.impetus.com 
Twitter • @a_vijaysrinivas. 
Backup Slides 
Energy Based Models 
http://www.cs.nyu.edu/~yann/research/ebm/loss-func.png 
• RBMs are Energy Based Models (EBM) 
• EBM associate an energy with every configuration of a 
system 
• Learning corresponds to modifying the shape of energy 
function, so that it has desirable properties 
• Like in physics, lower energy = more stability 
• So, modify shape of energy function such that the 
desirable configurations have lower energy
Other DL networks: 
Convolutional Networks 
Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based 
Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di 
Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, UK, 319-. 
Other Brain-like Approaches 
• Recurrent Neural networks 
• Long Short Term Memory (LSTM), Temporal 
data 
• Sum-product networks 
• Deep architectures of sum-product networks 
• Hierarchical temporal memory 
• online structural and algorithmic model of 
neocortex. 
Recurrent Neural Networks 
• Connections between units form a directed cycle, i.e. typical feedback connections 
• RNNs can use their internal memory to process arbitrary sequences of inputs 
• Standard RNNs have difficulty learning to look far back into the past 
• LSTM networks solve this problem by introducing memory cells 
• These memory cells can remember a value for an arbitrary amount of time 
Sum-Product Networks (SPN) 
• An SPN is a deep network model structured as a directed acyclic graph 
• These networks allow the probability of an event to be computed quickly 
• SPNs try to express multilinear functions in computationally compact forms, built from multiple additions and multiplications 
• Leaves correspond to variables and internal nodes correspond to sums and products 
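A tiny sum-product network can be evaluated bottom-up with a recursive function, as in the hedged sketch below. The types `Leaf`, `Sum` and `Product`, the function `evaluate`, and the convention that an unobserved variable contributes 1.0 (which marginalises it out) are assumptions for illustration; the example network over two independent Boolean variables is invented for the demonstration.

```scala
object SpnSketch {
  sealed trait Node
  // Indicator leaf: 1.0 when `variable` takes `value`, 0.0 otherwise
  final case class Leaf(variable: String, value: Boolean) extends Node
  // Sum node with weighted children
  final case class Sum(children: Seq[(Double, Node)]) extends Node
  // Product node
  final case class Product(children: Seq[Node]) extends Node

  // Evaluate the network for partial evidence; unobserved variables are marginalised out.
  def evaluate(node: Node, evidence: Map[String, Boolean]): Double = node match {
    case Leaf(v, value) =>
      evidence.get(v) match {
        case Some(observed) => if (observed == value) 1.0 else 0.0
        case None           => 1.0
      }
    case Sum(children)     => children.map { case (w, c) => w * evaluate(c, evidence) }.sum
    case Product(children) => children.map(c => evaluate(c, evidence)).product
  }

  // Example: P(A = true) = 0.3 and P(B = true) = 0.6, with A and B independent
  val example: Node = Product(Seq(
    Sum(Seq((0.3, Leaf("A", true)), (0.7, Leaf("A", false)))),
    Sum(Seq((0.6, Leaf("B", true)), (0.4, Leaf("B", false))))))
}
```

A single pass over the graph answers both joint queries such as `evaluate(example, Map("A" -> true, "B" -> false))` and marginal queries such as `evaluate(example, Map("A" -> true))`.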
Hierarchical Temporal Memory 
• An online machine learning model developed by Jeff Hawkins 
• The model learns one instance at a time 
• Best explained by an online stock model: today’s stock situation helps in predicting tomorrow’s stock 
• An HTM network is a tree-shaped hierarchy of levels 
• Higher hierarchy levels can use patterns learned at lower levels. This is adopted from the learning model of the brain's neocortex 
http://en.wikipedia.org/wiki/Hierarchical_temporal_memory 
Mathematical Equations 
• The Energy Function is defined as follows: 
$$E(x, h) = -b'x - c'h - h'Wx$$
where W represents the weights connecting the visible layer and the hidden layer, and b and c are the biases of the visible and hidden layers (a prime denotes the transpose). 
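The energy of a given configuration can be computed directly from this definition. The sketch below is illustrative: the names `RbmEnergySketch`, `dot` and `energy` are assumptions, and W is assumed to be stored with one row per hidden unit and one column per visible unit.

```scala
object RbmEnergySketch {
  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (ai, bi) => ai * bi }.sum

  // E(x, h) = -b'x - c'h - h'Wx
  // x: visible units, h: hidden units, w: one row per hidden unit,
  // b: visible biases, c: hidden biases
  def energy(x: Vector[Double], h: Vector[Double], w: Vector[Vector[Double]],
             b: Vector[Double], c: Vector[Double]): Double = {
    val hWx = h.zip(w).map { case (hj, row) => hj * dot(row, x) }.sum
    -dot(b, x) - dot(c, h) - hWx
  }
}
```

For example, with x = (1, 0), h = (1), W = ((2, 3)), b = (0.5, 0.5) and c = (1), the terms are 0.5, 1 and 2, so the energy is -3.5.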
Learning Energy Based Models 
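• Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data 
• It has the following form, where F is the free energy: 
$$-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$$
The first term on the right-hand side is the positive phase and the second term is the negative phase. 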

 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

Distributed deep learning_over_spark_20_nov_2014_ver_2.8

  • 1. Deep Learning: Evolution of ML from Statistical to Brain-like Computing Dr. Dobbs Conference Keynote 20th Nov 2014, Bangalore. Dr. Vijay Srinivas Agneeswaran, Director, Big-data Labs, Impetus
  • 2. Contents Introduction to Artificial Neural Networks Deep learning networks Towards deep learning From ANNs to DLNs. Basics of DLNs. Related Approaches. Distributed DLNs: Challenges Introduction to Spark Distributed DLNs over Spark Copyright @Impetus Technologies, 2014
  • 3. Deep Learning: Evolution Timeline
  • 4. Introduction to Artificial Neural Networks (ANNs): Perceptron
  • 5. Introduction to Artificial Neural Networks (ANNs): Sigmoid Neuron • Small change in input = small change in behaviour. • Output of a sigmoid neuron with weights w, inputs x and bias b: $\sigma(w \cdot x + b) = 1 / (1 + e^{-(w \cdot x + b)})$.
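The sigmoid neuron's output can be sketched in a few lines of plain Scala. This is a minimal illustration, assuming the standard logistic function; the object and method names (`SigmoidNeuron`, `output`) are hypothetical and not taken from the slides.

```scala
object SigmoidNeuron {
  // Logistic function: squashes any real number into the interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Output of one neuron: sigmoid of the weighted sum of the inputs plus the bias.
  def output(weights: Seq[Double], inputs: Seq[Double], bias: Double): Double =
    sigmoid(weights.zip(inputs).map { case (w, x) => w * x }.sum + bias)
}
```

Because the logistic function is smooth, nudging one input slightly moves the output only slightly, which is the "small change in input = small change in behaviour" property the slide refers to.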
  • 6. Introduction to Artificial Neural Networks (ANNs): Back Propagation. What is this? NAND Gate! http://zerkpage.tripod.com/ann.htm
      initialize network weights (often small random values)
      do
        forEach training example ex
          prediction = neural-net-output(network, ex) // forward pass
          actual = teacher-output(ex)
          compute error (prediction - actual) at the output units
          compute delta(wh) for all weights from hidden layer to output layer // backward pass
          compute delta(wi) for all weights from input layer to hidden layer // backward pass continued
          update network weights
      until all examples classified correctly or another stopping criterion satisfied
      return the network
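The training loop above can be sketched for the simplest possible case: a single sigmoid neuron learning the NAND gate. This is not full back propagation, because there is no hidden layer and so only the output-layer update remains. The sketch assumes a cross-entropy loss (so the update is proportional to `target - prediction`), a fixed number of epochs as the stopping criterion, and zero initial weights instead of small random values so that runs are repeatable. All names are hypothetical.

```scala
object NandTraining {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // NAND truth table: two inputs and the "teacher" output for each example.
  val examples: Seq[(Array[Double], Double)] = Seq(
    (Array(0.0, 0.0), 1.0),
    (Array(0.0, 1.0), 1.0),
    (Array(1.0, 0.0), 1.0),
    (Array(1.0, 1.0), 0.0)
  )

  // Forward pass for a neuron with two weights and one bias.
  def predict(w: Array[Double], b: Double, x: Array[Double]): Double =
    sigmoid(w(0) * x(0) + w(1) * x(1) + b)

  // Online gradient descent; returns the learned weights and bias.
  def train(epochs: Int, learningRate: Double): (Array[Double], Double) = {
    val w = Array(0.0, 0.0)
    var b = 0.0
    for (_ <- 1 to epochs; (x, target) <- examples) {
      val prediction = predict(w, b, x) // forward pass
      val error = target - prediction   // error at the output unit
      // Update step: move each weight in proportion to the error and its input.
      w(0) += learningRate * error * x(0)
      w(1) += learningRate * error * x(1)
      b += learningRate * error
    }
    (w, b)
  }
}
```

Calling `NandTraining.train(5000, 0.5)` and thresholding `predict` at 0.5 should reproduce the NAND truth table; a network with hidden layers would additionally propagate the output error backwards through each layer, as in the pseudocode.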
  • 7. A network that identifies the individual digits in an input image. Source: http://neuralnetworksanddeeplearning.com/chap1.html
  • 8. Different Shallow Architectures. Each architecture feeds a layer of basis functions into a weighted sum: fixed basis functions give a linear predictor; template matchers give kernel machines; simple trainable basis functions give ANNs and radial basis functions. Source: Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines, (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
  • 9. ANNs for Face Recognition?
  • 10. DLN for Face Recognition http://theanalyticsstore.com/deep-learning/
  • 11. Deep Learning Networks: Learning. There is no general learning algorithm (no-free-lunch theorem, Wolpert 1996); learning algorithms target specific tasks: perception, control, prediction, planning, reasoning, language understanding. Limitations of BP: local minima and optimization challenges for non-convex objective functions. Hinton's deep belief networks (DBNs) as a stack of RBMs. LeCun's energy-based learning for DBNs.
  • 12. Deep Belief Networks • A deep neural network composed of multiple layers of latent variables (hidden units or feature detectors). • Can be viewed as a stack of RBMs. • Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time. • A Boltzmann machine is a specific energy model with a linear energy function. http://www.iro.umontreal.ca/~lisa/twiki/pub/Public/DeepBeliefNetworks/DBNs.png
  • 13. Other DL Networks: Auto Encoders (Auto-associators or Diabolo Networks) • The aim of an auto encoder network is to learn a compressed representation of a set of data. • It is an unsupervised learning algorithm that applies back propagation with the target values set equal to the inputs (identity function). • A denoising auto encoder avoids simply learning the identity function by randomly corrupting the input, which the auto encoder must then reconstruct or denoise. • Best applied when there is structure in the data. • Applications: dimensionality reduction, feature selection.
  • 14. Why Are Deep Learning Networks Brain-like? Statistical approach of traditional ML (SVMs or kernel approaches): not applicable in deep learning networks. Human brain: trophic factors. Traditional ML needs a lot of data munging and has representational issues (feature abstractor) before the classifier can kick in. Deep learning allows the system to learn the representations as well, naturally.
  • 15. Success stories of DLNs • Android voice recognition system: based on DLNs; improves accuracy by 25% compared to the state of the art. • Microsoft Skype Translate software and digital assistant Cortana. • 1.2 million images, 1000 classes (ImageNet data): error rate of 15.3%, better than the state of the art at 26.1%.
  • 16. Success stories of DLNs (continued) • Senna system: PoS tagging, chunking, NER, semantic role labeling, syntactic parsing. F1 score comparable with the state of the art, with a huge speed advantage (5 days vs. a few hours). • DLNs vs. TF-IDF: relevance search over 1 million documents, 3.2 ms vs. 1.2 s. • Robot navigation.
  • 17. Potential Applications of DLNs: speech recognition/enhancement, video sequencing, emotion recognition (video/audio), malware detection, robotics (navigation), multi-modal learning (text and image), natural language processing.
  • 18. Available resources • Deeplearning4j: open source implementation of Jeffrey Dean's distributed deep learning paper. • Theano: Python library of math functions; makes efficient use of GPUs transparently. • Hinton's courses on Coursera: https://www.coursera.org/instructor/~154
  • 19. Challenges in Realizing DLNs • A large number of training examples is needed for high accuracy; a large number of parameters can also improve accuracy. • Inherently sequential nature: one layer is frozen while another learns. • GPUs improve training speed, but are limited by CPU-to-GPU data transfers. • Distributed DLNs: Jeffrey Dean's work.
  • 20. Distributed DLNs • Motivation • Scalable, low latency training • Parallelize training data and learn fast • Jeffrey Dean's work DistBelief • Pseudo-centralized realization
  • 21. What is Spark? Spark provides a computing abstraction that generalizes MapReduce. It offers a more powerful set of operations than just map and reduce: group by, order by, sort, reduce by key, sample, union, etc. It provides an efficient execution environment based on distributed shared memory, keeping the working set of data in memory. Shark provides a Hive Query Language (HQL) interface over Spark.
  • 22. What is Spark? Data Flow in Hadoop
  • 23. What is Spark? Data Flow in Spark
  • 24. Real world use-case example: HITS algorithm. The Hub score and Authority score for a node are calculated with the following algorithm: (1) Start with each node having a hub score and authority score of 1, i.e. auth(p) = 1 and hub(p) = 1. (2) Run the Authority Update Rule: update each node's Authority score to be equal to the sum of the Hub scores of each node that points to it. That is, a node is given a high authority score by being linked to by pages that are recognized as Hubs for information. (3) Run the Hub Update Rule: update each node's Hub score to be equal to the sum of the Authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject. (4) Normalize the values by dividing each Hub score by the square root of the sum of the squares of all Hub scores, and dividing each Authority score by the square root of the sum of the squares of all Authority scores. (5) Repeat from the second step as necessary.
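The HITS steps above can be sketched on a single machine in plain Scala, with no Hadoop or Spark involved. The graph encoding (a list of `(from, to)` edges over nodes numbered from 0), the fixed iteration count, and the names `Hits` and `run` are assumptions made for illustration.

```scala
object Hits {
  // Runs the HITS update rules for a fixed number of iterations.
  // edges holds directed links (from, to); nodes are numbered 0 until numNodes.
  def run(numNodes: Int, edges: Seq[(Int, Int)], iterations: Int): (Array[Double], Array[Double]) = {
    var auth = Array.fill(numNodes)(1.0) // step 1: auth(p) = 1
    var hub = Array.fill(numNodes)(1.0)  // step 1: hub(p) = 1
    for (_ <- 1 to iterations) {
      // Step 2, Authority Update Rule: sum the hub scores of nodes pointing to p.
      val newAuth = Array.fill(numNodes)(0.0)
      for ((from, to) <- edges) newAuth(to) += hub(from)
      // Step 3, Hub Update Rule: sum the authority scores of nodes p points to.
      val newHub = Array.fill(numNodes)(0.0)
      for ((from, to) <- edges) newHub(from) += newAuth(to)
      // Step 4: normalize both score vectors to unit Euclidean length.
      auth = normalize(newAuth)
      hub = normalize(newHub)
    }
    (auth, hub)
  }

  private def normalize(v: Array[Double]): Array[Double] = {
    val norm = math.sqrt(v.map(x => x * x).sum)
    if (norm == 0.0) v else v.map(_ / norm)
  }
}
```

For example, `Hits.run(3, Seq((0, 1), (0, 2), (1, 2)), 20)` returns the authority and hub vectors for a three-node graph in which node 2 receives the most links and node 0 gives the most.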
  • 25. Solve HITS algorithm using Hadoop MR. [Figure: data flow with reads from and writes to HDFS storage.] Step 1: auth(p) = 1 and hub(p) = 1. Step 2: run Authority Update Rule, auth(p) = X. Step 3: run Hub Update Rule, hub(p) = Y. Step 4: normalize hub(p) and auth(p).
  • 26. Solve HITS algorithm using Spark. [Figure: data flow with reads from and writes to HDFS storage.] Step 1: auth(p) = 1 and hub(p) = 1. Step 2: run Authority Update Rule, auth(p) = X. Step 3: run Hub Update Rule, hub(p) = Y. Step 4: normalize hub(p) and auth(p).
  • 27. Spark Transformations/Actions and their descriptions [MZ12]:
      map(function f1): Pass each element of the RDD through f1 in parallel and return the resulting RDD.
      filter(function f2): Select elements of the RDD that return true when passed through f2.
      flatMap(function f3): Similar to map, but f3 returns a sequence, to facilitate mapping a single input to multiple outputs.
      union(RDD r1): Returns the union of the RDD r1 with self.
      sample(flag, p, seed): Returns a randomly sampled (with seed) p percentage of the RDD.
      groupByKey(noTasks): Can only be invoked on key-value paired data; returns data grouped by key. No. of parallel tasks is given as an argument (default is 8).
      reduceByKey(function f4, noTasks): Aggregates the result of applying f4 on elements with the same key. No. of parallel tasks is the second argument.
      join(RDD r2, noTasks): Joins RDD r2 with self; computes all possible pairs for a given key.
      groupWith(RDD r3, noTasks): Joins RDD r3 with self and groups by key.
      sortByKey(flag): Sorts the self RDD in ascending or descending order based on flag.
      reduce(function f5): Aggregates the result of applying function f5 on all elements of the self RDD.
      collect(): Returns all elements of the RDD as an array.
      count(): Counts the no. of elements in the RDD.
      take(n): Gets the first n elements of the RDD.
      first(): Equivalent to take(1).
      saveAsTextFile(path): Persists the RDD in a file in HDFS or another Hadoop-supported file system at the given path.
      saveAsSequenceFile(path): Persists the RDD as a Hadoop sequence file. Can be invoked only on key-value paired RDDs that implement the Hadoop Writable interface or equivalent.
      foreach(function f6): Runs f6 in parallel on elements of the self RDD.
    [MZ12] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation (NSDI'12). USENIX Association, Berkeley, CA, USA, 2-2.
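Several of these operations have close counterparts on ordinary Scala collections, which makes their semantics easy to try out without a cluster. The sketch below is a local stand-in, not Spark code: it chains `flatMap`, `filter`, `map` and a group-then-sum step (the local analogue of `reduceByKey`) to count words. The names `LocalOps` and `wordCounts` are hypothetical.

```scala
object LocalOps {
  // Word count on a local Seq, mirroring the shape of an RDD pipeline.
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq) // flatMap: one line becomes many words
      .filter(word => word.nonEmpty)          // filter: drop empty tokens
      .map(word => (word, 1))                 // map: word becomes a (key, value) pair
      .groupBy { case (word, _) => word }     // group the pairs by key
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum the values per key
}
```

On an RDD the same pipeline would be distributed across partitions, with the grouping step implying a shuffle between nodes; the local version only shows the data transformations.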
  • 28. Berkeley Big-data Analytics Stack (BDAS)
  • 29. Spark: Use Cases • Ooyala: uses Cassandra for video data personalization. Pre-computed aggregates vs. on-the-fly queries. Moved to Spark for ML and computing views. Moved to Shark for on-the-fly queries: C* OLAP aggregate queries take 130 secs on Cassandra, 60 ms in Spark. • Conviva: uses Hive for repeatedly running ad-hoc queries on video data. Optimized ad-hoc queries using Spark RDDs and found Spark to be 30 times faster than Hive. ML for connection analysis and video streaming optimization. • Yahoo: advertisement targeting on 30K nodes on Hadoop YARN. Hadoop for batch processing, Spark for iterative processing, Storm for on-the-fly processing. Content recommendation with collaborative filtering.
  • 30. Spark Use Cases: Spark is good for linear algebra, optimization and N-body problems. Computations/operations: Giant 1 (simple stats) is a perfect fit for Hadoop 1.0. For Giants 2 (linear algebra), 3 (N-body) and 4 (optimization), Spark from UC Berkeley is efficient: logistic regression, kernel SVMs, conjugate gradient descent, collaborative filtering, Gibbs sampling, alternating least squares. An example is the social group-first approach for consumer churn analysis [2]. Interactive/on-the-fly data processing: Storm. OLAP (data cube operations): Dremel/Drill. Data sets that are not embarrassingly parallel? Deep learning: artificial neural networks/deep belief networks; machine vision from Google [3]; speech analysis from Microsoft. Giant 5 (graph processing): GraphLab, Pregel, Giraph. [1] National Research Council. Frontiers in Massive Data Analysis. Washington, DC: The National Academies Press, 2013. [2] Richter, Yossi; Yom-Tov, Elad; Slonim, Noam: Predicting Customer Churn in Mobile Networks through Analysis of Social Groups. In: Proceedings of SIAM International Conference on Data Mining, 2010, pp. 732-741. [3] Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew W. Senior, Paul A. Tucker, Ke Yang, Andrew Y. Ng: Large Scale Distributed Deep Networks. NIPS 2012.
  • 31. Some Spark(ling) examples: Scala code (serial)
      var count = 0
      for (i <- 1 to 100000) {
        val x = Math.random * 2 - 1
        val y = Math.random * 2 - 1
        if (x*x + y*y < 1) count += 1
      }
      println("Pi is roughly " + 4 * count / 100000.0)
    Sample random points in the square [-1, 1] x [-1, 1] and count how many fall inside the unit circle (roughly a fraction PI/4 of them). Hence, you get an approximate value for PI. With PS points in the square, PC points in the circle, and areas AS and AC, PS/PC = AS/AC = 4/PI, so PI = 4 * (PC/PS).
  • 32. Some Spark(ling) examples: Spark code (parallel)
      val spark = new SparkContext(<Mesos master>)
      var count = spark.accumulator(0)
      for (i <- spark.parallelize(1 to 100000, 12)) {
        val x = Math.random * 2 - 1
        val y = Math.random * 2 - 1
        if (x*x + y*y < 1) count += 1
      }
      println("Pi is roughly " + 4 * count.value / 100000.0)
    Notable points: 1. A Spark context is created; it talks to the Mesos master (Mesos is an Apache-incubated clustering system, http://mesosproject.org). 2. count becomes a shared variable, an accumulator, whose result is read through its value. 3. parallelize breaks the Scala range object (1 to 100000) into 12 slices, producing an RDD. 4. The for loop invokes the foreach method of that RDD.
  • 33. Logistic Regression in Spark: Serial Code
      // Read data file and convert it into Point objects
      val lines = scala.io.Source.fromFile("data.txt").getLines()
      // Materialize the points: an Iterator can be traversed only once,
      // but the loop below needs them on every iteration
      val points = lines.map(x => parsePoint(x)).toList
      // Run logistic regression
      var w = Vector.random(D)
      for (i <- 1 to ITERATIONS) {
        val gradient = Vector.zeros(D)
        for (p <- points) {
          val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
          gradient += scale * p.x
        }
        w -= gradient
      }
      println("Result: " + w)
  • 34. Logistic Regression in Spark
      // Read data file and transform it into Point objects
      val spark = new SparkContext(<Mesos master>)
      val lines = spark.hdfsTextFile("hdfs://.../data.txt")
      val points = lines.map(x => parsePoint(x)).cache()
      // Run logistic regression
      var w = Vector.random(D)
      for (i <- 1 to ITERATIONS) {
        val gradient = spark.accumulator(Vector.zeros(D))
        for (p <- points) {
          val scale = (1/(1+Math.exp(-p.y*(w dot p.x)))-1)*p.y
          gradient += scale * p.x
        }
        w -= gradient.value
      }
      println("Result: " + w)
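The slide code relies on helpers that are not shown (parsePoint, Vector, D, ITERATIONS). A self-contained sketch of the same gradient loop, in plain Scala with arrays instead of the Vector class, might look as follows. It assumes labels y in {-1, +1}, starts from zero weights rather than Vector.random(D) so that runs are repeatable, and uses hypothetical names (`LogisticRegressionSketch`, `Point`, `train`).

```scala
object LogisticRegressionSketch {
  // A labelled point: feature vector x and label y, assumed to be -1.0 or +1.0.
  final case class Point(x: Array[Double], y: Double)

  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // Batch gradient descent with the same per-point update as the slide code.
  def train(points: Seq[Point], dims: Int, iterations: Int): Array[Double] = {
    val w = Array.fill(dims)(0.0)
    for (_ <- 1 to iterations) {
      val gradient = Array.fill(dims)(0.0)
      for (p <- points) {
        // Same scale factor as on the slide: (sigmoid(y * w.x) - 1) * y
        val scale = (1.0 / (1.0 + math.exp(-p.y * dot(w, p.x))) - 1.0) * p.y
        for (j <- 0 until dims) gradient(j) += scale * p.x(j)
      }
      for (j <- 0 until dims) w(j) -= gradient(j)
    }
    w
  }
}
```

In the Spark version the inner loop over points runs in parallel across the cluster and the per-point contributions are summed through the accumulator; caching the points avoids re-reading the file on every iteration.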
  • 35. Deep Learning on Spark: Fully distributed deep learning network implementation on Spark. Spark would handle the parallelism, synchronization, distribution, and failover. The input data set resides in HDFS; intermediate data is kept in the local file system. A publish/subscribe message passing framework is built on top of Apache Spark using the Akka framework.
  • 36.
  • 37. Conclusions • ANN to Distributed Deep Learning • Key ideas in deep learning • Need for distributed realizations. • DistBelief, deeplearning4j etc. • Our work on large scale distributed deep learning • Deep learning leads us from statistics based machine learning towards brain inspired AI.
  • 38. Thank You! Mail • vijay.sa@impetus.co.in LinkedIn • http://in.linkedin.com/in/vijaysrinivasagneeswaran Blogs • blogs.impetus.com Twitter • @a_vijaysrinivas.
  • 39. Backup Slides
  • 40. Energy Based Models http://www.cs.nyu.edu/~yann/research/ebm/loss-func.png • RBMs are Energy Based Models (EBMs). • An EBM associates an energy with every configuration of a system. • Learning corresponds to modifying the shape of the energy function so that it has desirable properties. • Like in physics, lower energy = more stability. • So, modify the shape of the energy function such that the desirable configurations have lower energy.
  • 41. Other DL networks: Convolutional Networks. Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
  • 42. Other Brain-like Approaches • Recurrent neural networks: Long Short Term Memory (LSTM), temporal data. • Sum-product networks: deep architectures of sum-product networks. • Hierarchical temporal memory: online structural and algorithmic model of the neocortex.
  • 43. Recurrent Neural Networks • Connections between units form a directed cycle, i.e. typical feedback connections. • RNNs can use their internal memory to process arbitrary sequences of inputs. • Plain RNNs struggle to learn dependencies that reach far back into the past. • LSTMs solve this problem by introducing memory cells. • These memory cells can remember a value for an arbitrary amount of time.
  • 44. Sum-Product Networks (SPN) • An SPN is a deep network model and a directed acyclic graph. • These networks allow the probability of an event to be computed quickly. • SPNs try to convert multilinear functions into computationally short forms, i.e. forms consisting of multiple additions and multiplications. • Leaves correspond to variables, and internal nodes correspond to sums and products.
  • 45. Hierarchical Temporal Memory • An online machine learning model developed by Jeff Hawkins. • This model learns one instance at a time. • Best explained by an online stock model: today's stock situation helps in predicting tomorrow's stock. • An HTM network is a tree-shaped hierarchy of levels. • Higher hierarchy levels can use patterns learned at lower levels. This is adopted from the learning model used by the brain in the form of the neocortex.
  • 47. Mathematical Equations • The energy function is defined as follows: $E(x, h) = -b'x - c'h - h'Wx$, where W represents the weights connecting the visible layer and the hidden layer, b and c are the biases, and the prime denotes transpose.
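The energy function can be evaluated directly for a given configuration. The sketch below is a plain Scala illustration that assumes W is stored with one row per hidden unit and one column per visible unit, so that Wx has one entry per hidden unit; the names `RbmEnergy` and `energy` are hypothetical.

```scala
object RbmEnergy {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // E(x, h) = -b'x - c'h - h'Wx, with w indexed as w(hiddenUnit)(visibleUnit).
  def energy(x: Array[Double], h: Array[Double],
             b: Array[Double], c: Array[Double],
             w: Array[Array[Double]]): Double = {
    val wx = w.map(row => dot(row, x)) // W x: one entry per hidden unit
    -dot(b, x) - dot(c, h) - dot(h, wx)
  }
}
```

Configurations in which active visible and hidden units are joined by large positive weights get lower energy, which is what makes them more probable under the model.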
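  • 48. Learning Energy Based Models • Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data. • It has the following form: $-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$. The first term on the right is the positive phase and the second is the negative phase.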

Editor's Notes

  1. Reference: http://neuralnetworksanddeeplearning.com/chap1.html Consider the problem of identifying the individual digits in an input image. Each image is 28 by 28 pixels. The network is then designed as follows. Input layer (image): 28*28 = 784 neurons, where each neuron corresponds to a pixel. Output layer: determined by the number of digits to be identified, i.e. 10 (0 to 9). The intermediate hidden layer can be experimented with using varied numbers of neurons; let us fix it at 10 nodes.
  2. Reference: http://neuralnetworksanddeeplearning.com/chap1.html How about recognizing a human face in a given set of random images? Attack this problem in the same fashion as explained earlier. Input: image pixels. Output: is it a face or not? (a single node). A face can be recognized by answering questions like "Is there an eye in the top left?", "Is there a nose in the middle?" etc. Each question corresponds to a hidden layer. ANN for face recognition? SVMs or any kernel-based approach cannot be used here because of their implicit assumption of a locally smooth function around each training example. Problem decomposition: break the problem down into sub-problems that are solvable by sub-networks. A complex problem requires more sub-networks and more hidden layers, hence the need for deep neural networks.
  3. http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
  4. http://deeplearning4j.org/convolutionalnets.html Introduced in 1980 by Fukushima. Refined by LeCun in 1989, mainly to apply CNNs to identify variability in 2D image data. A type of RBM in which communication is absent across the nodes in the same layer. Nodes are not connected to every node of the next layer, and there is no symmetry. Convolutional networks learn images in pieces rather than as a whole (as an RBM does). Designed to use minimal amounts of preprocessing.
  5. http://www.idsia.ch/~juergen/rnn.html
  6. http://deep-awesomeness.tumblr.com/post/63736448581/sum-product-networks-spm http://lessoned.blogspot.in/2011/10/intro-to-sum-product-networks.html
  7. http://en.wikipedia.org/wiki/Hierarchical_temporal_memory