Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data Science Presentation

Presentation on 'Deep Learning: Evolution of ML from Statistical to Brain-like Computing'

Speaker: Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus

The main objective of the presentation is to give an overview of our cutting-edge work on realizing distributed deep learning networks over GraphLab. The objectives can be summarized as follows:

- First-hand experience and insights into the implementation of distributed deep learning networks.
- A thorough view of GraphLab (including descriptions of code) and the extensions required to implement these networks.
- Details of how the extensions were realized/implemented in the GraphLab source – they have been submitted to the community for evaluation.
- An arrhythmia detection use case as an application of the large-scale distributed deep learning network.

Published in: Technology
Slide notes
  • Reference: http://neuralnetworksanddeeplearning.com/chap1.html
    Consider the problem of identifying the individual digits in an input image.
    Each image is a 28-by-28-pixel image. The network is then designed as follows:
    Input layer (image) -> 28*28 = 784 neurons. Each neuron corresponds to a pixel.
    The output layer has one neuron per digit to be identified, i.e. 10 (0 to 9).
    The hidden layer in between can be experimented with using varying numbers of neurons. Let us fix it at 10 nodes.
  • Reference: http://neuralnetworksanddeeplearning.com/chap1.html
    How about recognizing a human face from a given set of random images?
    Attack this problem in the same fashion as explained earlier. Input -> image pixels; output -> is it a face or not? (a single node)
    A face can be recognized by answering questions like “Is there an eye in the top left?”, “Is there a nose in the middle?”, etc.
    Each question corresponds to a hidden layer.

    ANN for face recognition?
    Why SVMs or any kernel-based approach cannot be used here:
    Implicit assumption of a locally smooth function around each training example.
    Problem decomposition into sub-problems:
    Break the problem down into sub-problems solvable by sub-networks. A complex problem requires more sub-networks and more hidden layers, hence the need for deep neural networks.
  • http://deeplearning4j.org/convolutionalnets.html
    Introduced in 1980 by Fukushima; refined by LeCun in 1989, mainly to apply CNNs to handle variability in 2D image data.
    A type of RBM in which communication is absent across the nodes in the same layer.
    Nodes are not connected to every node of the next layer; the symmetry is not there.
    Convolutional networks learn images piece by piece rather than as a whole (which an RBM does).
    Designed to use minimal amounts of preprocessing.
  • http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
  • Add layers
    Give no. of nodes in each layer
    Create max. no. of nodes across the layers.
    Forward propagation
    Backward propagation.
    Run the engine
  • http://www.idsia.ch/~juergen/rnn.html
  • http://deep-awesomeness.tumblr.com/post/63736448581/sum-product-networks-spm

    http://lessoned.blogspot.in/2011/10/intro-to-sum-product-networks.html
  • http://en.wikipedia.org/wiki/Hierarchical_temporal_memory
Transcript of "Deep Learning: Evolution of ML from Statistical to Brain-like Computing- Data Science Presentation"

Slide 1: Deep Learning: Evolution of ML from Statistical to Brain-like Computing
Impetus Technologies Inc.
The Fifth Elephant, July 25, 2014. Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus

Slide 2: Contents
Introduction to Artificial Neural Networks
Deep learning networks
• Towards deep learning
• From ANNs to DLNs
• Basics of DLNs
• Related approaches
Distributed DLNs: Challenges
Distributed DLNs over GraphLab

Slide 3: Deep Learning: Evolution Timeline

Slide 4: Introduction to Artificial Neural Networks (ANNs)
Perceptron
Slide 5: Introduction to Artificial Neural Networks (ANNs)
Sigmoid Neuron
• Small change in input = small change in behaviour.
• The output of a sigmoid neuron is given below.
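The output formula itself appeared as an image on the original slide; the standard sigmoid-neuron output it refers to is

    σ(z) = 1 / (1 + e^(−z)),  with  z = Σ_j w_j x_j + b

where the w_j are the input weights, the x_j the inputs, and b the bias, so small changes in weights and bias produce small changes in the output.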
Slide 6: Introduction to Artificial Neural Networks (ANNs): Back Propagation
http://zerkpage.tripod.com/ann.htm
What is this? NAND Gate!

    initialize network weights (often small random values)
    do
        forEach training example ex
            prediction = neural-net-output(network, ex)  // forward pass
            actual = teacher-output(ex)
            compute error (prediction - actual) at the output units
            compute delta(wh) for all weights from hidden layer to output layer  // backward pass
            compute delta(wi) for all weights from input layer to hidden layer   // backward pass continued
            update network weights
    until all examples classified correctly or another stopping criterion satisfied
    return the network
Slide 7: The network to identify the individual digits from the input image
http://neuralnetworksanddeeplearning.com/chap1.html
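A minimal NumPy sketch (assumed, not the presenters' code) of the slide-6 training loop applied to the 784-10-10 layout described here; the images and labels below are random stand-ins rather than real digit data.

    # One hidden layer, sigmoid units, squared error, plain gradient descent,
    # following the slide-6 pseudocode. Illustrative only.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 784, 10, 10            # 28*28 input pixels, 10 digits

    # initialize network weights (often small random values)
    W1 = rng.normal(0, 0.1, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, n_out)); b2 = np.zeros(n_out)

    X = rng.random((32, n_in))                      # 32 stand-in "images"
    Y = np.eye(n_out)[rng.integers(0, n_out, 32)]   # one-hot "digit" labels

    lr = 0.5
    for epoch in range(100):
        h = sigmoid(X @ W1 + b1)                    # forward pass
        out = sigmoid(h @ W2 + b2)
        err = out - Y                               # (prediction - actual)

        delta_out = err * out * (1 - out)           # backward pass: output layer
        delta_hid = (delta_out @ W2.T) * h * (1 - h)  # backward pass: hidden layer

        W2 -= lr * h.T @ delta_out;  b2 -= lr * delta_out.sum(axis=0)   # update weights
        W1 -= lr * X.T @ delta_hid;  b1 -= lr * delta_hid.sum(axis=0)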
Slide 8: Different Shallow Architectures
(Diagram comparing three shallow, weighted-sum architectures: template matchers, fixed basis functions, and simple trainable basis functions, with linear predictors, kernel machines, and ANNs/radial basis functions as examples.)
Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
Slide 9: ANNs for Face Recognition?

Slide 10: DLN for Face Recognition
http://theanalyticsstore.com/deep-learning/

Slide 11: Deep Learning Networks: Learning
• No general learning algorithm (no-free-lunch theorem, Wolpert 1996).
• Learning algorithms for specific tasks – perception, control, prediction, planning, reasoning, language understanding.
• Limitations of BP – local minima, optimization challenges for non-convex objective functions.
• Hinton's deep belief networks as a stack of RBMs.
• LeCun's energy-based learning for DBNs.
Slide 12: Deep Belief Networks
• This is a deep neural network composed of multiple layers of latent variables (hidden units or feature detectors).
• It can be viewed as a stack of RBMs.
• Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time.
• A Boltzmann Machine is a specific energy model with a linear energy function.
http://www.iro.umontreal.ca/~lisa/twiki/pub/Public/DeepBeliefNetworks/DBNs.png
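A compact sketch (assumed, not the presenters' code) of the greedy, layer-at-a-time scheme: each RBM in the stack is trained with one-step contrastive divergence (CD-1) on the hidden activations of the layer below. Sizes and data are illustrative.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
        """One-step contrastive divergence (CD-1) for a binary RBM."""
        rng = np.random.default_rng(seed)
        n_visible = data.shape[1]
        W = rng.normal(0, 0.01, (n_visible, n_hidden))   # weights (visible x hidden)
        b = np.zeros(n_visible)                          # visible biases
        c = np.zeros(n_hidden)                           # hidden biases
        for _ in range(epochs):
            v0 = data
            p_h0 = sigmoid(v0 @ W + c)                              # positive phase
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)      # sample hidden units
            p_v1 = sigmoid(h0 @ W.T + b)                            # reconstruction
            p_h1 = sigmoid(p_v1 @ W + c)                            # negative phase
            W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
            b += lr * (v0 - p_v1).mean(axis=0)
            c += lr * (p_h0 - p_h1).mean(axis=0)
        return W, b, c

    # Greedy stacking: train layer k on the hidden activations of layer k-1.
    rng = np.random.default_rng(0)
    data = (rng.random((64, 784)) > 0.5).astype(float)   # illustrative binary data
    stack, x = [], data
    for n_hidden in [500, 200, 50]:
        W, b, c = train_rbm(x, n_hidden)
        stack.append((W, b, c))
        x = sigmoid(x @ W + c)                           # becomes the input to the next RBM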
Slide 13: Energy Based Models
• RBMs are Energy Based Models (EBMs).
• An EBM associates an energy with every configuration of a system.
• Learning corresponds to modifying the shape of the energy function so that it has desirable properties.
• As in physics, lower energy = more stability.
• So, modify the shape of the energy function such that the desirable configurations have lower energy.
http://www.cs.nyu.edu/~yann/research/ebm/loss-func.png
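Not spelled out on the slide, but standard in the energy-based formulation it refers to: the energy defines a probability through a Gibbs distribution,

    p(x) = e^(−E(x)) / Z,  with  Z = Σ_x′ e^(−E(x′))

so lowering the energy of the desirable configurations raises their probability.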
Slide 14: Other DL networks: Convolutional Networks
Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
Slide 15: Other DL Networks: Auto Encoders (Auto-associators or Diabolo Networks)
• The aim of an auto encoder network is to learn a compressed representation for a set of data.
• It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (the identity function).
• A denoising auto encoder avoids simply learning the identity function by randomly corrupting the input, which the auto encoder must then reconstruct, or denoise.
• Best applied when there is structure in the data.
• Applications: dimensionality reduction, feature selection.
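A minimal NumPy sketch (assumed, not from the presentation) of the denoising variant: the input is randomly corrupted, the encoder compresses it to a 30-unit code, and a tied-weight decoder is trained to reconstruct the clean input.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    X = rng.random((64, 784))                  # illustrative data (e.g. pixel intensities)
    n_in, n_code = 784, 30                     # compress 784 inputs to a 30-unit code

    W = rng.normal(0, 0.01, (n_in, n_code))    # tied weights: the decoder uses W.T
    b_enc, b_dec = np.zeros(n_code), np.zeros(n_in)

    lr = 0.1
    for epoch in range(50):
        X_noisy = X * (rng.random(X.shape) > 0.3)       # randomly corrupt (drop) inputs
        code = sigmoid(X_noisy @ W + b_enc)             # encoder
        recon = sigmoid(code @ W.T + b_dec)             # decoder; the target is the clean X
        err = recon - X
        delta_dec = err * recon * (1 - recon)
        delta_enc = (delta_dec @ W) * code * (1 - code)
        W -= lr * (X_noisy.T @ delta_enc + delta_dec.T @ code) / len(X)   # tied-weight gradient
        b_dec -= lr * delta_dec.mean(axis=0)
        b_enc -= lr * delta_enc.mean(axis=0)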
Slide 16: Why Deep Learning Networks are Brain-like?
• Statistical approach of traditional ML – SVMs or kernel approaches – is not applicable in deep learning networks.
• Human brain – trophic factors.
• Traditional ML needs a lot of data munging and work on representational issues (feature abstraction) before the classifier can kick in.
• Deep learning allows the system to learn representations naturally as well.

Slide 17: Success stories of DLNs
• Android voice recognition system – based on DLNs; improves accuracy by 25% compared to the state of the art.
• Microsoft Skype Translate software and the digital assistant Cortana.
• 1.2 million images, 1000 classes (ImageNet data) – error rate of 15.3%, better than the state of the art at 26.1%.

Slide 18: Success stories of DLNs…
• Senna system – PoS tagging, chunking, NER, semantic role labeling, syntactic parsing; comparable F1 score to the state of the art with a huge speed advantage (5 days vs. a few hours).
• DLNs vs. TF-IDF: 1 million documents, relevance search – 3.2 ms vs. 1.2 s.
• Robot navigation.

Slide 19: Potential Applications of DLNs
• Speech recognition/enhancement
• Video sequencing
• Emotion recognition (video/audio)
• Malware detection
• Robotics – navigation
• Multi-modal learning (text and image)
• Natural Language Processing
Slide 20: Available resources
• Deeplearning4j – open-source implementation of Jeffrey Dean's distributed deep learning paper.
• Theano: Python library of math functions, with efficient, transparent use of GPUs.
• Hinton's courses on Coursera: https://www.coursera.org/instructor/~154
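As a flavour of the Theano library mentioned above, a minimal example in the style of Theano's own tutorial (not from the slides): a symbolic expression is compiled into a callable function, which Theano can map onto a GPU when one is configured.

    import theano
    import theano.tensor as T

    x = T.dmatrix('x')                  # symbolic matrix of doubles
    s = 1 / (1 + T.exp(-x))             # element-wise sigmoid, built symbolically
    logistic = theano.function([x], s)  # compile the expression graph
    print(logistic([[0, 1], [-1, -2]]))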
Slide 21: Challenges in Realizing DLNs
• Large no. of training examples – high accuracy.
• Large no. of parameters can also improve accuracy.
• Inherently sequential nature – freeze up one layer for learning.
• GPUs to improve training speedup.
  – Limitation: CPU-to-GPU data transfers.
• Distributed DLNs – Jeffrey Dean's work.
Slide 22: Distributed DLNs
• Motivation
  – Scalable, low-latency training
  – Parallelize training data and learn fast
• Jeffrey Dean's work: DistBelief – a pseudo-centralized realization
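To make the data-parallel idea concrete, a deliberately simplified, synchronous sketch (assumed; DistBelief itself is asynchronous and uses a parameter server, and the model here is a toy least-squares predictor rather than a DLN): each worker computes a gradient on its own shard of the training data, and the averaged update is applied to the shared parameters.

    import numpy as np

    def worker_gradient(w, shard_X, shard_y):
        """Least-squares gradient on one data shard (stand-in for a DLN's backprop)."""
        return shard_X.T @ (shard_X @ w - shard_y) / len(shard_y)

    rng = np.random.default_rng(0)
    X, y = rng.random((1000, 20)), rng.random(1000)
    w = np.zeros(20)
    shards = list(zip(np.array_split(X, 4), np.array_split(y, 4)))   # 4 "workers"

    lr = 0.1
    for step in range(200):
        grads = [worker_gradient(w, sx, sy) for sx, sy in shards]    # conceptually in parallel
        w -= lr * np.mean(grads, axis=0)                             # combine and update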
Slide 23: Distributed DLNs over GraphLab
• Purely distributed realizations are needed.
• Our approach
  – Use an asynchronous graph processing framework (GraphLab)
  – Make modifications to the GraphLab code as required
    • Layer abstraction, mass communication

Slide 24: Distributed DLNs over GraphLab Engine

Slide 25: Conclusions
• ANN to Distributed Deep Learning
  – Key ideas in deep learning
  – Need for distributed realizations
  – DistBelief, deeplearning4j, etc.
  – Our work on large-scale distributed deep learning
• Deep learning leads us from statistics-based machine learning towards brain-inspired AI.

Slide 26: THANK YOU!
Mail: bigdata@impetus.com
LinkedIn: www.linkedin.com/company/impetus
Blogs: blogs.impetus.com
Twitter: @impetustech

Slide 27: BACKUP SLIDES

Slide 28: Other Brain-like Approaches
• Recurrent Neural Networks – Long Short-Term Memory (LSTM), temporal data
• Sum-product networks – deep architectures of sum-product networks
• Hierarchical temporal memory – online structural and algorithmic model of the neocortex

Slide 29: Recurrent Neural Networks
• Connections between units form a directed cycle, i.e. typical feedback connections.
• RNNs can use their internal memory to process arbitrary sequences of inputs.
• RNNs cannot learn to look far back into the past.
• LSTMs solve this problem by introducing memory cells.
• These memory cells can remember a value for an arbitrary amount of time.
Slide 30: Sum-Product Networks (SPNs)
• An SPN is a deep network model structured as a directed acyclic graph.
• These networks allow the probability of an event to be computed quickly.
• SPNs try to convert multilinear functions into computationally short forms, i.e. forms consisting of multiple additions and multiplications.
• Leaves correspond to variables; internal nodes correspond to sums and products.
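A toy example (assumed, not from the slides) of how a single bottom-up pass through sums and products evaluates a probability: two Bernoulli leaves per variable, two product nodes, and one sum node whose weights sum to 1.

    def bernoulli(p, x):          # leaf: P(X = x) for a Bernoulli(p) variable
        return p if x == 1 else 1.0 - p

    def spn(x1, x2):
        prod_a = bernoulli(0.9, x1) * bernoulli(0.2, x2)   # product node A
        prod_b = bernoulli(0.1, x1) * bernoulli(0.8, x2)   # product node B
        return 0.7 * prod_a + 0.3 * prod_b                 # sum node (weights sum to 1)

    print(spn(1, 0))   # P(x1 = 1, x2 = 0) in one pass over the graph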
Slide 31: Hierarchical Temporal Memory
• An online machine learning model developed by Jeff Hawkins.
• This model learns one instance at a time.
• Best explained by an online stock model: today's state of a stock helps predict tomorrow's.
• An HTM network is a tree-shaped hierarchy of levels.
• Higher levels of the hierarchy can reuse patterns learned at lower levels. This is adopted from the brain's learning model in the form of the neocortex.

Slide 32: http://en.wikipedia.org/wiki/Hierarchical_temporal_memory
Slide 33: Mathematical Equations
• The energy function is defined as follows:

    E(x, h) = −b′x − c′h − h′Wx

  where b′ and c′ are the biases and W represents the weights connecting the visible layer and the hidden layer.
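A direct transcription of this energy into code (an illustrative helper, not from the presentation), using the slide's convention that W has shape (hidden × visible):

    import numpy as np

    def rbm_energy(x, h, W, b, c):
        """E(x, h) = -b'x - c'h - h'Wx for visible vector x and hidden vector h."""
        return -(b @ x) - (c @ h) - (h @ W @ x)

    # e.g. rbm_energy(np.array([1., 0., 1.]), np.array([1., 1.]),
    #                 np.ones((2, 3)), np.zeros(3), np.zeros(2)) == -4.0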
Slide 34: Learning Energy Based Models
• Energy-based models can be learnt by performing gradient descent on the negative log-likelihood of the training data.
• It has the following form:

    −∂ log p(x)/∂θ = ∂F(x)/∂θ − Σ_x̃ p(x̃) ∂F(x̃)/∂θ

  where the first term on the right is the positive phase and the second is the negative phase.
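For the binary RBM whose energy is defined on slide 33, the free energy F(x) appearing in this gradient has the standard closed form (not shown on the slides):

    F(x) = −b′x − Σ_j log(1 + e^(c_j + (Wx)_j))

The expectation in the negative phase cannot be computed exactly for models of any size and is usually approximated by sampling, e.g. the one-step contrastive divergence sketched after slide 12.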
Slide 35: Thank you. Questions??
