Deep Learning: Evolution of ML from Statistical to Brain-like Computing - Data Science Presentation



Presentation on 'Deep Learning: Evolution of ML from Statistical to Brain-like Computing'

Speaker: Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus

The main objective of the presentation is to give an overview of our cutting-edge work on realizing distributed deep learning networks over GraphLab. The objectives can be summarized as follows:

- First-hand experience and insights into implementation of distributed deep learning networks.
- Thorough view of GraphLab (including descriptions of code) and the extensions required to implement these networks.
- Details of how the extensions were realized/implemented in GraphLab source – they have been submitted to the community for evaluation.
- Arrhythmia detection use case as an application of the large-scale distributed deep learning network.

  • Reference:
    Consider the problem of identifying the individual digits in an input image.
    Each image is 28 by 28 pixels. The network is then designed as follows:
    Input layer (image) -> 28*28 = 784 neurons. Each neuron corresponds to a pixel.
    The size of the output layer is the number of digits to be identified, i.e. 10 (0 to 9).
    The number of neurons in the intermediate hidden layer can be varied experimentally. Let us fix it at 10 nodes.
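The 784-10-10 design in this note can be sketched as a plain forward pass. This is a minimal illustration under stated assumptions: sigmoid units, small random initial weights, zero biases, and a flattened image of pixel intensities in [0, 1]. The helper names (`init_network`, `forward`, `predict_digit`, `count_parameters`) are hypothetical, and the network is untrained.

```python
import math
import random

# Layer sizes from the note: 28*28 = 784 input pixels, 10 hidden nodes,
# and 10 output nodes, one per digit 0-9.
LAYER_SIZES = [784, 10, 10]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(sizes, seed=0):
    """Return one (weights, biases) pair per layer; weights[j][i] connects
    neuron i of the previous layer to neuron j of this layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def count_parameters(sizes):
    """Weights plus biases between each pair of consecutive layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


def forward(layers, pixels):
    """Propagate a flattened 28x28 image (784 values) through every layer."""
    activation = pixels
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def predict_digit(layers, pixels):
    """The predicted digit is the index of the most active output neuron."""
    outputs = forward(layers, pixels)
    return max(range(len(outputs)), key=outputs.__getitem__)


if __name__ == "__main__":
    net = init_network(LAYER_SIZES)
    image = [random.random() for _ in range(784)]  # stand-in for a real digit image
    print(count_parameters(LAYER_SIZES))  # 7960
    print(predict_digit(net, image))      # meaningless until the weights are trained
```

With 784 inputs, 10 hidden and 10 output nodes the network has 7,960 trainable parameters; the prediction only becomes meaningful after training, for example with back propagation.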
  • Reference:
    How about recognizing a human face in a given set of random images?
    Attack this problem in the same fashion as explained earlier. Input -> image pixels; output -> is it a face or not? (a single node)
    A face can be recognized by answering questions like “Is there an eye in the top left?”, “Is there a nose in the middle?”, etc.
    Each question corresponds to a hidden layer.

    ANN for face recognition?
    Why SVMs or other kernel-based approaches cannot be used here:
    Implicit assumption of a locally smooth function around each training example.
    Problem decomposition into sub-problems:
    Break the problem down into sub-problems solvable by sub-networks. A complex problem requires more sub-networks and more hidden layers, hence the need for deep neural networks.
    Refined by LeCun in 1989, mainly to apply CNNs to identify variability in 2D image data.
    Introduced in 1980 by Fukushima.
    A type of RBM in which there is no communication across nodes in the same layer.
    Nodes are not connected to every node of the next layer; there is no symmetry.
    Convolutional networks learn images in pieces rather than as a whole (as an RBM does).
    Designed to use minimal amounts of preprocessing.
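The local connectivity and piece-by-piece learning described in these notes can be illustrated with a single convolution. This is a minimal sketch, assuming a "valid" (no padding), stride-1 2D cross-correlation, which is what CNN layers usually compute under the name convolution; the edge-detecting kernel is hand-picked for illustration, not learned.

```python
def convolve2d(image, kernel):
    """'Valid', stride-1 2D cross-correlation of a 2D list with a 2D kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output unit sees only a kh x kw patch of the image (local
            # receptive field), and every unit reuses the same kernel weights
            # (weight sharing), so the layer is not fully connected.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 3x4 image whose right half is bright, and a hand-picked 1x2 kernel
# that responds to a dark-to-bright vertical edge.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1]]

print(convolve2d(image, edge_kernel))
# [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

The kernel has only 2 weights however large the image is, whereas a fully connected layer from the 12 pixels to the 9 outputs would need 12 × 9 = 108.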
  • Add layers.
    Give the number of nodes in each layer.
    Create the maximum number of nodes across the layers.
    Forward propagation.
    Backward propagation.
    Run the engine.
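The steps listed above (add layers, give the number of nodes per layer, forward propagation, run the engine) can be pictured as a graph computation. The sketch below is a single-machine Python illustration, not GraphLab's C++ API: the class `LayeredGraph` and its methods are hypothetical. Each neuron is a vertex, each connection a weighted edge, and the "engine" runs a gather-then-apply vertex program layer by layer, in the spirit of GraphLab's vertex programs. Only forward propagation is shown; backward propagation would gather error terms over out-edges in the same way. Distribution, graph partitioning and asynchronous scheduling are not modelled.

```python
import math
import random


class LayeredGraph:
    """Hypothetical graph view of a feed-forward network: one vertex per neuron,
    one weighted edge per connection between consecutive layers."""

    def __init__(self, seed=0):
        self.layer_sizes = []
        self.weights = {}   # (src_vertex, dst_vertex) -> edge weight
        self.value = {}     # vertex -> activation (the vertex data)
        self._rng = random.Random(seed)

    def add_layer(self, num_nodes):
        """Add a layer with the given number of nodes and wire it to every
        vertex of the previous layer. A vertex is identified by (layer, index)."""
        layer = len(self.layer_sizes)
        self.layer_sizes.append(num_nodes)
        if layer == 0:
            return
        for j in range(num_nodes):
            for i in range(self.layer_sizes[layer - 1]):
                self.weights[((layer - 1, i), (layer, j))] = self._rng.uniform(-0.5, 0.5)

    def in_edges(self, v):
        layer, _ = v
        return [((layer - 1, i), v) for i in range(self.layer_sizes[layer - 1])]

    def forward_vertex(self, v):
        """Vertex program: gather the weighted inputs over the in-edges, then
        apply the sigmoid and store the result as the vertex data."""
        z = sum(self.weights[e] * self.value[e[0]] for e in self.in_edges(v))
        self.value[v] = 1.0 / (1.0 + math.exp(-z))

    def run_engine(self, inputs):
        """Schedule the vertex program layer by layer (a layer runs only after
        its predecessor has finished) and return the output-layer values."""
        for i, x in enumerate(inputs):
            self.value[(0, i)] = x
        for layer in range(1, len(self.layer_sizes)):
            for j in range(self.layer_sizes[layer]):
                self.forward_vertex((layer, j))
        last = len(self.layer_sizes) - 1
        return [self.value[(last, j)] for j in range(self.layer_sizes[last])]


if __name__ == "__main__":
    g = LayeredGraph()
    for size in (3, 4, 2):          # add layers, giving the number of nodes in each
        g.add_layer(size)
    print(len(g.weights))           # 3*4 + 4*2 = 20 edges
    print(g.run_engine([1.0, 0.0, 1.0]))
```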

    1. © 2014 Impetus Technologies. Impetus Technologies Inc. Deep Learning: Evolution of ML from Statistical to Brain-like Computing. The Fifth Elephant, July 25, 2014. Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus
    2. Contents: Introduction to Artificial Neural Networks • Deep learning networks (Towards deep learning; From ANNs to DLNs; Basics of DLNs; Related approaches) • Distributed DLNs: Challenges • Distributed DLNs over GraphLab
    3. Deep Learning: Evolution Timeline
    4. Introduction to Artificial Neural Networks (ANNs): Perceptron
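A perceptron outputs 1 when the weighted sum of its inputs plus a bias is positive, and 0 otherwise. The minimal sketch below makes that rule concrete and reproduces the NAND gate that the back-propagation slide refers to; the weights (-2, -2) and bias 3 are an assumed choice, one of many that implement NAND.

```python
def perceptron(weights, bias, inputs):
    """Fire (output 1) if the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Assumed parameters: one choice that turns a two-input perceptron into NAND.
NAND_WEIGHTS = [-2, -2]
NAND_BIAS = 3

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron(NAND_WEIGHTS, NAND_BIAS, [a, b]))
# 0 0 1
# 0 1 1
# 1 0 1
# 1 1 0
```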
    5. Introduction to Artificial Neural Networks (ANNs): Sigmoid Neuron • Small change in input = small change in behaviour. • Output of a sigmoid neuron: $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = \sum_j w_j x_j + b$.
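The contrast between a perceptron and a sigmoid neuron ("small change in input = small change in behaviour") can be checked numerically. In this minimal sketch, with an assumed single weight of 1 and a bias of 0, nudging the input across z = 0 flips the step output from 0 to 1, while the sigmoid output barely moves.

```python
import math


def weighted_input(weights, bias, inputs):
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step_neuron(weights, bias, inputs):
    """Perceptron: hard threshold at z = 0."""
    return 1.0 if weighted_input(weights, bias, inputs) > 0 else 0.0


def sigmoid_neuron(weights, bias, inputs):
    """Sigmoid neuron: 1 / (1 + exp(-z)) with z = sum_j w_j x_j + b."""
    return 1.0 / (1.0 + math.exp(-weighted_input(weights, bias, inputs)))


w, b, eps = [1.0], 0.0, 1e-3  # assumed parameters and a tiny input change

print(step_neuron(w, b, [-eps]), step_neuron(w, b, [eps]))        # 0.0 1.0
print(sigmoid_neuron(w, b, [-eps]), sigmoid_neuron(w, b, [eps]))  # both close to 0.5
```

This smoothness is what makes gradient-based training such as back propagation possible: the output's response to a small weight or input change can be measured and used to correct the weights.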
    6. Introduction to Artificial Neural Networks (ANNs): Back Propagation
       What is this? NAND Gate!
       initialize network weights (often small random values)
       do
           forEach training example ex
               prediction = neural-net-output(network, ex)  // forward pass
               actual = teacher-output(ex)
               compute error (prediction - actual) at the output units
               compute delta(wh) for all weights from hidden layer to output layer  // backward pass
               compute delta(wi) for all weights from input layer to hidden layer  // backward pass continued
               update network weights
       until all examples classified correctly or another stopping criterion satisfied
       return the network
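The back-propagation pseudocode on this slide can be made runnable for a network with one hidden layer. This minimal sketch assumes sigmoid units, a single output unit, squared error, per-example updates, a learning rate of 0.5 and three hidden units. "Classified correctly" is taken to mean the output lies on the correct side of 0.5, and `max_epochs` plays the role of "another stopping criterion". It is trained on the NAND truth table mentioned on the slide; the function name `train_backprop` is hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_backprop(examples, n_hidden=3, lr=0.5, max_epochs=10000, seed=0):
    """Back propagation for a one-hidden-layer sigmoid network with a single
    output unit; weights are updated after every training example."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Initialize network weights with small random values; the last entry
    # of each weight row is the bias (its input is fixed at 1.0).
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        xb = list(x) + [1.0]
        hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden] + [1.0]
        return xb, hb, sigmoid(sum(w * v for w, v in zip(w_out, hb)))

    def predict(x):
        return forward(x)[2]

    for epoch in range(max_epochs):
        for x, actual in examples:
            xb, hb, prediction = forward(x)  # forward pass
            # Error at the output unit, scaled by the sigmoid derivative.
            delta_out = (prediction - actual) * prediction * (1.0 - prediction)
            # Backward pass: hidden deltas use the output weights before the update.
            delta_hidden = [delta_out * w_out[j] * hb[j] * (1.0 - hb[j])
                            for j in range(n_hidden)]
            # Update network weights by gradient descent on the squared error.
            for j in range(n_hidden + 1):
                w_out[j] -= lr * delta_out * hb[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xb[i]
        # Stopping criterion: every example is on the correct side of 0.5.
        if all((predict(x) > 0.5) == (t == 1) for x, t in examples):
            break
    return predict, epoch + 1


NAND = [((0, 0), 1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    predict, epochs = train_backprop(NAND)
    print(epochs, [round(predict(x), 2) for x, _ in NAND])
```

The hidden-layer deltas are computed before any weight changes, which mirrors the pseudocode's order: both delta computations happen before "update network weights".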
    7. The network to identify the individual digits from the input image
    8. Different Shallow Architectures [Figure: three shallow architectures, each feeding into a weighted sum: fixed basis functions, template matchers, and simple trainable basis functions. Labels: linear predictor; ANN, radial basis functions; kernel machines.] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
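The "template matcher" architecture in this figure computes a weighted sum of similarities to stored examples. This minimal sketch assumes a Gaussian (RBF) kernel of width 1 and hand-picked coefficients rather than trained ones. It also illustrates the locality that the speaker notes attribute to kernel methods: far from every stored example, the output falls back to the bias.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Template match: close to 1 near the template, close to 0 far from it."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_machine(x, templates, alphas, bias=0.0, sigma=1.0):
    """Shallow architecture: one layer of template matchers, then a weighted sum."""
    return bias + sum(a * gaussian_kernel(x, t, sigma) for a, t in zip(alphas, templates))


# Hypothetical stored examples (templates) and coefficients, not fitted to data.
templates = [(0.0, 0.0), (1.0, 1.0)]
alphas = [1.0, -1.0]

print(kernel_machine((0.0, 0.0), templates, alphas))    # 1 - exp(-1), about 0.632
print(kernel_machine((10.0, 10.0), templates, alphas))  # far from every template: about 0
```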
    9. ANNs for Face Recognition?
    10. DLN for Face Recognition
    11. Deep Learning Networks: Learning. No general learning algorithm (no-free-lunch theorem, Wolpert 1996). Learning algorithms for specific tasks: perception, control, prediction, planning, reasoning, language understanding. Limitations of BP: local minima, optimization challenges for non-convex objective functions. Hinton's deep belief networks as a stack of RBMs. LeCun's energy-based learning for DBNs.
    12. Deep Belief Networks • A deep neural network composed of multiple layers of latent variables (hidden units or feature detectors) • Can be viewed as a stack of RBMs • Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time • A Boltzmann Machine is a specific energy-based model whose energy function is linear in its free parameters.
    13. Energy Based Models • RBMs are Energy Based Models (EBMs) • EBMs associate an energy with every configuration of a system • Learning corresponds to modifying the shape of the energy function so that it has desirable properties • As in physics, lower energy = more stability • So, modify the shape of the energy function such that the desirable configurations have lower energy
    14. Other DL networks: Convolutional Networks. Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
    15. Other DL Networks: Auto Encoders (Auto-associators or Diabolo Network) • The aim of an autoencoder network is to learn a compressed representation for a set of data • It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function) • A denoising autoencoder avoids simply learning the identity function by randomly corrupting the input, which the autoencoder must then reconstruct (denoise) • Best applied when there is structure in the data • Applications: dimensionality reduction, feature selection
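A denoising autoencoder, as described on this slide, corrupts the input and is trained by back propagation to reconstruct the clean version. This minimal sketch assumes masking noise (each input zeroed with probability 0.2), one sigmoid hidden layer of 2 units as the compressed code, untied encoder and decoder weights, squared reconstruction error, and a toy dataset of two structured patterns; the class name and all sizes are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class DenoisingAutoencoder:
    """One sigmoid hidden layer (the compressed code) and a sigmoid output layer,
    trained with back propagation on the squared reconstruction error."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)

        def small():
            return self.rng.uniform(-0.5, 0.5)

        self.W_enc = [[small() for _ in range(n_visible)] for _ in range(n_hidden)]
        self.b_enc = [0.0] * n_hidden
        self.W_dec = [[small() for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_dec = [0.0] * n_visible

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W_enc, self.b_enc)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.W_dec, self.b_dec)]

    def corrupt(self, x, p):
        """Masking noise: each input is set to 0 independently with probability p."""
        return [0.0 if self.rng.random() < p else v for v in x]

    def reconstruction_error(self, x):
        return sum((r - v) ** 2 for r, v in zip(self.decode(self.encode(x)), x))

    def train_step(self, x, p=0.2, lr=0.5):
        x_tilde = self.corrupt(x, p)
        h = self.encode(x_tilde)
        r = self.decode(h)
        # The target is the clean input x, not the corrupted x_tilde.
        d_out = [(ri - xi) * ri * (1.0 - ri) for ri, xi in zip(r, x)]
        d_hid = [sum(self.W_dec[k][j] * d_out[k] for k in range(len(x))) * h[j] * (1.0 - h[j])
                 for j in range(len(h))]
        for k in range(len(x)):
            for j in range(len(h)):
                self.W_dec[k][j] -= lr * d_out[k] * h[j]
            self.b_dec[k] -= lr * d_out[k]
        for j in range(len(h)):
            for i in range(len(x)):
                self.W_enc[j][i] -= lr * d_hid[j] * x_tilde[i]
            self.b_enc[j] -= lr * d_hid[j]


if __name__ == "__main__":
    data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # two structured patterns
    dae = DenoisingAutoencoder(n_visible=6, n_hidden=2)
    print(sum(dae.reconstruction_error(x) for x in data))  # before training
    for _ in range(2000):
        for x in data:
            dae.train_step(x)
    print(sum(dae.reconstruction_error(x) for x in data))  # after training
```

The 2-unit code is the compressed representation; because the input is corrupted while the target stays clean, copying the input straight through does not minimize the loss.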
    16. Why are Deep Learning Networks Brain-like? Statistical approach of traditional ML: SVMs or kernel approaches • Not applicable in deep learning networks. Human brain: trophic factors. Traditional ML: a lot of data munging and representational issues (feature abstractor) before the classifier can kick in. Deep learning: allows the system to learn representations naturally as well.
    17. Success stories of DLNs: Android voice recognition system, based on DLNs, improves accuracy by 25% compared to the state of the art. Microsoft Skype Translator software and digital assistant Cortana. 1.2 million images, 1000 classes (ImageNet data): error rate of 15.3%, better than the state of the art at 26.1%.
    18. Success stories of DLNs (continued): Senna system: PoS tagging, chunking, NER, semantic role labeling, syntactic parsing. Comparable F1 score to the state of the art with a huge speed advantage (5 days vs. a few hours). DLNs vs. TF-IDF: 1 million documents, relevance search, 3.2 ms vs. 1.2 s. Robot navigation.
    19. Potential Applications of DLNs: speech recognition/enhancement, video sequencing, emotion recognition (video/audio), malware detection, robotics (navigation), multi-modal learning (text and image), natural language processing.
    20. Available resources • Deeplearning4j: open source implementation of Jeffrey Dean's distributed deep learning paper • Theano: Python library of math functions, with efficient, transparent use of GPUs • Hinton's courses on Coursera
    21. Challenges in Realizing DLNs • Large number of training examples: high accuracy • A large number of parameters can also improve accuracy • Inherently sequential nature: freeze up one layer for learning • GPUs to improve training speed; limitation: CPU-to-GPU data transfers • Distributed DLNs: Jeffrey Dean's work
    22. Distributed DLNs • Motivation: scalable, low-latency training; parallelize over the training data and learn fast • Jeffrey Dean's work, DistBelief: a pseudo-centralized realization
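The pseudo-centralized, data-parallel idea behind DistBelief can be simulated on one machine. This minimal sketch assumes a synchronous scheme: the training data is sharded across workers, each worker computes a gradient on its shard with a copy of the central parameters, and a parameter server averages the gradients and updates the model. DistBelief itself is asynchronous and shards the parameters across many servers; the model here is a toy linear regression, and `ParameterServer`, `worker_gradient` and `train` are hypothetical names.

```python
import random


def shard(data, num_workers):
    """Split the training data across workers (data parallelism)."""
    return [data[i::num_workers] for i in range(num_workers)]


def worker_gradient(params, shard_data):
    """Mean-squared-error gradient of a linear model y = w*x + b, computed by one
    worker on its own shard with a copy of the server's current parameters."""
    w, b = params
    gw = gb = 0.0
    for x, y in shard_data:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw / len(shard_data), gb / len(shard_data)


class ParameterServer:
    """Central copy of the model parameters (hypothetical, simplified)."""

    def __init__(self, lr=0.05):
        self.params = (0.0, 0.0)
        self.lr = lr

    def apply(self, grads):
        """Average the workers' gradients and take one gradient-descent step."""
        gw = sum(g[0] for g in grads) / len(grads)
        gb = sum(g[1] for g in grads) / len(grads)
        w, b = self.params
        self.params = (w - self.lr * gw, b - self.lr * gb)


def train(num_workers=4, steps=500, seed=0):
    rng = random.Random(seed)
    # Synthetic, noise-free data from y = 3x + 1.
    data = [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1.0, 1.0) for _ in range(200))]
    shards = shard(data, num_workers)
    server = ParameterServer()
    for _ in range(steps):
        # In a real system each worker would run on its own machine, in parallel.
        grads = [worker_gradient(server.params, s) for s in shards]
        server.apply(grads)
    return server.params


if __name__ == "__main__":
    print(train())  # close to (3.0, 1.0)
```

Every worker reads from and writes to the single server, which is what makes such a design centralized in spirit even when the gradient work is spread out.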
    23. Distributed DLNs over GraphLab • Purely distributed realizations are needed • Our approach: use an asynchronous graph processing framework (GraphLab), making modifications to the GraphLab code as required • Layer abstraction, mass communication
    24. Distributed DLNs over GraphLab Engine
    25. Conclusions • ANN to distributed deep learning: key ideas in deep learning; the need for distributed realizations; DistBelief, deeplearning4j, etc.; our work on large-scale distributed deep learning • Deep learning leads us from statistics-based machine learning towards brain-inspired AI.
    26. THANK YOU! Mail • LinkedIn • Blogs • Twitter • @impetustech
    27. BACKUP SLIDES
    28. Other Brain-like Approaches • Recurrent Neural Networks: Long Short-Term Memory (LSTM), temporal data • Sum-product networks: deep architectures of sum-product networks • Hierarchical temporal memory: an online structural and algorithmic model of the neocortex
    29. Recurrent Neural Networks • Connections between units form a directed cycle, i.e. typical feedback connections • RNNs can use their internal memory to process arbitrary sequences of inputs • Standard RNNs cannot learn to look far back into the past • LSTM solves this problem by introducing memory cells • These memory cells can remember a value for an arbitrary amount of time
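The memory cells mentioned on this slide can be illustrated with one LSTM unit. This minimal sketch assumes the common formulation with input, forget and output gates and scalar, hand-set parameters (all hypothetical): an input of 1 opens the input gate and writes a value into the cell, and a large forget-gate bias keeps the forget gate near 1, so the value survives many later steps with input 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with input (i), forget (f) and
    output (o) gates and a candidate value (g). `p` holds scalar input weights
    w_*, recurrent weights u_* and biases b_*."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])
    c = f * c_prev + i * g   # memory cell: keep (part of) the old value, add new content
    h = o * math.tanh(c)     # output read from the memory cell
    return h, c


# Hypothetical hand-set parameters: x = 1 opens the input gate, and a large
# forget-gate bias keeps the forget gate close to 1 (i.e. "remember").
PARAMS = {
    "w_i": 10.0, "u_i": 0.0, "b_i": -5.0,
    "w_f": 0.0, "u_f": 0.0, "b_f": 10.0,
    "w_o": 0.0, "u_o": 0.0, "b_o": 0.0,
    "w_g": 2.0, "u_g": 0.0, "b_g": 0.0,
}


def write_then_wait(steps):
    """Write a value into the cell with x = 1, then feed x = 0 for `steps` steps."""
    h, c = lstm_step(1.0, 0.0, 0.0, PARAMS)
    c_written = c
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, PARAMS)
    return c_written, c


if __name__ == "__main__":
    print(write_then_wait(50))  # the stored value has barely decayed
```

In a trained LSTM these parameters are learned rather than hand-set; the point here is only that the additive cell update `c = f * c_prev + i * g` lets a value persist when the forget gate stays open.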
    30. Sum-Product Networks (SPN) • An SPN is a deep network model and is a directed acyclic graph • These networks allow the probability of an event to be computed quickly • SPNs try to convert multilinear functions into computationally short forms, i.e. forms consisting of multiple additions and multiplications • Leaves correspond to variables, and internal nodes correspond to sums and products
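The "compute the probability of an event quickly" claim can be seen on a tiny SPN. This minimal sketch assumes a hand-built, valid SPN over two binary variables X1 and X2: the leaves are indicators for each value, sum nodes have hypothetical weights that add up to 1, and product nodes combine children over disjoint variables. Both a joint probability and a marginal are obtained in one bottom-up pass; marginalizing a variable just sets both of its indicators to 1.

```python
def spn_value(ind):
    """Evaluate a small, hand-built SPN over binary variables X1 and X2 bottom-up.

    `ind` maps each indicator leaf ('x1', 'not_x1', 'x2', 'not_x2') to 0 or 1.
    All weights are hypothetical; each sum node's weights add up to 1."""
    # Sum nodes over the two indicator leaves of a single variable.
    s1a = 0.6 * ind["x1"] + 0.4 * ind["not_x1"]
    s2a = 0.3 * ind["x2"] + 0.7 * ind["not_x2"]
    s1b = 0.1 * ind["x1"] + 0.9 * ind["not_x1"]
    s2b = 0.8 * ind["x2"] + 0.2 * ind["not_x2"]
    # Product nodes multiply children defined over disjoint variables.
    p1 = s1a * s2a
    p2 = s1b * s2b
    # Root sum node: the network's value is a probability.
    return 0.7 * p1 + 0.3 * p2


def evidence(x1=None, x2=None):
    """Indicator values for a (partial) assignment. A variable left as None is
    summed out by setting both of its indicators to 1."""
    def pair(v):
        return (1, 1) if v is None else (int(v == 1), int(v == 0))

    x1_on, x1_off = pair(x1)
    x2_on, x2_off = pair(x2)
    return {"x1": x1_on, "not_x1": x1_off, "x2": x2_on, "not_x2": x2_off}


if __name__ == "__main__":
    print(spn_value(evidence(x1=1, x2=1)))  # joint P(X1=1, X2=1), about 0.15
    print(spn_value(evidence(x1=1)))        # marginal P(X1=1), about 0.45, still one pass
    print(spn_value(evidence()))            # everything summed out: 1 (up to rounding)
```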
    31. Hierarchical Temporal Memory • An online machine learning model developed by Jeff Hawkins • This model learns one instance at a time • Best explained by an online stock model: today's stock situation helps predict tomorrow's stock • An HTM network is a tree-shaped hierarchy of levels • Higher levels of the hierarchy can use patterns learned at lower levels. This is adopted from the learning model used by the brain in the form of the neocortex
    32.
    33. Mathematical Equations • The energy function is defined as follows: $E(x, h) = -b'x - c'h - h'Wx$, where $b$ and $c$ are the biases of the visible and hidden layers, $W$ represents the weights connecting the visible layer and the hidden layer, and the prime (') denotes transpose.
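The energy function $E(x, h) = -b'x - c'h - h'Wx$ can be evaluated directly for a tiny RBM, and turned into probabilities with $p(x, h) = e^{-E(x, h)} / Z$, so "lower energy = more stability" becomes "lower energy = higher probability". This minimal sketch uses hypothetical parameters for 2 visible and 2 hidden binary units and computes the partition function $Z$ by brute-force enumeration, which is only feasible for very small models.

```python
import itertools
import math

# Hypothetical parameters of a tiny RBM with 2 visible units x and 2 hidden units h.
W = [[1.0, -2.0],   # W[j][i] connects hidden unit j to visible unit i
     [0.5, 1.5]]
b = [0.2, -0.1]     # visible biases
c = [-0.3, 0.4]     # hidden biases


def energy(x, h):
    """E(x, h) = -b'x - c'h - h'Wx for binary vectors x and h."""
    bx = sum(bi * xi for bi, xi in zip(b, x))
    ch = sum(cj * hj for cj, hj in zip(c, h))
    hWx = sum(h[j] * W[j][i] * x[i] for j in range(len(h)) for i in range(len(x)))
    return -bx - ch - hWx


# Enumerate every joint configuration (only feasible for tiny models).
configs = [(x, h) for x in itertools.product((0, 1), repeat=2)
           for h in itertools.product((0, 1), repeat=2)]
Z = sum(math.exp(-energy(x, h)) for x, h in configs)  # partition function
prob = {(x, h): math.exp(-energy(x, h)) / Z for x, h in configs}

lowest = min(configs, key=lambda cfg: energy(*cfg))
print(lowest, energy(*lowest), prob[lowest])  # lowest energy, highest probability
```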
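    34. Learning Energy Based Models • Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data • It has the following form: $-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$, where $F$ is the free energy; the first term is the positive phase and the second term is the negative phase.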
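The positive-phase/negative-phase gradient can be checked numerically on a tiny binary RBM. This minimal sketch uses the standard binary-RBM free energy $F(x) = -b'x - \sum_j \log(1 + e^{c_j + W_j x})$ (hidden units summed out) and takes $\theta = b$, for which $\partial F(x)/\partial b = -x$, so the gradient becomes $-x + \sum_{\tilde{x}} p(\tilde{x})\,\tilde{x}$. The parameters are hypothetical, and the negative phase is computed exactly by enumerating all visible vectors; practical RBM training instead approximates it by sampling (for example, contrastive divergence), which this sketch does not show.

```python
import itertools
import math

W = [[1.0, -2.0], [0.5, 1.5]]   # hypothetical weights, W[j][i]
c = [-0.3, 0.4]                 # hypothetical hidden biases


def free_energy(x, b):
    """F(x) = -b'x - sum_j log(1 + exp(c_j + W_j x)), hidden units summed out."""
    visible_term = sum(bi * xi for bi, xi in zip(b, x))
    hidden_term = sum(math.log1p(math.exp(cj + sum(w * xi for w, xi in zip(Wj, x))))
                      for cj, Wj in zip(c, W))
    return -visible_term - hidden_term


def neg_log_likelihood(x, b):
    """-log p(x) = F(x) + log Z, with Z summed over every visible vector."""
    xs = list(itertools.product((0, 1), repeat=len(x)))
    log_Z = math.log(sum(math.exp(-free_energy(xt, b)) for xt in xs))
    return free_energy(x, b) + log_Z


def grad_b(x, b):
    """Positive phase dF(x)/db = -x, minus the negative phase E_p[dF/db] = -E_p[x~]."""
    xs = list(itertools.product((0, 1), repeat=len(x)))
    weights = [math.exp(-free_energy(xt, b)) for xt in xs]
    Z = sum(weights)
    expected = [sum(w * xt[i] for w, xt in zip(weights, xs)) / Z for i in range(len(x))]
    return [-xi + ei for xi, ei in zip(x, expected)]


if __name__ == "__main__":
    print(grad_b((1, 0), [0.2, -0.1]))
```

A central finite difference of `neg_log_likelihood` with respect to each bias agrees with `grad_b`, which is a quick way to confirm the sign convention of the two phases.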
    35. Thank you. Questions?