
Speaker: Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus

The main objective of the presentation is to give an overview of our cutting edge work on realizing distributed deep learning networks over GraphLab. The objectives can be summarized as below:

- First-hand experience and insights into implementation of distributed deep learning networks.

- Thorough view of GraphLab (including descriptions of code) and the extensions required to implement these networks.

- Details of how the extensions were realized/implemented in GraphLab source – they have been submitted to the community for evaluation.

- Arrhythmia detection use case as an application of the large scale distributed deep learning network.

Published in:
Technology


Consider the problem of identifying the individual digits in an input image.

Each image is 28 by 28 pixels. The network is then designed as follows:

Input layer (image) -> 28*28 = 784 neurons. Each neuron corresponds to a pixel.

The size of the output layer is set by the number of digits to be identified, i.e. 10 (0 to 9).

The number of neurons in the intermediate hidden layer can be varied experimentally. Let us fix it at 10 nodes.
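As a rough illustration of the 784-10-10 layout described above, here is a minimal sketch of a forward pass in plain Python. The function names (`init_layer`, `forward`), the random weights, and the random stand-in image are assumptions made for this example; nothing here is trained, so the predicted digit is meaningless until the weights are learned.

```python
import math
import random

def sigmoid(z):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(layers, pixels):
    """Propagate one flattened image through every layer in turn."""
    activations = pixels
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

rng = random.Random(0)
sizes = [28 * 28, 10, 10]  # input pixels, hidden nodes, digit classes
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

image = [rng.random() for _ in range(sizes[0])]  # stand-in for a real 28x28 image
scores = forward(layers, image)                  # one score per digit 0..9
predicted_digit = max(range(10), key=lambda d: scores[d])
```

Changing the middle entry of `sizes` is all it takes to experiment with a different hidden layer width.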

How about recognizing a human face in a given set of random images?

Attack this problem in the same fashion as before. Input -> image pixels, output -> is it a face or not? (a single node)

A face can be recognized by answering questions such as “Is there an eye in the top left?” and “Is there a nose in the middle?”

Each question corresponds to a hidden layer

ANN for face recognition?

Why SVMs or other kernel-based approaches cannot be used here:

They implicitly assume a locally smooth function around each training example.

Problem decomposition into sub-problems

Break the problem down into sub-problems, each solvable by a sub-network. A complex problem requires more sub-networks and more hidden layers, hence the need for deep neural networks.

Refined by LeCun in 1989, mainly to apply CNNs to identify variability in 2D image data.

Introduced in 1980 by Fukushima.

A type of RBM in which nodes in the same layer do not communicate with each other.

Nodes are not connected to every node of the next layer. There is no symmetry.

Convolutional networks learn images piece by piece rather than as a whole (as an RBM does).

Designed to need minimal preprocessing.
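To make "learning images piece by piece" concrete, the sketch below slides a small kernel over an image so that each output value depends only on a local patch. It is a minimal illustration, assuming the sliding-window (cross-correlation) form commonly used in convolutional layers; the function name `convolve2d_valid`, the toy image, and the edge-detecting kernel are all made up for this example.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output sees only one local patch."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# A 4x4 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# A 2x2 kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = convolve2d_valid(image, kernel)
# → [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The same four kernel weights are reused at every position, which is why such a layer needs far fewer parameters than a fully connected one.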

Give the number of nodes in each layer.

Create the maximum number of nodes across the layers.

Forward propagation

Backward propagation.

Run the engine
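The forward and backward propagation steps listed above can be sketched on a single machine as follows. This is not the GraphLab implementation: it is a minimal plain-Python illustration, assuming a 2-2-1 sigmoid network, squared error, and the NAND truth table as training data. The names `train_step`, `total_error`, and `layer_sizes` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Forward propagation: return hidden activations and the output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output

def train_step(net, x, target, lr):
    """One forward pass, one backward pass, one weight update."""
    hidden, output = forward(net, x)
    # Backward propagation: output delta first, then hidden deltas.
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Update hidden-to-output weights.
    net["w_out"] = [w - lr * delta_out * h for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= lr * delta_out
    # Update input-to-hidden weights.
    net["w_hidden"] = [[w - lr * d * xi for w, xi in zip(row, x)]
                       for row, d in zip(net["w_hidden"], delta_hidden)]
    net["b_hidden"] = [b - lr * d for b, d in zip(net["b_hidden"], delta_hidden)]

def total_error(net, examples):
    """Sum of squared errors (with the usual 1/2 factor) over all examples."""
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in examples)

rng = random.Random(1)
layer_sizes = [2, 2, 1]  # number of nodes in each layer
net = {
    "w_hidden": [[rng.uniform(-0.5, 0.5) for _ in range(layer_sizes[0])]
                 for _ in range(layer_sizes[1])],
    "b_hidden": [0.0] * layer_sizes[1],
    "w_out": [rng.uniform(-0.5, 0.5) for _ in range(layer_sizes[1])],
    "b_out": 0.0,
}
nand = [([0, 0], 1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

error_before = total_error(net, nand)
for _ in range(5000):  # fixed epoch count as a simple stopping criterion
    for x, t in nand:
        train_step(net, x, t, lr=0.5)
error_after = total_error(net, nand)
```

Comparing `error_before` with `error_after` shows whether training reduced the error. A distributed version would have to split this loop across machines.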

http://lessoned.blogspot.in/2011/10/intro-to-sum-product-networks.html

- 1. © 2014 Impetus Technologies. Impetus Technologies Inc. Deep Learning: Evolution of ML from Statistical to Brain-like Computing. The Fifth Elephant, July 25, 2014. Dr. Vijay Srinivas Agneeswaran, Director, Big Data Labs, Impetus
- 2. Contents
  - Introduction to Artificial Neural Networks
  - Deep learning networks
    - Towards deep learning
    - From ANNs to DLNs
    - Basics of DLNs
    - Related approaches
  - Distributed DLNs: Challenges
  - Distributed DLNs over GraphLab
- 3. Deep Learning: Evolution Timeline
- 4. Introduction to Artificial Neural Networks (ANNs): Perceptron
- 5. Introduction to Artificial Neural Networks (ANNs): Sigmoid Neuron
  - Small change in input = small change in behaviour.
  - Output of a sigmoid neuron: $\sigma(w \cdot x + b) = \frac{1}{1 + e^{-(w \cdot x + b)}}$
- 6. Introduction to Artificial Neural Networks (ANNs): Back Propagation. http://zerkpage.tripod.com/ann.htm What is this? NAND gate!

      initialize network weights (often small random values)
      do
        forEach training example ex
          prediction = neural-net-output(network, ex)  // forward pass
          actual = teacher-output(ex)
          compute error (prediction - actual) at the output units
          compute delta(wh) for all weights from hidden layer to output layer  // backward pass
          compute delta(wi) for all weights from input layer to hidden layer   // backward pass continued
          update network weights
      until all examples classified correctly or another stopping criterion satisfied
      return the network
- 7. The network to identify the individual digits from the input image. http://neuralnetworksanddeeplearning.com/chap1.html
- 8. Different Shallow Architectures
  - Figure: three shallow architectures, each ending in a weighted sum. The first layers are labelled template matchers, fixed basis functions, and simple trainable basis functions; the architectures are labelled linear predictor; ANN, radial basis functions; and kernel machines.
  - Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in Large Scale Kernel Machines (L. Bottou, O. Chapelle, D. DeCoste, and J. Weston, eds.), MIT Press, 2007.
- 9. ANNs for Face Recognition?
- 10. DLN for Face Recognition. http://theanalyticsstore.com/deep-learning/
- 11. Deep Learning Networks: Learning
  - No general learning algorithm (no-free-lunch theorem, Wolpert 1996).
  - Learning algorithms for specific tasks: perception, control, prediction, planning, reasoning, language understanding.
  - Limitations of BP: local minima, optimization challenges for non-convex objective functions.
  - Hinton’s deep belief networks as a stack of RBMs.
  - LeCun’s energy-based learning for DBNs.
- 12. Deep Belief Networks. http://www.iro.umontreal.ca/~lisa/twiki/pub/Public/DeepBeliefNetworks/DBNs.png
  - A deep belief network is a deep neural network composed of multiple layers of latent variables (hidden units or feature detectors).
  - It can be viewed as a stack of RBMs.
  - Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time.
  - A Boltzmann machine is a specific energy model with a linear energy function.
- 13. Energy Based Models. http://www.cs.nyu.edu/~yann/research/ebm/loss-func.png
  - RBMs are Energy Based Models (EBMs).
  - An EBM associates an energy with every configuration of a system.
  - Learning corresponds to modifying the shape of the energy function so that it has desirable properties.
  - As in physics, lower energy = more stability.
  - So, modify the shape of the energy function such that the desirable configurations have lower energy.
- 14. Other DL networks: Convolutional Networks. Yann LeCun, Patrick Haffner, Léon Bottou, and Yoshua Bengio. 1999. Object Recognition with Gradient-Based Learning. In Shape, Contour and Grouping in Computer Vision, David A. Forsyth, Joseph L. Mundy, Vito Di Gesù, and Roberto Cipolla (Eds.). Springer-Verlag, London, UK, 319-.
- 15. Other DL Networks: Auto Encoders (Auto-associators or Diabolo Networks)
  - The aim of an auto encoder network is to learn a compressed representation for a set of data.
  - It is an unsupervised learning algorithm that applies back propagation, setting the target values equal to the inputs (identity function).
  - A denoising auto encoder avoids simply learning the identity function by randomly corrupting the input, which the auto encoder must then reconstruct or denoise.
  - Best applied when there is structure in the data.
  - Applications: dimensionality reduction, feature selection.
- 16. Why Are Deep Learning Networks Brain-like?
  - Statistical approach of traditional ML: SVMs or kernel approaches.
    - Not applicable in deep learning networks.
  - Human brain: trophic factors.
  - Traditional ML: a lot of data munging and representational issues (feature abstractor) before the classifier can kick in.
  - Deep learning: allows the system to learn the representations as well, naturally.
- 17. Success stories of DLNs
  - Android voice recognition system, based on DLNs: improves accuracy by 25% compared to the state of the art.
  - Microsoft Skype Translate software and the digital assistant Cortana.
  - 1.2 million images, 1000 classes (ImageNet data): error rate of 15.3%, better than the state of the art at 26.1%.
- 18. Success stories of DLNs (continued)
  - Senna system: PoS tagging, chunking, NER, semantic role labeling, syntactic parsing. Comparable F1 score with the state of the art, with a huge speed advantage (5 days vs. a few hours).
  - DLNs vs. TF-IDF: relevance search over 1 million documents, 3.2 ms vs. 1.2 s.
  - Robot navigation.
- 19. Potential Applications of DLNs
  - Speech recognition/enhancement
  - Video sequencing
  - Emotion recognition (video/audio)
  - Malware detection
  - Robotics: navigation
  - Multi-modal learning (text and image)
  - Natural Language Processing
- 20. Available resources
  - Deeplearning4j: open source implementation of Jeffrey Dean’s distributed deep learning paper.
  - Theano: Python library of math functions.
    - Efficient use of GPUs, transparently.
  - Hinton’s courses on Coursera: https://www.coursera.org/instructor/~154
- 21. Challenges in Realizing DLNs
  - Large number of training examples: high accuracy.
    - A large number of parameters can also improve accuracy.
  - Inherently sequential nature: freeze up one layer for learning.
  - GPUs to improve training speedup.
    - Limitations: CPU-to-GPU data transfers.
  - Distributed DLNs: Jeffrey Dean’s work.
- 22. Distributed DLNs
  - Motivation
    - Scalable, low-latency training.
    - Parallelize training data and learn fast.
  - Jeffrey Dean’s work: DistBelief
    - Pseudo-centralized realization.
- 23. Distributed DLNs over GraphLab
  - Purely distributed realizations are needed.
  - Our approach
    - Use an asynchronous graph processing framework (GraphLab).
    - Make modifications in the GraphLab code as required: layer abstraction, mass communication.
- 24. Distributed DLNs over GraphLab Engine
- 25. Conclusions
  - ANN to distributed deep learning
    - Key ideas in deep learning.
    - Need for distributed realizations.
    - DistBelief, deeplearning4j, etc.
    - Our work on large-scale distributed deep learning.
  - Deep learning leads us from statistics-based machine learning towards brain-inspired AI.
- 26. THANK YOU! Mail: bigdata@impetus.com. LinkedIn: www.linkedin.com/company/impetus. Blogs: blogs.impetus.com. Twitter: @impetustech
- 27. BACKUP SLIDES
- 28. Other Brain-like Approaches
  - Recurrent neural networks: Long Short-Term Memory (LSTM), temporal data.
  - Sum-product networks: deep architectures of sum-product networks.
  - Hierarchical temporal memory: online structural and algorithmic model of the neocortex.
- 29. Recurrent Neural Networks
  - Connections between units form a directed cycle, i.e. typical feedback connections.
  - RNNs can use their internal memory to process arbitrary sequences of inputs.
  - Plain RNNs struggle to learn dependencies that reach far back into the past.
  - LSTMs solve this problem by introducing memory cells.
  - These memory cells can remember a value for an arbitrary amount of time.
- 30. Sum-Product Networks (SPN)
  - An SPN is a deep network model structured as a directed acyclic graph.
  - These networks make it possible to compute the probability of an event quickly.
  - SPNs try to rewrite multilinear functions in computationally compact forms, i.e. forms consisting of multiple additions and multiplications.
  - Leaves correspond to variables, and internal nodes correspond to sums and products.
- 31. Hierarchical Temporal Memory
  - HTM is an online machine learning model developed by Jeff Hawkins.
  - The model learns one instance at a time.
  - It is best explained by an online stock model: today’s state of a stock helps in predicting tomorrow’s.
  - An HTM network is a tree-shaped hierarchy of levels.
  - Higher hierarchy levels can use patterns learned at lower levels. This is adopted from the learning model used by the brain in the neocortex.
- 32. http://en.wikipedia.org/wiki/Hierarchical_temporal_memory
- 33. Mathematical Equations
  - The energy function is defined as $E(x, h) = -b'x - c'h - h'Wx$, where $b$ and $c$ are the biases and $W$ represents the weights connecting the visible layer and the hidden layer.
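- 34. Learning Energy Based Models
  - Energy based models can be learnt by performing gradient descent on the negative log-likelihood of the training data.
  - It has the following form: $-\frac{\partial \log p(x)}{\partial \theta} = \frac{\partial F(x)}{\partial \theta} - \sum_{\tilde{x}} p(\tilde{x}) \frac{\partial F(\tilde{x})}{\partial \theta}$. The first term on the right is the positive phase and the second is the negative phase.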
- 35. Thank you. Questions?
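The RBM energy function on slide 33 and the free energy F(x) used on slide 34 can be checked numerically on a toy model. The sketch below is a minimal illustration, assuming binary hidden units and the standard definition F(x) = -log Σ_h exp(-E(x, h)), which the slides do not spell out. Under that assumption the sum over hidden states has the closed form -b'x - Σ_i log(1 + exp(c_i + W_i x)). All parameter values and function names are made up for the example.

```python
import itertools
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def energy(x, h, W, b, c):
    """E(x, h) = -b'x - c'h - h'Wx; W has one row per hidden unit."""
    h_W_x = sum(h_i * dot(row, x) for h_i, row in zip(h, W))
    return -dot(b, x) - dot(c, h) - h_W_x

def free_energy(x, W, b, c):
    """Closed form for binary hidden units: -b'x - sum_i log(1 + exp(c_i + W_i x))."""
    return -dot(b, x) - sum(math.log1p(math.exp(c_i + dot(row, x)))
                            for c_i, row in zip(c, W))

def free_energy_brute_force(x, W, b, c):
    """F(x) = -log sum_h exp(-E(x, h)), enumerating every binary hidden state."""
    total = sum(math.exp(-energy(x, h, W, b, c))
                for h in itertools.product([0, 1], repeat=len(c)))
    return -math.log(total)

# Toy parameters, chosen only for illustration.
W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]  # 2 hidden units, 3 visible units
b = [0.1, -0.1, 0.2]                      # visible biases
c = [0.0, 0.3]                            # hidden biases
x = [1, 0, 1]                             # one visible configuration

closed_form = free_energy(x, W, b, c)
brute_force = free_energy_brute_force(x, W, b, c)
```

The two values agree up to floating-point error. The brute-force sum grows exponentially with the number of hidden units, so only the closed form is practical at realistic sizes.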
