Practical Deep Learning
Arthi Venkataraman
Table Of Contents
 Brief Introduction
 Part 1 - Deep Learning Background
 Part 2 - Deep Learning Technical Understanding
 Part 3 - Deep Learning Algorithms Introduction
 Part 4 - Miscellaneous Topics
Agenda - Part 1 – Introduction To Deep Learning
Why do you want to know about Deep Learning?
What is Deep Learning?
What can we do with Deep Learning?
Why Deep Learning now?
Deep Learning Technical Applications
Deep Learning Industrial Application Areas
Why do you want to know about Deep Learning?
Deep Learning in Industry
Deep Learning Success Stories
 A Google deep-learning system that was shown 10 million images from YouTube videos proved almost twice as good as any previous image-recognition effort at identifying objects such as cats
 Google also used the technology to cut the error rate significantly on speech recognition in its latest Android mobile software
Deep Learning Success Stories
 A team of three graduate students and two professors won a contest held by Merck to identify molecules that could lead to new drugs. The group used deep learning to zero in on the molecules most likely to bind to their targets.
 IBM’s Watson computer uses some deep-learning techniques and is now being trained to help doctors make better decisions.
 Microsoft has deployed deep learning in its Windows Phone and Bing voice search.
What is Deep Learning?
What is Deep Learning?
As per Wikipedia
 Deep learning (deep structured learning, hierarchical
learning or deep machine learning) is a branch of machine learning
based on a set of algorithms that attempt to model high-level
abstractions in data by using multiple processing layers, with complex
structures or otherwise, composed of multiple non-linear
transformations
As per LISA
 Deep Learning is about learning multiple levels of representation and
abstraction that help to make sense of data such as images, sound, and
text
What is Deep Learning?
 From Chris Nicholson, Co-Founder of Skymind
 Deep learning is basically machine perception. What is perception? It is the power to interpret sensory data. Two main ways we interpret things are by
− Naming what we sense
• See and recognize your mother’s picture
− If we do not know a name, finding similarities and dissimilarities
• See different photos of faces
• Bucket similar faces together
 Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.
Why Deep Learning now?
Deep Learning History
McCulloch and Pitts’ work (1943)
 The brain can produce highly complex patterns by using many basic cells that are connected together.
 These basic brain cells are called neurons, and McCulloch and Pitts gave a highly simplified model of a neuron in their paper ("threshold logic units").
 A group of MCP neurons that are connected together is called an artificial neural network. In a sense, the brain is a very large neural network: it has billions of neurons, and each neuron is connected to thousands of other neurons.
 McCulloch and Pitts showed how to encode any logical proposition with an appropriate network of MCP neurons.
Perceptron – Rosenblatt (1957)
 In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another.
 The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt
 First implemented in software and later in hardware as the Mark I Perceptron
AdaLine (1960)
 ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network
 It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron. It consists of weights, a bias, and a summation function.
Lull till 1986 - XOR issue - AI Winter
 In 1969, Minsky co-authored Perceptrons: An Introduction to Computational Geometry with Seymour Papert. In this work they attacked the limitations of the perceptron, showing that it could only solve linearly separable functions.
 Of particular interest was the fact that the perceptron could not solve the XOR and XNOR functions.
 Minsky and Papert also stated that the style of research being done on the perceptron was doomed to failure because of these limitations, a remark that proved ill-timed.
 As a result, very little research was done in the area until about the 1980s.
Multi-Layer Perceptron (1986)
 Solves non-linearly separable problems (e.g., XOR)
 Applied to areas like speech recognition, image recognition, and machine translation
 However, it faced competition from SVMs (1996), which are
− Simple
− Lightweight
Why Deep Learning now? GPU performance
Why GPU?
Readily available frameworks
What can you do with Deep Learning?
Classify
Cluster
Predict
Learn Features
Deep Learning Use Cases
Deep Learning Use cases
Speech Recognition
Natural Language Processing
Image Classification (Deep Learning Approach)
Understand Image and Label it with Text
Deep Learning Industrial Application Areas
 Medical
• Voice controlled Robotic Surgery
 Automotive
− Self Driving cars
 Military
− Drones
 Security
− Surveillance
Deep Learning Industrial Application Areas
 Drug discovery and toxicology
• Multi-task deep neural networks to predict the biomolecular target of a
compound
 Customer relationship management
− Deep reinforcement learning to approximate the value of possible direct marketing actions
 Recommendation systems
− Deep learning to extract meaningful features for a latent factor model for content-based music recommendation
 Bioinformatics
− Predict gene ontology annotations
Part 2 - Technical Introduction
Agenda - Part 2 - Technical Deep Dive
What is Machine Learning?
What is Artificial Neural Network?
Definition of Machine Learning
General Concepts of Machine Learning
Machine Learning Recipe
Deep Dive into Multi-Layer Perceptron
Introduction to Machine Learning
What is Machine Learning?
Arthur Samuel defined machine learning as a
 “Field of study that gives computers the ability to learn without being explicitly programmed.”
 Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
Why are we talking of Machine Learning?
 Basic building block of Deep Learning algorithms
− Multi-Layer Perceptron
 Deep Learning builds on the Multi-Layer Perceptron. Differences:
− Increased number of layers
− Connectivity between layers
Definition of Machine Learning
 “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” (Mitchell, 1997).
Common Tasks (T)
Common Tasks
 Classification - Computer program is asked to specify which of k categories some
input belongs to.
 Regression - In this type of task, the computer program is asked to predict a
numerical value given some input.
Common Tasks (T)
Common Tasks
 Transcription - In this type of task, the machine learning system is asked to observe
a relatively unstructured representation of some kind of data and transcribe it into
discrete, textual form. For example, in optical character recognition, the computer
program is shown a photograph containing an image of text and is asked to return
this text in the form of a sequence of characters
 Translation: In a translation task, the input already consists of a sequence of
symbols in some language, and the computer program must convert this into a
sequence of symbols in another language.
Common Tasks (T)
Other Common Tasks
 Structured Output Analysis - Output is a vector containing important relationships between different elements
 Anomaly Detection - Flag unusual values
 Synthesis and Sampling - Generate new examples similar to those in the training data
 Impute missing values
Performance Measure (P)
 P measures how well an ML algorithm performs on a task
 Different tasks will have different performance measures
 The use case / scenario itself will also influence which performance measure we use and what the threshold for that measure is
 Accuracy for a classifier can be measured as the number of correct classifications divided by the total number of classifications (see the example below)
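As an illustration (not from the original slides), this accuracy computation takes a few lines of Python; the label lists are made-up example data:

```python
# Minimal sketch: accuracy = correct classifications / total classifications.
# `predicted` and `actual` are hypothetical example labels.
predicted = [1, 0, 1, 1, 0, 1]
actual    = [1, 0, 0, 1, 0, 1]

correct = sum(1 for p, a in zip(predicted, actual) if p == a)
accuracy = correct / len(actual)
print("Accuracy:", accuracy)  # 5 of 6 correct -> about 0.83
```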
Experience
 Experience is the training data set using which learning is done
 This learning can happen in two ways:
− Unsupervised learning involves observing several examples of a random vector x, and attempting to implicitly or explicitly learn the probability distribution p(x), or some interesting properties of that distribution.
− Supervised learning involves observing several examples of a random vector x and an associated value or vector y, and learning to predict y from x, e.g. estimating p(y | x)
General concepts in Machine Learning
 Training / Fitting of a Model
− Process of creating a function f which can then be used to perform the intended task. Some of the different categories of the function are:
• Map any input to one of a set of k categories
• Place the inputs in different buckets
• Learn a hierarchical representation of the input data
 Validating the model
− Gives a performance measure of how well the model is doing
General concepts in Machine Learning
 Overfitting
− Fitting the function weights to achieve very high accuracy on the training data at the cost of performing poorly on unseen data
 Underfitting
− Not fitting the model well enough to the training data, resulting in a higher error rate even on it
 Generalization
− Learning the weights and model in a way that the model does well even on new, unseen data
 Regularization - Used to reduce overfitting (a sketch follows)
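As a hedged illustration of the regularization idea (L2 penalty / weight decay is one common choice; all the numbers below are made up):

```python
import numpy as np

# Minimal sketch of L2 regularization: a penalty on weight magnitude is
# added to the data loss to discourage overfitting.
# `weights`, `data_loss`, and `lam` are hypothetical example values.
weights = np.array([0.5, -1.2, 2.0])
data_loss = 0.37          # loss measured on the training data (made up)
lam = 0.01                # regularization strength

l2_penalty = lam * np.sum(weights ** 2)
total_loss = data_loss + l2_penalty
print(total_loss)
```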
Artificial Neural Networks
What is Artificial Neural Network?
 Artificial Neural networks (ANNs) are a family of models inspired
by biological neural networks (the central nervous systems of animals,
in particular the brain) and are used to estimate
or approximate functions that can depend on a large number
of inputs and are generally unknown. Artificial neural networks are
generally presented as systems of interconnected "neurons" which
exchange messages between each other. The connections have
numeric weights that can be tuned based on experience, making
neural nets adaptive to inputs and capable of learning.
ANN Basics
 An ANN is a finite directed acyclic graph
 Learns a non-linear function from the data
 3 Main Kinds of Nodes
− Source
− Target
− Hidden
Neural Network Model
Most Basic Unit - Single Neuron
Single processing unit view
Different Functions in a Single-Layer Neuron
SIGMOID ACTIVATION
TANH ACTIVATION
Sigmoid Function View
Tanh Function View
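The function views above were plots in the original deck; as a minimal illustration (NumPy, not from the slides), the two activations can be written as:

```python
import numpy as np

# Sigmoid squashes any real input into (0, 1).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tanh squashes any real input into (-1, 1) and is zero-centered.
def tanh(x):
    return np.tanh(x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x))
print(tanh(x))
```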
ANN Recipe (Similar to ML recipe)
Combine
 Specification of a dataset
 Cost function
 Optimization procedure
 Model
 All algorithms follow the above recipe
 The variation comes in the cost function and optimization procedure, as well as in the network architecture itself
Data Set
Forward Pass
 Forward Pass calculates the output
Cost Function
 Cost Function calculates the error between the predicted output and actual output
Backward Propagation using Optimization
 Backward Pass - Attempts to minimize the error from the previous step through mathematical optimization of the different learnt weight terms
 The objective is to find the global minimum of the error function. Stochastic Gradient Descent updates each weight against the slope of the error: w ← w − η · ∂E/∂w, where η is the learning rate (sketched below)
 Calculate the slope (partial derivative) at every layer
 Based on the value and sign of the slope, decide
− Whether to increase or decrease the weight in a layer
− The magnitude of the increase or decrease of the weight (learning rate)
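Tying the forward pass, cost function, and backward pass together, here is a hedged, minimal sketch of one gradient-descent step for a single sigmoid neuron (the data, initial weights, and learning rate are made-up examples, not taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up example: one input vector, one target, one sigmoid neuron.
x = np.array([0.5, -0.2, 0.1])   # input
t = 1.0                          # actual (target) output
w = np.zeros(3)                  # weights, initialized
b = 0.0                          # bias
lr = 0.1                         # learning rate (eta)

# Forward pass: calculate the predicted output.
z = np.dot(w, x) + b
y = sigmoid(z)

# Cost function: squared error between predicted and actual output.
error = 0.5 * (y - t) ** 2

# Backward pass: partial derivative (slope) via the chain rule.
dz = (y - t) * y * (1 - y)       # dError/dz for a sigmoid unit
w -= lr * dz * x                 # move each weight against the slope
b -= lr * dz                     # and the bias likewise

print(error, w, b)
```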
Optimization Procedure
 Objective - Minimize the error (the difference between predicted and actual output). This is done by varying the weights of the different network terms
 Different algorithms
− Stochastic Gradient Descent
− L-BFGS
− Conjugate Gradient
Outline of the forward and backward propagation algorithm to learn the weights
 Initialize weights
 Predict output
 Calculate error
 Backward propagate the error to update the weights and minimize it
− Magnitude and direction of each weight change determined by the partial derivatives as well as the learning rate
Model
 Output of the optimization process is the built network with weights
 This is the trained model
Part 3 - Introduction to Long Short Term Memory Network – A Deep Learning Algorithm
Long Short Term Memory
LSTM
 Used for sequence-to-sequence learning
 RNN - Networks with loops in them, allowing information to persist (see the sketch after this list)
 LSTM - A special kind of RNN, capable of learning long-term dependencies
 Additional special structures as part of each repeating module
 Practical Applications –
− Machine Translation
− Question Answering Systems
− Conversational Systems
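As a hedged illustration of the "loops" idea (not code from the deck), a vanilla RNN carries a hidden state forward one step at a time; an LSTM replaces this single update with its gated cell. All sizes and weights below are made up:

```python
import numpy as np

# Minimal sketch of one vanilla-RNN step:
#   h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b)
hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(hidden, hidden)) * 0.1   # recurrent weights
W_xh = rng.normal(size=(hidden, inputs)) * 0.1   # input weights
b = np.zeros(hidden)

def rnn_step(h_prev, x_t):
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

# Unfolded view: the same step is applied at every position in the sequence.
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):   # a made-up sequence of 5 inputs
    h = rnn_step(h, x_t)
print(h)
```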
Unfolded RNN View
LSTM Chain View
Cell State
Gates
Forget Gate
Input Gate
Output gate
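The gate slides above were diagrams in the original deck; for reference, the standard LSTM equations in the notation of the Understanding-LSTMs post cited in the references are:

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) &&\text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) &&\text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t &&\text{(cell state update)}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) &&\text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) &&\text{(hidden state)}
\end{aligned}
```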
LSTM – Memory Cell View
LSTM Variants
 Many variants based on connectivity
 GRU
− Combines the forget and input gates into a single “update gate”
− Merges the cell state and hidden state, and makes some other changes
 Attention Mechanisms
− Provide additional context while making predictions
Part 4 – Building a Text Classifier using LSTM
Available Deep Learning Frameworks
 Multiple deep learning frameworks exist across different languages
 Active community support
 Popular ones include
− In Python
• Theano
• Keras
• TensorFlow
− In Lua
• Torch
− In Java
• Deeplearning4j
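To make the Part 4 exercise concrete, here is a hedged, minimal sketch of an LSTM text classifier in Keras (one of the frameworks listed above). The vocabulary size, sequence length, layer sizes, and data below are illustrative assumptions, not the configuration used in the original work:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Illustrative assumptions: a vocabulary of 5000 word ids, every document
# already tokenized and padded to 100 ids, and two target classes.
vocab_size, max_len, n_classes = 5000, 100, 2

model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=max_len))  # word ids -> dense vectors
model.add(LSTM(64))                                          # sequence -> fixed-size vector
model.add(Dense(n_classes, activation='softmax'))            # class probabilities
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Made-up training data: random word-id sequences and one-hot labels.
X = np.random.randint(1, vocab_size, size=(32, max_len))
y = np.eye(n_classes)[np.random.randint(0, n_classes, size=32)]
model.fit(X, y, epochs=1, batch_size=8)
```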
Key Steps (To be updated)
 Steps required to build the classifier
Key Steps (Contd)
Results (To be updated)
Part 4 - Miscellaneous
Learnings
 Standard systems are sufficient to get started
 However, for effective results more compute power is required
 Patience – each algorithm takes a lot of time to complete, hence patience is important
 Incremental training
 Deep learning needs lots of data; traditional algorithms work well for standard / small data sizes
 Data cleaning and curation still helps
Advantages of using Deep Learning
 Automated extraction of complex features. Example:
• Feature extraction for images
• Implication: one need not be a deep domain expert to build solutions
− Performance that took years of feature tuning by domain experts can now be achieved with no feature engineering
 Similar algorithms can be used across domains – image, speech, and text understanding
Recap
 Why Deep Learning algorithms?
− Industrial Applications
− Technical Capabilities
 Working of Basic Building Block of Deep Learning algorithm
− Multi Layer Perceptron
 Introduction to LSTM
 Text Classifier using LSTM
 Advantage of Using Deep Learning
Way Forward
− Contact – arthi.venkat@wipro.com
Thank you
Arthi Venkataraman
DMTS Senior Member
arthi.venkat@wipro.com
Attributions
 Images used in this presentation are from different sources on the net
 The reference list below cites the sources of these images as well as the sources of some of the explanations
References
 https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
 http://ufldl.stanford.edu/tutorial
 https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/
 http://colah.github.io/posts/2015-08-Understanding-LSTMs/
 http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
 http://deeplearning.net/tutorial/lenet.html
