Practical Deep Learning
Arthi Venkataraman
Table Of Contents
 Brief Introduction
 Part 1 - Deep Learning Background
 Part 2 - Deep Learning Technical Understanding
 Part 3 - Deep Learning Algorithms Introduction
 Part 4 - Miscellaneous Topics
Agenda - Part 1 – Introduction To Deep Learning
Why do you want to know about Deep Learning?
What is Deep Learning?
What can we do with Deep Learning?
Why Deep Learning now?
Deep Learning Technical Applications
Deep Learning Industrial Application Areas
Why do you want to know about Deep Learning?
Deep Learning in Industry
Deep Learning Success Stories
 A Google deep-learning system that was shown 10 million images from YouTube videos proved almost twice as good as any previous image-recognition effort at identifying objects such as cats
 Google also used the technology to cut the error rate significantly on speech recognition in its latest Android mobile software
Deep Learning Success Stories
 A team of three graduate students and two professors won a contest held by Merck to identify molecules that could lead to new drugs. The group used deep learning to zero in on the molecules most likely to bind to their targets.
 IBM’s Watson computer uses some deep-learning techniques and is now being trained to help doctors make better decisions.
 Microsoft has deployed deep learning in its Windows Phone and Bing voice search.
What is Deep Learning?
What is Deep Learning?
As per Wikipedia
 Deep learning (deep structured learning, hierarchical
learning or deep machine learning) is a branch of machine learning
based on a set of algorithms that attempt to model high-level
abstractions in data by using multiple processing layers, with complex
structures or otherwise, composed of multiple non-linear
transformations
As per LISA
 Deep Learning is about learning multiple levels of representation and
abstraction that help to make sense of data such as images, sound, and
text
What is Deep Learning?
 From Chris Nicholson, Co-Founder of Skymind
 Deep learning is basically machine perception. What is perception? It is the power to interpret sensory data. Two main ways we interpret things are by
− Naming what we sense
• See and recognize your mother’s picture
− If we do not know a name, finding similarities and dissimilarities
• See different photos of faces
• Bucket similar faces together
 Deep-learning software attempts to mimic the activity in layers of neurons in the neocortex, the wrinkly 80 percent of the brain where thinking occurs. The software learns, in a very real sense, to recognize patterns in digital representations of sounds, images, and other data.
Why Deep Learning now?
Deep Learning History
McCulloch and Pitts’ work (1943)
 The brain can produce highly complex patterns by using many basic cells that are connected together.
 These basic brain cells are called neurons, and McCulloch and Pitts gave a highly simplified model of a neuron in their paper ("threshold logic units").
 A group of MCP neurons that are connected together is called an artificial neural network. In a sense, the brain is a very large neural network: it has billions of neurons, and each neuron is connected to thousands of other neurons.
 McCulloch and Pitts showed how to encode any logical proposition with an appropriate network of MCP neurons.
Perceptron – Rosenblatt (1957)
 In machine learning, the perceptron is an algorithm for supervised learning of binary classifiers: functions that can decide whether an input (represented by a vector of numbers) belongs to one class or another.
 The perceptron algorithm was invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt
 First implemented in software and later in hardware as the Mark I Perceptron
AdaLine (1960)
 ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented this network
 It was developed by Professor Bernard Widrow and his graduate student Ted Hoff at Stanford University in 1960. It is based on the McCulloch–Pitts neuron. It consists of weights, a bias, and a summation function.
Lull till 1986 - XOR issue - AI Winter
 In 1969, Minsky co-authored Perceptrons: An Introduction to Computational Geometry with Seymour Papert. In this work they attacked the limitations of the perceptron, showing that it could only solve linearly separable functions.
 Of particular interest was the fact that the perceptron could not solve the XOR and XNOR functions.
 Minsky and Papert also stated that the style of research being done on the perceptron was doomed to failure because of these limitations, a remark that proved ill-timed.
 As a result, very little research was done in the area until about the 1980s.
Multi-Layer Perceptron (1986)
 Solves non-linearly separable problems (e.g., XOR)
 Applied to areas like speech recognition, image recognition, and machine translation
 However, it faced competition from SVMs (1996), which are
− Simple
− Lightweight
Why Deep Learning now? GPU performance
Why GPU?
Readily available frameworks
What can you do with Deep Learning?
Classify
Cluster
Predict
Learn Features
Deep Learning Use Cases
Deep Learning Use cases
Speech Recognition
Natural Language Processing
Image Classification (Deep Learning Approach)
Understand Image and Label it with Text
Deep Learning Industrial Application Areas
 Medical
• Voice controlled Robotic Surgery
 Automotive
− Self Driving cars
 Military
− Drones
 Security
− Surveillance
Deep Learning Industrial Application Areas
 Drug discovery and toxicology
• Multi-task deep neural networks to predict the biomolecular target of a
compound
 Customer relationship management
− Deep reinforcement learning to approximate the value of possible direct marketing actions
 Recommendation systems
− Deep learning to extract meaningful features for a latent factor model for content-based music recommendation
 Bioinformatics
− Predict gene ontology annotations
Part 2 - Technical Introduction
Agenda - Part 2 - Technical Deep Dive
What is Machine Learning?
What is Artificial Neural Network?
Definition of Machine Learning
General Concepts of Machine Learning
Machine Learning Recipe
Deep Dive into Multi-Layer Perceptron
Introduction to Machine Learning
What is Machine Learning?
Arthur Samuel defined machine learning as a
 “Field of study that gives computers the ability to learn without being explicitly programmed.”
 Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.
Why are we talking of Machine Learning?
 Basic building block of Deep Learning algorithms
− Multi-Layer Perceptron
 Deep Learning builds on the Multi-Layer Perceptron. Differences:
− Increased number of layers
− Connectivity between layers
Definition of Machine Learning
 “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” (Mitchell, 1997).
Common Tasks (T)
Common Tasks
 Classification - Computer program is asked to specify which of k categories some
input belongs to.
 Regression - In this type of task, the computer program is asked to predict a
numerical value given some input.
Common Tasks (T)
Common Tasks
 Transcription - In this type of task, the machine learning system is asked to observe
a relatively unstructured representation of some kind of data and transcribe it into
discrete, textual form. For example, in optical character recognition, the computer
program is shown a photograph containing an image of text and is asked to return
this text in the form of a sequence of characters
 Translation: In a translation task, the input already consists of a sequence of
symbols in some language, and the computer program must convert this into a
sequence of symbols in another language.
Common Tasks (T)
Other Common Tasks
 Structured Output Analysis - Output is a vector containing important relationships between different elements
 Anomaly Detection - Flag unusual values
 Synthesis and Sampling - Generate new examples similar to those in the training data
 Impute missing values
Performance Measure (P)
 P measures how well an ML algorithm performs on a task
 Different tasks will have different performance measures
 The use case / scenario itself will also influence which performance measure we use and what the threshold for that measure is
 Accuracy for a classifier can be measured as the number of correct classifications divided by the total number of classifications (see the example below)
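As an illustration (not from the original slides), this accuracy computation takes a few lines of Python; the label lists are made-up example data:

```python
# Minimal sketch: accuracy = correct classifications / total classifications.
# `predicted` and `actual` are hypothetical example labels.
predicted = [1, 0, 1, 1, 0, 1]
actual    = [1, 0, 0, 1, 0, 1]

correct = sum(1 for p, a in zip(predicted, actual) if p == a)
accuracy = correct / len(actual)
print("Accuracy:", accuracy)  # 5 of 6 correct -> about 0.83
```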
Experience
 Experience is the training data set using which learning is done
 This learning can happen in two ways:
− Unsupervised learning involves observing several examples of a random vector x, and attempting to implicitly or explicitly learn the probability distribution p(x), or some interesting properties of that distribution.
− Supervised learning involves observing several examples of a random vector x and an associated value or vector y, and learning to predict y from x, e.g. estimating p(y | x)
General concepts in Machine Learning
 Training / Fitting of a Model
− Process of creating a function f which can then be used to perform the intended task. Some of the different categories of the function are:
• Map any input to one of a set of k categories
• Place the inputs in different buckets
• Learn a hierarchical representation of the input data
 Validating the model
− Gives a performance measure of how well the model is doing
General concepts in Machine Learning
 Overfitting
− Fitting the function weights to achieve very high accuracy on the training data at the cost of performing poorly on unseen data
 Underfitting
− Not fitting the model well enough to the training data, resulting in a higher error rate even on it
 Generalization
− Learning the weights and model in a way that the model does well even on new, unseen data
 Regularization - Used to reduce overfitting (a sketch follows)
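As a hedged illustration of the regularization idea (L2 penalty / weight decay is one common choice; all the numbers below are made up):

```python
import numpy as np

# Minimal sketch of L2 regularization: a penalty on weight magnitude is
# added to the data loss to discourage overfitting.
# `weights`, `data_loss`, and `lam` are hypothetical example values.
weights = np.array([0.5, -1.2, 2.0])
data_loss = 0.37          # loss measured on the training data (made up)
lam = 0.01                # regularization strength

l2_penalty = lam * np.sum(weights ** 2)
total_loss = data_loss + l2_penalty
print(total_loss)
```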
Artificial Neural Networks
What is Artificial Neural Network?
 Artificial Neural networks (ANNs) are a family of models inspired
by biological neural networks (the central nervous systems of animals,
in particular the brain) and are used to estimate
or approximate functions that can depend on a large number
of inputs and are generally unknown. Artificial neural networks are
generally presented as systems of interconnected "neurons" which
exchange messages between each other. The connections have
numeric weights that can be tuned based on experience, making
neural nets adaptive to inputs and capable of learning.
ANN Basics
 An ANN is a finite directed acyclic graph
 Learns a non-linear function from the data
 3 Main Kinds of Nodes
− Source
− Target
− Hidden
Neural Network Model
Most Basic Unit - Single Neuron
Single processing unit view
Different Functions in a Single-Layer Neuron
SIGMOID ACTIVATION
TANH ACTIVATION
Sigmoid Function View
Tanh Function View
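The function views above were plots in the original deck; as a minimal illustration (NumPy, not from the slides), the two activations can be written as:

```python
import numpy as np

# Sigmoid squashes any real input into (0, 1).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Tanh squashes any real input into (-1, 1) and is zero-centered.
def tanh(x):
    return np.tanh(x)

x = np.linspace(-5, 5, 11)
print(sigmoid(x))
print(tanh(x))
```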
ANN Recipe (Similar to ML recipe)
Combine
 Specification of a dataset
 Cost function
 Optimization procedure
 Model
 All algorithms follow the above recipe
 The variation comes in the cost function and optimization procedure, as well as in the network architecture itself
Data Set
Forward Pass
 Forward Pass calculates the output
Cost Function
 Cost Function calculates the error between the predicted output and actual output
Backward Propagation using Optimization
 Backward Pass - Attempts to minimize the error from the previous step through mathematical optimization of the different learnt weight terms
 The objective is to find the global minimum of the error function. Stochastic Gradient Descent updates each weight against the slope of the error: w ← w − η · ∂E/∂w, where η is the learning rate (sketched below)
 Calculate the slope (partial derivative) at every layer
 Based on the value and sign of the slope, decide
− Whether to increase or decrease the weight in a layer
− The magnitude of the increase or decrease of the weight (learning rate)
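Tying the forward pass, cost function, and backward pass together, here is a hedged, minimal sketch of one gradient-descent step for a single sigmoid neuron (the data, initial weights, and learning rate are made-up examples, not taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Made-up example: one input vector, one target, one sigmoid neuron.
x = np.array([0.5, -0.2, 0.1])   # input
t = 1.0                          # actual (target) output
w = np.zeros(3)                  # weights, initialized
b = 0.0                          # bias
lr = 0.1                         # learning rate (eta)

# Forward pass: calculate the predicted output.
z = np.dot(w, x) + b
y = sigmoid(z)

# Cost function: squared error between predicted and actual output.
error = 0.5 * (y - t) ** 2

# Backward pass: partial derivative (slope) via the chain rule.
dz = (y - t) * y * (1 - y)       # dError/dz for a sigmoid unit
w -= lr * dz * x                 # move each weight against the slope
b -= lr * dz                     # and the bias likewise

print(error, w, b)
```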
Optimization Procedure
 Objective - Minimize the error (the difference between predicted and actual output). This is done by varying the weights of the different network terms
 Different algorithms
− Stochastic Gradient Descent
− L-BFGS
− Conjugate Gradient
Outline of the forward and backward propagation algorithm to learn the weights
 Initialize weights
 Predict output
 Calculate error
 Backward propagate the error to update the weights and minimize it
− Magnitude and direction of each weight change determined by the partial derivatives as well as the learning rate
Model
 Output of the optimization process is the built network with weights
 This is the trained model
Part 3 - Introduction to Long Short Term Memory Network – A Deep Learning Algorithm
Long Short Term Memory
LSTM
 Used for sequence-to-sequence learning
 RNN - Networks with loops in them, allowing information to persist (see the sketch after this list)
 LSTM - A special kind of RNN, capable of learning long-term dependencies
 Additional special structures as part of each repeating module
 Practical Applications –
− Machine Translation
− Question Answering Systems
− Conversational Systems
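As a hedged illustration of the "loops" idea (not code from the deck), a vanilla RNN carries a hidden state forward one step at a time; an LSTM replaces this single update with its gated cell. All sizes and weights below are made up:

```python
import numpy as np

# Minimal sketch of one vanilla-RNN step:
#   h_t = tanh(W_hh @ h_prev + W_xh @ x_t + b)
hidden, inputs = 4, 3
rng = np.random.default_rng(0)
W_hh = rng.normal(size=(hidden, hidden)) * 0.1   # recurrent weights
W_xh = rng.normal(size=(hidden, inputs)) * 0.1   # input weights
b = np.zeros(hidden)

def rnn_step(h_prev, x_t):
    return np.tanh(W_hh @ h_prev + W_xh @ x_t + b)

# Unfolded view: the same step is applied at every position in the sequence.
h = np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):   # a made-up sequence of 5 inputs
    h = rnn_step(h, x_t)
print(h)
```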
Unfolded RNN View
LSTM Chain View
Cell State
Gates
Forget Gate
Input Gate
Output gate
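The gate slides above were diagrams in the original deck; for reference, the standard LSTM equations in the notation of the Understanding-LSTMs post cited in the references are:

```latex
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) &&\text{(input gate)}\\
\tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) &&\text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t &&\text{(cell state update)}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) &&\text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) &&\text{(hidden state)}
\end{aligned}
```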
LSTM – Memory Cell View
LSTM Variants
 Many variants based on connectivity
 GRU
− Combines the forget and input gates into a single “update gate”
− Merges the cell state and hidden state, and makes some other changes
 Attention Mechanisms
− Provide additional context while making predictions
Part 4 – Building a Text Classifier using LSTM
Available Deep Learning Frameworks
 Multiple deep learning frameworks exist across different languages
 Active community support
 Popular ones include
− In Python
• Theano
• Keras
• TensorFlow
− In Lua
• Torch
− In Java
• Deeplearning4j
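To make the Part 4 exercise concrete, here is a hedged, minimal sketch of an LSTM text classifier in Keras (one of the frameworks listed above). The vocabulary size, sequence length, layer sizes, and data below are illustrative assumptions, not the configuration used in the original work:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

# Illustrative assumptions: a vocabulary of 5000 word ids, every document
# already tokenized and padded to 100 ids, and two target classes.
vocab_size, max_len, n_classes = 5000, 100, 2

model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=max_len))  # word ids -> dense vectors
model.add(LSTM(64))                                          # sequence -> fixed-size vector
model.add(Dense(n_classes, activation='softmax'))            # class probabilities
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Made-up training data: random word-id sequences and one-hot labels.
X = np.random.randint(1, vocab_size, size=(32, max_len))
y = np.eye(n_classes)[np.random.randint(0, n_classes, size=32)]
model.fit(X, y, epochs=1, batch_size=8)
```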
Key Steps (To be updated)
 Steps required to build the classifier
Key Steps (Contd)
Results (To be updated)
Part 4 - Miscellaneous
Learnings
 Standard systems are sufficient to get started
 However, for effective results more compute power is required
 Patience – each algorithm takes a lot of time to complete, hence patience is important
 Incremental training
 Deep learning needs lots of data; traditional algorithms work well for standard / small data sizes
 Data cleaning and curation still helps
Advantages of using Deep Learning
 Automated extraction of complex features. Example:
• Feature extraction for images
• Implication: one need not be a deep domain expert to build solutions
− Performance that took years of feature tuning by domain experts can now be achieved with no feature engineering
 Similar algorithms can be used across domains – image, speech, and text understanding
Recap
 Why Deep Learning algorithms?
− Industrial Applications
− Technical Capabilities
 Working of Basic Building Block of Deep Learning algorithm
− Multi Layer Perceptron
 Introduction to LSTM
 Text Classifier using LSTM
 Advantage of Using Deep Learning
Way Forward
− Contact – arthi.venkat@wipro.com
Thank you
Arthi Venkataraman
DMTS Senior Member
arthi.venkat@wipro.com
Attributions
 Images used in this presentation are from different sources on the net
 The reference list below cites the sources of these images as well as the sources of some of the explanations
References
 https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
 http://ufldl.stanford.edu/tutorial
 https://devblogs.nvidia.com/parallelforall/deep-learning-nutshell-history-training/
 http://colah.github.io/posts/2015-08-Understanding-LSTMs/
 http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
 http://deeplearning.net/tutorial/lenet.html
