Deep Learning and
TensorFlow
Sample Class
Jon Lederman
Deep Learning and
FeedForward Neural Networks
Features/Representations
• Features or representations:
• Measurable property or characteristic of a phenomenon being observed
• Specific variables that are provided as input to an algorithm
• The success of a machine learning algorithm depends on determining the right
features
• With the right features, a machine learning algorithm can learn almost anything
• With the wrong features, performance will be abysmal
• But how do we decide which features are good?
Examples of Features
• Character Recognition
• Histograms counting number of black pixels along horizontal and vertical directions,
number of internal holes, stroke detection, etc.
• Speech Recognition
• Mel frequency cepstral coefficients, phonemes, noise ratios, length of sound, etc.
• Computer Vision
• Edges, objects, colors, etc.
History Lesson - Perceptrons
(1960s)
A perceptron is one example of a statistical pattern
recognition system.
[Diagram: inputs → hand-engineered feature units → learned weights → decision unit.
Features are hand engineered; the weights are learned.]
Limitations of Perceptrons
• Neural network research came to a halt in late ‘60s and early ‘70s largely due to
the fact that perceptrons were shown to be limited. In particular:
• Minsky and Papert’s “Group Invariance Theorem” proved that a perceptron cannot
learn if there exist transformations of the features that form a group.
• This is very bad news for perceptrons, as pattern recognition requires translation and
rotation invariance, and translations and rotations both form groups
• If you can choose features by hand and use enough of them, a perceptron is
very powerful
• Thus, for binary input vectors a separate feature unit can be chosen for each vector.
However, this results in an exponential explosion of the number of feature units
required.
Hallmarks of Deep Learning
(Lessons From Perceptrons)
• Feature Learning or Representational Learning
• Deep neural networks learn their own feature detectors (more on this later)
• Hierarchical Learning
• More complex representations are expressed in terms of simpler representations
• Non-linear
• Deep Neural Networks have non-linearity “baked” into the neuron model. This
allows them to learn much more complex features
• Most of the interesting complexities of the world are non-linear
• Superposition does not apply
• Linear networks can only learn linear things, as a composition of linear operators is still linear
Biological Neurons
• Each neuron receives input from other neurons
• The effect of each input line on the neuron is controlled by a synaptic weight
• Weight can be positive or negative
• The synaptic weights adapt so that the entire network learns to perform useful
computations
• The human brain has about 10^11 neurons, each with about 10^4 weights
• Brain cortex looks the same all over and can become specialized
• Provides for rapid parallel computation
• Similar to FPGA
• In fact, even a single neuron is not fully explained by neuroscience; it is much
more complex than, or possibly entirely different from, our conception of artificial
neurons. Upshot: use this analogy loosely.
Biological Neurons as Inspiration For Artificial
Neurons
Artificial Neuron Model
Operation of Artificial Neuron
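The slide's diagram is not reproduced here, but the operation it depicts can be sketched in a few lines of NumPy: the neuron forms a weighted sum of its inputs plus a bias and passes the result through a non-linear activation. The input, weight, and bias values below are hypothetical, chosen only to make the example runnable.

import numpy as np

def sigmoid(z):
    # non-linearity "baked" into the neuron model
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical values, for illustration only
x = np.array([0.5, -1.2, 3.0])   # inputs arriving from other neurons
w = np.array([0.1, 0.4, -0.3])   # learned synaptic weights
b = 0.05                         # learned bias

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
a = sigmoid(z)                   # the neuron's output (activation)
print(z, a)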
But What is “Learning” and How Does It
Happen?
• Deep learning, as presented in this class, is a form of supervised learning
• We build a network of artificial neurons which takes in an input and generates some
output
• Input can be a single number or can be a vector
• We show the network a series of training examples and ask the network to learn
from these examples
• The training examples consist of an input and a (hopefully) correct output called the
“ground truth”
Deep Feedforward Networks
Multilayer Neural Networks
[Diagram: input layer → hidden layers → output layer, with layer activations
a^[0] = x, a^[1], a^[2], …, a^[l], …, ŷ = a^[L]]
Combine a bunch of artificial neurons into layers and let them talk to one another!
Example 4-Layer Neural Network
Input
Prediction/Inference
Multiclass Classification
3 Classes
C=Number of Classes
Softmax
Activation
Function
One can prove that if C = 2, softmax reduces to logistic regression (see the sketch below)
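A minimal NumPy sketch of the softmax activation (the logit values below are hypothetical). It also illustrates the C = 2 claim: the first softmax output equals the logistic sigmoid of the difference of the two logits, which is the sense in which softmax reduces to logistic regression.

import numpy as np

def softmax(z):
    # subtract the max for numerical stability, exponentiate, normalize
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])          # hypothetical logits for C = 3 classes
print(softmax(z))                        # probabilities summing to 1

# C = 2 case: softmax([z1, z2])[0] == sigmoid(z1 - z2)
z2 = np.array([1.3, -0.4])
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
print(softmax(z2)[0], sigmoid(z2[0] - z2[1]))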
Two Questions About Neural Networks
• What does a neural network do?
• How does a neural network learn?
• What is the learning mechanism?
What Does A Deep Neural Network Do?
(Formal Definition)
• The goal of a deep neural network is to approximate
some function 𝑓∗
(typically in some high dimensional
space).
• A feedforward neural network defines a mapping
𝒚 = 𝑓(𝒙, 𝜽)
• 𝒚 is the output or prediction/inference.
• 𝒙 is the input
• 𝜽 are the learned parameters (typically weights and
biases)
• The feedforward network learns the values of the
parameters 𝜽 that result in the best function
approximation between 𝑓 and 𝑓∗
Output
How Does a Deep Neural Network Learn?
Maximum Likelihood Estimation
p_data(x) is an unknown data-generating distribution.
Training samples are drawn from this unknown distribution.
p_model(x; 𝜽) is a parameterized family of probability distributions indexed by 𝜽.
Goal: We wish to find the parameters 𝜽 that
maximize the likelihood of the observed
training examples (i.e., that make the
observed data most probable).
The Maximum Likelihood Estimator (“MLE”) for 𝜽 is
formally defined:
𝜽_ML = argmax_𝜽 p_model(X; 𝜽) = argmax_𝜽 ∏ᵢ p_model(x⁽ⁱ⁾; 𝜽)
How Does a Deep Neural Network Learn?
Maximum Likelihood Estimation
After some algebraic manipulation, we can show that MLE amounts to minimizing the dissimilarity between
the empirical distribution p̂_data(x) (the training set) and the model distribution p_model(x; 𝜽).
This dissimilarity is the cross-entropy (equivalently, up to a constant, the Kullback-Leibler divergence).
This means that to train our model we need only minimize the following expression:
J(𝜽) = −E_{x ∼ p̂_data} [ log p_model(x; 𝜽) ]
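A minimal NumPy sketch of this quantity for a classification problem: the cross-entropy between one-hot ground-truth labels (the empirical distribution) and softmax outputs (the model distribution). The label and prediction arrays below are hypothetical.

import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    # average negative log-likelihood of the ground-truth class
    # under the model distribution, over m training examples
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# hypothetical one-hot labels and model probabilities: 3 examples, 3 classes
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
print(cross_entropy(y_true, y_pred))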
Supervised Learning
Show the network a series of labeled training
examples. These are input/output pairs that give
the correct input/output behavior (ground truth).
Update the parameters of the neural network
accordingly. This process is called
training or learning.
Deep Neural
Network
Learning
Mechanism
Training Examples
Learning Mechanism
(High Level)
• Encode MLE in a loss function 𝐿
• The loss function defines how far the prediction for any given training example is from the ground
truth: 𝐿(ŷ, 𝒚)
• Over all training examples this encapsulates the relative entropy (Kullback-Leibler
divergence)
• Define a cost function 𝐽 that aggregates the loss over all training examples:
𝐽(𝜽) = (1/m) Σᵢ 𝐿(ŷ⁽ⁱ⁾, 𝒚⁽ⁱ⁾)
• Take incremental steps over portions of the training examples (called mini-
batches) to minimize 𝐽 (see the sketch below)
• This process minimizes the relative entropy between the unknown distribution
(training examples) and the model distribution we are learning
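A sketch of this loop under simplifying assumptions (a linear model with a squared-error cost rather than cross-entropy, and synthetic data), just to show the shape of mini-batch gradient descent: aggregate the loss over a mini-batch into J, compute gradients, and take a small step.

import numpy as np

rng = np.random.default_rng(0)

# hypothetical training set: 1000 examples, 3 input features, scalar targets
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

w, b = np.zeros(3), 0.0            # parameters theta
alpha, batch_size = 0.1, 32        # hyperparameters

for epoch in range(20):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]     # one mini-batch
        y_hat = X[idx] @ w + b                   # predictions
        err = y_hat - y[idx]
        # gradients of the cost J (mean squared error over the mini-batch)
        dw = X[idx].T @ err / len(idx)
        db = err.mean()
        w -= alpha * dw                          # incremental step
        b -= alpha * db
print(w, b)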
Traversing The Error Surface
Find w, b that minimize J:
Is it Convex?
In general no. We need to worry about local minima!
Is it Convex?
In higher dimensions, the issue turns out to be more about saddle points and very slow learning on plateaus.
How Do Deep Neural Networks Learn Their
Own Feature Detectors?
• The learned parameters (weights and biases) are the feature detectors
• We let the network decide what features are important as expressed through the
weights and biases
• Each hidden layer/hidden unit may learn a different feature
Mechanics of Learning
• Forward Propagation
• Update a’s and z’s based on the next training example
• Cache this information for backpropagation
• Backpropagation
• Compute gradients dW, db
• Gradient Descent
• Take small step on error surface in direction of gradients
The 4 Fundamental Equations Of
Backpropagation And Their Interpretation
(1) ε^[L] = ∇_a 𝐽 ⊙ σ′(z^[L]) — calculate the error of the last layer
(2) ε^[l] = ((W^[l+1])^T ε^[l+1]) ⊙ σ′(z^[l]) — propagate the error backwards to the preceding layers
(3) ∂𝐽/∂W^[l] = ε^[l] (a^[l−1])^T — calculate the gradient of the cost function with respect to the weights using the errors
(4) ∂𝐽/∂b^[l] = ε^[l] — calculate the gradient of the cost function with respect to the biases using the errors
Gradient Operator
The gradient vector points in the direction of steepest ascent.
Proof: the directional derivative of 𝐽 along a unit vector û is ∇𝐽 · û, which
must be maximized when û points along ∇𝐽, by properties of the dot product.
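A short worked version of that argument in LaTeX, writing $\hat{\boldsymbol{u}}$ for a unit vector giving the direction of a candidate step (notation introduced here, not taken from the slides):

D_{\hat{\boldsymbol{u}}} J
  = \nabla J \cdot \hat{\boldsymbol{u}}
  = \lVert \nabla J \rVert \, \lVert \hat{\boldsymbol{u}} \rVert \cos\theta
  = \lVert \nabla J \rVert \cos\theta

This is maximized at $\theta = 0$, i.e. when $\hat{\boldsymbol{u}}$ points along $\nabla J$, so the gradient is the direction of steepest ascent and $-\nabla J$ is the direction of steepest descent used by gradient descent below.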
Gradient Descent
• Algo:
• Randomly initialize weights and biases
• Calculate gradients ∂𝐽/∂wᵢ and ∂𝐽/∂bᵢ for all weights and biases
• Update weights and biases using the learning rate α and the gradients:
• wᵢ := wᵢ − α ∂𝐽/∂wᵢ
• bᵢ := bᵢ − α ∂𝐽/∂bᵢ
• Repeat until stopping condition
Notation: dw ≡ ∂𝐽/∂w, db ≡ ∂𝐽/∂b
α is the learning rate.
Backpropagation With Gradient Descent
• For each training example x, set the input activation a^[0](x) and perform the
following steps:
• Feedforward: For each l = 1, 2, 3, …, L compute z^[l](x) = W^[l] a^[l−1](x) + b^[l] and a^[l](x) = σ(z^[l](x))
• Output Error: Compute ε^[L](x) = ∇_a 𝐽 ⊙ σ′(z^[L](x))
• Backpropagate Error: For each l = L−1, L−2, …, 1 compute ε^[l](x) = ((W^[l+1])^T ε^[l+1](x)) ⊙ σ′(z^[l](x))
• Compute One Step Of Gradient Descent: For each l = L, L−1, L−2, …, 1, update the
weights and biases according to the rules:
• W^[l] := W^[l] − (α/m) Σ_x ε^[l](x) (a^[l−1](x))^T
• b^[l] := b^[l] − (α/m) Σ_x ε^[l](x)
α is the learning rate.
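A compact NumPy sketch of the algorithm above for a small fully connected network. The 2-3-1 architecture, sigmoid activations, squared-error loss (so that ∇_a 𝐽 = a^[L] − y), and random data are all assumptions made only to keep the example self-contained and runnable.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sigmoid_prime = lambda z: sigmoid(z) * (1.0 - sigmoid(z))

# hypothetical 2-3-1 network and a tiny batch of m = 4 training examples
sizes = [2, 3, 1]
W = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
b = [np.zeros((n, 1)) for n in sizes[1:]]
X = rng.normal(size=(2, 4))                  # inputs,  shape (n_in, m)
Y = rng.integers(0, 2, size=(1, 4))          # targets, shape (n_out, m)
alpha, m = 0.5, X.shape[1]

# feedforward: cache z's and a's for backpropagation
a, zs, activations = X, [], [X]
for Wl, bl in zip(W, b):
    z = Wl @ a + bl
    zs.append(z)
    a = sigmoid(z)
    activations.append(a)

# output error (squared-error loss, so grad_a J = a^[L] - Y)
eps = (activations[-1] - Y) * sigmoid_prime(zs[-1])
grads_W = [eps @ activations[-2].T]
grads_b = [eps.sum(axis=1, keepdims=True)]

# backpropagate the error through the hidden layers
for l in range(2, len(sizes)):
    eps = (W[-l + 1].T @ eps) * sigmoid_prime(zs[-l])
    grads_W.insert(0, eps @ activations[-l - 1].T)
    grads_b.insert(0, eps.sum(axis=1, keepdims=True))

# one step of gradient descent
W = [Wl - (alpha / m) * dW for Wl, dW in zip(W, grads_W)]
b = [bl - (alpha / m) * db for bl, db in zip(b, grads_b)]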
Representational Learning
From Deep Learning – Goodfellow, Bengio and Courville
Input is presented at
the visible layer
(observable features).
Then a series of hidden
layers extracts
increasingly abstract
features from the
images. These layers
are called ”hidden”
because their values
are not given in the
data. Instead the
model must learn
which concepts are
useful for explaining
the relationships in the
observed data.
In deep learning, each
level learns to transform
its input data into a
slightly more abstract
and composite
representation.
How are Features Represented in DNNs?
• Tensors
• A tensor is simply a multidimensional array of numbers
• That’s it!
• Not to be confused with tensors in physics
• In physics, a tensor is a multi-linear operator or map
• Tensors in deep learning are definitely NOT that
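A minimal NumPy sketch of that definition; the shapes below are hypothetical but typical (e.g. a batch of RGB images).

import numpy as np

scalar = np.array(3.0)                 # rank-0 tensor (a single number)
vector = np.array([1.0, 2.0, 3.0])     # rank-1 tensor, shape (3,)
matrix = np.zeros((28, 28))            # rank-2 tensor, e.g. a grayscale image
batch  = np.zeros((32, 28, 28, 3))     # rank-4 tensor: batch x height x width x channels

for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)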
Deep Neural Networks as Feature Detectors
• AlexNet (Sneak preview)
• Convolutional neural network that achieved a top-5 error of 15.3%, more than 10.8
percentage points ahead of the runner-up in the ImageNet Large Scale Visual
Recognition Challenge
• Think of convolutional network as:
• Feature detectors – Conv layers that detect features
• Fully connected feedforward layers – compose features detected by conv layers into more complex
representations
• Will discuss convolutional neural networks in depth later
• AlexNet has 8 layers
• 5 Convolutional Layers – Feature Detectors
• 3 Fully Connected Layers – Compose Features
AlexNet
(Layer 1 Conv1 Features)
Edge detectors and color
detectors. Note that edge
detectors are at different
angles.
AlexNet
(Layer 6 Conv2 Features)
First 30 features learned by
Conv2 layer.
AlexNet
(Conv2-Conv5 Features)
Conv3 Layer Features Conv4 Layer Features Conv5 Layer Features
AlexNet
(Fully Connected Layer Features)
Fully Connected Layer (fc6) Fully Connected Layer (fc7)
AlexNet
(Images Resembling Specific Classes Most
In Final Fully Connected Layer)
Classes Selected:
‘hen’
‘Yorkshire terrier’
‘Shetland sheepdog’
‘fountain’
‘theatre curtain’
‘geyser’
Hyperparameter Tuning
Parameters and Hyperparameters
• Model Parameters
• These are the entities learned via training from the training data. They are not set
manually by the designer.
• With respect to deep neural networks, the model parameters are:
• Weights
• Biases
• Model Hyperparameters
• These are parameters that govern the determination of the model parameters during
training
• They are typically set manually via heuristics
• They are tuned during a cross-validation phase (discussed later)
• Examples:
• Learning rate, number of layers, number of units in each layer, and many others to be discussed later
Model Selection
• To optimize the inference-time behavior (the goal of training), a process known as
model selection is performed
• Model selection amounts to selecting an optimal set of hyperparameters that yields the best
performance of the neural network
• The hyperparameters are tuned using an iterative process of either:
• Validation
• Cross-Validation
• Many models may be evaluated during the validation/cross-validation phase and the
optimal model is selected
• The optimal model is then evaluated on the test dataset to determine how well it performs on
data never seen before
Bias and Variance Pictures
From Coursera Deep Learning – Andrew Ng
high bias “just right” high variance
Analysis Of Bias-Variance Decomposition
• What is variance?
• The amount that 𝑓 would change if we estimated it with a different training set
• Ideally, 𝑓 should not vary much between training sets
• With high variance, small perturbations in the training set result in large changes in 𝑓
• What is bias?
• Bias is the error introduced by approximating a real-life problem, which may be very
complex, with a simpler model.
• For example, the world is highly non-linear, and choosing a linear model will result in high
bias.
• In order to minimize the expected test error, we need to minimize both bias and
variance
L2 Regularization
For a neural network, the regularization term added to the cost is:
J_reg(W, b) = J(W, b) + (λ / 2m) Σ_l ‖W^[l]‖²_F
Frobenius norm (equivalent to the L2 norm for matrices): ‖W^[l]‖²_F = Σᵢ Σⱼ (w^[l]ᵢⱼ)²
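A minimal NumPy sketch of the L2 (Frobenius-norm) regularization term. The λ/(2m) scaling and the choice to penalize only weights (not biases) follow common practice and are assumptions here; the weight matrices are hypothetical.

import numpy as np

def l2_regularization(weights, lam, m):
    # sum of squared Frobenius norms of all weight matrices, scaled by lambda/(2m)
    return (lam / (2 * m)) * sum(np.sum(W ** 2) for W in weights)

# hypothetical weight matrices for a 2-layer network
W1 = np.random.randn(3, 2)
W2 = np.random.randn(1, 3)
print(l2_regularization([W1, W2], lam=0.1, m=100))

# during backprop, each weight gradient picks up an extra (lambda/m) * W term:
lam, m = 0.1, 100
dW1_reg = (lam / m) * W1   # added to the unregularized gradient dW1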
Why Learning Can Be Slow
If ellipse is very elongated (will happen if
lines corresponding to two training
examples are almost parallel), steepest
descent can be very slow. This is due to
the fact that with an elongated ellipse,
the gradient is big in the direction in
which we don’t want to move very far
and small in direction where we would
like to move a long way. This condition
will cause the trajectory to oscillate across the
ravine rather than progress along the ravine. This
is the opposite of the desired goal.
*From Neural Networks For Machine Learning (Coursera – Hinton)
Local Optima
Intuition would suggest that gradient descent is likely to get stuck in a local optimum (left plot) because the cost surface is non-convex.
However, in high-dimensional spaces a saddle point is much more likely (the likelihood that the surface curves upward or downward in every
dimension simultaneously is low). Thus, local optima are less likely; instead, a saddle point is the most likely critical point in high-
dimensional spaces, and algorithms like Adam can help escape from saddle points.
From Coursera Deep Learning
Andrew Ng
Gradient Descent With Momentum
Physics Analogy
Acceleration
Assume unit mass so velocity= momentum
Momentum
Friction
J can be viewed as the negative of the Hamiltonian of the system!
Hamilton’s Equations
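A minimal sketch of the momentum update suggested by this analogy: the gradient plays the role of an acceleration, a velocity term accumulates it, and β acts like friction. The exponentially weighted form with a (1 − β) factor follows the Coursera formulation cited on the slide; the toy cost J = ‖w‖² is an assumption made only for illustration.

import numpy as np

alpha, beta = 0.01, 0.9             # learning rate and momentum ("friction") term
w = np.array([0.5, -0.3])
v = np.zeros_like(w)                # velocity (= momentum, assuming unit mass)

for step in range(100):
    dw = 2 * w                      # hypothetical gradient dJ/dw (here J = ||w||^2)
    v = beta * v + (1 - beta) * dw  # accumulate the gradient into the velocity
    w = w - alpha * v               # step along the (smoothed) velocity
print(w)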
Convolutional Neural Networks
Feedforward Neural Network To Do Image
Processing?
[Diagram: image pixels fed as a flat vector into a fully connected feedforward network]
Problem 1: Parameter space explosion
Problem 2: Rotational and translational invariance
Convolutional Neural Networks
• Features:
• Shared parameter space
• Translational and Rotational invariance
• Receptive Fields
• Convolution Operator
• It’s really the correlation operator, but nobody tells you that
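A minimal NumPy sketch of that last point: what deep learning frameworks call "convolution" slides the kernel over the image without flipping it (cross-correlation), whereas true mathematical convolution flips the kernel first. The image and kernel values below are hypothetical.

import numpy as np

def cross_correlate2d(image, kernel):
    # what deep learning frameworks call "convolution":
    # slide the un-flipped kernel over the image and take dot products
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    # true mathematical convolution flips the kernel first
    return cross_correlate2d(image, kernel[::-1, ::-1])

# hypothetical 4x4 "image" and 2x2 edge-like kernel, for illustration only
img = np.arange(16, dtype=float).reshape(4, 4)
k = np.array([[1.0, -1.0], [1.0, -1.0]])
print(cross_correlate2d(img, k))
print(convolve2d(img, k))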
Recurrent Neural Networks
What about Memory?
• Our neurons cannot remember anything
• What about correlations to the past?
• What about correlations to the future?
• Solution: Recurrent Neural Networks
• Carry Hidden State
• LSTMs (“Long Short-Term Memory”) are one example
LSTM
(“Long Short-Term Memory”)
What is TensorFlow?
• TensorFlow is a machine learning software framework based on the dataflow programming
paradigm
• A software framework is a reusable software environment that provides generic functionality that can
be selectively changed by additional user-written code, thus providing application-specific software.
• Dataflow Programming
• Programming paradigm that models a program as a directed graph of the data flowing between
operations
• Data moves between nodes of the graph
• Imagine an assembly line with data moving between workers (data in motion)
• No hidden state to manage
• Contrast sequential programming:
• Data is at rest
• Requires state handling code
TensorFlow Graphs And Sessions
• TensorFlow is modeled on the Dataflow paradigm
• Dataflow is a programming model for parallel computing. In a dataflow graph, the nodes
represent units of computation and the edges represent the data (tensors) consumed or
produced by a computation.
• Dataflow has several advantages that TensorFlow leverages when executing programs:
• Parallelism – By using explicit edges to represent dependencies between operations, the
framework can identify operations that execute in parallel.
• Distributed Execution – By using explicit edges to represent the values that flow between
operations, it is possible for TensorFlow to partition a program across multiple devices (CPUs,
GPUs, TPUs) attached to different machines.
• Compilation – TensorFlow’s XLA compiler can use information in the dataflow graph to generate
faster code by fusing together adjacent operations.
• Portability – The dataflow graph is a language-independent representation of the code in a
model.
TensorFlow Graph
Nodes represent Operations.
An Operation (tf.Operation) in TensorFlow takes zero or more Tensor (tf.Tensor) objects as input
and generates zero or more Tensor objects as output.
Edges represent the flow of Tensors (tf.Tensor) between nodes.
A tf.Graph contains a set of tf.Operation objects, which represent
units of computation, and tf.Tensor objects, which represent the units of
data that flow between operations.
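A minimal sketch of these objects, assuming the TensorFlow 1.x graph-and-session API that this section describes (in TensorFlow 2.x the same calls live under tf.compat.v1).

import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    a = tf.constant(2.0, name="a")        # operations that produce tensors
    b = tf.constant(3.0, name="b")
    c = tf.add(a, b, name="c")            # an operation with two tensor inputs, one output

# the graph is just a description: inspect its operations and their output tensors
for op in graph.get_operations():
    print(op.name, [t.shape for t in op.outputs])

# nothing is computed until a session executes the graph
with tf.Session(graph=graph) as sess:
    print(sess.run(c))                    # -> 5.0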
Computation Graph
Logistic Regression
Update Rules For Gradient Descent:
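The update-rule equations from the original slide are not reproduced here, but a sketch of the corresponding computation graph, again assuming the TensorFlow 1.x API, is logistic regression trained with explicit gradient-descent update rules w := w − α·dw and b := b − α·db. The data and shapes below are hypothetical.

import tensorflow as tf

n_features, alpha = 3, 0.1

X = tf.placeholder(tf.float32, shape=[None, n_features], name="X")
y = tf.placeholder(tf.float32, shape=[None, 1], name="y")
w = tf.Variable(tf.zeros([n_features, 1]), name="w")
b = tf.Variable(0.0, name="b")

z = tf.matmul(X, w) + b                              # linear part of the graph
y_hat = tf.sigmoid(z)                                # prediction a = sigma(z)
loss = tf.reduce_mean(                               # cross-entropy cost J
    -y * tf.log(y_hat + 1e-12) - (1 - y) * tf.log(1 - y_hat + 1e-12))

dw, db = tf.gradients(loss, [w, b])                  # backprop through the graph
train_step = tf.group(tf.assign(w, w - alpha * dw),  # gradient descent update rules
                      tf.assign(b, b - alpha * db))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch_X = [[0.0, 1.0, 2.0], [1.0, 0.0, -1.0]]    # hypothetical mini-batch
    batch_y = [[1.0], [0.0]]
    for _ in range(100):
        _, J = sess.run([train_step, loss], feed_dict={X: batch_X, y: batch_y})
    print(J)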