Information Technology
Course Topics (9-4, 9-12)
Day 1
• Module 1 – Computer Vision and Neural Networks – 45 min
• 10 min break
• Module 2 – CNN Internals – 90 min
• Q & A – 20 min
• 1 hr lunch break
• Module 3 – TensorFlow – 45 min
• Module 4 – MNIST Dataset with NN and CNN – 30 min
• 10 min break
• Module 5 – Keras Framework – 30 min
• Module 6 – Short Talk on RNN – 30 min
Day 2
• Module 7 – Classic Networks – 75 min
• 10 min break
• Module 8 – Classic Networks Programming (ResNet-50 etc.) – 30 min
• Module 9 – Short Talk on Object Detection and Face Recognition – 45 min
• 1 hr lunch break
• Module 10 – Android AI with OpenVINO – 30 min
References
https://www.pyimagesearch.com/
http://www.deeplearningbook.org/contents/intro.html
What is COMPUTER VISION (Legacy)
Computer vision is the transformation of data from a still or video camera
into either a decision or a new representation.
Decisions include “laser range finder indicates an object is 1 meter away,”
“there is a person in this scene,” or “there are 14 tumor cells on this slide.”
A new representation might mean turning a color image into a grayscale
image or removing camera motion from an image sequence.
What is deep learning?
From Geoffrey Hinton (~9 min)
Vision
• Humans think vision is easy because it feels seamless, but the human brain divides the vision signal into many channels
that stream different kinds of information into your brain.
• Your brain has an attention system that identifies, in a task-dependent way, important parts of an image to examine
while suppressing examination of other areas.
• There is massive feedback in the visual stream that is, as yet, little understood.
• There are widespread associative inputs from muscle control sensors and all of the other senses that allow the brain
to draw on cross-associations made from years of living in the world.
• The feedback loops in the brain go back to all stages of processing, including the hardware sensors themselves (the
eyes), which mechanically control lighting via the iris and tune the reception on the surface of the retina.
A neuroscientific motivation behind convolution
Excerpts on the work of David H. Hubel and Torsten Wiesel, winners of the 1981 Nobel Prize in physiology:
Many neurons in the visual cortex have a small local receptive field, meaning they react only to visual
stimuli located in a limited region of the visual field (see next slide, in which the local receptive fields of
five neurons are represented by dashed circles). The receptive fields of different neurons may overlap,
and together they tile the whole visual field. Moreover, some neurons react only to images of horizontal
lines, while others react only to lines with different orientations (two neurons may have the same
receptive field but react to different line orientations). Hubel and Wiesel also noticed that some neurons have larger
receptive fields, and they react to more complex patterns that are combinations of the lower-level
patterns. These observations led to the idea that the higher-level neurons are based on the outputs of
neighboring lower-level neurons (in the next slide, notice that each neuron is connected only to a few
neurons from the previous layer). This powerful architecture is able to detect all sorts of complex patterns
in any area of the visual field.
Biological Neurons in Visual cortex
Receptive Fields
Neurons
The brain possesses more than 10^11 cells (neurons), each of
which has well over 10^4 contacts/weights (synapses)
with other neurons. If each neuron acts as a type of
microprocessor, then we have an immense computer (about
10^15 weights in total) in which all the processing elements can
operate concurrently.
Simulating vision intelligence
by deep learning !!
The hierarchy of concepts enables the computer
to learn complicated concepts by building them out
of simpler ones. If we draw a graph showing how
these concepts are built on top of each other, the
graph is deep, with many layers. For this reason,
we call this approach to AI deep learning.
What motivates us to do Computer
VISION with deep learning?
Avalanche/intrusion/landslides ?
License plate recognition – Traffic rule
violation/law-order situation
Medical Imaging
Machine vision
[Figure: an image as the computer sees it, a GRID OF CELLS]
The computer receives a grid of numbers (cells) from the camera or from disk.
For the most part, there is no built-in pattern recognition, no automatic control of
focus and aperture, and no cross-association with years of experience. It is a naïve
vision system.
Any given number within the grid has a rather large noise
component and so by itself gives us little information, but this grid of numbers is
all the computer “sees.”
Our task, then, is to turn this noisy grid of numbers into a perception
that humans can understand.
Pixels: each cell in the grid is called a pixel.
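To make the "grid of numbers" idea concrete, here is a minimal sketch in plain Python. The tiny 2x3 image and the luma weights (0.299, 0.587, 0.114) are assumptions chosen for illustration; they are a common convention and do not come from the slides.

```python
# A tiny 2x3 RGB image: each pixel holds (red, green, blue) values in 0..255.
rgb_image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 255), (0, 0, 0), (128, 128, 128)],
]

def to_grayscale(image):
    """Turn an RGB grid into a grayscale grid (one number per pixel).

    Uses the common luma weights 0.299 R + 0.587 G + 0.114 B,
    which is an assumed convention, not something the slides specify.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

gray_image = to_grayscale(rgb_image)
height, width = len(gray_image), len(gray_image[0])
print(height, width)   # number of rows and columns in the grid
print(gray_image)      # the grid of numbers the computer "sees"
```

This is also an instance of the "new representation" mentioned earlier: a color image turned into a grayscale image.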
Image Data
Image representation
Module 2 – CNN and NN Internals
Deep Learning using neural networks
• In deep learning, we feed millions of data instances into a network of neurons (a neural network),
teaching it to recognize patterns from raw inputs.
• The deep neural networks take raw inputs (such as pixel values in an image) and transform them into
useful representations, extracting higher-level features (such as shapes and edges in images) that
capture complex concepts by combining smaller and smaller pieces of information to solve challenging
tasks such as image classification.
• The networks automatically learn to build abstract representations by adapting and correcting
themselves, fitting patterns observed in the data. Here networks are trained with a feedback process
called backpropagation based on gradient descent optimization.
Neural Network examples
Standard NN, Convolutional NN, Recurrent NN
Two way neural network
What is a filter/kernel and what does it do?
With one filter, the vertical white lines get enhanced while the rest gets blurred;
with another, the horizontal white lines get enhanced while the rest is blurred out.
Hand-crafted filter examples include the Sobel filter and the Scharr filter.
In deep learning there is no need to define the filters manually. Instead, during training the
convolutional layer automatically learns the most useful filters for its task, and the layers above
learn to combine them into more complex patterns.
What does it mean to convolve?
Each value in the matrix on the left corresponds to a single pixel value. We convolve a 3x3 filter with
the image by multiplying the filter's values element-wise with the 3x3 patch of the image that it
currently covers, then summing the products and adding a bias. Here the stride value is 1, so the
filter moves one pixel at a time.
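The multiply, sum and add-bias procedure can be sketched in plain Python as follows. The 6x6 image and the simple vertical edge filter are illustrative assumptions; as is usual in deep learning, the filter is applied without flipping (strictly a cross-correlation).

```python
def convolve2d(image, kernel, bias=0.0, stride=1):
    """Slide `kernel` over `image` (both lists of lists), multiply element-wise,
    sum the products and add `bias`. No padding is applied."""
    f = len(kernel)
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 6x6 image: bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# A simple vertical edge filter.
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # large values mark the columns where the vertical edge sits
```

The 6x6 input becomes a 4x4 output, which is the shrinking effect that padding addresses.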
How does a CNN work with multiple filters?
What does a CNN finally do?
A convolution layer transforms an input volume into an output volume of a different size. The size of the input image generally
shrinks (see Padding in the upcoming slides if size reduction is a concern).
WHY CNN?
Parameter sharing: A feature detector (such as a vertical edge detector)
that is useful in one part of the image is probably useful in another part of
the image.
Sparsity of connections: In each layer, each output value depends only
on a small number of inputs.
Nature of images → establishing invariance (irrespective of translation)
Learning theory → establishing regularization (reducing the degrees of
freedom to the size of the filters)
One Convolution Layer
Padding
The main benefits of padding are the following:
– It allows you to use a CONV layer without necessarily shrinking the height and width of the
volumes. This is important for building deeper networks, since otherwise the height/width would
shrink as you go to deeper layers. An important special case is the "same" convolution, in which
the height/width is exactly preserved after one layer.
– It helps us keep more of the information at the border of an image. Without padding, very few
values at the next layer would be affected by pixels at the edges of an image.
Two kinds of padding:
1) Valid convolutions → no padding is done.
2) Same convolutions → pad to keep the output size the same as the input size.
Example of Zero padding
Why is padding required?
With each convolution layer, the output image shrinks at the edges.
In deeper networks, a lot of the information at the edges would therefore
be lost, so padding is needed.
In some cases the information at the edge is itself crucial. Padding helps us
keep more of the information at the border of an image, so it
would be required anyway.
Stride
Formulas
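The formula image from this slide did not survive extraction. As a hedged sketch, the standard output-size relation for an n x n input, f x f filter, padding p and stride s is floor((n + 2p - f) / s) + 1; the function names below are hypothetical.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output height/width for an n x n input and f x f filter:
    floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

def same_padding(f):
    """Padding that keeps the size unchanged for stride 1 and odd f."""
    return (f - 1) // 2

print(conv_output_size(6, 3))                           # "valid" convolution shrinks 6 to 4
print(conv_output_size(6, 3, padding=same_padding(3)))  # "same" convolution keeps 6
print(conv_output_size(7, 3, stride=2))                 # a stride of 2 roughly halves the size
```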
What is an activation function?
Fire a neuron or not!! → To activate a neuron,
specialized functions called activation functions
are used.
The purpose of the activation function is to introduce non-linearity into the network.
Real-world problems are non-linear.
Hence, to let the network model non-linear relationships in the incoming data, we apply a non-linear mapping called an activation function.
An activation function is a decision-making function that determines the presence of a particular neural
feature.
Why is an activation function needed?
A feed-forward neural network with linear
activation and any number of hidden layers
is equivalent to just a linear neural network
with no hidden layer.
Depth doesn’t contribute to
the expressiveness of the
model unless we use
nonlinear activations
between the linear layers.
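A small numeric check of this claim, as a sketch: the weight matrices W1 and W2 and the input x are hypothetical values, and biases are left out for brevity.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def relu(m):
    """Apply ReLU element-wise to a matrix."""
    return [[max(0, v) for v in row] for row in m]

W1 = [[1, 2], [3, 4]]    # first layer weights (hypothetical)
W2 = [[0, 1], [-1, 2]]   # second layer weights (hypothetical)
x = [[1], [-1]]          # input as a column vector

# Two stacked linear layers ...
linear_two_layers = matmul(W2, matmul(W1, x))
# ... give the same result as one linear layer with weights W2 * W1.
linear_one_layer = matmul(matmul(W2, W1), x)
# With a nonlinearity in between, the collapse no longer holds.
nonlinear_two_layers = matmul(W2, relu(matmul(W1, x)))

print(linear_two_layers, linear_one_layer, nonlinear_two_layers)
```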
Some popular types of activation functions
1. Sigmoid (used mostly at the output layer for binary classification)
2. ReLU (used mostly at hidden layers)
3. Tanh (zero-centered output; its derivatives are often better behaved than sigmoid's)
4. Softmax (used for multiclass classification)
5. Leaky ReLU
6. ReLU6
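The scalar activations in this list can be sketched in a few lines of plain Python. The leaky slope alpha=0.01 is an assumed default, not a value given in the slides.

```python
import math

def sigmoid(z):
    """Squashes z into (0, 1); handy as a probability for binary outputs."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, z)

def tanh(z):
    """Squashes z into (-1, 1), centered at 0."""
    return math.tanh(z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but keeps a small slope (alpha, assumed here) for negatives."""
    return z if z > 0 else alpha * z

def relu6(z):
    """ReLU capped at 6."""
    return min(max(0.0, z), 6.0)

for z in (-2.0, 0.0, 2.0, 10.0):
    print(z, sigmoid(z), relu(z), tanh(z), leaky_relu(z), relu6(z))
```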
Assume activation function is
Sigmoid activation function
Basic Neuron visualization
Types of Non - Linearities
Softmax
Softmax regression is a generalization of logistic regression that normalizes a vector of input values into
a vector of values that follows a probability distribution whose total sums to 1. Below is
an example for multiclass classification.
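A minimal softmax sketch in plain Python. Subtracting the maximum score before exponentiating is a common numerical-stability trick and an addition of this example; the three scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    shift = max(scores)                       # stability trick, does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print(probs)
print(sum(probs))
```

The largest score gets the largest probability, and the ordering of the scores is preserved.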
Comparison of activation functions (not to be discussed)
L-layer neural network
64X64X3
b gets broadcasted here
Cost/Loss Function or cross
entropy/similarity
Back propagation Computation
Summary of Gradient descent
Forward propagation and backward propagation
Important properties of activation functions!!
Why differentiable?
Since we are using backpropagation, the function we generate must be
differentiable at any point.
Nonlinear functions must be continuous and differentiable over their range.
The rate of change (derivative) of the
loss with respect to each weight reaches
zero at a minimum of the loss (ideally the global minimum),
where the predictions are best.
HOW and WHY? (next slide)
Gradient Descent
[Figure: cost J plotted over the weights W1 and W2, marking the gradient with respect to W1 and the adjusted value of W1]
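The weight adjustment pictured here can be sketched on a toy cost function. The quadratic cost J(W1, W2) = (W1 - 3)^2 + (W2 + 1)^2, the learning rate and the step count are all assumptions chosen so that the minimum is known in advance.

```python
def cost(w1, w2):
    """Toy cost J with its minimum at W1 = 3, W2 = -1 (hypothetical)."""
    return (w1 - 3) ** 2 + (w2 + 1) ** 2

def gradient(w1, w2):
    """Partial derivatives of J with respect to W1 and W2."""
    return 2 * (w1 - 3), 2 * (w2 + 1)

def gradient_descent(w1, w2, learning_rate=0.1, steps=100):
    """Repeatedly move each weight against its gradient."""
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1 -= learning_rate * g1   # adjusted value of W1
        w2 -= learning_rate * g2   # adjusted value of W2
    return w1, w2

w1, w2 = gradient_descent(0.0, 0.0)
print(w1, w2, cost(w1, w2))
```

As the weights approach the minimum, the gradient approaches zero and the updates shrink on their own.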
POOLING Layer
The pooling (POOL) layer reduces the height and width of the input. It helps speed up computation and makes
feature detectors more invariant to their position in the input. The two types of pooling layers are:
Max-pooling layer: slides an (f, f) window over the input and stores the max value of the window in the output.
Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window in the output.
The theoretical reason for applying pooling
is that we would like our computed
features not to care about small changes
in position in an image.
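Both pooling types can be sketched with one small function. The 4x4 feature map is a hypothetical example, and f=2 with stride 2 is an assumed but common setting.

```python
def pool2d(image, f=2, stride=2, mode="max"):
    """Slide an (f, f) window over `image` and keep the max or the average."""
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [image[i * stride + a][j * stride + b]
                      for a in range(f) for b in range(f)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        output.append(row)
    return output

feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 7],
    [8, 2, 0, 1],
    [3, 4, 9, 5],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)
print(avg_pooled)
```

Note that the function has no learnable weights: pooling only summarizes each window.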
The POOLING layer has no weights of its own, but by shrinking the feature maps it considerably
reduces the number of parameters in the layers that follow.
A Typical CNN Architecture
CNN over volume
Exercise for CNN with Volume
If there are 10 filters and each filter is 3x3, how many
weights are there to be learnt for an RGB input image?
(3x3x3+1)x10 = 280
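The exercise generalizes to a one-line formula: each filter has filter_size x filter_size x input_channels weights plus one bias. A sketch, with a hypothetical function name:

```python
def conv_layer_params(filter_size, in_channels, num_filters):
    """Learnable parameters in one conv layer: weights plus one bias per filter."""
    weights_per_filter = filter_size * filter_size * in_channels
    return (weights_per_filter + 1) * num_filters

print(conv_layer_params(3, 3, 10))  # RGB input, ten 3x3 filters
print(conv_layer_params(3, 1, 10))  # same filters on a grayscale input
```

The count does not depend on the height or width of the input image, only on the filter size, the input depth and the number of filters.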
CNN over volume
1. Doing convolution over the RGB image, rather than over a grayscale image, is quite important.
2. In CNN over volume, the depth of each filter must be the same as the depth of the input image. The number of
filters can be customized to extract different features from the input image. If the depth were 1,
we would extract only grayscale information, not the RGB information, so it would not be as
useful.
3. Various kinds of filters can be stacked together to do feature extraction as
required, such as edge detection in only the RED channel or in ALL channels.
4. Filters are learnable.
5. A filter maps to a local receptive field on the given input image.
6. A property of CNNs is that they reduce overfitting by using fewer parameters, no matter how big
the input image is.
7. One layer of convolution is “Convolution (W * X + b) + Activation (like ReLU etc.)”. From this layer, the
output goes to the next layer of convolution, pooling, FC, etc.
CNN over volume
• Whatever the input depth, each filter outputs only a 2-D layer of neurons.
• The depth of the output volume is independent of the depth of the input volume; it depends only on the number
of filters:
Depth of output layer = depth of convolution layer (number of filters)
• The number of filters (depth of the CNN layer) is a hyperparameter.
• Each filter has its own set of weights, enabling it to learn a different feature on the same local region
covered by the filter.
Libraries
Keras
Basics on TensorFlow
• Tensors are the standard way of representing data in deep learning.
• Scalars are rank-0 tensors, vectors are rank-1 tensors, matrices are rank-2 tensors, and a rank-3 tensor is a rectangular
prism (or cube) of numbers.
• The shape of a rank-1 tensor has one dimension, that of a rank-2 tensor has two, and that of a rank-3 tensor has
three.
• RGB images are represented as tensors (three-dimensional arrays), with each pixel having three values corresponding
to red, green, and blue components.
• Here computation is approached as a dataflow graph (computation graph). In this graph, nodes represent operations
(add/mul/cnn/concat/pool etc.), and edges represent data (tensors).
• Calling a TensorFlow operation adds a description of a computation to TensorFlow’s “computation graph”.
• Variables in TensorFlow hold tensors and allow stateful computation that modifies them.
• TensorFlow 1.x largely follows a declarative programming style.
Basics on TensorFlow
• TensorFlow 2.0 follows an imperative programming style (eager execution by default), so you can run your model instantly.
• TensorFlow computes the gradients needed for gradient descent optimization automatically, based on the computation graph and the loss
function provided by the user.
• To monitor, debug, and visualize the training process, and to streamline experiments, TensorFlow comes with
TensorBoard.
• It has support for distributed training, asynchronous computation with threading and queues, efficient I/O and data
formats, and much more.
Flowing tensors
Edges
Nodes
What is a Computation Graph?
In TensorFlow, a computation graph is a dataflow graph.
In a dataflow graph, the edges allow data to “flow” from one node to another in a directed manner.
Each of the graph’s nodes represents an operation.
Operations in the graph include all kinds of functions, from simple arithmetic ones such as subtraction
and multiplication to more complex ones.
The key idea behind computation graphs in
TensorFlow is that we first define what
computations should take place, and then
trigger the computation in an external
mechanism.
Writing and running programs in TensorFlow has the following steps:
Construct a graph
• Create Tensors (variables) that are not yet executed/evaluated.
• Write operations between those Tensors.
• Initialize your Tensors.
Execute a graph
• Create a Session.
• Run the Session. This will run the operations you wrote above.
Note: When TensorFlow is imported (with import tensorflow as tf), an empty default
graph is formed. Additional graphs can be created (with tf.Graph()), but each needs
to be set as the default graph (with graph.as_default()) for operations to be added to it
and executed.
Fetches
The node requested in sess.run() is called a fetch. The requested node is
one of the elements of the graph that we wish to compute.
We can ask sess.run() for multiple
nodes’ outputs simply by
passing a list of requested
nodes.
How TensorFlow execution works
• Starts at the requested output(s) and works backward.
• Computes the nodes that must be executed according to the
dependencies.
• The part of the graph that is computed depends on the
output query.
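The "define first, run later, compute only what the fetches need" idea can be illustrated with a toy dataflow graph in plain Python. This is a sketch only: `Node`, `constant` and `run` are hypothetical names invented for this example and are not TensorFlow's API.

```python
class Node:
    """One operation in a toy dataflow graph (not TensorFlow's real classes)."""
    def __init__(self, name, op, inputs=()):
        self.name, self.op, self.inputs = name, op, inputs

def constant(name, value):
    """A node with no inputs that simply yields a value."""
    return Node(name, lambda: value)

def run(fetches, log=None):
    """Start at the requested nodes and work backward through their dependencies."""
    cache = {}
    def evaluate(node):
        if node.name not in cache:
            args = [evaluate(i) for i in node.inputs]
            cache[node.name] = node.op(*args)
            if log is not None:
                log.append(node.name)   # record which nodes actually executed
        return cache[node.name]
    return [evaluate(node) for node in fetches]

# Graph construction: nothing is computed yet.
a = constant("a", 5)
b = constant("b", 2)
c = constant("c", 3)
d = Node("d", lambda x, y: x * y, (a, b))
e = Node("e", lambda x, y: x + y, (c, b))
f = Node("f", lambda x, y: x - y, (d, e))

# Graph execution: fetching d never touches c, e or f.
executed = []
result = run([d], log=executed)
print(result, executed)
print(run([d, f]))  # a list of fetches returns a list of outputs
```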
Basic tensorflow tutorial (hands-on)
Find the code at https://github.com/intelav/cnn-exploration
CROSS Entropy (CE)
Cross entropy is a measure of similarity that applies only when the model outputs class probabilities.
It is a "cost" function that attempts to compute the difference between two
probability distributions.
It applies to activation functions such as Softmax and Sigmoid, which output
probabilities, but not to ReLU, which does not output probabilities.
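A minimal cross-entropy sketch, using the usual definition CE = -sum(t_i * log(p_i)) over classes. The small `eps` guard against log(0) and the example distributions are assumptions of this illustration.

```python
import math

def cross_entropy(true_dist, predicted_dist, eps=1e-12):
    """Cross entropy between a true distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(true_dist, predicted_dist))

label = [0, 1, 0]                    # one-hot label: the true class is class 1
confident = [0.05, 0.90, 0.05]       # hypothetical softmax output, mostly right
mistaken = [0.70, 0.20, 0.10]        # hypothetical softmax output, mostly wrong

loss_confident = cross_entropy(label, confident)
loss_mistaken = cross_entropy(label, mistaken)
print(loss_confident, loss_mistaken)  # the better prediction has the lower cost
```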
Building Neural Network in Tensorflow
(Hands-ON)
Find the code at https://github.com/intelav/cnn-exploration
Convolutional Neural Networks in
Tensorflow (Hands-ON)
Find the code at https://github.com/intelav/cnn-exploration
ONE HOT Encoding
A one-hot encoding allows the representation of categorical data to be more
expressive.
Encoding categories as integers can cause problems when there is no ordinal relationship between them: allowing the
representation to lean on any such relationship might be damaging to learning to
solve the problem. An example might be the labels ‘dog’ and ‘cat’.
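A short sketch of one-hot encoding for the ‘dog’/‘cat’ example. The function name and the choice to order classes alphabetically are assumptions of this illustration.

```python
def one_hot(labels):
    """Map each label to a vector with a single 1 at its class index."""
    classes = sorted(set(labels))                  # alphabetical order, an arbitrary choice
    index = {c: i for i, c in enumerate(classes)}
    encoded = [[1 if index[label] == i else 0 for i in range(len(classes))]
               for label in labels]
    return classes, encoded

classes, encoded = one_hot(["dog", "cat", "dog"])
print(classes)
print(encoded)
```

No class is "larger" than another in this representation, so the model cannot lean on a spurious ordering.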
one-hot encoding
Some Tensorflow operations
Some Tensorflow operations
Data Types in Tensorflow
Tensorflow initializers
MNIST DataSet of handwritten digits
MNIST Image Classification with SOFTMAX ONLY
(not using spatial information/CNN)
Example of Supervised Learning
The softmax regression model will figure out, for each pixel in the image, which digits tend to have high (or low) values in that location.
• Pixel values are correlated with the digit in the image.
• For instance, the center of the image will tend to be white for zeros, but black for sixes.
• Thus, a black pixel in the center of an image will be evidence against the image containing a zero, and in favor of it
containing a six.
Evidence for the image containing the digit 0: here xi and wi are the respective vectors (pixel values and weights) for digit 0, and similarly for all other digits (1…9).
Learning in this model consists of finding weights that tell us how to accumulate evidence
for the existence of each of the digits.
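The evidence formula itself was an image and is not recoverable from the text. A hedged sketch, assuming the usual form evidence = sum over pixels of x_i * w_i (plus an optional bias), followed by softmax; the 4-pixel "images" and the two weight vectors are toy values, not MNIST data.

```python
import math

def evidence(pixels, weights, bias=0.0):
    """Weighted sum of pixel values: how strongly the pixels support one class."""
    return sum(x * w for x, w in zip(pixels, weights)) + bias

def softmax(scores):
    """Normalize evidence scores into probabilities."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    return [e / sum(exps) for e in exps]

def predict(pixels, weights_per_class):
    """Return (most likely class index, class probabilities)."""
    scores = [evidence(pixels, w) for w in weights_per_class]
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# Toy setup: 4-pixel images and 2 classes with hand-picked (hypothetical) weights.
weights_per_class = [
    [1.0, -1.0, -1.0, 1.0],   # class 0 likes bright corners
    [-1.0, 1.0, 1.0, -1.0],   # class 1 likes a bright middle
]
predicted, probs = predict([1, 0, 0, 1], weights_per_class)
print(predicted, probs)
```

In training, these weights would be learned from labeled images rather than picked by hand.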
Graph representation of MNIST softmax model
A “bias term” is equivalent to
stating which digits we believe an image
to be before seeing the pixel values. If
you have seen this before, then try
adding it to the model and check the results.
CONV Layer with MNIST DataSet
7 * 7 * 64 * 1024 ≈ 3.2M weights vs.
28 * 28 * 64 * 1024 ≈ 51M if
no pooling is used.
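The arithmetic can be checked directly. The sketch assumes, as the numbers suggest, a fully connected layer of 1024 units attached to 64 feature maps that are 7x7 with pooling and 28x28 without; the function name is hypothetical.

```python
def fc_weights(height, width, channels, units):
    """Weights in a fully connected layer that reads a height x width x channels volume."""
    return height * width * channels * units

with_pooling = fc_weights(7, 7, 64, 1024)
without_pooling = fc_weights(28, 28, 64, 1024)
print(with_pooling, without_pooling, without_pooling // with_pooling)
```

Shrinking each side of the feature maps by a factor of 4 cuts the weight count by a factor of 16.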
CNN Helper Functions
MNIST CNN Model
One quarter of the size of the image
Why Dropout? (next slide)
Dropout
• This is a regularization trick used in order to force the network to distribute the learned representation
across all the neurons.
• It “turns off” a random, preset fraction of the units in a layer by setting their values to zero during
training.
• The dropped-out neurons are random (different for each computation), forcing the network to learn a
representation that will work even after the dropout.
• This process is often thought of as training an “ensemble” of multiple networks, thereby increasing
generalization and hence helping to prevent overfitting.
• When using the network as a classifier at test time (“inference”), there is no dropout and the full
network is used as is.
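A plain-Python sketch of dropout on one layer's activations. It uses the common "inverted dropout" variant, which rescales the surviving units by 1 / keep probability during training; that rescaling is an assumption of this example and is not stated on the slide.

```python
import random

def dropout(activations, drop_rate, training, rng):
    """Zero a random fraction of units during training; pass through at inference."""
    if not training or drop_rate == 0.0:
        return list(activations)          # inference: the full network is used as is
    keep = 1.0 - drop_rate
    # Inverted dropout (assumed variant): scale kept units so the expected sum is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)                    # seeded only to make the demo repeatable
units = [1.0] * 1000
dropped = dropout(units, drop_rate=0.5, training=True, rng=rng)
zeros = sum(1 for v in dropped if v == 0.0)
print(zeros)                              # roughly half of the units are turned off
print(dropout([1.0, 2.0], 0.5, training=False, rng=rng))
```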
Why Keras?
Being able to go from idea to result with the least possible delay is
key to finding good models.
Other attributes of the Keras framework:
• Keras was developed to enable deep learning engineers to build and experiment with different
models very quickly.
• Keras is an even higher-level framework and provides additional abstractions.
• Keras is more restrictive than the lower-level frameworks, so there are some very complex models
that you can implement in TensorFlow but not (without more difficulty) in Keras.
Two kinds of models
• Sequential models
• Model class with functional APIs
Four steps in Keras for training and test
1. Create the model by calling the function above
2. Compile the model by calling model.compile(optimizer = "...",
loss = "...", metrics = ["accuracy"])
3. Train the model on train data by calling model.fit(x = ..., y = ...,
epochs = ..., batch_size = ...)
4. Test the model on test data by calling model.evaluate(x = ..., y =
...)
Predict using model.predict(input_image), where the input has the same shape as the data on which
the model was trained.
KERAS with Sequential Examples
Find the code at https://github.com/intelav/cnn-exploration
Keras with functional examples
Find the code at https://github.com/intelav/cnn-exploration
Exploiting structure is key to success –
RNN exploits sequential structure.
Sequence of Data
Basic idea behind RNN (recurrent neural networks)
Each new element in the
sequence contributes some
new information, which
updates the current state of
the model.
Markov chain model (mathematically, in statistics and probability)
View data sequences as “chains,” with
each node in the chain dependent in
some way on the previous node, so
that “history” is not erased but carried
on.
RNN models based on chain structure
Different Kinds of RNN
[Figure: one to many, with initial state a<0>, a single input x and outputs y<1>, y<2>, …, y<Ty>]
[Figure: many to one, with initial state a<0>, inputs x<1>, x<2>, …, x<Tx> and a single output y]
[Figure: many to many, with initial state a<0>, inputs x<1>, x<2>, …, x<Tx> and outputs y<1>, y<2>, …, y<Ty>]
[Figure: many to many (second variant), with initial state a<0>, inputs x<1>, …, x<Tx> and outputs y<1>, …, y<Ty>]
Update step for RNN
tanh → hyperbolic tangent function, which has its range in [–1, 1]
xt and ht are the input and state vectors
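The update equation was an image and is missing from the text. A hedged sketch, assuming the standard vanilla RNN step h_t = tanh(W_x x_t + W_h h_(t-1) + b); the weight names and the tiny example values are hypothetical.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One state update: combine the new input with the previous state, squash with tanh."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(z))
    return h_t

def run_rnn(sequence, h0, W_x, W_h, b):
    """Feed the sequence one element at a time, carrying the state forward."""
    h = h0
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h

# Hypothetical sizes: 2-dimensional inputs, 1-dimensional state.
W_x = [[0.5, -0.5]]
W_h = [[0.1]]
b = [0.0]
final_state = run_rnn([[1, 0], [0, 1]], [0.0], W_x, W_h, b)
print(final_state)
```

Because the previous state feeds into every update, earlier elements of the sequence keep influencing later states.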
MNIST IMAGES AS SEQUENCES
MNIST with RNN Example
Find the code at https://github.com/intelav/cnn-exploration
Alexnet Details (7 hidden “weight” layers)
Layer  Name/Type     Size         Stride  Padding  Number of Filters
1      Conv          11X11        4       No       96
2      Max-Pool      3X3          2
3      Conv          5X5          1       Yes      256
4      Max-Pool      3X3          2
5      Conv          3X3          1       Yes      384
6      Conv          3X3          1       Yes      384
7      Conv          3X3          1       Yes      256
8      Max-Pool      3X3          2
9      FC            [9216,4096]
10     FC            [4096,4096]
11     Output layer  [4096,10]
CONV = 3x3 filter, S=1, same convolution; MAX-POOL = 2x2, S=2
VGG Net Details
Problems with deeper networks?
Deeper neural networks are harder to train, and once the number of layers reaches a certain number, the
training error starts to rise again.
Deep networks are also harder to train due to the exploding and vanishing gradients problem.
ResNet
Skip Connection - RESNET
ResNet (Residual Network), proposed by He et al. in the paper Deep Residual Learning for Image Recognition (2015), solves
these problems by implementing skip connections, where the output from one layer is fed to a layer deeper in the network.
Such a "shortcut" or "skip connection" allows the gradient to be directly backpropagated to earlier layers.
Why ResNet?
CONV 3X3, same conv with intermittent pooling layers
When using CONV 3x3 with same convolution, the dimensions of a[l+2] and a[l] do not change,
so adding them is not a problem.
When pooling layers are used, the dimensions of a[l+2] and a[l] differ. In that case, multiply a[l] by an intermediate
matrix Ws to get the same dimension and still achieve the identity function:
a[l+2] = Ws * a[l]
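A sketch of the addition step in a residual block, assuming the common form a[l+2] = g(z[l+2] + shortcut), where g is ReLU and the shortcut is either a[l] itself or Ws * a[l] when dimensions differ. The function names and numbers are hypothetical; real ResNet blocks operate on convolutional feature maps rather than short vectors.

```python
def relu(v):
    """Element-wise ReLU on a vector."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_output(z_l2, a_l, W_s=None):
    """Add the shortcut to the main-path pre-activation, then apply ReLU."""
    shortcut = a_l if W_s is None else matvec(W_s, a_l)
    return relu([z + s for z, s in zip(z_l2, shortcut)])

# If the main path outputs zeros, the block simply passes a[l] through (identity).
same_dim = residual_output([0.0, 0.0], [1.0, 2.0])
# When dimensions differ, a (hypothetical) Ws projects a[l] to the right size first.
projected = residual_output([-1.0], [1.0, 2.0], W_s=[[1.0, 1.0]])
print(same_dim, projected)
```

The first case shows why the identity function is easy for a residual block to learn: the main path only has to output zeros.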
Resnet details
Multiple layers skipping
Convolutional block with dimension matchup for
shortcut with final layer
The CONV2D layer on the shortcut path
does not use any non-linear activation
function. Its main role is to just apply a
(learned) linear function that reduces the
dimension of the input, so that the
dimensions match up for the later
addition step.
Benefits of ResNet
The advantages of ResNets are:
• performance doesn’t degrade with a very deep network
• cheaper to compute
• ability to train very, very deep networks
ResNet works because:
• the identity function is easy for a residual block to learn
• using a skip connection helps the gradient to back-propagate and thus helps
you to train deeper networks
1X1 CONVOLUTION or Network-in-Network
A 1x1 convolution leads to less computation by producing a similar output with a reduced channel dimension for a given input image.
It suffers less from overfitting due to its smaller kernel size (1x1).
The 1x1 convolution was first introduced in the paper titled Network in Network.
1 x 1 Convolution
Depthwise separable convolution – a foundation of
MobileNet, GoogLeNet/Inception networks, and many more…
DF → input/output feature map size
Dk → kernel size
M → input depth, N → output depth
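Using the symbols defined above, the multiplication counts can be compared as follows. The cost expressions are the ones commonly given for MobileNet (standard: Dk·Dk·M·N·DF·DF; depthwise separable: Dk·Dk·M·DF·DF + M·N·DF·DF); they are stated here as an assumption because the slide's own formula image is missing, and the layer sizes are hypothetical.

```python
def standard_conv_cost(Dk, M, N, DF):
    """Multiplications for a standard convolution."""
    return Dk * Dk * M * N * DF * DF

def depthwise_separable_cost(Dk, M, N, DF):
    """Depthwise step (one Dk x Dk filter per input channel) plus 1x1 pointwise step."""
    depthwise = Dk * Dk * M * DF * DF
    pointwise = M * N * DF * DF
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels, 56x56 feature map.
Dk, M, N, DF = 3, 64, 128, 56
standard = standard_conv_cost(Dk, M, N, DF)
separable = depthwise_separable_cost(Dk, M, N, DF)
ratio = separable / standard
print(standard, separable, ratio)  # the ratio equals 1/N + 1/Dk^2
```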
MobileNet’s efficiency vs. VGG/GoogLeNet/AlexNet’s
MobileNet architecture
One layer of Inception network (GoogLeNet)
Complete Inception network
Resnet-50
Find the code at https://github.com/intelav/cnn-exploration
Image Classification, Object Localization, Object Detection
Numbers output for each image:
• Probability of whether an object exists or not
• Bounding box location, in the form of coordinates
• Classes to detect in the image (car, pedestrian, motorcycle)
HOW OD works?
Train a ConvNet on cropped images and then run the sliding windows protocol in one of two ways:
• Sequentially, with different window sizes on the given input image, leading to higher computation, or
• Convolutionally, to get all bounding boxes in one shot.
Evaluating object localization
How accurate are the detected bounding boxes?
IoU (Intersection over Union) is a measure of the overlap between two bounding
boxes.
IOU Computation Example
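A small IoU sketch in plain Python. The (x1, y1, x2, y2) corner format for boxes is an assumption of this example; the slide's own worked figure is not recoverable.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x1, y1, x2, y2) corners."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # zero if the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap of 1 over a union of 7
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(iou((0, 0, 1, 1), (5, 5, 6, 6)))  # disjoint boxes
```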
Non-max suppression
• Get rid of bounding boxes with a low detection probability
score for each class detected.
• Get rid of overlapping bounding boxes by measuring their
IoU with the highest-probability
box for each class
detected.
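The two steps can be sketched for a single class as follows. The thresholds (0.6 for scores, 0.5 for IoU), the (x1, y1, x2, y2) box format and the sample detections are assumptions of this example.

```python
def iou(a, b):
    """Intersection over Union for (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.5):
    """detections: list of (score, box) pairs for ONE class."""
    # Step 1: discard low-probability boxes.
    candidates = sorted((d for d in detections if d[0] >= score_threshold),
                        key=lambda d: d[0], reverse=True)
    kept = []
    # Step 2: keep the best box, drop the boxes that overlap it too much, repeat.
    while candidates:
        best = candidates.pop(0)
        kept.append(best)
        candidates = [d for d in candidates if iou(best[1], d[1]) < iou_threshold]
    return kept

detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # heavy overlap with the 0.9 box
    (0.7, (20, 20, 30, 30)),  # a separate object
    (0.3, (0, 0, 10, 10)),    # low score
]
kept = non_max_suppression(detections)
print(kept)
```

With several classes, the same procedure would be run once per class.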
Anchor boxes (for overlapping objects)
• Propose 2-5 Anchor boxes for different
sizes of objects detected
• Match IoU of Anchor Boxes with the
Ground Truth of Image and Predict that
Class in respective Anchor Box
• Anchor boxes are defined only by their
width and height
YOLO (You Only Look Once)
[Figure: YOLO output encoding per grid cell, with 5 anchor boxes, each described by the sum of 80 classes and 5 parameters]
Region proposal CNN (R-CNN)
Run a segmentation algorithm to
detect blobs in the image.
Here a blob is a part of the image where at least
one object can be classified.
Run the classifier only on the proposed blob regions
(also called ROIs, regions of
interest).
Other, better methods of OD
• YOLO (you only look once)
• F-RCNN
• Fast-RCNN
• Faster-RCNN
• SSD (single shot detector)
• Many more…
Methodologies of object detection
Face Recognition
Stages for face recognition: Detect → Align → Represent → Classify
Tools shown: MTCNN, FaceNet
Face embedding to be used for one-shot learning,
similarity function, Siamese network, triplet loss
and logistic regression
[Figure: two images x(i) and x(j) are encoded as f(x(i)) and f(x(j)), which are compared to produce the output y]
One example of face embedding/encoding
An encoding is a good one if:
• The encodings of two images of the same person are quite similar to each other
• The encodings of two images of different persons are very different
Some other examples of encoding
Triplet Loss in Detail
Training via triplet loss uses triplets of images (A, P, N) where
• A is an "Anchor" image: a picture of a person.
• P is a "Positive" image: a picture of the same person as the Anchor image.
• N is a "Negative" image: a picture of a different person than the Anchor image.
Minimize the loss function using gradient descent.
The margin is a hyperparameter.
max(z, 0) is denoted as [z]+.
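The loss formula itself was an image. A hedged sketch, assuming the usual FaceNet-style form L = [ ||f(A) - f(P)||² - ||f(A) - f(N)||² + margin ]+; the 2-dimensional encodings and the margin of 0.2 are hypothetical values.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """[d(A, P) - d(A, N) + margin]+ : zero once the negative is far enough away."""
    z = squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin
    return max(z, 0.0)

anchor = [0.0, 0.0]
positive = [0.1, 0.0]        # same person: close to the anchor
easy_negative = [1.0, 0.0]   # different person, already far away
hard_negative = [0.2, 0.0]   # different person, still too close

easy = triplet_loss(anchor, positive, easy_negative)
hard = triplet_loss(anchor, positive, hard_negative)
print(easy, hard)
```

Only triplets like the second one produce a non-zero loss, so they are the ones that drive learning.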
Different ways of doing face recognition
• Train via triplet loss and then classify
• Train via logistic regression and then classify
• Compare a pre-computed encoding (FaceNet) with the encoding of a new
image
Face verification vs. face recognition
Verification
• Input image, name/ID
• Output whether the input image is that of the claimed person
Recognition
• Has a database of K persons
• Get an input image
• Output ID if the image is any of the K persons (or “not recognized”)
Android AI Architecture on Celadon
[Figure: layered architecture diagram. Recoverable labels:
• Android side: Android Applications, TensorFlow Lite, Android NNAPI/NN Runtime, Android NN HAL, CPU fallback, Java/JNI, C++; HALs for MKLDNN, VPU, GPU, GNA and VULKAN; VULKAN Driver
• Model path: Caffe, TensorFlow, MxNet, ONNX and Kaldi models, Model Optimizer, IR; Load Plugin, XML and Infer
• OPENVINO (Inference Engine API), plug-in architecture: MKLDNN Plugin, clDNN Plugin, Myriad Plugin, FPGA Plugin, GNA Plugin; Execution Capabilities Partitioning
• Drivers and libraries: USB/PCI Driver, PCI Driver, MKL-DNN, Intrinsics, clDNN/OpenCL, GNA API
• Intel HW: CPU (Xeon/Core/SKL/Atom), GEN, Myriad 2/X, DLA, GNA
• Legend: Android, OPENVINO, GOS AI Team, AI Frameworks, Intel HW]
https://github.com/projectceladon
What is OpenCV?
OpenCV [OpenCV] is an open source (see http://opensource.org) computer vision library available from http://opencv.org.
In 1999 Gary Bradski [Bradski], working at Intel Corporation, launched OpenCV in the hope of accelerating computer vision and artificial intelligence by providing a solid infrastructure for everyone working in the field.
The OpenCV library contains over 500 functions that span many areas of vision, including factory product inspection, medical imaging, security, user interfaces, camera calibration, stereo vision, and robotics.
It also has its own machine learning (ML) and deep neural network (DNN) modules.
OpenCV Architecture on Different OSes
Information Technology 153
Interactive face demo using OpenCV and OpenVINO
https://docs.openvinotoolkit.org/latest/_intel_models_face_detection_adas_0001_description_face_detection_adas_0001.html#inputs
https://docs.openvinotoolkit.org/latest/_demos_interactive_face_detection_demo_README.html
Information Technology 154
Interactive face demo: OpenCV API/Data Types
Information Technology 155
Interactive face demo: OpenVINO API/Data Types
Information Technology 156
Q & A
Information Technology 157
Thank you!!

IT Professional's Guide to Deep Learning and Computer Vision

  • 2. Information Technology 2 Course Topics (9-4, 9-12) Day 1 • Module 1 – Computer Vision and Neural Networks – 45 min • 10 min break • Module 2 – CNN Internals – 90 min • Q & A – 20 min • 1 hr lunch break • Module 3 – TensorFlow – 45 min • Module 4 – MNIST Dataset with NN and CNN – 30 min • 10 min break • Module 5 – Keras Framework – 30 min • Module 6 – Short Talk on RNN – 30 min Day 2 • Module 7 – Classic Networks – 75 min • 10 min break • Module 8 – Classic Networks Programming (ResNet-50 etc.) – 30 min • Module 9 – Short Talk on Object Detection and Face Recognition – 45 min • 1 hr lunch break • Module 10 – Android AI with OpenVINO – 30 min
  • 6. Information Technology 6 What is Computer Vision (Legacy)? Computer vision is the transformation of data from a still or video camera into either a decision or a new representation. A decision might be “the laser range finder indicates an object is 1 meter away”, “there is a person in this scene”, or “there are 14 tumor cells on this slide”. A new representation might mean turning a color image into a grayscale image or removing camera motion from an image sequence.
  • 7. Information Technology 7 What is deep learning? From Geoffrey Hinton (~9 min)
  • 8. Information Technology 8 Vision • Humans think vision is easy because it feels seamless, but the human brain divides the vision signal into many channels that stream different kinds of information into your brain. • Your brain has an attention system that identifies, in a task-dependent way, important parts of an image to examine while suppressing examination of other areas. • There is massive feedback in the visual stream that is, as yet, little understood. • There are widespread associative inputs from muscle control sensors and all of the other senses that allow the brain to draw on cross-associations made from years of living in the world. • The feedback loops in the brain go back to all stages of processing, including the hardware sensors themselves (the eyes), which mechanically control lighting via the iris and tune the reception on the surface of the retina.
  • 9. Information Technology 9 A neuroscientific motivation behind convolution. Many neurons in the visual cortex have a small local receptive field, meaning they react only to visual stimuli located in a limited region of the visual field (see the next slide, in which the local receptive fields of five neurons are represented by dashed circles). The receptive fields of different neurons may overlap, and together they tile the whole visual field. Moreover, some neurons react only to images of horizontal lines, while others react only to lines with different orientations (two neurons may have the same receptive field but react to different line orientations). David H. Hubel and Torsten Wiesel also noticed that some neurons have larger receptive fields and react to more complex patterns that are combinations of the lower-level patterns. These observations led to the idea that the higher-level neurons are based on the outputs of neighboring lower-level neurons (in the next slide, notice that each neuron is connected only to a few neurons from the previous layer). This powerful architecture is able to detect all sorts of complex patterns in any area of the visual field. Based on the work of David H. Hubel and Torsten Wiesel, who shared the 1981 Nobel Prize in Physiology or Medicine.
  • 10. Information Technology 10 Biological Neurons in Visual cortex Receptive Fields Neurons
  • 11. Information Technology 11 The brain possesses more than 10^11 cells (neurons), each of which has well over 10^4 contacts/weights (synapses) with other neurons. If each neuron acts as a type of microprocessor, then we have an immense computer (~10^15 weights in total) in which all the processing elements can operate concurrently.
  • 12. Information Technology 12 Simulating vision intelligence by deep learning !! The hierarchy of concepts enables the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.
  • 13. Information Technology 13 What motivates us to do Computer VISION with deep learning? Avalanche/intrusion/landslides ? License plate recognition – Traffic rule violation/law-order situation Medical Imaging
  • 15. Information Technology 15 Machine vision GRID OF CELLS
  • 16. Information Technology 16 Machine vision GRID OF CELLS
  • 17. Information Technology 17 Machine vision. The computer receives a grid of numbers (cells) from the camera or from disk. For the most part, there is no built-in pattern recognition, no automatic control of focus and aperture, and no cross-association with years of experience. It is a naive vision system. Any given number within the grid shown on the previous slide has a rather large noise component and so by itself gives us little information, but this grid of numbers is all the computer “sees”. Our task, then, is to turn this noisy grid of numbers into a perception that humans can understand.
  • 18. Information Technology 18 Pixels: each cell in the grid is called a pixel
  • 22. Information Technology 22 Module 2 – CNN and NN Internals
  • 23. Information Technology 23 Deep Learning using neural networks • In deep learning, we feed millions of data instances into a network of neurons (a neural network), teaching it to recognize patterns from raw inputs. • Deep neural networks take raw inputs (such as pixel values in an image) and transform them into useful representations, extracting higher-level features (such as shapes and edges in images) that capture complex concepts by combining smaller pieces of information to solve challenging tasks such as image classification. • The networks automatically learn to build abstract representations by adapting and correcting themselves, fitting patterns observed in the data. The networks are trained with a feedback process called backpropagation, based on gradient descent optimization.
  • 24. Information Technology 24 Neural Network examples: Standard NN, Recurrent NN, Convolutional NN
  • 25. Information Technology 25 Two way neural network
  • 27. Information Technology 27 What is a filter/kernel and what does it do? With a vertical-line filter, the vertical white lines get enhanced while the rest gets blurred; with a horizontal-line filter, the horizontal white lines get enhanced while the rest is blurred out. In deep learning there is no need to define the filters manually: during training the convolutional layer automatically learns the most useful filters for its task, and the layers above learn to combine them into more complex patterns. Hand-designed filter examples include the Sobel filter and the Scharr filter.
  • 28. Information Technology 29 What does it mean to convolve? Each value in the matrix on the left corresponds to a single pixel value. We convolve a 3x3 filter with the image by multiplying the filter's values element-wise with the 3x3 patch of the image under it, summing the products, and adding a bias; the filter then slides to the next position. Here the stride is 1.
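The multiply, sum, and add-bias step can be sketched in plain Python. This is an illustration only, assuming a single-channel image stored as a list of rows, a square filter, and no padding; the function name `conv2d_valid` and the toy 4x4 image are made up for the example. As is usual in deep learning, the filter is not flipped, so strictly speaking this is cross-correlation.

```python
def conv2d_valid(image, kernel, bias=0.0, stride=1):
    """Slide a square kernel over a 2-D image with no padding."""
    f = len(kernel)
    out_h = (len(image) - f) // stride + 1
    out_w = (len(image[0]) - f) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            # Element-wise multiply the kernel with the patch under it, then sum.
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# 4x4 image with a dark-to-bright vertical edge down the middle.
image = [[0, 0, 10, 10]] * 4
# Hand-written vertical edge detector.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
feature_map = conv2d_valid(image, kernel)   # 2x2 output for a 4x4 input
```

Every output position overlaps the edge here, so the whole 2x2 feature map responds strongly.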
  • 29. Information Technology 31 How does a CNN work with multiple filters?
  • 30. Information Technology 32 What does a CNN finally do? A convolution layer transforms an input volume into an output volume of a different size. The height and width of the input generally shrink (see Padding in the upcoming slides if size reduction is a concern).
  • 31. Information Technology 33 Why CNN? Parameter sharing: a feature detector (such as a vertical edge detector) that is useful in one part of the image is probably useful in another part of the image. Sparsity of connections: in each layer, each output value depends on only a small number of inputs. Nature of images → establishing invariance (irrespective of translation). Learning theory → establishing regularization (reducing the degrees of freedom to the size of the filters).
  • 32. Information Technology 34 One Convolution Layer
  • 33. Information Technology 35 Padding. The main benefits of padding are the following: – It allows you to use a CONV layer without necessarily shrinking the height and width of the volumes. This is important for building deeper networks, since otherwise the height/width would shrink as you go to deeper layers. An important special case is the "same" convolution, in which the height/width is exactly preserved after one layer. – It helps us keep more of the information at the border of an image. Without padding, very few values at the next layer would be affected by pixels at the edges of an image. Two kinds of padding: 1) Valid convolution → no padding is done. 2) Same convolution → pad so that the output size is the same as the input size.
  • 35. Information Technology 37 Why is padding required? With each convolution layer the output shrinks at the edges, so in deeper networks a lot of the information at the edges would be lost. Padding prevents this. In some cases the information at the border of an image is crucial, so padding is required anyway.
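The shrinking effect can be made concrete with the usual output-size formula, floor((n + 2p - f) / s) + 1, for input size n, filter size f, padding p, and stride s. The sketch below assumes square inputs and filters; the function names and the 64-pixel input are chosen for illustration.

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output height/width of a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1


def same_padding(f):
    """Padding that preserves the size for stride 1 and an odd filter size f."""
    return (f - 1) // 2


valid_size = conv_output_size(64, 3)                            # no padding
same_size = conv_output_size(64, 3, padding=same_padding(3))    # "same" padding

# Ten stacked 3x3 valid convolutions shrink a 64-pixel side by 2 per layer.
size = 64
for _ in range(10):
    size = conv_output_size(size, 3)
```

With valid convolutions the side length drops from 64 to 44 after ten layers, while "same" padding keeps it at 64 throughout.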
  • 38. Information Technology 40 What is an activation function? Fire a neuron or not! Specialized functions called activation functions decide whether a neuron is activated. The purpose of the activation function is to introduce non-linearity into the network. Real-world problems are non-linear, so we apply a nonlinear mapping, the activation function, to each neuron's weighted input. An activation function is a decision-making function that determines the presence of a particular neural feature.
  • 39. Information Technology 41 Why is an activation function needed? A feed-forward neural network with linear activations and any number of hidden layers is equivalent to a linear neural network with no hidden layer. Depth does not contribute to the expressiveness of the model unless we use nonlinear activations between the linear layers.
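This equivalence can be verified on a tiny example. The sketch below assumes two bias-free 2x2 weight matrices with made-up integer values; the helper names `matmul` and `relu` are written here for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def relu(m):
    """Element-wise max(v, 0)."""
    return [[max(v, 0) for v in row] for row in m]


w1 = [[1, -2], [3, 1]]    # first layer weights (hypothetical)
w2 = [[2, 1], [-1, 1]]    # second layer weights (hypothetical)
x = [[1], [2]]            # input column vector

two_linear_layers = matmul(w2, matmul(w1, x))
one_linear_layer = matmul(matmul(w2, w1), x)     # single merged weight matrix
with_relu = matmul(w2, relu(matmul(w1, x)))      # nonlinearity between layers
```

The two linear versions give the same output because W2(W1 x) = (W2 W1) x, whereas inserting ReLU between the layers produces a result that no single merged matrix reproduces for all inputs.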
  • 40. Information Technology 42 Some popular types of activation functions: 1. Sigmoid (used mostly at the output layer for binary classification) 2. ReLU (used mostly at hidden layers) 3. Tanh (zero-centered output, which often gives better-behaved gradients than sigmoid) 4. Softmax (used for multiclass classification) 5. Leaky ReLU 6. ReLU6
  • 41. Information Technology 43 Assume the activation function is the sigmoid function
  • 42. Information Technology 44 Basic Neuron visualization
  • 43. Information Technology 45 Types of Non-Linearities
  • 44. Information Technology 46 Softmax regression is a generalization of logistic regression to multiple classes: it normalizes a vector of input scores into a vector of values that form a probability distribution summing to 1. Below is an example of multiclass softmax.
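A minimal softmax sketch in plain Python, assuming the input is a list of raw class scores (logits); the three example scores are made up. Subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)                           # for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))         # index of the largest probability
```

The largest score always receives the largest probability, so the predicted class is the same as taking the argmax of the raw scores.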
  • 45. Information Technology 47 Comparison of activation function Not to be Discussed
  • 46. Information Technology 48 L-layer neural network 64X64X3 b gets broadcasted here
  • 47. Information Technology 49 Cost/Loss Function or cross entropy/similarity
  • 48. Information Technology 50 Back propagation Computation
  • 49. Information Technology 51 Summary of Gradient descent
  • 50. Information Technology 52 Forward propagation and backward propagation
  • 51. Information Technology 53 Why differentiable? Important properties of activation functions! Since we are using backpropagation, the function we generate must be differentiable at every point. Nonlinear functions must be continuous and differentiable over their range. Training drives the rate of change (gradient) of the loss with respect to the weights toward zero, that is, toward a minimum (ideally the global minimum), to get the best predictions. How and why? (next slide)
  • 52. Information Technology 54 Gradient Descent (figure: cost J plotted over weights W1 and W2, showing the gradient with respect to W1 and the adjusted value of W1)
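The update shown in the figure (move each weight against its gradient) can be sketched on a toy cost. The quadratic cost, learning rate, and step count below are assumptions chosen for illustration, not the network's real loss.

```python
def cost(w1, w2):
    """Toy cost J with its minimum at w1 = 3, w2 = -1 (an assumed example)."""
    return (w1 - 3.0) ** 2 + (w2 + 1.0) ** 2


def gradient(w1, w2):
    """Partial derivatives of J with respect to w1 and w2."""
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)


def gradient_descent(w1, w2, learning_rate=0.1, steps=100):
    """Repeatedly apply w := w - learning_rate * dJ/dw."""
    for _ in range(steps):
        g1, g2 = gradient(w1, w2)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


w1, w2 = gradient_descent(0.0, 0.0)
```

As the weights approach the minimum the gradient shrinks toward zero, so the adjustments become smaller and smaller.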
  • 53. Information Technology 55 POOLING Layer The pooling (POOL) layer reduces the height and width of the input. It helps speed up computation and makes feature detectors more invariant to their position in the input. The two types of pooling layers are: Max-pooling layer: slides an (f, f) window over the input and stores the max value of the window in the output. Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window in the output. The theoretical reason for applying pooling is that we would like the computed features not to care about small changes in position in an image
  • 54. Information Technology 56 POOLING Layer Reduces the size of the feature maps, and hence the number of parameters in later layers, considerably
  • 55. Information Technology 57 A Typical CNN Architecture
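Both pooling types can be sketched on a small 2-D input. The function name, the 4x4 input, and the choice f = 2 with stride 2 are illustrative assumptions.

```python
def pool2d(image, f, stride, mode="max"):
    """Slide an (f, f) window over a 2-D list and keep the max or the average."""
    rows = (len(image) - f) // stride + 1
    cols = (len(image[0]) - f) // stride + 1
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [image[r * stride + i][c * stride + j]
                      for i in range(f) for j in range(f)]
            if mode == "max":
                out_row.append(max(window))
            else:
                out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


image = [[1, 3, 2, 4],
         [5, 6, 7, 8],
         [3, 2, 1, 0],
         [1, 2, 3, 4]]
max_pooled = pool2d(image, f=2, stride=2, mode="max")
avg_pooled = pool2d(image, f=2, stride=2, mode="average")
```

A 4x4 input becomes 2x2 in both cases, and the pooling layer itself has no weights to learn.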
  • 57. Information Technology 59 Exercise for CNN with Volume If there are 10 filters and each filter is 3X3, how many weights are there to be learnt for an RGB input image? Each filter has 3X3X3 weights plus 1 bias, so (3X3X3+1)X10 = 280
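The exercise generalizes to any filter size, input depth, and filter count. A short sketch, with an illustrative function name:

```python
def conv_layer_params(filter_size, input_depth, num_filters, use_bias=True):
    """Learnable parameters in one convolution layer."""
    weights_per_filter = filter_size * filter_size * input_depth
    bias_per_filter = 1 if use_bias else 0
    return (weights_per_filter + bias_per_filter) * num_filters


# The slide's exercise: ten 3x3 filters over an RGB (depth 3) input.
rgb_params = conv_layer_params(filter_size=3, input_depth=3, num_filters=10)

# Same filters over a grayscale (depth 1) input, for comparison.
gray_params = conv_layer_params(filter_size=3, input_depth=1, num_filters=10)
```

The image height and width never appear in the count, which is why the parameter count stays small for large images.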
  • 58. Information Technology 60 CNN Over Volume 1. Doing convolution over the RGB image, rather than over a grayscale image, is quite important. 2. In a CNN over volume, the depth of each filter must be the same as the depth of the input image. The weights along the filter's depth can be customized to extract different features from the input image. If the depth of the filter were 1, it would extract only grayscale information, not RGB information, so it would not be of much use. 3. Various kinds of filters can be stacked together to do feature extraction as required, such as edge detection in only the red channel or in all channels. 4. Filters are learnable. 5. A filter maps to a local receptive field on the given input image. 6. A property of CNNs is that the number of parameters stays small no matter how big the input image is, which helps avoid overfitting. 7. One layer of convolution is "Convolution (W * X + b) + Activation (such as ReLU)". From this layer, the output goes to the next layer, which may be convolution, pooling, FC, etc.
  • 59. Information Technology 61 CNN Over Volume • Whatever the input depth, each filter produces only a 2-D layer of neurons as output. • The output depth is independent of the input depth; it depends only on the number of filters. The following relation holds: depth of output layer = depth of convolution layer (number of filters). • The number of filters (depth of the CNN layer) is a hyperparameter. • Each filter has its own set of weights, enabling it to learn a different feature on the same local region covered by the filter.
  • 63. Information Technology 65 Basics of TensorFlow • Tensors are the standard way of representing data in deep learning. • Scalars are rank-0 tensors, vectors are rank-1 tensors, matrices are rank-2 tensors, and a rank-3 tensor is a rectangular prism (cube) of numbers. • A rank-1 tensor has a shape of dimension 1, a rank-2 tensor a shape of dimension 2, and a rank-3 tensor a shape of dimension 3. • RGB images are represented as tensors (three-dimensional arrays), with each pixel having three values corresponding to the red, green, and blue components. • Computation is approached as a dataflow graph (computation graph). In this graph, nodes represent operations (add/mul/cnn/concat/pool etc.), and edges represent data (tensors). • Calling a TensorFlow operation adds a description of a computation to TensorFlow's "computation graph". • Variables in TensorFlow hold tensors and allow stateful computation that modifies variables. • TensorFlow 1.x largely follows a declarative programming style.
  • 64. Information Technology 66 Basics of TensorFlow • TensorFlow 2.0 follows an imperative programming style (eager execution), so you can run your model instantly. • TensorFlow automatically derives the gradients needed by the gradient descent optimizer from the computation graph and the loss function provided by the user. • To monitor, debug, and visualize the training process, and to streamline experiments, TensorFlow comes with TensorBoard. • It has support for distributed training, asynchronous computation with threading and queues, efficient I/O and data formats, and much more.
  • 65. Information Technology 67 Flowing tensors (figure: edges carry tensors, nodes are operations)
  • 66. Information Technology 68 What is a Computation Graph? In TensorFlow, a computation graph is a dataflow graph. In a dataflow graph, the edges allow data to "flow" from one node to another in a directed manner. Each of the graph's nodes represents an operation. Operations in the graph include all kinds of functions, from simple arithmetic ones such as subtraction and multiplication to more complex ones. The key idea behind computation graphs in TensorFlow is that we first define what computations should take place, and then trigger the computation through an external mechanism.
  • 67. Information Technology 69 Writing and running programs in TensorFlow has the following steps, which fall into two phases, constructing a graph and executing a graph: • Create tensors (variables) that are not yet executed/evaluated. • Write operations between those tensors. • Initialize your tensors. • Create a Session. • Run the Session. This will run the operations you wrote above. Note: on importing TensorFlow (with import tensorflow as tf), an empty default graph is created. Additional graphs can be created (with tf.Graph()), but each needs to be set as the default graph (with graph.as_default()) for operations to be added and executed.
  • 68. Information Technology 70 Fetches The node requested in sess.run() is called a fetch. A requested node is one of the elements of the graph that we wish to compute. You can ask sess.run() for multiple nodes' outputs simply by passing a list of requested nodes.
  • 69. Information Technology 71 How TensorFlow execution works • Starts at the requested output(s) and works backward • Computes the nodes that must be executed according to the dependencies • The part of the graph that is computed depends on the output query.
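The "define first, run later, compute only what the fetches need" idea can be sketched without TensorFlow. The toy classes below are NOT the TensorFlow API; Node, constant, and run are hypothetical names that only mimic the behaviour described on these slides.

```python
class Node:
    """A graph node: an operation plus the nodes it depends on."""

    def __init__(self, name, op, inputs=()):
        self.name, self.op, self.inputs = name, op, tuple(inputs)


def constant(name, value):
    """A node with no inputs that always yields the same value."""
    return Node(name, lambda: value)


def run(fetches):
    """Evaluate the requested nodes by walking backward through dependencies.

    Returns the fetched values and the names of every node that was executed.
    """
    cache = {}

    def evaluate(node):
        if node.name not in cache:
            args = [evaluate(dep) for dep in node.inputs]
            cache[node.name] = node.op(*args)
        return cache[node.name]

    values = [evaluate(node) for node in fetches]
    return values, set(cache)


# Phase 1: construct the graph. Nothing is computed yet.
a = constant("a", 5)
b = constant("b", 2)
c = constant("c", 3)
d = Node("d", lambda x, y: x * y, [a, b])   # d = a * b
e = Node("e", lambda x, y: x + y, [c, b])   # e = c + b
f = Node("f", lambda x, y: x - y, [d, e])   # f = d - e

# Phase 2: execute. Fetching d alone never touches c, e, or f.
(d_val,), executed_for_d = run([d])
(d_val2, f_val), executed_for_f = run([d, f])
```

Fetching a list of nodes returns one value per node, and the set of executed nodes grows only as far as the dependencies of the fetches require.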
  • 70. Information Technology 72 Basic tensorflow tutorial (hands-on) Find the code at https://github.com/intelav/cnn-exploration
  • 71. Information Technology 73 CROSS Entropy (CE) Cross entropy is a measure of similarity only when the model outputs class probabilities. It is a "cost" function that measures the difference between two probability distributions. It therefore applies to activation functions such as softmax and sigmoid, which output probabilities, but not to ReLU, which does not.
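A small cross-entropy sketch for one example with a one-hot label is shown below. The predicted distributions are made up, and the tiny eps guard against log(0) is an added assumption.

```python
import math


def cross_entropy(true_dist, predicted_dist, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i); lower means the distributions agree more."""
    return -sum(t * math.log(p + eps) for t, p in zip(true_dist, predicted_dist))


label = [0.0, 1.0, 0.0]                          # true class is index 1
good = cross_entropy(label, [0.05, 0.90, 0.05])  # confident and correct
bad = cross_entropy(label, [0.70, 0.20, 0.10])   # confident and wrong
```

The prediction that puts more probability on the true class gets the lower cost.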
  • 72. Information Technology 74 Building Neural Network in Tensorflow (Hands-ON) Find the code at https://github.com/intelav/cnn-exploration
  • 73. Information Technology 75 Convolutional Neural Networks in Tensorflow (Hands-ON) Find the code at https://github.com/intelav/cnn-exploration
  • 74. Information Technology 76 ONE HOT Encoding A one-hot encoding allows the representation of categorical data to be more expressive. Integer labels can cause problems when the categories have no ordinal relationship: allowing the representation to lean on any such relationship might be damaging to learning to solve the problem. An example is the labels 'dog' and 'cat'.
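A minimal sketch of one-hot encoding for the 'dog'/'cat' example follows. The function name and the alphabetical class ordering are assumptions made for illustration.

```python
def one_hot(labels):
    """Map each label to a vector with a single 1 at its class index."""
    classes = sorted(set(labels))                 # fixed, order-free class list
    index = {name: i for i, name in enumerate(classes)}
    encoded = [[1 if index[label] == i else 0 for i in range(len(classes))]
               for label in labels]
    return classes, encoded


classes, encoded = one_hot(["dog", "cat", "dog"])
```

Neither class is "larger" than the other in this representation, unlike the integer labels 0 and 1.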
  • 76. Information Technology 78 Some Tensorflow operations
  • 77. Information Technology 79 Some Tensorflow operations
  • 78. Information Technology 80 Data Types in Tensorflow
  • 81. Information Technology 83 MNIST DataSet of handwritten digits
  • 82. Information Technology 84 MNIST Image Classification with SOFTMAX ONLY (not using spatial information/CNN): an example of supervised learning
  • 83. Information Technology 85 MNIST Image Classification with SOFTMAX ONLY (not using spatial information/CNN) The softmax regression model will figure out, • For each pixel in the image, which digits tend to have high (or low) values in that location. Pixel values are correlated with the digit in the image. • For instance, the center of the image will tend to be white for zeros, but black for sixes. • Thus, a black pixel in the center of an image will be evidence against the image containing a zero, and in favor of it containing a six. The evidence for the image containing the digit 0 is a weighted sum of the pixel values, where xi are the pixel values and wi the corresponding weights for digit 0; similarly for all other digits (1…9). Learning in this model consists of finding weights that tell us how to accumulate evidence for the existence of each of the digits.
  • 84. Information Technology 86 Graph representation of the MNIST softmax model The "bias term" is equivalent to stating which digits we believe an image to be before seeing the pixel values. If you have seen this before, then try adding it to the model and check the results.
  • 85. Information Technology 87 CONV Layer with MNIST DataSet 7 * 7 * 64 * 1024 ≈ 3.2M weights vs. 28 * 28 * 64 * 1024 ≈ 51M if no pooling is used.
  • 86. Information Technology 88 CNN Helper Functions
  • 87. Information Technology 89 MNIST CNN Model (figure note: one quarter of the size of the image) Why dropout? (next slide)
  • 88. Information Technology 90 Dropout • This is a regularization trick used to force the network to distribute the learned representation across all the neurons. • It "turns off" a random, preset fraction of the units in a layer by setting their values to zero during training. • The dropped-out neurons are random, different for each computation, forcing the network to learn a representation that works even after the dropout. • This process is often thought of as training an "ensemble" of multiple networks, which improves generalization and hence helps prevent overfitting. • When using the network as a classifier at test time ("inference"), there is no dropout and the full network is used as is.
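The train-versus-inference behaviour can be sketched as follows. This uses "inverted" dropout, where kept units are scaled by 1 / keep_prob during training; that scaling is an assumption added here and is not mentioned on the slide. The drop rate of 0.5 and the all-ones layer are illustrative.

```python
import random


def dropout(activations, drop_rate, training, rng=random):
    """Zero a random fraction of units during training; pass through at inference."""
    if not training or drop_rate == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_rate
    # Kept units are scaled up so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(0)                     # seeded for a repeatable run
layer = [1.0] * 1000
train_out = dropout(layer, drop_rate=0.5, training=True, rng=rng)
test_out = dropout(layer, drop_rate=0.5, training=False)
```

A different random mask is drawn on every training call, while the inference call returns the activations unchanged.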
  • 90. Information Technology 92 Why Keras? • Keras was developed to enable deep learning engineers to build and experiment with different models very quickly. • Keras is an even higher-level framework and provides additional abstractions. • Keras is more restrictive than the lower-level frameworks, so there are some very complex models that you can implement in TensorFlow but not (without more difficulty) in Keras. Other attributes of the Keras framework: being able to go from idea to result with the least possible delay is key to finding good models.
  • 91. Information Technology 93 Two kinds of models: Sequential models, and the Model class with the functional API. Four steps in Keras for training and testing: 1. Create the model by calling the function above 2. Compile the model by calling model.compile(optimizer = "...", loss = "...", metrics = ["accuracy"]) 3. Train the model on train data by calling model.fit(x = ..., y = ..., epochs = ..., batch_size = ...) 4. Test the model on test data by calling model.evaluate(x = ..., y = ...) Predict using model.predict(input_image), where the input has the same shape as the data the model was trained on.
  • 92. Information Technology 94 KERAS with Sequential Examples Find the code at https://github.com/intelav/cnn-exploration
  • 93. Information Technology 95 Keras with functional examples Find the code at https://github.com/intelav/cnn-exploration
  • 95. Information Technology 97 Exploiting structure is key to success – RNN exploits sequential structure.
  • 97. Information Technology 99 RNN (recurrent neural networks) Basic idea behind RNNs: each new element in the sequence contributes some new information, which updates the current state of the model. Markov chain model (the mathematical counterpart in statistics and probability): view data sequences as "chains," with each node in the chain dependent in some way on the previous node, so that "history" is not erased but carried on.
  • 98. Information Technology 100 RNN models based on chain structure
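  • 99. Information Technology 101 Different Kinds of RNN (figures: one to many, a single input x with initial state a<0> producing outputs y<1>, y<2>, …, y<Ty>; many to one, inputs x<1>, x<2>, …, x<Tx> with initial state a<0> producing a single output y)
  • 100. Information Technology 102 Different Kinds of RNN (figure: many to many, inputs x<1>, x<2>, …, x<Tx> with initial state a<0> producing outputs y<1>, y<2>, …, y<Ty>)
  • 101. Information Technology 103 Different Kinds of RNN (figure: many to many, inputs x<1>, …, x<Tx> with initial state a<0> followed by outputs y<1>, …, y<Ty>)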
  • 102. Information Technology 104 Update step for RNN tanh is the hyperbolic tangent function, whose range is [-1, 1]; xt and ht are the input and state vectors
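The slide's equation is not in the transcript, so the sketch below assumes the common vanilla form h_t = tanh(W_x x_t + W_h h_(t-1) + b). The weight names and values are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One update: h_t = tanh(W_x x_t + W_h h_prev + b), computed row by row."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        z += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(z))
    return h_t


# Hypothetical weights for 2-dimensional inputs and a 2-dimensional state.
W_x = [[0.5, -0.2], [0.1, 0.3]]
W_h = [[0.0, 0.4], [-0.3, 0.2]]
b = [0.0, 0.1]

first = rnn_step([1.0, 0.0], [0.0, 0.0], W_x, W_h, b)

# The same weights are reused at every step; the state carries the history.
h = [0.0, 0.0]
for x_t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = rnn_step(x_t, h, W_x, W_h, b)
```

Because tanh bounds every component, the state stays within [-1, 1] however long the sequence is.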
  • 103. Information Technology 105 MNIST IMAGES AS SEQUENCES
  • 104. Information Technology 106 MNIST with RNN Example Find the code at https://github.com/intelav/cnn-exploration
  • 108. Information Technology 110 Alexnet Details (7 hidden "weight" layers) Columns: layer, name/type, size, stride, padding, number of filters. 1: Conv, 11X11, stride 4, no padding, 96 filters; 2: Max-Pool, 3X3, stride 2; 3: Conv, 5X5, stride 1, padding, 256 filters; 4: Max-Pool, 3X3, stride 2; 5: Conv, 3X3, stride 1, padding, 384 filters; 6: Conv, 3X3, stride 1, padding, 384 filters; 7: Conv, 3X3, stride 1, padding, 256 filters; 8: Max-Pool, 3X3, stride 2; 9: FC, [9216,4096]; 10: FC, [4096,4096]; 11: Output layer, [4096,10]
  • 109. Information Technology 111 CONV = 3x3 filter, S=1, same convolution; MAX-POOL = 2x2, S=2
  • 111. Information Technology 113 Problems with deeper networks? Deeper neural networks are harder to train, and once the number of layers reaches a certain point the training error starts to rise again. Deep networks are also harder to train because of the exploding and vanishing gradient problems.
  • 113. Information Technology 115 Skip Connection - RESNET ResNet (Residual Network), proposed by He et al. in the paper Deep Residual Learning for Image Recognition (2015), addresses these problems with skip connections: the output of one layer is fed to a layer deeper in the network, so a "shortcut" or "skip connection" allows the gradient to be backpropagated directly to earlier layers.
  • 114. Information Technology 116 Why ResNet? CONV 3X3, same convolution with intermittent pooling layers. With CONV 3x3 and same convolution, the dimensions of a[l+2] and a[l] do not change, so adding them is not a problem. With pooling layers in between, the dimensions of a[l+2] and a[l] differ; a[l] is then multiplied by a matrix Ws to get the same dimension, so that the identity function can still be achieved as a[l+2] = Ws * a[l]
  • 117. Information Technology 119 Convolutional block with dimension matchup for shortcut with final layer The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is to just apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.
  • 118. Information Technology 120 Benefits of ResNet The advantages of ResNets are: • performance doesn't degrade with very deep networks • cheaper to compute • ability to train very deep networks ResNet works because: • the identity function is easy for a residual block to learn • using a skip connection helps the gradient to back-propagate and thus helps you train deeper networks
  • 119. Information Technology 121 1X1 CONVOLUTION or Network-in-Network A 1x1 convolution reduces the channel dimension of its input, which leads to less computation while giving a similar output for a given input image. It is also less prone to over-fitting because of the small kernel size (1x1). The 1x1 convolution was first introduced in the paper titled Network in Network.
  • 120. Information Technology 122 1 x 1 Convolution
  • 121. Information Technology 123 Depthwise separable convolution: a foundation of MobileNet, GoogLeNet/Inception networks, and many more. Notation: DF = input/output feature map size, DK = kernel size, M = input depth, N = output depth
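Using the notation above, the saving can be worked out with the multiply-add counts from the MobileNet paper: DK·DK·M·N·DF·DF for a standard convolution versus DK·DK·M·DF·DF + M·N·DF·DF for the depthwise plus pointwise pair. The slide's own figure is not in the transcript, so treat this as a sketch; the layer sizes are made up.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard convolution."""
    return d_k * d_k * m * n * d_f * d_f


def depthwise_separable_cost(d_k, m, n, d_f):
    """Multiply-adds of a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = d_k * d_k * m * d_f * d_f
    pointwise = m * n * d_f * d_f
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels, 112x112 map.
d_k, m, n, d_f = 3, 32, 64, 112
standard = standard_conv_cost(d_k, m, n, d_f)
separable = depthwise_separable_cost(d_k, m, n, d_f)
ratio = separable / standard   # equals 1/N + 1/DK^2
```

For a 3x3 kernel the ratio is dominated by the 1/9 term, so the separable form needs roughly an eighth to a ninth of the computation.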
  • 122. Information Technology 124 Mobilenet’s efficiency Vs VGG/Googlenet/Alexnet’s
  • 124. Information Technology 126 One layer of Inception network - googlenet
  • 127. Information Technology 129 Resnet-50 Find the code at https://github.com/intelav/cnn-exploration
  • 130. Information Technology 132 Image classification, object localization, and object detection. The output encodes: the probability that an object exists; the classes to detect in the image (car, pedestrian, motorcycle); and the bounding box location for each image, as coordinates
  • 131. Information Technology 133 How does OD work? Train a convnet on cropped images, then run a sliding window in one of two ways: sequentially, with different window sizes over the given input image, which leads to higher computation; or convolutionally, to get all bounding boxes in one shot
  • 132. Information Technology 134 Evaluating object localization: how accurate are the detected bounding boxes? IoU (Intersection over Union) is a measure of the overlap between two bounding boxes.
  • 133. Information Technology 135 IOU Computation Example
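IoU can be computed as sketched below, assuming boxes are given as corner coordinates (x1, y1, x2, y2). That box format and the example boxes are assumptions; the slide's own example is not in the transcript.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 7.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```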
  • 134. Information Technology 136 Non-max suppression 1. Get rid of bounding boxes with a low detection probability score for each class detected. 2. Get rid of overlapping bounding boxes by measuring their IoU with the highest-probability box for each class detected.
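The two steps can be sketched for a single class as follows. The thresholds (0.6 for score, 0.5 for IoU), the (score, box) tuple format, and the example detections are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(detections, score_threshold=0.6, iou_threshold=0.5):
    """detections: list of (score, box) for ONE class; returns the kept ones."""
    # Step 1: drop low-probability boxes, then visit the rest best-first.
    candidates = sorted((d for d in detections if d[0] >= score_threshold),
                        key=lambda d: d[0], reverse=True)
    kept = []
    # Step 2: keep a box only if it does not overlap an already kept box too much.
    for score, box in candidates:
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),    # heavy overlap with the 0.9 box
    (0.7, (20, 20, 30, 30)),  # a separate object
    (0.3, (0, 0, 9, 9)),      # low score
]
kept = non_max_suppression(detections)
```

With several classes, the same procedure would be run once per class.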
  • 135. Information Technology 137 Anchor boxes (for overlapping objects) • Propose 2-5 Anchor boxes for different sizes of objects detected • Match IoU of Anchor Boxes with the Ground Truth of Image and Predict that Class in respective Anchor Box • Anchor boxes are defined only by their width and height
  • 136. Information Technology 138 YOLO (You Only Look Once) (figure: for each grid cell, 5 anchor boxes, each with the sum of 80 classes and 5 parameters)
  • 137. Information Technology 139 Region proposal CNN (R-CNN) Run a semantic segmentation algorithm to detect blobs in the image; here a blob is a part of the image where at least one object can be classified. Run the classifier only on the proposed blob regions (also called ROIs, regions of interest)
  • 138. Information Technology 140 Other, better methods of OD • YOLO (You Only Look Once) • F-RCNN • Fast-RCNN • Faster-RCNN • SSD (Single Shot Detector) • and many more
  • 141. Information Technology 143 Face Recognition Stages for face recognition: Detect, Align, Represent, Classify (figure labels: MTCNN, FaceNet)
  • 142. Information Technology 144 Face embedding, to be used for one-shot learning, similarity functions, Siamese networks, triplet loss, and logistic regression (figure: two encodings f(x(i)) and f(x(j)) feeding an output y)
  • 143. Information Technology 145 An encoding is a good one if: • The encodings of two images of the same person are quite similar to each other • The encodings of two images of different persons are very different One example of face embedding/encoding
  • 144. Information Technology 146 Some other examples of encoding
  • 145. Information Technology 147 Triplet Loss in Detail Training via triplet loss uses triplets of images (A, P, N) where • A is an "Anchor" image: a picture of a person. • P is a "Positive" image: a picture of the same person as the Anchor image. • N is a "Negative" image: a picture of a different person than the Anchor image. Minimize the loss function using gradient descent. The margin is a hyperparameter. Notation: [z]+ denotes max(z, 0)
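The slide's formula is not in the transcript; the sketch below assumes the usual form [ ||f(A) - f(P)||² - ||f(A) - f(N)||² + margin ]+ for a single triplet. The margin of 0.2 and the 2-dimensional encodings are made-up values.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """[d(A, P) - d(A, N) + margin]+ for one triplet of encodings."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


# Hypothetical 2-D encodings f(A), f(P), f(N).
anchor, positive, negative = [0.0, 1.0], [0.0, 0.9], [1.0, 0.0]

easy = triplet_loss(anchor, positive, negative)   # positive is already much closer
hard = triplet_loss(anchor, negative, positive)   # roles swapped: large loss
```

A triplet contributes zero loss once the negative is farther from the anchor than the positive by at least the margin.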
  • 146. Information Technology 148 • Train via Triplet Loss and then Classify • Train via Logistic Regression and then Classify • Compare Pre-computed Encoding (Facenet ) with the Encoding of New Image Different Ways of Doing face recognition
  • 147. Information Technology 149 Face verification vs. face recognition Verification • Input image, name/ID • Output whether the input image is that of the claimed person Recognition • Has a database of K persons • Get an input image • Output ID if the image is any of the K persons (or “not recognized”)
  • 149. Information Technology 151 Android AI Architecture on Celadon (architecture diagram; components shown: Android applications (Java/JNI, C++); TensorFlow Lite; Android NNAPI/NN runtime with CPU fallback; Android NN HAL with MKLDNN, VPU, GPU, GNA, and VULKAN HALs; Model Optimizer taking Caffe, TensorFlow, MXNet, ONNX, and Kaldi models and producing IR; OpenVINO Inference Engine API (load plugin, XML, and infer) with a plug-in architecture: MKLDNN, clDNN, FPGA, Myriad, and GNA plugins; libraries and drivers: MKL-DNN, intrinsics, clDNN/OpenCL, USB/PCI drivers, GNA API, VULKAN driver; Intel hardware: CPU (Xeon/Core/SKL/Atom), GEN, Myriad 2/X, DLA, GNA. Legend: Android, OpenVINO, GOS AI team, AI frameworks, Intel HW, execution capabilities, partitioning) https://github.com/projectceladon
  • 150. Information Technology 152 What is OpenCV? OpenCV [OpenCV] is an open source (see http://opensource.org) computer vision library available from http://opencv.org. In 1999 Gary Bradski [Bradski], working at Intel Corporation, launched OpenCV in the hope of accelerating computer vision and artificial intelligence by providing a solid infrastructure for everyone working in the field. The OpenCV library contains over 500 functions that span many areas in vision, including factory product inspection, medical imaging, security, user interface, camera calibration, stereo vision, and robotics. It has its own ML and DNN libraries. (figure: OpenCV architecture on different OSes)
  • 152. Information Technology 154 Interactive face demo OPENCV API/Data Types
  • 153. Information Technology 155 Interactive face demo OPENVINO API/Data Types

Editor's Notes

  1. In this architecture, device selection is done by the NN runtime.