This course provides end-to-end coverage of neural networks, CNN internals, TensorFlow and Keras basics, intuition on object detection and face recognition, and AI on Android x86.
2. Information Technology 2
Day 1
• Module 1 – Computer Vision and Neural Networks- 45 Min
10 mins break
• Module 2 – CNN Internals– 90 Min
• Q & A – 20 min
1 Hr Lunch break
• Module 3 – TensorFlow – 45 Min
• Module 4 – MNIST Dataset with NN and CNN – 30 Min
• 10 mins break
• Module 5 – Keras Framework – 30 min
• Module 6 – Short Talk on RNN- 30 Min
Day 2
• Module 7 – Classic Networks - 75 Min
10 mins break
• Module 8 – Classic Networks Programming (ResNet-50
etc.) – 30 Min
• Module 9 – Short Talk on Object Detection and Face
Recognition – 45 Min
1 Hr Lunch break
• Module 10 – Android AI with OpenVINO – 30 min
Course Topics (9-4, 9-12)
What is COMPUTER VISION (Legacy)
Computer vision is the transformation of data from a still or video camera
into either a decision or a new representation.
Decisions like “laser range finder indicates an object is 1 meter away” or
“there is a person in this scene” or “there are 14 tumor cells on this slide”
etc….
A new representation might mean turning a color image into a grayscale
image or removing camera motion from an image sequence.
Vision
• Humans think vision is easy (because it feels seamless), but the human brain divides the vision signal into many channels
that stream different kinds of information into your brain.
• Your brain has an attention system that identifies, in a task-dependent way, important parts of an image to examine
while suppressing examination of other areas.
• There is massive feedback in the visual stream that is, as yet, little understood.
• There are widespread associative inputs from muscle control sensors and all of the other senses that allow the brain
to draw on cross-associations made from years of living in the world.
• The feedback loops in the brain go back to all stages of processing, including the hardware sensors themselves (the
eyes), which mechanically control lighting via the iris and tune the reception on the surface of the retina.
Many neurons in the visual cortex have a small local receptive field, meaning they react only to visual
stimuli located in a limited region of the visual field (see next slide , in which the local receptive fields of
five neurons are represented by dashed circles). The receptive fields of different neurons may overlap,
and together they tile the whole visual field. Moreover, some neurons react only to images of horizontal
lines, while others react only to lines with different orientations (two neurons may have the same
receptive field but react to different line orientations). They also noticed that some neurons have larger
receptive fields, and they react to more complex patterns that are combinations of the lower-level
patterns. These observations led to the idea that the higher-level neurons are based on the outputs of
neighboring lower-level neurons (in next slide, notice that each neuron is connected only to a few
neurons from the previous layer). This powerful architecture is able to detect all sorts of complex patterns
in any area of the visual field.
Excerpts based on the work of David H. Hubel and Torsten Wiesel,
winners of the 1981 Nobel Prize in Physiology or Medicine
A neuroscientific motivation behind
convolution
The brain possesses more than 10^11 cells (or neurons), each of
which has well over 10^4 contacts/weights (or synapses)
with other neurons. If each neuron acts as a type of
microprocessor, then we have an immense computer (total
~10^15 weights) in which all the processing elements can
operate concurrently.
Simulating vision intelligence
by deep learning !!
The hierarchy of concepts enables the computer
to learn complicated concepts by building them out
of simpler ones. If we draw a graph showing how
these concepts are built on top of each other, the
graph is deep, with many layers. For this reason,
we call this approach to AI deep learning.
What motivates us to do Computer
VISION with deep learning?
Avalanche/intrusion/landslides ?
License plate recognition – Traffic rule
violation/law-order situation
Medical Imaging
Machine vision
The computer receives a grid of numbers/cells from the camera or from disk.
For the most part, there's no built-in pattern recognition, no automatic control of
focus and aperture, no cross-associations with years of experience. It's a naïve
vision system.
Any given number within the grid shown on the previous slide has a rather large noise
component and so by itself gives us little information, but this grid of numbers is
all the computer "sees".
Our task, then, becomes to turn this noisy grid of numbers into a perception
understandable to humans.
Deep Learning using neural networks
• In deep learning, we feed millions of data instances into a network of neurons (neural networks) ,
teaching them to recognize patterns from raw inputs.
• The deep neural networks take raw inputs (such as pixel values in an image) and transform them into
useful representations, extracting higher-level features (such as shapes and edges in images) that
capture complex concepts by combining smaller and smaller pieces of information to solve challenging
tasks such as image classification.
• The networks automatically learn to build abstract representations by adapting and correcting
themselves, fitting patterns observed in the data. Here networks are trained with a feedback process
called backpropagation based on gradient descent optimization.
What is a filter/kernel and what does it do?
Notice that the vertical white lines get
enhanced while the rest gets blurred
notice that the horizontal white lines
get enhanced while the rest is blurred
out
In the deep learning literature, there is no need to define the filters manually. Instead, during training the
convolutional layer will automatically learn the most useful filters for its task, and the layers above will
learn to combine them into more complex patterns.
Filter examples include the Sobel
filter, the Scharr filter, etc.
What does it mean to convolve?
Each value in the matrix on the left corresponds to a
single pixel value, and we convolve a 3x3 filter with
the image by multiplying its values element-wise with
the overlapped patch of the original matrix, then summing
them up and adding a bias. Here the stride value is 1.
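The element-wise multiply, sum, and bias step described above can be sketched in NumPy (a minimal illustration; the function name and example filter are my own):

```python
import numpy as np

def convolve2d(image, kernel, bias=0.0, stride=1):
    """Slide a kernel over an image: element-wise multiply the
    overlapped patch, sum it up, and add a bias (valid convolution)."""
    f = kernel.shape[0]
    out_h = (image.shape[0] - f) // stride + 1
    out_w = (image.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+f, j*stride:j*stride+f]
            out[i, j] = np.sum(patch * kernel) + bias
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0          # a simple 3x3 averaging filter
print(convolve2d(img, k).shape)    # (3, 3): a 5x5 input shrinks to 3x3
```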
A convolution layer transforms an input volume into an output volume of a different size. The size of the input image generally
reduces! (See Padding in upcoming slides if size reduction is a concern.)
What does a CNN do, finally?
Parameter sharing: A feature detector (such as a vertical edge detector)
that’s useful in one part of the image is probably useful in another part of
the image.
Sparsity of connections: In each layer, each output value depends only
on a small number of inputs.
Nature of images → establishes invariance (irrespective of translation).
Learning theory → establishes regularization (reducing the degrees of
freedom to the degrees of the filter size).
WHY CNN?
The main benefits of padding are the following:
– It allows you to use a CONV layer without necessarily shrinking the height and width of the
volumes. This is important for building deeper networks, since otherwise the height/width would
shrink as you go to deeper layers. An important special case is the "same" convolution, in which
the height/width is exactly preserved after one layer.
– It helps us keep more of the information at the border of an image. Without padding, very few
values at the next layer would be affected by pixels at the edges of an image.
Two Kinds of Padding
1) Valid Convolutions No Padding is done.
2) Same Convolutions Pad to keep the output size same as input size.
Padding
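The effect of the two padding modes can be checked with the standard output-size formula, floor((n + 2p − f)/s) + 1 (a small sketch; the function name is my own):

```python
import math

def conv_output_size(n, f, padding="valid", stride=1):
    """Spatial output size of a conv layer on an n x n input
    with an f x f filter."""
    if padding == "valid":
        p = 0                      # no padding: the output shrinks
    elif padding == "same":
        p = (f - 1) // 2           # pad so output size == input size (stride 1)
    else:
        raise ValueError(padding)
    return math.floor((n + 2 * p - f) / stride) + 1

print(conv_output_size(28, 3, "valid"))  # 26: the image shrinks
print(conv_output_size(28, 3, "same"))   # 28: size preserved
```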
With each layer of convolution, the output size reduces at the edges, i.e., the
output image shrinks. Hence with deeper networks, a lot of information
at the edges would be lost, so padding is needed.
In some cases the information at the edges is even crucial, and padding helps us
keep more of the information at the border of an image.
Why is Padding required?
Fire a neuron or not! To activate a neuron,
specialized functions are used, which are called
activation functions.
The purpose of the activation function is to introduce non-linearity into the network.
Real-world problems are non-linear.
Hence, to apply a non-linear mapping to the incoming data, we use a non-linear function called the activation function.
An activation function is a decision-making function that determines the presence of a particular neural
feature.
What is an activation function?
Why is an activation function needed?
A feed-forward neural network with linear
activation and any number of hidden layers
is equivalent to just a linear neural network
with no hidden layer.
Depth doesn’t contribute to
the expressiveness of the
model unless we use
nonlinear activations
between the linear layers.
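The claim above can be verified numerically: stacking two linear layers with no activation in between collapses into a single linear layer (a toy NumPy demo; all shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))           # batch of 4 inputs, 5 features

W1, b1 = rng.normal(size=(5, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

# Two linear layers with no activation in between...
two_layers = (x @ W1 + b1) @ W2 + b2

# ...collapse into one linear layer with W = W1 W2 and b = b1 W2 + b2.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True: depth added nothing
```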
1. Sigmoid (used mostly for binary classification at the output layer)
2. ReLU (used mostly at hidden layers)
3. Tanh (zero-centered; often gives better gradients than sigmoid)
4. Softmax (used for multiclass classification)
5. Leaky ReLU
6. ReLU6
Some popular types of activation
functions
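The listed activations (except softmax, covered on a later slide) can each be written in a line of NumPy; ReLU6 follows the common definition min(max(x, 0), 6):

```python
import numpy as np

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))      # squashes to (0, 1)
def relu(x):       return np.maximum(0.0, x)            # zero below 0
def tanh(x):       return np.tanh(x)                    # squashes to (-1, 1)
def leaky_relu(x, alpha=0.01): return np.where(x > 0, x, alpha * x)
def relu6(x):      return np.minimum(np.maximum(x, 0.0), 6.0)  # ReLU capped at 6

x = np.array([-8.0, -1.0, 0.0, 1.0, 8.0])
print(relu(x))        # [0. 0. 0. 1. 8.]
print(relu6(x))       # [0. 0. 0. 1. 6.]
print(sigmoid(0.0))   # 0.5
```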
Softmax regression is a form of logistic regression that normalizes an input vector into
a vector of values that follows a probability distribution whose total sums to 1. Below is
an example for multiclass classification.
Softmax
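A numerically stable softmax that normalizes logits into a probability vector summing to 1 (a common sketch; the example logits are made up):

```python
import numpy as np

def softmax(z):
    """Convert a vector of logits into class probabilities.
    Subtracting max(z) avoids overflow without changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # e.g. scores for 3 classes
probs = softmax(logits)
print(probs.argmax())                # 0: the largest logit wins
print(probs.sum())                   # 1.0 (up to float rounding)
```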
Since we are using backpropagation, the functions we use must be
differentiable at every point.
Nonlinear activation functions must be continuous and differentiable within their range.
Why differentiable?
Important properties of activation
functions!
The rate of change (derivative) of the
loss with respect to the weights reaches
zero at the global minimum, giving the
best predictions.
HOW and WHY?
(next slide)
The pooling (POOL) layer reduces the height and width of the input. It helps speed up computation, as well as helps make
feature detectors more invariant to position in the input. The two types of pooling layers are:
Max-pooling layer: slides an (f, f) window over the input and stores the max value of the window in the output.
Average-pooling layer: slides an (f, f) window over the input and stores the average value of the window in the output.
POOLING Layer
The theoretical reason for applying pooling
is that we would like our computed
features not to care about small changes
in position in an image.
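Max pooling with an f x f window and stride s can be sketched directly (illustrative NumPy; the function name is my own):

```python
import numpy as np

def max_pool(x, f=2, stride=2):
    """Slide an f x f window over x and keep the max of each window."""
    out_h = (x.shape[0] - f) // stride + 1
    out_w = (x.shape[1] - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i*stride:i*stride+f, j*stride:j*stride+f].max()
    return out

x = np.array([[1., 3., 2., 4.],
              [5., 6., 7., 8.],
              [3., 2., 1., 0.],
              [1., 2., 3., 4.]])
print(max_pool(x))   # [[6. 8.]
                     #  [3. 4.]]
```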
If there are 10 Filters and each Filter is 3X3 , how many
weights are there to be learnt for an RGB Input image?
Exercise for CNN with Volume
(3×3×3 + 1) × 10 = 280
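The exercise can be checked with the general parameter-count formula for a conv layer, (f · f · input_depth + 1) × num_filters, where the +1 is the per-filter bias:

```python
def conv_params(f, input_depth, num_filters):
    """Weights per filter (f*f*input_depth) plus one bias, per filter."""
    return (f * f * input_depth + 1) * num_filters

# 10 filters of 3x3 over an RGB (depth-3) input:
print(conv_params(3, 3, 10))  # 280, matching the answer above
```

Note that the count is independent of the input image's height and width, which is exactly the parameter-sharing property discussed earlier.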
1. Doing convolution over the RGB image is quite important, instead of over a grayscale image.
2. In CNN over volume, the depth of the filters must be the same as the depth of the input image. The number of
filters can be customized to extract different features from the input image. If the filter depth were 1,
we would extract only grayscale information, not the RGB information, so it would not be very
useful.
3. Various kinds of filters can be stacked together to do feature extraction based on the
requirement, like edge detection in only the RED channel or in ALL channels, etc.
4. Filters are learnable.
5. A filter maps to a local receptive field on the given input image.
6. A property of CNNs is that they avoid overfitting by using a small number of parameters no matter how big
the input image is.
7. One layer of convolution is "Convolution (W * X + b) + Activation (like ReLU etc.)". From this layer, the
output goes to the next layer: convolution, pooling, FC, etc.
CNN Over volume
• Whatever your input depth, each filter produces a 2-D layer of neurons as output.
• The output depth is independent of the input depth; it depends only on the number of
filters. The following relation holds:
• The number of filters (the depth of the CNN layer) is a hyperparameter.
• Each filter has its own set of weights, enabling it to learn a different feature on the same local region
covered by the filter.
CNN Over Volume
Depth of output volume = number of filters in the convolution layer
• Tensors are the standard way of representing data in deep learning.
• Scalars are rank-0 tensors, vectors are rank-1 tensors, matrices are rank-2 tensors, and a rank-3 tensor is a rectangular
prism (cube) of numbers.
• A rank-1 tensor has a shape of dimension 1, a rank-2 tensor a shape of dimension 2, and a rank-3 tensor a shape of dimension
3.
• RGB images are represented as tensors (three-dimensional arrays), with each pixel having three values corresponding
to red, green, and blue components.
• Here computation is approached as a dataflow graph/computation graph . In this graph, nodes represent operations
(add/mul/cnn/concat/pool etc.), and edges represent data (tensors) .
• Calling a TensorFlow operation adds a description of a computation to TensorFlow’s “computation graph”.
• Variables in TensorFlow hold tensors and allow for stateful computation that modifies variables to occur.
• TensorFlow 1.x largely follows a declarative programming style.
Basics on tensorflow
• TensorFlow 2.0 follows an imperative (eager) programming style, hence you can run your model instantly.
• TensorFlow derives gradient descent optimization algorithm automatically based on the computation graph and loss
function provided by the user.
• To monitor, debug, and visualize the training process, and to streamline experiments, TensorFlow comes with
TensorBoard.
• It has support for distributed training, asynchronous computation with threading and queues, efficient I/O and data
formats, and much more.
Basics on tensorflow
In TensorFlow, a computation graph is a dataflow graph.
In a dataflow graph, the edges allow data to “flow” from one node to another in a directed manner.
Each of the graph’s nodes represents an operation.
Operations in the graph include all kinds of functions, from simple arithmetic ones such as subtraction
and multiplication to more complex ones.
What is a Computation Graph ?
The key idea behind computation graphs in
TensorFlow is that we first define what
computations should take place, and then
trigger the computation in an external
mechanism.
Writing and running programs in TensorFlow has the following steps:
• Create Tensors (variables) that are not yet executed/evaluated.
• Write operations between those Tensors.
• Initialize your Tensors.
• Create a Session.
• Run the Session. This will run the operations you'd written above.
Construct a Graph
Execute a Graph
Note: When importing TensorFlow (with import tensorflow as tf), a specific empty default
graph is formed. Additional graphs can be created (with tf.Graph()) but they need
to be set as the default graph (with graph.as_default()) for operations to be added
and executed.
The requested nodes in sess.run() are called fetches. They are the
elements of the graph we wish to compute.
Fetches
We can ask sess.run() for multiple
nodes' outputs simply by
passing a list of requested
nodes.
How TensorFlow execution works
• It starts at the requested output(s) and works backward.
• It computes the nodes that must be executed according to
their dependencies.
• The part of the graph that gets computed depends on the
output query.
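The "define first, execute later" idea, including backward dependency resolution from the fetched node, can be illustrated with a toy graph evaluator (a pedagogical sketch of the concept, not TensorFlow's actual implementation; all names are my own):

```python
class Node:
    """A graph node: an operation plus its input edges (other nodes)."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self):
        # Start at the requested node and work backward: evaluate
        # only the nodes this output actually depends on.
        args = [n.run() for n in self.inputs]
        return self.op(*args)

const = lambda v: Node(lambda: v)

# Building the graph only *describes* the computation...
a, b, c = const(2), const(3), const(5)
d = Node(lambda x, y: x * y, a, b)      # d = a * b
e = Node(lambda x, y: x + y, d, c)      # e = d + c
f = Node(lambda x: x - 1, c)            # f = c - 1 (independent of a, b, d)

# ...nothing runs until a node is fetched; fetching e touches only its ancestors.
print(e.run())   # 11
print(f.run())   # 4
```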
Cross entropy is a measure of similarity when the model outputs class probabilities.
It is a "cost" function that computes the difference between two
probability distributions.
It applies to activation functions like softmax and sigmoid, which output
probabilities, but not to ReLU, which doesn't output probabilities.
CROSS Entropy (CE)
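Cross-entropy between a one-hot true distribution and predicted probabilities can be computed as follows (a minimal NumPy sketch; the example vectors are made up):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """CE = -sum_i y_true[i] * log(y_pred[i]); eps guards against log(0)."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0., 1., 0.])          # one-hot: true class is index 1
good   = np.array([0.05, 0.90, 0.05])    # confident and correct
bad    = np.array([0.70, 0.20, 0.10])    # confident but wrong

# The better the predicted distribution matches the truth, the lower the cost.
print(cross_entropy(y_true, good) < cross_entropy(y_true, bad))  # True
```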
One-hot encoding allows the representation of categorical data to be more
expressive.
Problems may arise when there is no ordinal relationship among the labels: allowing the
representation to lean on any such relationship might be damaging to learning to
solve the problem. An example might be the labels 'dog' and 'cat'.
ONE HOT Encoding
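One-hot encoding maps each label to a vector with a single 1, so 'dog' and 'cat' carry no implied ordering (a sketch; the labels are illustrative):

```python
import numpy as np

labels = ["dog", "cat", "dog", "bird"]
classes = sorted(set(labels))              # ['bird', 'cat', 'dog']
index = {c: i for i, c in enumerate(classes)}

# One row per sample, one column per class, a single 1 per row.
one_hot = np.zeros((len(labels), len(classes)))
one_hot[np.arange(len(labels)), [index[c] for c in labels]] = 1.0
print(one_hot)
# [[0. 0. 1.]    dog
#  [0. 1. 0.]    cat
#  [0. 0. 1.]    dog
#  [1. 0. 0.]]   bird
```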
MNIST Image Classification with SOFTMAX ONLY
(not using spatial information/CNN)
An example of supervised learning
Softmax regression model will figure out,
• For each pixel in the image, which digits tend to have high (or low) values in that location. Pixel values are
correlated with digit in image.
• For instance, the center of the image will tend to be white for zeros, but black for sixes.
• Thus, a black pixel in the center of an image will be evidence against the image containing a zero, and in favor of it
containing a six.
MNIST Image Classification with SOFTMAX ONLY
(not using spatial information/CNN)
Evidence for the image containing the digit 0
Here x_i and w_i are the respective vectors for digit 0, and similarly for all other digits (1…9).
Learning in this model consists of finding weights that tell us how to accumulate evidence
for the existence of each of the digits.
Graph representation of the MNIST softmax model
The "bias term" is equivalent to
stating which digits we believe an image
to be before seeing the pixel values. If
you have seen this before, then try
adding it to the model and check the results.
7 × 7 × 64 × 1024 ≈ 3.2M vs.
28 × 28 × 64 × 1024 ≈ 51M if
no pooling is used.
CONV Layer with MNIST
DataSet
• This is a regularization trick used to force the network to distribute the learned representation
across all the neurons.
• It "turns off" a random preset fraction of the units in a layer by setting their values to zero during
training.
• The dropped-out neurons are random (different for each computation), forcing the network to learn a
representation that will work even after the dropout.
• This process is often thought of as training an "ensemble" of multiple networks, thereby increasing
generalization and preventing overfitting.
• When using the network as a classifier at test time ("inference"), there is no dropout and the full
network is used as is.
Dropout
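Inverted dropout (the commonly used variant) zeroes a random fraction of activations during training and rescales the survivors, so nothing needs to change at inference (illustrative NumPy; names are my own):

```python
import numpy as np

def dropout(a, keep_prob, rng):
    """Zero out units with probability 1-keep_prob; divide the survivors
    by keep_prob so the expected activation is unchanged."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob

rng = np.random.default_rng(0)
a = np.ones(1000)
dropped = dropout(a, keep_prob=0.8, rng=rng)
print((dropped == 0).mean())   # roughly 0.2 of the units are turned off
print(dropped.mean())          # close to 1.0: expectation is preserved
```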
Being able to go from idea to result with the least possible delay is
key to finding good models.
Other Attributes of Keras Framework
• Keras was developed to enable deep learning engineers to build and experiment with different
models very quickly.
• Keras is an even higher-level framework and provides additional abstractions.
• Keras is more restrictive than the lower-level frameworks, so some very complex models
can be implemented in TensorFlow but only with more difficulty (or not at all) in Keras.
Why keras ?
Two Kind of Models
Sequential Models
Model Class with
Functional APIs
1. Create the model by calling the function above
2. Compile the model by calling model.compile(optimizer = "...",
loss = "...", metrics = ["accuracy"])
3. Train the model on train data by calling model.fit(x = ..., y = ...,
epochs = ..., batch_size = ...)
4. Test the model on test data by calling model.evaluate(x = ..., y =
...)
Four steps
in Keras for
training and
test
Predict using model.predict(x), where x has the same shape the model
was trained on.
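The four steps map directly onto tf.keras code. Here is a minimal sketch on random data (the shapes, layer sizes, and hyperparameters are illustrative, not from the course):

```python
import numpy as np
import tensorflow as tf

# Toy data: 64 samples with 8 features, binary labels.
x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")

# 1. Create the model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# 2. Compile the model.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# 3. Train the model on the training data.
model.fit(x, y, epochs=2, batch_size=16, verbose=0)

# 4. Evaluate, then predict on input of the same shape it was trained on.
loss, acc = model.evaluate(x, y, verbose=0)
preds = model.predict(x, verbose=0)
print(preds.shape)  # (64, 1)
```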
Basic idea behind RNN
RNN – recurrent neural networks
Each new element in the
sequence contributes some
new information, which
updates the current state of
the model.
Markov chain model
View data sequences as “chains,” with
each node in the chain dependent in
some way on the previous node, so
that “history” is not erased but carried
on
Mathematically in
statistics and probability
Update step for RNN
tanh is the hyperbolic tangent function, which has its range in [–1, 1];
x_t and h_t are the input and state vectors.
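One update step of the standard vanilla RNN, h_t = tanh(x_t·W_x + h_{t-1}·W_h + b), can be sketched as follows (the weight names and sizes are my own, illustrative choices):

```python
import numpy as np

def rnn_step(x_t, h_prev, Wx, Wh, b):
    """Compute the new state from the current input and previous state;
    tanh keeps every component of h_t in [-1, 1]."""
    return np.tanh(x_t @ Wx + h_prev @ Wh + b)

rng = np.random.default_rng(0)
Wx = rng.normal(size=(3, 4))          # input-to-state weights
Wh = rng.normal(size=(4, 4))          # state-to-state weights
b = np.zeros(4)

h = np.zeros(4)                       # initial state
for x_t in rng.normal(size=(5, 3)):   # a sequence of 5 inputs
    h = rnn_step(x_t, h, Wx, Wh, b)   # each element updates the state

print(h.shape)                        # (4,): state size is unchanged
```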
The problem with deeper neural networks is that they are harder to train; once the number of layers reaches a certain point, the
training error starts to rise again.
Deep networks are also harder to train due to the exploding and vanishing gradients problem.
Problems with Deeper networks?
ResNet (Residual Network), proposed by He et al. in the paper "Deep Residual Learning for Image Recognition" (2015), solves
these problems by implementing skip connections, where the output from one layer is fed to a layer deeper in the network as
below; this "shortcut" or "skip connection" allows the gradient to be directly backpropagated to earlier layers:
Skip Connection - RESNET
CONV 3x3, same convolution, with intermittent pooling layers
When using CONV 3x3 with same convolution, the dimensions of a^[l+2] and a^[l] don't change,
so adding them is not a problem. But
when using pooling layers, the dimensions of a^[l+2] and a^[l] differ; then multiply by an intermittent
matrix W_s to get matching dimensions and achieve the identity mapping, as below:
a^[l+2] = W_s · a^[l]
Why resnet ?
Convolutional block with dimension matchup for
shortcut with final layer
The CONV2D layer on the shortcut path
does not use any non-linear activation
function. Its main role is to just apply a
(learned) linear function that reduces the
dimension of the input, so that the
dimensions match up for the later
addition step.
The advantages of ResNets are:
• performance doesn't degrade with very deep networks
• cheaper to compute
• ability to train very deep networks
ResNet works because:
• the identity function is easy for a residual block to learn
• using a skip connection helps the gradient to back-propagate and thus helps
you to train deeper networks
Benefits of resnet
This leads to less computation, giving a similar output with reduced channel dimension for a given input image.
It suffers less over-fitting due to the small kernel size (1x1).
One-by-one convolution was first introduced in the paper titled "Network in Network".
1X1 CONVOLUTION or Network-IN-
Network
Depthwise separable convolution – a foundation of
MobileNet, GoogLeNet/Inception networks, and many more…
Notation: D_F = input/output feature map size, D_K = kernel size,
M = input depth, N = output depth.
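The saving can be quantified with the standard MobileNet-style cost formulas: a standard convolution costs D_K²·M·N·D_F² multiplications, while a depthwise separable one costs D_K²·M·D_F² + M·N·D_F², a ratio of 1/N + 1/D_K² (a sketch using the notation above; the example sizes are illustrative):

```python
def standard_cost(Dk, M, N, Df):
    # One Dk x Dk x M filter per output channel, at every output position.
    return Dk * Dk * M * N * Df * Df

def separable_cost(Dk, M, N, Df):
    depthwise = Dk * Dk * M * Df * Df   # one Dk x Dk filter per input channel
    pointwise = M * N * Df * Df         # 1x1 conv mixing the channels
    return depthwise + pointwise

Dk, M, N, Df = 3, 32, 64, 112
ratio = separable_cost(Dk, M, N, Df) / standard_cost(Dk, M, N, Df)
print(round(ratio, 3))                  # 0.127, i.e. 1/N + 1/Dk^2 = 1/64 + 1/9
```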
Object Detection = Image Classification + Object Localization
• Probability of whether an object exists or not
• Classes to detect in the image (car, pedestrian, motorcycle, …)
• Bounding box location numbers for each image, in the form of coordinates
HOW OD works?
Train a ConvNet on cropped images and then run the sliding-windows protocol, either:
• sequentially, with different window sizes on the given input image, leading to higher computation, or
• run the sliding window convolutionally to get all bounding boxes in one shot.
Evaluating object localization
IoU (Intersection over Union) is a measure of the overlap between two bounding
boxes.
How accurate are the detected
bounding boxes?
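IoU can be computed directly from box corners; here boxes are given as (x1, y1, x2, y2), a common but here illustrative convention:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    xi1, yi1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xi2, yi2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, xi2 - xi1) * max(0, yi2 - yi1)   # 0 if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 3))  # 0.143: small overlap (1/7)
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))            # 1.0: identical boxes
```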
Non-max suppression
• Get rid of bounding boxes with a low probability/detection
score for each detected class.
• Get rid of overlapping bounding boxes by measuring
IoU against the highest-probability boxes for each detected
class.
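The two steps, score thresholding followed by greedy IoU-based suppression, can be sketched per class as follows (self-contained; the thresholds and boxes are illustrative):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    xi1, yi1 = max(a[0], b[0]), max(a[1], b[1])
    xi2, yi2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Drop low-score boxes, then greedily keep the highest-score box and
    suppress remaining boxes that overlap it above iou_thresh."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))   # [0, 2]: box 1 is suppressed by box 0
```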
Anchor boxes (for overlapping objects)
• Propose 2-5 Anchor boxes for different
sizes of objects detected
• Match IoU of Anchor Boxes with the
Ground Truth of Image and Predict that
Class in respective Anchor Box
• Anchor boxes are defined only by their
width and height
Region proposal CNN (R-CNN)
Run a semantic segmentation algorithm to
detect blobs in the image.
Here a blob is a part of the image where at least
one object can be classified.
Run a classifier only on the proposed regions
of blobs (also called ROIs, regions of
interest).
• YOLO (You Only Look Once)
• R-CNN
• Fast R-CNN
• Faster R-CNN
• SSD (Single Shot Detector)
• and many more…
Other, better methods of OD
Face embedding used for one-shot learning,
similarity functions, Siamese networks, triplet loss,
and logistic regression
(Diagram: two images x^(i) and x^(j) pass through the same network,
producing embeddings f(x^(i)) and f(x^(j)), which are compared to
produce the output ŷ.)
An encoding is a good one if:
• The encodings of two images of the same person are quite similar to each other
• The encodings of two images of different persons are very different
One example of face
embedding/encoding
Training via triplet loss uses triplets of images (A, P, N), where
• A is an "Anchor" image: a picture of a person.
• P is a "Positive" image: a picture of the same person as the Anchor image.
• N is a "Negative" image: a picture of a different person than the Anchor image.
Triplet Loss in Detail
Minimize the Loss
Function using Gradient
Descent
Margin (a hyperparameter)
Denoting max(z, 0) as [z]+
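The triplet loss, [‖f(A) − f(P)‖² − ‖f(A) − f(N)‖² + margin]+, can be computed on precomputed embeddings like so (a NumPy sketch; the example embeddings are made up):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin): push the positive
    closer to the anchor than the negative by at least the margin."""
    pos = np.sum((a - p) ** 2)   # squared distance anchor <-> positive
    neg = np.sum((a - n) ** 2)   # squared distance anchor <-> negative
    return max(0.0, pos - neg + margin)

anchor   = np.array([1.0, 0.0, 0.0])
positive = np.array([0.9, 0.1, 0.0])   # same person: close to the anchor
negative = np.array([0.0, 1.0, 0.0])   # different person: far away

print(triplet_loss(anchor, positive, negative))  # 0.0: constraint satisfied
```

Swapping the positive and negative makes the constraint violated and the loss positive, which is what gradient descent then minimizes.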
• Train via Triplet Loss and then Classify
• Train via Logistic Regression and then Classify
• Compare a pre-computed encoding (FaceNet) with the encoding of a new
image
Different Ways of Doing face recognition
Face verification vs. face recognition
Verification
• Input image, name/ID
• Output whether the input image is that of the claimed person
Recognition
• Has a database of K persons
• Get an input image
• Output ID if the image is any of the K persons (or “not recognized”)
Android AI Architecture on Celadon
(Architecture diagram, summarized top to bottom:
• Android applications use TensorFlow Lite via the Android NNAPI/NN Runtime (Java/JNI, C++), with CPU fallback.
• An Android NN HAL layer partitions execution by device capabilities across per-device HALs: CPU, VPU, GPU, GNA, and Vulkan.
• The OpenVINO Inference Engine API loads a plugin plus the IR (XML) and infers through its plug-in architecture: the MKL-DNN plugin for CPUs (Xeon/Core/SKL/Atom, via intrinsics), the clDNN/OpenCL plugin for GPUs (GEN), the Myriad plugin for Myriad 2/X (USB/PCI driver), the FPGA plugin (DLA), and the GNA plugin (GNA API over a PCI driver).
• The Model Optimizer converts Caffe, TensorFlow, MxNet, ONNX, and Kaldi models into the IR consumed by the Inference Engine.)
https://github.com/projectceladon
OpenCV [OpenCV] is an open source (see http://opensource.org) computer vision library available from http://opencv.org.
In 1999 Gary Bradski [Bradski], working at Intel Corporation, launched OpenCV with the hopes of accelerating computer
vision and artificial intelligence by providing a solid infrastructure for everyone working in the field.
The OpenCV library contains over 500 functions that span many areas in vision, including factory product inspection,
medical imaging, security, user interface, camera calibration, stereo vision, and robotics.
It has its own ML and DNN modules.
What is OpenCV?
OPENCV Architecture on Different OSes