SlideShare a Scribd company logo
1 of 87
Advance Topics in Deep
Learning:
Deep Computer Vision
• By: Dr. Zafar Mahmood Khattak,
PhD (AI)
Deep Computer Vision
• How to build computer that can achieve the sense of sight and vision
• The outcome of this topic is to learn how we can use deep learning & machine learning to build powerful vision
systems that can both see and predict what is where by only looking at raw visual inputs.
• Vision, one of the most important human senses that we all have.
• characteristics of sighted peoples……..
Vision is much more than understanding what is where
Vision is much more than understanding what is where
The rise and impact of impact
of Computer Vision
Impact: Facial Detection and & Recognition
Impact: Self-Driving Cars: a go-to example
Impact: Medicine, Biology, Healthcare
Impact: Medicine, Biology, Healthcare
• What Computer “see”
How images are Represented in Computer?
2-D Matrix of Numbers
Grayscale image
How the representation if we have a color image?
1 no for pixel or 3 no for pixel
1-2D Matrix or 3-2D Matrix
How images
are
Represented
in Computer?
Processing in Computer Vision
Zavyar
Wildan
Amir
Rovaid
Classification: Predict/classify output as a discrete class label such as Male or Female, True or False,
Spam or Not Spam, etc.
Regression: Predict output as continuous values such as price, salary, age, etc.
What types of computer vision algorithm that we can build to take this as input to predict an image
• If we have to build a computer vision
pipeline, two important steps to be
considered: identify feature/patters
looking for,
• Detect those features to infer the
membership of the class
High Level Feature Detection: What we Need
What are the issues Problem while extracting the features?
What makes up a face We can define what of each of those
components/features look like
We can define what of each of those
components/features look like in
defining our features
High Level Feature Detection: Remember Images are just 3-D arrays of numbers
•Lets Revised
concept of Neural
Network Lecture for
features extraction
learning Features Representation Neural Network Lecture)
• Can we learn a hierarchy of features directly from the data instead of hand engineering?
• How Machine Learning deals with such types of images?
• How to detect Features and Patterns in an image?
• Time consuming
• Hard to do
• Not scalable in practice
• Key Idea of Deep Learning:
• No Human intervention
• Machine extract and uncover the core patterns in the data and then
• Decision/prediction on new data/sample
• Example of detecting faces in image using DLNN:
• detect edges
• Detect line and edges
• Combine to create composition of features (curves and corners) results in
• High level features (eyes, nose, eras etc.)
• Hierarchical learning of features
• Progression of information
• Face detected
Solution is :::
Deep learning
Algorithms
(Neural
Network)
Learning visual Features
How Neural network Extract hierarchical features in the image Domain? And Most
Importantly
How Neural network allows us to learn those visual features from visual Data if
cleverly constructed!!!
Fully
Connected
Neural
Network
Fully
Connected
Neural
Network
Input:
• 2-D Image
• Collapsing into 1-D sequence
of number (Vector of pixel
values)
Fully Connected:
• Connect Neuron in hidden layer to all
neurons in input layer
• No spatial Information (3-D or 4-D)
• Large no. of parameters, as the system is
fully connected
Fully connected NN can’t process 2-D Matrix
Remember that our image is just 2-D array
Fully Connected Neural Network
How can we use spatial structure in the input to inform the architecture of the
network?
Using Spatial
Structure
Input:
• 2-D Image
• Collapsing into 1-D sequence
of number (Vector of pixel
values)
• Idea: connect patches of input
to neuron in hidden layer
• Neuron connected to region
of input, only sees the values
Using Spatial Structure
Connect patch in input layer to a single neuron in subsequent layer by using a sliding window to define connections, and thus
preserving very rich spatial information.
How we can weight the patch to detect particular patch?
Applying
Filters to
Extract
Features Apply a set of
weight - a filter –
to extract local
features
1
Use Multiple
filters to extract
different
features.
2
Spatially share
parameters of
each filter
3
Features Extraction with Convolution: An Example
• Filter of size 4*4= 16 different weights
• Apply this same filter to 4*4 patches in input and use the result of
that operation to update the state of the neuron in the next layer
• Shift by 2 pixel for next patch to define the next neuron in the
adjacent location in the future layer
Concept of Convolution at high level
Feature Extraction and
Convolution: A Case Study
How convolution allows us to learn these features/patterns in the data
which is our ultimate goal
Designing a Convolutional Algorithm to Detect or Classify X as X in a Black & White Image
No Grayscale image, black and white (1-
white and -1 for black)
Detect x in both of these images slightly
rotated
Cleverly we have to detect those feature that define an X: So let see how to use
Convolution to do that
Identifying Features of X using Convolution
We need our model to compare images of this X, piece by piece or patch by patch and the important patches that we
look for are exactly these features that will define our X:
If our model can find these rough features roughly in the same position in our input than we can infer that these two
images or of the same type
Identifying Features of X using Filters
In the case of X’s these filters may represent semantic things for example the diagonal lines or the crossing that captures all of
the important characteristics of the X
Probably we will capture these features in the arms and the centers of our letter in any image of X regardless of its position
These smaller matrices are Filter of
weights or jus numerical values of each
pixel in these mini patches
The Convolution Operation: Another Example:
Element wise Multiplication between the Filter Matrix those mini patches as well as the patch of our input image
The Convolution Operation: Another Example:
Lets see how NN and CNN Process an Image
• Feed the pixels of the image in the form of arrays to the input layer of the NN
• Multi-layer networks used to classify things.
• The hidden layers carry out feature extraction by performing different calculations and manipulations.
• There are multiple hidden layers like the convolution layer, the ReLU layer, and pooling layer, that perform feature
extraction from the image.
• Finally, there’s a fully connected layer that identifies the object in the image.
Lets see how NN and CNN Process an Image
Lets see how NN and CNN Process an Image
In CNN, every image is represented in the form of an array of pixel values.
Let’s understand the convolution operation using two matrices, a and b, of 1 dimension.
a = [5,3,2,5,9,7]
b = [1,2,3]
Lets see how NN and CNN Process an Image
a = [5,3,2,5,9,7]
b = [1,2,3]
Lets see how NN and CNN Process an Image
How CNN Recognized Image
The Convolution Operation: Lets Consider one More Example:
To compute the convolution we have to slide this image over the filter piece by piece and comparing the
similarity or the convolution of this filter across this image
Suppose we want to compute the convolution of a 5*5 image and 3*3 filter
We slide the 3*3 filter over the input image,
element-wise multiply, and add the output
The Convolution Operation: Lets Consider one More Example:
The resultant 4 is placed into the next layer, which is again another image as a result of convolution operation
We slide the 3*3 filter over the input image, element-wise multiply, and add the output
The Convolution Operation: Lets Consider one More Example:
Simple slide over the previous filter to the next location which provide the next value in our image:
keep repeating this process over again and again until we have covered our filter over the entire image and filled the output
feature map
We slide the 3*3 filter over the input image, element-wise multiply, and add the output
The Convolution Operation: Lets Consider one More Example:
We slide the 3*3 filter over the input image, element-wise multiply, and add the output
The Convolution Operation: Lets Consider one More Example:
We slide the 3*3 filter over the input image, element-wise multiply, and add the output
The Convolution Operation: Lets Consider one More Example:
We slide the 3*3 filter over the input image, element-wise multiply, and add the output
After discussing the mechanism that define the operations of convolution, let see how different filter could be used to detect
different types of patterns in our data
Convolution Operation Simulation for Grayscale image
Producing Feature Map
Apply 3*3 filter to see the drastic change in the image just by changing the weights that are present in 3*3 matrices :
If we apply 3 different filters to detect different type of features
Features Extraction with Convolution
1. Apply a set of weight - a filter – to extract local features
2. Use Multiple filters to extract different features.
2. Spatially share parameters of each filter
Convolutional Neural Networks
(CNNs)
After understanding the basic operation of Convolution, patching, sliding, and feature map, lets
think to extend the singular convolution operation to the entire layer
To Discus the Architecture of CNN: Which is the core of this topic
Simple CNNs for Classifying an Image
Thee main Operations of CNN:
1- Convolution:
Main task: To extract feature directly from the raw data and use these learn features for classification or detecting an object
2- Non-linearity:
3- Pooling:
Simple CNNs for Classifying an Image
Main task: To extract feature directly from the raw data and use these learn features for classification or detecting an object
Finally feeding all of these resulting features to some NN to infer the class score
Non-linearity Operation
ReLU performs an element-wise operation and sets all the
negative pixels to 0.
It introduces non-linearity to the network, and the generated
output is a rectified feature map
So for the structure
Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity)
Finally feeding all of these resulting features to some NN to infer the class score
For neuron in hidden Layers:
- Take input from patch
- Compute weighted sum
- Apply bias
A very special point is the local connectivity, where every single neuron in this hidden layer only sees a certain patch in its
previous layer
Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity)
Convolution is about:
- Element-wise multiplication operation
- Addition operation
- Sliding operation
Provide basis for layers that define how neurons in the
convolutional layers are connected and how they are
mathematically formulated.
Now lets define the actual computation that going on for a neuron in a hidden layer, its inputs are those neuron that fell
with its patch in the previous layer
Computation at Neuron:
- applying matrix of weights
- Computing linear combinations: do element-wise
multiplication, add the output and apply the bias
- Apply the non-linearity (activate the neuron with non-
linear function)
Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity)
What We need to think of is actually convolutional operations that can output a volume of different images, and
Every slice of this volume effectively denotes a different filter that can be identified in our original input and each
of those filter is going to basically correspond to a specific pattern or feature in our image
Layer dimension:
h x w x d
Where h and w are spatial dimension and d(depth) is the no
of filters
- Stride
Filter step size
- Receptive Field:
Location in input image that a node is path connected to
Stride
- Stride
Filter step size
Some times we do not want to capture all the data or information available so we skip some neighboring cells let
us visualize it,
Here the input matrix or image is of dimensions 5×5 with a filter of 3×3 and a stride of 2 so every time we skip two
columns and convolute, let us formulate this
2nd Key Operation in CNN: Introducing Non-Linearity
. Apply after every convolution operation (after convolutional operator)
. Non-linear operation: ReLU ( pixel by pixel operation that simply returns 0 if your value is negative else it
returns the same value you give
Take those resulting feature map that our convolutional layer extract and apply the non-linearity to the output
volume of the convolution layer
Negative values indicate the negative detection in convolution (no detection
3rd Key Operation in CNN: Pooling
Main purpose off pooling: To reduce the dimensionality of the image progressively as you go deep & deeper, through you
convolutional layers (decreasing the dimensionality of the features increasing the dimensionality of the filters)
3rd Key Operation in CNN: Pooling (Max, min, Avg)
Representation Learning in Deep CNNs
So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a
repeating way over and over again to learn these hierarchies of features
CNN for image Classification: Feature Learning
1. Learn features in input image through Convolution
2. Introduce Non-Linearity through activation function (as the real-world data is non-linear)
3. Reduce dimensionality and preserve spatial invariance with pooling
So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a
repeating way over and over again to learn these hierarchies of features: Two Parts
Feature learning pipeline: to learn the features that we want to detect and Detecting those feature and doing the
classification
CNN for image Classification: Feature Learning
1. The goal of Convolutional and pooling layer is to output high-level features that are extracted from the input
2. Fully connected layer uses these feature to detect their presence in order to classify the input image
3. Express output as probability of image belonging to a particular class
Feature learning pipeline: to learn the features that we want to detect and Detecting those feature and doing the
classification
The softmax function is just a normalizing function whose output represents that of a categorical probability
distribution
CNN for Classification: Feature Learning
So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a
repeating way over and over again to learn these hierarchies of features
CNN for Classification: Feature Learning
CNN for Classification: Feature Learning
•An Architecture for Many Applications
• After getting the basics of how to use CNNs to perform image classification task, remember that the same
architecture can be extend to so many application and model types that we can imagine
An Architecture for Many Applications
When we consider CNN for image classification: Two parts,
1. Feature extraction learning (feature of interest)
2. Detection of those features and classification
Classification: Breast Cancer Screening
Object Detection
This is not easy as that, if our scene contains multiple objects: In that case our NN model should be able to infer a dynamic no of
objects
Object Detection
If we want to detect a taxi only its no problem, but if the image contains different objects of different classes, and our model need
to draw a bounding box for each objects of different class is quiet complicated in practice:: A naive solution to object detection
Predicted classification labels independently to
each class
A Naive Solution to Object Detection
Problem: too many potential inputs, this results in too many scales, position and
size,,,,,, totally impractical to run in real life application with today computers
Object Detection Using R-CNN: Instead of picking random boxes use a simple heuristic
Identify meaningful object in the image and just feed in those high attention locations to our CNN to speed up the first part.
R-CNN Algorithm: Find regions that we think have objects. Using CNN to classify
Faster R-CNN: Among many variance proposed for object detection
Faster R-CNN Algorithm: Which attempts to learn not only how to classify those boxes but learned how to propos those boxes
might be in the first place so that you could learn how to feed or where to feed into the downstream NN
Faster R-CNN: Among many variance proposed for object detection
So the goal here is to directly try to learn or extract all of those key regions and process them through the later part of the model
Semantic segmentation : Fully Convolutional Network
In classification we saw not only to predict a single object per image, we also saw an object detection potentially inferring
multiple objects with bounding boxes in you image.
Semantic segmentation : Biomedical Image Analysis
The same concept may be applied to so many application in healthcare especially for segmenting out lets say cancerous regions.
Continuous Control : Navigation from Vision Control
Lets consider we want to build a NN model for autonomous navigation for self driving car
Key point to remember
• All these different architecture uses the exact same encoder, we
haven't change anything going from classification to detection to
sematic segmentation and autonomous navigation, are using the
same building block
• Convolution
• Non-linearity
• pooling
Concluding the CNN lecture: Deep Learning for Computer Vision : Impact
Deep Learning for Computer Vision : Summary

More Related Content

Similar to Deep Computer Vision - 1.pptx

Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1ananth
 
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Jedha Bootcamp
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksParrotAI
 
11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptxSaloniMalhotra23
 
Performance Evaluation Of Ontology And Fuzzybase Cbir
Performance Evaluation Of Ontology And Fuzzybase CbirPerformance Evaluation Of Ontology And Fuzzybase Cbir
Performance Evaluation Of Ontology And Fuzzybase Cbiracijjournal
 
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIRPERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIRacijjournal
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMeetupDataScienceRoma
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningIRJET Journal
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingDerek Kane
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple featuresHirantha Pradeep
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHEScscpconf
 
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Yuji Oyamada
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...CSCJournals
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfSachin414679
 
AN INTEGRATED APPROACH TO CONTENT BASED IMAGE RETRIEVAL by Madhu
AN INTEGRATED APPROACH TO CONTENT BASED IMAGERETRIEVAL by MadhuAN INTEGRATED APPROACH TO CONTENT BASED IMAGERETRIEVAL by Madhu
AN INTEGRATED APPROACH TO CONTENT BASED IMAGE RETRIEVAL by MadhuMadhu Rock
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET Journal
 

Similar to Deep Computer Vision - 1.pptx (20)

Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1Convolutional Neural Networks: Part 1
Convolutional Neural Networks: Part 1
 
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
Faire de la reconnaissance d'images avec le Deep Learning - Cristina & Pierre...
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx11_Saloni Malhotra_SummerTraining_PPT.pptx
11_Saloni Malhotra_SummerTraining_PPT.pptx
 
Performance Evaluation Of Ontology And Fuzzybase Cbir
Performance Evaluation Of Ontology And Fuzzybase CbirPerformance Evaluation Of Ontology And Fuzzybase Cbir
Performance Evaluation Of Ontology And Fuzzybase Cbir
 
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIRPERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR
PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Mirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image ProcessingMirko Lucchese - Deep Image Processing
Mirko Lucchese - Deep Image Processing
 
Dssg talk CNN intro
Dssg talk CNN introDssg talk CNN intro
Dssg talk CNN intro
 
Image Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep LearningImage Classification and Annotation Using Deep Learning
Image Classification and Annotation Using Deep Learning
 
Data Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image ProcessingData Science - Part XVII - Deep Learning & Image Processing
Data Science - Part XVII - Deep Learning & Image Processing
 
Rapid object detection using boosted cascade of simple features
Rapid object detection using boosted  cascade of simple featuresRapid object detection using boosted  cascade of simple features
Rapid object detection using boosted cascade of simple features
 
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHESWEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
WEB IMAGE RETRIEVAL USING CLUSTERING APPROACHES
 
TensorFlow.pptx
TensorFlow.pptxTensorFlow.pptx
TensorFlow.pptx
 
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
Deep Convolutional 3D Object Classification from a Single Depth Image and Its...
 
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
Shallow vs. Deep Image Representations: A Comparative Study with Enhancements...
 
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdfHandwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
Handwriting_Recognition_using_KNN_classificatiob_algorithm_ijariie6729 (1).pdf
 
deep learning
deep learningdeep learning
deep learning
 
AN INTEGRATED APPROACH TO CONTENT BASED IMAGE RETRIEVAL by Madhu
AN INTEGRATED APPROACH TO CONTENT BASED IMAGERETRIEVAL by MadhuAN INTEGRATED APPROACH TO CONTENT BASED IMAGERETRIEVAL by Madhu
AN INTEGRATED APPROACH TO CONTENT BASED IMAGE RETRIEVAL by Madhu
 
IRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine LearningIRJET- Face Recognition using Machine Learning
IRJET- Face Recognition using Machine Learning
 

Recently uploaded

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 

Recently uploaded (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 

Deep Computer Vision - 1.pptx

  • 1. Advance Topics in Deep Learning: Deep Computer Vision • By: Dr. Zafar Mahmood Khattak, PhD (AI)
  • 2. Deep Computer Vision • How to build computer that can achieve the sense of sight and vision • The outcome of this topic is to learn how we can use deep learning & machine learning to build powerful vision systems that can both see and predict what is where by only looking at raw visual inputs. • Vision, one of the most important human senses that we all have. • characteristics of sighted peoples……..
  • 3. Vision is much more than understanding what is where
  • 4. Vision is much more than understanding what is where
  • 5. The rise and impact of impact of Computer Vision
  • 6. Impact: Facial Detection and & Recognition
  • 7. Impact: Self-Driving Cars: a go-to example
  • 10. • What Computer “see”
  • 11. How images are Represented in Computer? 2-D Matrix of Numbers Grayscale image How the representation if we have a color image? 1 no for pixel or 3 no for pixel 1-2D Matrix or 3-2D Matrix
  • 13. Processing in Computer Vision Zavyar Wildan Amir Rovaid Classification: Predict/classify output as a discrete class label such as Male or Female, True or False, Spam or Not Spam, etc. Regression: Predict output as continuous values such as price, salary, age, etc. What types of computer vision algorithm that we can build to take this as input to predict an image
  • 14. • If we have to build a computer vision pipeline, two important steps to be considered: identify feature/patters looking for, • Detect those features to infer the membership of the class
  • 15. High Level Feature Detection: What we Need What are the issues Problem while extracting the features? What makes up a face We can define what of each of those components/features look like We can define what of each of those components/features look like in defining our features
  • 16. High Level Feature Detection: Remember Images are just 3-D arrays of numbers
  • 17. •Lets Revised concept of Neural Network Lecture for features extraction
  • 18.
  • 19.
  • 20.
  • 21.
  • 22. learning Features Representation Neural Network Lecture) • Can we learn a hierarchy of features directly from the data instead of hand engineering? • How Machine Learning deals with such types of images?
  • 23. • How to detect Features and Patterns in an image? • Time consuming • Hard to do • Not scalable in practice
  • 24. • Key Idea of Deep Learning: • No Human intervention • Machine extract and uncover the core patterns in the data and then • Decision/prediction on new data/sample • Example of detecting faces in image using DLNN: • detect edges • Detect line and edges • Combine to create composition of features (curves and corners) results in • High level features (eyes, nose, eras etc.) • Hierarchical learning of features • Progression of information • Face detected Solution is ::: Deep learning Algorithms (Neural Network)
  • 25. Learning visual Features How Neural network Extract hierarchical features in the image Domain? And Most Importantly How Neural network allows us to learn those visual features from visual Data if cleverly constructed!!!
  • 27. Fully Connected Neural Network Input: • 2-D Image • Collapsing into 1-D sequence of number (Vector of pixel values) Fully Connected: • Connect Neuron in hidden layer to all neurons in input layer • No spatial Information (3-D or 4-D) • Large no. of parameters, as the system is fully connected Fully connected NN can’t process 2-D Matrix Remember that our image is just 2-D array
  • 28. Fully Connected Neural Network How can we use spatial structure in the input to inform the architecture of the network?
  • 29. Using Spatial Structure Input: • 2-D Image • Collapsing into 1-D sequence of number (Vector of pixel values) • Idea: connect patches of input to neuron in hidden layer • Neuron connected to region of input, only sees the values
  • 30. Using Spatial Structure Connect patch in input layer to a single neuron in subsequent layer by using a sliding window to define connections, and thus preserving very rich spatial information. How we can weight the patch to detect particular patch?
  • 31. Applying Filters to Extract Features Apply a set of weight - a filter – to extract local features 1 Use Multiple filters to extract different features. 2 Spatially share parameters of each filter 3
  • 32. Features Extraction with Convolution: An Example • Filter of size 4*4= 16 different weights • Apply this same filter to 4*4 patches in input and use the result of that operation to update the state of the neuron in the next layer • Shift by 2 pixel for next patch to define the next neuron in the adjacent location in the future layer Concept of Convolution at high level
  • 33. Feature Extraction and Convolution: A Case Study How convolution allows us to learn these features/patterns in the data which is our ultimate goal
  • 34. Designing a Convolutional Algorithm to Detect or Classify X as X in a Black & White Image No Grayscale image, black and white (1- white and -1 for black) Detect x in both of these images slightly rotated Cleverly we have to detect those feature that define an X: So let see how to use Convolution to do that
  • 35. Identifying Features of X using Convolution We need our model to compare images of this X, piece by piece or patch by patch and the important patches that we look for are exactly these features that will define our X: If our model can find these rough features roughly in the same position in our input than we can infer that these two images or of the same type
  • 36. Identifying Features of X using Filters In the case of X’s these filters may represent semantic things for example the diagonal lines or the crossing that captures all of the important characteristics of the X Probably we will capture these features in the arms and the centers of our letter in any image of X regardless of its position These smaller matrices are Filter of weights or jus numerical values of each pixel in these mini patches
  • 37. The Convolution Operation: Another Example: Element wise Multiplication between the Filter Matrix those mini patches as well as the patch of our input image
  • 38. The Convolution Operation: Another Example:
  • 39. Lets see how NN and CNN Process an Image • Feed the pixels of the image in the form of arrays to the input layer of the NN • Multi-layer networks used to classify things. • The hidden layers carry out feature extraction by performing different calculations and manipulations. • There are multiple hidden layers like the convolution layer, the ReLU layer, and pooling layer, that perform feature extraction from the image. • Finally, there’s a fully connected layer that identifies the object in the image.
  • 40. Lets see how NN and CNN Process an Image
  • 41. Lets see how NN and CNN Process an Image In CNN, every image is represented in the form of an array of pixel values. Let’s understand the convolution operation using two matrices, a and b, of 1 dimension. a = [5,3,2,5,9,7] b = [1,2,3]
  • 42. Lets see how NN and CNN Process an Image a = [5,3,2,5,9,7] b = [1,2,3]
  • 43. Lets see how NN and CNN Process an Image
  • 45. The Convolution Operation: Lets Consider one More Example: To compute the convolution we have to slide this image over the filter piece by piece and comparing the similarity or the convolution of this filter across this image Suppose we want to compute the convolution of a 5*5 image and 3*3 filter We slide the 3*3 filter over the input image, element-wise multiply, and add the output
  • 46. The Convolution Operation: Lets Consider one More Example: The resultant 4 is placed into the next layer, which is again another image as a result of convolution operation We slide the 3*3 filter over the input image, element-wise multiply, and add the output
  • 47. The Convolution Operation: Lets Consider one More Example: Simple slide over the previous filter to the next location which provide the next value in our image: keep repeating this process over again and again until we have covered our filter over the entire image and filled the output feature map We slide the 3*3 filter over the input image, element-wise multiply, and add the output
  • 48. The Convolution Operation: Lets Consider one More Example: We slide the 3*3 filter over the input image, element-wise multiply, and add the output
  • 49. The Convolution Operation: Lets Consider one More Example: We slide the 3*3 filter over the input image, element-wise multiply, and add the output
  • 50. The Convolution Operation: Lets Consider one More Example: We slide the 3*3 filter over the input image, element-wise multiply, and add the output After discussing the mechanism that define the operations of convolution, let see how different filter could be used to detect different types of patterns in our data
  • 51. Convolution Operation Simulation for Grayscale image
  • 52.
  • 53. Producing Feature Map Apply 3*3 filter to see the drastic change in the image just by changing the weights that are present in 3*3 matrices : If we apply 3 different filters to detect different type of features
  • 54. Features Extraction with Convolution 1. Apply a set of weight - a filter – to extract local features 2. Use Multiple filters to extract different features. 2. Spatially share parameters of each filter
  • 55. Convolutional Neural Networks (CNNs) After understanding the basic operation of Convolution, patching, sliding, and feature map, lets think to extend the singular convolution operation to the entire layer To Discus the Architecture of CNN: Which is the core of this topic
  • 56. Simple CNNs for Classifying an Image Thee main Operations of CNN: 1- Convolution: Main task: To extract feature directly from the raw data and use these learn features for classification or detecting an object 2- Non-linearity: 3- Pooling:
  • 57. Simple CNNs for Classifying an Image Main task: To extract feature directly from the raw data and use these learn features for classification or detecting an object Finally feeding all of these resulting features to some NN to infer the class score
  • 58. Non-linearity Operation ReLU performs an element-wise operation and sets all the negative pixels to 0. It introduces non-linearity to the network, and the generated output is a rectified feature map
  • 59. So for the structure
  • 60. Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity) Finally feeding all of these resulting features to some NN to infer the class score For neuron in hidden Layers: - Take input from patch - Compute weighted sum - Apply bias A very special point is the local connectivity, where every single neuron in this hidden layer only sees a certain patch in its previous layer
  • 61. Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity) Convolution is about: - Element-wise multiplication operation - Addition operation - Sliding operation Provide basis for layers that define how neurons in the convolutional layers are connected and how they are mathematically formulated. Now lets define the actual computation that going on for a neuron in a hidden layer, its inputs are those neuron that fell with its patch in the previous layer Computation at Neuron: - applying matrix of weights - Computing linear combinations: do element-wise multiplication, add the output and apply the bias - Apply the non-linearity (activate the neuron with non- linear function)
  • 62. Building Basic Architecture of CNN: Convolutional Layers (Local Connectivity) What We need to think of is actually convolutional operations that can output a volume of different images, and Every slice of this volume effectively denotes a different filter that can be identified in our original input and each of those filter is going to basically correspond to a specific pattern or feature in our image Layer dimension: h x w x d Where h and w are spatial dimension and d(depth) is the no of filters - Stride Filter step size - Receptive Field: Location in input image that a node is path connected to
  • 63. Stride - Stride Filter step size Some times we do not want to capture all the data or information available so we skip some neighboring cells let us visualize it, Here the input matrix or image is of dimensions 5×5 with a filter of 3×3 and a stride of 2 so every time we skip two columns and convolute, let us formulate this
  • 64. 2nd Key Operation in CNN: Introducing Non-Linearity . Apply after every convolution operation (after convolutional operator) . Non-linear operation: ReLU ( pixel by pixel operation that simply returns 0 if your value is negative else it returns the same value you give Take those resulting feature map that our convolutional layer extract and apply the non-linearity to the output volume of the convolution layer Negative values indicate the negative detection in convolution (no detection
  • 65. 3rd Key Operation in CNN: Pooling Main purpose off pooling: To reduce the dimensionality of the image progressively as you go deep & deeper, through you convolutional layers (decreasing the dimensionality of the features increasing the dimensionality of the filters)
  • 66. 3rd Key Operation in CNN: Pooling (Max, min, Avg)
  • 67. Representation Learning in Deep CNNs So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a repeating way over and over again to learn these hierarchies of features
  • 68. CNN for image Classification: Feature Learning 1. Learn features in input image through Convolution 2. Introduce Non-Linearity through activation function (as the real-world data is non-linear) 3. Reduce dimensionality and preserve spatial invariance with pooling So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a repeating way over and over again to learn these hierarchies of features: Two Parts Feature learning pipeline: to learn the features that we want to detect and Detecting those feature and doing the classification
  • 69. CNN for image Classification: Feature Learning 1. The goal of Convolutional and pooling layer is to output high-level features that are extracted from the input 2. Fully connected layer uses these feature to detect their presence in order to classify the input image 3. Express output as probability of image belonging to a particular class Feature learning pipeline: to learn the features that we want to detect and Detecting those feature and doing the classification The softmax function is just a normalizing function whose output represents that of a categorical probability distribution
  • 70. CNN for Classification: Feature Learning So now put them altogether: Convolution layer, Non-linearity and polling to form and construct a CNN in a repeating way over and over again to learn these hierarchies of features
  • 71. CNN for Classification: Feature Learning
  • 72. CNN for Classification: Feature Learning
  • 73. •An Architecture for Many Applications • After getting the basics of how to use CNNs to perform image classification task, remember that the same architecture can be extend to so many application and model types that we can imagine
  • 74. An Architecture for Many Applications When we consider CNN for image classification: Two parts, 1. Feature extraction learning (feature of interest) 2. Detection of those features and classification
  • 76. Object Detection This is not easy as that, if our scene contains multiple objects: In that case our NN model should be able to infer a dynamic no of objects
  • 77. Object Detection If we want to detect a taxi only its no problem, but if the image contains different objects of different classes, and our model need to draw a bounding box for each objects of different class is quiet complicated in practice:: A naive solution to object detection Predicted classification labels independently to each class
  • 78. A Naive Solution to Object Detection Problem: too many potential inputs, this results in too many scales, position and size,,,,,, totally impractical to run in real life application with today computers
  • 79. Object Detection Using R-CNN: Instead of picking random boxes use a simple heuristic Identify meaningful object in the image and just feed in those high attention locations to our CNN to speed up the first part. R-CNN Algorithm: Find regions that we think have objects. Using CNN to classify
  • 80. Faster R-CNN: Among many variance proposed for object detection Faster R-CNN Algorithm: Which attempts to learn not only how to classify those boxes but learned how to propos those boxes might be in the first place so that you could learn how to feed or where to feed into the downstream NN
  • 81. Faster R-CNN: Among many variance proposed for object detection So the goal here is to directly try to learn or extract all of those key regions and process them through the later part of the model
  • 82. Semantic segmentation : Fully Convolutional Network In classification we saw not only to predict a single object per image, we also saw an object detection potentially inferring multiple objects with bounding boxes in you image.
  • 83. Semantic segmentation : Biomedical Image Analysis The same concept may be applied to so many application in healthcare especially for segmenting out lets say cancerous regions.
  • 84. Continuous Control : Navigation from Vision Control Lets consider we want to build a NN model for autonomous navigation for self driving car
  • 85. Key point to remember • All these different architecture uses the exact same encoder, we haven't change anything going from classification to detection to sematic segmentation and autonomous navigation, are using the same building block • Convolution • Non-linearity • pooling
  • 86. Concluding the CNN lecture: Deep Learning for Computer Vision : Impact
  • 87. Deep Learning for Computer Vision : Summary