Introduction to Computer
Vision with Convolutional
Neural Networks
Mohammad Haghighat
Senior Manager, CoreAI
eBay
• High level introduction to AI
• Conventional vs. deep learning
• Neural networks and deep learning
• Fully connected networks
• Elements of a neural network
• Neural network training
• Convolutional neural networks (CNNs)
• Building blocks of CNNs
• Applications of CNNs
• Popular CNN architectures
• Mobile CNN architectures
Outline
2
High-level introduction to AI
3
Machine
Learning
(ML) Model
person
ML Model person
dancing on
the beach
ML Model negative
feedback
“Nothing to love
about this
presentation.”
ML Model beginning
begining
ML Model “Let‘s go for
lunch”
ML Model
Classical learning vs deep learning
4
Input Data
(e.g., image)
Feature
Extraction
(e.g., edges)
Dimensionali
ty Reduction
(e.g., PCA*)
Classifier
(e.g., SVM*)
Output
Dog
Input Data
(e.g., image)
Output
*PCA: Principal Component Analysis
*SVM: Support Vector Machines
What are neurons?
5
… and what are neural networks?
6
a layer
Neural networks as a vehicle for deep learning
7
Universal Approximation Theorem
A one-hidden-layer neural network with enough neurons can approximate any continuous
function within the given input range.
non-linear
activation function
Neural network-based classifier
8
apple
banana
orange
color
taste
weight
shape
0.12
0.05
0.83
0
0
1
network
output
ideal
output
error/loss
Neural network training
9
Reference
Loss and gradient descent algorithm
Image as an input data
10
How computer sees an edge
Convolutional vs fully connected
11
Convolutional layer
● Capture local patterns and spatial
relationships between pixels
● Parameter efficiency: shared weights
● Better generalization: translation invariance
Introduction to CNNs
12
Building blocks of CNNs
13
Size (kernel size): The dimensions of the convolutional filter/kernel
Stride: The number of pixels the convolutional filter/kernel moves across the input image or
feature map during each convolution operation.
Padding: Additional pixels added around the borders of the input image or feature map to
control the output size after convolution and preserve spatial information.
Building blocks of CNNs
14
Stride
Building blocks of CNNs
15
Padding: same vs. valid
Building blocks of CNNs
16
Number of parameters in a convolutional layer
17
Number of
parameters for a
K×K kernel:
(K × K × N + 1) × M
N: input depth
M: output depth
Building blocks of CNNs
18
Pooling layer
Building blocks of CNNs
19
A multi-layer CNN
Deep learning is representation learning
(a.k.a. feature learning)
20
Applications of CNNs
21
Image Classification
Pdog = 0.9
Pcat = 0.1
Applications of CNNs
22
Object Detection
Applications of CNNs
23
Instance Segmentation
AlexNet (2012) – Top-5 Error 15.3% on ImageNet
“The neural network, which has 60 million parameters and 500,000 neurons, consists of five
convolutional layers, some of which are followed by max-pooling layers, and two globally connected
layers with a final 1000-way softmax.”
Popular CNN architectures
24
Popular CNN architectures
25
VGG16 (2014) – Top-5 Error 7.32% on ImageNet
Inception (2014)
Motivation: let the network decide what filter size to put in a layer
Popular CNN architectures
26
GoogleNet (2014) - Top-5 Error 6.67% on ImageNet
Popular CNN architectures
27
Residual block with a skip connection
Popular CNN architectures
28
ResNet (2015) – Top-5 Error 3.57% on ImageNet for ResNet-152
Popular CNN architectures
29
Trend of CNN-based classifiers
30
https://paperswithcode.com
Trend of CNN-based classifiers
31
Comparison of popular CNN
architectures. The vertical axis
shows top 1 accuracy on
ImageNet classification. The
horizontal axis shows the
number of operations needed to
classify an image. Circle size is
proportional to the number of
parameters in the network.
What do we want on edge?
• Low computational complexity
• Small model size for small memory
• Low energy usage
• Good enough accuracy (depends on
application)
• Deployable on embedded
processors
• Easily updatable (over-the-air)
CNNs for edge devices
32
MobileNets
33
MobileNets
34
Regular convolution
Number of parameters
for a K×K kernel:
K × K × N × M
N: input depth
M: output depth
MobileNets
35
Depthwise separable
convolution
Number of parameters:
Depthwise:
• K × K × N
Pointwise:
• 1 × 1 × M
Total:
• K × K × N + M
N: input depth
M: output depth
MobileNets
36
Model shrinking hyperparameter
Depth Multiplier :: Width Multiplier :: alpha :: α
To thin a network uniformly at each layer
Number of channels: M → αM
Log linear dependence between accuracy and computation
Let’s uniformly scale network width, depth, and resolution with a set of fixed scaling coefficients
EfficientNets
37
Note: the baseline B0 architecture is
designed using neural architecture
search (NAS).
EfficientNets
38
Conclusions
39
We talked about:
• Deep neural networks and CNNs as the network of choice for computer vision
• The building blocks of CNNs: Convolution layer, pooling layer, padding, stride, etc.
• Application of CNNs in computer vision: image classification, object detection, segmentation, etc.
• CNN architectures: AlexNet, VGG, GoogleNet, ResNet
• Edge-optimized CNNs architectures: MobileNets & EfficientNets
Choosing the right model for an application and hardware is crucial for accuracy and efficiency.
Any Questions?
40
dog: 97%
• EfficientNet: https://arxiv.org/abs/1905.11946
• Papers With Code: https://paperswithcode.com
• Understanding of MobileNet: https://wikidocs.net/165429
• New mobile neural network architectures https://machinethink.net/blog/mobile-architectures/
• An Analysis of Deep Neural Network Models for Practical Applications: https://arxiv.org/abs/1605.07678
• Deep Learning Equivariance and Invariance: https://www.doc.ic.ac.uk/~bkainz/teaching/DL/notes/equivariance.pdf
• IndoML Student Notes: Convolutional Neural Networks (CNN) Introduction:
https://indoml.com/2018/03/07/student-notes-convolutional-neural-networks-cnn-introduction/
• Beginners Guide to Convolutional Neural Networks: https://towardsdatascience.com/beginners-guide-to-
understanding-convolutional-neural-networks-ae9ed58bb17d
• A Comprehensive Guide to Convolutional Neural Networks: https://towardsdatascience.com/a-comprehensive-
guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
Resources
41

“Introduction to Computer Vision with Convolutional Neural Networks,” a Presentation from eBay

  • 1.
    Introduction to Computer Visionwith Convolutional Neural Networks Mohammad Haghighat Senior Manager, CoreAI eBay
  • 2.
    • High levelintroduction to AI • Conventional vs. deep learning • Neural networks and deep learning • Fully connected networks • Elements of a neural network • Neural network training • Convolutional neural networks (CNNs) • Building blocks of CNNs • Applications of CNNs • Popular CNN architectures • Mobile CNN architectures Outline 2
  • 3.
    High-level introduction toAI 3 Machine Learning (ML) Model person ML Model person dancing on the beach ML Model negative feedback “Nothing to love about this presentation.” ML Model beginning begining ML Model “Let‘s go for lunch” ML Model
  • 4.
    Classical learning vsdeep learning 4 Input Data (e.g., image) Feature Extraction (e.g., edges) Dimensionali ty Reduction (e.g., PCA*) Classifier (e.g., SVM*) Output Dog Input Data (e.g., image) Output *PCA: Principal Component Analysis *SVM: Support Vector Machines
  • 5.
  • 6.
    … and whatare neural networks? 6 a layer
  • 7.
    Neural networks asa vehicle for deep learning 7 Universal Approximation Theorem A one-hidden-layer neural network with enough neurons can approximate any continuous function within the given input range. non-linear activation function
  • 8.
  • 9.
    Neural network training 9 Reference Lossand gradient descent algorithm
  • 10.
    Image as aninput data 10 How computer sees an edge
  • 11.
    Convolutional vs fullyconnected 11 Convolutional layer ● Capture local patterns and spatial relationships between pixels ● Parameter efficiency: shared weights ● Better generalization: translation invariance
  • 12.
  • 13.
    Building blocks ofCNNs 13 Size (kernel size): The dimensions of the convolutional filter/kernel Stride: The number of pixels the convolutional filter/kernel moves across the input image or feature map during each convolution operation. Padding: Additional pixels added around the borders of the input image or feature map to control the output size after convolution and preserve spatial information.
  • 14.
    Building blocks ofCNNs 14 Stride
  • 15.
    Building blocks ofCNNs 15 Padding: same vs. valid
  • 16.
  • 17.
    Number of parametersin a convolutional layer 17 Number of parameters for a K×K kernel: (K × K × N + 1) × M N: input depth M: output depth
  • 18.
    Building blocks ofCNNs 18 Pooling layer
  • 19.
    Building blocks ofCNNs 19 A multi-layer CNN
  • 20.
    Deep learning isrepresentation learning (a.k.a. feature learning) 20
  • 21.
    Applications of CNNs 21 ImageClassification Pdog = 0.9 Pcat = 0.1
  • 22.
  • 23.
  • 24.
    AlexNet (2012) –Top-5 Error 15.3% on ImageNet “The neural network, which has 60 million parameters and 500,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and two globally connected layers with a final 1000-way softmax.” Popular CNN architectures 24
  • 25.
    Popular CNN architectures 25 VGG16(2014) – Top-5 Error 7.32% on ImageNet
  • 26.
    Inception (2014) Motivation: letthe network decide what filter size to put in a layer Popular CNN architectures 26
  • 27.
    GoogleNet (2014) -Top-5 Error 6.67% on ImageNet Popular CNN architectures 27
  • 28.
    Residual block witha skip connection Popular CNN architectures 28
  • 29.
    ResNet (2015) –Top-5 Error 3.57% on ImageNet for ResNet-152 Popular CNN architectures 29
  • 30.
    Trend of CNN-basedclassifiers 30 https://paperswithcode.com
  • 31.
    Trend of CNN-basedclassifiers 31 Comparison of popular CNN architectures. The vertical axis shows top 1 accuracy on ImageNet classification. The horizontal axis shows the number of operations needed to classify an image. Circle size is proportional to the number of parameters in the network.
  • 32.
    What do wewant on edge? • Low computational complexity • Small model size for small memory • Low energy usage • Good enough accuracy (depends on application) • Deployable on embedded processors • Easily updatable (over-the-air) CNNs for edge devices 32
  • 33.
  • 34.
    MobileNets 34 Regular convolution Number ofparameters for a K×K kernel: K × K × N × M N: input depth M: output depth
  • 35.
    MobileNets 35 Depthwise separable convolution Number ofparameters: Depthwise: • K × K × N Pointwise: • 1 × 1 × M Total: • K × K × N + M N: input depth M: output depth
  • 36.
    MobileNets 36 Model shrinking hyperparameter DepthMultiplier :: Width Multiplier :: alpha :: α To thin a network uniformly at each layer Number of channels: M → αM Log linear dependence between accuracy and computation
  • 37.
    Let’s uniformly scalenetwork width, depth, and resolution with a set of fixed scaling coefficients EfficientNets 37
  • 38.
    Note: the baselineB0 architecture is designed using neural architecture search (NAS). EfficientNets 38
  • 39.
    Conclusions 39 We talked about: •Deep neural networks and CNNs as the network of choice for computer vision • The building blocks of CNNs: Convolution layer, pooling layer, padding, stride, etc. • Application of CNNs in computer vision: image classification, object detection, segmentation, etc. • CNN architectures: AlexNet, VGG, GoogleNet, ResNet • Edge-optimized CNNs architectures: MobileNets & EfficientNets Choosing the right model for an application and hardware is crucial for accuracy and efficiency.
  • 40.
  • 41.
    • EfficientNet: https://arxiv.org/abs/1905.11946 •Papers With Code: https://paperswithcode.com • Understanding of MobileNet: https://wikidocs.net/165429 • New mobile neural network architectures https://machinethink.net/blog/mobile-architectures/ • An Analysis of Deep Neural Network Models for Practical Applications: https://arxiv.org/abs/1605.07678 • Deep Learning Equivariance and Invariance: https://www.doc.ic.ac.uk/~bkainz/teaching/DL/notes/equivariance.pdf • IndoML Student Notes: Convolutional Neural Networks (CNN) Introduction: https://indoml.com/2018/03/07/student-notes-convolutional-neural-networks-cnn-introduction/ • Beginners Guide to Convolutional Neural Networks: https://towardsdatascience.com/beginners-guide-to- understanding-convolutional-neural-networks-ae9ed58bb17d • A Comprehensive Guide to Convolutional Neural Networks: https://towardsdatascience.com/a-comprehensive- guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53 Resources 41