Data Science Bootcamp
Start your career in Data
Our Speakers
—
Cristina Oprean & Pierre Stefani
Machine Learning Engineers
The Rise of Deep Learning
Healthcare
Autonomous vehicles
Reinforcement learning
Robotics
Language processing
GANs
What is Deep Learning?
Why Deep Learning?
Hand-engineered features are time-consuming, brittle, and not scalable in practice
Can we learn the underlying features directly from data?
Why now?
Neural networks date back decades, so why the resurgence?
Inspired by neuroscience:
- Signal input
- Neuron activation
- Decision
Fully connected networks
Fully connected:
Connect each neuron in the hidden layer to all neurons in the input layer
No spatial information!
And many, many parameters!
Input: 2D image → vector of pixel values
Using spatial structure
Idea: connect patches of the input to neurons in the hidden layer.
Each neuron is connected to a region of the input and only “sees” those values.
Input: 2D image → array of pixel values
Applying Filters to Extract Features
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
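The three steps above can be sketched in plain Python. The image, the 1x2 edge-style filter, and the function name are illustrative choices, not from the slides:

```python
def convolve2d(image, kernel):
    """Slide a filter over a 2D image (valid mode, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Weighted sum over the local patch: the same (shared)
            # filter weights are applied at every spatial position.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter applied to a tiny image whose right half is bright
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1]]
print(convolve2d(image, kernel))
# [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]] — responds only at the edge
```

Using a second kernel (e.g. `[[1], [-1]]` for horizontal edges) on the same image would extract a different feature, which is exactly step 2.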
Convolutional Neural Networks
Layers used to build ConvNets
INPUT → holds the raw pixel values of the image (e.g. [32x32x3]: an image with width 32, height 32, and 3 channels, RGB)
Convolutional layer (CONV) → computes the output of neurons connected to local regions in the input (e.g. output of size [32x32x12] if we decide to use 12 filters)
RELU → applies an elementwise activation function, such as thresholding at zero with max(0, x) (the size of the volume is unchanged: [32x32x12])
POOL → performs a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12]
Fully connected layer (FC) → computes the class scores, connecting every neuron in this layer to all neurons in the previous one (e.g. the size is [1x1x10] if we have 10 classes)
Input layer
Existing color spaces for images:
RGB, Grayscale, HSV, Lab, etc.
CNNs are architectures specific to images:
Reduce the image into a form that is easier to process
Without losing the features needed for good recognition
Convolutional layer (CONV)
Spatial arrangement:
Depth (D) → number of filters we want to use
Stride (S) → step with which we slide the filter
Padding (P) → number of zeros we add to the border
Input size (I) → size of the image
Filter size (F) → size of the convolution filter
=> Size of the output: O = (I − F + 2P)/S + 1
E.g. I=5, F=3, P=0, S=1 => O = (5 − 3 + 0)/1 + 1 = 3
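The output-size formula can be checked with a tiny helper (illustrative code, not from the slides):

```python
def conv_output_size(I, F, P=0, S=1):
    """Spatial output size of a convolution: (I - F + 2P) / S + 1."""
    size = (I - F + 2 * P) / S + 1
    if not size.is_integer():
        raise ValueError("filter does not tile the input evenly; adjust P or S")
    return int(size)

print(conv_output_size(5, 3))        # slide example: (5 - 3 + 0)/1 + 1 = 3
print(conv_output_size(32, 3, P=1))  # "same" padding: output stays 32
print(conv_output_size(7, 3, S=2))   # stride 2 halves the resolution: 3
```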
Convolutional layer (CONV)
Conv layers:
CNN architectures can have multiple conv layers
The first layers capture low-level features
The final layers capture high-level features
When the image has 3 channels (RGB), the filter also has depth D = 3
Pooling layer (POOL)
Reduces the spatial size of the convolved feature map
Decreases the computational cost
Extracts dominant features
Has no parameters to learn
Two types: max and average pooling
Pooling layer (POOL)
Max pooling often performs better than average pooling
Max pooling acts as a noise suppressant
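Max pooling over 2x2 windows with stride 2 can be sketched in plain Python (the feature-map values below are made up for illustration):

```python
def max_pool2d(fmap, size=2, stride=2):
    """Downsample a 2D feature map by keeping the max of each window."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 3]]
print(max_pool2d(fmap))  # [[4, 2], [2, 7]] — a 4x4 map shrinks to 2x2
```

Only the dominant value in each window survives, which is why small noisy activations are suppressed; replacing `max` with an average would give average pooling.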
RELU (activation function)
Introduces non-linearities into the network
RELU (activation function)
Linear activation functions produce linear decisions no matter the network size
Non-linearities allow us to approximate arbitrarily complex functions
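ReLU itself is one line; a minimal sketch on a few example activations (values chosen for illustration):

```python
def relu(x):
    """max(0, x): passes positives unchanged, zeroes out negatives."""
    return max(0.0, x)

activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(a) for a in activations])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

The kink at zero is what makes it non-linear: stacking layers of `w*x + b` alone would collapse into a single linear function, however deep the network.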
CNNs for feature learning
1. Learn features in input image through convolution
2. Introduce non-linearity through activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling
CNNs for classification
1. CONV and POOL layers output high-level features of input
2. Fully connected layer uses these features for classifying input image
3. Express output as probability of image belonging to a particular class
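Step 3 is commonly done with a softmax over the FC-layer class scores; the slides do not name the function, so this is a standard sketch with made-up scores:

```python
import math

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))        # 1.0 — a valid probability distribution
print(probs.index(max(probs)))     # 0 — the class with the highest score
```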
CNNs: training with backpropagation
Learn weights for convolutional filters and fully connected layers
ConvNet architectures
LeNet
AlexNet
GoogLeNet
VGGNet
ResNet
etc.
[Figure: classification error rate (%) across these architectures]
CNNs: Tips - dataset creation
Where to find datasets
https://toolbox.google.com/datasetsearch
https://www.kaggle.com/datasets
Crawl the web! (Flickr, Bing, Google)
Watch out for scientific paper releases and conference workshops
CNNs: Tips - training
● No need to start from scratch (train with less data, start from pretrained models, domain adaptation)
● Monitor loss and validation accuracies
● Optimize once it works: batch size, learning rates, modules & loss
Some applications @Photobox (and beyond)
Photo crop: naive version
⇒ How to crop the photo and keep the main subject?
Photo crop: using face detection
⇒ Face detection allows the crop to stay focused on the faces
Near duplicates filtering
Without duplicates filtering vs. with duplicates filtering
Goal: reduce the hassle of duplicate selection
Feature comparison allows us to find near duplicates and remove them
[Slides compare feature-difference vectors: small differences, e.g. (0.4, 0, 0.1, …, 0) or (-4, 15, 20, …, 10) → Remove; large differences, e.g. (102, -32, 120, …, 33) → Keep]
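One way to implement this comparison, sketched in plain Python. The Euclidean distance, the threshold value, and the "keep the first of each group" policy are illustrative assumptions, not details from the Photobox pipeline:

```python
import math

def feature_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_near_duplicates(features, threshold=10.0):
    """Return indices of photos to keep.

    A photo is a near duplicate (and is removed) when its feature
    distance to an already-kept photo is below `threshold`."""
    kept = []
    for i, f in enumerate(features):
        if all(feature_distance(f, features[j]) > threshold for j in kept):
            kept.append(i)
    return kept

# Two nearly identical photos plus one clearly different one
photos = [[0.4, 0.0, 0.1], [0.5, 0.2, 0.0], [102.0, -32.0, 120.0]]
print(filter_near_duplicates(photos))  # [0, 2] — the second photo is removed
```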
Photo selection with aesthetics
● Propose book covers or photo candidates for full-page photos in a book
● Suggest premium products with the most beautiful pictures from a set
● Select the most beautiful photo from a set of duplicates
[Slide shows candidate photos with aesthetic scores: 0.73, 0.52, 0.53, 0.67]
Style transfer
Let’s practice!
How it ‘learns’: minimizing the error
[Diagram: an input is passed through the network; the output is compared to the expected output (‘cat’), and the difference is the error (or loss)]
How it ‘learns’: minimizing the error
[Figure: descending the loss surface from a starting point toward the minimum]
How it ‘learns’: backpropagation
[Diagram: the error (or loss) between the output and the expected output is propagated back through the network to update the weights]
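A minimal sketch of one backpropagation step for a single linear neuron with a squared-error loss; the neuron, data point, and learning rate are illustrative, not from the slides:

```python
def backprop_step(w, b, x, target, lr=0.1):
    """One backpropagation step for a single linear neuron.

    Forward:  output = w*x + b,  loss = (output - target)**2
    Backward: the chain rule gives the gradients of the loss
    with respect to each weight."""
    output = w * x + b
    error = output - target
    grad_w = 2 * error * x   # dloss/dw
    grad_b = 2 * error       # dloss/db
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(50):
    w, b = backprop_step(w, b, x=2.0, target=6.0)
print(w * 2.0 + b)  # 6.0 — the neuron has learned to hit the target
```

In a real CNN the same chain-rule idea runs layer by layer, from the loss back through the FC layers and every convolutional filter.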
Takeaways
No algorithm is 100% accurate
It depends on:
- How much data we have
- The quality of the data
- Performance tradeoffs
- Hardware
Some algorithms are better than humans, some of them are way behind
How it ‘learns’: minimizing the error
- Define a loss function
- Minimize this loss
- Gradient descent algorithm
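The three steps above can be sketched as plain gradient descent on a toy loss (illustrative example):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize loss(x) = (x - 3)**2, whose gradient is 2*(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(x_min, 4))  # 3.0 — the minimum of the loss
```

Training a network is the same loop, except `x` is the vector of all filter and FC weights and the gradient comes from backpropagation.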

Image recognition with Deep Learning - Cristina & Pierre @ Photobox
