Data Science Bootcamp
Start your career in Data
Our Speakers
—
Cristina Oprean & Pierre Stefani
Machine Learning Engineers
The Rise of Deep Learning
Healthcare
Autonomous vehicles
Reinforcement learning
Robotics
Language processing
GANs
What is Deep Learning?
Why Deep Learning?
Hand-engineered features are time-consuming, brittle, and not scalable in practice
Can we learn the underlying features directly from data?
Why now?
Neural networks date back decades, so why the resurgence?
Inspired by neuroscience:
- Signal input
- Neuron activation
- Decision
Fully connected networks
Fully connected:
Connect each neuron in the hidden layer to all neurons in the input layer
No spatial information!
And many, many parameters!
Input: 2D image → vector of pixel values
Using spatial structure
Idea: connect patches of the input to neurons in the hidden layer.
Each neuron is connected to a region of the input and only “sees” those values.
Input: 2D image → array of pixel values
Applying Filters to Extract Features
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
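The three steps above can be sketched in plain Python. The image, the 1x2 edge-style filter, and the function name are illustrative choices, not from the slides:

```python
def convolve2d(image, kernel):
    """Slide a filter over a 2D image (valid mode, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Weighted sum over the local patch: the same (shared)
            # filter weights are applied at every spatial position.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter applied to a tiny image whose right half is bright
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1]]
print(convolve2d(image, kernel))
# [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]] — responds only at the edge
```

Using a second kernel (e.g. `[[1], [-1]]` for horizontal edges) on the same image would extract a different feature, which is exactly step 2.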
Convolutional Neural Networks
Layers used to build ConvNets
INPUT → holds the raw pixel values of the image (e.g. [32x32x3]: an image with width 32, height 32, and 3 channels, RGB)
Convolutional layer (CONV) → computes the output of neurons connected to local regions in the input (e.g. output of size [32x32x12] if we decide to use 12 filters)
RELU → applies an elementwise activation function, such as thresholding at zero with max(0, x) (the size of the volume is unchanged: [32x32x12])
POOL → performs a downsampling operation along the spatial dimensions (width, height), resulting in a volume such as [16x16x12]
Fully connected layer (FC) → computes the class scores, connecting every neuron in this layer to all neurons in the previous one (e.g. the size is [1x1x10] if we have 10 classes)
Input layer
Existing color spaces for images:
RGB, Grayscale, HSV, Lab, etc.
CNNs are architectures specific to images:
Reduce the image into a form that is easier to process
Without losing the features needed for good recognition
Convolutional layer (CONV)
Spatial arrangement:
Depth (D) → number of filters we want to use
Stride (S) → step with which we slide the filter
Padding (P) → number of zeros we add to the border
Input size (I) → size of the image
Filter size (F) → size of the convolution filter
=> Size of the output: O = (I − F + 2P)/S + 1
E.g. I=5, F=3, P=0, S=1 => O = (5 − 3 + 0)/1 + 1 = 3
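The output-size formula can be checked with a tiny helper (illustrative code, not from the slides):

```python
def conv_output_size(I, F, P=0, S=1):
    """Spatial output size of a convolution: (I - F + 2P) / S + 1."""
    size = (I - F + 2 * P) / S + 1
    if not size.is_integer():
        raise ValueError("filter does not tile the input evenly; adjust P or S")
    return int(size)

print(conv_output_size(5, 3))        # slide example: (5 - 3 + 0)/1 + 1 = 3
print(conv_output_size(32, 3, P=1))  # "same" padding: output stays 32
print(conv_output_size(7, 3, S=2))   # stride 2 halves the resolution: 3
```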
Convolutional layer (CONV)
Conv layers:
CNN architectures can have multiple conv layers
The first layers capture low-level features
The final layers capture high-level features
When the image has 3 channels (RGB), the filter also has depth D = 3
Pooling layer (POOL)
Reduces the spatial size of the convolved feature map
Decreases the computational cost
Extracts dominant features
Has no parameters to learn
Two types: max and average pooling
Pooling layer (POOL)
Max pooling often performs better than average pooling
Max pooling acts as a noise suppressant
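Max pooling over 2x2 windows with stride 2 can be sketched in plain Python (the feature-map values below are made up for illustration):

```python
def max_pool2d(fmap, size=2, stride=2):
    """Downsample a 2D feature map by keeping the max of each window."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, w - size + 1, stride)]
            for i in range(0, h - size + 1, stride)]

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 3]]
print(max_pool2d(fmap))  # [[4, 2], [2, 7]] — a 4x4 map shrinks to 2x2
```

Only the dominant value in each window survives, which is why small noisy activations are suppressed; replacing `max` with an average would give average pooling.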
RELU (activation function)
Introduces non-linearities into the network
RELU (activation function)
Linear activation functions produce linear decisions no matter the network size
Non-linearities allow us to approximate arbitrarily complex functions
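ReLU itself is one line; a minimal sketch on a few example activations (values chosen for illustration):

```python
def relu(x):
    """max(0, x): passes positives unchanged, zeroes out negatives."""
    return max(0.0, x)

activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
print([relu(a) for a in activations])  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

The kink at zero is what makes it non-linear: stacking layers of `w*x + b` alone would collapse into a single linear function, however deep the network.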
CNNs for feature learning
1. Learn features in input image through convolution
2. Introduce non-linearity through activation function (real-world data is non-linear!)
3. Reduce dimensionality and preserve spatial invariance with pooling
CNNs for classification
1. CONV and POOL layers output high-level features of input
2. Fully connected layer uses these features for classifying input image
3. Express output as probability of image belonging to a particular class
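Step 3 is commonly done with a softmax over the FC-layer class scores; the slides do not name the function, so this is a standard sketch with made-up scores:

```python
import math

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(round(sum(probs), 6))        # 1.0 — a valid probability distribution
print(probs.index(max(probs)))     # 0 — the class with the highest score
```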
CNNs: training with backpropagation
Learn weights for convolutional filters and fully connected layers
ConvNet architectures
LeNet
AlexNet
GoogLeNet
VGGNet
ResNet
etc.
[Figure: classification error rate (%) across these architectures]
CNNs: Tips - dataset creation
Where to find datasets
https://toolbox.google.com/datasetsearch
https://www.kaggle.com/datasets
Crawl the web! (Flickr, Bing, Google)
Watch out for scientific paper releases and conference workshops
CNNs: Tips - training
● No need to start from scratch (train with less data, start from pretrained models, domain adaptation)
● Monitor loss and validation accuracies
● Optimize once it works: batch size, learning rates, modules & loss
Some applications @Photobox (and beyond)
Photo crop: naive version
⇒ How to crop the photo and keep the main subject?
Photo crop: using face detection
⇒ Face detection allows the crop to stay focused on the faces
Near duplicates filtering
Without duplicates filtering vs. with duplicates filtering
Goal: reduce the hassle of duplicate selection
Feature comparison allows us to find near duplicates and remove them
[Slides compare feature-difference vectors: small differences, e.g. (0.4, 0, 0.1, …, 0) or (-4, 15, 20, …, 10) → Remove; large differences, e.g. (102, -32, 120, …, 33) → Keep]
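One way to implement this comparison, sketched in plain Python. The Euclidean distance, the threshold value, and the "keep the first of each group" policy are illustrative assumptions, not details from the Photobox pipeline:

```python
import math

def feature_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def filter_near_duplicates(features, threshold=10.0):
    """Return indices of photos to keep.

    A photo is a near duplicate (and is removed) when its feature
    distance to an already-kept photo is below `threshold`."""
    kept = []
    for i, f in enumerate(features):
        if all(feature_distance(f, features[j]) > threshold for j in kept):
            kept.append(i)
    return kept

# Two nearly identical photos plus one clearly different one
photos = [[0.4, 0.0, 0.1], [0.5, 0.2, 0.0], [102.0, -32.0, 120.0]]
print(filter_near_duplicates(photos))  # [0, 2] — the second photo is removed
```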
Photo selection with aesthetics
● Propose book covers or photo candidates for full-page photos in a book
● Suggest premium products with the most beautiful pictures from a set
● Select the most beautiful photo from a set of duplicates
[Slide shows candidate photos with aesthetic scores: 0.73, 0.52, 0.53, 0.67]
Style transfer
Let’s practice!
How it ‘learns’: minimizing the error
[Diagram: an input is passed through the network; the output is compared to the expected output (‘cat’), and the difference is the error (or loss)]
How it ‘learns’: minimizing the error
[Figure: descending the loss surface from a starting point toward the minimum]
How it ‘learns’: backpropagation
[Diagram: the error (or loss) between the output and the expected output is propagated back through the network to update the weights]
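A minimal sketch of one backpropagation step for a single linear neuron with a squared-error loss; the neuron, data point, and learning rate are illustrative, not from the slides:

```python
def backprop_step(w, b, x, target, lr=0.1):
    """One backpropagation step for a single linear neuron.

    Forward:  output = w*x + b,  loss = (output - target)**2
    Backward: the chain rule gives the gradients of the loss
    with respect to each weight."""
    output = w * x + b
    error = output - target
    grad_w = 2 * error * x   # dloss/dw
    grad_b = 2 * error       # dloss/db
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(50):
    w, b = backprop_step(w, b, x=2.0, target=6.0)
print(w * 2.0 + b)  # 6.0 — the neuron has learned to hit the target
```

In a real CNN the same chain-rule idea runs layer by layer, from the loss back through the FC layers and every convolutional filter.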
Takeaways
No algorithm is 100% accurate
It depends on:
- How much data we have
- The quality of the data
- Performance tradeoffs
- Hardware
Some algorithms are better than humans, some of them are way behind
How it ‘learns’: minimizing the error
- Define a loss function
- Minimize this loss
- Gradient descent algorithm
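The three steps above can be sketched as plain gradient descent on a toy loss (illustrative example):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to minimize a loss."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize loss(x) = (x - 3)**2, whose gradient is 2*(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(x_min, 4))  # 3.0 — the minimum of the loss
```

Training a network is the same loop, except `x` is the vector of all filter and FC weights and the gradient comes from backpropagation.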

Image recognition with Deep Learning - Cristina & Pierre @ Photobox
