Don’t Start From Scratch
Transfer Learning for Novel Computer Vision Problems - StampedeCon AI Summit 2017
Florian Muellerklein
fmuellerklein@minerkasch.com
Contents
● What is transfer learning?
● How to do it
● Why does it work?
What is transfer learning?
● Transfer learning is taking a neural network that has already been
trained and adapting it to a new dataset.
● Allows us to use deep learning on problems that we have very little
training data for.
● Improve the generalization of our network on some novel task.
● Let big AI labs do all the hard work for us.
Pretrained Neural Networks
Big research labs typically design and test their new deep learning models
against large public benchmark datasets.
Most commonly on ImageNet – 1.2 million images across 1000 categories
Or on MS COCO – 330k images for things like object detection, image
segmentation, and keypoint detection
Pretrained Neural Networks - Classification
VGG - https://arxiv.org/pdf/1409.1556.pdf
Inception V3 - https://arxiv.org/pdf/1512.00567v3.pdf
ResNet - https://arxiv.org/abs/1512.03385
Pretrained Neural Networks - Detection
R-CNN - https://arxiv.org/pdf/1506.01497.pdf
SSD - https://arxiv.org/pdf/1512.02325.pdf
YOLO - https://arxiv.org/abs/1506.02640
Transfer learning - Classification
Ok, so we have a state-of-the-art neural network that someone else trained to recognize images and classify them into one of 1000 categories.
But our problem has maybe 5 categories. What do we do?
We chop off the final 1000-way fully connected layer and replace it with a new fully connected layer sized for our problem (fc 5).
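A minimal sketch of this head swap in PyTorch, using torchvision's resnet50 as an assumed stand-in for the pretrained network:

```python
# Load an ImageNet-pretrained network and swap its 1000-way
# classifier for a 5-way one.
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)       # weights learned on ImageNet
model.fc = nn.Linear(model.fc.in_features, 5)  # new head: 2048 -> 5 categories
```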
Transfer learning - Detection
Detection is very similar: the only portions of the pretrained network that are tied to the original task are the final layers.
We can call this the ‘detector’ portion of the network, and we can simply swap it out for our own detector head that will learn the specific objects we want to detect.
What is transfer learning?
Once you have your own classifier (or regression) layer tacked on, you can start training.
If your dataset is small you may want to train only your new layer (see the sketch below).
If your dataset is large you may want to retrain the whole network.
If your dataset is very different from the one the network was originally trained on, you may also want to retrain the whole network.
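A minimal sketch of the small-dataset case, assuming the model from the earlier sketch: freeze the pretrained layers so only the new head trains. For the large or very-different-dataset cases, skip the freezing and optimize model.parameters() instead.

```python
import torch.nn as nn
import torch.optim as optim

# Small dataset: freeze the pretrained backbone first, then attach the
# new head (freshly created layers have requires_grad=True by default).
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new layer's weights are handed to the optimizer.
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```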
What if I don’t want to train?
We can even make use of a pretrained neural network without retraining.
Once trained, the deep learning model becomes an effective way to represent any image as a feature-rich vector.
Now we can pass any image through our pretrained neural network and get a dense vector representation of that image.
This allows us to use these features as the input for any traditional machine learning algorithm (logistic regression, random forest, etc …)
Or we can set up a reverse image search type of system by computing similarity scores between the vectors.
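A minimal sketch of the fixed-feature-extractor idea; the names `images` and `labels` are assumed placeholders for your own preprocessed batch and targets:

```python
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Identity()   # drop the classifier; outputs 2048-d vectors
backbone.eval()

with torch.no_grad():
    features = backbone(images).numpy()   # `images`: (N, 3, 224, 224) tensor

# Feed the vectors to a traditional machine learning model ...
clf = LogisticRegression(max_iter=1000).fit(features, labels)

# ... or compute pairwise similarities for a reverse image search.
scores = cosine_similarity(features)
```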
Transfer learning – How Well Does it Work?
Remi Cadene’s master’s thesis - Deep Learning for Visual Recognition
What’s going on here? Representation Learning!
Representation Learning?
http://www.rsipvision.com/exploring-deep-learning/
Input data → Data Transformations → Output - usually some shallow machine learning layer (logistic regression, linear SVM, linear regression).
Representation learning
Deep learning presents us with a unique philosophy for machine learning: instead of learning the mapping from our features directly to an output, we are learning the best representation of our features.
Each layer of a neural network introduces feature representations that are expressed in terms of other, simpler representations.
The process has multiple steps, with each layer building on the representations created by the previous layer.
Representation Learning
Figure from the Deep Learning Book (2016)
Representation Learning
Here is a test with natural images: unsupervised clustering on the raw images, and then again on the features extracted from the final layer of a deep neural network.
Transfer Learning Context
Edges are very generic features. Probably don’t need to finetune these.
Representation Learning
Contours, shapes, and corners are slightly more specific. Finetuning could be helpful here, but may not be necessary.
Representation Learning
Layers are starting to get really specific
to the original problem.
Should finetune these.
Representation Learning
Definitely finetune these
Representation learning
This is an interesting observation!
The ‘deep’ part of the neural network is essentially working to make the problem easier.
We will leverage this property to quickly and effectively harness the power of deep learning without the headache of training networks from scratch.
Recap
Applied Deep Learning is Mostly Finetuning
Most applied deep learning work revolves around two strategies, both of which utilize neural networks that have already been trained:
1. Take a pretrained network and finetune it on our specific problem.
a. Ex. A computer vision model already knows how to ‘see’; we just need to finetune it to see whatever our specific problem requires.
2. Use a pretrained network and extract its internal representations of a dataset to use as features for a model.
Fine Tuning a Network
To finetune a network we typically choose a network that was already trained on a
similar problem to our own.
We then train the network in much the same way it was originally trained, with a few differences (see the sketch below):
1. Use a much smaller learning rate
2. Only train for a handful of iterations
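A minimal sketch of that recipe, assuming the model from the earlier sketches and an assumed `train_loader` over our dataset:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
# Difference 1: a learning rate roughly 10-100x smaller than
# what from-scratch training would use.
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Difference 2: only a handful of passes over the data.
for epoch in range(3):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```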
Feature Extraction
We can use these networks as feature extractors to represent our data as dense, information-rich vectors.
We can vectorize the following quite easily:
1. Images
a. Using Convolutional Neural Networks trained to classify images
2. Words
a. Using neural networks trained to predict words given their context
3. Sentences
a. Using neural networks that are trained to reconstruct sentences given their context
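As a sketch of the word case (item 2), here is word2vec via gensim; `corpus`, a list of tokenized sentences, is an assumed placeholder:

```python
from gensim.models import Word2Vec

# Train a model that learns to predict words from their context;
# its internal weights become the word vectors.
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

vec = w2v.wv["network"]                     # dense 100-d vector for a word
neighbors = w2v.wv.most_similar("network")  # nearest words in vector space
```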
Further Resources – Classification Tutorials
TensorFlow - https://www.tensorflow.org/tutorials/image_retraining
PyTorch - http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Caffe - http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
MXNet - https://mxnet.incubator.apache.org/how_to/finetune.html
Further Resources – Theory
How transferable are features in deep neural networks? Jason Yosinski, Jeff
Clune, Yoshua Bengio, Hod Lipson - https://arxiv.org/abs/1411.1792
Deep Learning Book, Ian Goodfellow, Yoshua Bengio, Aaron Courville -
http://www.deeplearningbook.org/
OverFeat: Integrated Recognition, Localization and Detection using Convolutional
Networks. Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob
Fergus, Yann LeCun - https://arxiv.org/pdf/1312.6229.pdf
