Don’t Start From Scratch
Transfer Learning for Novel Computer Vision Problems - StampedeCon AI Summit 2017
Florian Muellerklein
fmuellerklein@minerkasch.com
Contents
● What is transfer learning?
● How to do it
● Why does it work?
What is transfer learning?
● Transfer learning is taking a neural network that has already been
trained and adapting it to a new dataset.
● Allows us to use deep learning on problems that we have very little
training data for.
● Improve the generalization of our network on some novel task.
● Let big AI labs do all the hard work for us.
Pretrained Neural Networks
Big research labs typically design and test their new deep learning models
against large public benchmark datasets.
Most commonly on ImageNet – 1.2 million images across 1000 categories
Or on MS COCO – 330k images for things like object detection, image
segmentation, and keypoint detection
Pretrained Neural Networks - Classification
VGG - https://arxiv.org/pdf/1409.1556.pdf
Inception V3 - https://arxiv.org/pdf/1512.00567v3.pdf
ResNet - https://arxiv.org/abs/1512.03385
Pretrained Neural Networks - Detection
R-CNN - https://arxiv.org/pdf/1506.01497.pdf
SSD - https://arxiv.org/pdf/1512.02325.pdf
YOLO - https://arxiv.org/abs/1506.02640
Transfer learning - Classification
Ok, so we have a state-of-the-art neural network that someone else trained to recognize images and classify them into one of 1000 categories.
But our problem has maybe 5 categories. What do we do?
We chop off the final 1000-way fully connected layer and replace it with a new fully connected layer sized for our problem (fc 5).
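A minimal sketch of this head swap in PyTorch, using torchvision's resnet50 as an assumed stand-in for the pretrained network:

```python
# Load an ImageNet-pretrained network and swap its 1000-way
# classifier for a 5-way one.
import torch.nn as nn
import torchvision.models as models

model = models.resnet50(pretrained=True)       # weights learned on ImageNet
model.fc = nn.Linear(model.fc.in_features, 5)  # new head: 2048 -> 5 categories
```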
Transfer learning - Detection
Detection is very similar: the only portions of the pretrained network that are tied to the original task are the final layers.
We can call this the ‘detector’ portion of the network, and we can simply swap it out for our own detector head that will learn the specific objects we want to detect.
What is transfer learning?
Once you have your own classifier (or regression) layer tacked on, you can start training.
If your dataset is small you may want to train only your new layer (see the sketch below).
If your dataset is large you may want to retrain the whole network.
If your dataset is very different from the one the network was originally trained on, you may also want to retrain the whole network.
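A minimal sketch of the small-dataset case, assuming the model from the earlier sketch: freeze the pretrained layers so only the new head trains. For the large or very-different-dataset cases, skip the freezing and optimize model.parameters() instead.

```python
import torch.nn as nn
import torch.optim as optim

# Small dataset: freeze the pretrained backbone first, then attach the
# new head (freshly created layers have requires_grad=True by default).
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 5)

# Only the new layer's weights are handed to the optimizer.
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```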
What if I don’t want to train?
We can even make use of a pretrained neural network without retraining.
Once trained, the deep learning model becomes an effective way to represent any image as a feature-rich vector.
Now we can pass any image through our pretrained neural network and get a dense vector representation of that image.
This allows us to use these features as the input for any traditional machine learning algorithm (logistic regression, random forest, etc …)
Or we can set up a reverse image search type of system by computing similarity scores between the vectors.
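A minimal sketch of the fixed-feature-extractor idea; the names `images` and `labels` are assumed placeholders for your own preprocessed batch and targets:

```python
import torch
import torch.nn as nn
import torchvision.models as models
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

backbone = models.resnet50(pretrained=True)
backbone.fc = nn.Identity()   # drop the classifier; outputs 2048-d vectors
backbone.eval()

with torch.no_grad():
    features = backbone(images).numpy()   # `images`: (N, 3, 224, 224) tensor

# Feed the vectors to a traditional machine learning model ...
clf = LogisticRegression(max_iter=1000).fit(features, labels)

# ... or compute pairwise similarities for a reverse image search.
scores = cosine_similarity(features)
```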
Transfer learning – How Well Does it Work?
Remi Cadene’s master’s thesis - Deep Learning for Visual Recognition
What’s going on here? Representation Learning!
Representation Learning?
http://www.rsipvision.com/exploring-deep-learning/
Input data → Data Transformations → Output - usually some shallow machine learning layer (logistic regression, linear SVM, linear regression).
Representation learning
Deep learning presents us with a unique philosophy for machine learning: instead of learning the mapping from our features directly to an output, we are learning the best representation of our features.
Each layer of a neural network introduces feature representations that are expressed in terms of other, simpler representations.
The process has multiple steps, with each layer building on the representations created by the previous layer.
Representation Learning
Figure from the Deep Learning Book (2016)
Representation Learning
Here is a test with natural images: unsupervised clustering on the raw images, and then again on the features extracted from the final layer of a deep neural network.
Transfer Learning Context
Edges are very generic features. Probably don’t need to finetune these.
Representation Learning
Contours, shapes, and corners are slightly more specific. Finetuning could be helpful here, but may not be necessary.
Representation Learning
Layers are starting to get really specific
to the original problem.
Should finetune these.
Representation Learning
Definitely finetune these
Representation learning
This is an interesting observation!
The ‘deep’ part of the neural network is essentially working to make the problem easier.
We will leverage this property to quickly and effectively harness the power of deep learning without the headache of training networks from scratch.
Recap
Applied Deep Learning is Mostly Finetuning
Most applied deep learning work revolves around two strategies, both of which utilize neural networks that have already been trained:
1. Take a pretrained network and finetune it on our specific problem.
a. Ex. A computer vision model already knows how to ‘see’; we just need to finetune it to see whatever our specific problem requires.
2. Use a pretrained network and extract its internal representations of a dataset to use as features for a model.
Fine Tuning a Network
To finetune a network we typically choose a network that was already trained on a
similar problem to our own.
We then train the network in much the same way it was originally trained, with a few differences (see the sketch below):
1. Use a much smaller learning rate
2. Only train for a handful of iterations
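A minimal sketch of that recipe, assuming the model from the earlier sketches and an assumed `train_loader` over our dataset:

```python
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
# Difference 1: a learning rate roughly 10-100x smaller than
# what from-scratch training would use.
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)

# Difference 2: only a handful of passes over the data.
for epoch in range(3):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```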
Feature Extraction
We can use these networks as feature extractors to represent our data as dense, information-rich vectors.
We can vectorize the following quite easily:
1. Images
a. Using Convolutional Neural Networks trained to classify images
2. Words
a. Using neural networks trained to predict words given their context
3. Sentences
a. Using neural networks that are trained to reconstruct sentences given their context
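As a sketch of the word case (item 2), here is word2vec via gensim; `corpus`, a list of tokenized sentences, is an assumed placeholder:

```python
from gensim.models import Word2Vec

# Train a model that learns to predict words from their context;
# its internal weights become the word vectors.
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

vec = w2v.wv["network"]                     # dense 100-d vector for a word
neighbors = w2v.wv.most_similar("network")  # nearest words in vector space
```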
Further Resources – Classification Tutorials
TensorFlow - https://www.tensorflow.org/tutorials/image_retraining
PyTorch - http://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
Caffe - http://caffe.berkeleyvision.org/gathered/examples/finetune_flickr_style.html
MXNet - https://mxnet.incubator.apache.org/how_to/finetune.html
Further Resources – Theory
How transferable are features in deep neural networks? Jason Yosinski, Jeff
Clune, Yoshua Bengio, Hod Lipson - https://arxiv.org/abs/1411.1792
Deep Learning Book, Ian Goodfellow, Yoshua Bengio, Aaron Courville -
http://www.deeplearningbook.org/
OverFeat: Integrated Recognition, Localization and Detection using Convolutional
Networks. Pierre Sermanet, David Eigen, Xiang Zhang, Michael Mathieu, Rob
Fergus, Yann LeCun - https://arxiv.org/pdf/1312.6229.pdf
