How transferable are features in
deep neural networks?
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson
DL Hacks
!  NIPS 2014
!  A Deep Learning paper on transfer learning:
!  how general or specific are the features learned at each layer of a
   deep network, and how well do they transfer to a new task?
Are DNN features general or specific?
!  First-layer features are general: regardless of the dataset or task,
   layer 1 tends to learn Gabor-like filters and color blobs.
!  Last-layer features are specific: they are tied to the particular
   classes of the training task.
!  So somewhere between the first and the last layer, features must
   transition from general to specific.
   (Filter visualization: http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/CVPR2012-Tutorial_lee.pdf)
From general to specific: the questions
Lower DNN layers are general, higher layers specific, but:
1. Can we quantify the degree to which a particular layer is general or
   specific?
2. Does the transition from general to specific occur suddenly at a single
   layer, or is it spread out over several layers?
3. Where in the network does this transition take place?
Transfer learning
!  [Pan et al. 2010] call this setting inductive transfer learning:
!  first train a network on a base task,
!  then reuse its learned features for a target task.
!  Transfer should work to the extent that the features are general
   rather than specific to the base task.
Two ways to handle transferred layers: fine-tune vs. frozen
1. fine-tune: keep backpropagating through the copied layers on the new
   task (suitable when the target dataset is large enough not to overfit).
2. frozen: leave the copied layers fixed and train only the layers above
   (suitable when the target dataset is small).
Whether the copied layers are frozen or fine-tuned is exactly what the
experiments below vary (see the sketch after this slide).
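As a concrete illustration of the two options, here is a minimal PyTorch sketch (the paper itself used Caffe; treating the network as an `nn.Sequential` of per-layer blocks, and the name `transfer_first_n`, are assumptions made for illustration):

```python
import torch.nn as nn

def transfer_first_n(donor: nn.Sequential, recipient: nn.Sequential,
                     n: int, frozen: bool) -> nn.Sequential:
    """Copy the first n layer blocks of `donor` into `recipient`.

    frozen=True  -> copied layers are locked (no gradient updates);
    frozen=False -> copied layers are fine-tuned along with the rest.
    The remaining layers keep their fresh random initialization and are
    always trained on the target task.
    """
    for i in range(n):
        recipient[i].load_state_dict(donor[i].state_dict())  # copy weights
        for p in recipient[i].parameters():
            p.requires_grad = not frozen  # False = frozen, True = fine-tune
    return recipient
```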
Experimental setup: tasks A and B
!  Split the ImageNet classes into two halves, A and B, and train an
   8-layer convnet on each: baseA and baseB.
!  Pick a layer n from {1, 2, …, 7}.
!  Copy the first n layers of a base network into a new network, then
   retrain its remaining 8 − n layers on the target task.
Treatments for n = 3
!  B3B: the first 3 layers are copied from baseB and frozen; the rest of
   the network is retrained on B. A "selffer" control.
!  A3B: the first 3 layers are copied from baseA and frozen; the rest is
   retrained on B. A "transfer" network.
!  If the copied layers are fine-tuned instead of frozen, the networks
   are written B3B+ and A3B+.
[Figure 1: Overview of the experimental treatments and controls. Top two
rows: the base networks are trained using standard supervised backprop on
only half of the ImageNet dataset each (first row: the A half, second row:
the B half); the labeled rectangles (e.g. W_A1) represent the learned
weight vectors. The figure distinguishes frozen from fine-tuned copied
layers.]
!  Repeating this for every n yields the AnB and BnA families.
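Given trained baseA and baseB, the full treatment grid for one value of n can then be enumerated as below. This is only a sketch: `make_base_net` (a factory returning a freshly initialized 8-layer net) is a hypothetical helper, `transfer_first_n` is the function sketched after the fine-tune/frozen slide above, and each returned network must still be trained on task B.

```python
def build_treatments(baseA, baseB, make_base_net, n):
    """Build the paper's four treatments for layer n.

    BnB / BnB+ : selffer controls (first n layers copied from baseB);
    AnB / AnB+ : transfer networks (first n layers copied from baseA);
    a trailing '+' means the copied layers are fine-tuned, not frozen.
    """
    grid = {}
    for donor_name, donor in [("B", baseB), ("A", baseA)]:
        for plus in (False, True):
            name = f"{donor_name}{n}B" + ("+" if plus else "")
            grid[name] = transfer_first_n(donor, make_base_net(),
                                          n, frozen=not plus)
    return grid
```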
Interpreting AnB
!  If A3B performs as well as baseB, the first 3 layers are general:
   features learned on A serve B equally well.
!  If A3B performs worse than baseB, the first 3 layers are specific
   to task A.
[Figure 1, repeated: overview of the experimental treatments and controls.]
Dataset: ImageNet [Deng et al., 2009]
!  1000 classes, 1,281,167 training images, 50,000 validation images.
!  Each of A and B gets 500 classes and about 645,000 images.
!  Two kinds of A/B splits are used:
!  Random splits (similar halves): ImageNet contains clusters of closely
   related classes, e.g. 13 felid classes, which a random split divides
   roughly 6 / 7 between the two halves.
!  Man-made vs. natural split (dissimilar halves): 551 man-made object
   classes vs. 449 natural ones.
A sketch of the random split follows this slide.
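The random split can be written in a few lines, assuming a list of the 1000 ImageNet synset ids is available (the man-made/natural split instead follows the WordNet hierarchy and is not reproduced here):

```python
import random

def random_ab_split(class_ids, seed=0):
    """Randomly partition the 1000 ImageNet classes into two disjoint
    500-class halves A and B (each keeps roughly 645k training images)."""
    assert len(class_ids) == 1000
    rng = random.Random(seed)        # fixed seed -> reproducible split
    ids = sorted(class_ids)          # canonical order before shuffling
    rng.shuffle(ids)
    return ids[:500], ids[500:]      # (task A classes, task B classes)
```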
Implementation
!  Built on Caffe [Jia et al., 2014].
!  Code and parameter files to reproduce all experiments:
!  http://yosinski.com/transfer
!  Results were analyzed in IPython notebooks.
Experiment 1: random A/B splits
[Figure 2, top: top-1 accuracy (higher is better) vs. the layer n at which
the network is chopped and retrained. White circles above n = 0 show baseB;
dark blue dots are BnB, light blue BnB+, dark red diamonds AnB, light red
AnB+. Each marker is the validation accuracy of one trained network, and
points are jittered horizontally for visual clarity. There are eight baseB
points because four separate random A/B splits were tested.]
•  Four random A/B splits of ImageNet were used.
•  By symmetry AnA behaves like BnB and BnA like AnB, so the plot reports
   only the BnB(+) and AnB(+) treatments.
Experiment 1, observation 1: baseB
!  baseB reaches 37.5% top-1 error, versus 42.5% for a 1000-class network.
!  Classifying among 500 classes is simply easier than among 1000
   (see Figure 2 above).
Experiment 1, observation 2: selffer BnB
!  BnB dips at layers 4 and 5: adjacent layers of the original network
   learn fragile, co-adapted interactions (co-adaptation).
!  With the lower layers frozen, the upper layers alone cannot relearn
   those interactions.
!  Performance recovers at layers 6 and 7, where less remains to be
   relearned above the split (see Figure 2 above).
Experiment 1, observation 3: selffer BnB+
!  BnB+ matches baseB at every layer: fine-tuning recovers the co-adapted
   interactions that hurt the frozen BnB networks (see Figure 2 above).
Experiment 1, observation 4: transfer AnB
!  AnB transfers layers 1 and 2 with almost no penalty: the features in
   layers 1–2 are general.
!  Performance drops at layer 3 and falls further through layer 7, for
   two reasons (cf. observation 2):
   - layers 3–4: mostly lost co-adaptation;
   - layers 5–7: lost co-adaptation plus increasing specificity of the
     features to task A (see Figure 2 above).
Experiment 1, observation 5: transfer AnB+
!  AnB+ outperforms both baseB and BnB+.
!  Transferring features and then fine-tuning them improves generalization
   on the target task.
!  The boost is not explained by extra training time: these networks were
   trained just as long as BnB+.
!  The boost grows with the number of layers transferred (Table 1):
Table 1: Performance boost of AnB+ over controls, averaged over different
ranges of layers.

  layers aggregated    mean boost over baseB    mean boost over selffer BnB+
  1-7                  1.6%                     1.4%
  3-7                  1.8%                     1.4%
  5-7                  2.1%                     1.7%
Experiment 1: summary
[Figure 2, bottom: lines connecting the means of each treatment, annotated
with the numbered interpretations from Section 4.1 of the paper:
  2: performance drops due to fragile co-adaptation;
  3: fine-tuning recovers co-adapted interactions;
  4: performance drops due to representation specificity;
  5: transfer + fine-tuning improves generalization.]
Experiment 2: dissimilar tasks (man-made / natural split)
[Figure 3, top left: top-1 accuracy vs. layer n for the man-made/natural
split.]
•  baseA is trained on the man-made classes, baseB on the natural classes.
•  baseA and baseB reach different accuracies, so each transfer direction
   (BnA and AnB) is plotted as its own line.
•  Transferring between these dissimilar halves degrades performance more
   than transferring between random splits.
•  Networks targeting the natural classes do better than those targeting
   the man-made classes.
•  The natural half has only 449 classes versus 551 man-made ones,
•  so the natural task may simply be easier.
Experiment 3: random weights
!  [Jarrett et al. 2009] found, surprisingly, that random convolutional
   filters combined with rectification, pooling, and normalization can
   work almost as well as learned features.
!  But they showed this on NNs with only 2–3 learned layers,
!  and on the much smaller Caltech-101 dataset.
Does the nearly optimal performance of random filters carry over to a
deeper NN trained on a larger dataset?
Experiment 3: results
!  Accuracy falls off quickly when the first 1 or 2 layers are left as
   random, untrained filters,
!  and drops to near-chance levels when 3 or more layers are random.
!  Random filters do not rival learned features in a deep NN on a large
   dataset,
!  contradicting the conclusion of [Jarrett et al. 2009].
[Figure 3, top right: top-1 accuracy when the first n layers consist of
random, untrained filters.]
A sketch of this control follows.
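In the same sketch style as before, the random-weights control leaves the first n layers untrained and frozen; the Gaussian initialization below is an assumption for illustration, not the paper's exact scheme:

```python
import torch.nn as nn

def randomize_first_n(net: nn.Sequential, n: int) -> nn.Sequential:
    """Experiment 3 control: keep the first n layers as random, untrained
    filters (frozen) and train only the layers above them."""
    for i in range(n):
        for p in net[i].parameters():
            nn.init.normal_(p, std=0.01)  # random, untrained weights
            p.requires_grad = False       # never updated during training
    return net
```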
Comparison of experiments 1, 2, and 3
[Figure 3, bottom: relative top-1 accuracy (higher is better) vs. the layer
n at which the network is chopped and retrained: reference line, mean AnB
on random splits, mean AnB on the man-made/natural (m/n) split, and random
features, all normalized by subtracting base-level performance.]
Figure 3: Performance degradation vs. layer. Top left: degradation when
transferring between dissimilar tasks (from man-made classes of ImageNet to
natural classes or vice versa); the upper line connects networks trained
toward the "natural" target task, the lower line those trained toward the
"man-made" target task. Top right: performance when the first n layers
consist of random, untrained weights. Bottom: the top two plots compared to
the random A/B split from Section 4.1 (red diamonds), all normalized by
subtracting their base-level performance.
(Excerpt from the paper's Section 5, Conclusions:) "We have demonstrated a
method for quantifying the transferability of features from each layer of
a neural network, which reveals their generality or specificity. We showed
how transferability is negatively affected by two distinct issues:
optimization difficulties related to splitting networks in the middle of
fragilely co-adapted layers and the specialization of higher layer features
to the original task."
•  Transfer degrades more as the base and target tasks become less similar.
•  Random features are far worse than transferred features at every depth.
•  The random-filter result of [Jarrett et al. 2009] likely reflected their
   shallower networks and the smaller Caltech-101 dataset, on which
   networks overfit, making random filters look better by comparison.
Conclusions: quantifying how general or specific NN features are
!  With frozen transferred layers, two distinct issues degrade performance:
!  fragile co-adaptation between layers that are split apart, and
!  the specificity of higher-layer features to the original task.
!  Even so, transferring from a distant task beats random filters.
!  Initializing from transferred features and then fine-tuning improves
   generalization,
!  and the boost persists even after extensive training on the target task.
Impressions
!  A simple design but a massive experiment: training all the networks
   took a great deal of GPU time (9.5).
!  A useful empirical reference for anyone doing transfer learning with DL.
!  Puts concrete numbers on intuitions about what NN layers learn.
References
!  Transfer learning:
!  R. Caruana, "Multitask learning," Mach. Learn., vol. 28, no. 1,
   pp. 41–75, 1997.
!  S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans.
   Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, 2010.
!  Transfer learning with Deep Learning:
!  Y. Bengio, "Deep Learning of Representations for Unsupervised and
   Transfer Learning," JMLR Work. Conf. Proc., vol. 7, pp. 1–20, 2011.
!  Resources:
!  http://yosinski.com/transfer
!  CVPR 2012 Tutorial: Deep Learning Methods for Vision
   http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/CVPR2012-Tutorial_lee.pdf