Do Better ImageNet Models Transfer Better... for Image Recommendation?
Felipe del Río, Pablo Messina, Vicente Dominguez, Denis Parra
CS Department
School of Engineering
Pontificia Universidad Católica de Chile
RecSysKTL Workshop, October 6th, 2018
Artwork Recommendation
• Online artwork market: growing since 2008, despite global crises!
– In 2011, art generated $11.57 billion in total global annual revenue, over $2 billion more than in 2010 [forbes]
• Recommendation projects in this domain date back to at least 2007, such as the CHIP project for recommending paintings from the Rijksmuseum.
• Little use of recent advances in Deep Neural Networks
for Computer Vision.
October 6th, 2018 del Rio et al ~ RecSysKTL 2018 2
[forbes] The World's Strongest Economy? The Global Art Market. https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/ (2012)
Image Recommendation
• Since 2017 we have been working on
recommending art images, using data from the
online store UGallery.
• Two papers published:
– DLRS 2017: Dominguez, V., Messina, P., Parra, D., Mery, D., Trattner, C., & Soto, A. (2017, August). Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (pp. 55-59). ACM.
– UMUAI 2018: Messina, P., Dominguez, V., Parra, D., Trattner, C., & Soto, A. (2018). Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Modeling and User-Adapted Interaction, 1-40.
Data: UGallery
• Online artwork store, based in California, USA.
• Mostly sells one-of-a-kind physical artworks.
Image Recommendation
• Our top approach is a hybrid recommender, based
on metadata and visual features from Deep
Convolutional Neural Networks.
Motivation
• When submitting our work we usually received
criticism for not using the latest DNN model.
• An actual review from a previous article submission
(2017):
<< Overall an interesting paper although … the
choice of AlexNet is rather odd as there are better
pre-trained networks available e.g. VGG16 >>
Motivation
• Is it always the case that better pre-trained deep
convolutional models (on the Imagenet Challenge)
produce better results in a transfer learning setting?
Datasets in Computer Vision
• 1996: faces and cars, 14,000 images of 10,000 people
• 1998: MNIST, 70,000 images of handwritten digits
• 2004: Caltech 101, 9,146 images of 101 categories
• 2005: PASCAL VOC, 20,000 images with 20 classes
Datasets in Computer Vision
• ImageNet: presented in 2009 at CVPR
• Crowdsourced
• 14,197,122 images
• 21,841 categories (non-empty synsets)
• Categories based on WordNet taxonomy
WordNet
• WordNet: George Miller's project, started in the mid-1980s at Princeton; a semantic hierarchy for the English language
• Prof. Fei-Fei Li (UIUC, Princeton, Stanford) led the effort to populate WordNet categories with images.
Crowdsourced
• Amazon Mechanical Turk
• It took 2.5 years to complete. Originally 3.2 million
images in 5,247 categories (mammal, vehicle, etc.)
ImageNet Challenge
• The dataset was used to set up a competition for image classification, running from 2010 on.
• In 2012 a team from Hinton's group used deep learning and achieved a top-5 error rate of 15.3%, a 10.8 percentage point margin, 41% better than the next best entry.
Transfer Learning
• The 2012 model was called AlexNet: a Convolutional Neural Network.
• Its learned features (layers fc6, fc7) have been used successfully to transfer the learning to other tasks.
Inspiration
• Simon Kornblith, Jonathon Shlens, and Quoc V. Le. 2018. Do Better ImageNet Models Transfer Better? https://arxiv.org/abs/1805.08974
Without fine-tuning, ResNet features outperform NASNet (the ImageNet SOTA)
Evaluation 1
• Does pre-trained ImageNet model performance correlate with image recommendation performance?
Ugallery Data and Evaluation
• 1,371 users / 3,940 items / 2,846 transactions
Recommendation
• Scoring items based on cosine similarity between
user model and item model:
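This scoring step can be sketched as follows; a minimal sketch assuming the user model is the mean embedding of the user's purchased items (the embedding dimension and variable names are illustrative, not taken from the paper):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def score_items(purchased_embeddings, candidate_embeddings):
    """Score each candidate item by cosine similarity to the user model
    (here: the mean embedding of the user's past purchases)."""
    user_model = np.mean(purchased_embeddings, axis=0)
    return [cosine(user_model, item) for item in candidate_embeddings]

# Toy example with 3-dimensional embeddings
purchased = np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])
candidates = np.array([[1.0, 0.1, 0.0], [0.0, 0.0, 1.0]])
scores = score_items(purchased, candidates)
ranking = np.argsort(scores)[::-1]  # candidate indices, best first
```

The top-k entries of `ranking` are the recommended items; the embeddings themselves come from the pre-trained CNN under evaluation.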
Experiment 1: Results
• No correlation between ImageNet performance and
image recommendation performance.
Experiment 2
• What is the effect of fine-tuning?
• How should fine-tuning be performed?
Tuning I: Shallow vs. Deep
Shallow fine-tuning
Tuning I: Shallow vs. Deep
Deep fine-tuning
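The two strategies differ only in which layers receive gradient updates: shallow fine-tuning trains just the new classifier head, while deep fine-tuning also updates the backbone. A schematic sketch of that freezing policy (layer names are illustrative, not the actual training code):

```python
# Schematic: a pre-trained network as an ordered list of layers.
BACKBONE = ["conv1", "conv2", "conv3", "conv4", "conv5"]
HEAD = ["fc_new"]  # newly attached classifier for the target task

def trainable_layers(mode):
    """Return which layers receive gradient updates under each strategy."""
    if mode == "shallow":
        return set(HEAD)                  # backbone frozen
    elif mode == "deep":
        return set(BACKBONE) | set(HEAD)  # everything updated
    raise ValueError(mode)
```

In a deep-learning framework this corresponds to marking the frozen layers' parameters as not trainable before optimization.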
Learning: Multitask vs. Single Task
• Dataset 1: Omniart
– 432,217 images
– Target classes: artist, artwork type, year
• Dataset 2: Ugallery
– 3,940 images
– Target classes: artist, medium (oil, acrylic, etc.)
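Schematically, multi-task learning shares one backbone and attaches one linear head per target (artist, medium, etc.), summing the per-task losses, while single-task training keeps a single head. A toy numpy sketch under those assumptions (all dimensions, class counts, and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))  # shared backbone output: 4 images, 8-dim

# One linear head per task; a single-task model would keep only one of these.
heads = {
    "artist": rng.normal(size=(8, 3)),  # 3 artists
    "medium": rng.normal(size=(8, 2)),  # 2 mediums (oil, acrylic)
}
labels = {"artist": np.array([0, 1, 2, 0]), "medium": np.array([0, 1, 0, 1])}

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def task_loss(probs, y):
    """Mean cross-entropy for one task."""
    return float(-np.log(probs[np.arange(len(y)), y]).mean())

# Multi-task loss: sum of per-task losses over the shared features.
multitask_loss = sum(
    task_loss(softmax(features @ w), labels[name]) for name, w in heads.items()
)
```

Gradients from all heads flow into the same backbone, which is what distinguishes this setup from training each task separately.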
Results 1
Deep fine-tuning worked better than shallow fine-tuning
Results 2
• ResNet was the best-performing model under shallow fine-tuning
• Consistent with Kornblith et al., ResNet is the best generic visual feature extractor
Results 3
• Training with a smaller but focused target dataset results in better
transfer learning performance
Results 4
• There was no clear winner between multi-task and single-task learning, probably because the artist category is already highly descriptive
Conclusion
• Pre-trained neural image embeddings are great, but do not assume that performance on the original task correlates with performance on your recommendation task.
• If you are still going to use a pre-trained ImageNet visual embedding, ResNet is a good option, although it is not the current SOTA in ILSVRC.
• Fine-tuning is strongly suggested, even if your dataset is small.