Do Better ImageNet Models Transfer Better... for Image Recommendation?

Felipe del Río, Pablo Messina, Vicente Dominguez, Denis Parra
CS Department
School of Engineering
Pontificia Universidad Católica de Chile
KTL RecSys Workshop, October 6, 2018

Artwork Recommendation
• Online artwork market: Growing since 2008, despite
global crises!
– In 2011, art generated $11.57 billion in total global annual revenue, over $2 billion more than in 2010 [forbes]
• Previous art recommendation projects date back as far as 2007, such as the CHIP project, which recommended paintings from the Rijksmuseum.
• Little use of recent advances in Deep Neural Networks
for Computer Vision.
October 6th, 2018, del Rio et al. ~ RecSys KTL 2018
[forbes] The World's Strongest Economy? The Global Art Market. https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/ (2012)

Image Recommendation
• Since 2017 we have been working on
recommending art images, using data from the
online store UGallery.
• Two papers published:
– DLRS 2017: Dominguez, V., Messina, P., Parra, D., Mery, D., Trattner, C., & Soto, A. (2017, August). Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (pp. 55-59). ACM.
– UMUAI 2018: Messina, P., Dominguez, V., Parra, D., Trattner, C., & Soto, A. (2018). Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Modeling and User-Adapted Interaction, 1-40.

Data: UGallery
• Online artwork store based in California, USA.
• Mostly sells one-of-a-kind physical artworks.

Image Recommendation
• Our top approach is a hybrid recommender, based
on metadata and visual features from Deep
Convolutional Neural Networks.
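One simple way to combine the two signals, shown here only as an illustrative sketch, is a weighted blend of a metadata similarity and a visual (CNN-embedding) similarity. The function names, the weight `alpha`, and the use of cosine similarity for both parts are assumptions, not the exact formulation of the papers above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(user_meta, item_meta, user_visual, item_visual, alpha=0.5):
    """Hypothetical linear blend of a metadata score and a visual score.

    alpha is an illustrative weight, not a value reported in the papers.
    """
    meta_sim = cosine(user_meta, item_meta)
    visual_sim = cosine(user_visual, item_visual)
    return alpha * meta_sim + (1 - alpha) * visual_sim
```

With `alpha=1.0` the score reduces to the metadata similarity alone and with `alpha=0.0` to the visual similarity alone; in practice such a weight would be tuned on held-out data.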

Motivation
• When submitting our work we usually received
criticism for not using the latest DNN model.
• An actual review from a previous article submission
(2017):
<< Overall an interesting paper although … the
choice of AlexNet is rather odd as there are better
pre-trained networks available e.g. VGG16 >>

Motivation
• Is it always the case that better pre-trained deep convolutional models (on the ImageNet Challenge) produce better results in a transfer learning setting?

ImageNet:
Crowdsourcing a Large Dataset
of Image Labels

Datasets in Computer Vision
• 1996: Faces and cars, 14,000 images of 10,000 people
• 1998: MNIST, 70,000 images of handwritten digits
• 2004: Caltech 101, 9,146 images of 101 categories
• 2005: PASCAL VOC, 20,000 images with 20 classes

Datasets in Computer Vision
• ImageNet: presented in 2009 at CVPR
• Crowdsourced
• 14,197,122 images
• 21,841 categories (non-empty synsets)
• Categories based on WordNet taxonomy

WordNet
• WordNet: Miller's project, started in 1980 at Princeton; a lexical hierarchy for the English language
• Prof. Fei-Fei Li (UIUC, Princeton, Stanford) worked on populating the WordNet hierarchy with many images.

Crowdsourced
• Amazon Mechanical Turk
• It took 2.5 years to complete. Originally 3.2 million
images in 5,247 categories (mammal, vehicle, etc.)

ImageNet Challenge
• Since 2010, the dataset has been used for an image classification competition (ILSVRC).
• In 2012, a team using deep learning (Hinton et al.) achieved a top-5 error rate below 25%, a 10.8-percentage-point margin, 41% better than the next best entry.
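The two margins quoted above pin down the implied error rates. Writing e_1 for the winner's top-5 error, e_2 for the runner-up's, and d = e_2 - e_1 = 10.8 points, and assuming "41% better" means a 41% relative reduction of the runner-up's error, a back-of-the-envelope check gives:

```latex
\begin{aligned}
\frac{d}{e_2} &= 0.41, \qquad d = 10.8 \\
e_2 &= \frac{10.8}{0.41} \approx 26.3\% \\
e_1 &= e_2 - d \approx 26.3\% - 10.8\% = 15.5\%
\end{aligned}
```

Both values are approximate because 41% is itself rounded, but the implied winning error is consistent with "below 25%".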

Transfer Learning
• The 2012 model was called AlexNet, a convolutional neural network.
• The features learned in its fully connected layers (fc6, fc7) have been used successfully in other tasks, transferring what was learned on ImageNet.

Recent ImageNet results
https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
Method               Top-1 Accuracy (%)   Top-5 Accuracy (%)
NASNet Large         82.7                 96.2
InceptionResNetV2    80.4                 95.3
InceptionV3          78.0                 93.9
ResNet50             75.6                 92.8
VGG19                71.1                 89.8
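Top-1 and Top-5 accuracy are top-k metrics: a prediction counts as correct if the true class is among the k highest-scoring classes, and the ILSVRC error rate is 1 minus this accuracy. A minimal pure-Python sketch follows; the function name and input layout are our own choices, not the slim library's evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores: list of per-example lists of class scores (e.g. softmax outputs)
    labels: list of true class indices, one per example
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)

scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, 1), top_k_accuracy(scores, labels, 2))
```

Here only the first example is right at k=1, while the second also counts at k=2, which is why Top-5 numbers are always at least as high as Top-1.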

Inspiration
• Simon Kornblith, Jonathon Shlens, and Quoc V. Le. 2018. Do Better ImageNet Models Transfer Better? https://arxiv.org/abs/1805.08974
• Without fine-tuning, ResNet outperforms NASNet (the ImageNet SOTA)

Evaluation 1
• Does the performance of pre-trained ImageNet models correlate with image recommendation performance?

UGallery Data and Evaluation
• 1,371 users / 3,940 items / 2,846 transactions
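For a sense of scale, and assuming each transaction is a distinct user-item purchase, the interaction matrix is extremely sparse:

```latex
\begin{aligned}
\text{density} &= \frac{2{,}846}{1{,}371 \times 3{,}940} = \frac{2{,}846}{5{,}401{,}740} \approx 5.3 \times 10^{-4} \;(\approx 0.05\%) \\
\text{transactions per user} &= \frac{2{,}846}{1{,}371} \approx 2.08 \\
\text{transactions per item} &= \frac{2{,}846}{3{,}940} \approx 0.72
\end{aligned}
```

With fewer than one transaction per item on average, collaborative signals are thin, which is consistent with relying on content features such as visual embeddings.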

Recommendation
• Scoring items based on cosine similarity between
user model and item model:
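A minimal sketch of this scoring step, assuming the user model is the element-wise mean of the visual embeddings of the items the user has bought (a common content-based choice, not necessarily the one used in the paper). The names `user_model` and `recommend` and the dictionary layout are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def user_model(embeddings, purchased):
    """Hypothetical user model: element-wise mean of the purchased items' embeddings."""
    vectors = [embeddings[item] for item in purchased]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def recommend(embeddings, purchased, n=10):
    """Rank items the user has not bought by cosine similarity to the user model."""
    profile = user_model(embeddings, purchased)
    candidates = [item for item in embeddings if item not in purchased]
    candidates.sort(key=lambda item: cosine(profile, embeddings[item]), reverse=True)
    return candidates[:n]
```

For example, with `embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}` and a user who bought `"a"`, item `"b"` ranks first because its embedding points in almost the same direction. Swapping in features from a different pre-trained network changes only the `embeddings` dictionary, which is what makes this setup convenient for comparing CNNs.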

Experiment 1: Results
• No correlation between ImageNet performance and
image recommendation performance.
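Whether two rankings of models agree can be quantified with a rank correlation such as Spearman's rho, computed between each model's ImageNet accuracy and its recommendation metric. The slides do not say which coefficient, if any, was computed; the self-contained implementation below (average ranks for ties) is offered only as one reasonable option.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

Applied to the Top-1 column of the ImageNet table and each model's recommendation score, a rho close to zero would express "no correlation"; a rho close to 1 would mean better ImageNet models also rank higher for recommendation.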

Experiment 2
• What is the effect of fine-tuning?
• How should fine-tuning be performed?

Tuning I: Shallow vs. Deep
• Shallow fine-tuning
• Deep fine-tuning
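Shallow fine-tuning is assumed here to mean updating only the top layers of the pre-trained network, and deep fine-tuning to mean updating all of them. Stripped of any framework, the difference is which parameters receive gradient updates. The toy below is a scalar two-layer linear model with hypothetical names and data: `w1` stands in for the pre-trained layers and `w2` for the new head, and a parameter is frozen simply by leaving it out of `trainable` (in PyTorch one would instead set `requires_grad = False` on the frozen parameters).

```python
def loss(params, data):
    """Mean squared error of the toy model y_hat = w2 * (w1 * x)."""
    return sum((params["w2"] * params["w1"] * x - y) ** 2 for x, y in data) / len(data)

def sgd_step(params, data, lr, trainable):
    """One gradient-descent step that only updates parameters named in `trainable`."""
    w1, w2 = params["w1"], params["w2"]
    n = len(data)
    # Analytic gradients of the mean squared error with respect to each weight.
    grads = {
        "w1": sum(2 * (w2 * w1 * x - y) * w2 * x for x, y in data) / n,
        "w2": sum(2 * (w2 * w1 * x - y) * w1 * x for x, y in data) / n,
    }
    return {name: value - lr * grads[name] if name in trainable else value
            for name, value in params.items()}

SHALLOW = {"w2"}     # fine-tune only the head
DEEP = {"w1", "w2"}  # fine-tune the whole network

pretrained = {"w1": 1.0, "w2": 0.5}
data = [(1.0, 2.0), (2.0, 4.0)]
shallow = sgd_step(pretrained, data, 0.05, SHALLOW)
deep = sgd_step(pretrained, data, 0.05, DEEP)
```

After one step the shallow variant leaves `w1` untouched, while the deep variant adapts both weights and, in this toy case, reaches a lower loss; real networks add many layers but the freezing mechanism is the same.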

Learning: Multitask vs. Single Task
• Dataset 1: Omniart
– 432,217 images
– Target classes: artist, artwork type, year
• Dataset 2: Ugallery
– 3,940 images
– Target classes: artist, medium (oil, acrylic, etc.)
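In a typical multitask setup, one shared network feeds a separate classification head per target (for example artist and medium for UGallery), and the training loss combines the per-head losses; single-task learning keeps only one head. A minimal sketch of that combined loss follows, assuming every target is treated as a categorical label and tasks are weighted equally by default. The slides specify neither; treating Omniart's year as a class label in particular is an assumption, and the function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def multitask_loss(outputs, targets, weights=None):
    """Weighted sum of per-task cross-entropies.

    outputs: dict task -> logits from that task's head
    targets: dict task -> true class index
    weights: optional dict task -> weight (equal weights of 1.0 are a hypothetical default)
    """
    weights = weights or {task: 1.0 for task in outputs}
    return sum(weights[t] * cross_entropy(outputs[t], targets[t]) for t in outputs)
```

Passing a dictionary with a single task recovers the single-task loss, so the same training loop can run both variants.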

Omniart Dataset
• http://isis-data.science.uva.nl/strezoski/#3

Results 1
• Deep fine-tuning worked better than shallow fine-tuning

Results 2
• ResNet was better than shallow fine-tuning
• Consistent with Kornblith et al., ResNet is the best generic
visual feature extractor

Results 3
• Training with a smaller but focused target dataset results in better
transfer learning performance

Results 4
• There was no clear winner between multitask and single-task learning, probably because the artist label is already highly descriptive

Conclusion
• Pre-trained neural image embeddings are great, but do not assume that performance in the original task correlates with performance in your recommendation task.
• If you still plan to use a pre-trained ImageNet visual embedding, ResNet is a good option, although it is not the current SOTA in ILSVRC.
• Fine-tuning is strongly recommended, even if your dataset is small.
