
Do Better ImageNet Models Transfer Better... for Image Recommendation?


Article presented at the RecSysKTL workshop, co-located at ACM RecSys 2018

  1. Do Better ImageNet Models Transfer Better … for Image Recommendation? Felipe del Río, Pablo Messina, Vicente Dominguez, Denis Parra. CS Department, School of Engineering, Pontificia Universidad Católica de Chile. RecSysKTL Workshop, October 6, 2018
  2. Artwork Recommendation • Online artwork market: growing since 2008, despite global crises! – In 2011, art generated $11.57 billion in total global annual revenue, over $2 billion more than in 2010 (*forbes). • Previous recommendation projects date back as far as 2007, such as the CHIP project to recommend paintings from the Rijksmuseum. • Little use of recent advances in Deep Neural Networks for Computer Vision. October 6th, 2018 del Rio et al ~ RecSysKTL 2018 2 [forbes] The World's Strongest Economy? The Global Art Market. https://www.forbes.com/sites/abigailesman/2012/02/29/the-worlds-strongest-economy-the-global-art-market/ (2012)
  3. Image Recommendation • Since 2017 we have been working on recommending art images, using data from the online store UGallery. • Two papers published: – DLRS 2017: Dominguez, V., Messina, P., Parra, D., Mery, D., Trattner, C., & Soto, A. (2017, August). Comparing Neural and Attractiveness-based Visual Features for Artwork Recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems (pp. 55-59). ACM. – UMUAI 2018: Messina, P., Dominguez, V., Parra, D., Trattner, C., & Soto, A. (2018). Content-based artwork recommendation: integrating painting metadata with neural and manually-engineered visual features. User Modeling and User-Adapted Interaction, 1-40.
  4. Data: UGallery • Online artwork store, based in CA, USA. • Mostly sells one-of-a-kind physical artwork.
  5. Image Recommendation • Our top approach is a hybrid recommender based on metadata and visual features from Deep Convolutional Neural Networks.
  6. Motivation • When submitting our work we usually received criticism for not using the latest DNN model. • An actual review from a previous article submission (2017): << Overall an interesting paper although … the choice of AlexNet is rather odd as there are better pre-trained networks available e.g. VGG16 >>
  7. Motivation • Is it always the case that better pre-trained deep convolutional models (on the ImageNet Challenge) produce better results in a transfer learning setting?
  8. ImageNet: Crowdsourcing a Large Dataset of Image Labels
  9. Datasets in Computer Vision • 1996: faces and cars, 14,000 images of 10,000 people • 1998: MNIST, 70,000 images of handwritten digits • 2004: Caltech 101, 9,146 images of 101 categories • 2005: PASCAL VOC, 20,000 images with 20 classes
  10. Datasets in Computer Vision • ImageNet: presented in 2009 at CVPR • Crowdsourced • 14,197,122 images • 21,841 categories (non-empty synsets) • Categories based on the WordNet taxonomy
  11. WordNet • WordNet: Miller's project, started in 1980 at Princeton, a hierarchy for the English language • Prof. Fei-Fei Li (UIUC, Princeton, Stanford) worked on populating WordNet categories with images.
  12. Crowdsourced • Amazon Mechanical Turk • It took 2.5 years to complete. Originally 3.2 million images in 5,247 categories (mammal, vehicle, etc.)
  13. ImageNet Challenge • The dataset was used to set up a competition for image classification, running from 2010 on. • In 2012 a team used deep learning (Krizhevsky, Sutskever & Hinton) and brought the error rate below 25%: a 10.8-point margin, 41% better than the next best entry.
  14. Transfer Learning • The 2012 model was called AlexNet: a Convolutional Neural Network. • The features learned in its fully connected layers (fc6, fc7) have been used successfully to transfer the learning to other tasks.
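The fc6/fc7 transfer pattern can be sketched as follows. Everything here is a stand-in: a fixed random projection plays the role of the pretrained backbone, and the dimensions and 5-class head are illustrative assumptions, not the actual AlexNet weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: in the slides this would be AlexNet
# up to fc6/fc7; here a fixed random projection plays that role.
W_frozen = rng.standard_normal((3072, 256))

def extract_features(flat_images):
    """Frozen feature extractor: project and apply a ReLU, no training."""
    return np.maximum(flat_images @ W_frozen, 0.0)

# Transfer learning: reuse the frozen features and train only a small
# task-specific head (here just initialized, not trained) on top of them.
images = rng.standard_normal((8, 3072))        # 8 toy flattened "images"
features = extract_features(images)            # (8, 256) embeddings
W_head = rng.standard_normal((256, 5)) * 0.01  # new 5-class task head
logits = features @ W_head                     # per-class scores
```

The key design point is that only `W_head` would receive gradients; the backbone stays fixed, which is exactly what makes the pretrained features reusable across tasks.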
  15. Recent ImageNet results (top-1 / top-5 accuracy): NASNet Large 82.7 / 96.2 • InceptionResNetV2 80.4 / 95.3 • InceptionV3 78.0 / 93.9 • ResNet50 75.6 / 92.8 • VGG19 71.1 / 89.8. Source: https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models
  16. Inspiration • Simon Kornblith, Jonathon Shlens, and Quoc V. Le. 2018. Do Better ImageNet Models Transfer Better? https://arxiv.org/abs/1805.08974 • Without fine-tuning, ResNet outperforms NASNet (the SOTA)
  17. Evaluation 1 • Does pre-trained ImageNet model performance correlate with image recommendation performance?
  18. UGallery Data and Evaluation • 1,371 users / 3,940 items / 2,846 transactions
  19. Recommendation • Scoring items based on cosine similarity between the user model and the item model.
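A minimal sketch of this scoring scheme. The 3-dimensional embeddings and the mean-of-purchases user model are hypothetical stand-ins, not the actual visual features or user profiles from the paper:

```python
import numpy as np

def cosine(u, v, eps=1e-9):
    """Cosine similarity between two vectors, guarded against zero norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

# Hypothetical visual embeddings for 4 catalog items (one row per item).
items = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

# One simple user model: the average embedding of previously bought items.
purchased = [0, 1]
user_model = items[purchased].mean(axis=0)

# Score every non-purchased item and recommend the best match.
scores = {i: cosine(user_model, items[i])
          for i in range(len(items)) if i not in purchased}
best = max(scores, key=scores.get)   # item most similar to past purchases
```

With these toy vectors the user's history points along the first axis, so item 2 (slightly aligned with the history) outranks the orthogonal item 3.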
  20. Experiment 1: Results
  21. Experiment 1: Results • No correlation between ImageNet performance and image recommendation performance.
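One way to check such a claim is a rank correlation between the two metrics. The ImageNet top-1 accuracies below come from the table on slide 15; the paired recommendation scores are entirely hypothetical illustration values, not the paper's results:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    The double-argsort trick assumes all values are distinct (no ties).
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / (np.linalg.norm(rx) * np.linalg.norm(ry)))

# Top-1 accuracies: NASNet Large, InceptionResNetV2, InceptionV3,
# ResNet50, VGG19 (slide 15). Rec scores are made-up placeholders.
imagenet_top1 = [82.7, 80.4, 78.0, 75.6, 71.1]
rec_metric = [0.11, 0.14, 0.10, 0.15, 0.12]

rho = spearman(imagenet_top1, rec_metric)   # near zero => no clear relation
```

A |rho| close to 0 (as the slides report for the real measurements) means the ImageNet ranking does not predict the recommendation ranking.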
  22. Experiment 2 • What is the effect of fine-tuning? • How should fine-tuning be performed?
  23. Tuning I: Shallow vs. Deep • Shallow fine-tuning: only the new task-specific layers on top of the frozen pre-trained network are trained.
  24. Tuning I: Shallow vs. Deep • Deep fine-tuning: the pre-trained layers are also updated during training on the target task.
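The shallow/deep distinction can be reduced to which weights receive a gradient update. This toy two-layer model (random stand-in weights, a squared-error loss instead of the networks and losses used in the paper) shows the mechanics:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a "pretrained" base layer and a freshly added task head.
W_base = rng.standard_normal((4, 3)) * 0.1
W_head = rng.standard_normal((3, 2)) * 0.1

def finetune_step(W_base, W_head, x, lr=0.1, deep=False):
    """One gradient step on a toy squared-error loss.

    Shallow fine-tuning (deep=False) updates only the head;
    deep fine-tuning (deep=True) also backpropagates into the base.
    """
    h = x @ W_base                        # pretrained features
    y = h @ W_head                        # task-head output
    grad_y = 2.0 * (y - np.ones_like(y))  # d/dy of sum((y - 1)^2)
    grad_head = h.T @ grad_y
    grad_base = x.T @ (grad_y @ W_head.T)
    new_head = W_head - lr * grad_head
    new_base = W_base - lr * grad_base if deep else W_base
    return new_base, new_head

x = rng.standard_normal((2, 4))
base_s, head_s = finetune_step(W_base, W_head, x, deep=False)  # shallow
base_d, head_d = finetune_step(W_base, W_head, x, deep=True)   # deep
```

After the shallow step the base weights are bit-identical to the originals; after the deep step they have moved, which is the entire difference between the two regimes.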
  25. Learning: Multitask vs. Single Task • Dataset 1: OmniArt – 432,217 images – Target classes: artist, artwork type, year • Dataset 2: UGallery – 3,940 images – Target classes: artist, medium (oil, acrylic, etc.)
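Structurally, the multitask setting adds extra prediction heads on a shared backbone, one per target class listed above. This sketch uses random stand-in features and hypothetical head sizes (10 artists, 4 media), not the real OmniArt or UGallery label spaces:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Shared backbone embeddings for a batch of 5 artworks (stand-in values).
features = rng.standard_normal((5, 16))

# Single task: one head, e.g. predicting the artist.
W_artist = rng.standard_normal((16, 10)) * 0.1   # 10 hypothetical artists

# Multitask: the same shared features also feed a second head.
W_medium = rng.standard_normal((16, 4)) * 0.1    # 4 hypothetical media

artist_probs = softmax(features @ W_artist)      # used in both settings
medium_probs = softmax(features @ W_medium)      # extra head, multitask only

# In multitask training the per-head losses are summed, so the shared
# backbone receives gradient signal from every target at once.
```

The design question the slides test is whether that extra signal from the second head helps the shared features, or whether a single descriptive target (artist) already suffices.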
  26. OmniArt Dataset • http://isis-data.science.uva.nl/strezoski/#3
  27. Results 1 • Deep fine-tuning worked better than shallow fine-tuning
  28. Results 2 • ResNet was better than shallow fine-tuning • Consistent with Kornblith et al., ResNet is the best generic visual feature extractor
  29. Results 3 • Training with a smaller but focused target dataset results in better transfer learning performance
  30. Results 4 • There was no clear winner between multitask and single-task learning, probably because the artist category is already highly descriptive
  31. Conclusion • Pre-trained neural image embeddings are great, but do not assume that performance on the original task is correlated with your current recommendation task. • If you are still going to use a pre-trained ImageNet visual embedding, ResNet is a good option, although it is not the current SOTA in ILSVRC. • Fine-tuning is strongly suggested, even if your dataset is small.
  32. THANKS! dparra@ing.puc.cl
