Unsupervised visual representation learning overview: Toward Self-Supervision

1. 17th November, 2019, PR12 Paper Review
   Ho Seong Lee, Cognex + SUALAB
   Unsupervised Visual Representation Learning Overview: "Toward Self-Supervision"
2. Contents
   • What is "Self-Supervision"?
   • Self-Supervised Visual Representation Learning
     • Exemplar
     • Relative Patch Location
     • Jigsaw Puzzles
     • Count
     • Multi-task
     • Rotation
     • Autoencoder-Based Approaches
3. What is "Self-Supervision"?
   • Supervised learning is powerful, but it needs a large amount of labeled data
     • Much research is in progress to tackle this problem: transfer learning, domain adaptation, semi-supervised, weakly-supervised, and unsupervised learning
   • Self-Supervised Visual Representation Learning
     • A sub-class of unsupervised learning in which the data itself provides the supervision
     • Defines pretext tasks that can be formulated using only unlabeled data but require higher-level semantic understanding to be solved
     • The features obtained with pretext tasks can be successfully transferred to classification and detection tasks
4. What is "Self-Supervision"?
   • Pretext tasks in Self-Supervised Visual Representation Learning
     • Exemplar, 2014 NIPS
     • Relative Patch Location, 2015 ICCV
     • Jigsaw Puzzles, 2016 ECCV
     • Autoencoder-Based Approaches: Denoising Autoencoder (2008), Context Autoencoder (2016), Colorization (2016), Split-Brain Autoencoders (2017)
     • Count, 2017 ICCV
     • Multi-task, 2017 ICCV
     • Rotation, 2018 ICLR
5. Self-Supervised Visual Representation Learning – Exemplar
   • "Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks", 2014 NIPS
   • Randomly sample N ∈ [50, 32000] patches of size 32x32 from regions containing considerable gradients (trained on the STL-10 dataset, 96x96)
   • Apply various transformations to each randomly sampled "seed" patch
   • Train to classify all transformed versions of a seed as the same (surrogate) class
     → this does not scale to large datasets, since the number of classes grows with the number of seed patches!
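The surrogate-class construction above can be sketched as follows. This is a minimal numpy illustration, not the paper's pipeline: the transform set here (flips and 90° rotations) is a simplified stand-in for the full augmentation family, and the function name is hypothetical.

```python
import numpy as np

def make_exemplar_batch(images, n_seeds=8, patch=32, n_aug=4, seed=0):
    """Build one surrogate-classification batch: each randomly sampled
    'seed' patch defines its own class, and every transformed copy of
    that patch carries the seed's index as its label."""
    rng = np.random.default_rng(seed)
    patches, labels = [], []
    for i in range(n_seeds):
        img = images[rng.integers(len(images))]
        y = rng.integers(0, img.shape[0] - patch + 1)
        x = rng.integers(0, img.shape[1] - patch + 1)
        seed_patch = img[y:y + patch, x:x + patch]
        for _ in range(n_aug):
            p = seed_patch.copy()
            if rng.random() < 0.5:                   # random horizontal flip
                p = p[:, ::-1]
            p = np.rot90(p, k=int(rng.integers(4)))  # random 90-degree rotation
            patches.append(p)
            labels.append(i)                         # surrogate class = seed index
    return np.stack(patches), np.array(labels)
```

The scalability problem is visible here: n_seeds is also the number of output classes, so a large dataset would need an enormous classification head.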
6. Self-Supervised Visual Representation Learning – Relative Patch Location
   • "Unsupervised Visual Representation Learning by Context Prediction", 2015 ICCV
   • Aims at self-supervised learning for image data using context prediction
   • The algorithm must guess the position of one patch relative to another
7. Self-Supervised Visual Representation Learning – Relative Patch Location
   • "Unsupervised Visual Representation Learning by Context Prediction", 2015 ICCV
   • AlexNet-based architecture with shared weights for pair classification
   • Avoid "trivial" solutions with two precautions: include a gap between patches, and randomly jitter each patch location
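The pair-sampling step with both precautions can be sketched as below. This is a hedged numpy illustration (function name and default sizes are assumptions, not the paper's exact values): a center patch and one of its 8 grid neighbours are cropped, with a gap between grid cells and independent jitter of each location, so low-level cues like edge continuity cannot trivially solve the task.

```python
import numpy as np

# The eight neighbour positions around the center patch, indexed 0..7;
# the network must predict this index from the pair of patches.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
           (0, 1), (1, -1), (1, 0), (1, 1)]

def sample_patch_pair(img, patch=32, gap=8, jitter=4, seed=0):
    """Sample (center, neighbour, label) for relative-location prediction."""
    rng = np.random.default_rng(seed)
    step = patch + gap                       # grid stride includes the gap
    lo = step + jitter                       # leave room for a neighbour + jitter
    cy = int(rng.integers(lo, img.shape[0] - patch - lo + 1))
    cx = int(rng.integers(lo, img.shape[1] - patch - lo + 1))
    label = int(rng.integers(8))
    dy, dx = OFFSETS[label]
    ny, nx = cy + dy * step, cx + dx * step
    # jitter each patch location independently
    cy += int(rng.integers(-jitter, jitter + 1))
    cx += int(rng.integers(-jitter, jitter + 1))
    ny += int(rng.integers(-jitter, jitter + 1))
    nx += int(rng.integers(-jitter, jitter + 1))
    center = img[cy:cy + patch, cx:cx + patch]
    neighbour = img[ny:ny + patch, nx:nx + patch]
    return center, neighbour, label
```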
8. Self-Supervised Visual Representation Learning – Jigsaw Puzzles
   • "Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles", 2016 ECCV
   • Recover the relative spatial positions of 9 randomly sampled image patches after a random permutation
   • 9! = 362,880 possible permutations, so similar permutations are removed → a predefined permutation set of 100 is used
   • The network output is a 100-d vector that predicts the permutation index
   • Pipeline: sample an image → extract 9 patches → permute them (e.g. 9, 5, 8, 3, 2, 4, 7, 1, 6) → predict the index (0–99, e.g. 61)
9. Self-Supervised Visual Representation Learning – Jigsaw Puzzles
   • "Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles", 2016 ECCV
   • Proposes the Context-Free Network (CFN), a siamese-ennead CNN
   • Fewer parameters than AlexNet while preserving the same semantic learning capabilities
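The predefined permutation set can be built greedily, as sketched below. This is a simplified stand-in for the paper's selection procedure: from a pool of random candidate permutations, we repeatedly keep the one with maximal minimum Hamming distance to those already chosen, so the 100 kept permutations are mutually dissimilar (pool size and function names are assumptions).

```python
import numpy as np

def build_permutation_set(n_perms=100, n_patches=9, pool_size=2000, seed=0):
    """Greedily select n_perms mutually dissimilar permutations."""
    rng = np.random.default_rng(seed)
    pool = np.stack([rng.permutation(n_patches) for _ in range(pool_size)])
    chosen = [pool[0]]
    # dmin[i] = Hamming distance from pool[i] to the nearest chosen perm
    dmin = (pool != pool[0]).sum(axis=1)
    for _ in range(n_perms - 1):
        idx = int(dmin.argmax())             # farthest remaining candidate
        chosen.append(pool[idx])
        dmin = np.minimum(dmin, (pool != pool[idx]).sum(axis=1))
    return np.stack(chosen)

def permute_patches(patches, perm):
    """Reorder the 9 patches by one chosen permutation; the pretext label
    is that permutation's index in the set."""
    return [patches[p] for p in perm]
```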
10. Self-Supervised Visual Representation Learning – Autoencoder-Based Approaches
   • Denoising Autoencoder, Context Autoencoder, Colorization, Split-Brain Autoencoders
   • Learn image features by reconstructing images without any annotation (e.g. removing random noise, inpainting a masked region, predicting color from grayscale, or predicting one half of the channels from the other)
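As one concrete instance of this family, a denoising autoencoder can be sketched in a few lines of numpy. This is a deliberately tiny single-hidden-layer model with manual gradients, not any of the papers' architectures: the input is corrupted with masking noise, and the network is trained to reconstruct the clean input.

```python
import numpy as np

def train_denoising_ae(X, hidden=16, noise=0.3, lr=0.05, epochs=300, seed=0):
    """Train a minimal denoising autoencoder; returns per-epoch losses."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        Xn = X * (rng.random(X.shape) > noise)   # masking corruption
        H = np.tanh(Xn @ W1 + b1)                # encoder
        Xr = H @ W2 + b2                         # linear decoder
        err = Xr - X                             # target is the *clean* input
        losses.append(float((err ** 2).mean()))
        # gradients of the squared reconstruction error
        gW2 = H.T @ err / n; gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1 - H ** 2)         # tanh backprop
        gW1 = Xn.T @ dH / n; gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return losses
```

Because the target is the uncorrupted input, the encoder is forced to capture structure that survives corruption; the other three methods on this slide change only what is hidden (a region, color, or half the channels) and what must be predicted.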
11. Self-Supervised Visual Representation Learning – Count
   • "Representation Learning by Learning to Count", 2017 ICCV
   • The number of visual primitives in the whole image should match the sum of those in each tile
   • A feature that counts visual primitives should also be unaffected by scale, translation, and rotation
   • This work uses downsampling (D) and tiling (T_j, j = 1, 2, 3, 4)
   • (The counts shown in the figure are not labels, just an example for explanation!)
12. Self-Supervised Visual Representation Learning – Count
   • "Representation Learning by Learning to Count", 2017 ICCV
   • The feature vector (counting vector) is used to compute the loss
   • An l2 loss alone admits the trivial solution of all-zero features, so a contrastive loss is used as well
   • It enforces that the counting features differ between two randomly chosen different images
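The combined objective can be written down directly. A hedged sketch (the margin value and argument names are illustrative): the downsampled image's feature should equal the sum of the four tiles' features, while staying at least a margin away from another random image's feature, which rules out the all-zero solution.

```python
import numpy as np

def counting_loss(f_down, f_tiles, f_other, margin=10.0):
    """l2 counting term plus a contrastive term against another image.
    f_down:  feature of the downsampled whole image, shape (d,)
    f_tiles: features of the four tiles, shape (4, d)
    f_other: feature of a different random image, shape (d,)"""
    tile_sum = f_tiles.sum(axis=0)
    l2 = float(((tile_sum - f_down) ** 2).sum())
    dist_other = float(((tile_sum - f_other) ** 2).sum())
    contrastive = max(0.0, margin - dist_other)   # hinge keeps features apart
    return l2 + contrastive
```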
13. Self-Supervised Visual Representation Learning – Multi-task
   • "Multi-task Self-Supervised Visual Learning", 2017 ICCV
   • Implements four different self-supervision methods in one single neural network:
     Relative Patch Location + Colorization + Exemplar + Motion Segmentation
   • Evaluated on ImageNet (classification), PASCAL VOC 2007 (detection), and NYU V2 (depth prediction)
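Structurally, combining four pretext tasks means one shared trunk feeding several task-specific heads, each with its own loss. A toy sketch (head names, sizes, and the linear trunk are illustrative, not the paper's architecture):

```python
import numpy as np

class MultiTaskNet:
    """One shared trunk, one linear head per pretext task."""
    def __init__(self, d_in, d_feat, head_dims, seed=0):
        rng = np.random.default_rng(seed)
        self.trunk = rng.normal(0, 0.1, (d_in, d_feat))
        self.heads = {name: rng.normal(0, 0.1, (d_feat, k))
                      for name, k in head_dims.items()}

    def forward(self, x):
        feat = np.maximum(x @ self.trunk, 0.0)   # shared ReLU representation
        return {name: feat @ W for name, W in self.heads.items()}
```

Only the shared representation is transferred to downstream tasks; the per-task heads are discarded after pre-training.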
14. Self-Supervised Visual Representation Learning – Rotation
   • "Unsupervised Representation Learning by Predicting Image Rotations", 2018 ICLR
   • Rotate a single image and classify which rotation was applied: {0°, 90°, 180°, 270°}
   • Intuitively, a good model should learn to recognize the canonical orientations of objects in natural images
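Generating the pretext data for this task is a one-liner: the four rotated copies of an image, labeled 0..3 for 0°, 90°, 180°, and 270° (the function name is an assumption).

```python
import numpy as np

def make_rotation_batch(img):
    """Return the four rotations of img and their class labels 0..3."""
    rotations = np.stack([np.rot90(img, k) for k in range(4)])
    labels = np.arange(4)
    return rotations, labels
```

No human annotation is needed: the label is the transformation the code itself applied.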
15. Self-Supervised Visual Representation Learning
   • Task generalization of self-supervised learning: ImageNet classification
     • All unsupervised methods are pre-trained on ImageNet without labels (in an unsupervised way)
     • All weights are frozen, and feature maps are spatially resized so as to have around 9,000 elements
     • Linear classifiers are trained on top of the feature maps of each layer by logistic regression
     • All approaches use AlexNet variants
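The linear-probe protocol amounts to logistic regression on frozen features. A minimal numpy sketch of that evaluation (function names and the toy optimizer are assumptions; in the papers this is done per layer on resized feature maps): only the linear layer (W, b) is trained, the feature extractor is never updated.

```python
import numpy as np

def linear_probe(feats, labels, n_cls, lr=0.5, epochs=300):
    """Train a softmax classifier on fixed features via gradient descent."""
    n, d = feats.shape
    W = np.zeros((d, n_cls)); b = np.zeros(n_cls)
    Y = np.eye(n_cls)[labels]                    # one-hot targets
    for _ in range(epochs):
        z = feats @ W + b
        z -= z.max(axis=1, keepdims=True)        # numerically stable softmax
        p = np.exp(z); p /= p.sum(axis=1, keepdims=True)
        g = (p - Y) / n                          # cross-entropy gradient
        W -= lr * feats.T @ g
        b -= lr * g.sum(axis=0)
    return W, b

def probe_accuracy(W, b, feats, labels):
    return float(((feats @ W + b).argmax(axis=1) == labels).mean())
```

The probe's accuracy is then reported as the quality measure of the frozen representation.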
16. Self-Supervised Visual Representation Learning
   • Task generalization of self-supervised learning: ImageNet classification
     • SGD with batch size 192, momentum 0.9, weight decay 5e-4, learning rate 0.01
     • Learning rate decayed by a factor of 10 after epochs 10 and 20; trained for 30 epochs in total
   • (Table: ImageNet top-1 classification accuracy of the self-supervised methods)
17. Self-Supervised Visual Representation Learning
   • Task & dataset generalization of self-supervised learning: PASCAL VOC
     • PASCAL VOC 2007 classification and detection, and PASCAL VOC 2012 segmentation
     • All unsupervised methods are pre-trained on ImageNet without labels (in an unsupervised way), then fine-tuned on the target task
18. Self-Supervised Visual Representation Learning
   • Recent papers not covered today:
     • Deep Cluster (2018, ECCV)
     • Revisiting Self-Supervised Visual Representation Learning (2019, CVPR)
     • Selfie (2019, arXiv)
     • Deeper Cluster (2019, ICCV)
     • S4L (2019, ICCV)
19. Self-Supervised Visual Representation Learning
   • Summary
     • Define pretext tasks that can be formulated using only unlabeled data but require higher-level semantic understanding to be solved
     • Pre-train a feature extractor on pretext tasks, then transfer it to downstream tasks (classification, detection, etc.)
20. Thanks!
