Exploring Simple Siamese Representation Learning
Sangmin Woo
School of Electrical Engineering and Computer Science
Gwangju Institute of Science and Technology (GIST)
2021.06.23
2 / 32
In a nutshell…
• Siamese networks [1] maximize the similarity between two augmentations of
one image, subject to certain conditions for avoiding collapsing solutions
• Collapsing solutions do exist for the loss and structure, but a stop-gradient
operation plays an essential role in preventing collapsing
• A simple Siamese network (SimSiam) shows surprising results without using:
• Negative sample pairs
• Large batches
• Momentum encoders
[1] Koch, G., Zemel, R., & Salakhutdinov, R. (2015, July). Siamese neural networks for one-shot image recognition. In ICML deep learning workshop (Vol. 2).
3 / 32
Siamese networks are weight-sharing neural networks applied on two or more inputs.
They are natural tools for comparing entities.
4 / 32
An undesired trivial solution to Siamese networks is
all outputs “collapsing” to a constant.
5 / 32
Stop the gradient from flowing backward (TensorFlow: tf.stop_gradient; PyTorch: Tensor.detach()).
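For concreteness, a minimal PyTorch sketch (illustrative values only) of what detach does:

import torch

z = torch.randn(4, 8, requires_grad=True)
p = 2.0 * z

# (p - z.detach()) treats z as a constant on the right-hand side:
# gradients flow back through the p branch only.
loss = (p - z.detach()).pow(2).mean()
loss.backward()  # z.grad is populated solely via p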
6 / 32
Typical strategies for preventing “collapsing”
7 / 32
Siamese Networks
8 / 32
Siamese Networks
9 / 32
Siamese Networks
• Recent methods define the inputs as two augmentations of one image, and
maximize the similarity subject to different conditions.
• Siamese networks can naturally introduce inductive biases for modeling
“invariance”
• Convolution is a successful inductive bias via weight-sharing for modeling
translation-invariance
• Weight-sharing Siamese networks can model augmentation-invariance
10 / 32
Strategies for Preventing “Collapsing”
• Contrastive Learning
• Repels different images (negative pairs) while attracting two views of the same image (positive pairs)
• SimCLR [2] (Google; cited by ~1500)
• Online Clustering
• SwAV [3] (FAIR; cited by ~200)
• Momentum Encoder
• MoCo [4] (FAIR; cited by ~1250), BYOL [5] (DeepMind; cited by ~350)
[2] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020, November). A simple framework for contrastive learning of visual representations. In International
conference on machine learning (pp. 1597-1607). PMLR.
[3] Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., & Joulin, A. (2020). Unsupervised learning of visual features by contrasting cluster
assignments. arXiv preprint arXiv:2006.09882.
[4] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 9729-9738).
[5] Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., ... & Valko, M. (2020). Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733.
11 / 32
Contrastive Learning
• SimCLR [2]
[2] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020, November). A simple framework for contrastive learning of visual representations. In International
conference on machine learning (pp. 1597-1607). PMLR.
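To make the contrastive mechanism concrete, below is a minimal NT-Xent-style loss in the spirit of SimCLR (a simplified sketch, not the official implementation; z1 and z2 are assumed to be the embeddings of two augmented views of the same batch):

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # NT-Xent: the two views of each image are positives; every other
    # sample in the 2N-sized batch acts as a negative.
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, d)
    sim = z @ z.t() / temperature                       # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))          # drop self-similarity
    # For row i, the positive is the other view of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets.to(z.device))

Because every other sample in the batch serves as a negative, the loss benefits from many negatives, which is why SimCLR requires a large batch size to work well.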
12 / 32
Clustering
• SwAV [3]
[3] Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., & Joulin, A. (2020). Unsupervised learning of visual features by contrasting cluster
assignments. arXiv preprint arXiv:2006.09882.
13 / 32
Momentum Encoder
• MoCo [4]
• Only the parameters 𝜃𝑞 are updated by back-propagation
• The momentum update makes 𝜃𝑘 evolve more smoothly than 𝜃𝑞
• As a result, though the keys in the queue are encoded by different encoders (in
different mini-batches), the difference among these encoders can be made small.
[4] He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. (2020). Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (pp. 9729-9738).
Momentum update rule: 𝜃_𝑘 ← 𝑚·𝜃_𝑘 + (1 − 𝑚)·𝜃_𝑞, where 𝑚 ∈ [0, 1) is the momentum coefficient
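A minimal PyTorch sketch of this EMA update (assuming encoder_q and encoder_k are architecturally identical modules):

import torch

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # The key encoder trails the query encoder smoothly; it is never
    # updated by gradients, only by this rule.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1.0 - m)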
14 / 32
Momentum Encoder
[5] Grill, J. B., Strub, F., Altché, F., Tallec, C., Richemond, P. H., Buchatskaya, E., ... & Valko, M. (2020). Bootstrap your own latent: A new approach to self-
supervised learning. arXiv preprint arXiv:2006.07733.
• BYOL [5]
• Relies only on positive pairs, but does not collapse when a momentum encoder is used
• BYOL minimizes a similarity loss between 𝑞_𝜃(𝑧_𝜃) and sg(𝑧′_𝜉)
• where 𝜃 are the trained weights, 𝜉 is an exponential moving average of 𝜃, and sg means stop-gradient
• At the end of training, everything but 𝑓𝜃 is discarded, and 𝑦𝜃 is used as the image
representation
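A small sketch of BYOL's regression loss under these definitions (illustrative names: q_z stands for 𝑞_𝜃(𝑧_𝜃), z_target for 𝑧′_𝜉):

import torch
import torch.nn.functional as F

def byol_loss(q_z, z_target):
    # MSE between l2-normalized prediction and target; .detach() plays
    # the role of sg(.) on the momentum branch's output.
    p = F.normalize(q_z, dim=1)
    z = F.normalize(z_target.detach(), dim=1)
    return (2.0 - 2.0 * (p * z).sum(dim=1)).mean()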
15 / 32
SimSiam
• SimSiam works surprisingly well without any of the typical strategies for preventing “collapsing”:
• Negative pairs
• Momentum encoder
• Large-batch training
16 / 32
SimSiam
(Figure: SimSiam architecture. The two branches share weights; stop-gradient is applied to one branch, so that branch is not updated by its parameters' gradients.)
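A minimal PyTorch sketch of the symmetrized SimSiam objective, following the paper's pseudocode (f: encoder = backbone + projection MLP; h: prediction MLP):

import torch
import torch.nn.functional as F

def neg_cosine(p, z):
    # Negative cosine similarity; detach() is the stop-gradient,
    # so z is treated as a constant target.
    return -F.cosine_similarity(p, z.detach(), dim=1).mean()

def simsiam_loss(f, h, x1, x2):
    # Symmetrized loss for two augmented views x1, x2 of the same images.
    z1, z2 = f(x1), f(x2)   # projections
    p1, p2 = h(z1), h(z2)   # predictions
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

Note that neither negative pairs nor a momentum encoder appears anywhere in the loss.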
17 / 32
SimSiam
• “SimSiam” = “BYOL without Momentum Encoder”
• Hypothesis of factors involved in preventing “collapsing”
• BYOL: momentum encoder
• SimSiam: Stop-gradient operation
• This discovery can be obscured by the use of a momentum encoder, which is always accompanied by stop-gradient (as it is not updated by its parameters' gradients).
• While the moving-average behavior may improve accuracy with an appropriate momentum coefficient, experiments show that the momentum encoder is not directly related to preventing collapsing.
18 / 32
SimSiam
19 / 32
SimSiam
20 / 32
SimSiam
21 / 32
Empirical Study
• with vs. without Stop-gradient
• (left) Without stop-gradient, the optimizer quickly finds a degenerate solution and reaches the minimum possible loss of −𝟏
• (mid) If the outputs collapse to a constant vector, their standard deviation over all samples becomes zero for each channel (see the snippet below)
• (right) The validation accuracy of a kNN classifier can serve as a monitor of training progress. With stop-gradient, the kNN monitor shows steadily improving accuracy.
• (table) Linear evaluation result
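A tiny sketch of this collapse indicator (assuming z holds the network outputs for a batch, one row per sample):

import torch
import torch.nn.functional as F

def output_std(z):
    # Per-channel std of l2-normalized outputs, averaged over channels.
    # Near 0 indicates collapse; around 1/sqrt(d) indicates a healthy spread.
    z = F.normalize(z, dim=1)
    return z.std(dim=0).mean()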
22 / 32
Discussion
• Stop-gradient
• Experiments empirically show there exist collapsing solutions.
• However, a stop-gradient operation can play a key role in preventing such
solutions.
• The optimizer (batch size), batch normalization, similarity function, and
symmetrization may affect accuracy, but we have seen no evidence that they are
related to collapse prevention.
• The importance of stop-gradient suggests that there should be a different
underlying optimization problem that is being solved.
• The authors hypothesize that there are implicitly two sets of variables, and
SimSiam behaves like alternating between optimizing each set.
23 / 32
Hypothesis
• “SimSiam” = an “Expectation-Maximization”-like algorithm
• It implicitly involves two sets of variables, and solves two underlying sub-problems
• Loss function: ℒ(𝜃, 𝜂) = 𝔼_{𝑥,𝒯}[‖ℱ_𝜃(𝒯(𝑥)) − 𝜂_𝑥‖₂²]
• ℱ: network parameterized by 𝜃
• 𝒯: augmentation
• 𝑥: image
• 𝔼: expectation over the distribution of images (𝑥) and augmentations (𝒯)
• ‖⋅‖₂²: mean squared error (equivalent to cosine similarity for 𝑙2-normalized vectors)
• 𝜂_𝑥: representation of the image 𝑥 (another set of variables)
24 / 32
Hypothesis
• Objective: min_{𝜃,𝜂} ℒ(𝜃, 𝜂)
• where ℒ(𝜃, 𝜂) = 𝔼_{𝑥,𝒯}[‖ℱ_𝜃(𝒯(𝑥)) − 𝜂_𝑥‖₂²]
• This formulation is analogous to k-means clustering
• Variable 𝜃 is analogous to the clustering centers
• i.e., learnable parameters of an encoder
• Variable 𝜂𝑥 is analogous to the assignment vector of the sample 𝒙 (one-hot vector
in k-means)
• i.e., representation of 𝑥
25 / 32
Hypothesis
• Solve by an alternating algorithm: fix one set of variables and solve for the other set
• 𝜃^𝑡 ← argmin_𝜃 ℒ(𝜃, 𝜂^{𝑡−1}) ⋯ (1)
• 𝜂^𝑡 ← argmin_𝜂 ℒ(𝜃^𝑡, 𝜂) ⋯ (2)
• 𝑡: index of alternation
• Solving for 𝜽 (clustering centers): SGD can be used to solve (1)
• The stop-gradient operation is a natural consequence, because the gradient does not back-propagate to 𝜂^{𝑡−1}, which is a constant in this sub-problem
• Solving for 𝜼 (assignment vectors): (2) can be solved independently for each 𝜂_𝑥. The problem is now to minimize 𝔼_𝒯[‖ℱ_{𝜃^𝑡}(𝒯(𝑥)) − 𝜂_𝑥‖₂²] for each image 𝑥
• 𝜂_𝑥 is assigned the average representation of 𝑥 over the distribution of augmentations: 𝜂_𝑥^𝑡 ← 𝔼_𝒯[ℱ_{𝜃^𝑡}(𝒯(𝑥))]
26 / 32
Hypothesis
• One-step alternation
• This can be approximated by sampling the augmentation 𝒯 only once, denoted as 𝒯′, and ignoring 𝔼_𝒯[⋅]: 𝜂_𝑥^𝑡 ← ℱ_{𝜃^𝑡}(𝒯′(𝑥))
• Inserting this into sub-problem (1), we have: 𝜃^{𝑡+1} ← argmin_𝜃 𝔼_{𝑥,𝒯}[‖ℱ_𝜃(𝒯(𝑥)) − ℱ_{𝜃^𝑡}(𝒯′(𝑥))‖₂²]
• 𝜽^𝒕 is a constant in this sub-problem, and 𝓣′ implies another view due to its random nature
• This update has the SimSiam form: the 𝒯′ branch is treated as a constant target, which is exactly the stop-gradient operation
27 / 32
Proof-of-concept
• Multi-step alternation
• Hypothesis: “SimSiam algorithm is like alternating between (1) and (2)”
• In this variant, 𝒕 is treated as the index of an outer loop
• And the sub-problem in (1) is updated by an inner loop of 𝒌 SGD steps
• In each alternation, the 𝜂_𝑥 required for all 𝑘 SGD steps are pre-computed and cached in memory; then 𝑘 SGD steps are performed to update 𝜃 (see the sketch below)
• 1-step = SimSiam
• Alternating optimization is a valid formulation
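A rough sketch of this multi-step variant (hypothetical setup: images is a list of image batches, aug is a random augmentation function, and f, h, optimizer are as in the SimSiam sketch above):

import torch
import torch.nn.functional as F

def multi_step_alternation(f, h, optimizer, images, aug, k=10):
    # Step (2): cache eta_x with the current encoder, using one sampled
    # augmentation per image; these targets are constants (stop-gradient).
    with torch.no_grad():
        eta = [f(aug(x)) for x in images]
    # Step (1): k SGD steps on theta against the cached targets.
    for _ in range(k):
        for x, target in zip(images, eta):
            p = h(f(aug(x)))  # prediction from a fresh random view
            loss = -F.cosine_similarity(p, target, dim=1).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()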
28 / 32
Discussion
• SimSiam's non-collapsing behavior remains an open question…
• This is the authors' understanding:
• Alternating optimization provides a different trajectory, and the trajectory depends
on the initialization
• It is unlikely that the initialized 𝜼, which is the output of a randomly initialized
network, would be a constant
• Starting from this initialization, it may be difficult for the alternating optimizer to
approach a constant 𝜼𝒙 for all 𝒙
• because the method does not compute the gradients w.r.t. 𝜼 jointly for all 𝒙
29 / 32
Comparisons
• ImageNet Linear Evaluation
• All methods are based on ResNet-50, pre-trained with two 224×224 views
• Small batch size
• No negative pairs
• No momentum encoder
• SimSiam achieves the highest accuracy among all methods under 100-epoch pre-training
30 / 32
Comparisons
• Transfer Learning
• Representation quality is compared by transferring the pre-trained models to other tasks
• The pre-trained models are fine-tuned end-to-end on the target datasets
• SimSiam, base: the same pre-training recipe as in the ImageNet experiment
• SimSiam, optimal: another recipe that produces better results on all tasks
31 / 32
Takeaways
• Siamese networks are a natural and effective tool for modeling invariance, which is a focus of representation learning
• The Siamese structure may be a core reason for recent methods' success
• Stop-gradient helps to avoid collapsing solutions
Thank You
shmwoo9395@{gist.ac.kr, gmail.com}

Editor's Notes

  1. Invariance means that two observations of the same concept should produce the same outputs
  2. There have been several general strategies for preventing Siamese networks from collapsing.
  3. Google + Hinton. SimCLR directly uses negative samples coexisting in the current batch, and it requires a large batch size to work well.
  4. They alternate between clustering the representations and learning to predict the cluster assignment. SwAV incorporates clustering into a Siamese network, by computing the assignment from one view and predicting it from another view.
  5. In a Siamese network, MoCo maintains a queue of negative samples and turns one branch into a momentum encoder to improve consistency of the queue.
  6. Two randomly augmented views of one image are processed by the same encoder network f (a backbone plus a projection MLP). Then a prediction MLP h is applied on one side, and a stop-gradient operation is applied on the other side. The model maximizes the similarity between both sides. It uses neither negative pairs nor a momentum encoder.
  7. The architectures and all hyper-parameters are kept unchanged, and stop-gradient is the only difference.
  8. 1) Fix 𝜂^{𝑡−1} and solve for the 𝜃 that minimizes the loss. 2) Fix 𝜃^𝑡 and solve for the 𝜂 that minimizes the loss. 3) Repeat.