PR-433
Gandelsman, Yossi, et al. "Test-time training with masked autoencoders." Advances in Neural Information
Processing Systems 35 (2022): 29374-29385.
Sunghoon Joo (주성훈), VUNO Inc.
2023. 4. 16.
1. Research Background
2. Methods
3. Experimental Results
4. Conclusions
1. Research Background 3
Reference
Sun, Yu, et al. "Test-time training with self-supervision for generalization under distribution
shifts." International conference on machine learning. PMLR, 2020.
•https://yueatsprograms.github.io/ttt/home.html
1. Research Background 4
https://yueatsprograms.github.io/ttt/home.html
1. Research Background 5
Problem setting
Generalization under distribution shifts
•Generalization is intrinsically hard without access to training data from the test distribution.
•The common practice is to avoid distribution shifts altogether by using a wider training distribution that hopefully contains the test distribution, e.g., with more training data or data augmentation.
Geirhos, Robert, et al. "Generalisation in humans and deep neural networks." Advances in neural information processing systems 31 (2018).
(Figure: example images corrupted with salt-and-pepper noise and uniform noise, from Geirhos et al., 2018.)
It is hard to know the test distribution in advance!
1. Research Background 6
Test time training (Sun et al., ICML, 2020)
1. Research Background 7
Test time training (Sun et al., ICML, 2020)
•The self-supervised pretext task employed by TTT is rotation prediction (predicting which of four rotations was applied to the input).
•This task is limited in generality: it can often be too easy or too hard, e.g., it is ill-posed for rotation-invariant images. (A minimal sketch of the pretext loss is shown below.)
https://yueatsprograms.github.io/ttt/home.html
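For reference, the rotation-prediction pretext task of the original TTT can be written as a four-way classification problem. This is a minimal PyTorch sketch, not the authors' code: `encoder` and `rot_head` are placeholder callables, and the batch layout (NCHW) is an assumption.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images: torch.Tensor):
    """Rotate each image by 0/90/180/270 degrees and return the rotated
    images together with the rotation label (0..3) to be predicted."""
    rotated, labels = [], []
    for k in range(4):  # k * 90 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k,
                                 dtype=torch.long, device=images.device))
    return torch.cat(rotated), torch.cat(labels)

def rotation_loss(encoder, rot_head, images):
    """Self-supervised loss of the original TTT: predict the rotation."""
    x, y = rotation_pretext_batch(images)
    logits = rot_head(encoder(x))
    return F.cross_entropy(logits, y)
```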
1. Research Background 8
Autoencoders for representation learning
The most successful recent work is masked autoencoders (MAE).
•He, Kaiming, et al. "Masked autoencoders are scalable vision learners." CVPR, 2022.
•PR-355
The proposed method simply substitutes MAE for the self-supervised part of TTT.
2. Methods
2. Methods 10
Design choices - Architecture
•Y-shaped architecture (from the original TTT paper)
https://yueatsprograms.github.io/ttt/home.html
2. Methods 11
Design choices - Architecture
•h: main task head (e.g., object recognition)
•f: MAE encoder (ViT)
•g: MAE decoder (ViT)
•For ViT probing, h is a ViT-Base
•Y-shaped architecture (TTT-MAE); see the sketch below
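The Y-shaped design can be expressed as three modules sharing one encoder. The sketch below only illustrates how f is shared between the reconstruction branch g and the main-task branch h; the module names and constructor are assumptions, not the authors' implementation.

```python
import torch.nn as nn

class YShapedTTTMAE(nn.Module):
    """Shared encoder f, reconstruction decoder g, main-task head h."""
    def __init__(self, f: nn.Module, g: nn.Module, h: nn.Module):
        super().__init__()
        self.f, self.g, self.h = f, g, h  # encoder, decoder, head

    def reconstruct(self, masked_patches):
        # self-supervised branch: g(f(x))
        return self.g(self.f(masked_patches))

    def classify(self, images):
        # main-task branch: h(f(x))
        return self.h(self.f(images))
```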
2. Methods 12
Training-time training: 1. training encoder and decoder
•MAE encoder f (ViT-Large) and decoder g, pre-trained for 800 epochs on ImageNet-1k
•ViT probing: train only h, with f frozen. Here, h is a ViT-Base.
2. Methods 13
Training-time training: 2. training main task head
•MAE encoder f: ViT-Large, pre-trained for ImageNet-1k reconstruction; f0 denotes the encoder produced by MAE pre-training
•Main task head h: trained with the cross-entropy loss lm for classification
•Training set: n samples (xi, yi)
•Augmentation: image cropping and horizontal flips only; no other augmentations (no random changes in brightness, contrast, color, or sharpness)
•800 epochs
(A sketch of one ViT-probing training step follows.)
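Under ViT probing, training-time training of the main task head amounts to optimizing only h with the cross-entropy loss lm while f0 stays frozen. A hedged sketch of one training step; the function name, optimizer choice, and learning rate are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def vit_probing_step(f0, h, optimizer, images, labels):
    """One ViT-probing step: f0 is frozen, only the head h is updated."""
    f0.eval()                               # frozen MAE encoder
    with torch.no_grad():
        feats = f0(images)                  # features from f0
    logits = h(feats)
    loss = F.cross_entropy(logits, labels)  # main-task loss lm
    optimizer.zero_grad()
    loss.backward()                         # gradients only reach h
    optimizer.step()
    return loss.item()

# The optimizer would be created over h's parameters only, e.g.:
# optimizer = torch.optim.AdamW(h.parameters(), lr=1e-3)
```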
2. Methods 14
Test-time training
•A test input x arrives.
•Minimize the self-supervised reconstruction loss ls (pixel-wise mean squared error) on x, with a random 75% mask, starting from the trained weights f0 and g0.
•Optimizer: SGD for 20 steps, with momentum 0.9, weight decay 0.2, batch size 128, and a fixed learning rate of 5e-3.
•Make a prediction on x as h ∘ fx(x), where fx is the encoder adapted on x.
•Reset the weights to f0 and g0 for the next test input x. (See the sketch of one test-time training episode below.)
•By test-time training on the test inputs independently, we do not assume that they come from the same distribution.
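Putting the slide together, one test-time training episode can be sketched as: copy the trained weights, run a few SGD steps on the masked-reconstruction loss for the single test input, predict with the adapted encoder, then discard the adapted weights. This is a simplified sketch, not the authors' code; `mae_loss_fn` is an assumed stand-in for the 75%-masked pixel-wise MSE loss, and the hyper-parameters follow the slide.

```python
import copy
import torch

def ttt_mae_predict(f0, g0, h, x, mae_loss_fn, steps=20):
    """Adapt the encoder on a single test input x, predict, then reset.

    mae_loss_fn(f, g, x) is assumed to apply a random 75% mask and return
    the pixel-wise MSE reconstruction loss ls.
    """
    f_x, g_x = copy.deepcopy(f0), copy.deepcopy(g0)  # start from trained weights
    opt = torch.optim.SGD(
        list(f_x.parameters()) + list(g_x.parameters()),
        lr=5e-3, momentum=0.9, weight_decay=0.2,     # settings from the slide
    )
    for _ in range(steps):                           # 20 SGD steps
        loss = mae_loss_fn(f_x, g_x, x)              # self-supervised loss ls
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = h(f_x(x))                             # prediction h ∘ fx(x)
    # f_x, g_x are discarded; the next test input starts again from f0, g0.
    return pred
```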
2. Methods 15
Optimizer for TTT
Figure 2: We experiment with two optimizers for TTT. MAE [19] uses AdamW for pre-training. But our results (left) show that
AdamW for TTT requires early stopping, which is unrealistic for generalization to unknown distributions without a validation
set. We instead use SGD, which keeps improving performance even after 20 steps (right).
•The original TTT simply reuses the optimizer setting from the last epoch of training-time training on the self-supervised task.
•However, the learning-rate schedule of MAE reaches zero by the end of pre-training, so that setting cannot be reused directly.
•With AdamW, excessive test-time iterations can hurt performance.
•With SGD, more iterations consistently improve performance on all distribution shifts.
3. Experimental Results
3. Experimental Results 17
Calibration on out-of-distribution data
•ImageNet-C: 15 types of corruption applied to ImageNet images, at 5 levels of severity (an evaluation-loop sketch follows)
• D. Hendrycks and T. Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. ICLR, 2019.
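Evaluation on ImageNet-C iterates over the 15 corruption types and 5 severity levels. A minimal sketch of that loop; `load_imagenet_c` and `evaluate` are hypothetical helpers (a dataloader factory and an accuracy function), not part of the paper or any specific library.

```python
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",
    "snow", "frost", "fog", "brightness",
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",
]  # the 15 ImageNet-C corruption types

def evaluate_imagenet_c(model, load_imagenet_c, evaluate):
    """Accuracy per (corruption, severity); severity runs from 1 to 5."""
    results = {}
    for corruption in CORRUPTIONS:
        for severity in range(1, 6):
            loader = load_imagenet_c(corruption, severity)
            results[(corruption, severity)] = evaluate(model, loader)
    return results
```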
3. Experimental Results 18
Main results on ImageNet-C
TTT-MAE achieves larger gains than TTT-Rot on all corruptions, each measured on top of its respective baseline.
• Joint Train: ResNet-16-layers after joint training for rotation prediction and object recognition (baseline for TTT-Rot)
• TTT-Rot: results from the original paper (rotation task, ResNet-18)
• Baseline: pre-trained MAE encoder with ViT probing (no TTT)
• TTT-MAE (red) on top of our baseline significantly improves performance.
3. Experimental Results 19
TTT-MAE on rotation-invariant classes
• Rotation-invariant classes: classes whose images are usually taken from top-down views, making rotation prediction ill-posed
TTT-MAE is agnostic to rotation invariance and still helps on these classes.
3. Experimental Results 20
Design choices - Training setup
1. Fine-tuning: train h ∘ f end-to-end. This works poorly with TTT.
2. ViT probing: train only h, with f frozen. Here, h is a ViT-Base.
3. Joint training: train both h ∘ f and g ∘ f, by summing their losses together. This is used by TTT with rotation prediction, but with MAE it performs worse on the ImageNet validation set.
Main task: object classification. (A sketch of the three setups follows below.)
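The three training-time setups differ only in which parameters receive gradients and which losses are summed. A schematic sketch under the same placeholder modules and `mae_loss_fn` assumption as above, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def training_loss(setup, f, g, h, images, labels, mae_loss_fn):
    """Return the training loss for one of the three design choices."""
    if setup == "fine_tuning":
        # train h ∘ f end-to-end on the main task only
        return F.cross_entropy(h(f(images)), labels)
    if setup == "vit_probing":
        # train only h; f stays frozen
        with torch.no_grad():
            feats = f(images)
        return F.cross_entropy(h(feats), labels)
    if setup == "joint_training":
        # train h ∘ f and g ∘ f together by summing both losses
        return F.cross_entropy(h(f(images)), labels) + mae_loss_fn(f, g, images)
    raise ValueError(f"unknown setup: {setup}")
```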
3. Experimental Results 21
Accuracy comparison of three designs (ViT probing, fine-tuning, joint training)
•The first three rows use training-time training only; a fixed model is then applied during testing.
•Joint training does not achieve satisfactory performance on most corruptions.
•Fine-tuning initially performs better than ViT probing, but it is not amenable to TTT.
•TTT-MAE after ViT probing performs the best across all corruption types.
3. Experimental Results 22
Performance on other ImageNet variants
• Baseline: pre-trained MAE encoder with ViT probing (no TTT)
ImageNet-R
• ImageNet-R is a benchmark for evaluating the robustness of image classification models.
• It contains renditions of ImageNet classes, such as paintings, cartoons, sketches, sculptures, and other artistic or stylized depictions, rather than natural photographs.
ImageNet-A
• ImageNet-A is designed to test robustness against real-world, unmodified images.
• It contains natural images of ImageNet classes that classifiers find hard, e.g., due to occlusion, unusual viewpoints, or cluttered backgrounds.
4. Conclusion
4. Conclusions 24
• Main contribution
• A new method, TTT-MAE, for addressing the problem of distribution shift in visual recognition tasks.
• TTT can alternatively be viewed as one-sample unsupervised domain adaptation (UDA).
• Limitations & future work
• TTT-MAE is slower at test time than a baseline applying a fixed model (inference speed was not the focus of this paper); it might be improved through better hyper-parameters, optimizers, training techniques, and architectural designs.
• Studying how well spatial autoencoding generalizes to other main tasks and test distributions beyond object recognition and the benchmarks used in this study.
• Exploring test-time training on video streams in human-like environments, where self-supervised learning can take advantage of past frames.
Thank you.