PR-433
Gandelsman, Yossi, et al. "Test-time training with masked autoencoders." Advances in Neural Information
Processing Systems 35 (2022): 29374-29385.
Sunghoon Joo, VUNO Inc.
2023. 4. 16.
1. Research Background
1. Research Background 3
Reference
Sun, Yu, et al. "Test-time training with self-supervision for generalization under distribution
shifts." International conference on machine learning. PMLR, 2020.
•https://yueatsprograms.github.io/ttt/home.html
1. Research Background 4
https://yueatsprograms.github.io/ttt/home.html
1. Research Background 5
Problem Settings
Generalization under distribution shifts
•Generalization is intrinsically hard without access to training data from the test distribution
•The common practice is to avoid distribution shifts altogether by using a wider training
distribution that hopefully contains the test distribution – with more training data or data
augmentation.
Geirhos, Robert, et al. "Generalisation in humans and deep neural networks." Advances in neural information processing systems 31 (2018).
[Figure (Geirhos et al., 2018): example ImageNet images corrupted with salt-and-pepper noise and with uniform noise]
It is hard to know the test distribution in advance!
1. Research Background 6
Test-time training (Sun et al., ICML 2020)
1. Research Background 7
Test-time training (Sun et al., ICML 2020)
•The self-supervised pretext task employed by the original TTT is rotation prediction (a toy sketch follows below).
•This task is limited in generality because it can often be too easy or too hard.
https://yueatsprograms.github.io/ttt/home.html
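For intuition, below is a toy sketch (not from either paper) of the rotation-prediction pretext task: every image is rotated by 0, 90, 180, or 270 degrees, and a shared feature extractor plus a 4-way head must predict which rotation was applied. The module names are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images):
    """Build the 4-way rotation-prediction task for a batch of (B, C, H, W) images."""
    rotated, labels = [], []
    for k in range(4):  # 0, 90, 180, 270 degrees
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.shape[0],), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

def rotation_loss(feature_extractor, rotation_head, images):
    """Self-supervised rotation-prediction loss (hypothetical modules)."""
    x, y = rotation_pretext_batch(images)
    logits = rotation_head(feature_extractor(x))  # (4B, 4) rotation logits
    return F.cross_entropy(logits, y)
```

The failure mode noted above is easy to see here: for rotation-invariant images (e.g., top-down views) the label is ambiguous, while images with strong orientation cues make the task trivially easy.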
1. Research Background 8
Autoencoders for representation learning
The most successful recent work is the masked autoencoder (MAE).
•He, Kaiming, et al. "Masked autoencoders are scalable vision learners." CVPR 2022.
•PR-355
The proposed method simply substitutes MAE for the self-supervised part of TTT.
2. Methods
2. Methods 10
Design choices - Architecture
•Y-shaped (original TTT paper)
https://yueatsprograms.github.io/ttt/home.html
2. Methods 11
Design choices - Architecture
•f: MAE encoder (ViT)
•g: MAE decoder (ViT)
•h: main task head (e.g., object recognition); a ViT-Base for ViT probing
•Y-shaped architecture (TTT-MAE), sketched below
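Below is a minimal PyTorch-style sketch of this Y-shaped design; the module classes passed in (the ViT encoder, ViT decoder, and ViT-Base head) are hypothetical placeholders, not the paper's implementation.

```python
import torch.nn as nn

class TTTMAEModel(nn.Module):
    """Y-shaped model: a shared encoder f feeds a reconstruction decoder g
    (self-supervised branch) and a main-task head h (classification branch)."""

    def __init__(self, encoder_f: nn.Module, decoder_g: nn.Module, head_h: nn.Module):
        super().__init__()
        self.f = encoder_f  # MAE encoder (ViT)
        self.g = decoder_g  # MAE decoder (ViT)
        self.h = head_h     # main-task head (ViT-Base for ViT probing)

    def reconstruct(self, masked_tokens):
        return self.g(self.f(masked_tokens))  # g ∘ f: self-supervised branch

    def classify(self, tokens):
        return self.h(self.f(tokens))         # h ∘ f: object recognition branch
```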
2. Methods 12
Training-time training: 1. training encoder f and decoder g
•MAE encoder and decoder: ViT-Large, pre-trained for 800 epochs on ImageNet-1k (a simplified sketch of the masked-reconstruction loss follows below)
•ViT probing: train h only, with f frozen. Here, h is a ViT-Base.
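The masked-reconstruction objective used here (and reused at test time) can be sketched as below, assuming images are already tokenized into patches; the real MAE implementation feeds only visible tokens to the encoder, which this simplified version skips.

```python
import torch
import torch.nn.functional as F

def masked_reconstruction_loss(encoder_f, decoder_g, patches, mask_ratio=0.75):
    """Pixel-wise MSE on masked patches (simplified MAE-style loss).

    patches: (B, N, D) patch tokens for a batch of images.
    """
    B, N, D = patches.shape
    num_masked = int(mask_ratio * N)
    perm = torch.rand(B, N).argsort(dim=1)              # random patch order per image
    masked_idx = perm[:, :num_masked].unsqueeze(-1).expand(-1, -1, D)

    corrupted = patches.clone()
    corrupted.scatter_(1, masked_idx, 0.0)              # zero out the masked patches

    recon = decoder_g(encoder_f(corrupted))             # (B, N, D) reconstruction
    pred = torch.gather(recon, 1, masked_idx)
    target = torch.gather(patches, 1, masked_idx)
    return F.mse_loss(pred, target)                     # loss only on masked patches
```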
2. Methods 13
Training-time training: 2. training main task head h
•MAE encoder f: ViT-Large, pre-trained for ImageNet-1k reconstruction; f_0 denotes the encoder produced by MAE pre-training
•Main task head h, trained with the cross-entropy loss l_m for classification
•Training set {(x_i, y_i)} with n samples
•ViT probing objective: train h only, keeping f_0 frozen, to minimize (1/n) Σ_i l_m(h ∘ f_0(x_i), y_i) (see the step sketch below)
•Augmentation: image cropping and horizontal flips only; no other augmentations (random changes in brightness, contrast, color, and sharpness)
•800 epochs
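A sketch of a single ViT-probing step under these settings: the MAE encoder f_0 stays frozen and only the head h is updated with the cross-entropy loss l_m. Module names are hypothetical, and the optimizer would be built over head_h.parameters() only.

```python
import torch
import torch.nn.functional as F

def vit_probing_step(encoder_f0, head_h, optimizer, images, labels):
    """One ViT-probing step: minimize l_m(h(f_0(x)), y) with f_0 frozen."""
    encoder_f0.eval()                       # frozen MAE encoder f_0
    with torch.no_grad():
        features = encoder_f0(images)       # f_0(x_i): no gradients into the encoder
    logits = head_h(features)               # h ∘ f_0(x_i)
    loss = F.cross_entropy(logits, labels)  # main-task loss l_m
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                        # updates the head h only
    return loss.item()
```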
2. Methods 14
Test-time training
Test input x arrives.
•Optimize the encoder f and decoder g (initialized from f_0 and g_0) on x with the self-supervised reconstruction loss l_s (pixel-wise mean squared error)
•Random mask (75%)
•SGD, for 20 steps, using a momentum of 0.9, weight decay of 0.2, batch size of 128, and a fixed learning rate of 5e-3
•Make a prediction on x as h ∘ f_x(x), where f_x is the encoder adapted to x
•Reset the weights to f_0 and g_0 for the next test input
•By test-time training on the test inputs independently, we do not assume that they come from the same distribution.
(A sketch of this per-input loop follows below.)
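Putting the slide together, here is a sketch of the per-input loop with the stated hyper-parameters (20 SGD steps, momentum 0.9, weight decay 0.2, learning rate 5e-3, and a batch of 128 differently masked views of the single test image). Helper names are hypothetical; the loss function argument could be the simplified masked_reconstruction_loss sketched on the training-time slide.

```python
import copy
import torch

def ttt_mae_predict(encoder_f0, decoder_g0, head_h, patches_x, masked_loss_fn,
                    steps=20, batch_size=128):
    """TTT-MAE on one test input x, then prediction with h ∘ f_x.

    patches_x: (1, N, D) patch tokens of the test image.
    masked_loss_fn: e.g. the masked_reconstruction_loss sketched earlier.
    """
    # Every test input starts from the training-time weights f_0, g_0.
    f_x = copy.deepcopy(encoder_f0)
    g_x = copy.deepcopy(decoder_g0)
    optimizer = torch.optim.SGD(list(f_x.parameters()) + list(g_x.parameters()),
                                lr=5e-3, momentum=0.9, weight_decay=0.2)

    batch = patches_x.expand(batch_size, -1, -1)   # 128 views, each masked differently
    for _ in range(steps):                         # 20 SGD steps on the loss l_s
        loss = masked_loss_fn(f_x, g_x, batch, mask_ratio=0.75)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    with torch.no_grad():
        prediction = head_h(f_x(patches_x))        # predict as h ∘ f_x(x)
    # f_x and g_x are discarded; the next test input starts again from f_0, g_0.
    return prediction
```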
2. Methods 15
Optimizer for TTT
Figure 2: We experiment with two optimizers for TTT. MAE [19] uses AdamW for pre-training. But our results (left) show that
AdamW for TTT requires early stopping, which is unrealistic for generalization to unknown distributions without a validation
set. We instead use SGD, which keeps improving performance even after 20 steps (right).
•The original TTT simply reuses the optimizer setting from the last epoch of training-time training on the self-supervised task.
•However, the learning rate schedule of MAE reaches zero by the end of pre-training, so that setting cannot be reused directly.
•During test-time training, excessive iterations with AdamW hurt performance, which would require early stopping.
•More iterations with SGD consistently improve performance on all distribution shifts (illustrative optimizer settings below).
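For concreteness, the two optimizer settings being contrasted could be instantiated as below; the parameter list and the AdamW learning rate are illustrative placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Stand-in for the encoder/decoder parameters that TTT updates.
ttt_params = list(nn.Linear(8, 8).parameters())

# SGD, as adopted for TTT-MAE: keeps improving even past 20 steps (right panel).
sgd = torch.optim.SGD(ttt_params, lr=5e-3, momentum=0.9, weight_decay=0.2)

# AdamW, as used for MAE pre-training: during TTT it needs early stopping,
# which is unrealistic without a validation set (left panel).
adamw = torch.optim.AdamW(ttt_params, lr=5e-3, weight_decay=0.2)
```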
3. Experimental Results
3. Experimental Results 17
Calibration on out-of-distribution data
•ImageNet-C applies 15 types of corruption to ImageNet images, at 5 levels of severity
• D. Hendrycks and T. Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. ICLR, 2019
3. Experimental Results 18
Main results on ImageNet-C
TTT-MAE achieves larger gains than TTT-Rot on all corruptions, on top of their respective baselines.
• Joint Train: ResNet (16 layers), after joint training for rotation prediction and object recognition (baseline for TTT-Rot)
• TTT-Rot: original paper (rotation task, ResNet-18)
• Baseline: pre-trained MAE encoder with ViT probing (no TTT)
• TTT-MAE (red) on top of our baseline significantly improves performance.
3. Experimental Results 19
TTT-MAE on rotation-invariant classes
• Rotation-invariant classes: images usually taken from top-down views, for which the rotation-prediction pretext task is ill-posed
• TTT-MAE is agnostic to rotation invariance and still helps on these classes.
3. Experimental Results 20
Design choices - Training setup
1. Fine-tuning: train h ∘ f end-to-end. This works poorly with TTT.
2. ViT probing: train h only, with f frozen. Here, h is a ViT-Base.
3. Joint training: train both h ∘ f and g ∘ f by summing their losses together. This is used by TTT with rotation prediction, but with MAE it performs worse on the ImageNet validation set.
Main task: object classification. (A sketch of which parameters each setup optimizes follows below.)
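A sketch of how the three setups differ in which parameters get optimized; f, g, and h stand for the encoder, decoder, and head modules, and the helper function is hypothetical.

```python
import itertools
import torch.nn as nn

def trainable_parameters(setup: str, f: nn.Module, g: nn.Module, h: nn.Module):
    """Parameters optimized at training time under each design."""
    if setup == "fine-tuning":      # train h ∘ f end-to-end
        return itertools.chain(h.parameters(), f.parameters())
    if setup == "vit-probing":      # train h only, f frozen
        return h.parameters()
    if setup == "joint-training":   # sum the losses of h ∘ f and g ∘ f
        return itertools.chain(h.parameters(), f.parameters(), g.parameters())
    raise ValueError(f"unknown setup: {setup}")
```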
3. Experimental Results 21
Accuracy comparison of the three designs (ViT probing, fine-tuning, joint training)
•The first three rows are training-time training only; a fixed model is then applied during testing.
•Joint training does not achieve satisfactory performance on most corruptions.
•Fine-tuning initially performs better than ViT probing, but it is not amenable to TTT.
•TTT-MAE after ViT probing performs the best across all corruption types.
3. Experimental Results 22
Performance on other ImageNet variants
• Baseline: pre-trained MAE encoder with ViT probing (no TTT)
ImageNet-R
• ImageNet-R is a benchmark for evaluating the robustness of image classifiers under distribution shift.
• It contains renditions of ImageNet classes, such as paintings, cartoons, sketches, sculptures, and toys, rather than natural photographs.
ImageNet-A
• ImageNet-A tests robustness against real-world, unmodified images: natural adversarial examples that standard classifiers misclassify.
• The images depict ImageNet classes but are challenging, often involving occlusion, clutter, or unusual viewpoints.
4. Conclusion
4. Conclusions 24
• Main contribution
• Proposes a new method, TTT-MAE, for addressing distribution shift in visual recognition tasks.
• TTT can alternatively be viewed as one-sample unsupervised domain adaptation (UDA).
• Limitations & future work
• Slower at test time than a baseline that applies a fixed model (inference speed was not the focus of this paper); it might be improved through better hyper-parameters, optimizers, training techniques, and architectural designs.
• Studying the generalization of spatial autoencoding to other main tasks and test distributions beyond object recognition and the benchmarks used in this study.
• Exploring test-time training on video streams in human-like environments, where self-supervised learning can take advantage of past frames.
Thank you.