Cut And Learn for Unsupervised Object
Detection and Instance Segmentation
2023
Image Processing Team
김병현 안종식 이주영 이해원 이희재
CONTENTS
Introduction
01
Related Works
02
Methods
03
Experiments
04
Conclusion
05
Introduction
01
01. Introduction
1. Unsupervised Object Detection & Instance Segmentation
Supervised Learning
[ Train Data ] [ Annotations ]
[ Inference Result ]
Semi-Supervised Learning
[ Train Data ] [ A Few Annotations ]
[ Inference Result ]
01. Introduction
1. Unsupervised Object Detection & Instance Segmentation
Unsupervised Learning
[ Train Data ] [ Without Any Annotations ]
[ Inference Result ]
01. Introduction
1. Unsupervised Object Detection & Instance Segmentation
Self-Supervised Learning
[ Train Data ] [ Without Any Annotations ]
[ Inference Result ]
Deep Learning Network
Head
(Task Prediction)
Backbone
(Feature Extractor)
Train Without
Annotations
Train With
Annotations
01. Introduction
2. Class Agnostic Detection
Class Agnostic Detection
Class Aware Detection
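Fruit
• w/ Supervision: the number of classes to predict is known (we know what the objects of interest are)
• w/o Supervision: the number of classes to predict is unknown (we cannot know what the objects of interest are)
Object 1
Object 2
Object 3
Object 4 …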
[ Class Aware ] [ Class Agnostic ]
01. Introduction
3. CutLER (Cut-and-LEaRn)
Contribution of CutLER
[Table: previous works compared against CutLER's four contributions, numbered 1 to 4]
01. Introduction
3. CutLER (Cut-and-LEaRn)
Detect Multiple Objects
1
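DINO: can detect only a single object
• Only one object per image can be detected (usable only on restricted datasets)
• Even when several objects are present, only a semantic mask is obtained (not separated into instances)
• It is a self-supervised method for training a feature extractor, not a method that performs the actual target task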
01. Introduction
3. CutLER (Cut-and-LEaRn)
Zero-shot Detector
2
[ Pre Train Data ]
(Large Scale Dataset)
[ Pretrained Model ]
[ Target Dataset ]
(Train Set )
[ Fine Tuned Model ]
[ Target Dataset ]
(Test Set )
[ Evaluate Model ]
Accuracy
F1-Score
mAP
mAR
IOU
Evaluation Metrics
01. Introduction
3. CutLER (Cut-and-LEaRn)
Compatible with various detection architectures
3
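Pipeline (three stages, labeled 1 to 3 in the figure)
Stage 2: adds an indicator function to the existing loss
• Any detector can adopt CutLER by modifying its loss term
Stages 1 and 3: can be treated as pre- and post-processing, so they apply to a variety of detectors
• Examples of applying CutLER to different architectures are in the Ablation section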
01. Introduction
3. CutLER (Cut-and-LEaRn)
Pretrained Model for Supervised Detection
4
Can be used as a procedure for producing a pretrained model
• TokenCut is, strictly speaking, not a deep-learning model but one kind of graph-cut algorithm, so it cannot serve as a pretrained model
Related Works
02
02. Related Works
1. DINO (Emerging Properties in Self-Supervised Vision Transformers)
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
• Image Processing Team, 현청천, https://www.youtube.com/watch?v=JCEK5nD4MKM
[ ViT Model ] [ DINO (Self-Supervised Learning Method) ]
02. Related Works
2. TokenCut (Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut)
Self-Supervised Transformers for Unsupervised Object Discovery using Normalized Cut
• Image Processing Team, 현청천, https://www.youtube.com/watch?v=JCEK5nD4MKM
02. Related Works
3. FreeSOLO for unsupervised instance segmentation
Applies an unsupervised learning method to SOLOv2, an instance segmentation network
• Because it relies on the SOLO architecture, it has a network dependency (it cannot be applied to other detectors)
Q & A
Methods
03
03. Methods
1. MaskCut for Discovering Multiple Objects
1 MaskCut for Discovering Multiple Objects
2 DropLoss for Exploring Image Regions
3 Multi-Round Self-Training
03. Methods
1. MaskCut for Discovering Multiple Objects
1
[Figure: Input Images → N x N Patches → Self-Supervised ViT Model (trained with DINO) → ViT Feature Vectors → MaskCut (repeated N times)]
03. Methods
1. MaskCut for Discovering Multiple Objects
1
NormalizedCut & TokenCut
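[Figure: graph over the image patches, with a legend for graph nodes and graph edges; the nodes are split into Cluster A and Cluster B]
• Graph node $i$ is connected to every other graph node $j$ by an edge with weight $\omega_{ij}$.
> The matrix holding these weights: Adjacency Matrix (or Affinity Matrix)
$$W = \begin{pmatrix} \omega_{11} & \cdots & \omega_{19} \\ \vdots & \ddots & \vdots \\ \omega_{91} & \cdots & \omega_{99} \end{pmatrix}$$
• With 3 x 3 patches there are 9 nodes, so the adjacency matrix is 9 x 9
• $\omega_{ij} = \dfrac{K_i \cdot K_j}{\lVert K_i \rVert_2 \, \lVert K_j \rVert_2}$ ( Cosine Similarity; $K_i$ is the ViT feature vector of patch $i$ )
• The problem is to find a good split into two clusters A and B (Normalized Cut)
$$\operatorname*{arg\,min}_{A} \mathrm{NormalizedCut}(A) = \operatorname*{arg\,min}_{A} \left( \mathrm{Cut}(A, B) \cdot \left( \frac{1}{\mathrm{Vol}(A)} + \frac{1}{\mathrm{Vol}(B)} \right) \right)$$
$\mathrm{Cut}(A, B) = \sum_{i \in A,\, j \in B} \omega_{ij}$ : sum of the weights between the nodes of Cluster A and the nodes of Cluster B
$\mathrm{Vol}(A) = \sum_{i \in A} \sum_{j} \omega_{ij}$ : sum of the weights of all edges attached to the nodes of Cluster A
> Minimize the total weight between different clusters!
> Normalize by the volume of each cluster so that similarity within each cluster is maximized!
• Finding the exact minimum is NP-hard (nondeterministic polynomial-time hard)
• Apply the weight threshold $\tau^{ncut}$: $\omega_{ij}$ is set to 1 if $\omega_{ij} \ge \tau^{ncut}$, otherwise to 1e-5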
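The affinity matrix and the normalized-cut objective on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `affinity_matrix` and `normalized_cut` are hypothetical, the patch features are passed in as a plain array, and the threshold value is left to the caller.

```python
import numpy as np

def affinity_matrix(features, tau, eps=1e-5):
    """Binarized cosine-similarity affinity between patch features.

    features: (num_patches, dim) array, one row per patch (the K_i above).
    tau: the weight threshold; the caller chooses the value.
    """
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    cosine = unit @ unit.T                     # w_ij = K_i . K_j / (|K_i| |K_j|)
    return np.where(cosine >= tau, 1.0, eps)   # thresholded to 1 or 1e-5

def normalized_cut(W, in_a):
    """Value of Cut(A, B) * (1 / Vol(A) + 1 / Vol(B)).

    in_a: boolean vector, True for nodes in Cluster A, False for Cluster B.
    """
    in_b = ~in_a
    cut = W[np.ix_(in_a, in_b)].sum()          # weights crossing from A to B
    vol_a = W[in_a].sum()                      # all weights attached to A
    vol_b = W[in_b].sum()                      # all weights attached to B
    return cut * (1.0 / vol_a + 1.0 / vol_b)
```

Evaluating `normalized_cut` on a split that follows the feature groups should give a much smaller value than on a split that cuts through them, which is the behavior the objective is designed to reward.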
03. Methods
1. MaskCut for Discovering Multiple Objects
1
NormalizedCut & TokenCut
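[Figure: graph over the image patches, with a legend for graph nodes and graph edges; the nodes are split into Cluster A and Cluster B]
• The optimal clusters are computed approximately by relaxing the problem to a generalized eigenvalue system
> Uses the Laplacian matrix $D - W$, built from the adjacency matrix $W$ and the degree matrix $D$
$$D = \begin{pmatrix} d_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & d_9 \end{pmatrix}$$
• All entries off the diagonal are 0: Diagonal Matrix
• Symmetric Matrix
• $d_i = \sum_{j=1}^{n} \omega_{ij}$
$$(D - W)\,x = \lambda D x \quad (x : \text{eigenvector},\ \lambda : \text{eigenvalue})$$
• Finding the eigenvector of the second-smallest eigenvalue is equivalent to solving the relaxed $\operatorname{arg\,min} \mathrm{NormalizedCut}(A)$ 1)
• The eigenvector of the smallest eigenvalue is the vector whose components are all 1, with $\lambda = 0$, so the eigenvector of the second-smallest eigenvalue is computed instead
1) https://gjkoplik.github.io/spectral_clustering/#proofs_with_2_clusters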
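The generalized eigenvalue system above can be solved with a standard symmetric eigensolver. The sketch below is one possible way to do it, not necessarily how TokenCut or CutLER implement it: the function name `second_smallest_eigenvector` is hypothetical, and the substitution $x = D^{-1/2} z$ is used so that NumPy's `eigh` applies directly.

```python
import numpy as np

def second_smallest_eigenvector(W):
    """Solve (D - W) x = lambda * D x; return the second-smallest eigenpair."""
    d = W.sum(axis=1)                               # degrees d_i = sum_j w_ij
    d_inv_sqrt = 1.0 / np.sqrt(d)
    laplacian = np.diag(d) - W                      # L = D - W
    # With x = D^{-1/2} z the system becomes the ordinary symmetric problem
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z.
    sym = d_inv_sqrt[:, None] * laplacian * d_inv_sqrt[None, :]
    eigenvalues, eigenvectors = np.linalg.eigh(sym)  # eigenvalues ascending
    x = d_inv_sqrt * eigenvectors[:, 1]              # map z back to x
    return eigenvalues[1], x
```

A bipartition can then be read off the returned vector, for example by comparing each entry with the vector's mean (`x > x.mean()`); that splitting rule is an assumption of this sketch.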
03. Methods
1. MaskCut for Discovering Multiple Objects
1
MaskCut
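• In the eigenvector that splits the patches into Clusters A and B, the cluster containing the entry with the largest absolute value is chosen as the foreground
• DenseCRF (Conditional Random Field) is applied as post-processing
• The adjacency matrix is masked with the background mask (everything except the foreground mask obtained at step t), and the procedure is repeated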
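The repeated cut can be sketched as a loop over rounds. This is a simplified illustration, not the authors' implementation: instead of masking entries of the adjacency matrix, it drops the patches of earlier foregrounds from the graph; the DenseCRF post-processing is omitted; splitting the eigenvector at its mean is an assumed rule; and the names `ncut_eigenvector` and `maskcut` are hypothetical.

```python
import numpy as np

def ncut_eigenvector(W):
    """Second-smallest generalized eigenvector of (D - W) x = lambda D x."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    sym = d_inv_sqrt[:, None] * (np.diag(d) - W) * d_inv_sqrt[None, :]
    _, vectors = np.linalg.eigh(sym)
    return d_inv_sqrt * vectors[:, 1]

def maskcut(W, num_rounds):
    """Return up to num_rounds boolean foreground masks, one entry per patch."""
    active = np.ones(W.shape[0], dtype=bool)   # patches still treated as background
    masks = []
    for _ in range(num_rounds):
        if active.sum() < 2:                   # nothing left to bipartition
            break
        idx = np.flatnonzero(active)
        x = ncut_eigenvector(W[np.ix_(idx, idx)])
        side = x > x.mean()                    # bipartition (assumed rule)
        if not side[np.argmax(np.abs(x))]:     # foreground holds the max |x| entry
            side = ~side
        mask = np.zeros(W.shape[0], dtype=bool)
        mask[idx[side]] = True
        masks.append(mask)
        active &= ~mask                        # exclude this foreground next round
    return masks
```

Because each round only sees patches that no earlier round claimed, the returned masks never overlap. On real images the quality of later masks depends heavily on the affinity threshold, which this toy loop does not model.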
03. Methods
1. MaskCut for Discovering Multiple Objects
1
MaskCut
• The ablation shows that 3 iterations works best
• What if the actual number of objects is smaller than the 3 iterations?
> No mask is detected, because of the weight threshold $\tau^{ncut}$
03. Methods
2. DropLoss for Exploring Image Regions
2
MaskCut
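• What if the actual number of objects is larger than the 3 iterations?
> Some objects go undetected!
DropLoss
• If the standard loss is used as-is, the detector is trained not to find the objects that MaskCut missed
• Because MaskCut produces only coarse masks, the loss is computed only for predicted regions that overlap these coarse masks, leaving the detector free to discover new objects
• $\tau^{IoU} = 0.01$ (a predicted region whose maximum IoU with the coarse masks does not exceed this threshold is excluded from the loss)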
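The indicator function in DropLoss can be illustrated with a small pure-Python sketch. Several things here are assumptions made for brevity: regions and coarse masks are represented as axis-aligned boxes `(x1, y1, x2, y2)` rather than pixel masks, the per-region losses are given as plain numbers, the names `box_iou` and `drop_loss` are hypothetical, and the default threshold reuses the 0.01 value from the slide.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def drop_loss(region_losses, regions, coarse_boxes, tau_iou=0.01):
    """Sum per-region losses, skipping regions that overlap no coarse box."""
    total = 0.0
    for loss, region in zip(region_losses, regions):
        max_iou = max((box_iou(region, gt) for gt in coarse_boxes), default=0.0)
        if max_iou > tau_iou:   # indicator: keep the loss only when overlapping
            total += loss
    return total
```

A region that overlaps no coarse box contributes nothing, so the detector is not penalized for proposing an object that MaskCut failed to find.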
03. Methods
3. Multi-Round Self-Training
3
Self-Training
• After the first round of training on the coarse masks generated by MaskCut, training is repeated using the masks detected by the previous round's detector
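The self-training schedule amounts to a short loop in which each round's predictions become the next round's labels. In the sketch below, `train_detector` and `predict_masks` are hypothetical callables supplied by the caller, standing in for whatever detector is used; the number of rounds is also left to the caller.

```python
def multi_round_self_training(images, initial_masks, train_detector,
                              predict_masks, num_rounds):
    """Train repeatedly, replacing the pseudo-labels after every round.

    initial_masks: coarse masks (e.g. from MaskCut), one entry per image.
    train_detector(images, labels) -> detector      (hypothetical callable)
    predict_masks(detector, image) -> labels        (hypothetical callable)
    """
    pseudo_labels = initial_masks
    detector = None
    for _ in range(num_rounds):
        # Round 1 trains on the coarse masks; later rounds train on the
        # masks predicted by the detector from the previous round.
        detector = train_detector(images, pseudo_labels)
        pseudo_labels = [predict_masks(detector, image) for image in images]
    return detector, pseudo_labels
```

Passing the training and prediction steps in as functions keeps the loop independent of any particular detector architecture.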
Q & A
Experiments
04
04. Experiments
1. Results
Unsupervised Zero-shot Evaluation
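• When the model trained on ImageNet is applied to each dataset without fine-tuning, performance improves by 2x to 4x over previous methods
• Note that FreeSOLO uses a ResNet-101 backbone, whereas CutLER uses ResNet-50
• Performance gains are also seen in comparison with non-zero-shot methods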
04. Experiments
1. Results
Label-Efficient and Fully-Supervised Learning
• To evaluate CutLER as a pretrained model, the fully annotated dataset is split into subsets for training and evaluation
• MoCo-v2: a self-supervised method for producing a pretrained model
• CutLER outperforms the self-supervised pretraining approach
04. Experiments
2. Ablation
Performance gain from each component
• To evaluate CutLER as a pretrained model, the fully annotated dataset is split into subsets for training and evaluation
Results by coarse-mask generation method
04. Experiments
2. Ablation
Hyper Parameters
Performance by number of self-training rounds
Conclusion
05
05. Conclusion
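Unsupervised learning methods are also starting to be used on general datasets.
• There used to be many constraints, such as restricted settings or images centered on a single object, but the methods have begun to advance to the point where they work on general datasets
[Oxford 102 Flowers] [MNIST]
The class-agnostic limitation remains
• One Class Object Detection & Instance Segmentation
• Cause of the progress: Transformer-based self-supervised techniques such as DINO were likely the catalyst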
Q & A
Thank you for your attention
