Paper review: End-to-End Object Detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov & Sergey Zagoruyko, "End-to-End Object Detection with Transformers" ECCV2020 https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/832_ECCV_2020_paper.php

End-to-End Object Detection
with Transformers
2023/5/11

Overview
◼ Proposes DETR (DEtection TRansformer)
◼End-to-end object detection
• No hand-designed post-processing such as NMS

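Background
◼Conventional detectors
• Depend on post-processing with NMS
• Non-Maximum Suppression (NMS): removes duplicate detections of the same object
• Bounding box (bbox): a rectangle that localizes an object
• Many overlapping candidate bboxes are produced for each object
◼This work
• End-to-end, with no NMS
• Directly outputs the final set of bboxes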
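For reference, the NMS step that conventional detectors depend on can be sketched as the common greedy variant below. This is a minimal illustration, not code from the paper; the box format (x0, y0, x1, y1) and the 0.5 IoU threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # discard every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate detections of one object plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

DETR's set loss makes this step unnecessary, because during training each object is matched to exactly one prediction.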

DETR
◼Consists of three components
• CNN backbone
• Transformer encoder-decoder
• Simple feed-forward network (FFN) for the final predictions

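CNN
◼CNN (Convolutional Neural Network) backbone
• Extracts a feature map from the input image
• Input 3 × H0 × W0 → lower-resolution map C × H × W (C = 2048, H = H0/32, W = W0/32)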

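Transformer
◼Encoder
• A 1×1 convolution reduces the channel dimension from C to d, and the spatial dimensions are flattened into a sequence of H×W tokens
• A fixed positional encoding is added, because attention is permutation-invariant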
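The encoder input preparation on this slide can be sketched in plain Python with nested lists. The function names and toy sizes (C = 4, d = 2, a 2 × 3 map) are hypothetical; in the paper the backbone output has C = 2048 channels. Positional encodings are left out of this sketch.

```python
def conv1x1(f, weight):
    """1x1 convolution without bias: f is [C][H][W], weight is [d][C] -> z is [d][H][W]."""
    C, H, W = len(f), len(f[0]), len(f[0][0])
    return [
        [[sum(weight[k][c] * f[c][y][x] for c in range(C)) for x in range(W)] for y in range(H)]
        for k in range(len(weight))
    ]


def flatten_to_sequence(z):
    """Collapse the spatial dims: [d][H][W] -> H*W tokens, each a length-d vector (row-major)."""
    d, H, W = len(z), len(z[0]), len(z[0][0])
    return [[z[k][y][x] for k in range(d)] for y in range(H) for x in range(W)]


# Hypothetical toy sizes: C = 4 backbone channels reduced to d = 2 on a 2 x 3 feature map.
C, d, H, W = 4, 2, 2, 3
f = [[[float(c + y + x) for x in range(W)] for y in range(H)] for c in range(C)]
weight = [[1.0, 0.0, 0.0, 0.0],       # output channel 0 copies input channel 0
          [0.25, 0.25, 0.25, 0.25]]   # output channel 1 averages all input channels
tokens = flatten_to_sequence(conv1x1(f, weight))
print(len(tokens), len(tokens[0]))  # 6 2
```

The 1×1 convolution is just a per-pixel linear map over channels, so it shrinks d without mixing spatial positions; the encoder then treats the H×W positions as a sequence.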

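Transformer
◼Decoder
• Input: object queries
• object queries: N learned embeddings
• Decodes all N object queries in parallel rather than autoregressively
• The N queries must differ from each other to produce different outputs
• The object queries are added to the input of each attention layer as a positional encoding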
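A minimal sketch of how N object queries read from the encoder output, assuming a single attention head with no learned projections, a decoder input that starts at zero, and hand-picked query values (all simplifications; the real decoder uses multi-head self- and cross-attention with learned weights). It also shows why the queries must differ: identical queries produce identical outputs.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, memory):
    """Single-head scaled dot-product attention of N queries over the encoder memory.

    Deliberately minimal: no learned projections, no multiple heads. Each query's output
    depends only on that query and the memory, so all N can be computed in parallel.
    """
    d = len(memory[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * mi for qi, mi in zip(q, m)) / math.sqrt(d) for m in memory]
        weights = softmax(scores)
        outputs.append([sum(w * m[k] for w, m in zip(weights, memory)) for k in range(d)])
    return outputs


# Hypothetical values: N = 3 object queries of size d = 2 (learned in the real model).
N, d = 3, 2
object_queries = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
tgt = [[0.0] * d for _ in range(N)]  # decoder input assumed to start at zero
queries = [[t + p for t, p in zip(tq, oq)] for tq, oq in zip(tgt, object_queries)]
memory = [[2.0, 0.0], [0.0, 2.0]]  # two encoder output tokens

slots = cross_attention(queries, memory)            # three distinct outputs
same = cross_attention([[0.0, 0.0]] * 2, memory)    # identical queries -> identical outputs
```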

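FFN
◼FFN (feed-forward networks)
• 3-layer perceptron with ReLU activations
• Predicts the bbox as normalized center coordinates, height, and width
• A linear layer with softmax predicts the class label
• Always outputs a fixed set of N predictions; slots without an object are assigned the "no object" (∅) class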
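The prediction heads can be sketched as follows, assuming hypothetical sizes (d = 8, 3 classes, N = 5) and random weights standing in for trained ones. The sigmoid that keeps the box inside [0, 1] and the placement of the "no object" class at the last index are assumptions of this sketch.

```python
import math
import random


def linear(v, layer):
    W, b = layer
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(rng, n_in, n_out):
    """Random weights standing in for trained parameters (hypothetical)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def box_head(h, layers):
    """3-layer perceptron with ReLU between layers; a sigmoid maps the output into [0, 1]."""
    for i, layer in enumerate(layers):
        h = linear(h, layer)
        if i < len(layers) - 1:
            h = relu(h)
    return [1.0 / (1.0 + math.exp(-x)) for x in h]  # normalized (cx, cy, w, h)


rng = random.Random(0)
d, num_classes, N = 8, 3, 5  # hypothetical sizes
box_layers = [make_layer(rng, d, d), make_layer(rng, d, d), make_layer(rng, d, 4)]
cls_layer = make_layer(rng, d, num_classes + 1)  # last index = "no object" class
decoder_outputs = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(N)]

# One (class probabilities, box) pair per decoder output: always exactly N predictions.
predictions = [(softmax(linear(h, cls_layer)), box_head(h, box_layers)) for h in decoder_outputs]
```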

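Experiments
◼Comparison with Faster R-CNN [Ren+, PAMI2015]
◼Ablation Study
• Number of encoder/decoder layers
• Positional embedding
• Loss components
◼Analysis
• Decoder output slots
• Generalization to unseen numbers of instances
◼Panoptic segmentation
• Assigns each pixel a class and an instance ID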

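Experimental setup
◼Dataset
• COCO 2017
• 118k training images and 5k validation images
◼Optimizer
• AdamW [Loshchilov&Hutter, ICLR2019]
◼Learning rate
• transformer: 1e−4
• backbone: 1e−5
◼Weight decay
• 1e−4
◼Backbone
• ResNet-50 and ResNet-101
• The corresponding models are called DETR and DETR-R101
• Pretrained on ImageNet [Deng+, CVPR2009]
◼dilation
• Adds dilation to the last stage of the backbone (and removes a stride from its first convolution)
• The corresponding models are called DETR-DC5 and DETR-DC5-R101
• Doubles the feature resolution: better on small objects, at a higher computational cost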
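Why dilation is expensive can be checked with simple arithmetic. Halving the backbone stride from 32 to 16 doubles the feature-map height and width, so the encoder sees 4× as many tokens. The self-attention matrix has one entry per token pair, so it grows 16×, which matches the 16× encoder self-attention cost reported for DC5. The 800 × 1216 input size is a hypothetical example.

```python
def encoder_cost(h0, w0, stride):
    """Return (number of encoder tokens, entries in one self-attention matrix)."""
    assert h0 % stride == 0 and w0 % stride == 0, "sketch assumes exact division"
    h, w = h0 // stride, w0 // stride
    tokens = h * w
    return tokens, tokens * tokens


base_tokens, base_attn = encoder_cost(800, 1216, 32)  # standard backbone: stride 32
dc5_tokens, dc5_attn = encoder_cost(800, 1216, 16)    # DC5: stride 16
print(base_tokens, dc5_tokens)    # 950 3800
print(dc5_tokens // base_tokens)  # 4
print(dc5_attn // base_attn)      # 16
```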

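Faster R-CNN
◼Compare DETR with Faster R-CNN
• The Faster R-CNN baselines use FPN [Lin+, CVPR2017]
• Improved baselines are trained with a 9× schedule
◼AP_S and AP_L
• DETR is clearly better on large objects (AP_L)
• DETR is worse on small objects (AP_S)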
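AP_S and AP_L are AP restricted to small and large objects. A sketch of the COCO size buckets, assuming the standard area thresholds of 32² and 96² pixels. COCO actually uses each annotation's area field (the segment area) rather than w × h, and boundary handling is simplified here.

```python
def size_bucket(w, h):
    """COCO-style size category from a box's width and height in pixels."""
    area = w * h
    if area < 32 ** 2:
        return "small"   # counted in AP_S
    if area < 96 ** 2:
        return "medium"  # counted in AP_M
    return "large"       # counted in AP_L


print(size_bucket(10, 10), size_bucket(50, 50), size_bucket(100, 100))  # small medium large
```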

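Ablation Study
◼Number of decoder layers
• NMS improves predictions taken from the early decoder layers
• With more layers the gain vanishes, and at the last layer NMS slightly lowers AP
◼Decoder attention
• Attention is fairly local
• It focuses mostly on object extremities such as heads and legs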

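Ablation Study
◼positional encoding
• Two kinds: spatial positional encoding and output positional encoding (object queries)
• Variants compared: which encodings are used, and whether they are added only at the input or at every attention layer
• Output positional encoding: required; it cannot be removed
• Spatial positional encoding: removing it entirely causes a large AP drop
• Best: sine spatial positional encoding added at every attention layer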
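A sketch of a 2D sine spatial positional encoding: the usual Transformer sine/cosine encoding is applied separately to the row and column index, and the two halves are concatenated. The temperature of 10000 and this exact layout are assumptions; DETR's own code may differ in details such as normalizing the coordinates.

```python
import math


def sine_encoding_1d(pos, num_feats, temperature=10000.0):
    """Transformer-style encoding of one coordinate: sin on even indices, cos on odd ones."""
    enc = []
    for i in range(num_feats):
        scale = temperature ** (2 * (i // 2) / num_feats)
        enc.append(math.sin(pos / scale) if i % 2 == 0 else math.cos(pos / scale))
    return enc


def spatial_positional_encoding(H, W, d):
    """One d-dim vector per feature-map cell (row-major); first d/2 values encode y, last d/2 x."""
    assert d % 2 == 0, "d must be even"
    half = d // 2
    return [sine_encoding_1d(y, half) + sine_encoding_1d(x, half)
            for y in range(H) for x in range(W)]


pe = spatial_positional_encoding(H=2, W=3, d=8)
```

In the variant that adds spatial encodings at every attention layer, each of these H × W vectors would be added to the matching encoder token before every attention operation, not only once at the input.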

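Ablation Study
◼Loss function
• Three components
• Classification loss (required; it cannot be turned off)
• L1 bbox distance loss
• Generalized IoU (GIoU) loss [Rezatofighi+, CVPR2019]
• Models trained without the L1 loss or without the GIoU loss are compared
• Best: combining L1 and GIoU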
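The two bbox loss terms can be sketched for a single matched pair as below. The weights λ_L1 = 5 and λ_GIoU = 2 are the values reported in the paper; the (cx, cy, w, h) box format normalized to [0, 1], and computing GIoU after converting to corner format, are assumptions of this sketch. The classification term and averaging over boxes are omitted.

```python
def cxcywh_to_xyxy(b):
    """(center x, center y, width, height) -> (x0, y0, x1, y1)."""
    cx, cy, w, h = b
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def giou(a, b):
    """Generalized IoU of two (x0, y0, x1, y1) boxes with positive area; lies in (-1, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # area of the smallest axis-aligned box enclosing both boxes
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def box_loss(pred, target, lambda_l1=5.0, lambda_giou=2.0):
    """Weighted L1 distance on (cx, cy, w, h) plus weighted GIoU loss (1 - GIoU)."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    g = giou(cxcywh_to_xyxy(pred), cxcywh_to_xyxy(target))
    return lambda_l1 * l1 + lambda_giou * (1.0 - g)


near = giou((0, 0, 1, 1), (2, 0, 3, 1))  # disjoint, close: -1/3
far = giou((0, 0, 1, 1), (4, 0, 5, 1))   # disjoint, farther: -3/5
```

Plain IoU is 0 for every pair of disjoint boxes, whereas GIoU keeps decreasing as disjoint boxes move apart, so the loss still gives a useful signal when a prediction misses its target entirely.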

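Analysis
◼Decoder output slot analysis
• Box predictions of 20 of the N = 100 prediction slots (FFN outputs) on COCO 2017 val
• Each point is the center of a predicted bbox
• Every slot also predicts image-wide bboxes, which are common in COCO
• Each slot specializes in particular areas and box sizes
• Colors encode the bbox shape
• Green: small bbox
• Red: large horizontal bbox
• Blue: large vertical bbox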

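Panoptic segmentation
◼DETR for panoptic segmentation
• Just as Mask R-CNN [He+, ICCV2017] extends Faster R-CNN, DETR is extended by adding a mask head on top of the decoder outputs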

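Panoptic segmentation
◼Evaluation
• Panoptic quality (PQ)
• thing: countable objects (e.g., people, cars)
• stuff: amorphous regions (e.g., sky, grass)
◼Especially strong on stuff
• Likely because encoder attention enables global reasoning over the whole image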
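Panoptic quality for one class can be sketched from its definition, PQ = Σ IoU(TP) / (|TP| + ½|FP| + ½|FN|), where a predicted segment and a ground-truth segment form a true positive when their IoU exceeds 0.5. The inputs below are hypothetical; the full metric averages this over classes, and PQ^Th / PQ^St average over thing and stuff classes separately.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ for one class.

    matched_ious: IoU of each (prediction, ground truth) segment pair with IoU > 0.5;
    for non-overlapping segments such matches are unique, so each pair is a true positive.
    num_fp / num_fn: numbers of unmatched predicted / ground-truth segments.
    """
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(matched_ious) / denom if denom > 0 else 0.0


def sq_rq(matched_ious, num_fp, num_fn):
    """Segmentation quality (mean IoU of TPs) and recognition quality (an F1-style score)."""
    tp = len(matched_ious)
    sq = sum(matched_ious) / tp if tp else 0.0
    rq = tp / (tp + 0.5 * num_fp + 0.5 * num_fn) if (tp + num_fp + num_fn) else 0.0
    return sq, rq


pq = panoptic_quality([0.9, 0.7], num_fp=1, num_fn=1)  # hypothetical counts
sq, rq = sq_rq([0.9, 0.7], num_fp=1, num_fn=1)
```

PQ factors as segmentation quality × recognition quality; in this example 0.8 × 2/3 ≈ 0.533.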

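Conclusion
◼End-to-end object detection with a transformer and a bipartite matching loss
• Performance comparable to an optimized Faster R-CNN baseline on COCO
• Much better than Faster R-CNN on large objects
• Easily extended to panoptic segmentation
◼Open challenges
• Training, optimization, and performance on small objects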


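4. DETR
◼Two essential ingredients
• A set prediction loss that forces a unique one-to-one (bipartite) matching between predictions and ground truth
• An architecture that predicts a set of objects in a single pass
• A fixed number N of predictions; the ground truth is padded with ∅ (no object)
• Each ground-truth object is y_i = (c_i, b_i), where c_i is the class label and b_i the bbox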
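The one-to-one matching can be sketched by exhaustive search over assignments, which is only feasible for tiny N. The paper uses the Hungarian algorithm instead, and SciPy's `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently. The cost below (negative predicted probability of the ground-truth class plus a weighted L1 box distance) is a simplification of the paper's matching cost, which also includes GIoU; the weights, class indices, and values are hypothetical.

```python
from itertools import permutations


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def match(pred_probs, pred_boxes, gt_classes, gt_boxes, w_class=1.0, w_box=5.0):
    """Exhaustive minimum-cost one-to-one matching.

    Returns (cost, assignment), where assignment[j] is the prediction matched to
    ground-truth object j. Predictions left unmatched are treated as "no object".
    """
    best_cost, best_assignment = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(
            -w_class * pred_probs[i][gt_classes[j]] + w_box * l1(pred_boxes[i], gt_boxes[j])
            for j, i in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, list(perm)
    return best_cost, best_assignment


# Hypothetical example: N = 3 predictions, 2 ground-truth objects, classes {0, 1} plus
# "no object" at index 2; boxes are normalized (cx, cy, w, h).
pred_probs = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
pred_boxes = [(0.2, 0.2, 0.1, 0.1), (0.7, 0.7, 0.2, 0.2), (0.5, 0.5, 0.5, 0.5)]
gt_classes = [0, 1]
gt_boxes = [(0.7, 0.7, 0.2, 0.2), (0.2, 0.2, 0.1, 0.1)]

cost, assignment = match(pred_probs, pred_boxes, gt_classes, gt_boxes)
unmatched = sorted(set(range(len(pred_boxes))) - set(assignment))  # these get the ∅ label
```

Because every ground-truth object is matched to exactly one prediction, duplicate predictions of the same object are penalized during training, which is what removes the need for NMS.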

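18. Analysis
◼Generalization to unseen numbers of instances
• No training image contains more than 13 giraffes
• DETR is tested on a synthetic image containing 24 giraffes
• It detects all 24, even though this is out of distribution
• Object queries show no strong class specialization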
