![Slide 4: ConvNet; Transformer [ ]](https://image.slidesharecdn.com/20230406omnivoreasinglemodelformanyvisualmodalities-230414082741-1dc04272/85/Omnivore-A-Single-Model-for-Many-Visual-Modalities-4-320.jpg)
![Slide 5: OMNIVORE; Swin Transformer [ ], self-attention; Omnivore: Transformer, SGD](https://image.slidesharecdn.com/20230406omnivoreasinglemodelformanyvisualmodalities-230414082741-1dc04272/85/Omnivore-A-Single-Model-for-Many-Visual-Modalities-5-320.jpg)




![Slide 10: OMNIVORE, a Swin Transformer trained on IN1K, K400, and SUN (single-view 3D). Swin Transformer baselines: ImageSwin [Liu+, ICCV2021] trained on IN1K; VideoSwin [Liu+, arXiv2021] initialized from ImageSwin; DepthSwin trained on single-view 3D, initialized from ImageSwin](https://image.slidesharecdn.com/20230406omnivoreasinglemodelformanyvisualmodalities-230414082741-1dc04272/85/Omnivore-A-Single-Model-for-Many-Visual-Modalities-10-320.jpg)
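Slide 10 contrasts one Omnivore model trained on IN1K, K400, and SUN with three single-modality specialists (ImageSwin, VideoSwin, DepthSwin). The NumPy sketch below illustrates that structural difference under simplifying assumptions: one shared trunk (here a toy ReLU projection plus mean pooling standing in for the Swin Transformer) feeds one linear classification head per dataset. The class `SharedTrunkWithHeads`, the layer sizes, and the 19-class label space for SUN are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np


class SharedTrunkWithHeads:
    """Hypothetical toy model: one shared trunk, one linear head per dataset."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: dict, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Trunk weights are shared by every dataset and modality.
        self.trunk = rng.standard_normal((in_dim, hidden_dim)) * 0.02
        # A separate linear classifier for each dataset's label space.
        self.heads = {
            name: rng.standard_normal((hidden_dim, n)) * 0.02
            for name, n in num_classes.items()
        }

    def __call__(self, tokens: np.ndarray, dataset: str) -> np.ndarray:
        # tokens: (num_tokens, in_dim) patch embeddings from any modality.
        feats = np.maximum(tokens @ self.trunk, 0.0)  # toy stand-in for the Swin trunk
        pooled = feats.mean(axis=0)                   # global average pooling over tokens
        return pooled @ self.heads[dataset]           # dataset-specific logits


# IN1K (1000 classes) and K400 (400 classes) follow from the dataset names;
# 19 scene classes for SUN is an assumption made for this sketch.
model = SharedTrunkWithHeads(
    in_dim=96, hidden_dim=64, num_classes={"IN1K": 1000, "K400": 400, "SUN": 19}
)
rng = np.random.default_rng(1)
image_tokens = rng.standard_normal((16, 96))
video_tokens = rng.standard_normal((32, 96))
print(model(image_tokens, "IN1K").shape)  # (1000,)
print(model(video_tokens, "K400").shape)  # (400,)
```

Only the heads differ between datasets here; the specialist baselines on the slide would instead each train a separate trunk.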


![Slide 13: Compared methods. Image: MViT-B-24 [ ], ViT-L/16 [Dosovitskiy+, ICLR2021]; video: ViT-B-VTN [Neimark+, arXiv2021], TimeSformer-L [Bertasius+, ICML2021]; 3D: DF2Net [Li+, AAAI2018], G-L-SOOR [Song+, TIP2020]; OMNIVORE: Swin-B, Swin-L](https://image.slidesharecdn.com/20230406omnivoreasinglemodelformanyvisualmodalities-230414082741-1dc04272/85/Omnivore-A-Single-Model-for-Many-Visual-Modalities-13-320.jpg)




The presentation covers Omnivore, a single model that classifies several visual modalities: images, videos, and single-view 3D (RGB-D) data. It builds on the Swin Transformer, compares performance on IN1K (ImageNet-1K), K400 (Kinetics-400), and SUN (SUN RGB-D) against single-modality baselines, and reports ablation studies over different model configurations.
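Handling several modalities with one model requires a common input format. A minimal NumPy sketch of that idea, under stated assumptions: every input is converted to a (T, H, W, C) array, with a single image or RGB-D frame treated as a one-frame video; the array is cut into space-time patches and linearly projected to tokens; and for RGB-D input the depth channel gets its own projection whose output is added to the RGB embedding. The function names, the 2×4×4 patch size, and the frame-repetition rule for clips shorter than one temporal patch are hypothetical choices for illustration, not the paper's exact implementation.

```python
import numpy as np


def to_common_format(x: np.ndarray, modality: str) -> np.ndarray:
    """Return a (T, H, W, C) array for an image, video, or RGB-D input."""
    if modality in ("image", "rgbd"):
        # A single frame (H, W, C) is treated as a one-frame video.
        return x[np.newaxis, ...]
    if modality == "video":
        return x  # already (T, H, W, C)
    raise ValueError(f"unknown modality: {modality}")


def patchify(x: np.ndarray, pt: int, ph: int, pw: int) -> np.ndarray:
    """Cut a (T, H, W, C) array into flattened (pt, ph, pw) space-time patches."""
    t = x.shape[0]
    if t < pt:
        # Hypothetical rule: repeat frames so a short clip fills one temporal patch.
        x = np.repeat(x, -(-pt // t), axis=0)[:pt]
    t, h, w, c = x.shape
    if t % pt or h % ph or w % pw:
        raise ValueError("input size must be divisible by the patch size")
    x = x.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nT, nH, nW, pt, ph, pw, C)
    return x.reshape(-1, pt * ph * pw * c)


def embed(x, modality, w_rgb, w_depth, patch=(2, 4, 4)):
    """Map any supported modality to a (num_tokens, D) token matrix."""
    x = to_common_format(x, modality)
    tokens = patchify(x[..., :3], *patch) @ w_rgb
    if modality == "rgbd":
        # Separate linear projection for depth, summed with the RGB embedding.
        tokens = tokens + patchify(x[..., 3:], *patch) @ w_depth
    return tokens


rng = np.random.default_rng(0)
D = 8  # token width (illustrative)
w_rgb = rng.standard_normal((2 * 4 * 4 * 3, D))
w_depth = rng.standard_normal((2 * 4 * 4 * 1, D))

image = rng.standard_normal((16, 16, 3))
video = rng.standard_normal((4, 16, 16, 3))
rgbd = rng.standard_normal((16, 16, 4))
print(embed(image, "image", w_rgb, w_depth).shape)  # (16, 8)
print(embed(video, "video", w_rgb, w_depth).shape)  # (32, 8)
print(embed(rgbd, "rgbd", w_rgb, w_depth).shape)    # (16, 8)
```

Because images, videos, and RGB-D frames all end up as token matrices of the same width D, a single Transformer trunk can consume them; only the number of tokens changes with clip length and resolution.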


