Sangmin Woo

Multimodal Learning with Severely Missing Modality.pptx

Video Transformers.pptx

Masked Autoencoders Are Scalable Vision Learners.pptx

An Empirical Study of Training Self-Supervised Vision Transformers.pptx

Visual Commonsense Reasoning.pptx

Video Grounding.pptx

Action Recognition Datasets.pptx

Exploring Simple Siamese Representation Learning

Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the Wild

Towards Efficient Transformers

Transformer in Vision

Action Genome: Actions as Composition of Spatio-Temporal Scene Graphs

Neural Motifs: Scene Graph Parsing with Global Context

Attentive Relational Networks for Mapping Images to Scene Graphs

Graph R-CNN for Scene Graph Generation