Sangmin Woo

Multimodal Learning with Severely Missing Modality.pptx
Video Transformers.pptx
Masked Autoencoders Are Scalable Vision Learners.pptx
An Empirical Study of Training Self-Supervised Vision Transformers.pptx
Visual Commonsense Reasoning.pptx
Video Grounding.pptx
Action Recognition Datasets.pptx
Exploring Simple Siamese Representation Learning
Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the Wild
Towards Efficient Transformers
Transformer in Vision
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs
Neural Motifs: Scene Graph Parsing with Global Context
Attentive Relational Networks for Mapping Images to Scene Graphs