Multimodal Learning with Severely Missing Modality.pptx Masked Autoencoders Are Scalable Vision Learners.pptx An Empirical Study of Training Self-Supervised Vision Transformers.pptx Visual Commonsense Reasoning.pptx Action Recognition Datasets.pptx Exploring Simple Siamese Representation Learning Recent Breakthroughs in AI + Learning Visual-Linguistic Representation in the Wild Towards Efficient Transformers Action Genome: Action As Composition of Spatio Temporal Scene Graphs Neural motifs scene graph parsing with global context Attentive Relational Networks for Mapping Images to Scene Graphs