This document summarizes recent research on applying the self-attention mechanism of Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to images divided into patches. It also covers more general attention modules such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities with frozen weights, showing that they can function as universal computation engines.
Slides from an impromptu lightning talk given at the ROS Japan UG #34 LT meetup.
https://rosjp.connpass.com/event/161041/
The QoS (Quality of Service) APIs in ROS 2 underwent breaking changes in both the Dashing and Eloquent releases, which was painful to deal with, so I shared tips and lessons learned from working through them.