This document summarizes recent research on applying self-attention mechanisms from Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to divided image patches. It also covers more general attention modules, such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities through frozen weights, showing they can function as universal computation engines.
Presentation slides from the cvpaper.challenge Meta Study Group.
cvpaper.challenge is an initiative that reflects the current state of the computer vision field and aims to create its trends. We work on paper summaries, idea generation, discussion, implementation, and paper submission, and share all kinds of knowledge. Goals for 2019: "submit 30+ papers to top conferences" and "two or more comprehensive surveys of top conferences."
http://xpaperchallenge.org/cv/
[DL Reading Group] NeRF-VAE: A Geometry Aware 3D Scene Generative Model — Deep Learning JP
NeRF-VAE is a 3D scene generative model that combines Neural Radiance Fields (NeRF) and Generative Query Networks (GQN) with a variational autoencoder (VAE). It uses a NeRF decoder to generate novel views conditioned on a latent code. An encoder extracts latent codes from input views. During training, it maximizes the evidence lower bound to learn the latent space of scenes and allow for novel view synthesis. NeRF-VAE aims to generate photorealistic novel views of scenes by leveraging NeRF's view synthesis abilities within a generative model framework.
An automated 3D cup planning method for total hip arthroplasty from a standard X-ray radiograph is described. To achieve this, we integrate a previously proposed statistical 2D-3D pelvis shape reconstruction method with a previously proposed 3D automated cup planning method. For performance evaluation, we virtually simulated scale estimation error in the 2D-3D reconstruction. There was no significant difference in either cup size error or positional error between 2D-3D reconstruction-based automated planning and 3D-CT-based automated planning when the scale estimation error was within ±5%. As a future direction, we will study whether scale estimation accuracy within ±5% can be achieved in real clinical applications.
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos — harmonylab
Paper introduced:
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Source: Vincent Casser, Soeren Pirk, Reza Mahjourian, Anelia Angelova: Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos, Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 8001-8008 (2019)
Abstract: Depth prediction from camera images is an essential task for indoor and outdoor robot navigation. This work tackles unsupervised learning of video depth prediction and camera ego-motion (the camera's own movement). On top of a baseline model established in prior work, it incorporates modeling of individually moving objects and online refinement of the model. As a result, it substantially improves prediction quality on scenes containing significant object motion.
8. Related work: Image synthesis with CycleGAN [1,2]
• The discriminators learn the distribution of each domain, and the generators learn the mappings between the domains
• No paired training data is required
Zebra ↔ Horse [1], Summer ↔ Winter [1], MR ↔ CT [2]
[1] J.-Y. Zhu et al., "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," 2017.
[2] J. M. Wolterink et al., "Deep MR to CT Synthesis Using Unpaired Data," Simulation and Synthesis in Medical Imaging, 2017.
[Figure: CycleGAN between MR and CT — two generators G, two discriminators D, the flows of real MR and real CT, real vs. synthesized images, and the associated losses ℒ]
9. Related work: Image synthesis with CycleGAN [1,2]
• Discriminator: determines whether an image is a real CT image
• Generator: generates images that fool the discriminator (i.e., that look like real CT images)
• The adversarial losses (one per translation direction) are the constraints that drive the generators to produce realistic-looking images
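As a rough illustration of the adversarial constraint above, the numpy sketch below shows how the discriminator and generator objectives pull in opposite directions. The least-squares form and the function names are my own illustrative choices (following the common LSGAN-style formulation), not code from the cited papers:

```python
import numpy as np

def lsgan_discriminator_loss(d_real, d_fake):
    """Discriminator side: push D(real CT) toward 1 and D(synthesized CT) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_generator_loss(d_fake):
    """Generator side: fool D so that D(synthesized CT) is scored as real (toward 1)."""
    return np.mean((d_fake - 1.0) ** 2)
```

A perfectly fooled discriminator gives the generator zero loss, while a perfectly discriminating D gives itself zero loss; training balances the two.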
10. Related work: Image synthesis with CycleGAN [1,2]
• The adversarial losses constrain the generators to produce realistic-looking images
• The cycle-consistency constraint ties each synthesized image back to its input, enforcing that the generated image stays matched (corresponding) to the original
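The correspondence constraint in CycleGAN is the cycle-consistency loss: translating MR → CT → MR should reproduce the original image. A minimal numpy sketch (the L1 form and the weight of 10 follow the CycleGAN paper's common defaults, not values stated in these slides):

```python
import numpy as np

def cycle_consistency_loss(real, reconstructed, weight=10.0):
    """L1 cycle loss: || G_CT->MR(G_MR->CT(x)) - x ||_1, averaged over pixels."""
    return weight * np.mean(np.abs(reconstructed - real))
```

The loss is zero only when the round trip through both generators returns the input exactly.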
11. Related work: Image synthesis with CycleGAN [1,2]
• The adversarial losses constrain the generators to produce realistic-looking images
• The cycle-consistency constraint enforces correspondence between input and generated images
• Limitation: there is no constraint on the information that should be preserved through translation
12. Related work: Segmentation from a different-modality image using CycleGAN [1]
• Train the CycleGAN and the segmenter end-to-end
• Introduce into CycleGAN a constraint based on the agreement between the automatic segmentation result and the ground-truth labels
[1] Y. Huo et al., "Adversarial Synthesis Learning Enables Segmentation Without Target Modality Ground Truth," arXiv preprint arXiv:1712.07695, 2017.
𝑖: pixel, 𝑡: ground-truth label, 𝑥: input CT
[Figure: CycleGAN between labeled CT and unlabeled MR — generators G, discriminators D, flows of real CT and real MR, real vs. synthesized images, a segmenter S attached to the synthesized-MR branch, and the associated losses ℒ]
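The agreement constraint between the segmenter's output and the ground-truth labels is typically a pixel-wise cross-entropy over pixels 𝑖 with labels 𝑡; the sketch below assumes that form (the function name and the (H, W, C) tensor layout are my own, not from the paper):

```python
import numpy as np

def pixelwise_cross_entropy(probs, labels, eps=1e-8):
    """probs: (H, W, C) softmax outputs of the segmenter S on a synthesized image.
    labels: (H, W) integer ground-truth labels t_i. Averaged over pixels i."""
    h, w = labels.shape
    # Pick the predicted probability of the correct class at each pixel.
    picked = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    return -np.mean(np.log(picked + eps))
```

The loss approaches zero as the segmenter assigns probability 1 to the correct label at every pixel.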
13. Related work: A constraint on similarity before and after translation [1]
• Introduce into CycleGAN a constraint based on the similarity between the images before and after translation
• Constrains shape changes through translation
[1] Y. Hiasa et al., "Cross-modality image synthesis from unpaired data using CycleGAN: Effects of gradient consistency loss and training data size," arXiv preprint arXiv:1803.06629, 2018.
𝐺𝐶: correlation of the gradients of two images
[Figure: CycleGAN between unlabeled MR and labeled CT — generators G, discriminators D, real/synthesized flows, with gradient-consistency losses ℒ added alongside the adversarial and cycle losses]
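Following the slide's definition (GC = correlation of the gradients of two images), a plausible numpy sketch computes the mean normalized cross-correlation of the per-axis image gradients. The choice of `np.gradient` and the averaging over axes are my assumptions; the exact gradient operator in [1] may differ:

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two arrays (zero-mean, unit-normalized)."""
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.sqrt((a ** 2).sum() * (b ** 2).sum()) + eps)

def gradient_correlation(x, y):
    """GC: mean NCC of the vertical and horizontal image gradients."""
    gx0, gx1 = np.gradient(x)   # derivatives of x along axis 0 and axis 1
    gy0, gy1 = np.gradient(y)   # derivatives of y along axis 0 and axis 1
    return 0.5 * (ncc(gx0, gy0) + ncc(gx1, gy1))
```

GC is 1 when the two images have identical edge structure, so maximizing it penalizes shape changes between the input and the synthesized image.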
23. Experiment 2: Investigating the generalization ability of the label-based constraint
• Train with the label-based constraint and compare segmentation accuracy on synthesized MR images versus real MR images
• The network's input and output are 2D coronal slices
• Quantitative evaluation with the Dice coefficient
• Evaluation data:
• Segmentation from synthesized MR: 2-fold cross-validation on 20 labeled CT cases
• Segmentation from real MR: 10 labeled MR cases
[Figure: two evaluation pipelines — segmentation from synthesized MR: real CT → generator G → synthesized MR → segmenter S; segmentation from real MR: real MR → segmenter S]
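The Dice coefficient used for the quantitative evaluation above can be sketched for binary masks as follows (a generic definition, not the authors' evaluation code):

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)
```

A value of 1 means the predicted and ground-truth masks overlap perfectly; 0 means no overlap at all.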