[DL Reading Group] Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks

Deep Learning JP

2022/9/9 Deep Learning JP http://deeplearning.jp/seminar-2/

DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Transporters with Visual Foresight for Solving Unseen
Rearrangement Tasks
Koki Yamane, University of Tsukuba

Bibliographic information
• Title
  • Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
• Authors
  • Hongtao Wu, Jikai Ye, Xin Meng, Chris Paxton, Gregory Chirikjian
  • The Johns Hopkins University
  • National University of Singapore
  • NVIDIA
• Venue: arXiv (2022, May)
• URL: https://arxiv.org/pdf/2202.10765.pdf

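Overview
• Zero-shot task generalization
  • Unseen tasks
  • Long-horizon tasks
• Future prediction via tree search
  • Image prediction model
  • Multi-action proposal module
• Efficient training of the image prediction model
  • Translation equivariance of FCNs
  • When the input is translated, the output is translated as well
Goal-conditioned task planning enables a wide range of rearrangement tasks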
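The translation equivariance mentioned above can be checked on a single convolutional filter. The following is a minimal sketch in plain Python, not the presented model: the 8x8 scene, the 3x3 kernel, and the helper names `correlate_same` and `shift` are all illustrative assumptions. It shows that shifting the input and then filtering gives the same result as filtering and then shifting, as long as the object stays away from the image border.

```python
def correlate_same(image, kernel):
    """Zero-padded 2-D cross-correlation; output has the same size as `image`."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - kh // 2, x + j - kw // 2
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def shift(image, dy, dx):
    """Translate `image` by (dy, dx), filling vacated pixels with zeros."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = image[sy][sx]
    return out


# Toy 8x8 "scene" with a single object pixel, and an arbitrary 3x3 filter.
scene = [[0] * 8 for _ in range(8)]
scene[2][2] = 5
kernel = [[1, 2, 1], [0, 3, 0], [-1, 0, 2]]

shift_then_filter = correlate_same(shift(scene, 3, 2), kernel)
filter_then_shift = shift(correlate_same(scene, kernel), 3, 2)
equivariant = shift_then_filter == filter_then_shift
print(equivariant)  # True
```

Because every layer of an FCN behaves this way (up to border effects), a pattern learned at one location applies at every other location, which is one common explanation for why such models need fewer demonstrations.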

Prior work:
Transporter Networks (TN)
Exploits the translation equivariance of FCNs to learn pick-and-place tasks efficiently

Prior work:
Goal-Conditioned Transporter Networks (GCTN)
Adds a goal-state input to handle deformable (non-rigid) objects

Proposed method:
Transporters with Visual Foresight (TVF)
Tree search using action proposals and image prediction
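The tree search over proposed actions and predicted outcomes can be sketched as below. This is a toy illustration under stated assumptions, not the TVF implementation: the state is a tuple of slots rather than an image, and `propose_actions`, `predict_next`, `goal_distance`, and `plan` are hypothetical stand-ins for the action proposal module, the image prediction model, and a goal-matching score. How the actual method scores predicted states is not stated on the slide, so the slot-mismatch count here is purely an assumption.

```python
from typing import List, Optional, Tuple

State = Tuple[Optional[str], ...]   # toy stand-in for a top-down image
Action = Tuple[int, int]            # (pick slot, place slot)


def propose_actions(state: State) -> List[Action]:
    """Hypothetical proposal module: every move of a block into an empty slot."""
    picks = [i for i, b in enumerate(state) if b is not None]
    places = [j for j, b in enumerate(state) if b is None]
    return [(i, j) for i in picks for j in places]


def predict_next(state: State, action: Action) -> State:
    """Hypothetical prediction model: the state expected after the action."""
    pick, place = action
    nxt = list(state)
    nxt[place], nxt[pick] = nxt[pick], None
    return tuple(nxt)


def goal_distance(state: State, goal: State) -> int:
    """Number of slots that differ from the goal (0 means the goal is reached)."""
    return sum(a != b for a, b in zip(state, goal))


def plan(state: State, goal: State, depth: int) -> Tuple[int, List[Action]]:
    """Depth-limited tree search; returns (best distance, action sequence)."""
    best: Tuple[int, List[Action]] = (goal_distance(state, goal), [])
    if depth == 0 or best[0] == 0:
        return best
    for action in propose_actions(state):
        # Expand the tree using the predicted outcome instead of executing the action.
        dist, tail = plan(predict_next(state, action), goal, depth - 1)
        if dist < best[0]:
            best = (dist, [action] + tail)
    return best


start: State = ("A", "B", None, None)
goal: State = (None, None, "A", "B")
distance, actions = plan(start, goal, depth=2)
print(distance, actions)  # 0 [(0, 2), (1, 3)]
```

The point of the structure is that only the first action of the best sequence needs to be executed on the robot; everything deeper in the tree is evaluated on predictions.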

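Proposed method:
Image prediction model
• Input
  • Top-down RGB-D image
  • Action information (pick pose, place pose)
• Output
  • Image at the next step
• Architecture
  • 36-layer FCN (Fully Convolutional Network)
The FCN predicts the image at the next step
Translation equivariance enables efficient learning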

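Proposed method:
Multi-action proposal module
Proposes multiple actions for the tree search
1. Obtain an action-value map with GCTN
2. Threshold the action-value map
3. K-means clustering
Narrows the action-value map down to a few candidates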
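The thresholding and clustering steps above can be sketched as follows. This is a minimal plain-Python illustration, not the authors' code: the function name `propose_pixels`, the toy 6x6 value map, the threshold of 0.5, the deterministic farthest-point initialisation, and the choice to represent each cluster by its highest-valued pixel are all assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int]


def propose_pixels(value_map: List[List[float]], threshold: float,
                   k: int, iters: int = 10) -> List[Pixel]:
    """Threshold a value map, cluster surviving pixels, return one pixel per cluster."""
    points = [(y, x) for y, row in enumerate(value_map)
              for x, v in enumerate(row) if v >= threshold]
    if not points:
        return []
    k = min(k, len(points))

    def sq_dist(p, c):
        return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

    def assign(centers):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        return clusters

    def mean(cluster):
        n = len(cluster)
        return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)

    # Deterministic farthest-point initialisation, starting from the best pixel.
    centers = [max(points, key=lambda p: value_map[p[0]][p[1]])]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    # Lloyd iterations: assign pixels to centres, then move centres to cluster means.
    for _ in range(iters):
        clusters = assign(centers)
        centers = [mean(c) if c else old for c, old in zip(clusters, centers)]
    clusters = assign(centers)

    # Assumption: represent each cluster by its highest-valued pixel.
    return sorted(max(c, key=lambda p: value_map[p[0]][p[1]]) for c in clusters if c)


value_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.7, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.8, 0.6],
    [0.0, 0.0, 0.0, 0.0, 0.7, 0.0],
]
candidates = propose_pixels(value_map, threshold=0.5, k=2)
print(candidates)  # [(1, 1), (4, 4)]
```

Clustering after thresholding keeps the candidates spatially distinct, so the tree search branches over different regions of the scene rather than over neighbouring pixels of the same peak.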

Experiments (simulation)
14 block-stacking tasks (including unseen tasks)
• Simulator: Ravens (a PyBullet-based manipulation simulator)
• Robot: UR5 (suction gripper)
• Number of data samples: 1000

Experimental results (simulation)
High success rates even on unseen tasks
> 90%

Experiments (real world)
6 block-stacking tasks (including unseen tasks)
• Methods
  • GCTN: Goal-Conditioned Transporter Networks (baseline)
  • TVF: Transporters with Visual Foresight (proposed method)
• 3 training tasks and 3 unseen tasks
• 30 demonstrations in total across 3 tasks (10 per task)
• Each task evaluated over 10 trials

Experimental results (real world)
High success rates in the real world as well
[Figure: example results on unseen tasks]

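Summary
• Tree search over actions and their outcomes by alternating multi-action proposal and image prediction
• Using an FCN as the image prediction model enables efficient learning
• High success rates on unseen tasks
• Validated on a real robot, demonstrating high success rates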
