[DL Reading Group] AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
1. DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners
Makoto Kawano (@mkt_kwn), Matsuo Lab.
2. Bibliographic information
• AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners (the main topic of this talk)
Liang, Z., Mu, Y., Ding, M., Ni, F., Tomizuka, M., and Luo, P.
The University of Hong Kong, University of California, Berkeley, Tianjin University, Shanghai AI Laboratory
ICML 2023 (oral)
• Planning with Diffusion for Flexible Behavior Synthesis (touched on briefly last time, but only in a very simplified form, so it is covered again here)
Janner, M., Du, Y., Tenenbaum, J.B., and Levine, S.
University of California, Berkeley, MIT
ICML 2022
15. AdaptDiffuser
1. First, simply generate trajectories
Guide generation with a variety of task rewards
[Figure: AdaptDiffuser overview. Labels: Diverse Task Generation (Goal 1, Goal 2, Goal 3); Start State and Goal; Reward Function with Reward Gradient Guidance; Denoising Process with Denoising U-Net (sample initialized as noise); Discriminator (Accept / Drop); Data Pool; Update Diffusion Model]
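The idea of guiding generation with a task reward can be sketched in a few lines. The following is only a toy illustration, not the paper's implementation: `toy_denoise_mean` is a hypothetical stand-in for the denoising U-Net, the reward `J(traj) = -sum((x - goal)**2)` and the noise schedule are assumptions chosen for simplicity, and the trajectory is a plain list of 1-D positions. At each reverse step the predicted mean is shifted along the reward gradient before noise is added.

```python
import random

def reward_grad(traj, goal):
    """Gradient of the toy reward J(traj) = -sum((x - goal)**2) for each point."""
    return [-2.0 * (x - goal) for x in traj]

def toy_denoise_mean(traj):
    """Hypothetical stand-in for the denoising U-Net: pulls samples toward 0."""
    return [0.9 * x for x in traj]

def guided_sample(goal, horizon=8, steps=50, scale=0.1, seed=0):
    """Run a toy reverse process whose mean is shifted along the reward gradient."""
    rng = random.Random(seed)
    traj = [rng.gauss(0.0, 1.0) for _ in range(horizon)]  # initialized as noise
    for t in reversed(range(steps)):
        mean = toy_denoise_mean(traj)
        grad = reward_grad(mean, goal)
        # Reward-gradient guidance: shift the predicted mean toward higher reward.
        mean = [m + scale * g for m, g in zip(mean, grad)]
        sigma = 0.1 * t / steps  # assumed toy noise schedule; zero at the last step
        traj = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return traj

guided = guided_sample(goal=1.0, scale=0.1)
unguided = guided_sample(goal=1.0, scale=0.0)  # scale=0 disables guidance
```

With `scale=0.0` the sampler follows the toy denoiser alone, so comparing `guided` and `unguided` shows how the reward term pulls samples toward the goal. Changing `goal` corresponds to switching the task reward without retraining the denoiser.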
16. AdaptDiffuser
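1. First, simply generate trajectories
Guide generation with a variety of task rewards
2. Select with the discriminator 𝒟
Judge whether the trajectories generated in step 1 meet the criteria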
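The selection in step 2 can be pictured as a filter over generated trajectories. The sketch below is a hypothetical illustration: the slide only says that the discriminator judges whether a trajectory meets the criteria, so the two checks used here (the end point is near the goal, and consecutive steps are small enough to be feasible) and the thresholds `tol` and `max_step` are assumptions made for the example.

```python
def reaches_goal(traj, goal, tol):
    """Assumed criterion 1: the last state lies within tol of the goal."""
    return abs(traj[-1] - goal) <= tol

def is_feasible(traj, max_step):
    """Assumed criterion 2: no jump between consecutive states exceeds max_step."""
    return all(abs(b - a) <= max_step for a, b in zip(traj, traj[1:]))

def discriminate(candidates, goal, tol=0.1, max_step=0.5):
    """Split candidates into accepted (sent to the data pool) and dropped."""
    accepted, dropped = [], []
    for traj in candidates:
        ok = reaches_goal(traj, goal, tol) and is_feasible(traj, max_step)
        (accepted if ok else dropped).append(traj)
    return accepted, dropped

candidates = [
    [0.0, 0.4, 0.8, 1.0],  # reaches the goal with small steps
    [0.0, 1.0],            # reaches the goal but with an infeasible jump
    [0.0, 0.2, 0.4],       # feasible but stops short of the goal
]
accepted, dropped = discriminate(candidates, goal=1.0)
```

Only accepted trajectories would be added to the data pool; the dropped ones are discarded so that low-quality samples do not contaminate later training.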
17. AdaptDiffuser
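1. First, simply generate trajectories
Guide generation with a variety of task rewards
2. Select with the discriminator 𝒟
Judge whether the trajectories generated in step 1 meet the criteria
3. Train on high-quality synthetic data
Fine-tune the diffusion model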
18. AdaptDiffuser
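1. First, simply generate trajectories
Guide generation with a variety of task rewards
2. Select with the discriminator 𝒟
Judge whether the trajectories generated in step 1 meet the criteria
3. Train on high-quality synthetic data
Fine-tune the diffusion model
4. Repeat until the desired accuracy is reached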
[Figure: self-evolving loop. Labels: I. Offline Trajectories (Single Goal), used to initialize; II. Model (re-)training of the Diffusion Model; III. Guided Trajectory Generation with Reward Guidance toward a Diverse Goal Point; IV. Diverse synthetic data, after Selection by Discriminator; Goal Point]
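The four steps form a loop: generate, select, fine-tune, and repeat until the model is good enough. The sketch below shows only that control flow under heavy simplification. Everything in it is hypothetical: the "model" is a single number (its error level), a "trajectory" is reduced to its end point, and `generate`, `accept`, `finetune`, and `evaluate` are toy placeholders for guided sampling, the discriminator, diffusion-model fine-tuning, and the accuracy check.

```python
import random

def self_evolve(model, goals, generate, accept, finetune, evaluate,
                target, max_rounds=5, samples_per_goal=16):
    """Generate -> select -> fine-tune, repeated until evaluate(model) >= target."""
    pool = []
    for _ in range(max_rounds):
        if evaluate(model) >= target:  # step 4: stop at the desired accuracy
            break
        # Step 1: generate candidates for a variety of goals.
        candidates = [(g, generate(model, g))
                      for g in goals for _ in range(samples_per_goal)]
        # Step 2: keep only the candidates the discriminator accepts.
        selected = [(g, x) for g, x in candidates if accept(x, g)]
        pool.extend(selected)
        # Step 3: fine-tune on the accumulated synthetic data.
        if selected:
            model = finetune(model, pool)
    return model, pool

rng = random.Random(0)

def generate(noise, goal):
    return goal + rng.gauss(0.0, noise)  # toy sampler: end point near the goal

def accept(end_point, goal):
    return abs(end_point - goal) < 0.5   # toy discriminator threshold

def finetune(noise, pool):
    # Toy update: the new error level is the RMS error of the accepted data.
    return (sum((x - g) ** 2 for g, x in pool) / len(pool)) ** 0.5

def evaluate(noise):
    return 1.0 - noise                   # toy accuracy score

final_noise, pool = self_evolve(1.0, [-1.0, 0.0, 1.0], generate, accept,
                                finetune, evaluate, target=0.9)
```

Because only accepted samples enter the pool, each fine-tuning round in this toy can only lower the error level, which mirrors the intent of training on filtered synthetic data. Passing the four stages as functions keeps the loop independent of how each stage is actually implemented.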