220227 rainbow2017 deep-mind paper explained

[2017 Deepmind] Deep Q learning +
Slides: https://www.slideshare.net/taeseonryu/how-does-unlabeled-data-improve-generalization-in-self-training
Papers presented so far: https://github.com/Lilcob/-DL_PaperReadingMeeting
Hello, this is the Deep Learning Paper Reading Group. Today's paper review video covers 'Rainbow : Combining Improvements in Deep Reinforcement Learning', a paper published by DeepMind in 2018. After Deep Q-learning was published, several extensions were proposed to improve agent performance. This paper shows that an agent integrating all of those extensions far outperforms the previously published Deep Q-learning agents. 양현모 of the Fundamental team prepared the detailed review. Thank you in advance for your interest! https://youtu.be/oC1AOIefjT8

Presenter: 양현모 (ipmnhmyang@naver.com)
Fundamental Team: 김동현, 김채현, 박종익, 송헌, 오대환, 이근배, 이대현, 최재윤
Rainbow
Combining Improvements in Deep Reinforcement Learning
2018 AAAI DeepMind

Background | Deep Q-learning
$\bar{\theta}$ : Target network
$\theta$ : Online network
• Deep Q-learning
• Transition → replay memory buffer
• Loss
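
The slide names the loss without spelling it out. As a minimal sketch, assuming the standard DQN squared TD error, (R + γ max_a' q_θ̄(S', a') − q_θ(S, A))², the computation for one transition might look as follows. The function names and the toy numbers are hypothetical, and plain Python lists stand in for network outputs.

```python
def dqn_td_target(reward, next_q_target, gamma, done):
    """One-step target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        # No bootstrapping past a terminal state.
        return reward
    return reward + gamma * max(next_q_target)


def dqn_loss(q_online_sa, reward, next_q_target, gamma=0.99, done=False):
    """Squared TD error for a single transition drawn from the replay buffer."""
    target = dqn_td_target(reward, next_q_target, gamma, done)
    td_error = target - q_online_sa
    return td_error ** 2


# Toy transition: the online network predicts 1.0 for (s, a); the target
# network predicts [0.5, 2.0] for the two actions in the next state.
loss = dqn_loss(q_online_sa=1.0, reward=1.0, next_q_target=[0.5, 2.0], gamma=0.5)
```

In a real agent the target would be treated as a constant during the gradient step, so only the online parameters 𝜃 are updated.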

DQN Extensions | Double Q-learning
• Overestimation bias
• Decoupling
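
To illustrate the decoupling, here is a small sketch under the usual Double Q-learning assumption: the online network selects the next action and the target network evaluates it. The helper names and the Q-values are made up for illustration.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def standard_target(reward, next_q_target, gamma):
    """DQN target: the target network both selects and evaluates."""
    return reward + gamma * max(next_q_target)


def double_q_target(reward, next_q_online, next_q_target, gamma):
    """Double Q target: online network selects, target network evaluates."""
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


next_q_online = [1.0, 3.0]   # online network prefers action 1
next_q_target = [2.0, 0.5]   # target network rates action 1 at only 0.5

plain = standard_target(0.0, next_q_target, gamma=1.0)
double = double_q_target(0.0, next_q_online, next_q_target, gamma=1.0)
```

Here `plain` takes the maximum of the target network's own estimates, while `double` uses the value of the action the online network picked, which is what reduces the overestimation bias.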

DQN Extensions | Prioritized replay
• Replay memory
• Which experiences to store
• Which experiences to replay
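
The "which experiences to replay" question can be sketched with proportional prioritization, assuming sampling probability P(i) = p_i^α / Σ_k p_k^α and importance-sampling weights (N · P(i))^(−β) normalized by their maximum. The function names and the values of α and β below are illustrative, not taken from the slides.

```python
import random


def sampling_probabilities(priorities, alpha=0.5):
    """P(i) = p_i^alpha / sum_k p_k^alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, beta=0.4):
    """w_i = (N * P(i))^(-beta), divided by max_i w_i so weights are <= 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


def sample_indices(probs, batch_size, rng=random):
    """Draw replay indices with replacement according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Two stored transitions with priorities 1.0 and 4.0.
probs = sampling_probabilities([1.0, 4.0], alpha=0.5)
weights = importance_weights(probs, beta=1.0)
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

The transition with the larger priority is sampled more often, and its smaller importance weight compensates for that bias in the update.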
DQN Extensions | Multi-step learning
Temporal Difference | TD error
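
As a rough sketch of multi-step learning, assuming the truncated n-step return R^(n) = Σ_{k=0}^{n−1} γ^k R_{t+k+1} and a bootstrap from the target network after n steps, the target can be computed as below. The function names are hypothetical.

```python
def n_step_return(rewards, gamma):
    """Discounted sum of the next n rewards: sum_k gamma^k * r_{t+k+1}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def n_step_target(rewards, bootstrap_value, gamma):
    """n-step target: R^(n) + gamma^n * max_a Q_target(s_{t+n}, a)."""
    n = len(rewards)
    return n_step_return(rewards, gamma) + (gamma ** n) * bootstrap_value


# Three rewards of 1.0, then bootstrap from a value estimate of 8.0.
ret = n_step_return([1.0, 1.0, 1.0], gamma=0.5)
target = n_step_target([1.0, 1.0, 1.0], bootstrap_value=8.0, gamma=0.5)
```

The TD error is then the difference between this target and the current estimate q_θ(S_t, A_t); with n = 1 it reduces to the one-step case.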

DQN Extensions | Dueling networks
• Using advantage stream : Q = V + A

DQN Extensions | Noisy Nets
• Limit of 𝜺-greedy policy: ineffective when many actions must be executed before the first reward
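
Both ideas on this slide can be sketched in a few lines. For the dueling head, the sketch assumes the common mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) − mean_a' A(s, a'), which refines the Q = V + A shorthand. For the noisy layer, it assumes independent Gaussian noise on every weight and bias, y = (μ_w + σ_w ε_w) x + (μ_b + σ_b ε_b); the factorized-noise variant is not shown. All names are hypothetical.

```python
import random


def dueling_q(value, advantages):
    """Combine the value and advantage streams into per-action Q-values."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, rng):
    """Linear layer whose weights and biases are perturbed by fresh noise."""
    out = []
    for row_mu, row_sigma, b_mu, b_sigma in zip(mu_w, sigma_w, mu_b, sigma_b):
        acc = b_mu + b_sigma * rng.gauss(0.0, 1.0)
        for xi, w_mu, w_sigma in zip(x, row_mu, row_sigma):
            acc += (w_mu + w_sigma * rng.gauss(0.0, 1.0)) * xi
        out.append(acc)
    return out


q_values = dueling_q(1.0, [1.0, 3.0])

# With sigma set to zero the layer is an ordinary linear layer.
rng = random.Random(0)
plain_out = noisy_linear([1.0, 2.0], [[1.0, 1.0]], [[0.0, 0.0]], [0.5], [0.0], rng)
noisy_out = noisy_linear([1.0, 2.0], [[1.0, 1.0]], [[0.1, 0.1]], [0.5], [0.1], rng)
```

Because σ is learned, the agent can shrink the noise where exploration is no longer useful, instead of relying on a fixed 𝜺 schedule.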

DQN Extensions | Distributional RL
• Q(s,a) | Expected return → Z(s,a) | Return distribution
• Support z
• Probability mass
• Distribution
• Next state, action by optimal policy
• Target distribution
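
The target distribution step can be sketched as a categorical projection, assuming a fixed support of evenly spaced atoms between v_min and v_max. Each atom z of the next-state distribution is shifted to r + γz, clipped to the support, and its probability mass is split between the two nearest atoms. The function names and the tiny three-atom support are illustrative only.

```python
import math


def make_support(v_min, v_max, n_atoms):
    """Evenly spaced atoms z_i = v_min + i * delta."""
    delta = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * delta for i in range(n_atoms)]


def project_target(next_probs, reward, gamma, v_min, v_max):
    """Project the shifted distribution (r + gamma * z, p) onto the support."""
    n_atoms = len(next_probs)
    delta = (v_max - v_min) / (n_atoms - 1)
    support = make_support(v_min, v_max, n_atoms)
    projected = [0.0] * n_atoms
    for p, z in zip(next_probs, support):
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional atom index, clamped to guard against rounding error.
        b = min(max((tz - v_min) / delta, 0.0), float(n_atoms - 1))
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            projected[lower] += p
        else:
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


def expected_value(probs, support):
    """Recover a scalar Q-value as the mean of the distribution."""
    return sum(p * z for p, z in zip(probs, support))


support = make_support(0.0, 2.0, 3)            # atoms at 0, 1, 2
target = project_target([0.0, 1.0, 0.0], reward=0.5, gamma=1.0,
                        v_min=0.0, v_max=2.0)
q_value = expected_value(target, support)
```

All mass sat on the atom at 1.0; after adding the reward it lands at 1.5, halfway between the atoms at 1 and 2, so the projection splits it evenly.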

Rainbow: The Integrated Agent
• Multi-step distributional loss
• Double Q-learning
• Prioritized replay
• Dueling network architecture
• Noisy linear layers
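
How the first three components fit together can be written out as follows. This is a reconstruction from the Rainbow paper's definitions rather than from the slide text, using a constant discount γ; Φ_z denotes the projection onto the support z, and ω is the priority exponent.

```latex
\begin{align*}
R_t^{(n)} &= \sum_{k=0}^{n-1} \gamma^{k} R_{t+k+1}
  && \text{(multi-step return)} \\
a^{*}_{t+n} &= \arg\max_{a} q_{\theta}(S_{t+n}, a)
  && \text{(double Q: online network selects)} \\
d_t^{(n)} &= \left( R_t^{(n)} + \gamma^{n} z,\;
  p_{\bar{\theta}}(S_{t+n}, a^{*}_{t+n}) \right)
  && \text{(target network evaluates)} \\
L_t &= D_{\mathrm{KL}}\!\left( \Phi_z d_t^{(n)} \,\middle\|\, d_t \right)
  && \text{(multi-step distributional loss)} \\
p_t &\propto \left( D_{\mathrm{KL}}\!\left( \Phi_z d_t^{(n)} \,\middle\|\, d_t \right) \right)^{\omega}
  && \text{(replay priority)}
\end{align*}
```

Here $d_t = (z, p_{\theta}(S_t, A_t))$ is the online network's predicted distribution, so the KL loss replaces the absolute TD error as the quantity that drives prioritized replay.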

Conclusions
• Reviewed Deep Q-learning and its extensions
• Double Q-learning
• Distributional learning
• Dueling networks
• Prioritized replay
• Multi-step learning
• Noisy network
• Rainbow, an integrated DQN agent combining all six extensions, outperforms the earlier DQN agents in learning efficiency and performance

Other slides in the deck (titles only): Q & A, Results, Ablation studies, Results
