Presenter: 양현모 (ipmnhmyang@naver.com)
Fundamental Team: 김동현, 김채현, 박종익, 송헌, 오대환, 이근배, 이대현, 최재윤
Rainbow: Combining Improvements in Deep Reinforcement Learning
AAAI 2018, DeepMind
Background | Deep Q-learning
• 𝜃 : online network, 𝜃̄ : target network (a periodically updated copy of 𝜃)
• Transitions (S_t, A_t, R_{t+1}, S_{t+1}) are stored in a replay memory buffer and sampled for updates
• Loss: squared TD error (R_{t+1} + γ max_a′ q_𝜃̄(S_{t+1}, a′) − q_𝜃(S_t, A_t))²
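The target and loss above can be sketched in a few lines of NumPy; the function names are illustrative, not from the paper:

```python
import numpy as np

def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Bootstrapped target r + gamma * max_a' q_theta_bar(s', a');
    terminal transitions use the reward alone."""
    if done:
        return reward
    return reward + gamma * float(np.max(next_q_target))

def dqn_loss(q_online_sa, target):
    """Squared TD error between the online estimate and the frozen target."""
    return (target - q_online_sa) ** 2
```

In practice no gradient flows through 𝜃̄ (the target is treated as a constant), and the loss is averaged over a minibatch drawn from the replay buffer.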
DQN Extensions | Double Q-learning
• The max operator in the Q-learning target causes an overestimation bias
• Decoupling: the online network selects the next action, the target network evaluates it
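The decoupled target can be sketched as follows (an illustrative helper, not the paper's code):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Double DQN target: the online network picks the argmax action,
    but the target network supplies that action's value."""
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))                 # selection: theta
    return reward + gamma * float(q_target_next[a_star])   # evaluation: theta-bar
```

Because selection and evaluation use different parameter sets, one network's upward estimation errors are less likely to be both chosen and propagated.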
DQN Extensions | Prioritized replay
• Replay memory design raises two questions:
• Which experiences to store
• Which experiences to replay
• Prioritized replay samples transitions with probability proportional to their absolute TD error
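A minimal sketch of proportional prioritization with importance-sampling correction; the α and β defaults below are the commonly used values, assumed here for illustration:

```python
import numpy as np

def priority_probs(td_errors, alpha=0.6, eps=1e-6):
    """P(i) proportional to (|delta_i| + eps)^alpha; eps keeps
    zero-error transitions sampleable."""
    p = (np.abs(td_errors) + eps) ** alpha
    return p / p.sum()

def importance_weights(probs, beta=0.4):
    """w_i = (N * P(i))^(-beta), normalized by the max weight, to correct
    the bias introduced by non-uniform sampling."""
    w = (len(probs) * probs) ** (-beta)
    return w / w.max()
```

Transitions with larger TD error are replayed more often, while their gradient updates are down-weighted to keep the expected update unbiased.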
DQN Extensions | Multi-step learning
• The one-step temporal difference (TD) error is replaced by a truncated n-step return, bootstrapping from the state n steps ahead
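The truncated n-step return follows directly from its definition (illustrative code):

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Sum of the first n discounted rewards plus a discounted bootstrap
    from the value estimate n steps ahead."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_value
```

With a single reward this reduces to the usual one-step target; larger n propagates reward information faster at the cost of higher variance.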
DQN Extensions | Dueling networks
• Two streams: a state-value stream V(s) and an advantage stream A(s,a), aggregated as Q = V + A
DQN Extensions | Noisy Nets
• Limit of 𝜺-greedy policy: many actions must be executed before rewarding states are discovered
• Noisy Nets replace 𝜺-greedy exploration with learned parametric noise in the network's linear layers
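The naive sum Q = V + A is unidentifiable (a constant can shift freely between the two streams), so the advantage is centered before aggregation; a sketch:

```python
import numpy as np

def dueling_q(value, advantages):
    """Q(s,a) = V(s) + (A(s,a) - mean_a' A(s,a')); subtracting the mean
    advantage makes the V/A decomposition identifiable."""
    advantages = np.asarray(advantages, dtype=float)
    return value + (advantages - advantages.mean())
```

With this aggregation the mean of Q over actions equals V(s), so the value stream learns state quality independently of action preferences.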
DQN Extensions | Distributional RL
• Q(s,a), the expected return, is replaced by Z(s,a), a distribution over returns
• Support z : a fixed set of equally spaced atoms
• Probability mass : the network outputs a mass for each atom
• Distribution : support and masses together define Z(s,a)
• Next state and action are chosen by the greedy (optimal) policy
• Target distribution : the next-state distribution shifted by r + γz and projected back onto the fixed support
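The projection step can be sketched in the C51 style: shift each atom by r + γz, clip to the support, and split its mass between the two nearest fixed atoms (illustrative code; the support bounds are assumed defaults):

```python
import numpy as np

def categorical_projection(reward, next_probs, gamma=0.99,
                           v_min=-10.0, v_max=10.0, done=False):
    """Shift the support by r + gamma*z, clip to [v_min, v_max], and
    redistribute each atom's mass onto the two nearest fixed atoms."""
    next_probs = np.asarray(next_probs, dtype=float)
    n_atoms = len(next_probs)
    z = np.linspace(v_min, v_max, n_atoms)
    dz = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(reward + (0.0 if done else gamma) * z, v_min, v_max)
    b = (tz - v_min) / dz                      # fractional atom index
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    for j in range(n_atoms):
        if lo[j] == hi[j]:                     # landed exactly on an atom
            m[lo[j]] += next_probs[j]
        else:                                  # split mass by proximity
            m[lo[j]] += next_probs[j] * (hi[j] - b[j])
            m[hi[j]] += next_probs[j] * (b[j] - lo[j])
    return m
```

The projected distribution serves as the target, and training minimizes the KL divergence between it and the online network's predicted distribution.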
Rainbow: The Integrated Agent
• Multi-step distributional loss
• Double Q-learning
• Prioritized replay
• Dueling network architecture
• Noisy linear layers
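The ablation study removes one component at a time from the full agent; the component set can be sketched as a toggle map (key names are illustrative; the values n = 3 steps and 51 atoms follow the paper):

```python
# Illustrative component toggles for an ablation-style Rainbow setup;
# key names are assumptions, not the paper's configuration keys.
RAINBOW_COMPONENTS = {
    "double_q": True,            # decoupled selection / evaluation
    "prioritized_replay": True,  # sample by (KL) loss magnitude
    "dueling": True,             # V + A streams
    "multi_step": 3,             # n-step returns (n = 3 in the paper)
    "distributional": 51,        # number of atoms (C51)
    "noisy_nets": True,          # learned exploration noise
}

def ablate(components, removed):
    """Return a copy with one component disabled, as in the ablation study."""
    cfg = dict(components)
    cfg[removed] = False if isinstance(cfg[removed], bool) else 0
    return cfg
```

Running the agent once per ablated configuration reproduces the style of comparison shown in the ablation results below.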
Q & A
Results
Ablation studies
Results
Conclusions
• Reviewed Deep Q-learning and its extensions:
• Double Q-learning
• Distributional learning
• Dueling networks
• Prioritized replay
• Multi-step learning
• Noisy Nets
• Rainbow integrates these six extensions into a single DQN agent, improving both learning efficiency and final performance

220227 rainbow2017 deep-mind paper explained
