Hindsight Experience Replay
OpenAI
Paper review
Presenter : Uijin Jung
Presenter
• Name : Uijin Jung

• Github : github.com/jinPrelude

• Busan native for 16 years

• Lived in Gyeonggi-do for 3 years, so my accent is mixed
Contents
• Abstract

1. Introduction 

2. Background

3. Hindsight Experience Replay 

4. Experiments

5. Related work

6. Conclusion
1. Introduction
1. Introduction
• Reward engineering limits the applicability of RL in the
real world because it requires both RL expertise and
domain-specific knowledge.

• But dealing with sparse rewards is also one of the biggest
challenges in RL.
• One ability humans have, unlike the current generation of
model-free RL algorithms, is to learn almost as much from
achieving an undesired outcome as from the desired one.
1. Introduction
:( :( :( :( :( :) :( :)
Hindsight Experience Replay
2. Background
• Reinforcement Learning

• Deep Q Learning (DQN)

• Deep Deterministic Policy Gradient (DDPG)

• Universal Value Function Approximators (UVFA)
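UVFA is the piece of background that makes HER possible: the value function is conditioned on the goal as well as the state, Q(s, a, g), so one network generalizes across goals and goals can later be swapped at replay time. A minimal goal-conditioned Q-network sketch (layer sizes and names are illustrative, not from the slides):

```python
import numpy as np

def q_network(state, goal, weights):
    """UVFA-style value function: the goal is part of the network input,
    so a single network represents Q(s, a, g) for every goal."""
    x = np.concatenate([state, goal])          # input = [s ; g]
    h = np.maximum(0.0, weights["W1"] @ x)     # one hidden ReLU layer
    return weights["W2"] @ h                   # one Q-value per action

n = 4                                          # 4-bit toy problem
rng = np.random.default_rng(0)
weights = {"W1": 0.1 * rng.normal(size=(16, 2 * n)),
           "W2": 0.1 * rng.normal(size=(n, 16))}

s = np.array([1, 1, 0, 1], dtype=float)
g = np.array([0, 1, 1, 1], dtype=float)
q_values = q_network(s, g, weights)            # shape (n,): one value per bit-flip action
```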
3. Hindsight Experience Replay
• Bit flipping environment

• State : n bits, {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1}

• Action space : A = {0, 1, …, n-1} — action i flips the i-th bit

• Goal g : a target bit string sampled per episode, e.g. g = 0 1 0 0 … 1 1 0 1

• Reward : -1 per step unless the state equals g, then 0 (sparse)

• Standard DQN cannot solve the task once n > 40
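The environment above can be sketched in a few lines; class and parameter names are mine, not from the slides:

```python
import numpy as np

class BitFlipEnv:
    """Toy bit-flipping environment from the HER paper: state and goal are
    n-bit strings; action i flips bit i; reward is -1 until state == goal."""

    def __init__(self, n=8, seed=0):
        self.n = n
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.state = self.rng.integers(0, 2, self.n)
        self.goal = self.rng.integers(0, 2, self.n)
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        self.state[action] ^= 1                    # flip bit `action`
        done = bool(np.array_equal(self.state, self.goal))
        reward = 0.0 if done else -1.0             # sparse reward
        return self.state.copy(), reward, done
```

With random goals over 2^n bit strings, a random policy essentially never sees a non-negative reward for large n, which is why the sparse-reward baseline stalls.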
• Example episode that misses the goal (3-bit case):

s0 = 1 1 0 → … → sT = 1 0 0, g = 0 1 1, sT ≠ g

Episode reward : {-1, -1, …, -1, -1} — no useful learning signal

• Standard replay R stores every transition with the original goal g:

R : (s0, a0, r0, s1, g), (s1, a1, r1, s2, g), …, (st, at, rt, st+1, g), …, (sT−1, aT−1, rT−1, sT, g)

• HER additionally stores the same trajectory relabeled with the achieved final state sT as the goal, recomputing each reward r′t under that goal:

R′ : (s0, a0, r′0, s1, sT), (s1, a1, r′1, s2, sT), …, (st, at, r′t, st+1, sT), …, (sT−1, aT−1, r′T−1, sT, sT)

Episode reward under goal sT : {-1, -1, …, -1, 0} — the final transition is now a success

• Both R and R′ are stored in the replay memory and used for training.
Hindsight Experience Replay
3. Hindsight Experience Replay
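The relabeling step above can be sketched as follows; the helper names and the tiny 3-bit episode are illustrative (rewards are recomputed under the substituted goal, matching the sparse reward definition):

```python
import numpy as np

def reward_fn(state, goal):
    """Sparse reward used throughout: 0 on success, -1 otherwise."""
    return 0.0 if np.array_equal(state, goal) else -1.0

def her_store(replay_buffer, episode, goal):
    """Store each transition twice: once with the original goal g (R),
    once relabeled with the achieved final state s_T as the goal (R')."""
    s_T = episode[-1][2]                       # final achieved state
    for (s, a, s_next) in episode:
        # R  : original goal; reward stays -1 unless s_next == g
        replay_buffer.append((s, a, reward_fn(s_next, goal), s_next, goal))
        # R' : hindsight goal s_T; the last transition now earns reward 0
        replay_buffer.append((s, a, reward_fn(s_next, s_T), s_next, s_T))

# hypothetical episode in a 3-bit world: s0 = 110, sT = 100, g = 011
episode = [(np.array([1, 1, 0]), 2, np.array([1, 1, 1])),
           (np.array([1, 1, 1]), 1, np.array([1, 0, 1])),
           (np.array([1, 0, 1]), 2, np.array([1, 0, 0]))]
goal = np.array([0, 1, 1])

buffer = []
her_store(buffer, episode, goal)
```

Every relabeled trajectory is guaranteed to end in a reward of 0, so the replay memory always contains successful experience even when the policy never reaches g.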
4. Experiments
• Three different tasks : pushing, sliding, pick-and-place

• How we define MDPs

• Does HER improve performance?

• Does HER improve performance even if there is only one goal we care
about?

• How does HER interact with reward shaping?

• How many goals should we replay each trajectory with and how to
choose them?
• Deployment on a physical robot
• How many goals should we replay each trajectory
with and how to choose them?
• future — replay with k random states that come from the same episode as the transition being replayed and were observed after it.

• episode — replay with k random states coming from the same episode as the transition being replayed.

• random — replay with k random states encountered so far in the whole training procedure.
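The three strategies above differ only in which pool of achieved states the hindsight goals are drawn from; a sketch (function name and signature are mine, and states double as goals in this setting):

```python
import random

def sample_goals(strategy, t, episode_states, all_states, k=4):
    """Pick k hindsight goals for the transition at index t in its episode."""
    if strategy == "future":      # states observed later in the same episode
        pool = episode_states[t + 1:]
    elif strategy == "episode":   # any state from the same episode
        pool = episode_states
    elif strategy == "random":    # any state seen so far in training
        pool = all_states
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [random.choice(pool) for _ in range(k)] if pool else []
```

In the paper's experiments the "future" strategy with small k works best, which matches the intuition that goals the agent actually reached shortly afterwards are the most relevant relabels.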
6. Conclusion
• We showed that HER allows training policies which push,
slide and pick-and-place objects with a robotic arm to the
specified positions while the vanilla RL algorithm fails to
solve these tasks.

More Related Content

What's hot

클라우드 환경에서 재해복구 방안
클라우드 환경에서 재해복구 방안클라우드 환경에서 재해복구 방안
클라우드 환경에서 재해복구 방안Hyuk Kwon
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDAmmar Rashed
 
Dataset for Semantic Urban Scene Understanding
Dataset for Semantic Urban Scene UnderstandingDataset for Semantic Urban Scene Understanding
Dataset for Semantic Urban Scene UnderstandingYosuke Shinya
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"YeChan(Paul) Kim
 
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsDeep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsHakky St
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法Hiroshi Nakagawa
 
[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies
[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies
[DL輪読会]Reinforcement Learning with Deep Energy-Based PoliciesDeep Learning JP
 
[DL輪読会]Hindsight Experience Replay
[DL輪読会]Hindsight Experience Replay[DL輪読会]Hindsight Experience Replay
[DL輪読会]Hindsight Experience ReplayDeep Learning JP
 
統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半Prunus 1350
 
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011Preferred Networks
 
[DL輪読会]Deep Reinforcement Learning that Matters
[DL輪読会]Deep Reinforcement Learning that Matters[DL輪読会]Deep Reinforcement Learning that Matters
[DL輪読会]Deep Reinforcement Learning that MattersDeep Learning JP
 
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」Tatsuya Matsushima
 
강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introductionTaehoon Kim
 
[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian ProcessesDeep Learning JP
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain RandomizationDeep Learning JP
 
ラビットチャレンジレポート 深層学習Day2
ラビットチャレンジレポート 深層学習Day2ラビットチャレンジレポート 深層学習Day2
ラビットチャレンジレポート 深層学習Day2HiroyukiTerada4
 
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence ModelingDecision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence ModelingYasunori Ozaki
 
第5回パターン認識勉強会
第5回パターン認識勉強会第5回パターン認識勉強会
第5回パターン認識勉強会Yohei Sato
 

What's hot (20)

클라우드 환경에서 재해복구 방안
클라우드 환경에서 재해복구 방안클라우드 환경에서 재해복구 방안
클라우드 환경에서 재해복구 방안
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfD
 
Dataset for Semantic Urban Scene Understanding
Dataset for Semantic Urban Scene UnderstandingDataset for Semantic Urban Scene Understanding
Dataset for Semantic Urban Scene Understanding
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
 
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPsDeep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
Deep Recurrent Q-Learning(DRQN) for Partially Observable MDPs
 
クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法クラシックな機械学習の入門  11.評価方法
クラシックな機械学習の入門  11.評価方法
 
[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies
[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies
[DL輪読会]Reinforcement Learning with Deep Energy-Based Policies
 
[DL輪読会]Hindsight Experience Replay
[DL輪読会]Hindsight Experience Replay[DL輪読会]Hindsight Experience Replay
[DL輪読会]Hindsight Experience Replay
 
統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半統計的学習の基礎 第2章後半
統計的学習の基礎 第2章後半
 
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
オンライン凸最適化と線形識別モデル学習の最前線_IBIS2011
 
MIRU2014 tutorial deeplearning
MIRU2014 tutorial deeplearningMIRU2014 tutorial deeplearning
MIRU2014 tutorial deeplearning
 
強化学習1章
強化学習1章強化学習1章
強化学習1章
 
[DL輪読会]Deep Reinforcement Learning that Matters
[DL輪読会]Deep Reinforcement Learning that Matters[DL輪読会]Deep Reinforcement Learning that Matters
[DL輪読会]Deep Reinforcement Learning that Matters
 
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
 
강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction강화 학습 기초 Reinforcement Learning an introduction
강화 학습 기초 Reinforcement Learning an introduction
 
[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes[DL輪読会]Deep Neural Networks as Gaussian Processes
[DL輪読会]Deep Neural Networks as Gaussian Processes
 
[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization[DL輪読会]Active Domain Randomization
[DL輪読会]Active Domain Randomization
 
ラビットチャレンジレポート 深層学習Day2
ラビットチャレンジレポート 深層学習Day2ラビットチャレンジレポート 深層学習Day2
ラビットチャレンジレポート 深層学習Day2
 
Decision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence ModelingDecision Transformer: Reinforcement Learning via Sequence Modeling
Decision Transformer: Reinforcement Learning via Sequence Modeling
 
第5回パターン認識勉強会
第5回パターン認識勉強会第5回パターン認識勉強会
第5回パターン認識勉強会
 

Similar to Hindsight experience replay paper review

DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..KarthikeyaLanka1
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化Akira Tanimoto
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptxpallavidhade2
 
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習Deep Learning JP
 
Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...IJRTEMJOURNAL
 
Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...journal ijrtem
 
Lesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsLesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsMatthew Leingang
 
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ssusere0a682
 
An Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelAn Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelShih-Hsiang Lin
 
One way to see higher dimensional surface
One way to see higher dimensional surfaceOne way to see higher dimensional surface
One way to see higher dimensional surfaceKenta Oono
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docxmecklenburgstrelitzh
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docxmydrynan
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Shohei Taniguchi
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Dealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanDealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanHiroki Itô
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDChristos Kallidonis
 

Similar to Hindsight experience replay paper review (20)

DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Asymptotic Notation
Asymptotic NotationAsymptotic Notation
Asymptotic Notation
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
 
Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...
 
Parameter estimation
Parameter estimationParameter estimation
Parameter estimation
 
DAA_LECT_2.pdf
DAA_LECT_2.pdfDAA_LECT_2.pdf
DAA_LECT_2.pdf
 
Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...
 
Lesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsLesson 7: Vector-valued functions
Lesson 7: Vector-valued functions
 
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
 
An Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelAn Introduction to Hidden Markov Model
An Introduction to Hidden Markov Model
 
One way to see higher dimensional surface
One way to see higher dimensional surfaceOne way to see higher dimensional surface
One way to see higher dimensional surface
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)
 
Recent rl
Recent rlRecent rl
Recent rl
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Dealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in StanDealing with latent discrete parameters in Stan
Dealing with latent discrete parameters in Stan
 
Computing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCDComputing the Nucleon Spin from Lattice QCD
Computing the Nucleon Spin from Lattice QCD
 

More from Euijin Jeong

Data efficient hrl paper review
Data efficient hrl paper reviewData efficient hrl paper review
Data efficient hrl paper reviewEuijin Jeong
 
한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서Euijin Jeong
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)Euijin Jeong
 
Deep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNDeep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNEuijin Jeong
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1Euijin Jeong
 
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1Euijin Jeong
 

More from Euijin Jeong (6)

Data efficient hrl paper review
Data efficient hrl paper reviewData efficient hrl paper review
Data efficient hrl paper review
 
한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
 
Deep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNDeep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQN
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1
 
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
 

Recently uploaded

Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 

Recently uploaded (20)

Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 

Hindsight experience replay paper review

  • 1. Hindsight Experience Replay OpenAI Paper review Presenter : Uijin Jung
  • 2. Presenter • Name : Uijin Jung • Github : github.com/jinPrelude • 16년 부산 원주민 • 경기도 3년 거주 억양 섞임
  • 3. Contents • Abstract 1. Introduction 2. Background 3. Hindsight Experience Replay 4. Experiments 5. Related work 6. Conclusion
  • 5. 1. Introduction • Reward engineering limits the applicability of RL in the real world because it requires both RL expertise and domain-specific knowledge. • But dealing with sparse rewards is also one of the biggest challenges in RL
  • 6. • One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one. 1. Introduction
  • 7.
  • 8.
  • 9.
  • 10. :(
  • 11. :(
  • 12. :(
  • 13. :(
  • 14. :( :)
  • 16. 2. Background • Reinforcement Learning • Deep Q Learning (DQN) • Deep Deterministic Policy Gradient (DDPG) • Universal Value Function Approximators (UVFA)
  • 17–31. 3. Hindsight Experience Replay • Bit flipping environment: the state is a vector of n bits, S = {0, 1}^n, and the action space is A = {0, 1, …, n−1}, where action i flips the i-th bit. At the start of each episode a goal g ∈ {0, 1}^n is sampled (e.g. g = 0 1 0 0 … 1 1 0 1). With a sparse reward and a large n (the slides use n = 40), a random policy essentially never stumbles onto the goal.
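The environment on these slides can be sketched in a few lines of Python (the class and method names are ours, not from the paper's code):

```python
import numpy as np

class BitFlipEnv:
    """Minimal sketch of the bit-flipping environment.

    State: a vector of n bits. Action i flips bit i.
    Reward: 0 when the state equals the goal, -1 otherwise (sparse).
    """

    def __init__(self, n, seed=0):
        self.n = n
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Sample a random initial state and a random goal, each in {0,1}^n.
        self.state = self.rng.integers(0, 2, self.n)
        self.goal = self.rng.integers(0, 2, self.n)
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        self.state[action] ^= 1                       # flip one bit
        done = np.array_equal(self.state, self.goal)
        reward = 0.0 if done else -1.0                # sparse reward
        return self.state.copy(), reward, done
```

For n = 40 the chance that random flipping ever produces the goal is vanishingly small, which is exactly why a shaped or relabeled reward is needed.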
  • 32–47. Worked example (n = 3): the goal is g = 0 1 1 and the episode starts in s0 = 1 1 0. The episode ends in sT = 1 0 0 ≠ g, so every step receives reward −1 and the stored trajectory R = { (s0, a0, r0, s1, g), (s1, a1, r1, s2, g), …, (st, at, rt, st+1, g), …, (sT−1, aT−1, rT−1, sT, g) } carries no learning signal. HER replays the same episode with the achieved final state substituted for the goal: R′ = { (s0, a0, r0, s1, sT), (s1, a1, r1, s2, sT), …, (sT−1, aT−1, rT−1, sT, sT) }. Since the last transition does reach its hindsight goal (sT = sT), its reward becomes 0. Both R and R′ are stored in the replay memory.
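The relabeling shown on the preceding slides can be sketched as a small helper (our own illustration, using the "final" strategy in which the achieved final state sT replaces the goal; a sparse 0/−1 reward is assumed):

```python
import numpy as np

def her_final_relabel(episode, goal):
    """Store each transition twice: once with the original goal,
    once with the episode's achieved final state as a hindsight goal.

    episode: list of (s, a, s_next) transitions from one rollout.
    Returns a list of (s, a, r, s_next, g) tuples for the replay buffer.
    """
    def reward(s_next, g):
        # Sparse reward: 0 on reaching the goal, -1 otherwise.
        return 0.0 if np.array_equal(s_next, g) else -1.0

    hindsight_goal = episode[-1][2]      # s_T, the state actually reached
    buffer = []
    for s, a, s_next in episode:
        buffer.append((s, a, reward(s_next, goal), s_next, goal))
        buffer.append((s, a, reward(s_next, hindsight_goal), s_next, hindsight_goal))
    return buffer
```

Even when the original goal was never reached (all rewards −1), the hindsight copy of the trajectory ends with reward 0, giving the off-policy learner a non-degenerate training signal.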
  • 61. 4. Experiments
    • Three different tasks: pushing, sliding, pick & place
    • How we define the MDPs
    • Does HER improve performance?
    • Does HER improve performance even if there is only one goal we care about?
    • How does HER interact with reward shaping?
    • How many goals should we replay each trajectory with, and how do we choose them?
    • Deployment on a physical robot
  • 62. • How many goals should we replay each trajectory with, and how do we choose them?
    • future — replay with k random states which come from the same episode as the transition being replayed and were observed after it
    • episode — replay with k random states coming from the same episode as the transition being replayed
    • random — replay with k random states encountered so far in the whole training procedure
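The three sampling strategies can be sketched as follows (the helper and its signature are our own illustration, not the paper's code):

```python
import random

def sample_goals(episode_states, t, k, all_states, strategy, rng=random.Random(0)):
    """Pick k hindsight goals for the transition at index t.

    episode_states: achieved states of the current episode, in order.
    all_states: achieved states from the whole training run so far.
    """
    if strategy == "future":     # states observed after step t, same episode
        pool = episode_states[t + 1:]
    elif strategy == "episode":  # any state from the same episode
        pool = episode_states
    elif strategy == "random":   # any state seen in the whole training procedure
        pool = all_states
    else:
        raise ValueError(strategy)
    return [rng.choice(pool) for _ in range(k)] if pool else []
```

In the paper's ablations the "future" strategy with small k (around 4–8) works best; larger k dilutes the fraction of transitions carrying the original goal.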
  • 63. 6. Conclusion • We showed that HER allows training policies which push, slide, and pick-and-place objects with a robotic arm to the specified positions, while the vanilla RL algorithm fails to solve these tasks.