Hindsight Experience Replay
OpenAI
Paper review
Presenter : Uijin Jung
Presenter
• Name : Uijin Jung

• Github : github.com/jinPrelude

• Busan native, 16 years

• Lived in Gyeonggi Province for 3 years, so my accent is mixed
Contents
• Abstract

1. Introduction 

2. Background

3. Hindsight Experience Replay 

4. Experiments

5. Related work

6. Conclusion
1. Introduction
• Reward engineering limits the applicability of RL in the
real world because it requires both RL expertise and
domain-specific knowledge.

• However, learning from sparse rewards is itself one of the biggest
challenges in RL.
• One ability humans have, unlike the current generation of
model-free RL algorithms, is to learn almost as much from
achieving an undesired outcome as from the desired one.
[Figure: a series of attempts, marked :( for failure and :) for success]
Hindsight Experience Replay
2. Background
• Reinforcement Learning

• Deep Q Learning (DQN)

• Deep Deterministic Policy Gradient (DDPG)

• Universal Value Function Approximators (UVFA)
3. Hindsight Experience Replay
• Bit flipping environment
State: a string of n bits, {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1}
Actions: A = {0, 1, …, n-1}; action i flips the i-th bit
Goal g (goal): a target string of n bits, e.g. 0 1 0 0 … 1 1 0 1
The task is solved when the state equals g
In the slides, n = 40
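The bit flipping environment can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the class name `BitFlipEnv`, its `reset`/`step` interface, and the episode limit of n steps are assumptions. The reward follows the convention used in these slides: -1 until the state equals the goal, 0 once it does.

```python
import random
from typing import List, Optional, Tuple


class BitFlipEnv:
    """Minimal bit flipping environment: n bits, action i flips bit i."""

    def __init__(self, n: int, max_steps: Optional[int] = None, seed: Optional[int] = None):
        self.n = n
        # Assumption: an episode ends after n steps unless the goal is reached first.
        self.max_steps = max_steps if max_steps is not None else n
        self.rng = random.Random(seed)
        self.state: List[int] = [0] * n
        self.goal: List[int] = [0] * n
        self.steps = 0

    def reset(self) -> Tuple[List[int], List[int]]:
        # Initial state and goal are drawn uniformly from {0,1}^n.
        self.state = [self.rng.randint(0, 1) for _ in range(self.n)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.n)]
        self.steps = 0
        return list(self.state), list(self.goal)

    def step(self, action: int) -> Tuple[List[int], float, bool]:
        if not 0 <= action < self.n:
            raise ValueError("action must be in A = {0, 1, ..., n-1}")
        self.state[action] ^= 1  # flip the chosen bit
        self.steps += 1
        # Sparse reward: 0 once the state equals the goal, -1 otherwise.
        reward = 0.0 if self.state == self.goal else -1.0
        done = reward == 0.0 or self.steps >= self.max_steps
        return list(self.state), reward, done
```

With n = 40 there are 2^40 (about 1.1 × 10^12) possible goals, so a uniformly random state matches a given goal with probability 2^-40, roughly 9.1 × 10^-13. An agent that explores at random will almost never observe a reward of 0, which is why this environment is hard without HER.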
Example episode (n = 3):
g = 0 1 1 (goal)
$s_0$ = 1 1 0 → … → $s_T$ = 1 0 0, and $s_T \neq g$
Episode reward: {-1, -1, …, -1, -1}
Replay buffer R (original goal g):
$(s_0, a_0, r_0, s_1, g)$
$(s_1, a_1, r_1, s_2, g)$
…
$(s_t, a_t, r_t, s_{t+1}, g)$
…
$(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$
Hindsight: treat the final state $s_T$ = 1 0 0 as if it had been the goal. The episode now ends in its goal, so the reward sequence becomes {-1, -1, …, -1, 0}.
Replay buffer R′ (goal replaced by $s_T$, rewards $r'_t$ recomputed for that goal):
$(s_0, a_0, r'_0, s_1, s_T)$
$(s_1, a_1, r'_1, s_2, s_T)$
…
$(s_t, a_t, r'_t, s_{t+1}, s_T)$
…
$(s_{T-1}, a_{T-1}, r'_{T-1}, s_T, s_T)$
Both R and R′ are stored in memory, and training samples transitions from that memory.
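The relabeling in this example can be written as a short function. In this sketch (the names `reward` and `relabel_final` are hypothetical, not from the paper's code), every transition is stored twice: once with the original goal g (buffer R) and once with g replaced by the final state s_T (buffer R′), with the reward recomputed for the new goal. The paper calls this choice of substitute goal the "final" strategy. The three-step path in the usage part is made up for illustration; the slides elide it with "…".

```python
from typing import List, Tuple

State = Tuple[int, ...]
# (s_t, a_t, r_t, s_{t+1}, goal)
Transition = Tuple[State, int, float, State, State]


def reward(next_state: State, goal: State) -> float:
    """Sparse reward as in the slides: 0 if the goal is reached, else -1."""
    return 0.0 if next_state == goal else -1.0


def relabel_final(
    episode: List[Tuple[State, int, State]], goal: State
) -> Tuple[List[Transition], List[Transition]]:
    """Build R (original goal g) and R' (goal replaced by the final state s_T)."""
    final_state = episode[-1][2]  # s_T
    original = [(s, a, reward(s_next, goal), s_next, goal) for s, a, s_next in episode]
    hindsight = [
        (s, a, reward(s_next, final_state), s_next, final_state)
        for s, a, s_next in episode
    ]
    return original, hindsight


if __name__ == "__main__":
    # g = 0 1 1, s_0 = 1 1 0, s_T = 1 0 0; intermediate steps chosen for illustration.
    episode = [
        ((1, 1, 0), 0, (0, 1, 0)),
        ((0, 1, 0), 0, (1, 1, 0)),
        ((1, 1, 0), 1, (1, 0, 0)),
    ]
    R, R_prime = relabel_final(episode, goal=(0, 1, 1))
    print([r for _, _, r, _, _ in R])        # [-1.0, -1.0, -1.0]
    print([r for _, _, r, _, _ in R_prime])  # [-1.0, -1.0, 0.0]
    memory = R + R_prime  # both go into the replay memory
```

Because the goal is part of each stored transition, a goal-conditioned learner such as DQN or DDPG with UVFA-style inputs can train on R′ exactly as it trains on R.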
Hindsight Experience Replay
3. Hindsight Experience Replay
4. Experiments
• Three different tasks: pushing, sliding, and pick-and-place

• How the MDPs are defined

• Does HER improve performance?

• Does HER improve performance even if there is only one goal we care
about?

• How does HER interact with reward shaping?

• How many goals should we replay each trajectory with and how to
choose them?
• Deployment on a physical robot
• How many goals should we replay each trajectory
with and how to choose them?
• future: replay with k random states that come from the same episode as the
transition being replayed and were observed after it;

• episode: replay with k random states coming from the same episode as the transition
being replayed;

• random: replay with k random states encountered so far in the whole training
procedure.
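The three strategies differ only in the pool of states from which the k substitute goals are drawn. The sketch below is a minimal illustration under stated assumptions: states are tuples, `episode_states` holds s_0, …, s_T of the current episode, "observed after" transition t (s_t → s_{t+1}) is taken to mean s_{t+1}, …, s_T, and sampling is with replacement. `sample_goals` is a hypothetical helper, not the paper's code.

```python
import random
from typing import List, Sequence, Tuple

State = Tuple[int, ...]


def sample_goals(
    strategy: str,
    episode_states: Sequence[State],
    t: int,
    k: int,
    all_states: Sequence[State],
    rng: random.Random,
) -> List[State]:
    """Pick k substitute goals for replaying transition t (s_t -> s_{t+1}).

    episode_states holds s_0, ..., s_T of the current episode;
    all_states holds every state seen so far in training.
    """
    if strategy == "future":
        candidates = episode_states[t + 1:]  # states observed after transition t
    elif strategy == "episode":
        candidates = episode_states  # any state from the same episode
    elif strategy == "random":
        candidates = all_states  # any state seen during training
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Sampling with replacement (an assumption of this sketch).
    return rng.choices(list(candidates), k=k)
```

Each sampled goal g′ would then be used the same way s_T is used in the relabeling example: store (s_t, a_t, r(s_{t+1}, g′), s_{t+1}, g′) alongside the original transition.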
6. Conclusion
• We showed that HER allows training policies which push,
slide and pick-and-place objects with a robotic arm to the
specified positions while the vanilla RL algorithm fails to
solve these tasks.

More Related Content

What's hot

[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient InferenceDeep Learning JP
 
IMAGE PROCESSING - MATHANKUMAR.S - VMKVEC
IMAGE PROCESSING - MATHANKUMAR.S - VMKVECIMAGE PROCESSING - MATHANKUMAR.S - VMKVEC
IMAGE PROCESSING - MATHANKUMAR.S - VMKVECMathankumar S
 
Amazing Feats of Daring - Uncharted Post Mortem
Amazing Feats of Daring - Uncharted Post MortemAmazing Feats of Daring - Uncharted Post Mortem
Amazing Feats of Daring - Uncharted Post MortemNaughty Dog
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility bufferWolfgang Engel
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)PetteriTeikariPhD
 
القطوع المخروطية Conicss
القطوع المخروطية Conicssالقطوع المخروطية Conicss
القطوع المخروطية Conicssbabiker biko
 
A completed modeling of local binary pattern operator
A completed modeling of local binary pattern operatorA completed modeling of local binary pattern operator
A completed modeling of local binary pattern operatorWin Yu
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barré-Brisebois
 
Technical Deep Dive into the New Prefab System
Technical Deep Dive into the New Prefab SystemTechnical Deep Dive into the New Prefab System
Technical Deep Dive into the New Prefab SystemUnity Technologies
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsElectronic Arts / DICE
 
Siggraph 2011: Occlusion culling in Alan Wake
Siggraph 2011: Occlusion culling in Alan WakeSiggraph 2011: Occlusion culling in Alan Wake
Siggraph 2011: Occlusion culling in Alan WakeUmbra
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsSeongdae Kim
 
Geometri Transformasi Grafik Komputer Teknik Informatika
Geometri Transformasi Grafik Komputer Teknik InformatikaGeometri Transformasi Grafik Komputer Teknik Informatika
Geometri Transformasi Grafik Komputer Teknik InformatikaCristieSimatupang
 
3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slidesBHAGYAPRASADBUGGE
 

What's hot (19)

[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
[DL Hacks]Pruning Convolutional Neural Networks for Resource Efficient Inference
 
IMAGE PROCESSING - MATHANKUMAR.S - VMKVEC
IMAGE PROCESSING - MATHANKUMAR.S - VMKVECIMAGE PROCESSING - MATHANKUMAR.S - VMKVEC
IMAGE PROCESSING - MATHANKUMAR.S - VMKVEC
 
Amazing Feats of Daring - Uncharted Post Mortem
Amazing Feats of Daring - Uncharted Post MortemAmazing Feats of Daring - Uncharted Post Mortem
Amazing Feats of Daring - Uncharted Post Mortem
 
Triangle Visibility buffer
Triangle Visibility bufferTriangle Visibility buffer
Triangle Visibility buffer
 
OpenGL Basics
OpenGL BasicsOpenGL Basics
OpenGL Basics
 
SEED - Halcyon Architecture
SEED - Halcyon ArchitectureSEED - Halcyon Architecture
SEED - Halcyon Architecture
 
Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)
 
القطوع المخروطية Conicss
القطوع المخروطية Conicssالقطوع المخروطية Conicss
القطوع المخروطية Conicss
 
A completed modeling of local binary pattern operator
A completed modeling of local binary pattern operatorA completed modeling of local binary pattern operator
A completed modeling of local binary pattern operator
 
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
Colin Barre-Brisebois - GDC 2011 - Approximating Translucency for a Fast, Che...
 
Technical Deep Dive into the New Prefab System
Technical Deep Dive into the New Prefab SystemTechnical Deep Dive into the New Prefab System
Technical Deep Dive into the New Prefab System
 
Future Directions for Compute-for-Graphics
Future Directions for Compute-for-GraphicsFuture Directions for Compute-for-Graphics
Future Directions for Compute-for-Graphics
 
Siggraph 2011: Occlusion culling in Alan Wake
Siggraph 2011: Occlusion culling in Alan WakeSiggraph 2011: Occlusion culling in Alan Wake
Siggraph 2011: Occlusion culling in Alan Wake
 
Hair in Tomb Raider
Hair in Tomb RaiderHair in Tomb Raider
Hair in Tomb Raider
 
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lightsPrecomputed Voxelized-Shadows for Large-scale Scene and Many lights
Precomputed Voxelized-Shadows for Large-scale Scene and Many lights
 
CS 354 Typography
CS 354 TypographyCS 354 Typography
CS 354 Typography
 
Nx tutorial basics
Nx tutorial basicsNx tutorial basics
Nx tutorial basics
 
Geometri Transformasi Grafik Komputer Teknik Informatika
Geometri Transformasi Grafik Komputer Teknik InformatikaGeometri Transformasi Grafik Komputer Teknik Informatika
Geometri Transformasi Grafik Komputer Teknik Informatika
 
3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides3 intensity transformations and spatial filtering slides
3 intensity transformations and spatial filtering slides
 

Similar to Hindsight experience replay paper review

DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..KarthikeyaLanka1
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化Akira Tanimoto
 
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」Tatsuya Matsushima
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptxpallavidhade2
 
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習Deep Learning JP
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"YeChan(Paul) Kim
 
Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...IJRTEMJOURNAL
 
Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...journal ijrtem
 
Lesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsLesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsMatthew Leingang
 
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ssusere0a682
 
An Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelAn Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelShih-Hsiang Lin
 
One way to see higher dimensional surface
One way to see higher dimensional surfaceOne way to see higher dimensional surface
One way to see higher dimensional surfaceKenta Oono
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docxmecklenburgstrelitzh
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docxmydrynan
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Shohei Taniguchi
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Similar to Hindsight experience replay paper review (20)

DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..DS Unit-1.pptx very easy to understand..
DS Unit-1.pptx very easy to understand..
 
MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化MLP輪読スパース8章 トレースノルム正則化
MLP輪読スパース8章 トレースノルム正則化
 
Asymptotic Notation
Asymptotic NotationAsymptotic Notation
Asymptotic Notation
 
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
深層強化学習入門 2020年度Deep Learning基礎講座「強化学習」
 
1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx1_Asymptotic_Notation_pptx.pptx
1_Asymptotic_Notation_pptx.pptx
 
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
[DL輪読会]Hindsight Experience Replayを応用した再ラベリングによる効率的な強化学習
 
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
pycon2018 "RL Adventure : DQN 부터 Rainbow DQN까지"
 
Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...Existence of positive solutions for fractional q-difference equations involvi...
Existence of positive solutions for fractional q-difference equations involvi...
 
Parameter estimation
Parameter estimationParameter estimation
Parameter estimation
 
DAA_LECT_2.pdf
DAA_LECT_2.pdfDAA_LECT_2.pdf
DAA_LECT_2.pdf
 
Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...Positive and negative solutions of a boundary value problem for a fractional ...
Positive and negative solutions of a boundary value problem for a fractional ...
 
Lesson 7: Vector-valued functions
Lesson 7: Vector-valued functionsLesson 7: Vector-valued functions
Lesson 7: Vector-valued functions
 
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
ゲーム理論BASIC 演習51 -完全ベイジアン均衡-
 
An Introduction to Hidden Markov Model
An Introduction to Hidden Markov ModelAn Introduction to Hidden Markov Model
An Introduction to Hidden Markov Model
 
One way to see higher dimensional surface
One way to see higher dimensional surfaceOne way to see higher dimensional surface
One way to see higher dimensional surface
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
 
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
F-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docxF-1  Stat 423, Stat 523  Formulas  Chapter 7 Section.docx
F-1 Stat 423, Stat 523 Formulas Chapter 7 Section.docx
 
Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)Control as Inference (強化学習とベイズ統計)
Control as Inference (強化学習とベイズ統計)
 
Recent rl
Recent rlRecent rl
Recent rl
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 

More from Euijin Jeong

Data efficient hrl paper review
Data efficient hrl paper reviewData efficient hrl paper review
Data efficient hrl paper reviewEuijin Jeong
 
한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서Euijin Jeong
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)Euijin Jeong
 
Deep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNDeep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNEuijin Jeong
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1Euijin Jeong
 
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1Euijin Jeong
 

More from Euijin Jeong (6)

Data efficient hrl paper review
Data efficient hrl paper reviewData efficient hrl paper review
Data efficient hrl paper review
 
한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서한국인공지능연구소 강화학습랩 결과보고서
한국인공지능연구소 강화학습랩 결과보고서
 
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
강화학습 기초_2(Deep sarsa, Deep Q-learning, DQN)
 
Deep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQNDeep sarsa, Deep Q-learning, DQN
Deep sarsa, Deep Q-learning, DQN
 
Reinforcement Learning basics part1
Reinforcement Learning basics part1Reinforcement Learning basics part1
Reinforcement Learning basics part1
 
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
강화학습기초(MDP, Monte-Carlo, Time-difference, sarsa, q-learning) 파트1
 

Recently uploaded

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptxGeorgi Kodinov
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfkalichargn70th171
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareinfo611746
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEJelle | Nordend
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Krakówbim.edu.pl
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageGlobus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion Clinic
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
 

Recently uploaded (20)

TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...Developing Distributed High-performance Computing Capabilities of an Open Sci...
Developing Distributed High-performance Computing Capabilities of an Open Sci...
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 

Hindsight experience replay paper review

  • 1. Hindsight Experience Replay OpenAI Paper review Presenter : Uijin Jung
  • 2. Presenter • Name : Uijin Jung • Github : github.com/jinPrelude • 16년 부산 원주민 • 경기도 3년 거주 억양 섞임
  • 3. Contents • Abstract 1. Introduction 2. Background 3. Hindsight Experience Replay 4. Experiments 5. Related work 6. Conclusion
  • 5. 1. Introduction • Reward engineering limits the applicability of RL in the real world because it requires both RL expertise and domain-specific knowledge. • But dealing with sparse rewards is also one of the biggest challenges in RL
  • 6. • One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one. 1. Introduction
  • 10. :(
  • 11. :(
  • 12. :(
  • 13. :(
  • 14. :( :)
  • 16. 2. Background • Reinforcement Learning • Deep Q Learning (DQN) • Deep Deterministic Policy Gradient (DDPG) • Universal Value Function Approximators (UVFA)
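As HER uses it, the key idea behind UVFA is that the value function takes the goal as an extra input, Q(s, a, g), so a single learner can serve many goals. The minimal tabular sketch below shows that idea. The `(state, goal)` pair is the lookup key, and the update is the ordinary one-step Q-learning target that DQN also regresses toward. The table stands in for the neural network that DQN or DDPG would use. The names, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions.

```python
from collections import defaultdict

def make_q_table(n_actions):
    """Goal-conditioned Q-table: keys are (state, goal) pairs, values are per-action estimates."""
    return defaultdict(lambda: [0.0] * n_actions)

def q_update(Q, s, a, r, s_next, g, done, alpha=0.5, gamma=0.98):
    """One Q-learning step on a transition (s, a, r, s', g); the goal is part of every lookup."""
    target = r if done else r + gamma * max(Q[(s_next, g)])
    Q[(s, g)][a] += alpha * (target - Q[(s, g)][a])
    return Q[(s, g)][a]

Q = make_q_table(n_actions=3)
print(q_update(Q, (1, 1, 0), 1, -1.0, (1, 0, 0), g=(0, 1, 1), done=False))  # -0.5
print(Q[((1, 1, 0), (1, 0, 0))][1])  # 0.0: same state, different goal, separate estimate
```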
  • 17. 3. Hindsight Experience Replay • Bit flipping environment
  • 19. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1}
  • 20. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} (n bits)
  • 21. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} (n bits) A = {0, 1, …, n-1}
  • 22. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} A = {0, 1, …, n-1}
  • 24. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} A = {0, 1, …, n-1} g (goal)
  • 25. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} A = {0, 1, …, n-1} g (goal): 0 1 0 0 … 1 1 0 1 (n bits)
  • 26. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} A = {0, 1, …, n-1} g (goal): 0 1 0 0 … 1 1 0 1; state = g
  • 27. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} A = {0, 1, …, n-1} g (goal): 0 1 0 0 … 1 1 0 1; state = g
  • 29. 3. Hindsight Experience Replay {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} • Bit flipping environment
  • 30. 3. Hindsight Experience Replay • Bit flipping environment {0,1} {0,1} {0,1} {0,1} … {0,1} {0,1} {0,1} {0,1} (n = 40)
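The bit-flipping environment from these slides can be sketched in a few lines of Python. States and goals are n-bit vectors, action i flips bit i, and every step that does not reach the goal is rewarded -1. The class and method names are hypothetical. Computing the reward on the state after the flip is an assumption, chosen to match the later slides, where the final transition receives 0. Any episode-length cap is left to the caller. The usage uses n = 3 to match the example on the following slides, not the n = 40 shown here.

```python
import random

class BitFlipEnv:
    """Bit-flipping environment: states and goals are n-bit vectors, action i flips bit i."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        self.state, self.goal = [], []

    def reset(self):
        # Initial state and goal are both drawn uniformly from {0,1}^n.
        self.state = [self.rng.randint(0, 1) for _ in range(self.n)]
        self.goal = [self.rng.randint(0, 1) for _ in range(self.n)]
        return list(self.state), list(self.goal)

    def step(self, action):
        if not 0 <= action < self.n:
            raise ValueError("action must be in A = {0, 1, ..., n-1}")
        self.state[action] ^= 1  # flip the chosen bit
        success = self.state == self.goal
        reward = 0 if success else -1  # sparse: -1 on every step until state == goal
        return list(self.state), reward, success

env = BitFlipEnv(n=3, seed=0)
env.reset()
env.state, env.goal = [1, 1, 0], [0, 1, 1]  # fix s_0 and g to the example on the next slides
print(env.step(2))  # ([1, 1, 1], -1, False)
print(env.step(0))  # ([0, 1, 1], 0, True)
```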
  • 33. g = 0 1 1
  • 34. $s_0$ = 1 1 0, g = 0 1 1
  • 35. $s_0$ = 1 1 0 → …; g = 0 1 1
  • 36. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0; g = 0 1 1
  • 37. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 ≠ g = 0 1 1
  • 38. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 ≠ g = 0 1 1. Episode reward: {-1, -1, …, -1, -1}
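The failed episode above can be reproduced in a short sketch. Against the original goal g = 011, every transition of a trajectory that ends in 100 earns -1, so the episode contains no signal about which actions were useful. The two intermediate states in the example are made up for illustration. Only $s_0$ = 110, $s_T$ = 100 and g = 011 come from the slides. As in the slides, the reward is assumed to be computed on the next state.

```python
def reward(next_state, goal):
    """Sparse reward of one transition: 0 if it lands on the goal, else -1."""
    return 0 if next_state == goal else -1

def episode_rewards(states, goal):
    """Rewards r_0, ..., r_{T-1} for a trajectory s_0, ..., s_T pursued toward one goal."""
    return [reward(s_next, goal) for s_next in states[1:]]

# s_0 = 110 and s_T = 100 are from the slides; the two middle states are hypothetical.
states = ["110", "111", "101", "100"]
print(episode_rewards(states, goal="011"))  # [-1, -1, -1]
```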
  • 40. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 ≠ g = 0 1 1. Episode reward: {-1, -1, …, -1, -1}. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$
  • 42. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0; the goal g is replaced by $s_T$ = 1 0 0. Episode reward: {-1, -1, …, -1, -1}. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$
  • 43. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 = new goal $s_T$ = 1 0 0. Episode reward: {-1, -1, …, -1, -1}. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$
  • 44. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 = new goal $s_T$ = 1 0 0. Episode reward: {-1, -1, …, -1, 0}. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$
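  • 45. $s_0$ = 1 1 0 → … → $s_T$ = 1 0 0 = new goal $s_T$ = 1 0 0. Episode reward: {-1, -1, …, -1, 0}. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$; R′: $(s_0, a_0, r'_0, s_1, s_T)$, $(s_1, a_1, r'_1, s_2, s_T)$, …, $(s_t, a_t, r'_t, s_{t+1}, s_T)$, …, $(s_{T-1}, a_{T-1}, r'_{T-1}, s_T, s_T)$, where $r'_t$ is the reward recomputed for the new goal $s_T$ (so $r'_{T-1} = 0$)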
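Building R′ from R is a single pass over the episode. Each transition's goal is replaced by the final achieved state $s_T$, and its reward is recomputed against that new goal. This is the choice of substitute goal that the HER paper calls the "final" strategy. The sketch below uses hypothetical names and the same made-up intermediate states as before, and it stores transitions as `(s, a, r, s_next, g)` tuples. Against g′ = 100, the last transition becomes a success, which reproduces the slide's {-1, …, -1, 0}.

```python
def reward(next_state, goal):
    """Sparse reward of one transition: 0 if it lands on the goal, else -1."""
    return 0 if next_state == goal else -1

def relabel_with_final(transitions):
    """Build R' from R: swap each goal for the episode's final state s_T and recompute rewards."""
    final_state = transitions[-1][3]  # s_T is the next state of the last transition
    return [(s, a, reward(s_next, final_state), s_next, final_state)
            for (s, a, _r, s_next, _g) in transitions]

# R for the episode 110 -> 111 -> 101 -> 100 pursued toward g = 011 (middle states hypothetical)
R = [("110", 2, -1, "111", "011"),
     ("111", 1, -1, "101", "011"),
     ("101", 2, -1, "100", "011")]
R_prime = relabel_with_final(R)
print([t[2] for t in R_prime])  # [-1, -1, 0]
```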
  • 46. R: $(s_0, a_0, r_0, s_1, g)$, $(s_1, a_1, r_1, s_2, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$; R′: $(s_0, a_0, r'_0, s_1, s_T)$, $(s_1, a_1, r'_1, s_2, s_T)$, …, $(s_t, a_t, r'_t, s_{t+1}, s_T)$, …, $(s_{T-1}, a_{T-1}, r'_{T-1}, s_T, s_T)$
  • 47. Memory (replay buffer) ← both R: $(s_0, a_0, r_0, s_1, g)$, …, $(s_t, a_t, r_t, s_{t+1}, g)$, …, $(s_{T-1}, a_{T-1}, r_{T-1}, s_T, g)$ and R′: $(s_0, a_0, r'_0, s_1, s_T)$, …, $(s_t, a_t, r'_t, s_{t+1}, s_T)$, …, $(s_{T-1}, a_{T-1}, r'_{T-1}, s_T, s_T)$
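The "Memory" slide suggests a replay buffer that receives every episode twice: once as R with the original goal and once as R′ relabeled with $s_T$. The HER paper's algorithm likewise stores the original and the relabeled transitions in the same buffer. The sketch below is a minimal version under stated assumptions. The class name is hypothetical, the fixed-capacity FIFO `deque` and uniform sampling are illustrative choices, and only the "final" relabeling is shown.

```python
import random
from collections import deque

class HindsightReplayBuffer:
    """Replay memory that stores each episode twice: R (original goal)
    and R' (goal replaced by the final achieved state s_T)."""

    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)  # oldest transitions fall out first
        self.rng = random.Random(seed)

    @staticmethod
    def reward(next_state, goal):
        return 0 if next_state == goal else -1

    def add_episode(self, transitions):
        """transitions: list of (s, a, r, s_next, g) tuples for one episode."""
        final_state = transitions[-1][3]
        relabeled = [(s, a, self.reward(s_next, final_state), s_next, final_state)
                     for (s, a, _r, s_next, _g) in transitions]
        self.data.extend(transitions)  # R
        self.data.extend(relabeled)    # R'

    def sample(self, batch_size):
        """Uniform minibatch that mixes original and hindsight transitions."""
        return self.rng.sample(list(self.data), min(batch_size, len(self.data)))

buf = HindsightReplayBuffer(capacity=1000, seed=0)
buf.add_episode([("110", 2, -1, "111", "011"),
                 ("111", 1, -1, "101", "011"),
                 ("101", 2, -1, "100", "011")])
print(len(buf.data))                     # 6
print(sum(t[2] == 0 for t in buf.data))  # 1: only the last relabeled transition succeeds
```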
  • 61. 4. Experiments • Three different tasks: pushing, sliding, pick-and-place • How we define the MDPs • Does HER improve performance? • Does HER improve performance even if there is only one goal we care about? • How does HER interact with reward shaping? • How many goals should we replay each trajectory with, and how should we choose them? • Deployment on a physical robot
  • 62. • How many goals should we replay each trajectory with, and how should we choose them? • future: replay with k random states that come from the same episode as the transition being replayed and were observed after it. • episode: replay with k random states that come from the same episode as the transition being replayed. • random: replay with k random states encountered so far in the whole training procedure.
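The three goal-selection strategies differ only in the pool that the k substitute goals are drawn from. The sketch below makes that explicit. The function name and signature are hypothetical. Sampling with replacement is an assumption, because the slide only says "k random states". For "future", the pool includes $s_{t+1}$ on the reading that it is observed after transition t. Always picking $s_T$ instead gives the "final" strategy used in the earlier example.

```python
import random

def sample_hindsight_goals(episode_states, t, k, strategy, all_states=None, rng=random):
    """Pick k substitute goals for transition t (s_t -> s_{t+1}) of an episode.

    episode_states: [s_0, ..., s_T] of the current episode
    strategy: 'future', 'episode' or 'random'
    all_states: every state seen so far in training (only needed for 'random')
    """
    if strategy == "future":
        pool = episode_states[t + 1:]  # s_{t+1}, ..., s_T: observed after the transition
    elif strategy == "episode":
        pool = episode_states          # any state of the same episode
    elif strategy == "random":
        if not all_states:
            raise ValueError("'random' needs the states seen so far in training")
        pool = all_states
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [rng.choice(pool) for _ in range(k)]  # sampled with replacement (assumption)

states = ["110", "111", "101", "100"]
print(sample_hindsight_goals(states, t=2, k=2, strategy="future"))  # ['100', '100']
```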
  • 63. 6. Conclusion • We showed that HER allows training policies which push, slide and pick-and-place objects with a robotic arm to the specified positions while the vanilla RL algorithm fails to solve these tasks.