- Hindsight Experience Replay (HER) is an RL technique that lets agents learn from episodes that failed to reach the intended goal, by treating the outcomes they did reach as if they had been the goal all along. This helps address the sparse reward problem in RL.
- HER replays stored experiences with substituted goals, generating pseudo-transitions with recomputed rewards. This improves sample efficiency when learning tasks with sparse rewards.
- Experiments show HER improves performance on manipulation tasks like pushing, sliding, and pick-and-place, allowing policies to be learned for these tasks where vanilla RL fails.
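The relabeling idea above can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the transition fields (`obs`, `achieved`, etc.), the success threshold `eps`, and the helper names are all assumptions; it uses the "future" goal-selection strategy and the paper's binary sparse reward (0 on success, -1 otherwise).

```python
import numpy as np

def sparse_reward(achieved_goal, goal, eps=0.05):
    # Binary sparse reward as in the paper: 0 if the goal is met, else -1.
    # The distance threshold eps is an illustrative assumption.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < eps else -1.0

def her_relabel(episode, k=4, rng=None):
    """Generate k pseudo-transitions per real transition by substituting goals.

    `episode` is a list of dicts with keys 'obs', 'action', 'next_obs',
    'achieved' (the goal state actually reached after the transition) --
    hypothetical field names, not from the paper's implementation.
    """
    rng = rng or np.random.default_rng(0)
    relabeled = []
    for t, tr in enumerate(episode):
        # 'future' strategy: sample goals achieved later in the same episode.
        idx = rng.integers(t, len(episode), size=k)
        for i in idx:
            g = episode[i]['achieved']
            # Same transition, new goal, reward recomputed in hindsight.
            relabeled.append({**tr, 'goal': g,
                              'reward': sparse_reward(tr['achieved'], g)})
    return relabeled
```

The relabeled transitions are simply appended to the replay buffer alongside the originals, so any off-policy algorithm (the paper uses DDPG) can consume them unchanged.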
1. Introduction
• Reward engineering limits the applicability of RL in the real world because it requires both RL expertise and domain-specific knowledge.
• But dealing with sparse rewards is also one of the biggest challenges in RL.
• One ability humans have, unlike the current generation of model-free RL algorithms, is to learn almost as much from achieving an undesired outcome as from the desired one.
4. Experiments
• Three different tasks: pushing, sliding, pick-and-place
• How we define MDPs
• Does HER improve performance?
• Does HER improve performance even if there is only one goal we care about?
• How does HER interact with reward shaping?
• How many goals should we replay each trajectory with and how to choose them?
• Deployment on a physical robot
• How many goals should we replay each trajectory with and how to choose them?
• future — replay with k random states which come from the same episode as the transition being replayed and were observed after it,
• episode — replay with k random states coming from the same episode as the transition being replayed,
• random — replay with k random states encountered so far in the whole training procedure.
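The three strategies above differ only in which pool of achieved states the substitute goals are drawn from. A minimal sketch, assuming achieved goals are stored per episode and globally (the function and argument names are illustrative, not from the paper's code):

```python
import numpy as np

def sample_replay_goals(strategy, t, episode_achieved, all_achieved,
                        k=4, rng=None):
    """Pick k substitute goals for the transition at index t in an episode.

    episode_achieved: achieved goals of the current episode, in time order.
    all_achieved:     achieved goals from the whole training run so far.
    """
    rng = rng or np.random.default_rng(0)
    if strategy == 'future':      # states observed later in the same episode
        pool = episode_achieved[t:]
    elif strategy == 'episode':   # any state of the same episode
        pool = episode_achieved
    elif strategy == 'random':    # any state seen so far in training
        pool = all_achieved
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    idx = rng.integers(0, len(pool), size=k)
    return [pool[i] for i in idx]
```

The paper's experiments find that restricting the pool to the same episode (and in particular to its future) gives the most useful goals, since those are states the current policy can plausibly revisit.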
6. Conclusion
• We showed that HER allows training policies which push, slide and pick-and-place objects with a robotic arm to the specified positions, while the vanilla RL algorithm fails to solve these tasks.