I reviewed the PRM-RL paper.
PRM-RL (Probabilistic Roadmap-Reinforcement Learning) is a hierarchical method that combines sampling-based path planning with RL. It uses feature-based and deep neural net policies (DDPG) in continuous state and action spaces. In their experiments, the authors evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments.
Outline
- Abstract
- Introduction
- Reinforcement Learning
- Methods
- Results
Thank you.
Character Controllers using Motion VAEs - Dongmin Lee
Title: Character Controllers using Motion VAEs
Proceeding: ACM Transactions on Graphics (TOG) (Proc. SIGGRAPH 2020)
Paper: https://dl.acm.org/doi/abs/10.1145/3386569.3392422
Video: https://www.youtube.com/watch?v=Zm3G9oqmQ4Y
Given example motions, how can we generalize these to produce new purposeful motions?
We take a two-step approach to this problem:
• A kinematic generative model based on an autoregressive conditional variational autoencoder, or motion VAE (MVAE)
• A learned controller that generates desired motions, trained with Deep Reinforcement Learning (Deep RL)
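To make the two steps concrete, here is a minimal sketch of the MVAE step in PyTorch. The layer sizes, pose dimensionality, and module names are illustrative assumptions, not the authors' implementation; it only shows the autoregressive conditional-VAE shape that an RL controller later drives through the latent variable.

import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    """Illustrative autoregressive conditional VAE over poses."""
    def __init__(self, pose_dim=32, latent_dim=8, hidden=256):
        super().__init__()
        # Encoder q(z | previous pose, current pose)
        self.encoder = nn.Sequential(
            nn.Linear(2 * pose_dim, hidden), nn.ELU(),
            nn.Linear(hidden, 2 * latent_dim))       # mean and log-variance
        # Decoder p(current pose | z, previous pose)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + pose_dim, hidden), nn.ELU(),
            nn.Linear(hidden, pose_dim))

    def forward(self, prev_pose, cur_pose):
        mean, logvar = self.encoder(torch.cat([prev_pose, cur_pose], -1)).chunk(2, -1)
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()   # reparameterization
        return self.decoder(torch.cat([z, prev_pose], -1)), mean, logvar

mvae = MotionVAE()
recon, mean, logvar = mvae(torch.zeros(1, 32), torch.zeros(1, 32))

At control time the decoder runs autoregressively: a policy, trained with deep RL in step two, chooses z each frame, and the frozen decoder turns (z, previous pose) into the next pose.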
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Va... - Dongmin Lee
I reviewed the PEARL paper.
PEARL (Probabilistic Embeddings for Actor-critic RL) is an off-policy meta-RL algorithm that achieves both meta-training and adaptation efficiency. It performs probabilistic filtering of latent task variables with an encoder, which enables posterior sampling for structured and efficient exploration.
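As a rough illustration of what the probabilistic encoder and posterior sampling mean: each observed transition contributes an independent Gaussian factor over the latent task variable z, the factors are multiplied into a posterior, and z is sampled from that posterior to drive exploration. The sketch below assumes illustrative dimensions and is not PEARL's actual code.

import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    def __init__(self, transition_dim=10, latent_dim=5, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim))        # per-factor mean, log-variance

    def posterior(self, context):                      # context: (num_transitions, dim)
        mean, logvar = self.net(context).chunk(2, -1)
        prec = (-logvar).exp()                         # product of Gaussians:
        post_prec = prec.sum(0)                        # precisions add,
        post_mean = (mean * prec).sum(0) / post_prec   # means are precision-weighted
        return post_mean, 1.0 / post_prec

encoder = ContextEncoder()
mean, var = encoder.posterior(torch.randn(16, 10))
z = mean + var.sqrt() * torch.randn_like(mean)         # posterior sampling of the task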
Outline
- Abstract
- Introduction
- Probabilistic Latent Context
- Off-Policy Meta-Reinforcement Learning
- Experiments
Link: https://arxiv.org/abs/1903.08254
Thank you!
Exploration Strategies in Reinforcement Learning - Dongmin Lee
I presented "Exploration Strategies in Reinforcement Learning" at AI Robotics KR.
- Exploration strategies in RL
1. Epsilon-greedy (a minimal code sketch follows this list)
2. Optimism in the face of uncertainty
3. Thompson (posterior) sampling
4. Information theoretic exploration (e.g., Entropy Regularization in RL)
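A minimal sketch of the first strategy, with illustrative Q-values and epsilon:

import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly; otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

action = epsilon_greedy([0.2, 0.5, 0.1])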
Thank you.
Evolving Reinforcement Learning Algorithms, JD. Co-Reyes et al, 2021 - Chris Ohk
At the RL paper review study group, I summarized and presented the paper Evolving Reinforcement Learning Algorithms. This paper designs a language for expressing the loss function of a value-based, model-free RL agent and proposes loss functions that are better optimized than the existing DQN. I hope it helps many people.
An overview of gradient descent optimization algorithms - Hakky St
These slides, prepared for a study meeting, summarize a paper on gradient descent.
I used the paper below, and it is a very good source of information on gradient descent.
https://arxiv.org/abs/1609.04747
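For a taste of what the paper covers, here are minimal NumPy sketches of three of the surveyed update rules; the step sizes and decay constants are common illustrative defaults, not recommendations from the paper.

import numpy as np

def sgd(theta, grad, lr=0.01):
    return theta - lr * grad                              # vanilla gradient step

def momentum(theta, grad, v, lr=0.01, gamma=0.9):
    v = gamma * v + lr * grad                             # accumulate velocity
    return theta - v, v

def adam(theta, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad                          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2                     # second-moment estimate
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v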
We introduce a novel training procedure for policy gradient methods wherein episodic memory is used to optimize the hyperparameters of reinforcement learning algorithms on-the-fly.
Discusses the concept of language models in Natural Language Processing. N-gram models and Markov chains are covered, and smoothing techniques such as add-1 smoothing, interpolation, and discounting are addressed.
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks - Taesu Kim
Paper review: "Model-Agnostic Meta-Learning for fast adaptation of deep networks" by C. Finn et al. (ICML 2017)
Presented at Tensorflow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1703.03400
Video link: https://youtu.be/fxJXXKZb-ik (in Korean)
http://www.neosapience.com
FastCampus 2018 SLAM Workshop
You can find the code diagrams via the link below.
https://www.dropbox.com/sh/u76i5hzdecd4ey7/AADgs9XzXt6k1j971vyBrFTea?dl=0
Visual odometry & SLAM utilizing indoor structured environments - NAVER Engineering
Visual odometry (VO) and simultaneous localization and mapping (SLAM) are fundamental building blocks for various applications from autonomous vehicles to virtual and augmented reality (VR/AR).
To improve the accuracy and robustness of the VO & SLAM approaches, we exploit multiple lines and orthogonal planar features, such as walls, floors, and ceilings, common in man-made indoor environments.
We demonstrate the effectiveness of the proposed VO & SLAM algorithms through an extensive evaluation on a variety of RGB-D datasets and compare with other state-of-the-art methods.
I updated the previous slides.
Previous slides: https://www.slideshare.net/DongMinLee32/causal-confusion-in-imitation-learning-238882277
I reviewed the "Causal Confusion in Imitation Learning" paper.
Paper link: https://papers.nips.cc/paper/9343-causal-confusion-in-imitation-learning.pdf
- Abstract
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive “causal misidentification” phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
- Outline
1. Introduction
2. Causality and Causal Inference
3. Causality in Imitation Learning
4. Experiment Settings
5. Resolving Causal Misidentification
- Causal Graph-Parameterized Policy Learning
- Targeted Intervention
6. Experiments
Thank you!
I reviewed the "Causal Confusion in Imitation Learning" paper.
- Abstract
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive “causal misidentification” phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions—either environment interaction or expert queries—to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
- Outline
1. Introduction
2. Causality and Causal Inference
3. Causality in Imitation Learning
4. Experiments Setting
5. Resolving Causal Misidentification
- Causal Graph-Parameterized Policy Learning
- Targeted Intervention
6. Experiments
Link: https://papers.nips.cc/paper/9343-causal-confusion-in-imitation-learning.pdf
Thank you!
Maximum Entropy Reinforcement Learning (Stochastic Control) - Dongmin Lee
I reviewed the following papers.
- T. Haarnoja, et al., “Reinforcement Learning with Deep Energy-Based Policies", ICML 2017
- T. Haarnoja, et al., “Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", ICML 2018
- T. Haarnoja, et al., “Soft Actor-Critic Algorithms and Applications", arXiv preprint 2018
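The common thread of these three papers is the maximum entropy objective, which augments the expected return with a policy-entropy bonus weighted by a temperature α:

J(π) = Σ_t 𝔼_{(s_t, a_t) ~ ρ_π}[ r(s_t, a_t) + α H(π(·|s_t)) ]

Setting α = 0 recovers the standard RL objective; a larger α rewards more stochastic, exploratory policies.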
Thank you.
Hello. I am Dongmin Lee, the project manager of the "GAIL하자!" ("Let's do GAIL!") project at RL Korea. This material briefly introduces what we did over the course of four months.
Our project studied the theoretical foundations of papers on Inverse RL, one of the approaches to imitation learning, and implemented them in environments.
The list of related papers is as follows:
[1] AY. Ng, et al., "Algorithms for Inverse Reinforcement Learning", ICML 2000.
[2] P. Abbeel, et al., "Apprenticeship Learning via Inverse Reinforcement Learning", ICML 2004.
[3] ND. Ratliff, et al., "Maximum Margin Planning", ICML 2006.
[4] BD. Ziebart, et al., "Maximum Entropy Inverse Reinforcement Learning", AAAI 2008.
[5] J. Ho, et al., "Generative Adversarial Imitation Learning", NIPS 2016. (a sketch of its learning signal follows this list)
[6] XB. Peng, et al., "Variational Discriminator Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow", ICLR 2019.
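To give a flavor of the adversarial formulation in [5], here is a minimal sketch of the GAIL learning signal; the network shape and input dimension are illustrative assumptions, not the project's code.

import torch
import torch.nn as nn

# Discriminator D(s, a): trained to output 1 on expert pairs, 0 on policy pairs
discriminator = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))

def gail_reward(state_action):
    """Reward the policy where it fools the discriminator: -log(1 - D(s, a))."""
    d = torch.sigmoid(discriminator(state_action))
    return -torch.log(1.0 - d + 1e-8)

r = gail_reward(torch.randn(1, 4))   # state-action pair of illustrative dimension 4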
The project deliverables are a blog summarizing the papers and a GitHub repository implementing them. The links are as follows:
- Blog: https://reinforcement-learning-kr.github.io/2019/01/22/0_lets-do-irl-guide/
- GitHub: https://github.com/reinforcement-learning-kr/lets-do-irl
Let's all do IRL together!
Thank you :)
Hello.
I am Dongmin Lee, and I participated in the PG Travel ("피지여행") project at RL Korea.
This is the material I presented at the 1st RLKorea Project Seminar on Saturday, August 25. Unfortunately, the videos do not play.
The seminar link is below.
1st RLKorea Project Seminar: https://www.facebook.com/groups/ReinforcementLearningKR/permalink/2024537701118793/?__tn__=H-R
To briefly introduce PG Travel: it is a project that reviews papers related to Policy Gradient, summarizes them on a blog, and implements and experiments with the code.
The blog and GitHub are as follows:
Blog: https://reinforcement-learning-kr.github.io/…/0_pg-travel-…/
GitHub: https://github.com/reinforcement-learning-kr/pg_travel
I hope many people find it helpful!
Thank you!!
Hello, I am Dongmin Lee. :)
These are the slides for "Safe Reinforcement Learning", presented at the Korea Aerospace Research Institute on August 9, 2018.
The table of contents is as follows:
1. Reinforcement Learning
2. Safe Reinforcement Learning
3. Optimization Criterion
4. Exploration Process
While continuing to study reinforcement learning, I came to feel that for many people to actually use it, it needs to be safer and faster, so I studied papers and other materials on this topic and presented them.
I hope it helps many people. Thank you!
Hello.
I am Dongmin Lee, and I presented on the topic "Safety First Reinforcement Learning (안.전.제.일. 강화학습)" at the 1st All-Together Deep Learning Conference.
The conference link is below:
https://tykimos.github.io/2018/06/28/ISS_1st_Deep_Learning_Conference_All_Together/
A rough outline is as follows:
1. What is Artificial Intelligence?
2. What is Reinforcement Learning?
3. What is Artificial General Intelligence?
4. Planning and Learning
5. Safe Reinforcement Learning
This material also explains the paper "Imagination-Augmented Agents for Deep Reinforcement Learning" in detail.
I hope many people find it helpful!
Planning and Learning with Tabular Methods - Dongmin Lee
Hello~!
This material was prepared by Dong-Min Lee.
(http://www.facebook.com/dongminleeai)
It reviews Chapter 8, "Planning and Learning with Tabular Methods", of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto.
The outline is:
1) Introduction
2) Models and Planning
3) Dyna: Integrating Planning, Acting, and Learning (a minimal Dyna-Q sketch follows this outline)
4) When the Model Is Wrong
5) Prioritized Sweeping
6) Expected vs. Sample Updates
7) Trajectory Sampling
8) Planning at Decision Time
9) Heuristic Search
10) Rollout Algorithms
11) Monte Carlo Tree Search
12) Summary
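As a small taste of topic 3, here is a compact, illustrative Dyna-Q step: one direct Q-learning update from a real transition, model learning, and n planning updates replayed from the learned table model. States, actions, and constants are made up for illustration.

import random
from collections import defaultdict

Q = defaultdict(float)                    # tabular action values
model = {}                                # (s, a) -> (r, s')
alpha, gamma, n_planning, actions = 0.1, 0.95, 10, [0, 1]

def q_update(s, a, r, s2):
    best_next = max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def dyna_q_step(s, a, r, s2):
    q_update(s, a, r, s2)                 # direct reinforcement learning
    model[(s, a)] = (r, s2)               # model learning
    for _ in range(n_planning):           # planning from simulated experience
        ps, pa = random.choice(list(model))
        pr, ps2 = model[(ps, pa)]
        q_update(ps, pa, pr, ps2)

dyna_q_step(s=0, a=1, r=1.0, s2=1)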
I'm happy to have reviewed the Sutton book~!!!
I hope everyone studying Reinforcement Learning finds this material helpful! :)
Thank you!
Hello~! :)
While studying the Sutton-Barto book, the classic textbook for Reinforcement Learning, I created a PPT about Multi-armed Bandits, covered in Chapter 2.
If there are any mistakes, I would appreciate your feedback.
Thank you.
Hello.
While studying reinforcement learning, I put together a PPT, "An Overview of Reinforcement Learning", for those encountering it for the first time.
I find reinforcement learning, which learns through trial and error just as animals do, quite fascinating within machine learning.
https://www.youtube.com/watch?v=PQtDTdDr8vs&feature=youtu.be
The link above is a video of Skinner's rat experiment.
Thank you.
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
1. PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
IEEE International Conference on Robotics and Automation (ICRA), 2018
Best Paper Award in Service Robotics
Aleksandra Faust et al.
Google Brain Robotics
Presented by Dongmin Lee
December 1, 2019
3. Abstract
PRM-RL (Probabilistic Roadmap-Reinforcement Learning):
• A hierarchical method for long-range navigation tasks
• Combines sampling-based path planning with RL
• Uses feature-based and deep neural net policies (DDPG) in continuous state and action spaces
Experiments: simulation and on-robot evaluation on two end-to-end navigation tasks
• Indoor (differential drive) navigation in office environments - selected
• Aerial cargo delivery in urban environments
5. Introduction
PRM-RL YouTube video
• https://bit.ly/34zCTmd
Traditional Motion Planning (or Path Planning)
• CS287 Advanced Robotics (Fall 2019), Lecture 9: Motion Planning
• https://people.eecs.berkeley.edu/~pabbeel/cs287-fa19/slides/Lec10-motion-planning.pdf
Probabilistic Roadmap (PRM) YouTube videos
• https://bit.ly/34rRKz0
• https://bit.ly/35Nb61Q
Rapidly-exploring Random Tree* (RRT*) YouTube videos
• https://bit.ly/2OXiocb
• https://bit.ly/2OQbUvM
6. Reinforcement Learning
RL provides a formalism for behaviors
• The problem of a goal-directed agent interacting with an uncertain environment
• Interaction → adaptation (feedback & decision)
7-9. Reinforcement Learning
What are the challenges of RL?
• Huge number of samples: millions
• Fast, stable learning
• Hyperparameter tuning
• Exploration
• Sparse reward signals, here due to long-range navigation → solved with hierarchical waypoints
• Safety / reliability
• Simulator
10. Introduction
So, what is the advantage of PRM-RL over traditional methods?
• In PRM-RL, an RL agent is trained to execute a local point-to-point task without knowledge of the topology, learning the task constraints.
• PRM-RL builds a roadmap using the RL agent instead of the traditional collision-free straight-line planner.
• Thus, the resulting long-range navigation planner combines the planning efficiency of a PRM with the robustness of an RL agent.
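To make the second bullet concrete, here is a minimal, runnable sketch of the roadmap-construction idea: candidate nodes are only connected when (stubbed) rollouts of the RL agent reach the neighbor reliably. The agent stub, node counts, and success threshold are illustrative stand-ins, not the paper's actual procedure.

import math
import random

def rl_agent_reaches(a, b):
    """Stub for 'roll out the RL policy from a toward b'; success
    probability decays with distance, standing in for real rollouts."""
    return random.random() < math.exp(-math.dist(a, b))

def build_roadmap(num_nodes=50, attempts=20, threshold=0.9):
    nodes = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(num_nodes)]
    edges = []
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes[i + 1:], start=i + 1):
            successes = sum(rl_agent_reaches(a, b) for _ in range(attempts))
            if successes / attempts >= threshold:      # connect only reliable pairs
                edges.append((i, j))
    return nodes, edges

nodes, edges = build_roadmap()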
12. Methods
Three stages:
1. RL agent training
2. PRM construction (roadmap creation)
3. PRM-RL querying (roadmap querying)
13. Methods
1. RL agent training
Definitions
• S: robot's state space
• s: start state in state space S
• g: goal state in state space S
• C-space: the space of all possible robot configurations (the state space S is a superset of the C-space)
• C-free: the partition of C-space consisting of only collision-free configurations
• L(s): a task predicate that indicates whether the task constraints are satisfied
• p(s): the projection of a state space point onto C-space, which must belong to C-free
The task is completed when the system is sufficiently close to the goal state:
‖p(s) − p(g)‖ ≤ ε
Our goal is to find a transfer function:
s′ = f(s, a)
14. Methods
1. RL agent training
Markov Decision Process (MDP):
• S ⊂ ℝ^{d_S} is the state or observation space of the robot
• s = (g, o): the goal g in polar coordinates and LIDAR observations o
• A ⊂ ℝ^{d_A} is the space of all possible actions that the robot can perform
• a = (v_l, v_r) ∈ ℝ^2: a two-dimensional vector of wheel speeds
• P: S × A × S → [0, 1] is the state transition probability distribution. We assume a simplified black-box simulator, without knowing the full non-linear system dynamics
• R: S → ℝ is a scalar reward. We reward the agent for staying away from obstacles
Our goal is to find a policy π: S → A with π(s) = a: given an observed state s, it returns an action a that the agent should perform to maximize the long-term return:
π*(s) = argmax_π 𝔼[Σ_{t=0}^{∞} γ^t R(s_t)]
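As a tiny aside on the objective above, the discounted return can be accumulated backwards through a reward sequence; the rewards and γ here are made up:

def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g                 # G_t = r_t + gamma * G_{t+1}
    return g

print(discounted_return([0.0, 0.0, 1.0]))  # 0.9801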
16. Methods
1. RL agent training
Training with the DDPG algorithm for the indoor navigation task
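For orientation, here is a heavily simplified sketch of the two DDPG updates (critic regression toward a bootstrapped target, actor ascent on the critic). Network sizes, learning rates, and the observation dimension are illustrative; target networks and the replay buffer, which full DDPG needs, are omitted.

import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2                   # illustrative dimensions
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_step(s, a, r, s2, gamma=0.99):
    # Critic: regress Q(s, a) toward r + gamma * Q(s', pi(s'))
    with torch.no_grad():
        target = r + gamma * critic(torch.cat([s2, actor(s2)], -1))
    critic_loss = ((critic(torch.cat([s, a], -1)) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's estimate of Q(s, pi(s))
    actor_loss = -critic(torch.cat([s, actor(s)], -1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

batch = 32
ddpg_step(torch.randn(batch, obs_dim), torch.randn(batch, act_dim),
          torch.randn(batch, 1), torch.randn(batch, obs_dim))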
18. Methods
3. PRM-RL querying (roadmap querying)
Generate long-range trajectories
• We query the roadmap, which returns a list of waypoints to a higher-level planner.
• The higher-level planner then invokes the RL agent to produce a trajectory to the next waypoint.
• When the robot is within the waypoint's goal range, the higher-level planner replaces the goal with the next waypoint in the list.
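This querying stage can be summarized in a few lines; the step function stands in for one action of the trained RL agent, and the goal radius is an illustrative value, not the paper's setting.

import math

def follow_waypoints(step_toward, state, waypoints, goal_radius=0.3, max_steps=1000):
    """Drive through the waypoint list; switch goals once within range."""
    for waypoint in waypoints:
        for _ in range(max_steps):
            if math.dist(state, waypoint) <= goal_radius:
                break                                  # switch to the next waypoint
            state = step_toward(state, waypoint)       # one RL-agent step (stubbed)
    return state

# Toy stand-in agent that moves 10% of the way toward the waypoint each step:
agent = lambda s, w: (s[0] + 0.1 * (w[0] - s[0]), s[1] + 0.1 * (w[1] - s[1]))
final = follow_waypoints(agent, (0.0, 0.0), [(1.0, 1.0), (2.0, 0.0)])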
19. Results
Indoor navigation
1. Roadmap construction evaluation
2. Expected trajectory characteristics
3. Actual trajectory characteristics
4. Physical robot experiments
→ Each roadmap is evaluated on 100 randomly generated queries from the C-free.
20. Results
1. Roadmap construction evaluation
• Higher sampling density produces larger maps and more successful queries.
• The number of nodes in the map does not depend on the local planner, but the number of edges and collision checks do.
• Roadmaps built with the RL local planner are more densely connected, with 15% and 50% more edges.
• The RL agent can go around corners and small obstacles.
21. Results
2. Expected trajectory characteristics
• The RL agent does not require the robot to come to rest at the goal region, so the robot carries some inertia when the waypoint is switched. This causes some of the failures.
• The PRM-RL paths contain more waypoints, except in Building 3.
• Expected trajectory length and duration are longer for the RL agent.
22. Results
3. Actual trajectory characteristics
• We look at the query characteristics for successful versus unsuccessful queries.
• The RL agent produces a higher success rate than PRM-SL (PRM with a straight-line local planner).
• Successful trajectories have fewer waypoints than expected, which means that shorter queries are more likely to succeed.
23. Results
4. Physical robot experiments
• To transfer our approach to a real robot, we created a simple slalom-like environment with four obstacles.