Autonomy Incubator Seminar Series: Tractable Robust Planning and Model Learning Under Uncertainty

  1. Tractable Robust Planning and Model Learning Under Uncertainty
     Jonathan P. How
     Aerospace Controls Laboratory, MIT
     jhow@mit.edu
     March 17th, 2014
  2. Autonomous Systems: Opportunity
     • New era of information and data availability
       – To perform efficient data interpretation and information extraction → "big data" and "data-to-decisions"
       – In many application domains, including transportation, environment, ocean exploration, and healthcare
     • Maturing vehicle GNC raises new challenges in mission design for heterogeneous manned and autonomous assets
     • Cost savings and throughput demands are driving rapid infusion of robotic technologies for airframe manufacturing/maintenance
       – DARPA is driving a paradigm shift in rapid prototyping and manufacturing
     • Rapid progress is being made on the policy issues of integrating autonomous systems into society (e.g., the Google car, manufacturing)
  3. Example: Driving with Uncertainty
     • Goal: Improve road safety for urban driving
     • Challenge: The world is complex & dynamic
       – Must safely avoid many types of uncertain static and dynamic obstacles
       – Must accurately anticipate other vehicles' intents and assess the danger involved
     • Reliable autonomy for transportation systems requires:
       – Inference/navigation in dynamic and unstructured environments — GPS-denied navigation
       – Provably correct, real-time planning
       – Safety & probabilistic risk assessment
       – Learning — model and policy learning
       – Shaping autonomy for use by human operators
     [Images: navigating busy intersections; DGC '07 MIT/Cornell accident]
  4. Planning Without Learning
     J. Leonard, J. How, S. Teller, M. Berger, S. Campbell, G. Fiore, L. Fletcher, E. Frazzoli, A. Huang, S. Karaman, et al., A Perception-Driven Autonomous Urban Vehicle. Springer, 2009.
  5. Example: UAV Turing Test
     • Challenge: Autonomous operation at an uncontrolled airport
       – The UAV must approach the uncontrolled airport, integrate into the traffic pattern, and land in a way that is indistinguishable from a human pilot as observed by other aircraft
     • The problem is interesting because, while the general structure of the traffic is known, the specifics must be sensed and the behavior of other traffic inferred
  6. Challenges
     • Goal: Automate mission planning to improve performance for multiple UAVs in a dynamic, uncertain world
       – Real-time planning
       – Exploration & exploitation — data fusion
       – Planning/inference over contested communication networks
       – Human-autonomy interaction
     • Challenges:
       – Uncertainty: the world model is not fully known
       – Dynamic: the objective, world, or world model may change
       – Stochastic: the same behavior in the same situation may result in a different outcome
       – Safety: arbitrary behaviors can be detrimental to the mission/system
  7. Similar Challenges in Many Domains
     [Images: civil UAVs, military UAVs, space vehicles, manufacturing]
  8. Planning Challenges
     • Issue: most planners are model-based, which enables anticipation
     • But models are often approximate and/or wrong
       – Model parameter uncertainties
       – Modeling errors
     • Can yield sub-optimal planner output with a large performance mismatch
       – Possibly catastrophic mission impact
  9. Planning and Learning
     • Two standard approaches:
     • Baseline Control Algorithms (BCA)
       – Fast solutions, but based on simplified models → sub-optimal
       – Can provide a good foundation to bootstrap learning
         • Mitigates catastrophic mistakes
     • Online Adaptation/Learning Algorithms
       – Handle stochastic systems/unknown models
       – Computational and sample complexity issues
       – Exploration can be dangerous
       – Can improve on the BCA by adapting to a time-varying environment and mission, and by generating new strategies that are most beneficial
     • Issue: how to develop an architecture that realizes this synergistic combination
  10. Planning and Learning
      • Intelligent Cooperative Control Architecture (iCCA)
        – Synergistic integration of planning and safe learning to improve performance
        – Sand-boxing for planning and learning
      • Example: 2-UAV, 6-target simulation (10^8 state-action pairs)
        – Cooperative learners perform well with respect to overall reward and risk levels when compared with the baseline planner (CBBA) and non-cooperative learning algorithms
      • iCCA can improve baseline planner performance, but how can the learning problems be solved in real time?
      [Figure: mission scenario graph with node rewards (+100 to +300), success probabilities (0.5–0.7), and time windows; bar chart of optimality (40%–90%) for Learner, Planner-Conservative, Planner-Aggressive, iCCA, and iCCA+AdaptiveModel]
      Geramifard et al., "Intelligent cooperative control architecture: A framework for performance improvement using safe learning," Journal of Intelligent and Robotic Systems, vol. 72, pp. 83–103, October 2013.
  11. Reinforcement Learning
      • Vision: Agents that learn desired behavior from demonstrations or environment signals
      • Challenge: Continuous/high-dimensional environments make learning intractable

      Algorithm | Properties
      Bayesian Nonparametric Inverse Reinforcement Learning (BNIRL) | Efficient inference of subgoals from human demonstrations in continuous domains
      Incremental Feature Dependency Discovery (iFDD) | Computationally cheap feature expansion & online learning
      Multi-Fidelity Reinforcement Learning (MFRL) | Efficient use of simulators to explore areas where real-world samples are not needed

      B. Michini, M. Cutler, and J. P. How, "Scalable reward learning from demonstration," in IEEE International Conference on Robotics and Automation (ICRA), 2013.
      A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. How, "Online discovery of feature dependencies," in International Conference on Machine Learning (ICML), pp. 881–888, June 2011.
      M. Cutler, T. J. Walsh, and J. P. How, "Reinforcement learning with multi-fidelity simulators," in IEEE International Conference on Robotics and Automation (ICRA), June 2014.
  12. Learning from Demonstration (LfD)
      • LfD is an intuitive method for teaching an autonomous system
      • Reward (vs. policy) learning gives a succinct, transferable representation, but:
        – It is ill-posed (many potential solutions exist)
        – Must assume a model of rationality for the demonstrator
        – Many demonstrations contain multiple tasks
      • Current methods (e.g., IRL, Ng '00) have limitations
        – Parametric rewards; scalability; single reward per demonstration
      • Developed Bayesian Nonparametric Inverse RL (BNIRL)
        – Learns multiple subgoal rewards from a single demonstration
        – The number of rewards is learned, not specified
        – Strategies given for scalability (approximations, parallelizable); a minimal sketch of the subgoal-assignment step follows below
      B. Michini, M. Cutler, and J. P. How, "Scalable reward learning from demonstration," in IEEE International Conference on Robotics and Automation (ICRA), 2013.
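To make the subgoal-partitioning step concrete, here is a toy Gibbs-sampling sketch in the spirit of BNIRL: a Chinese Restaurant Process prior over assignments of demonstration points to subgoals, with a negative distance-to-subgoal likelihood standing in for the optimal action value Q*. The function names and the distance-based likelihood are simplifying assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_subgoals(demo, alpha=1.0, beta=5.0, iters=50):
    """demo: (N, 2) array of demonstration states. Each point is assigned to
    a subgoal (itself a demonstration state); the CRP prior lets the number
    of subgoals grow with the data instead of being fixed in advance."""
    demo = np.asarray(demo, dtype=float)
    n = len(demo)
    z = np.zeros(n, dtype=int)          # subgoal assignment per point
    subgoals = [demo[0]]                # each subgoal is a candidate goal state
    for _ in range(iters):
        for i in range(n):
            counts = np.bincount(np.delete(z, i), minlength=len(subgoals))
            # CRP prior: existing subgoals weighted by their counts,
            # a brand-new subgoal weighted by alpha.
            prior = np.append(counts, alpha).astype(float)
            # Likelihood stand-in: the demonstrator acts near-optimally
            # toward the subgoal, so closer subgoals are exponentially
            # more likely (the real method uses exp(alpha * Q*)).
            cands = subgoals + [demo[rng.integers(n)]]
            like = np.array([np.exp(-beta * np.linalg.norm(demo[i] - g))
                             for g in cands])
            p = prior * like
            k = rng.choice(len(p), p=p / p.sum())
            if k == len(subgoals):      # the new-table draw: a subgoal is born
                subgoals.append(cands[-1])
            z[i] = k
    return z, subgoals
```

With a demonstration that pauses at a few waypoints, the sampler typically concentrates assignments around those waypoints, recovering one subgoal reward per segment without the number of rewards being specified in advance.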
  13. Experiment: BNIRL for Learning Quadrotor Flight Maneuvers
  14. Experimental Results: GPSRL for Learning RC Car Driving Maneuvers
  15. Experimental Results: GPSRL for Learning RC Car Driving Maneuvers
      • Continuous, unsegmented demonstration captured and downsampled
      • GPSRL partitions the demonstration and learns corresponding subgoal reward functions
  16. Scaling Reinforcement Learning
      • Vision: Use learning methods to improve UAV team performance over time
        – Typically very high-dimensional state spaces
        – Computationally challenging
      • Steps:
        – Developed incremental Feature Dependency Discovery (iFDD) as a novel adaptive function approximator (a minimal sketch follows below)
      • Results:
        – iFDD has low per-step computational complexity and asymptotic convergence guarantees
        – iFDD outperforms other methods
      A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. How, "Online discovery of feature dependencies," in International Conference on Machine Learning (ICML), pp. 881–888, June 2011.
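The core iFDD rule is simple: credit each observed TD error to candidate conjunctions of the currently active features, and promote a conjunction to a full feature once its accumulated relevance crosses a threshold. A minimal sketch assuming binary features; class and method names are illustrative, not the RLPy implementation.

```python
from collections import defaultdict
from itertools import combinations

class IFDDSketch:
    def __init__(self, threshold):
        self.threshold = threshold            # relevance threshold (xi)
        self.discovered = set()               # learned conjunctions (frozensets)
        self.relevance = defaultdict(float)   # accumulated |TD error| per candidate

    def active_features(self, active_base):
        """Active base features, plus any discovered conjunction whose
        members are all currently active."""
        base = set(active_base)
        active = {frozenset([f]) for f in base}
        active |= {c for c in self.discovered if c <= base}
        return active

    def observe(self, active_base, td_error):
        """Credit |TD error| to every candidate pairwise conjunction of the
        active features; promote a candidate once it crosses the threshold."""
        for f, g in combinations(sorted(active_base), 2):
            cand = frozenset((f, g))
            if cand in self.discovered:
                continue
            self.relevance[cand] += abs(td_error)
            if self.relevance[cand] > self.threshold:
                self.discovered.add(cand)

ifdd = IFDDSketch(threshold=2.0)
ifdd.observe(["low_fuel", "near_target"], td_error=1.5)
ifdd.observe(["low_fuel", "near_target"], td_error=0.8)   # crosses threshold
print(ifdd.active_features(["low_fuel", "near_target"]))  # conjunction now active
```

Because features are added only where TD error persists, the representation grows with the problem's actual dependencies rather than with the full exponential set of conjunctions, which is the source of the cheap per-step cost.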
  17. RLPy: RL for Education & Research
      • Provides a growing library of fine-grained modules for experiments
        – (5) Agents, (4) Policies, (10) Representations, (20) Domains
        – Modules can be recombined, freeing researchers from reimplementation (example below)
      • Reproducible, parallel, platform-independent experiments
        – Rapid prototyping (Python), support for optimized C code (Cython)
      • Tools to automate all parts of the experiment pipeline
        – Domain visualization for troubleshooting
        – Automatic hyperparameter tuning
      http://acl.mit.edu/RLPy/
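A sketch of how the modules compose into an experiment. The module layout follows the RLPy documentation, but the exact constructor arguments are assumptions and may differ across RLPy versions.

```python
# Composing an RLPy experiment from interchangeable modules. Class names
# follow the RLPy docs; treat the constructor arguments as assumptions.
from rlpy.Domains import GridWorld
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Agents import Q_Learning
from rlpy.Experiments import Experiment

domain = GridWorld()                           # one of the (20) Domains
representation = Tabular(domain)               # one of the (10) Representations
policy = eGreedy(representation, epsilon=0.2)  # one of the (4) Policies
agent = Q_Learning(policy, representation,     # one of the (5) Agents
                   discount_factor=domain.discount_factor)
experiment = Experiment(agent=agent, domain=domain,
                        exp_id=1, max_steps=10000)
experiment.run()   # swapping any one module above changes the experiment
experiment.save()  # without reimplementing the rest of the pipeline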
  18. Multi-Fidelity Reinforcement Learning
      • Vision: Leverage simulators to learn optimal behavior with few real-world samples
      • Challenges:
        – What knowledge should be shared between agents learning on different simulators?
        – Choosing which simulator to sample: low-fidelity simulators are less costly but less accurate
      • Contributions: Developed MFRL (a toy sketch follows below)
        – Lower-fidelity agents send up values to guide exploration
        – Higher-fidelity agents send down learned parameters
        – Rules for switching levels guarantee a limited number of simulator changes and efficient exploration
      [Figure: chain of simulators from lowest to highest fidelity]
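A toy illustration of the value-passing idea on a 1-D chain world: Q-values learned in a cheap, deterministic simulator seed the value table for a noisier, higher-fidelity one, so low-fidelity values steer the expensive samples toward states where the cheap model was wrong. This is a simplified stand-in under those assumptions, not the published MFRL algorithm (which adds the level-switching rules and their guarantees).

```python
import random
from collections import defaultdict

STATES, ACTIONS, GOAL = list(range(10)), (-1, +1), 9

def step(s, a, noise):
    """1-D chain world; `noise` is the chance the chosen action is flipped."""
    if random.random() < noise:
        a = -a
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else -0.01), s2 == GOAL

def q_learn(noise, q0, episodes=300, alpha=0.2, gamma=0.95, eps=0.1):
    """Plain tabular Q-learning; q0 seeds the value table."""
    q = defaultdict(float, q0)
    for _ in range(episodes):
        s = 0
        for _ in range(100):
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda b: q[(s, b)])
            s2, r, done = step(s, a, noise)
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q

# Low-fidelity pass (deterministic, cheap), then a high-fidelity pass
# (noisy, expensive) that starts from the low-fidelity value table.
q_low = q_learn(noise=0.0, q0={})
q_high = q_learn(noise=0.3, q0=dict(q_low))
```

In the full framework the transfer runs both ways: values go up the fidelity chain to guide exploration, learned model parameters come back down, and explicit rules bound how often the agent switches simulators.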
  19. Bayesian Nonparametric Models for Robotics
      • Often significant uncertainty about the behaviors and intents of other agents in the environment
        – Bayesian nonparametric models (BNPs) uniquely provide the flexibility to learn model size & parameters
        – Important because it is often very difficult to pre-specify model size
      • Example: Gaussian process (GP) BNP for continuous functions
        – Can learn the number of motion models and their velocity fields using a Dirichlet process GP mixture (DP-GP)
        – Can also capture temporally evolving behaviors using the DDP-GP
      • Application: threat assessment
        – Model, classify & assess the intent/behavior of other drivers and pedestrians
        – Embed in a robust planner (CC-RRT*)
        – Driver aid and/or autonomous car
      T. Campbell, S. S. Ponda, G. Chowdhary, and J. P. How, "Planning under uncertainty using nonparametric Bayesian models," in AIAA Guidance, Navigation, and Control Conference (GNC), August 2012.
      G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, "Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns," Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.
      D. Lin, E. Grimson, and J. Fisher, "Construction of dependent Dirichlet processes based on Poisson processes," in Neural Information Processing Systems (NIPS), 2010.
  20. Fast BNP Learning
      • Vision: Flexible learning for temporally evolving data without sacrificing speed (real-time robotic systems)
      • Challenges:
        – Flexible models are computationally demanding (e.g., Gibbs sampling for DP-GP, DDP-GP)
        – Computationally cheap models are rigid
      • Results: Dynamic Means (sketch below)
        – Derived from a low-variance asymptotic analysis of the DDP mixture
        – Handles cluster birth, death, and transitions
        – Guaranteed monotonic convergence in clustering cost
      [Plot: % label accuracy vs. log10 CPU time]
      T. Campbell, M. Liu, B. Kulis, J. P. How, and L. Carin, "Dynamic clustering via asymptotics of the dependent Dirichlet process," in Advances in Neural Information Processing Systems (NIPS), 2013.
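The low-variance asymptotic limit collapses Gibbs sampling into a k-means-style assignment with a cluster-birth penalty. Below is a minimal sketch of that static assignment step (essentially DP-means); the full Dynamic Means algorithm additionally carries clusters across timesteps with transition and death penalties, which are omitted here, and all names are illustrative.

```python
import numpy as np

def assign_with_birth(points, centers, lam):
    """One pass of nearest-center assignment with cluster birth: a point whose
    squared distance to every center exceeds the penalty `lam` opens a new
    cluster. Alternating assignment and mean updates monotonically decreases
    the clustering cost (sum of squared distances + lam per cluster)."""
    centers = [np.asarray(c, dtype=float) for c in centers]
    labels = []
    for x in points:
        x = np.asarray(x, dtype=float)
        d2 = [float(np.sum((x - c) ** 2)) for c in centers]
        if not centers or min(d2) > lam:
            centers.append(x.copy())        # birth: pay lambda, open a cluster
            labels.append(len(centers) - 1)
        else:
            labels.append(int(np.argmin(d2)))
    for k in range(len(centers)):           # one k-means mean-update step
        members = [np.asarray(p, float) for p, l in zip(points, labels) if l == k]
        if members:
            centers[k] = np.mean(members, axis=0)
    return labels, centers

labels, centers = assign_with_birth([[0, 0], [0.1, 0], [5, 5]], [], lam=1.0)
print(labels, centers)   # two clusters: one near the origin, one at (5, 5)
```

Replacing the sampler with this deterministic sweep is what buys the orders-of-magnitude CPU-time savings shown in the accuracy-vs-time comparison, at the cost of the full posterior.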
  21. Experimental Implementation
      • Sgun movie
  22. Example: Driving with Uncertainty
      • Goal: Improve road safety for urban driving
      • Challenge: The world is complex & dynamic
        – Must safely avoid many types of uncertain static and dynamic obstacles
        – Must accurately anticipate other vehicles' intents and assess the danger involved
      • Objective: Develop probabilistic models of the environment (cars, pedestrians, cyclists, ...), and a robust path planner that utilizes those models to safely navigate urban environments
        – Distributions over possible intents, and trajectories for each intent
        – Efficient enough for real-time use
      [Images: navigating busy intersections; DGC '07 MIT/Cornell accident]
  23. Approach
      • Simultaneous trajectory prediction and robust avoidance of multiple obstacle classes (static and dynamic)
      • DP-GP: automatically classifies trajectories into behavior patterns; uses a GP mixture model to compute
        – The probability of being in each motion pattern given the observed trajectory (see the sketch below)
        – The position distribution within each pattern at future timesteps → probabilistic models for propagated (intent, path) uncertainty
      • RR-GP: refines predictions based on dynamics and the environment
      • CC-RRT*: optimized, robust motion planning
      B. D. Luders, S. Karaman, and J. P. How, "Robust sampling-based motion planning with asymptotic optimality guarantees," in AIAA Guidance, Navigation, and Control Conference (GNC), August 2013.
      G. S. Aoude, B. D. Luders, J. M. Joseph, N. Roy, and J. P. How, "Probabilistically safe motion planning to avoid dynamic obstacles with uncertain motion patterns," Autonomous Robots, vol. 35, no. 1, pp. 51–76, 2013.
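As an illustration of the first computation, here is a minimal sketch of the mixture posterior over motion patterns given observed velocities. Each pattern is summarized by a single Gaussian velocity model; the actual DP-GP uses position-dependent GP velocity fields, so the model and all names here are simplifying assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def pattern_posterior(velocities, patterns, priors):
    """velocities: (T, 2) observed velocities along a trajectory.
    patterns: list of (mean_velocity, covariance) per behavior pattern.
    priors: mixture weights (e.g., normalized DP cluster sizes).
    Returns P(pattern | trajectory), computed in log space for stability."""
    log_post = np.log(np.asarray(priors, dtype=float))
    for k, (mu, cov) in enumerate(patterns):
        log_post[k] += multivariate_normal.logpdf(velocities, mu, cov).sum()
    log_post -= log_post.max()          # avoid underflow before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Two patterns: "heading east" vs. "heading north"; the observed car moves east.
patterns = [(np.array([1.0, 0.0]), 0.1 * np.eye(2)),
            (np.array([0.0, 1.0]), 0.1 * np.eye(2))]
obs = np.array([[0.9, 0.1], [1.1, -0.05], [1.0, 0.0]])
print(pattern_posterior(obs, patterns, priors=[0.5, 0.5]))  # ~[1, 0]
```

Propagating each pattern's velocity model forward then yields the per-pattern position distributions that the robust planner consumes.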
  24. CC-RRT* for Robust Motion Planning
      • Real-time optimizing algorithm with guaranteed probabilistic robustness to internal/external uncertainty
        – Leverages RRT: anytime algorithm; quickly explores large state spaces; dynamic feasibility; trajectory-wise constraint checking
      • CC-RRT: efficient online risk evaluation (a sketch of the chance check follows below)
        – Well suited to real-time planning/updates with DP-GP motion models
      • RRT*: asymptotic optimality
      • CC-RRT* is a very scalable algorithm
      S. Karaman and E. Frazzoli, "Sampling-based algorithms for optimal motion planning," International Journal of Robotics Research, vol. 30, pp. 846–894, June 2011.
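The online risk evaluation is efficient because linear-Gaussian chance constraints have a closed form: for state x ~ N(mu, Sigma) and an obstacle half-plane a^T x <= b, the violation probability is 1 - Phi((b - a^T mu) / sqrt(a^T Sigma a)). A minimal sketch using a union bound over obstacle half-planes; function names are illustrative, not the ACL planner's code.

```python
import numpy as np
from scipy.stats import norm

def violation_probability(mu, sigma, a, b):
    """P(a^T x > b) for x ~ N(mu, sigma); closed-form Gaussian tail."""
    mean, std = a @ mu, np.sqrt(a @ sigma @ a)
    return 1.0 - norm.cdf((b - mean) / std)

def node_is_safe(mu, sigma, halfplanes, delta):
    """Accept a tree node only if the total violation risk over all obstacle
    half-planes (a union bound) stays below the chance constraint delta."""
    risk = sum(violation_probability(mu, sigma, a, b) for a, b in halfplanes)
    return risk < delta

# Example: vehicle at the origin with covariance 0.1*I; obstacle beyond x = 1.
mu, sigma = np.zeros(2), 0.1 * np.eye(2)
print(node_is_safe(mu, sigma, [(np.array([1.0, 0.0]), 1.0)], delta=0.05))  # True
```

Because each check is a handful of dot products and one Gaussian CDF evaluation, it can run per node inside the sampling loop, which is what makes trajectory-wise probabilistic constraint checking compatible with real-time replanning.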
  25. Robust Planning Examples
  26. Reliable Autonomy for Transportation
      • Vision: Safe, reliable autonomy is a crucial component of the future acceptance and deployment of autonomous systems
      • Objective: Develop reliable autonomous systems that can operate safely and effectively for long durations in complex and dynamic environments
        – Control theory, verification and validation, autonomous systems, and software safety
      • Currently developing a Mobility on Demand system on campus
        – Builds on SMART (Frazzoli)
  27. Multiagent Planning With Learning
      • Mission: Visually detect target vehicles, then persistently perform tracking/surveillance using a UGV and UAVs
        – On-line planning and learning
        – Sensor-failure transition model learned using iFDD
        – Policy is re-computed online using Dec-MMDP
      • Cumulative cost decreases during the mission
        – Improved performance due to learning
      • Number of swaps per time period decreases
        – The team learns that the initial probability of sensor failure was too pessimistic
      [Plots: intermediate cumulative cost vs. time (hours); number of swaps per 30 minutes vs. time (hours)]
      N. K. Ure, G. Chowdhary, Y. F. Chen, J. P. How, and J. Vian, "Distributed learning for planning under uncertainty problems with heterogeneous teams," Journal of Intelligent and Robotic Systems, pp. 1–16, 2013.
  28. Conclusions
      • New era of information and data availability
        – Many new opportunities in guidance/control & robotics
      • Learning and adaptation are key to reliable autonomy
        – Must overcome sample and computational complexity
        – Enables more realistic applications
      • Discussed model learning, but similar strategies apply to policy learning
      • Very exciting times: autonomous cars and UAS in the NAS in our lifetime??
      • Many references available at http://acl.mit.edu
