
Continual Reinforcement Learning in 3D Non-stationary Environments

Dynamic, ever-changing environments pose a hard challenge for current reinforcement learning techniques. Nowadays, artificial agents are often trained in very static and reproducible simulated conditions, under the common assumption that observations can be sampled i.i.d. from the environment. However, when tackling more complex problems and real-world settings, this is rarely the case: environments are often non-stationary and subject to unpredictable, frequent changes. In this talk we discuss a new open benchmark for continual reinforcement learning in a complex 3D non-stationary object-picking task based on VizDoom and subject to several environmental changes. We further propose a number of end-to-end, model-free continual reinforcement learning strategies showing competitive results even without any access to previously encountered environmental conditions or observations.

Published in: Science

  1. Continual Reinforcement Learning in 3D Non-stationary Environments. UPF - Computational Science Lab, 29-03-2019. Vincenzo Lomonaco, vincenzo.lomonaco@unibo.it, Research Fellow @ University of Bologna, Founder of ContinualAI.org
  2. About me • Research Fellow @ University of Bologna • Visiting Scholar @ ENSTA ParisTech • Visiting Scholar @ Purdue University • PhD Students' Representative of the Department of Computer Science and Engineering • Teaching Assistant for the Machine Learning and Computer Architectures courses • Author and technical reviewer of the online course Deep Learning with R and the book R Deep Learning Essentials
  3. ContinualAI non-profit Research Organization http://continualai.org https://continualai.herokuapp.com/
  4. Outline 1. Introduction to Continual Learning (CL) 2. Continual Reinforcement Learning (CRL) 3. CRLMaze: an Ever-changing 3D Environment for CL 4. CRL Strategies 5. Experiments and Results 6. Discussion and Conclusions
  5. State-of-the-art • Deep Learning holds state-of-the-art performance in many tasks. • Mainly supervised training with huge and fixed datasets.
  9. The Curse of Dimensionality: # of possible 227x227 RGB images
  12. How can we improve AI scalability and adaptability? (And hence ubiquity and autonomy)
  13. Continual Learning
  15. Continual Learning (CL) • A higher, more realistic time-scale where data (and tasks) only become available over time. • No access to previously encountered data. • Constant computational and memory resources. • Incremental development of ever more complex knowledge and skills.
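As a toy illustration of these constraints, here is a minimal sketch of the streaming protocol (the stream generator and the stand-in learner are hypothetical, not from the talk): data arrives one experience at a time, past observations are discarded, and the learner keeps only constant-size state.

```python
import random

def make_stream(n_experiences=3):
    """Yield (inputs, labels) batches one experience at a time."""
    for e in range(n_experiences):
        xs = [[random.random() for _ in range(4)] for _ in range(8)]
        ys = [e] * 8  # each experience introduces a new class
        yield xs, ys

class TinyModel:
    """Stand-in learner with constant memory: keeps only running class means."""
    def __init__(self):
        self.means = {}
    def fit(self, xs, ys):
        for x, y in zip(xs, ys):
            m, n = self.means.get(y, ([0.0] * len(x), 0))
            # incremental mean update: no stored past samples needed
            self.means[y] = ([a + (b - a) / (n + 1) for a, b in zip(m, x)], n + 1)
    def known_classes(self):
        return sorted(self.means)

model = TinyModel()
for xs, ys in make_stream():
    model.fit(xs, ys)   # train on the current experience only
    del xs, ys          # past observations are never revisited
print(model.known_classes())  # knowledge accumulates: [0, 1, 2]
```

The point of the sketch is the access pattern, not the learner: each batch is seen once and then dropped, mirroring the "no access to previously encountered data" constraint.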
  16. Why is CL a challenging (and fun) problem?
  17. CL Strategies: Architectural (CWR, PNN), Regularization (EWC, SI, LwF), Rehearsal (pure rehearsal, iCARL, GEM), and hybrids such as AR1
  18. A Gentle Introduction to CL in PyTorch https://github.com/ContinualAI/colab
  19. Common CL benchmarks (Dataset: Strategies): Permuted MNIST: EWC, GEM, SI, ...; Rotated MNIST: GEM; MNIST Split: SI; CIFAR10/100 Split: GEM, iCARL, SI, AR1, ...; ILSVRC2012: iCARL; CORe50: iCARL, GEM, SI, EWC, CWR, CWR+, ...
  20. CORe50: a Video Benchmark for CL and Object Recognition, Detection & Segmentation. Dataset, benchmark, code and additional information freely available at: vlomonaco.github.io/core50
  21. AR1: Combining Architectural and Regularization approaches. Lomonaco V. and Maltoni D. Continuous Learning in Single-Incremental-Task Scenarios. Pre-print arXiv:1806.08568v2.
  22. What about Continual Reinforcement Learning (CRL)? Reinforcement Learning
  23. What about Continual Reinforcement Learning (CRL)? Continual Reinforcement Learning
  24. CRL Environments (Environment: Scenarios): Atari: multiple 2D games; DeepMind Lab: maze exploration, object picking; Malmo: multiple tasks; OpenAI Gym: multiple 3D tasks; MuJoCo: multiple joint stiffness; VizDoom, Unity 3D, StarCraft II: curriculum learning
  25. Some References for CRL • Al-Shedivat, Maruan, et al. "Continuous adaptation via meta-learning in nonstationary and competitive environments." arXiv preprint arXiv:1710.03641 (2017). • Tessler, Chen, et al. "A deep hierarchical approach to lifelong learning in Minecraft." Thirty-First AAAI Conference on Artificial Intelligence. 2017. • Kirkpatrick, James, et al. "Overcoming catastrophic forgetting in neural networks." Proceedings of the National Academy of Sciences 114.13 (2017): 3521-3526. • Schwarz, Jonathan, et al. "Progress & compress: A scalable framework for continual learning." arXiv preprint arXiv:1805.06370 (2018). • Kaplanis, Christos, Murray Shanahan, and Claudia Clopath. "Continual reinforcement learning with complex synapses." arXiv preprint arXiv:1802.07239 (2018).
  26. CRLMaze: an Ever-changing 3D Environment for CL. Lomonaco V., Desai K., Maltoni D. and Culurciello E. Continual Reinforcement Learning in 3D Non-stationary Environments. To be published.
  27. CRLMaze: an Ever-changing 3D Environment for CL (video)
  28. CRLMaze: Built with ZDoom and Slade3 (demo)
  29. Objectives • Avoid forgetting • Speed up adaptation • Avoid overfitting • Improve generalization and robustness … all without a supervised task signal!
  30. 3D Non-stationary environment (figure: variations at 100%, 62%, 50%)
  31. 3D Non-stationary environment (figure: end 01, end 02, end 03 at 100%, 62%, 50%)
  32. Considered Variations
  33. Object and Textures
  34. Experiments: Some Details • Batched A2C RL algorithm, implemented in PyTorch on a single GPU • 20 agents running in parallel over 20 VizDoom instances • No custom parametrization or memory replay • 1000 ticks per episode (gradient update every 20) • A small shaping reward to speed up convergence
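The update schedule above can be sketched as follows. The n-step return and advantage computations are standard A2C ingredients; the reward and value numbers below are illustrative placeholders, not results from the talk.

```python
# Sketch of the batched A2C update schedule: workers roll out
# N_STEPS environment ticks, then a gradient update is computed
# from bootstrapped n-step returns and advantages.

GAMMA = 0.99
N_STEPS = 20  # gradient update every 20 ticks

def n_step_returns(rewards, bootstrap_value, gamma=GAMMA):
    """Discounted n-step returns, bootstrapped from the critic's
    value estimate of the state after the last tick."""
    returns = []
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

def advantages(returns, values):
    """Advantage estimates A_t = R_t - V(s_t) for the policy loss."""
    return [r - v for r, v in zip(returns, values)]

# One worker's 20-tick rollout with a small shaping reward each tick.
rewards = [0.01] * N_STEPS
values = [0.5] * N_STEPS          # critic predictions (placeholder)
rets = n_step_returns(rewards, bootstrap_value=0.5)
advs = advantages(rets, values)
print(round(rets[0], 4), round(advs[-1], 4))
```

In the batched setting, the same computation runs over all 20 workers' rollouts at once before each gradient step; here a single worker is shown for clarity.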
  35. CRL Strategies
  36. CRL Strategies • Multi-Environment (baseline): all the environment variations are available to the agent in parallel • Naive (baseline): no strategy is employed when the environment changes • Supervised: the timestep at which the environment changes is known to the agent, which triggers consolidation • Unsupervised: the agent decides when to trigger consolidation based on the reward difference • Static: consolidation is triggered at fixed timesteps
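As a rough illustration of the Unsupervised strategy, one can trigger consolidation when the short-term average reward drops sharply below its longer-run average, signaling an environment change. The window sizes and threshold below are hypothetical, not the values used in the paper.

```python
# Hypothetical reward-difference trigger for unsupervised
# consolidation: compare a short recent reward window against a
# longer baseline window and fire on a sharp relative drop.

def should_consolidate(rewards, short=5, long=20, drop=0.5):
    """Return True when the mean reward over the last `short` steps
    falls below `drop` times the mean over the last `long` steps."""
    if len(rewards) < long:
        return False  # not enough history to compare yet
    long_mean = sum(rewards[-long:]) / long
    short_mean = sum(rewards[-short:]) / short
    return long_mean > 0 and short_mean < drop * long_mean

history = [1.0] * 20          # stable rewards: no trigger
assert not should_consolidate(history)
history += [0.1] * 5          # sharp drop after an env. change
print(should_consolidate(history))  # True
```

The Supervised and Static strategies replace this check with, respectively, the known change timestep and a fixed schedule; only the trigger condition differs.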
  37. Elastic Weight Consolidation (EWC): a quadratic penalty on parameter changes, weighted by the Fisher Information
  38. CRL Strategies • Always computed every n timesteps • λ differs depending on the strategy (Unsupervised variant)
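The EWC penalty referenced above can be sketched as a quadratic anchor on the consolidated weights, scaled by a diagonal Fisher Information estimate. This plain-Python stand-in uses illustrative parameter and Fisher values, not anything from the experiments; λ (here `lam`) is the coefficient that, as the slide notes, differs across strategies.

```python
# Minimal EWC penalty sketch: parameters theta are anchored to their
# consolidated values theta_star, with per-parameter importance given
# by a diagonal Fisher Information estimate F.

def ewc_penalty(theta, theta_star, fisher, lam):
    """L_EWC = (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2"""
    return 0.5 * lam * sum(
        f * (t - ts) ** 2 for f, t, ts in zip(fisher, theta, theta_star)
    )

theta = [1.0, 2.0, 3.0]        # current parameters
theta_star = [1.0, 1.0, 1.0]   # weights frozen at consolidation
fisher = [0.0, 1.0, 2.0]       # importance of each parameter
print(ewc_penalty(theta, theta_star, fisher, lam=2.0))
```

Parameters with zero Fisher value (unimportant for past behavior) can drift freely, while high-Fisher parameters are pulled back toward their consolidated values; this term is simply added to the A2C loss.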
  39. Experiments: Results (avg. over 10 runs)
  40. A-Metric Results
  41. Conclusions: Considerations and Future Works • Consider a fixed feature extractor + efficiency improvements • Combine CL + SRL + RL • A more principled solution embedding the Unsupervised trigger in the loss function • Look not only at the reward but also at the environment appearance, with a single model. Open Questions • Positive transfer makes it harder to detect changes in the reward • What if the change in the environment is gradual? • Is consolidation enough for learning continually?
  42. Questions? UPF - Computational Science Lab, 29-03-2019. Vincenzo Lomonaco, vincenzo.lomonaco@unibo.it, Research Fellow @ University of Bologna, Founder of ContinualAI.org
