
My Robot Can Learn -Using Reinforcement Learning to Teach my Robot

Nowadays, everybody is following the hype around machine learning in general and deep learning (DL) in particular. We are trying to use it to predict unexpected downtimes of machines, or to discover anomalies in the data streams that monitor them. What is usually missing is the magic. Most often, DL is supervised, which means that someone labels data that then gets fed into an algorithm. But there is a new star on the horizon: Reinforcement Learning (RL). RL uses an incentive system to train an agent; by collecting incentives, the agent learns and improves its behavior. The result is a self-learning system that only requires a few simple rules. The combination of RL and DL eventually takes us to something we could consider artificial intelligence. With AlphaGo we have seen how the combination of RL and DL can win a Go tournament. This is a very promising step in an interesting direction. This talk provides an introduction to reinforcement learning. It shows how reinforcement learning and deep learning can be combined towards an AI system, with insights into existing projects. Starting with annotated data and using DL, it is possible to create a base model, which is then refined with RL mechanisms. Finally, this talk shows how this approach maps to Internet of Things and Industry 4.0 scenarios, such as a self-learning robot.

Published in: Data & Analytics


  1. 1. My Robot Can Learn Using Reinforcement Learning to Teach my Robot Marcel Tilly Senior Program Manager Microsoft AI and Research
  2. 2. Once upon a time…
  3. 3. Agenda • Context for Reinforcement Learning • Motivation for Reinforcement Learning • The Reinforcement Learning Problem • Aspects of an RL Agent • Samples for Reinforcement Learning
  4. 4. Reinforcement Learning Applications RL application areas: Process Control 23%, Networking 21%, Resource Management 18%, Robotics 13%, Other 8%, Autonomic Computing 6%, Traffic 6%, Finance 4%. Survey by Csaba Szepesvári of 77 recent application papers, based on an IEEE Xplore search for the keywords “RL” and “application”. Examples: signal processing, natural language processing, web services, brain-computer interfaces; aircraft control, engine control, bio/chemical reactors; sensor networks, routing, call admission control, network resource management; power systems, inventory control, supply chains, customer service; mobile robots, motion control, RoboCup, vision; stoplight control, trains, unmanned vehicles; load balancing, memory management, algorithm tuning; option pricing, asset management. Rich Sutton. Deconstructing Reinforcement Learning. ICML 2009
  5. 5. Just some useless information…
  6. 6. Facets of Reinforcement Learning Reinforcement learning sits at the intersection of many fields: Computer Science (Machine Learning), Neuroscience (Reward System), Psychology (Classical/Operant Conditioning), Economics (Bounded Rationality), Mathematics (Operations Research), Engineering (Optimal Control)
  7. 7. Machine Learning We can answer the 4 major questions: • How much/How many? • Which category? • Which groups? [What is wrong?] • Which action?
  8. 8. How much/How many? • What will be the temperature next Thursday? • What will be my energy costs next month? • How many new users will I get? → Regression
  9. 9. Which category? • Is there a cat or a dog in the image? • Which machine failure is causing the significant data signature? • What is the topic/sentiment of this news article? → Classification
  10. 10. Which groups? • Which customers have similar tastes? • Which visitors like the same movies? • Which topics can I extract from the document? • Which data does not fit nicely with what I have seen so far? → Clustering/Anomaly Detection
  11. 11. Which action? • Should I raise or lower the temperature? • Should I clean the living room or should I stay plugged in? • Should I brake or accelerate? • What is the next move for this Go match? → Reinforcement Learning
  12. 12. Machine Learning • Supervised (learning by example!) • Unsupervised (you do not know what is in your data!) • Reinforcement Learning (learning by trial and error) • Plus: Semi-Supervised, Active RL, Function approximation
  13. 13. Characteristics of RL Why is RL really different? • There is no supervisor, only a reward signal • Feedback is delayed, not instantaneous • Time really matters • Agent’s action affects the subsequent data it receives
  14. 14. Examples for Reinforcement Learning • Fly stunt manoeuvres with a helicopter • Recommend restaurants to users • Optimize an online music store • Control a house • Control a power station • Make a humanoid robot walk • Play games better than humans • Make a bot have a conversation like a human
  15. 15. What is Reinforcement Learning? “… the idea of a learning system that wants something. This was the idea of a ‘hedonistic’ learning system, or, as we would say now, the idea of reinforcement learning.” (Sutton, Barto) • Agents take actions (A) in an environment and receive rewards (R) • Goal is to find the policy (𝜋) that maximizes rewards • Inspired by research in psychology and animal learning
  16. 16. Agent and Environment At each step the agent: • Executes action At • Receives observation Ot • Receives scalar reward Rt The environment: • Receives action At • Emits observation Ot+1 • Emits scalar reward Rt+1 Approaches: • MDP, POMDP • Multi-armed bandit
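To make the loop on this slide concrete, here is a minimal sketch of the agent-environment interaction in Python. The `ThermostatEnv` class and its comfort band are hypothetical, invented for illustration; the point is the cycle of action At, observation Ot+1, and reward Rt+1.

```python
import random

class ThermostatEnv:
    """Toy environment (hypothetical): the observation is a temperature,
    and the reward is 1 while the temperature stays in a comfort band."""
    def __init__(self):
        self.temp = 90

    def step(self, action):                       # receives action At
        self.temp += -2 if action == "on" else 2  # cooler on/off
        self.temp += random.choice([-1, 0, 1])    # environment noise
        reward = 1 if 80 <= self.temp <= 90 else 0
        return self.temp, reward                  # emits Ot+1 and Rt+1

env = ThermostatEnv()
obs, total_reward = env.temp, 0
for t in range(100):
    action = "on" if obs > 85 else "off"  # a fixed hand-written policy
    obs, reward = env.step(action)        # executes At, receives Ot+1, Rt+1
    total_reward += reward
print(total_reward)
```

The agent here follows a hard-coded rule; reinforcement learning replaces that rule with a policy learned from the rewards.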
  17. 17. History and State • The history Ht = O1, R1, A1, …, At-1, Ot, Rt is the sequence of observations, actions, and rewards • i.e. all observable variables up to time t • i.e. the sensorimotor stream of a robot or embodied agent • What happens next depends on the history: the agent selects actions, the environment selects observations and rewards • State is the information used to determine the next action • Formally, state is a function of the history: St = f(Ht)
  18. 18. Short RL Experiment?
  19. 19. Reinforcement Learning on the Lego Mindstorms NXT Robot Taken from: https://www.youtube.com/watch?v=WF9QWc_lxfM&t=17s
  20. 20. Components of an RL agent An RL agent may include one or more of these components: • Policy: the agent's behavior function • Maps from state to action • Deterministic policy: A = 𝜋(S) • Stochastic policy: 𝜋(A|S) = ℙ[A|S] • Value function: how good is each state and/or action • How much reward will I get from an action • Optimal value function: Q*(S,A) = 𝔼S'[R + 𝛾 maxA' Q*(S',A') | S,A] • Model: the agent's representation of the environment, i.e. the transition and reward functions T and R
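The two policy forms on this slide translate directly into code: a deterministic policy A = 𝜋(S) is a plain function from state to action, while a stochastic policy 𝜋(A|S) samples an action from a per-state distribution. A small sketch (the thermostat states and the probabilities are made up for illustration):

```python
import random

# Deterministic policy A = pi(S): a plain function from state to action.
def pi_det(state):
    return "on" if state > 85 else "off"

# Stochastic policy pi(A|S) = P[A|S]: a distribution over actions per state,
# from which the agent samples an action.
def pi_stoch(state):
    p_on = 0.9 if state > 85 else 0.2   # illustrative probabilities
    return "on" if random.random() < p_on else "off"

print(pi_det(90))   # "on"
```

Stochastic policies matter when exploration is needed or when the optimal behavior itself is randomized.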
  21. 21. Approaches To Reinforcement Learning • Value-based RL • Estimate the optimal value function Q*(S,A) • This is the maximum value achievable under any policy • Policy-based RL • Search directly for the optimal policy 𝜋* • This is the policy achieving maximum future reward • Model-based RL • Build a model of the environment • Plan (e.g. by lookahead) using model • Use deep neural networks to represent them -> DeepRL
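As a value-based example, the sketch below runs tabular Q-learning on a tiny, made-up corridor world (the environment, learning rate, and discount are illustrative choices, not from the talk). It estimates the optimal value function via the update Q(S,A) ← Q(S,A) + 𝛼[R + 𝛾 maxA' Q(S',A') − Q(S,A)]:

```python
import random
from collections import defaultdict

# Toy corridor (illustrative): states 0..4, reaching state 4 pays reward 1.
N_STATES, ACTIONS = 5, ("left", "right")
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)  # Q[(state, action)], all values start at 0

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

def greedy(state):
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

random.seed(0)
for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit current Q, sometimes explore
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r = step(s, a)
        # Q-learning update: move Q(S,A) towards R + gamma * max_a' Q(S',a')
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS) - Q[(s, a)])
        s = s2

print([greedy(s) for s in range(N_STATES - 1)])
```

After training, the greedy action in every non-terminal state should be "right", i.e. the policy walks towards the goal without ever having been told to.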
  22. 22. Grid World: Rewards and Goals
  23. 23. Sample: Process Control Environment • Action (on | off) • Observation (Temp = n) • Reward (good | bad)
  24. 24. How could it work?
      Temp before (Ot) | Cooler (Action) | Temp after (Ot+1) | Opportunities | Observations | Probability (Reward?)
      90 | on  | 80 | 1 | 0 | 0
      90 | on  | 82 | 1 | 1 | 1
      90 | on  | 84 | 1 | 0 | 0
      90 | on  | 86 | 1 | 0 | 0
      90 | on  | 88 | 1 | 0 | 0
      90 | on  | 90 | 1 | 0 | 0
      90 | off | 88 | 1 | 0 | 0
      90 | off | 90 | 1 | 0 | 0
      90 | off | 92 | 1 | 1 | 1
      90 | off | 94 | 1 | 0 | 0
      90 | off | 96 | 1 | 0 | 0
      90 | off | 98 | 1 | 0 | 0
  25. 25. The result: A model
      Temp before | Cooler [Action] | Temp after | Opportunities | Observations | Probability
      90 | on  | 80 | 404 |  10 | 0.025
      90 | on  | 82 | 404 | 134 | 0.332
      90 | on  | 84 | 404 | 215 | 0.532
      90 | on  | 86 | 404 |  34 | 0.084
      90 | on  | 88 | 404 |   9 | 0.022
      90 | on  | 90 | 404 |   2 | 0.005
      90 | off | 88 | 381 |   1 | 0.003
      90 | off | 90 | 381 |  23 | 0.059
      90 | off | 92 | 381 | 101 | 0.261
      90 | off | 94 | 381 | 163 | 0.421
      90 | off | 96 | 381 |  75 | 0.194
      90 | off | 98 | 381 |  24 | 0.062
  26. 26. Now: Take it backward St → A → St+1 (the same transition table as on slide 25, now read in reverse: for a desired next temperature, choose the action with the highest probability of producing it)
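The tables above can be turned into code directly: the probability is just observations divided by opportunities, and "taking it backward" means asking which action makes a desired next temperature most likely. A sketch (the counts are copied from the slide; the helper names are mine):

```python
# Empirical transition model built from the counts on the slide:
# probability = observations / opportunities.
COUNTS = {
    ("on", 80): 10, ("on", 82): 134, ("on", 84): 215,
    ("on", 86): 34, ("on", 88): 9, ("on", 90): 2,
    ("off", 88): 1, ("off", 90): 23, ("off", 92): 101,
    ("off", 94): 163, ("off", 96): 75, ("off", 98): 24,
}
# Opportunities per action, taken as the total count for that action.
OPPORTUNITIES = {a: sum(c for (act, _), c in COUNTS.items() if act == a)
                 for a in ("on", "off")}

def p(action, temp_after):
    """P(temp_after | temp_before = 90, action), estimated from the counts."""
    return COUNTS.get((action, temp_after), 0) / OPPORTUNITIES[action]

def best_action(target_temp):
    """Backward use of the model: which action most likely yields target_temp?"""
    return max(("on", "off"), key=lambda a: p(a, target_temp))

print(round(p("on", 84), 3))  # 0.532, matching the table
print(best_action(84))        # "on": cooling is the action most likely to reach 84
```

This is exactly the model-based view from slide 21: first estimate the environment's transition probabilities, then plan against them.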
  27. 27. How to do it with a Mindstorms Robot? https://www.youtube.com/watch?v=WF9QWc_lxfM&t=17s Angel Martinez-Tenor: Reinforcement Learning on the Lego Mindstorms NXT Robot.
  28. 28. Sample: Atari Games David Silver (DeepMind): applying RL to Atari games, trying to play better than a human. Agent ↔ Environment loop: Action At, Observation Ot, Reward Rt
  29. 29. An example for Deep RL with Atari • End-to-end learning of values Q(S,A) from pixels • Input state S is a stack of raw pixels from the last 4 frames • Output is Q(S,A) for 18 joystick/button positions • Reward is the change in score for that step
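The "stack of raw pixels from the last 4 frames" can be sketched with a small ring buffer. The 84x84 frame size is an assumption (a preprocessing size commonly used for Atari agents, not stated on the slide):

```python
from collections import deque
import numpy as np

FRAME_SHAPE = (84, 84)    # assumed preprocessed (grayscale, downscaled) frame size
frames = deque(maxlen=4)  # keeps only the 4 most recent frames

def observe(frame):
    """Append the newest frame and return the stacked state S, shape (4, 84, 84)."""
    frames.append(frame)
    while len(frames) < 4:      # at episode start, repeat the first frame
        frames.append(frame)
    return np.stack(frames)

state = observe(np.zeros(FRAME_SHAPE, dtype=np.uint8))
print(state.shape)  # (4, 84, 84)
```

Stacking frames gives the network a short history, so motion (e.g. the direction of a ball) is recoverable from a single state.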
  30. 30. Project Malmo @ MSR • Makes (deep) reinforcement learning available as a platform • Code that helps artificial intelligence agents sense and act within the Minecraft environment • The two components can run on Windows, Linux, or Mac OS • Write your agent in Python, Lua, C#, C++ or Java
  31. 31. Sneak Preview Try it today: https://github.com/Microsoft/malmo#getting-started
  32. 32. … there is one more thing Watch this:
  33. 33. Wrap-up • RL could become the next star in ML • More storage space • More compute power • Applications in IoT, autonomous driving, process control • Good foundational research • Convincing prototypes and applications → Focus shift David Silver: “Reinforcement Learning + Deep Learning = AI”
  34. 34. Books • Sutton and Barto, “Reinforcement Learning: An Introduction” (1998) • H.M. Schwartz, “Multi-Agent Machine Learning: A Reinforcement Approach” (2014) • Csaba Szepesvári, “Algorithms for Reinforcement Learning” (2010)
  35. 35. References • Some content is reused from: • Introduction to Reinforcement Learning – Shane M. Conway • Lecture 1: Introduction to Reinforcement Learning – David Silver • How reinforcement learning works in Becca 7 – Brandon Rohrer • Johnson M., Hofmann K., Hutton T., Bignell D. (2016) The Malmo Platform for Artificial Intelligence Experimentation. Proc. 25th International Joint Conference on Artificial Intelligence, Ed. Kambhampati S., p. 4246. AAAI Press, Palo Alto, California, USA. https://github.com/Microsoft/malmo
  36. 36. Thanks! marcel.tilly@microsoft.com
