Reinforcement Learning Science 8 Unit B: Cells and Systems (Nature of Science Emphasis)
Introduction What does it mean to have a behaviour  reinforced? Let’s look at a famous example first...
Introduction Ivan Pavlov (1849-1936) Born in Russia in 1849, Ivan Pavlov abandoned a religious career for which he had been preparing, and instead went into science. His study of the mechanisms underlying the digestive system in mammals had a great impact on the field of physiology (the study of the mechanical, physical, and biochemical functions of living organisms). Source: Nobelprize.org
Introduction Pavlov was awarded the Nobel Prize in Physiology or Medicine in 1904. He then turned to studying reflexes, in particular with dogs.  His discoveries led to the science of behaviour. Source:  Nobelprize.org
Introduction Pavlov became interested in studying reflexes when he noticed that dogs sometimes drooled even without food being shown to them.  Although no food was in sight, their saliva still dribbled. It turned out that the dogs were reacting to lab coats.  Source:  Nobelprize.org
Introduction Every time the dogs were served food, the person who served the food was wearing a lab coat.  The lab coats became a “stimulus”. Source:  Nobelprize.org
Introduction A stimulus is anything capable of evoking a response in an organism. Examples of stimuli include sights, sounds, heat, cold, smells, or other sensations. Therefore, the dogs reacted as if food was on its way whenever they saw a lab coat. Source:  Nobelprize.org
Introduction In a series of experiments, Pavlov then tried to figure out why this was happening. For example, he struck a bell when the dogs were fed. If the bell was sounded close to meal time, the dogs learnt to associate the sound of the bell with food. After a while, the stimulus of the bell alone caused them to drool. Source: Nobelprize.org
More on Pavlov's Dog You can read more about Pavlov’s dog and see if you can train a dog to drool on command online at the  Nobel Prize website .
Reinforcement Learning Dogs are often trained through a method of reinforcement. For example, if a dog sits when it hears the word “sit” and then receives a treat, he or she will learn that sitting earns a treat. In fact, almost all animals can learn through reinforcement.
Reinforcement Learning Definition: Reinforcement occurs when an event following a response causes an increase in the probability of that response occurring in the future. So when a dog sits on hearing “sit” (response) and receives a treat (event), the dog will be more likely to sit in the future in hopes of receiving another treat.
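The definition above can be illustrated with a small Python sketch. It is only an illustration of the idea that a rewarded response becomes more probable. The starting probability, the `step` size, and the update rule itself are assumptions chosen for the example, not a model taken from the lesson or from Pavlov's work.

```python
def reinforce(probability, rewarded, step=0.2):
    """Return the updated probability of a response.

    If the response was followed by a reward, move the probability a
    fraction `step` of the way toward 1.0. Otherwise leave it unchanged.
    (Hypothetical update rule, chosen only for illustration.)
    """
    if rewarded:
        return probability + step * (1.0 - probability)
    return probability


# Hypothetical starting chance that the dog sits when it hears "sit".
p_sit = 0.1
history = [p_sit]

# Five training rounds in which sitting is always followed by a treat.
for _ in range(5):
    p_sit = reinforce(p_sit, rewarded=True)
    history.append(p_sit)

print([round(p, 3) for p in history])
```

Each treat pushes the probability of sitting closer to 1, but never past it, which matches the idea that reinforcement makes a response more likely rather than certain.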
Reinforcement Learning If animals (including humans) can learn by reinforcement, can a machine also learn through reinforcement? Computing Scientists at the Centre for Machine Learning believe so, and they are building a robot that learns through reinforcement.
Reinforcement Learning The robot is called “Critterbot”. The robot responds to stimuli in the environment.  For lessons on Critterbot see  Critterbot  for Physics 30  and  Critterbot  for Science 8 .
How can a Machine be Reinforced? In Machine Learning (which is a type of artificial intelligence) the “learner” is a computer that learns by trying to obtain a maximum reward. So what does a computer or robot want as a reward? Just a number, such as -1, 0, or 1.
How can a Machine be Reinforced? A positive reward will result in a “1”. A neutral reward will result in a “0”. A negative reward will result in a “-1”.
How can a Machine be Reinforced? What separates Reinforcement Learning from other forms of artificial intelligence is that the learner is never told what actions to take. The learner uses a trial-and-error search approach. If it receives a positive reward for an action, it will continue that action. If it receives a negative reward, it will learn to avoid that action.
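The trial-and-error idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by Critterbot: the action names, the fixed rewards, and the rule "usually repeat the action with the best average reward, occasionally try a random one" are all hypothetical choices made for the example.

```python
import random

# Hypothetical actions and the reward each one earns (1, 0, or -1).
# The learner is never shown this table; it only sees the reward
# that comes back after it tries an action.
REWARDS = {"bark": -1, "stand": 0, "sit": 1}


def learn(trials=1000, explore=0.1, seed=0):
    """Learn by trial and error which action earns the most reward."""
    rng = random.Random(seed)
    actions = list(REWARDS)
    totals = {a: 0.0 for a in actions}  # sum of rewards per action
    counts = {a: 0 for a in actions}    # times each action was tried

    def average(action):
        # Untried actions are treated as neutral (0).
        return totals[action] / counts[action] if counts[action] else 0.0

    for _ in range(trials):
        if rng.random() < explore:
            action = rng.choice(actions)         # trial: try something at random
        else:
            action = max(actions, key=average)   # repeat what has worked best
        reward = REWARDS[action]                 # the environment answers with a number
        totals[action] += reward
        counts[action] += 1

    return {a: average(a) for a in actions}, counts


values, counts = learn()
best_action = max(values, key=values.get)
print("Average reward per action:", values)
print("Times each action was tried:", counts)
print("Action the learner prefers:", best_action)
```

The learner starts with no preference, avoids "bark" after its first negative reward, and settles on "sit" once a random trial reveals that it earns a positive reward.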
Questions How is a robot that uses Machine Learning different from a robot that is programmed for specific tasks? Answer: In Machine Learning, the robot is not told what actions to take. It learns by trial and error.
Questions A robot in a car factory is designed to build cars at a fast rate. Would Machine Learning be a good application for a car-building machine? Why or why not? Answer: No, probably not. Robots that build use specific designs to ensure they build exactly as they are told.
Questions Are dogs the only animals that respond to a stimulus by salivating? For example, what happens to you when you are just about to put a pickle in your mouth? Or mustard?  Or a sour candy? Answer: Humans also respond to visual stimuli and will salivate at the sight of some stimuli.
Questions Critterbot was designed to respond to stimuli (plural of stimulus). Imagine that you had to design a robot that will automatically shovel snow from your driveway every winter. The robot cannot have any human assistance; it has to be autonomous (work on its own). First, come up with a ‘cool’ name for your robot. Use drawings and written descriptions to write up a one-page explanation of how your robot would work. continued...
Question 4 continued. What types of sensors would it need to have to work without your assistance? Remember, it is only going to shovel your driveway, and not wander down the street shovelling every driveway. Animals require energy and use special systems to convert food into energy. For example, the digestive system takes in food and digests it to extract energy and nutrients. How will your robot get its energy? Remember, it has to work in winter conditions, most often when it is snowing.
Centre for Mathematics Science and Technology Education (CMASTE), 382 Education South, University of Alberta, Edmonton AB T6G 2G5, www.CMASTE.ca. To download: select Outreach, Alberta Ingenuity Resources and Centre for Machine Learning. Filename: AICML6BrainTumourAnalysis
Centre for Machine Learning, Department of Computing Science, University of Alberta, 2-21 Athabasca Hall, Edmonton AB T6G 2E8, (780) 492-4828, www.machinelearningcentre.ca
Alberta Ingenuity, 2410 Manulife Place, 10180-101 Street, Edmonton AB T5J 3S4, (780) 423-5735, www.albertaingenuity.ca

Lesson 12: Reinforcement Learning for Critterbot Science 8
