Reward

Speaker: Jimmy Lu

Topics: Reward
Date: 2010.09.17

WECO Lab, CSIE, FJU

Usage Rights: CC Attribution-NonCommercial License

Reward: Presentation Transcript

  • Reward
    Speaker : Jimmy Lu
    Advisor : Hsing Mei
    Web Computing Laboratory(WECO Lab)
    Computer Science and Information Engineering Department
    Fu Jen Catholic University
  • Conditioning
  • Classical Conditioning
    Also called Pavlovian or respondent conditioning.
    Is a form of associative learning.
    The typical procedure for inducing classical conditioning involves presentations of a neutral stimulus along with a stimulus of some significance.
    Key terms: conditioned stimulus (CS), conditioned response (CR), unconditioned stimulus (US), unconditioned response (UR).
  • Typical Procedure
  • Operant Conditioning
    Also called instrumental conditioning.
    Operant conditioning deals with the modification of voluntary behavior, or operant behavior.
    Operant behavior "operates" on the environment and is maintained by its consequences, while classical conditioning deals with the conditioning of reflexive (reflex) behaviors which are elicited by antecedent conditions. Behaviors conditioned via a classical conditioning procedure are not maintained by consequences.
  • Core Tools
    Reinforcement is a consequence that causes a behavior to occur with greater frequency.
    Punishment is a consequence that causes a behavior to occur with less frequency.
    Extinction is the lack of any consequence following a behavior. When a behavior is inconsequential, producing neither favorable nor unfavorable consequences, it will occur with less frequency. When a previously reinforced behavior is no longer reinforced with either positive or negative reinforcement, it leads to a decline in the response.
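The three consequences above can be illustrated with a toy simulation. This is a minimal sketch, not a model taken from the slides: it assumes a single "response strength" between 0 and 1 that moves a fixed fraction of the way toward a target after each trial. The function names, the learning rate `rate`, and the `baseline` level are hypothetical choices made for illustration.

```python
def update_strength(strength, consequence, rate=0.2, baseline=0.1):
    """Return the new response strength after one trial (toy model)."""
    if consequence == "reinforcement":
        target = 1.0       # behavior becomes more frequent
    elif consequence == "punishment":
        target = 0.0       # behavior becomes less frequent
    elif consequence == "extinction":
        target = baseline  # no consequence: drift back to the baseline level
    else:
        raise ValueError(f"unknown consequence: {consequence!r}")
    # Move a fixed fraction of the remaining distance toward the target.
    return strength + rate * (target - strength)


def simulate(schedule, strength=0.1, **kwargs):
    """Apply a sequence of consequences and return the strength after each trial."""
    history = [strength]
    for consequence in schedule:
        strength = update_strength(strength, consequence, **kwargs)
        history.append(strength)
    return history


if __name__ == "__main__":
    # Ten reinforced trials followed by ten trials with no consequence.
    schedule = ["reinforcement"] * 10 + ["extinction"] * 10
    for trial, value in enumerate(simulate(schedule)):
        print(f"trial {trial:2d}: strength = {value:.3f}")
```

With this schedule the strength rises during the ten reinforced trials and then declines toward the baseline once reinforcement stops, which mirrors the decline in response described for extinction.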
  • Four Contexts
  • Multiple Reward Signals in the Brain
  • Abstract
    This article focuses on recent neurophysiological studies in primates that have revealed that neurons in a limited number of brain structures carry specific signals about past and future rewards. This research provides the first step towards an understanding of how rewards influence behaviour before they are received and how the brain might use reward information to control learning and goal-directed behaviour.
  • Reward Processing and the Brain
  • Reward Detection and Perception
    In various behavioural situations, including classical and instrumental conditioning, most dopamine neurons show short, phasic activation in a rather homogeneous fashion after the presentation of liquid and solid rewards, and visual or auditory stimuli that predict reward.
    By contrast, only a few dopamine neurons show phasic activations to punishers.
  • Reward Prediction Errors
    A closer examination of the properties of the phasic dopamine response suggests that it might encode a reward prediction error rather than reward per se.
    In view of the crucial role that prediction errors are thought to play during learning, a phasic dopamine response that reports a reward prediction error might constitute an ideal teaching signal for approach learning.
    Error-driven learning mechanisms.
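One common way to write an error-driven learning mechanism is a delta rule in the style of the Rescorla-Wagner model: the prediction error is the received reward minus the predicted reward, and the prediction is adjusted by a fraction of that error. The sketch below assumes this rule for illustration only; the function name `run_trials` and the learning rate `alpha` are hypothetical and do not come from the slides.

```python
def run_trials(rewards, alpha=0.3, prediction=0.0):
    """Return the prediction error on each trial and the final prediction."""
    errors = []
    for reward in rewards:
        # Prediction error: what was received minus what was expected.
        error = reward - prediction
        # Error-driven update: move the prediction toward the outcome.
        prediction += alpha * error
        errors.append(error)
    return errors, prediction


if __name__ == "__main__":
    # Twenty rewarded trials, then one trial where the reward is omitted.
    errors, final_prediction = run_trials([1.0] * 20 + [0.0])
    print(f"first rewarded trial: error = {errors[0]:+.3f}")
    print(f"last rewarded trial:  error = {errors[19]:+.3f}")
    print(f"omitted reward:       error = {errors[20]:+.3f}")
```

In this sketch an unexpected reward produces a large positive error, a well-predicted reward produces an error near zero, and an omitted reward produces a negative error. That is the sense in which the signal tracks a prediction error rather than reward per se.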
  • Reward Prediction Errors
  • Experiments
  • Experiments
  • Experiments
  • Experiments
  • Conclusions
    A limited number of brain structures process reward information in several different ways.
    Neurons detect reward prediction errors and produce a global reinforcement signal that might underlie the learning of appropriate behaviours.
    Other neurons detect and discriminate between different rewards and might be involved in assessing the nature and identity of individual rewards, and might thus underlie the perception of rewards.
  • Conclusions
    Neurons respond to learned stimuli that predict rewards and show sustained activities during periods in which expectations of rewards are evoked.
    They even estimate future rewards and adapt their activity according to ongoing experience.
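How a response could come to follow a reward-predicting stimulus can be sketched with a temporal-difference (TD) learning rule, the kind of model discussed in reference [4]. This is a minimal sketch under stated assumptions, not the analysis from the slides: a trial is a fixed number of time steps after cue onset, the reward arrives on the last step, and the cue itself arrives unpredictably, so the prediction before the cue stays at zero. The function name and all parameters are hypothetical.

```python
def simulate_td(n_trials=200, n_steps=5, alpha=0.3, gamma=1.0, reward=1.0):
    """Return one (error_at_cue, error_at_reward) pair per trial."""
    # values[i] is the predicted future reward i steps after cue onset.
    values = [0.0] * n_steps
    history = []
    for _ in range(n_trials):
        # The pre-cue prediction is fixed at 0, so the error at cue onset
        # is the discounted value of the first post-cue state.
        error_at_cue = gamma * values[0] - 0.0
        error_at_reward = 0.0
        for i in range(n_steps):
            is_last = i == n_steps - 1
            r = reward if is_last else 0.0
            next_value = 0.0 if is_last else values[i + 1]
            # TD error: reward plus discounted next prediction minus current one.
            delta = r + gamma * next_value - values[i]
            values[i] += alpha * delta
            if is_last:
                error_at_reward = delta
        history.append((error_at_cue, error_at_reward))
    return history


if __name__ == "__main__":
    history = simulate_td()
    for trial in (0, 5, 20, len(history) - 1):
        cue, rew = history[trial]
        print(f"trial {trial:3d}: error at cue = {cue:.3f}, at reward = {rew:.3f}")
```

Early in training the error appears when the reward is delivered; after many trials it appears at cue onset and is close to zero at the time of the reward.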
  • Reference
    [1] Classical conditioning, Wikipedia, http://en.wikipedia.org/wiki/Classical_conditioning
    [2] Operant conditioning, Wikipedia, http://en.wikipedia.org/wiki/Operant_conditioning
    [3] Wolfram Schultz, “Multiple reward signals in the brain”, Nature Reviews Neuroscience 1, 199-207 (December 2000)
    [4] Wolfram Schultz, Peter Dayan, P. Read Montague, “A Neural Substrate of Prediction and Reward”, Science 275, 1593-1599 (March 1997)