Emotion-Driven
Reinforcement Learning
Bob Marinier & John Laird
University of Michigan, Computer Science and Engineering
C...
Marinier Laird Cogsci 2008 Emotionrl Pres

  • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior.
  • In this environment, the agent’s sensing is limited: it can only see the cells immediately adjacent to it in the four cardinal directions. The agent also has a sensor that reports its Manhattan distance to the goal. However, the agent has no knowledge of the effects of its actions, and thus cannot evaluate possible actions relative to the goal until it has actually performed them. Even then, it cannot always blindly move closer to the goal: given the shape of the maze, it must sometimes increase its Manhattan distance to the goal in order to make progress.
  • Mention relaxation and direction
  • 15 episodes; 50 trials; cutoff at 10k dcs; median
  • 1st and 3rd quartiles shown. Reach optimality at the same time, but mood is less variable
  • This is an extension of previous work. These constraints define a set of equations. This is one possible equation that improves on previous work and seems to work well for our current models.
  • This is an extension of previous work. Unifies intensity for all feelings in one equation (others use different equations for each “kind” of feeling). Again, these constraints define a set of possible functions, of which this is one that seems to work well for us.
  • Be careful about how to say the agent generates appraisal values
  • Marinier Laird Cogsci 2008 Emotionrl Pres

    1. Emotion-Driven Reinforcement Learning
       Bob Marinier & John Laird
       University of Michigan, Computer Science and Engineering
       CogSci’08
    2. Introduction
       • Interested in the functional benefits of emotion for a cognitive agent
         ▫ Appraisal theories of emotion
         ▫ PEACTIDM theory of cognitive control
       • Use emotion as a reward signal to a reinforcement learning agent
         ▫ Demonstrates a functional benefit of emotion
         ▫ Provides a theory of the origin of intrinsic reward
    3. Outline
       • Background
         ▫ Integration of emotion and cognition
         ▫ Integration of emotion and reinforcement learning
         ▫ Implementation in Soar
       • Learning task
       • Results
    4. Appraisal Theories of Emotion
       • A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
         ▫ Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
       • Appraisals influence emotion
       • Emotion can then be coped with (via internal or external actions)
       [Diagram labels: Situation, Goals, Appraisals, Emotion, Coping]
    5. Appraisals to Emotions (Scherer 2001)
       Rows are listed with their values in slide order under the columns Joy, Fear, Anger; some rows have fewer than three values.
       Suddenness: High/medium, High, High
       Unpredictability: High, High, High
       Intrinsic pleasantness: Low
       Goal/need relevance: High, High, High
       Cause: agent: Other/nature, Other
       Cause: motive: Chance/intentional, Intentional
       Outcome probability: Very high, High, Very high
       Discrepancy from expectation: High, High
       Conduciveness: Very high, Low, Low
       Control: High
       Power: Very low, High
    6. Cognitive Control: PEACTIDM (Newell 1990)
       Perceive: Obtain raw perception
       Encode: Create domain-independent representation
       Attend: Choose stimulus to process
       Comprehend: Generate structures that relate stimulus to tasks and can be used to inform behavior
       Task: Perform task maintenance
       Intend: Choose an action, create prediction
       Decode: Decompose action into motor commands
       Motor: Execute motor commands
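The PEACTIDM stage list can be written down as a small data structure. The sketch below is only an illustration of the slide's table: the `Stage` enum and the `acronym` helper are hypothetical names, and nothing here models how the stages are actually scheduled in the authors' system.

```python
from enum import Enum


class Stage(Enum):
    """PEACTIDM stages, in the order the acronym lists them (descriptions from the slide)."""
    PERCEIVE = "Obtain raw perception"
    ENCODE = "Create domain-independent representation"
    ATTEND = "Choose stimulus to process"
    COMPREHEND = "Generate structures that relate stimulus to tasks"
    TASK = "Perform task maintenance"
    INTEND = "Choose an action, create prediction"
    DECODE = "Decompose action into motor commands"
    MOTOR = "Execute motor commands"


def acronym() -> str:
    # Enum members iterate in definition order, so the initials spell the name.
    return "".join(stage.name[0] for stage in Stage)


if __name__ == "__main__":
    for stage in Stage:
        print(f"{stage.name.title():<10} {stage.value}")
```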
    7. Unification of PEACTIDM and Appraisal Theories
       [Diagram: the PEACTIDM cycle (Perceive, Encode, Attend, Comprehend, Intend, Decode, Motor), annotated with the data passed between stages (Environmental Change, Raw Perceptual Information, Stimulus Relevance, Stimulus chosen for processing, Current Situation Assessment, Action, Prediction, Motor Commands) and with the appraisals generated along the way (Suddenness, Unpredictability, Goal Relevance, Intrinsic Pleasantness, Outcome Probability, Causal Agent/Motive, Discrepancy, Conduciveness, Control/Power)]
    8. Distinction between emotion, mood, and feeling (Marinier & Laird 2007)
       • Emotion: Result of appraisals
         ▫ Is about the current situation
       • Mood: “Average” over recent emotions
         ▫ Provides historical context
       • Feeling: Emotion “+” Mood
         ▫ What agent actually perceives
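The slide gives only the shape of the relationship: mood is an “average” over recent emotions and feeling is emotion “+” mood. The sketch below fills that shape with two stand-in choices that are assumptions, not the functions from Marinier & Laird 2007: an exponential moving average for mood and a clipped sum for feeling. The function names and the `rate` parameter are hypothetical.

```python
from typing import List


def update_mood(mood: List[float], emotion: List[float], rate: float = 0.1) -> List[float]:
    """Move each mood dimension a fraction `rate` toward the current emotion.

    Assumption: an exponential moving average stands in for the slide's "average".
    """
    return [m + rate * (e - m) for m, e in zip(mood, emotion)]


def combine_feeling(emotion: List[float], mood: List[float]) -> List[float]:
    """Combine emotion and mood per dimension.

    Assumption: a sum clipped to [-1, 1] stands in for the slide's "+".
    """
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]


if __name__ == "__main__":
    mood = [0.0, 0.0, 0.0]
    for emotion in ([0.5, 0.7, 0.0], [0.5, -0.4, 0.3]):
        mood = update_mood(mood, emotion, rate=0.5)
        print(combine_feeling(emotion, mood))
```

Because mood carries over between steps, the feeling stays non-zero even when the current emotion is neutral, which is one way to read the later claim that mood “fills in the gaps”.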
    9. Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
       [Diagram labels: Environment, Critic, Agent, with Actions, Rewards, and States passed between them; External Environment, Internal Environment, Sensations, Decisions, Appraisal Process, +/- Feeling Intensity, “Organism”]
       • Reward = Intensity * Valence
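The reward rule on this slide is a single product. A minimal sketch follows; the value ranges (intensity in [0, 1], valence in [-1, 1]) and the function name are assumptions made for the example, since the slide does not state them.

```python
def reward_from_feeling(intensity: float, valence: float) -> float:
    """Reward = Intensity * Valence.

    Assumed ranges (not stated on the slide): intensity in [0, 1], valence in [-1, 1].
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity is assumed to lie in [0, 1]")
    if not -1.0 <= valence <= 1.0:
        raise ValueError("valence is assumed to lie in [-1, 1]")
    return intensity * valence


if __name__ == "__main__":
    print(reward_from_feeling(0.5, -1.0))  # a moderately intense negative feeling
```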
    10. Extending Soar with Emotion (Marinier & Laird 2007)
        [Architecture diagram labels: Symbolic Long-Term Memories (Procedural, Episodic, Semantic); Reinforcement Learning, Chunking, Episodic Learning, Semantic Learning; Short-Term Memory (Situation, Goals); Appraisal Detector; Decision Procedure; Visual Imagery; Perception; Action; Body]
    11. Extending Soar with Emotion (Marinier & Laird 2007)
        [Architecture diagram labels, as on the previous slide, plus: Feelings; Feeling .9,.6,.5,-.1,.8,…; Emotion .5,.7,0,-.4,.3,…; Mood .7,-.2,.8,.3,.6,…; Knowledge; Architecture]
    12. Learning task
        [Maze figure labels: Start, Goal]
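The speaker notes describe an agent that sees only the four adjacent cells and has a sensor for its Manhattan distance to the goal. The sketch below shows one way such a sensor could look. The maze layout, the `MAZE` encoding, and the `sense` function are hypothetical and are not the maze from the slides.

```python
from typing import Dict, Tuple

# Hypothetical layout for illustration only: '#' wall, '.' open, 'S' start, 'G' goal.
MAZE = [
    "#####",
    "#S..#",
    "#.#.#",
    "#..G#",
    "#####",
]
DIRECTIONS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def find(symbol: str) -> Tuple[int, int]:
    """Return the (row, column) of the first cell containing `symbol`."""
    for r, row in enumerate(MAZE):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not in maze")


def manhattan(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sense(pos: Tuple[int, int]) -> Dict[str, object]:
    """Limited sensing: passability of the four neighbours plus distance to goal."""
    passable = {
        name: MAZE[pos[0] + dr][pos[1] + dc] != "#"
        for name, (dr, dc) in DIRECTIONS.items()
    }
    return {"passable": passable, "distance": manhattan(pos, find("G"))}


if __name__ == "__main__":
    print(sense(find("S")))
```

In a layout like this one, a wall between the agent and the goal forces moves that do not reduce the Manhattan distance, which is why the distance sensor alone cannot be followed greedily.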
    13. Learning task: Encoding
                   North   East    West    South
        Passable:  false   false   false   true
        On path:   false   true    false   true
        Progress:  true    true    true    true
    14. Learning task: Encoding & Appraisal
                                 North   East    West    South
        Intrinsic Pleasantness:  Low     Low     Low     Neutral
        Goal Relevance:          Low     High    Low     High
        Unpredictability:        High    High    High    Low
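Comparing the encoded features with the appraisals on these two slides suggests a simple mapping from one to the other. The sketch below encodes that pattern. It is inferred from this single example only, so the rules (and the `appraise` function itself) are assumptions rather than the agent's actual appraisal knowledge.

```python
from typing import Dict


def appraise(passable: bool, on_path: bool) -> Dict[str, str]:
    """Map encoded features of one direction to appraisal values.

    Rules inferred from the single example on the slides (assumption):
      - impassable cells are unpleasant and unpredictable,
      - cells on the path are goal relevant.
    """
    return {
        "intrinsic_pleasantness": "neutral" if passable else "low",
        "goal_relevance": "high" if on_path else "low",
        "unpredictability": "low" if passable else "high",
    }


if __name__ == "__main__":
    # Feature values for the South cell on the encoding slide.
    print(appraise(passable=True, on_path=True))
```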
    15. Learning task: Attending, Comprehending & Appraisal
        South: Intrinsic Pleasantness: Neutral; Goal Relevance: High; Unpredictability: Low; Conduciveness: High; Control: High; …
    16. Learning task: Tasking
    17. Learning task: Tasking
        [Figure label: Optimal Subtasks]
    18. What is being learned?
        • When to Attend vs. Task
        • If Attending, what to Attend to
        • If Tasking, which subtask to create
        • When to Intend vs. Ignore
    19. Learning Results
        [Plot: Median Processing Cycles (y-axis, 0 to 12000) by Episode (x-axis, 1 to 15) for Standard RL, Feeling=Emotion, and Feeling=Emotion+Mood]
    20. Results: With and without mood
        [Plot: Median Processing Cycles (y-axis, 240 to 300) by Episode (x-axis, 8 to 15) for Feeling=Emotion, Feeling=Emotion+Mood, and Optimal]
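The speaker notes say the plots show the median over 50 trials of 15 episodes, with a cutoff at 10k and the 1st and 3rd quartiles drawn. The sketch below shows how such a per-episode summary could be computed. The data layout, the `summarize` function, and the reading of the cutoff as a cap on each episode's cycle count are assumptions; `statistics.quantiles` uses its default "exclusive" method, which may differ from whatever the authors used.

```python
from statistics import quantiles
from typing import List, Tuple

CUTOFF = 10_000  # assumed to cap the cycle count of any single episode


def summarize(trials: List[List[int]]) -> List[Tuple[float, float, float]]:
    """Return (1st quartile, median, 3rd quartile) for each episode.

    trials[t][e] is the number of processing cycles in episode e of trial t.
    """
    summary = []
    for episode in zip(*trials):  # regroup the data by episode across trials
        capped = [min(cycles, CUTOFF) for cycles in episode]
        q1, median, q3 = quantiles(capped, n=4)
        summary.append((q1, median, q3))
    return summary


if __name__ == "__main__":
    # Made-up numbers for three trials of two episodes, for illustration only.
    print(summarize([[12000, 400], [9000, 300], [7000, 350]]))
```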
    21. Discussion
        • Agent learns both internal (tasking) and external (movement) actions
        • Emotion allows for more frequent rewards, and thus learns faster than standard RL
        • Mood “fills in the gaps” allowing for even faster learning and less variability
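To make the "more frequent rewards" point concrete, the sketch below applies a standard one-step tabular Q-learning update (in the style of Sutton & Barto 1998) on every step, using whatever reward the feeling produced for that step. This is a generic textbook update, not the learning rule of the authors' Soar agent, and the state names, action names, and parameters are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


if __name__ == "__main__":
    q: QTable = defaultdict(float)
    actions = ["attend", "task"]
    # With a feeling-based reward, every step can carry a non-zero signal,
    # instead of only the step that reaches the goal.
    print(q_update(q, "s0", "attend", reward=0.5, next_state="s1", actions=actions))
```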
    22. Conclusion & Future Work
        • Demonstrated computational model that integrates emotion and cognitive control
        • Confirmed emotion can drive reinforcement learning
        • We have already successfully demonstrated similar learning in a more complex domain
        • Would like to explore multi-agent scenarios
