Marinier & Laird, CogSci 2008: Emotion-Driven Reinforcement Learning (Presentation)


Speaker notes:
  • Be careful about how we say the agent generates appraisal values
  • Say that prediction is our extension
  • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior.
  • In this environment, the agent’s sensing is limited: it can only see the cells immediately adjacent to it in the four cardinal directions. The agent has a sensor that tells it its Manhattan distance to the goal. However, the agent has no knowledge as to the effects of its actions, and thus cannot evaluate possible actions relative to the goal until it has actually performed them. Even then, it cannot always blindly move closer to the goal because given the shape of the maze, it must sometimes increase its Manhattan distance to the goal in order to make progress in the maze.
  • Mention relaxation and direction
  • 15 episodes, 50 trials, cutoff at 10k decision cycles; median shown
  • 1st and 3rd quartiles shown. Agents reach optimality at the same time, but with mood, performance is less variable
  • This is an extension of previous work. These constraints define a set of equations; this is one possible equation, improving on previous work, that seems to work well for our current models
  • This is an extension of previous work. It unifies intensity for all feelings in one equation (others use a different equation for each “kind” of feeling). Again, these constraints define a set of possible functions, of which this is one that seems to work well for us
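The Manhattan-distance goal sensor described in the notes above can be sketched as follows; the grid-coordinate representation is an assumption for illustration.

```python
def manhattan_distance(agent, goal):
    """Manhattan (city-block) distance sensor: the agent only learns
    how far it is from the goal, not which direction to move."""
    (ax, ay), (gx, gy) = agent, goal
    return abs(ax - gx) + abs(ay - gy)
```

Because the maze can force detours, the agent cannot simply greedily minimize this value; some correct moves temporarily increase it.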

    1. 1. Emotion-Driven Reinforcement Learning<br />Bob Marinier & John Laird<br />University of Michigan, Computer Science and Engineering<br />CogSci’08<br />
    2. 2. Introduction<br />Interested in the functional benefits of emotion for a cognitive agent<br />Appraisal theories of emotion<br />PEACTIDM theory of cognitive control<br />Use emotion as a reward signal to a reinforcement learning agent<br />Demonstrates a functional benefit of emotion<br />Provides a theory of the origin of intrinsic reward<br />2<br />
    3. 3. Outline<br />Background<br />Integration of emotion and cognition<br />Integration of emotion and reinforcement learning<br />Implementation in Soar<br />Learning task<br />Results<br />3<br />
    4. 4. Appraisal Theories of Emotion<br />A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals<br />Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.<br />Appraisals influence emotion<br />Emotion can then be coped with (via internal or external actions)<br />Situation<br />Goals<br />Appraisals<br />Coping<br />Emotion<br />4<br />
    5. 5. Appraisals to Emotions (Scherer 2001)<br />5<br />
    6. 6. Cognitive Control: PEACTIDM (Newell 1990)<br />6<br />
    7. 7. Unification of PEACTIDM and Appraisal Theories<br />7<br />Perceive<br />Raw Perceptual Information<br />Environmental Change<br />Encode<br />Motor<br />Suddenness<br />Unpredictability<br />Goal Relevance<br />Intrinsic Pleasantness<br />Stimulus Relevance<br />Motor Commands<br />Prediction<br />Outcome Probability<br />Attend<br />Decode<br />Causal Agent/Motive<br />Discrepancy<br />Conduciveness<br />Control/Power<br />Stimulus chosen for processing<br />Action<br />Comprehend<br />Intend<br />Current Situation Assessment<br />
    8. 8. Distinction between emotion, mood, and feeling(Marinier & Laird 2007)<br />Emotion: Result of appraisals<br />Is about the current situation<br />Mood: “Average” over recent emotions<br />Provides historical context<br />Feeling: Emotion “+” Mood<br />What agent actually perceives<br />8<br />
    9. 9. Emotion, mood, and feeling<br />Cognition<br />Active Appraisals<br />Perceived Feeling<br />Emotion<br />Feeling<br />Combination Function<br />Pull<br />Mood<br />Decay<br />9<br />
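A minimal per-dimension sketch of the decay/pull dynamics in the diagram above; the decay and pull rates are hypothetical parameters, not values from the paper.

```python
def update_mood(mood, emotion, decay=0.99, pull=0.1):
    """Mood decays toward neutral each cycle, then is pulled toward
    the current emotion, giving an 'average' over recent emotions."""
    mood = [decay * m for m in mood]
    return [m + pull * (e - m) for m, e in zip(mood, emotion)]

def feeling(emotion, mood):
    """Feeling = Emotion '+' Mood, clamped to the legal range [-1, 1];
    this is what the agent actually perceives."""
    return [max(-1.0, min(1.0, e + m)) for e, m in zip(emotion, mood)]
```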
    10. 10. Intrinsically Motivated Reinforcement Learning(Sutton & Barto 1998; Singh et al. 2004)<br />10<br />External Environment<br />Environment<br />Actions<br />Sensations<br />Critic<br />“Organism”<br />Internal Environment<br />Actions<br />States<br />Rewards<br />Critic<br />Appraisal Process<br />Agent<br />+/- Feeling Intensity<br />States<br />Rewards<br />Decisions<br />Agent<br />Reward = Intensity * Valence<br />
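The slide's reward rule (Reward = Intensity * Valence) plugged into a one-step temporal-difference update; the table-based Q representation and the alpha/gamma values are illustrative assumptions, not the paper's implementation.

```python
def td_update(q, state, action, next_value, intensity, valence,
              alpha=0.1, gamma=0.9):
    """One TD step driven by feeling: reward = intensity * valence,
    with intensity in [0, 1] and valence in [-1, 1]."""
    reward = intensity * valence
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * next_value - old)
    return q
```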
    11. 11. Extending Soar with Emotion(Marinier & Laird 2007)<br />Episodic<br />Semantic<br />Symbolic Long-Term Memories<br />Procedural<br />Semantic<br />Learning<br />Episodic<br />Learning<br />Chunking<br />Reinforcement<br />Learning<br />Appraisal Detector<br />Short-Term Memory<br />Situation, Goals<br />Decision Procedure<br />Visual<br />Imagery<br />Perception<br />Action<br />Body<br />11<br />
    12. 12. Extending Soar with Emotion(Marinier & Laird 2007)<br />12<br />Episodic<br />Semantic<br />Symbolic Long-Term Memories<br />Procedural<br />Semantic<br />Learning<br />Episodic<br />Learning<br />Chunking<br />Reinforcement<br />Learning<br /> +/-Intensity<br />Appraisal Detector<br />Feeling<br />.9,.6,.5,-.1,.8,…<br />Short-Term Memory<br />Situation, Goals<br />Feelings<br />Decision Procedure<br />Feelings<br />Appraisals<br />Visual<br />Imagery<br />Emotion<br />.5,.7,0,-.4,.3,…<br />Mood<br />.7,-.2,.8,.3,.6,…<br />Perception<br />Action<br />Knowledge<br />Body<br />Architecture<br />
    13. 13. Learning task<br />Start<br />Goal<br />13<br />
    14. 14. Learning task: Encoding<br />14<br />North<br />Passable: false<br />On path: false<br />Progress: true<br />East<br />Passable: false<br />On path: true<br />Progress: true<br />West<br />Passable: false<br />On path: false<br />Progress: true<br />South<br />Passable: true<br />On path: true<br />Progress: true<br />
    15. 15. Learning task: Encoding & Appraisal<br />15<br />North<br />Intrinsic Pleasantness: Low<br />Goal Relevance: Low<br />Unpredictability: High<br />East<br />Intrinsic Pleasantness: Low<br />Goal Relevance: High<br />Unpredictability: High<br />West<br />Intrinsic Pleasantness: Low<br />Goal Relevance: Low<br />Unpredictability: High<br />South<br />Intrinsic Pleasantness: Neutral<br />Goal Relevance: High<br />Unpredictability: Low<br />
    16. 16. Learning task: Attending, Comprehending & Appraisal<br />16<br />South<br />Intrinsic Pleasantness: Neutral<br />Goal Relevance: High<br />Unpredictability: Low<br />Conduciveness: High<br />Control: High …<br />
    17. 17. Learning task: Tasking<br />17<br />
    18. 18. Learning task: Tasking<br />18<br />Optimal Subtasks<br />
    19. 19. What is being learned?<br />When to Attend vs Task<br />If Attending, what to Attend to<br />If Tasking, which subtask to create<br />When to Intend vs. Ignore<br />19<br />
    20. 20. Learning Results<br />20<br />
    21. 21. Results: With and without mood<br />21<br />
    22. 22. Discussion<br />Agent learns both internal (tasking) and external (movement) actions<br />Emotion allows for more frequent rewards, and thus learns faster than standard RL<br />Mood “fills in the gaps” allowing for even faster learning and less variability<br />22<br />
    23. 23. Conclusion & Future Work<br />Demonstrated computational model that integrates emotion and cognitive control<br />Confirmed emotion can drive reinforcement learning<br />We have already successfully demonstrated similar learning in a more complex domain<br />Would like to explore multi-agent scenarios<br />23<br />
    24. 24. 24<br />HIGH INTENSITY<br />alert<br />tense<br />excited<br />nervous<br />elated<br />stressed<br />happy<br />upset<br />NEGATIVE VALENCE<br />POSITIVE VALENCE<br />sad<br />contented<br />depressed<br />serene<br />lethargic<br />relaxed<br />fatigued<br />calm<br />LOW INTENSITY<br />Circumplex models<br />Emotions can be described in terms of intensity and valence, as in a circumplex model:<br />Adapted from Feldman Barrett & Russell (1998)<br />
    25. 25. Computing Feeling from Emotion and Mood<br />25<br />Assumption: Appraisal dimensions are independent<br />Limited Range: Inputs and outputs are in [0,1] or [-1,1]<br />Distinguishability: Very different inputs should lead to very different outputs<br />Non-linear: Linearity would violate limited range and distinguishability<br />
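One candidate per-dimension combination function that satisfies the constraints listed above (bounded, non-linear, distinguishing); it is an illustrative sketch, not necessarily the authors' exact equation.

```python
import math

def combine(e, m, alpha=2.0):
    """Magnitude-weighted average of emotion e and mood m, both in
    [-1, 1]. The exponential weights make the function non-linear and
    keep stronger inputs from being washed out, while the
    weighted-average form guarantees the output stays in [-1, 1]."""
    we, wm = math.exp(alpha * abs(e)), math.exp(alpha * abs(m))
    return (e * we + m * wm) / (we + wm)
```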
    26. 26. Computing Feeling Intensity<br />26<br />Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is<br />Limited range: Should map onto [0,1]<br />No dominant appraisal: No single value should drown out all the others<br />Can’t just multiply values, because if any are 0, then intensity is 0<br />Realization principle: Expected events should be less intense than unexpected events<br />
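One candidate intensity function meeting these constraints (an illustrative sketch, not the paper's equation). The `surprise` input in [0, 1] is a hypothetical stand-in for the unexpectedness appraisals.

```python
def intensity(appraisals, surprise):
    """Map appraisal magnitudes (each in [0, 1]) to an intensity in
    [0, 1]. Averaging instead of multiplying keeps a single zero from
    zeroing out the result (no dominant appraisal); the surprise
    factor makes expected events less intense without letting
    surprise itself dominate (realization principle)."""
    if not appraisals:
        return 0.0
    base = sum(appraisals) / len(appraisals)
    return base * (0.5 + 0.5 * surprise)
```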