Marinier & Laird, CogSci 2008: Emotion-Driven Reinforcement Learning (Presentation)
Speaker notes
  • Be careful about how we say the agent generates appraisal values
  • Say that prediction is our extension
  • A cognitive architecture is a set of task-independent mechanisms that interact to give rise to behavior.
  • In this environment, the agent’s sensing is limited: it can only see the cells immediately adjacent to it in the four cardinal directions. The agent has a sensor that tells it its Manhattan distance to the goal. However, the agent has no knowledge as to the effects of its actions, and thus cannot evaluate possible actions relative to the goal until it has actually performed them. Even then, it cannot always blindly move closer to the goal because given the shape of the maze, it must sometimes increase its Manhattan distance to the goal in order to make progress in the maze.
  • Mention relaxation and direction
  • 15 episodes, 50 trials, cutoff at 10,000 decision cycles; median shown
  • 1st and 3rd quartiles shown. Both reach optimality at about the same time, but with mood, learning is less variable
  • This is an extension of previous work. These constraints define a set of equations; this is one possible equation, improving on previous work, that seems to work well for our current models
  • This is an extension of previous work. It unifies intensity for all feelings in one equation (others use different equations for each “kind” of feeling). Again, these constraints define a set of possible functions, of which this is one that seems to work well for us

Transcript

  • 1. Emotion-Driven Reinforcement Learning
    Bob Marinier & John Laird
    University of Michigan, Computer Science and Engineering
    CogSci’08
  • 2. Introduction
    Interested in the functional benefits of emotion for a cognitive agent
    Appraisal theories of emotion
    PEACTIDM theory of cognitive control
    Use emotion as a reward signal to a reinforcement learning agent
    Demonstrates a functional benefit of emotion
    Provides a theory of the origin of intrinsic reward
  • 3. Outline
    Background
    Integration of emotion and cognition
    Integration of emotion and reinforcement learning
    Implementation in Soar
    Learning task
    Results
  • 4. Appraisal Theories of Emotion
    A situation is evaluated along a number of appraisal dimensions, many of which relate the situation to current goals
    Novelty, goal relevance, goal conduciveness, expectedness, causal agency, etc.
    Appraisals influence emotion
    Emotion can then be coped with (via internal or external actions)
    Situation
    Goals
    Appraisals
    Coping
    Emotion
  • 5. Appraisals to Emotions (Scherer 2001)
    (Table: Scherer's mapping from appraisal value patterns to modal emotions.)
  • 6. Cognitive Control: PEACTIDM (Newell 1990)
    (Diagram: Newell's PEACTIDM cycle: Perceive, Encode, Attend, Comprehend, Task, Intend, Decode, Motor.)
  • 7. Unification of PEACTIDM and Appraisal Theories
    (Diagram: the PEACTIDM cycle annotated with the appraisals each step generates. Perceive takes raw perceptual information from an environmental change. Encode produces suddenness, unpredictability, goal relevance, and intrinsic pleasantness. Attend uses stimulus relevance to choose a stimulus for processing. Comprehend produces causal agent/motive, discrepancy, conduciveness, and control/power, yielding the current situation assessment. Intend produces a prediction with an outcome probability. Decode turns the intended action into motor commands, and Motor executes the action.)
  • 8. Distinction between emotion, mood, and feeling (Marinier & Laird 2007)
    Emotion: Result of appraisals
    Is about the current situation
    Mood: “Average” over recent emotions
    Provides historical context
    Feeling: Emotion “+” Mood
    What agent actually perceives
  • 9. Emotion, mood, and feeling
    (Diagram: cognition's active appraisals generate the current emotion. Mood decays toward neutral over time while being pulled toward the emotion. A combination function merges emotion and mood into the feeling, which is what cognition actually perceives.)
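A minimal sketch of the mood dynamics described on this slide, in Python. The decay and pull rates here are hypothetical choices for illustration, not values from the paper:

```python
# Illustrative sketch of the mood dynamics (DECAY and PULL are
# hypothetical rates, not values from the paper).
DECAY = 0.9  # per-cycle fraction of mood retained (decay toward neutral)
PULL = 0.1   # per-cycle pull of mood toward the current emotion

def update_mood(mood, emotion):
    """One cycle of mood update: decay toward 0, then pull toward emotion.

    Both arguments are vectors of per-appraisal-dimension values.
    """
    return [DECAY * m + PULL * (e - m) for m, e in zip(mood, emotion)]

# Example: a sustained emotion gradually colors an initially neutral mood.
mood = [0.0, 0.0]
for _ in range(5):
    mood = update_mood(mood, emotion=[0.8, -0.4])
print(mood)
```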
  • 10. Intrinsically Motivated Reinforcement Learning (Sutton & Barto 1998; Singh et al. 2004)
    (Diagram: in standard RL, a critic in the environment sends states and rewards to the agent, which sends back actions. In the intrinsically motivated version, the critic moves inside the "organism": the external environment exchanges sensations and actions with an internal environment, whose critic, driven by an appraisal process, supplies the agent's states and rewards; decisions flow back out. The reward is the signed feeling intensity.)
    Reward = Intensity * Valence
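The slide's reward definition drops straight into a standard temporal-difference update. A sketch assuming a plain tabular Q-learner; the hyperparameter values are illustrative, not from the paper:

```python
# Reward = Intensity * Valence, fed to a standard tabular Q-learner.
# ALPHA and GAMMA are illustrative hyperparameters, not the paper's.
ALPHA, GAMMA = 0.1, 0.9
Q = {}  # maps (state, action) -> estimated value

def intrinsic_reward(intensity, valence):
    """Signed feeling intensity: intensity in [0, 1], valence the sign
    (or a graded value in [-1, 1]) of the feeling."""
    return intensity * valence

def td_update(s, a, intensity, valence, s_next, next_actions):
    """One Q-learning step driven by the feeling-based reward."""
    r = intrinsic_reward(intensity, valence)
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions),
                    default=0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (
        r + GAMMA * best_next - Q.get((s, a), 0.0))
```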
  • 11. Extending Soar with Emotion (Marinier & Laird 2007)
    (Diagram: the Soar architecture. Symbolic long-term memories (procedural, semantic, episodic) with their learning mechanisms (chunking, reinforcement learning, semantic learning, episodic learning) sit above a short-term memory holding the situation and goals, alongside the decision procedure, visual imagery, and the perception/action interface to the body. A new appraisal detector is added to the architecture.)
  • 12. Extending Soar with Emotion (Marinier & Laird 2007)
    (Diagram: the same architecture in operation. Appraisals (knowledge) in short-term memory feed the appraisal detector (architecture), which combines the resulting emotion (e.g., .5, .7, 0, -.4, .3, …) with mood (e.g., .7, -.2, .8, .3, .6, …) into a feeling (e.g., .9, .6, .5, -.1, .8, …). The feeling is placed back in short-term memory, and its signed intensity (+/- intensity) goes to reinforcement learning as reward.)
  • 13. Learning task
    (Diagram: a maze gridworld with marked Start and Goal cells.)
  • 14. Learning task: Encoding
    North: Passable: false, On path: false, Progress: true
    East: Passable: false, On path: true, Progress: true
    West: Passable: false, On path: false, Progress: true
    South: Passable: true, On path: true, Progress: true
  • 15. Learning task: Encoding & Appraisal
    North: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High
    East: Intrinsic Pleasantness: Low, Goal Relevance: High, Unpredictability: High
    West: Intrinsic Pleasantness: Low, Goal Relevance: Low, Unpredictability: High
    South: Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low
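A hypothetical mapping from the encoded features of slide 14 to the appraisal values of slide 15, written to reproduce the pattern in the two tables above. The agent's actual appraisal knowledge is richer than this; the rule names and thresholds are illustrative:

```python
# Hypothetical appraisal rules that reproduce the encode->appraise
# pattern in the tables above (the real agent's knowledge is richer).
def appraise(passable, on_path, progress):
    # 'progress' is true for every direction in this example, so these
    # illustrative rules do not need it.
    return {
        # Walls are mildly unpleasant; open cells are neutral.
        "intrinsic pleasantness": "Neutral" if passable else "Low",
        # Directions on the path to the goal are goal-relevant.
        "goal relevance": "High" if on_path else "Low",
        # Blocked directions are treated as unpredictable.
        "unpredictability": "Low" if passable else "High",
    }

# South is the only passable, on-path direction:
assert appraise(passable=True, on_path=True, progress=True) == {
    "intrinsic pleasantness": "Neutral",
    "goal relevance": "High",
    "unpredictability": "Low",
}
```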
  • 16. Learning task: Attending, Comprehending & Appraisal
    South (attended): Intrinsic Pleasantness: Neutral, Goal Relevance: High, Unpredictability: Low, Conduciveness: High, Control: High, …
  • 17. Learning task: Tasking
  • 18. Learning task: Tasking
    (Diagram: the maze overlaid with the optimal subtasks.)
  • 19. What is being learned?
    When to Attend vs Task
    If Attending, what to Attend to
    If Tasking, which subtask to create
    When to Intend vs. Ignore
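A sketch of this slide's choice points cast as an RL selection problem over internal and external operators. The operator names and the proposal/selection structure here are hypothetical, standing in for Soar's operator proposal and RL-based selection:

```python
import random

# Hypothetical operator proposal/selection mirroring the four choice
# points on this slide; names and structure are illustrative.
def propose_operators(stimuli, subtasks, attended):
    """Propose Attend/Task operators, or Intend/Ignore once attending."""
    if attended is None:
        return ([("attend", s) for s in stimuli] +   # what to Attend to
                [("task", t) for t in subtasks])     # which subtask to create
    return [("intend", attended), ("ignore", attended)]

def select(state, operators, Q, epsilon=0.1):
    """Epsilon-greedy choice over proposed operators using learned Q-values."""
    if random.random() < epsilon:
        return random.choice(operators)
    return max(operators, key=lambda op: Q.get((state, op), 0.0))
```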
  • 20. Learning Results
    (Graph: learning curves; median over 50 trials of 15 episodes, cutoff at 10,000 decision cycles.)
  • 21. Results: With and without mood
    (Graph: learning curves with and without mood; 1st and 3rd quartiles shown.)
  • 22. Discussion
    Agent learns both internal (tasking) and external (movement) actions
    Emotion provides more frequent rewards, so the agent learns faster than with standard RL
    Mood “fills in the gaps” allowing for even faster learning and less variability
  • 23. Conclusion & Future Work
    Demonstrated computational model that integrates emotion and cognitive control
    Confirmed emotion can drive reinforcement learning
    We have already successfully demonstrated similar learning in a more complex domain
    Would like to explore multi-agent scenarios
  • 24. Circumplex models
    Emotions can be described in terms of intensity and valence, as in a circumplex model. High-intensity positive emotions include alert, excited, elated, and happy; high-intensity negative emotions include tense, nervous, stressed, and upset. Low-intensity positive emotions include contented, serene, relaxed, and calm; low-intensity negative emotions include sad, depressed, lethargic, and fatigued.
    Adapted from Feldman Barrett & Russell (1998)
  • 25. Computing Feeling from Emotion and Mood
    Assumption: Appraisal dimensions are independent
    Limited Range: Inputs and outputs are in [0,1] or [-1,1]
    Distinguishability: Very different inputs should lead to very different outputs
    Non-linear: Linearity would violate limited range and distinguishability
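One function satisfying all four constraints above, as an illustrative choice; the speaker notes say the paper's own equation is just one member of the family these constraints define, and this sketch is not necessarily that equation:

```python
import math

# One combination function satisfying the four constraints above; an
# illustrative choice, not necessarily the paper's equation.
def combine(emotion, mood):
    """Combine emotion and mood into feeling, dimension by dimension.

    Independence: each dimension is handled separately.
    Limited range: tanh keeps every output in (-1, 1).
    Non-linearity / distinguishability: unlike clipping e + m to the
    range (which maps many different inputs onto the boundary value),
    tanh saturates smoothly, so different inputs stay distinguishable.
    """
    return [math.tanh(e + m) for e, m in zip(emotion, mood)]
```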
  • 26. Computing Feeling Intensity
    Motivation: Intensity gives a summary of how important (i.e., how good or bad) the situation is
    Limited range: Should map onto [0,1]
    No dominant appraisal: No single value should drown out all the others
    Can’t just multiply values, because if any are 0, then intensity is 0
    Realization principle: Expected events should be less intense than unexpected events
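An illustrative intensity function satisfying these constraints. This is hypothetical; the paper derives its own equation from the same constraints, and the surprise factor below simply reuses the outcome-probability appraisal from slide 7:

```python
# An illustrative intensity function satisfying the constraints above
# (hypothetical; the paper derives its own equation from the same
# constraints).
def feeling_intensity(appraisals, outcome_probability):
    """Summarize appraisal magnitudes as an intensity in [0, 1].

    Limited range: a mean of values in [0, 1], scaled by a [0, 1] factor.
    No dominant appraisal: averaging instead of multiplying, so a single
    zero cannot force the whole intensity to zero.
    Realization principle: the (1 - outcome probability) surprise factor
    makes expected events less intense than unexpected ones.
    """
    magnitudes = [abs(v) for v in appraisals]
    surprise = 1.0 - outcome_probability
    return surprise * sum(magnitudes) / len(magnitudes)
```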