A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments

http://www.yukawa.kyoto-u.ac.jp/contents/seminar/detail.php?SNUM=51633

Tatsuji Takahashi¹, Yu Kohno¹,²
Seminar on science of complex systems
(organized by Yukio-Pegio Gunji),
Yukawa Institute for Theoretical Physics,
Kyoto University,
Jan. 20, 2014

1 Tokyo Denki University,
2 JSPS (from Apr., 2014)

Transcript

  • 1. A toy model of human cognition: Utilizing fluctuation in uncertain and non-stationary environments. Tatsuji Takahashi¹, Yu Kohno¹,². Seminar on science of complex systems (organized by Yukio-Pegio Gunji), Yukawa Institute for Theoretical Physics, Kyoto University, Jan. 20, 2014. ¹Tokyo Denki University, ²JSPS (from Apr. 2014)
  • 7. Contents
    - The loosely symmetric (LS) model
    - Cognitive properties or cognitive biases
    - Analysis of reconstruction of LS
    - Result: efficacy in reinforcement learning
    - Utilization of fluctuation in non-stationary environments
  • 14. A toy model of human cognition
    - Modeling focusing on deviations from rational standards: cognitive biases (the differences from "machines")
    - Principal properties implemented in a form as simple as possible, so that it can be analyzed and applied easily
    - Intuition of human beings; kept simple, again: not the policy (or strategy) that is learnt through education and culture
  • 20. LS as a toy model of cognition
    - We treat the loosely symmetric (LS) model proposed by Shinohara (2007). LS:
    - models cognitive biases
    - is merely a function over co-occurrence information between two events
    - faithfully describes the causal intuition of humans, which forms the basis of decision-making and action for adaptation in the world
  • 28. The loosely symmetric (LS) model
    - A quasi-probability function LS(·|·), like the conditional probability P(·|·)
    - Defined over the co-occurrence information of events p and q
    - The relationship from p to q: LS(q|p)
    - LS describes the causal intuition of human beings the most faithfully (among more than 40 existing models)
    - Co-occurrence table (prior event p / ¬p, posterior event q / ¬q):

             p    ¬p
        q    a    c
        ¬q   b    d

    - P(q|p) = a / (a + b)
    - LS(q|p) = (a + d·b/(b+d)) / (a + d·b/(b+d) + b + c·a/(a+c))
  • 35. The loosely symmetric (LS) model
    - Inductive inference of causal relationship: how do humans form the intensity of the causal relationship from p to q, when p is the candidate cause of the effect q in focus?
    - The functional form of f(a, b, c, d) for the human causal intuition:
      LS(q|p) = (a + d·b/(b+d)) / (a + d·b/(b+d) + b + c·a/(a+c))
    - Meta-analysis as in Hattori & Oaksford (2007), correlation r between model and human judgments:

        Experiment   r for LS   r for ΔP
        AS95         0.95       0.88
        BCC03.1      0.98       0.98
        BCC03.3      0.92       0.84
        H03          0.98       0.00
        H06          0.97       0.71
        LS00         0.85       0.88
        W03.2        0.95       0.28
        W03.6        0.85       0.46
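As a quick sketch (not part of the original slides), the LS formula above translates directly into code. The function below simply evaluates the expression from the four co-occurrence counts, with the plain conditional probability alongside for comparison; counts are assumed positive so that no denominator vanishes.

```python
def cond_prob(a, b):
    """Conditional probability P(q|p) = a / (a + b) from the 2x2 counts."""
    return a / (a + b)

def ls(a, b, c, d):
    """Loosely symmetric model LS(q|p), with
    a = #(p, q), b = #(p, not-q), c = #(not-p, q), d = #(not-p, not-q).
    Counts are assumed positive (no zero-division guards in this sketch)."""
    num = a + (b / (b + d)) * d
    return num / (num + b + (a / (a + c)) * c)

# Example counts: p and q co-occur often, q is rare without p.
print(cond_prob(6, 2))  # 0.75
print(ls(6, 2, 1, 4))   # a biased analogue of P(q|p) over the same counts
```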
  • 41. In 2-armed bandit problems (more on bandit problems later)
    - LS used as the value function in reinforcement learning: the agent evaluates the actions according to the causal intuition of humans
    - Very good adaptation to the environment, both in the short term and the long term
    - [Figure: accuracy rate vs. step (1 to 1000, log scale) for LS, CP, ToW(0.5), SM(0.3), and SM(0.7)]
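As a hedged illustration of how LS can act as a value function in the 2-armed case (the slides do not spell out the count assignment, so the one below is an assumption): each arm's value is LS(reward | this arm), with the other arm's record filling the ¬p column, and the agent greedily pulls the higher-valued arm.

```python
import random

def ls(a, b, c, d):
    num = a + (b / (b + d)) * d
    return num / (num + b + (a / (a + c)) * c)

def ls_two_armed_run(p_true=(0.6, 0.4), steps=1000, seed=0):
    """Greedy 2-armed bandit agent valuing arms with LS.
    For arm i: a, b = own wins/losses; c, d = the other arm's wins/losses.
    Counts start at 1 (an assumption of this sketch, to avoid zero divisions)."""
    rng = random.Random(seed)
    wins, losses = [1, 1], [1, 1]
    best = 0 if p_true[0] >= p_true[1] else 1
    hits = 0
    for _ in range(steps):
        value = [ls(wins[i], losses[i], wins[1 - i], losses[1 - i]) for i in (0, 1)]
        arm = 0 if value[0] >= value[1] else 1
        hits += (arm == best)
        if rng.random() < p_true[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return hits / steps  # fraction of pulls on the better arm

print(ls_two_armed_run())
```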
  • 51. The loosely symmetric (LS) model
    - From the analysis of LS, we found the following cognitive properties:
    - Ground-invariance (like visual attention, Takahashi et al., 2010)
    - Comparative valuation (psychology: Tversky & Kahneman, Science, 1974; brain science: Daw et al., Nature, 2006)
    - Idiosyncratic, asymmetric risk attitude as in prospect theory (Kahneman & Tversky, Am. Psy., 1984; Boorman et al., Neuron, 2009)
    - Satisficing (Simon, Psy. Rev., 1954; Kolling et al., Science, 2012)
  • 57. Principal human cognitive biases. Humans:
    - Satisficing: do not optimize but satisfice; become satisfied when it is better than the reference level
    - Comparative valuation: evaluate states and actions in a relative manner
    - Asymmetric risk attitude: asymmetrically recognize gain and loss
  • 60. Satisficing, risk attitude, and comparative evaluation (diagrams)
    - Satisficing: given a reference level, when all arms are over the reference there is no pursuit of arms above it; when all arms are under the reference, the agent searches hard for an arm over the reference level
    - Risk attitude (reliability consideration): over the reference, risk-avoiding; with past wins (o) and losses (x) at expected value 0.75, comparison considering reliability chooses 15/20 rather than 3/4; under the reference, risk-seeking (the reflection effect); at 0.25, gamble on 1/4 rather than 5/20
    - Comparative evaluation: choosing A1 and losing lowers the value of A1 relative to A2 (see-saw), so arms other than A1 get tried by comparative, not absolute, valuation
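The risk-attitude pattern above can be checked numerically with the plain LS formula, pairing the two records so that the rival option fills the ¬p column (a hypothetical pairing chosen here for illustration; the slides' construction involves a reference level and may differ): with equal win rates, LS prefers the better-tried 15/20 record over 3/4, yet gambles on the barely-tried 1/4 rather than 5/20.

```python
def ls(a, b, c, d):
    num = a + (b / (b + d)) * d
    return num / (num + b + (a / (a + c)) * c)

# Both records have win rate 0.75: LS prefers the more reliable 15/20.
print(ls(15, 5, 3, 1), ls(3, 1, 15, 5))   # ~0.68 > ~0.52

# Both records have win rate 0.25: LS prefers the barely tried 1/4 (reflection effect).
print(ls(1, 3, 5, 15), ls(5, 15, 1, 3))   # ~0.48 > ~0.32
```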
  • 61. The generalized LS with variable reference (LSVR)
    - LSVR is a generalization of LS with an autonomously adjusted reference parameter
    - [Figure: abstract image of the variable reference]
  • 66. n-armed bandit problem (nABP)
    - The simplest framework in reinforcement learning, exhibiting the exploration-exploitation dilemma and the speed-accuracy tradeoff
    - The task is to maximize the total reward acquired from n actions (sources) with unknown reward distributions
    - A one-armed bandit is a slot machine that gives a reward (win) or not (lose)
    - An n-armed bandit is a slot machine with n arms that have different probabilities of winning
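A minimal stationary n-armed Bernoulli bandit environment, sketched for reference; the class and its names are illustrative, not from the slides. Reward probabilities are drawn uniformly from [0, 1], as in the experiments reported later.

```python
import random

class BernoulliBandit:
    """n slot-machine arms; pulling arm i pays 1 with probability p[i], else 0."""
    def __init__(self, n, seed=None):
        self.rng = random.Random(seed)
        self.p = [self.rng.random() for _ in range(n)]  # unknown to the agent

    def pull(self, i):
        return 1 if self.rng.random() < self.p[i] else 0

    def best_arm(self):
        return max(range(len(self.p)), key=lambda i: self.p[i])
```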
  • 71. Performance indices for nABP
    - Accuracy: the average percentage of choosing the optimal action
    - Regret (expected loss): the difference of the actually acquired accumulated rewards from the best possible sequence of actions (where accuracy = 1.0 all through the trial)
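The two indices in code, as a small sketch (the helper and its argument names are illustrative): accuracy is the fraction of steps on which the optimal arm was chosen, and regret accumulates the gap between the best arm's expected reward and that of the arm actually chosen.

```python
def evaluate(p_true, choices):
    """p_true: true reward probabilities; choices: sequence of chosen arm indices.
    Returns (accuracy, expected regret)."""
    p_best = max(p_true)
    best = p_true.index(p_best)
    accuracy = sum(1 for i in choices if i == best) / len(choices)
    regret = sum(p_best - p_true[i] for i in choices)
    return accuracy, regret

print(evaluate([0.2, 0.8, 0.5], [1, 1, 0, 2, 1]))  # (0.6, 0.9)
```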
  • 72. Result: n = 100, the reward probability for each action is taken uniformly from [0, 1]
    - [Figures: accuracy rate and expected loss vs. steps up to 10^6, for LS, LS-VR, and UCB1-tuned, each with γ = 0.999]
    - Accuracy: highest; regret: smallest
    - The more actions there are, the better the performance of LSVR becomes
    - Kohno & Takahashi, 2012; in prep.
  • 73. Non-stationary bandits
    - The reward probabilities change while playing.
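A sketch of the two non-stationary variants used in the following results, layered on a Bernoulli bandit (parameter names are illustrative): in one, all reward probabilities are redrawn together every reset_every steps (10,000 below); in the other, each arm's probability is independently redrawn with probability reset_prob (0.0001 below) at each step, which is how the per-arm reset is read here.

```python
import random

class NonStationaryBandit:
    """n Bernoulli arms whose reward probabilities change while playing."""
    def __init__(self, n, reset_every=None, reset_prob=0.0, seed=None):
        self.rng = random.Random(seed)
        self.p = [self.rng.random() for _ in range(n)]
        self.reset_every = reset_every  # e.g. 10_000: redraw all arms periodically
        self.reset_prob = reset_prob    # e.g. 1e-4: per-step, per-arm redraw chance
        self.t = 0

    def pull(self, i):
        self.t += 1
        if self.reset_every and self.t % self.reset_every == 0:
            self.p = [self.rng.random() for _ in self.p]             # synchronous reset
        self.p = [self.rng.random() if self.rng.random() < self.reset_prob else q
                  for q in self.p]                                   # asynchronous reset
        return 1 if self.rng.random() < self.p[i] else 0

env1 = NonStationaryBandit(16, reset_every=10_000)   # environment 1 in the slides
env2 = NonStationaryBandit(20, reset_prob=1e-4)      # environment 2 in the slides
```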
  • 74. Result in non-stationary environment 1: n = 16, the reward probability is from [0, 1]; the probabilities are totally reset every 10,000 steps
    - [Figures: accuracy rate and expected loss vs. steps up to 50,000, for LS, LS-VR, and UCB1-tuned, each with γ = 0.999]
    - Accuracy: highest; regret: smallest
    - Kohno & Takahashi, in prep.
  • 80. Result in non-stationary environment 2: n = 20, the initial probabilities are from [0, 1]; the probability of each action is reset with probability 0.0001
    - Even when a not-well-tried action becomes the new optimum, the agent can switch to that optimal action
    - If the reward were given deterministically, this would be impossible
    - Efficient search utilizing uncertainty and fluctuation in non-stationary environments
    - [Figure: accuracy (the rate at which the currently optimal action is chosen) vs. steps up to 50,000, for LS, LS-VR, and UCB1-tuned, γ = 0.999]
  • 85. Results (all panels: LS, LS-VR, and UCB1-tuned, γ = 0.999)
    - Stationary: [figure, accuracy rate vs. steps up to 10^6]; the more options there are, the better the performance of LSVR becomes
    - Non-stationary 2: [figure, accuracy rate vs. steps up to 50,000]; LSVR can trace the unobserved change, amplifying fluctuation
    - Non-stationary, synchronous: [figure, accuracy rate vs. steps up to 50,000]; LSVR can trace the change in non-stationary environments
  • 91. Discussion. The cognitive biases of humans, when combined:
    - work effectively for adaptation under uncertainty
    - conflate an action and the set of actions through comparative valuation
    - symbolize the whole situation into a virtual action
    - utilize fluctuation from uncertainty and enable adaptation to non-stationary environments
  • 94. Conflating part and whole
    - Comparative valuation conflates the information of an action and of the whole set of actions
    - Universal in living systems, from slime molds (Latty & Beekman, 2011) to neurons (Royer & Paré, 2003) to animals and human beings
  • 100. Relative evaluation is especially important
    - Relative evaluation is what even slime molds and real neural networks (conservation of synaptic weights) do; behavioral economics found that humans comparatively evaluate actions and states
    - It weakens the dilemma between exploitation and exploration through see-saw-like competition among arms:
    - Through failure (low reward), choosing the greedy action may quickly trigger the next choice of the previously second-best, non-greedy arm
    - Through success (high reward), choosing the greedy action may quickly lead to focusing on the currently greedy action, lessening the possibility of choosing non-greedy arms by decreasing the value of the other arms
    - [Diagram: if valuation is absolute, the agent keeps choosing A1 and losing; if relative, losses on A1 raise the value of A2 (see-saw), so arms other than A1 get tried]
  • 104. Symbolization of the whole and comparative valuation with multiple actions
    - [Diagram: n slot machines A1, A2, ..., An, plus a virtual machine Ag representing the whole]
  • 106. Comparative valuation with a virtual action representing the whole
    - [Diagram: each real machine A1, A2, ..., An is compared (">" or "<") with the virtual machine Ag representing the whole]
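A hedged sketch of the construction in these slides: the records of all machines are pooled into a virtual action Ag standing for the whole, and each real arm is then valued by LS against that virtual action. Whether the pool includes the arm itself, and how the comparison feeds action selection, are assumptions of this sketch rather than details given in the slides.

```python
def ls(a, b, c, d):
    num = a + (b / (b + d)) * d
    return num / (num + b + (a / (a + c)) * c)

def values_vs_whole(wins, losses):
    """LS value of each arm, with the pooled record of all arms as the virtual arm Ag."""
    w_all, l_all = sum(wins), sum(losses)
    return [ls(wins[i], losses[i], w_all, l_all) for i in range(len(wins))]

# Three arms with different records, each compared against the whole (Ag).
print(values_vs_whole([8, 3, 1], [2, 3, 5]))
```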
  • 119. Conclusion
    - The cognitive biases that look irrational are, when appropriately combined as in humans, actually rational for adapting to uncertain environments and for survival through evolution
    - Applicable in engineering, in machine learning and robot control
    - Implications for brain science (the brain as machine-learning equipment): modeling PFC and vmPFC
    - Brain science and the three cognitive biases: satisficing (Kolling et al., Science, 2012); comparative valuation of state-action value (Daw et al., Nature, 2006); idiosyncratic risk evaluation (Boorman et al., Neuron, 2009)
  • 125. Applications of bandit problems
    - Game-tree search: Monte-Carlo tree search (Go AI)
    - Online advertisement (e.g., A/B testing)
    - Design of medical treatment
    - Reinforcement learning
  • 126. Robotic motion learning
    - Learning giant-swing motion with no prior knowledge, under coarse-grained states, through trial and error
    - [Figures: real robot and simulator (free first joint, active second joint); coarse-grained position, posture, and velocity states with discrete actions and a reward definition; acquired reward per 1000 steps over learning steps for LS-Q vs. Q, typical case and average of 100 trials]
    - Uragami, D., Takahashi, T., Matsuo, Y., Cognitively inspired reinforcement learning architecture and its application to giant-swing motion control, BioSystems, 116, 1–9 (2014)