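7. When r varies
Q-learning (immediate reward only):
$$Q_{k+1}(s,a) = Q_k(s,a) + \alpha \left[ r_{k+1} - Q_k(s,a) \right]$$
Rearranging gives:
$$Q_k = (1-\alpha)^k Q_0 + \sum_{i=1}^{k} \alpha (1-\alpha)^{k-i} r_i$$
→ exponential, recency-weighted average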
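The exponential, recency-weighted average on this slide can be checked numerically. The sketch below assumes a constant step size, written `alpha` here (the symbol and the function names are illustrative, not taken from the slides), and the immediate-reward update Q ← Q + alpha (r − Q). It compares the step-by-step update with the expanded weighted sum.

```python
def incremental_estimate(rewards, alpha, q0=0.0):
    """Apply Q <- Q + alpha * (r - Q) once per observed reward."""
    q = q0
    for r in rewards:
        q += alpha * (r - q)
    return q


def closed_form_estimate(rewards, alpha, q0=0.0):
    """Expanded form: (1-alpha)^k * Q_0 + sum_i alpha * (1-alpha)^(k-i) * r_i."""
    k = len(rewards)
    total = (1 - alpha) ** k * q0
    for i, r in enumerate(rewards, start=1):
        total += alpha * (1 - alpha) ** (k - i) * r
    return total


def weights(k, alpha):
    """Weight on Q_0 followed by the weights on r_1 .. r_k."""
    return [(1 - alpha) ** k] + [
        alpha * (1 - alpha) ** (k - i) for i in range(1, k + 1)
    ]


if __name__ == "__main__":
    # Reward jumps from 0 to 1 halfway through: a simple case of varying r.
    rewards = [0.0] * 50 + [1.0] * 50
    print(incremental_estimate(rewards, alpha=0.1))
    print(closed_form_estimate(rewards, alpha=0.1))
    print(sum(rewards) / len(rewards))  # plain sample average, for contrast
```

Because the weights decay geometrically with the age of each reward, the estimate follows the recent reward level, whereas the plain sample average stays at 0.5 in this example.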
12. Formal Theory of Creativity & Fun & Intrinsic
Motivation (1990-2010) by Jürgen Schmidhuber
http://people.idsia.ch/~juergen/creativity.html
• (A) an adaptive predictor of the growing data
history as the agent is interacting with its
environment
• (B) a reinforcement learner selecting the
actions that shape the history
• (B) is motivated to learn to invent
interesting things that (A) does not yet know
but can easily learn.
13. (continued)
• To maximize future expected reward, (B)
learns more and more complex behaviors that
yield initially surprising (but eventually
boring) novel patterns that make (A) quickly
improve.
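The interplay between (A) and (B) can be illustrated with a toy loop. This is only a sketch under strong assumptions, not Schmidhuber's implementation: the two "patterns", the scalar predictor, the optimistic initial values, and every name in the code are hypothetical. The intrinsic reward for (B) is taken to be how much (A)'s prediction error drops on the observation it just learned from, which is only a sensible measure for deterministic patterns like these.

```python
def run_curiosity_loop(steps=12, alpha=0.5, lr=0.5):
    """Return a list of (action, intrinsic_reward) pairs."""
    # Hypothetical environment: each action reveals a fixed value.
    targets = {"known": 0.0, "learnable": 1.0}

    # (A) adaptive predictor: one scalar prediction per action.
    # It already predicts "known" perfectly and "learnable" wrongly.
    prediction = {"known": 0.0, "learnable": 0.0}

    # (B) reinforcement learner: value estimate of intrinsic reward per
    # action. Optimistic initial values (an assumption) make it try both.
    q = {"known": 1.0, "learnable": 1.0}

    log = []
    for _ in range(steps):
        action = max(q, key=q.get)  # greedy choice; ties go to the first key
        observed = targets[action]

        error_before = abs(observed - prediction[action])
        prediction[action] += lr * (observed - prediction[action])
        error_after = abs(observed - prediction[action])

        # Intrinsic reward: improvement of (A) on this observation.
        reward = error_before - error_after

        # Same recency-weighted update as for external rewards.
        q[action] += alpha * (reward - q[action])
        log.append((action, reward))
    return log


if __name__ == "__main__":
    for action, reward in run_curiosity_loop():
        print(f"{action:10s} intrinsic reward = {reward:.4f}")
```

In this toy run the already-known pattern never yields intrinsic reward, while the learnable one is rewarding at first and yields less each time it is revisited, mirroring the "initially surprising (but eventually boring)" description.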
14. (continued)
• O(t): the state of some observer O at time t
• H(t): its history of previous actions &
sensations & rewards until time t
• Beauty B(D,O(t)) of any data D: the negative
number of bits required to encode D
• Interestingness I(D,O(t)) of data D for
observer O at discrete time step t>0:
I(D,O(t)) = B(D,O(t)) - B(D,O(t-1))
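The definitions above can be made concrete with a small sketch. It assumes, purely for illustration, that the observer O is a Laplace-smoothed symbol-frequency model and that "bits required to encode D" means the ideal code length −log2 p(symbol) summed over D. The class and function names are hypothetical.

```python
import math
from collections import Counter


class FrequencyObserver:
    """Hypothetical observer O: a Laplace-smoothed symbol-frequency model."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = Counter()
        self.total = 0

    def learn(self, history):
        """Update the observer's state with newly seen symbols."""
        self.counts.update(history)
        self.total += len(history)

    def bits(self, data):
        """Ideal code length of data, in bits, under the current model."""
        n = len(self.alphabet)
        return sum(
            -math.log2((self.counts[s] + 1) / (self.total + n)) for s in data
        )


def beauty(data, observer):
    """B(D, O(t)): the negative number of bits needed to encode D."""
    return -observer.bits(data)


def interestingness(data, observer, new_history):
    """I(D, O(t)) = B(D, O(t)) - B(D, O(t-1)), with one learning step between."""
    before = beauty(data, observer)   # B(D, O(t-1))
    observer.learn(new_history)       # O(t-1) -> O(t)
    after = beauty(data, observer)    # B(D, O(t))
    return after - before


if __name__ == "__main__":
    observer = FrequencyObserver("ab")
    data = "aaaaaaaa"
    for step in range(1, 4):
        print(step, interestingness(data, observer, "aaaa"))
```

As the observer keeps learning the same regularity, each learning step shortens the encoding of D by less, so interestingness stays positive but shrinks over time.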
25. References
• Second Interdisciplinary Symposium on
Information-Seeking, Curiosity and Attention
https://openlab-flowers.inria.fr/t/second-interdisciplinary-symposium-on-information-seeking-curiosity-and-attention-neurocuriosity-2016/187
• Information-seeking, curiosity, and attention:
computational and neural mechanisms
http://www.pyoudeyer.com/TICSCuriosity2013.pdf