Paper introduction:
Pan, Wei-Xing, et al. "Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network." The Journal of Neuroscience 25.26 (2005): 6235-6242.
1. Reinforcement Learning Study Group (22nd session)
Pan et al., 2005, Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network
@sotetsuk
4. Agenda
1. Background
a. Dopamine (DA) cells in midbrain
b. Brain reward system
c. TD(λ)
d. Schultz et al., 1997
2. Pan et al., 2005
a. Problems with the model of Schultz et al., 1997
b. Experiments
c. Results
d. Conclusion
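As a rough illustration of the TD(λ) item in the agenda, the following is a minimal sketch of tabular TD(λ) with accumulating eligibility traces, applied to one conditioning trial treated as a fixed sequence of time steps. The function name `td_lambda_episode`, its parameters and default values, and the five-step trial in the usage note are assumptions made for illustration; they are not taken from the slides or from Pan et al., 2005.

```python
import numpy as np


def td_lambda_episode(values, rewards, alpha=0.1, gamma=1.0, lam=0.9):
    """Run one trial of tabular TD(lambda) with accumulating traces.

    values[t]  : current prediction V for time step t within the trial
    rewards[t] : reward received on the transition out of time step t
    Returns the updated predictions and the TD error at each step.
    (Illustrative sketch; names and defaults are hypothetical.)
    """
    values = np.asarray(values, dtype=float).copy()
    n = len(values)
    trace = np.zeros(n)   # one eligibility trace per time step
    deltas = np.zeros(n)  # TD error observed at each step
    for t in range(n):
        # The value after the last step is taken to be 0 (end of trial).
        v_next = values[t + 1] if t + 1 < n else 0.0
        delta = rewards[t] + gamma * v_next - values[t]  # TD error
        trace *= gamma * lam  # decay all existing traces
        trace[t] += 1.0       # mark the current step as eligible
        values += alpha * delta * trace  # credit every eligible step
        deltas[t] = delta
    return values, deltas
```

Calling `td_lambda_episode(np.zeros(5), [0, 0, 0, 0, 1], lam=0.0)` changes only the prediction at the last step, whereas the same call with `lam=0.9` also updates the earlier steps within the same trial, each scaled by a power of `gamma * lam`. That difference is what an eligibility trace contributes in this sketch.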
23. References
1. Dayan and Abbott, 2001, Theoretical Neuroscience.
2. Szepesvári, 2010, Algorithms for Reinforcement Learning.
3. Schultz et al., 1997, A neural substrate of prediction and reward. Science.
4. Pan et al., 2005, Dopamine Cells Respond to Predicted Events during Classical Conditioning: Evidence for Eligibility Traces in the Reward-Learning Network. J. Neurosci.