NIPS KANSAI Reading Group #7: Applications of Inverse Reinforcement Learning to Behavior Analysis (Eiji Uchibe)
Can AI predict animal movements? Filling gaps in animal trajectories using inverse reinforcement learning. Ecosphere.
Modeling sensory-motor decisions in natural behavior. PLoS Computational Biology.
6. Rewards for learning manipulation of deformable objects
• Entropy-regularized reinforcement learning (Deep Dynamic Policy Programming)
• Learning without using a simulator
Tsurumine, Y., Cui, Y., Uchibe, E., and Matsubara, T. (2017). Deep dynamic policy programming for robot control
with raw images. In Proc. of IROS.
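Dynamic Policy Programming's entropy regularization amounts to replacing the hard max in the Bellman backup with a softmax at an inverse temperature. The following numpy sketch iterates that soft backup on a made-up two-state MDP; the MDP, gamma, and beta are illustrative assumptions, not values from Tsurumine et al.

```python
import numpy as np

# Minimal sketch of the entropy-regularized (softmax) Bellman backup that
# underlies Dynamic Policy Programming.  The two-state MDP, gamma, and the
# inverse temperature beta are illustrative assumptions.
R = np.array([[1.0, 0.0],        # R[s, a]: reward for action a in state s
              [0.0, 1.0]])
P = np.zeros((2, 2, 2))          # P[s, a, s']: transition probabilities
P[0, 0, 0] = P[0, 1, 1] = 1.0    # in state 0: a=0 stays, a=1 moves to 1
P[1, 0, 0] = P[1, 1, 1] = 1.0    # in state 1: a=0 moves to 0, a=1 stays
gamma, beta = 0.9, 5.0

def soft_backup(V):
    """V(s) <- (1/beta) * log sum_a exp(beta * Q(s, a)), instead of max_a Q."""
    Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * E[V(s')]
    return np.log(np.exp(beta * Q).sum(axis=1)) / beta

V = np.zeros(2)
for _ in range(300):             # iterate the contraction to its fixed point
    V = soft_backup(V)
# The soft value exceeds the greedy fixed point (10.0 for this MDP) by at
# most log(|A|) / (beta * (1 - gamma)); larger beta recovers the hard max.
```

As beta grows the softmax approaches the greedy backup; the finite beta keeps the policy stochastic, which is what makes the smooth policy updates of the papers above possible.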
7. The case of shirt folding
Preparing a practical reward function is difficult
Tsurumine, Y., Cui, Y., Uchibe, E., and Matsubara, T. (2019). Deep reinforcement learning with smooth policy
update: Application to robotic cloth manipulation. Robotics and Autonomous Systems, 112: 72-83.
13. Problems with behavioral cloning
• The expert's and the learner's state-action distributions differ (covariate shift)
• Errors accumulate as the learner keeps acting, and it drifts away from the expert's distribution
– with no means of returning to the original distribution
Ross, S. & Bagnell, J.A. (2010). Efficient Reductions for Imitation Learning. In Proc. of AISTATS, 9:661–668.
Osa, T., Pajarinen, J., Neumann, G., Bagnell, J.A., Abbeel, P., & Peters, J. (2018). An Algorithmic Perspective on Imitation Learning. Foundations and Trends in Robotics, 7(1–2): 1–179.
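The compounding-error failure of behavioral cloning can be made concrete with a toy simulation. The 1-D dynamics, thresholds, and noise level below are invented for illustration and are not the reduction analyzed by Ross & Bagnell: a clone that is only correct on states seen in the expert data drifts once it leaves that region, and nothing pulls it back.

```python
import numpy as np

# Toy illustration of compounding error under covariate shift.  The expert
# resets a 1-D state to 0 every step; the behavioral clone imitates this
# correctly only on states it saw in training (|x| < 0.5).  Off-distribution
# it outputs a constant wrong action (+0.2), a stand-in for "arbitrary
# behavior where there is no data".  All numbers are invented.
rng = np.random.default_rng(0)

def rollout(on_distribution_everywhere, T=50, noise=0.3):
    x = 0.0
    for _ in range(T):
        if on_distribution_everywhere or abs(x) < 0.5:
            x = 0.0          # correct imitation: action -x restores the state
        else:
            x += 0.2         # off-distribution: an arbitrary wrong action
        x += rng.normal(0.0, noise)
    return abs(x)

expert_like = np.mean([rollout(True) for _ in range(300)])
clone = np.mean([rollout(False) for _ in range(300)])
# expert_like stays near 0; clone escapes the data region and drifts away
```

Interactive approaches such as DAgger attack exactly this gap by querying the expert on the states the learner actually visits.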
14. Generative Adversarial Networks (GANs)
• A model that generates data through competition between a generator and a discriminator
https://deephunt.in/the-gan-zoo-79597dc8c347
[Figure: the generator G(z) produces samples, and the discriminator D(x) tries to tell them apart from real data]
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014).
Generative Adversarial Nets. NeurIPS 27, 2672–2680.
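For a fixed generator, Goodfellow et al. show the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)), i.e. a logistic function of the log density ratio; this is the link between GAN discriminators, density-ratio estimation, and the adversarial IRL methods covered later. A small numpy check on two known 1-D Gaussians (the densities themselves are an illustrative assumption):

```python
import numpy as np

# Check that D*(x) = p_data / (p_data + p_g) equals the logistic function
# of the log density ratio, using two known 1-D Gaussians as an example.

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p_data = lambda x: normal_pdf(x, 0.0, 1.0)   # "real" data density
p_g    = lambda x: normal_pdf(x, 1.0, 1.0)   # generator density

xs = np.linspace(-3, 4, 200)
d_star = p_data(xs) / (p_data(xs) + p_g(xs))

# D*(x) = sigmoid(log p_data(x) - log p_g(x)): the optimal discriminator
# is a logistic function of the log density ratio.
logit = np.log(p_data(xs)) - np.log(p_g(xs))
d_sigmoid = 1.0 / (1.0 + np.exp(-logit))
# max |d_star - d_sigmoid| is ~0: the two forms agree
```

This is why a trained discriminator's logit can serve as a density-ratio (and hence reward) estimate in GAIL-style imitation learning.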
45. References
• Blondé, L., & Kalousis, A. (2019). Sample-Efficient Imitation Learning via Generative Adversarial Nets.
Proc. of the 22nd International Conference on Artificial Intelligence and Statistics, 3138–48.
• Finn, C., Christiano, P., Abbeel, P., and Levine, S. (2016). A Connection Between Generative
Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models. NIPS 2016
Workshop on Adversarial Training.
• Fu, J., Luo, K., and Levine, S. (2018). Learning robust rewards with adversarial inverse reinforcement
learning. In Proc. of ICLR.
• Fujimoto, S., van Hoof, H., & Meger, D. (2018). Addressing Function Approximation Error in Actor-
Critic Methods. Proc. of the 35th International Conference on Machine Learning.
• Henderson, P., Chang, W.-D., Bacon, P.-L., Meger, D., Pineau, J., & Precup, D. (2018). OptionGAN:
Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning.
In Proc. of AAAI.
• Hirakawa, T., Yamashita, T., Tamaki, T., Fujiyoshi, H., Umezu, Y., Takeuchi, I., Matsumoto, S., and
Yoda, K. (2018). Can AI predict animal movements? Filling gaps in animal trajectories using inverse
reinforcement learning. Ecosphere.
46. References
• Ho, J. & Ermon, S. (2016). Generative adversarial imitation learning. NIPS 29.
• Kalakrishnan, M., Pastor, P., Righetti, L., & Schaal, S. (2013). Learning objective functions for
manipulation. In Proc. of ICRA, 1331–1336.
• Kostrikov, I., Agrawal, K.K., Dwibedi, D., Levine, S., & Tompson, J. (2019). Discriminator-Actor-Critic:
Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning. Proc. of the 7th
ICLR.
• Kozuno, T., Uchibe, E., and Doya, K. (2019). Theoretical analysis of efficiency and robustness of
softmax and gap-increasing operators in reinforcement learning. In Proc. of AISTATS.
• Li, Y., Song, J., & Ermon, S. (2017). InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations. NIPS 30.
• Peng, X.B., Kanazawa, A., Toyer, S., Abbeel, P., & Levine, S. (2019). Variational Discriminator
Bottleneck: Improving Imitation Learning, Inverse RL, and GANs by Constraining Information Flow.
In Proc. of the 7th International Conference on Learning Representations. ICLR, 2019.
• Sasaki, F., Yohira, T., & Kawaguchi, A. (2019). Sample Efficient Imitation Learning for Continuous
Control. Proc. of the 7th International Conference on Learning Representations.
47. References
• Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal Value Function Approximators. In Proc.
of ICML, 1312–1320.
• Shimosaka, M., Kaneko, T., & Nishi, K. (2014). Modeling risk anticipation and defensive driving on
residential roads with inverse reinforcement learning. Proc. of the 17th International IEEE Conference
on Intelligent Transportation Systems, 1694–1700.
• Sugiyama, M., Suzuki, T., & Kanamori, T. (2012). Density ratio estimation in machine learning.
Cambridge University Press.
• Sun, M., & Ma, X. (2019). Adversarial Imitation Learning from Incomplete Demonstrations. In Proc. of
IJCAI, 2019.
• Suzuki, Y., Wee, W.M., & Nishioka, I. (2019). TV Advertisement Scheduling by Learning Expert
Intentions. In Proc. of the 25th ACM SIGKDD International Conference on Knowledge Discovery &
Data Mining, pp. 3071–81.
• Torabi, F., Warnell, G., & Stone, P. (2019). Generative Adversarial Imitation from Observation. ICML
2019 Workshop on Imitation, Intent, and Interaction.
• Uchibe, E. & Doya, K. (2014). Inverse reinforcement learning using dynamic policy programming. In
Proc. of ICDL and Epirob.
48. References
• Uchibe, E. (2018). Model-Free Deep Inverse Reinforcement Learning by Logistic Regression. Neural
Processing Letters, 47(3): 891-905.
• Uchibe, E. (2019). Imitation learning using entropy-regularized reinforcement learning. Proceedings of the 33rd Annual Conference of the Japanese Society for Artificial Intelligence.
• Uchibe, E. (2019). Imitation learning based on entropy-regularized forward and inverse
reinforcement learning. Proc. of RLDM.
• Uchibe, E., & Doya, K. (in preparation). Imitation learning based on entropy-regularized forward and
inverse reinforcement learning.
• Wulfmeier, M., Rao, D., Wang, D.Z., Ondruska, P., & Posner, I. (2017). Large-scale cost function
learning for path planning using deep inverse reinforcement learning. International Journal of
Robotics Research, vol. 36, no. 10: 1073–1087.
• Yamaguchi, S., Honda, N., Ikeda, M., Tsukada, Y., Nakano, S., Mori, I., and Ishii, S. (2018).
Identification of animal behavioral strategies by inverse reinforcement learning. PLoS Computational
Biology.