1) The document discusses recent advances in deep reinforcement learning algorithms for continuous control tasks. It examines factors such as network architecture, reward scaling, random seeds, environments, and codebases that impact the reproducibility of deep RL results.
2) It analyzes the performance of algorithms like ACKTR, PPO, DDPG, and TRPO on benchmarks like Hopper and HalfCheetah, and identifies unstable behaviors and unfair comparisons.
3) Simpler approaches like nearest neighbor policies are explored as alternatives to deep networks for solving continuous control tasks, especially in sparse reward settings.
ゼロから始める深層強化学習(NLP2018講演資料)/ Introduction of Deep Reinforcement Learning (Preferred Networks)
Introduction of Deep Reinforcement Learning, presented at a domestic NLP conference.
Presentation slides from the 24th Annual Meeting of the Association for Natural Language Processing (NLP2018).
http://www.anlp.jp/nlp2018/#tutorial
These slides were used by our colleague Umemoto in an internal technical study session.
They explain the Transformer, an architecture that has attracted much attention in recent years.
The "Arithmer Seminar" is held weekly; professionals from inside and outside our company lecture on their respective areas of expertise.
These slides were made by a lecturer from outside our company and are shared here with his/her permission.
Arithmer Inc. is a mathematics company founded out of the University of Tokyo Graduate School of Mathematical Sciences. We apply modern mathematics to introduce advanced new AI systems into solutions across many fields. Our job is to consider how to use AI effectively to streamline work and to produce results that benefit people.
Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research in modern mathematics and AI systems can provide solutions to tough, complex issues. At Arithmer, we believe it is our job to realize the potential of AI by improving work efficiency and producing more useful results for society.
ERATO感謝祭 Season IV
[Reference] Satoshi Hara and Takanori Maehara. Enumerate Lasso Solutions for Feature Selection. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI'17), pages 1985-1991, 2017.
Computational Motor Control: Reinforcement Learning (JAIST summer course) hirokazutanaka
This is the lecture 6 note for the JAIST summer school on computational motor control (Hirokazu Tanaka & Hiroyuki Kambara). Lecture video: https://www.youtube.com/watch?v=GHMcx5F0_j8
[DL輪読会]Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
1. DEEP LEARNING JP
[DL Papers]
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Koichiro Tamura, Matsuo Lab
http://deeplearning.jp/
2. Agenda
1. Paper Information
2. Problem to Solve
3. Abstract
4. Related Work
5. Action Elimination
6. Method
7. Experiment Results
3. PAPER INFORMATION
• Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
– Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor
– https://arxiv.org/abs/1809.02121
– Submitted on 6 Sep 2018
– Accepted at NIPS 2018
– In RL, learning is difficult when many actions are selectable. By introducing contextual multi-armed bandits and incorporating into deep reinforcement learning an Action-Elimination mechanism that learns which actions should not be taken, the paper achieves faster and more robust learning, and shows strong performance on games with huge action spaces such as `Zork`.
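The elimination idea above can be sketched as follows: an auxiliary model predicts, per action, the probability that the action is invalid, and the agent acts greedily only over the surviving actions. This is a minimal NumPy illustration, not the paper's architecture; the threshold and the elimination probabilities are assumed inputs.

```python
import numpy as np

def eliminate_then_act(q_values, elim_probs, threshold=0.5):
    """Illustrative action selection with action elimination.

    q_values   : per-action Q estimates from the value network.
    elim_probs : per-action probability that the action is invalid,
                 as an assumed auxiliary elimination model would predict.
    Actions whose elimination probability exceeds `threshold` are
    masked out; we act greedily over the survivors.
    """
    valid = elim_probs < threshold
    if not valid.any():            # never eliminate every action
        valid[:] = True
    masked = np.where(valid, q_values, -np.inf)
    return int(np.argmax(masked))

q = np.array([1.0, 3.0, 2.5])
p_elim = np.array([0.1, 0.9, 0.2])    # action 1 looks invalid
print(eliminate_then_act(q, p_elim))  # picks action 2, not the raw argmax 1
```

Note that without elimination the greedy choice would be action 1; masking it out changes the decision, which is the whole point of the mechanism.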
6. Related Work
• DRL with linear function approximation
– Update the value function with a linear function at the final layer of the DNN
• Shallow Updates for Deep Reinforcement Learning [Levine et al., 2017]
– Because deep RL training is unstable, this approach keeps the representational power of deep learning while separately updating only the final layer with a linear method
• Deep Bayesian Bandits Showdown [Riquelme et al., 2018] (ICLR2018)
– For contextual linear bandits, neuro-linear Thompson sampling performs well
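The "update only the final layer with a linear method" idea can be sketched in closed form: take penultimate-layer features and refit the last linear head by ridge regression against the targets. This is a minimal sketch in the spirit of Shallow Updates for Deep RL, with synthetic features standing in for a real network's activations.

```python
import numpy as np

def least_squares_head(features, targets, reg=1e-2):
    """Refit the final linear layer of a value network in closed form.

    features : (N, d) penultimate-layer activations phi(s) for a batch.
    targets  : (N,) regression targets (e.g. TD targets) for one action head.
    Returns w minimising ||features @ w - targets||^2 + reg * ||w||^2,
    i.e. a ridge-regression solution instead of gradient steps.
    """
    d = features.shape[1]
    A = features.T @ features + reg * np.eye(d)
    b = features.T @ targets
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
phi = rng.normal(size=(256, 8))          # stand-in for network features
w_true = rng.normal(size=8)
y = phi @ w_true + 0.01 * rng.normal(size=256)
w = least_squares_head(phi, y)
print(np.allclose(w, w_true, atol=0.1))  # the head weights are recovered
```

The closed-form solve is what makes the last-layer update stable: it avoids the noisy gradient dynamics that the slide cites as a weakness of end-to-end deep RL training.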
7. Related Work
• RL in Large Action Spaces
– Much existing work focuses on factorizing the action space into binary subspaces
– Fast reinforcement learning with large action sets using error-correcting output codes for MDP factorization [Dulac-Arnold et al., 2012]
• Proposes embedding a discrete action space into a continuous (differentiable) space
– Eliminating parts of the action space was itself proposed in Learning rates for Q-learning [Even-Dar, 2003]
• Probabilistically eliminates unpromising actions by learning confidence intervals for the value function at each state
• Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear [Lipton et al., 2016] argues for the importance of not forgetting dangerous states (those reached by unrecoverable actions)
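The confidence-interval elimination cited above can be illustrated concretely: an action is discarded once its upper confidence bound falls below the best action's lower confidence bound, i.e. it is provably suboptimal at that confidence level. This is a schematic sketch of the rule, not the cited paper's full Q-learning algorithm; the interval widths are assumed given.

```python
import numpy as np

def eliminate_by_confidence(means, widths):
    """Confidence-interval action elimination.

    means  : per-action value estimates.
    widths : per-action confidence radii, so the interval for action a
             is [means[a] - widths[a], means[a] + widths[a]].
    Returns a boolean mask of the actions that survive: an action is
    eliminated once its upper bound falls below the best lower bound.
    """
    lower = means - widths
    upper = means + widths
    best_lower = lower.max()
    return upper >= best_lower

means = np.array([1.0, 0.2, 0.9])
widths = np.array([0.1, 0.1, 0.3])
print(eliminate_by_confidence(means, widths))  # [ True False  True]
```

Action 1's interval [0.1, 0.3] lies entirely below action 0's lower bound 0.9, so it is eliminated; action 2 overlaps with action 0 and survives, which is why the rule is safe under the stated confidence level.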