This document provides an overview of POMDP (Partially Observable Markov Decision Process) and its applications. It first defines the key concepts of POMDP such as states, actions, observations, and belief states. It then uses the classic Tiger problem as an example to illustrate these concepts. The document discusses different approaches to solve POMDP problems, including model-based methods that learn the environment model from data and model-free reinforcement learning methods. Finally, it provides examples of applying POMDP to games like ViZDoom and robot navigation problems.
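To make the belief-state idea concrete, here is a minimal Python sketch of the Bayes belief update for the Tiger problem, assuming the standard textbook parameters (listening reports the correct door with probability 0.85); the function name and structure are illustrative, not taken from the summarized document.

```python
# Minimal belief update for the Tiger POMDP: two hidden states
# (tiger-left, tiger-right), and a "listen" action whose observation
# is correct with probability 0.85.

def update_belief(b_left, obs, p_correct=0.85):
    """Bayes update of P(tiger behind left door) after a 'listen' action.
    obs is 'hear-left' or 'hear-right'."""
    if obs == "hear-left":
        likelihood_left, likelihood_right = p_correct, 1 - p_correct
    else:
        likelihood_left, likelihood_right = 1 - p_correct, p_correct
    # Listening does not move the tiger, so the transition term drops out.
    unnorm_left = likelihood_left * b_left
    unnorm_right = likelihood_right * (1 - b_left)
    return unnorm_left / (unnorm_left + unnorm_right)

b = 0.5                              # uniform prior over the two doors
b = update_belief(b, "hear-left")    # -> 0.85
b = update_belief(b, "hear-left")    # -> ~0.97; repeated evidence sharpens the belief
print(b)
```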
[DL輪読会] Why is deep reinforcement learning difficult? Why Deep RL fails? A brief survey of recent works. Deep Learning JP
Deep reinforcement learning algorithms often fail to learn complex tasks. Recent works attribute this instability to the "deadly triad" of function approximation, bootstrapping, and off-policy learning, compounded by non-stationary targets, high variance, and strongly correlated samples. New algorithms aim to address these issues by improving exploration, stabilizing learning, and decorrelating updates. Overall, deep reinforcement learning remains a challenging area with room for more data-efficient and more generally applicable algorithms.
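As a rough illustration of two of those fixes, the sketch below pairs an experience-replay buffer (decorrelating updates) with a slowly synchronized target network (stabilizing the bootstrap targets); the names, sizes, and PyTorch framing are assumptions, not taken from any particular paper in the survey.

```python
# Sketch of two common stabilizers: experience replay to decorrelate
# updates, and a slowly synced target network for near-stationary targets.

import random
from collections import deque

import torch

buffer = deque(maxlen=100_000)  # replay buffer of (obs, act, rew, next_obs, done)

def sample(batch_size=32):
    # Uniform sampling breaks the temporal correlation of consecutive steps.
    return random.sample(buffer, batch_size)

def td_targets(batch, target_net, gamma=0.99):
    # Bootstrapped targets come from a frozen copy of the network, so they
    # do not shift with every gradient step on the online network.
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
    return rew + gamma * (1.0 - done) * next_q

def sync(target_net, online_net, tau=0.005):
    # Polyak averaging: the target network trails the online one slowly.
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.data.mul_(1 - tau).add_(tau * op.data)
```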
1. The document discusses energy-based models (EBMs) and how they can be applied to classifiers. It introduces noise contrastive estimation and flow contrastive estimation as methods to train EBMs.
2. One of the papers presented trains energy-based models using flow contrastive estimation: the EBM is trained to distinguish data from samples of a jointly trained flow-based generator, which serves as an adaptive noise distribution and allows the unnormalized EBM to be learned discriminatively.
3. Another paper argues that classifiers can be viewed as joint energy-based models over inputs and outputs, and should be treated as such. It introduces a method to train classifiers as EBMs using contrastive divergence (a minimal sketch of this reinterpretation follows below).
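A minimal sketch of that reinterpretation, assuming a plain PyTorch classifier (the network and shapes below are illustrative): the logits f(x)[y] double as negative energies, so an unnormalized log p(x) is a log-sum-exp over the logits.

```python
# Reading a standard classifier as a joint EBM: E(x, y) = -f(x)[y],
# so log p(x) = logsumexp_y f(x)[y] - log Z (Z unknown but constant).

import torch
import torch.nn as nn

f = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                  nn.Linear(256, 10))   # an ordinary 10-class classifier

x = torch.randn(4, 1, 28, 28)           # dummy batch of inputs
logits = f(x)                           # shape (4, 10)

energy_joint = -logits                            # E(x, y) = -f(x)[y]
log_p_x_unnorm = torch.logsumexp(logits, dim=1)   # log p(x) + log Z
log_p_y_given_x = logits.log_softmax(dim=1)       # the usual classifier head

# EBM training would add a term pushing log_p_x_unnorm up on data and down
# on model samples (e.g. via contrastive divergence with gradient-based
# sampling), alongside the standard cross-entropy on log_p_y_given_x.
```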
30. Learning models created

No. | Biological constraint | Actual processing
A | Jitter | With 10% probability, randomly shift the input-screen coordinates by -20 to 20 pixels for 0.5 s
B | Recognition delay | Feed the agent the screen from 0.1 s earlier
C | Action delay | Insert a 0.05 s delay into key operations
D | Tension | After 10 consecutive mistake-free actions, double the probability of taking a random action and insert a 0.05 s delay
E | None | No biological constraints
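The slides do not show implementation code, but constraints B (recognition delay) and C (action delay) could be realized as a simple wrapper around the game environment. The sketch below assumes a 10 ms control step and Gym-style reset/step methods; every name in it is hypothetical, and constraints A and D would be added analogously (a random coordinate offset on observations, and a mistake counter that scales the random-action probability).

```python
# Hypothetical wrapper delaying observations by ~0.1 s (constraint B)
# and actions by ~0.05 s (constraint C), assuming a 10 ms step.

from collections import deque

class BiologicalConstraintWrapper:
    def __init__(self, env, step_sec=0.01, obs_delay_sec=0.1, act_delay_sec=0.05):
        self.env = env
        self.obs_buf = deque(maxlen=int(obs_delay_sec / step_sec) + 1)
        self.act_buf = deque(maxlen=int(act_delay_sec / step_sec) + 1)

    def reset(self):
        obs = self.env.reset()
        self.obs_buf.clear()
        self.act_buf.clear()
        self.obs_buf.append(obs)
        return obs

    def step(self, action):
        # C: actions take effect ~0.05 s late; until the queue fills, no-op (0).
        self.act_buf.append(action)
        delayed_action = self.act_buf[0] if len(self.act_buf) == self.act_buf.maxlen else 0
        obs, reward, done, info = self.env.step(delayed_action)
        # B: the agent sees the frame from ~0.1 s earlier.
        self.obs_buf.append(obs)
        return self.obs_buf[0], reward, done, info
```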
42. Evaluation results (mean rating and standard deviation per condition):

A Jitter: 2.17 (SD 0.80)
B Recognition delay: 2.08 (SD 0.86)
C Action delay: 3.83 (SD 1.14)
D Tension: 3.08 (SD 0.76)
E None: 3.33 (SD 1.18)
F Beginner (human): 3.75 (SD 1.09)
G Expert (human): 4.33 (SD 0.75)

◆ Actual human play was largely recognized as human by the raters
◆ "Action delay" (C) received ratings as high as actual human play
(Figure: rated clips per condition; A-E are AI play videos, F-G are human play videos.)