Slides for: Koh Takeuchi, Ryo Nishida, Hisashi Kashima, Masaki Onishi. "Grab the Reins of Crowds: Estimating the Effects of Crowd Movement Guidance Using Causal Inference", AAMAS, 2021.
This document presents formulas for computing gradients and parameter updates in reinforcement learning. It covers three of them: the gradient of a value function with respect to its parameters, the gradient of a policy based on the reward and the value, and the gradient of a parameter vector that is a weighted combination of its previous value and the policy gradient.
- The document introduces Deep Counterfactual Regret Minimization (Deep CFR), an algorithm proposed by Noam Brown et al. at ICML 2019 that incorporates deep neural networks into Counterfactual Regret Minimization (CFR) for solving large imperfect-information games.
- CFR computes approximate Nash equilibria in two-player zero-sum games by iteratively minimizing cumulative counterfactual regret. In its tabular form it scales poorly, so very large games first have to be abstracted into a smaller game tree.
- Deep CFR removes the need for abstraction by using neural networks to approximate CFR's behavior across the full game tree, which lets it reach strong performance in large poker games that were previously handled only through abstraction.
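
The regret-minimization idea behind CFR can be illustrated with a minimal sketch of tabular regret matching in self-play on rock-paper-scissors. This is not Deep CFR and not full CFR: there is no game tree, no counterfactual weighting, and no neural network. The names `regret_matching`, `self_play`, `PAYOFF`, and the `initial_regret` parameter are hypothetical and chosen only for this illustration.

```python
# Row player's payoff for (row action, column action); 0=rock, 1=paper, 2=scissors.
# The game is zero-sum, so the column player's payoff is the negation.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def self_play(iterations, initial_regret=(1.0, 0.0, 0.0)):
    """Run regret matching for both players and return their average strategies.

    initial_regret biases player 0 toward rock at the start (an assumption made
    for this sketch) so that the dynamics are not trivially stuck at uniform.
    """
    regrets = [list(initial_regret), [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for _ in range(iterations):
        strategies = [regret_matching(r) for r in regrets]
        # Expected payoff of each pure action against the opponent's current strategy.
        action_utils = [
            [sum(PAYOFF[a][b] * strategies[1][b] for b in range(3)) for a in range(3)],
            [sum(-PAYOFF[a][b] * strategies[0][a] for a in range(3)) for b in range(3)],
        ]
        for p in range(2):
            expected = sum(s * u for s, u in zip(strategies[p], action_utils[p]))
            for a in range(3):
                # Regret: how much better action a would have done than the mix played.
                regrets[p][a] += action_utils[p][a] - expected
                strategy_sums[p][a] += strategies[p][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling `self_play(100_000)` returns one average strategy per player. In a two-player zero-sum game these averages approach a Nash equilibrium, which for rock-paper-scissors is the uniform strategy. Roughly speaking, CFR applies this same update at every information set of a game tree, and Deep CFR replaces the per-information-set regret tables with neural network approximations.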