This document summarizes a presentation on the bandit problem and algorithms to solve it. The presentation will:
1) Explain what the bandit problem is and provide a simple example.
2) Describe algorithms for solving the bandit problem, including epsilon-greedy and Thompson sampling.
3) Discuss how to apply bandit algorithms to problems that include contextual information.
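As a hedged illustration of the first of these algorithms, a minimal epsilon-greedy loop might look as follows. The arm set, reward functions, and parameter names here are assumptions for the sketch, not taken from the presentation:

```python
import random

def epsilon_greedy(arms, epsilon=0.1, n_rounds=1000):
    """Play `arms` (callables returning a reward) for n_rounds:
    with probability epsilon pick an arm uniformly at random (explore),
    otherwise pick the arm with the highest empirical mean (exploit)."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    total = 0.0
    for _ in range(n_rounds):
        if random.random() < epsilon or min(counts) == 0:
            i = random.randrange(len(arms))  # explore (or warm-start unseen arms)
        else:
            i = max(range(len(arms)), key=lambda j: sums[j] / counts[j])  # exploit
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return total, counts
```

Thompson sampling would replace the explore/exploit branch with a posterior sample per arm (e.g. a Beta posterior for Bernoulli rewards) and pick the arm with the highest sampled mean.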
The document summarizes recent research related to "theory of mind" in multi-agent reinforcement learning. It discusses three papers that propose methods for agents to infer the intentions of other agents by applying concepts from theory of mind:
1. The papers propose that in multi-agent reinforcement learning, being able to understand the intentions of other agents could help with cooperation and increase success rates.
2. The methods aim to estimate the intentions of other agents by modeling their beliefs and private information, using ideas from theory of mind in cognitive science. This involves inferring information about other agents that is not directly observable.
3. Bayesian inference is often used to reason about the beliefs, goals and private information of other agents based
The document discusses hyperparameter optimization in machine learning models. It introduces various hyperparameters that can affect model performance, and notes that as models become more complex, the number of hyperparameters increases, making manual tuning difficult. It formulates hyperparameter optimization as a black-box optimization problem to minimize validation loss and discusses challenges like high function evaluation costs and lack of gradient information.
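Since the validation loss is costly to evaluate and offers no gradients, derivative-free baselines are a natural starting point. A minimal sketch of one such baseline, random search (the objective and search space below are illustrative assumptions, not from the document):

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Black-box hyperparameter search: sample configurations uniformly
    from `space` (name -> (low, high)) and keep the one with the lowest
    validation loss. No gradient information is required."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = objective(cfg)  # each call stands for a (possibly expensive) model fit
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss
```

More sample-efficient methods (e.g. Bayesian optimization) keep this same black-box interface but choose each trial using a surrogate model of the loss.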
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
  arXiv: https://arxiv.org/abs/1606.09066
* GitHub: https://github.com/sato9hara/defragTrees
- The document presents a method for efficiently evaluating counterfactual policies using bandit feedback data.
- It proposes an efficient estimator that achieves the semiparametric efficiency bound, minimizing asymptotic variance among consistent estimators.
- The method involves first estimating choice probabilities from logged bandit data, then using these estimates in a two-step procedure to evaluate counterfactual policies while achieving optimal statistical efficiency.
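The paper's two-step efficient estimator is more involved, but its basic building block, off-policy value estimation by inverse propensity weighting (IPW), can be sketched as follows (a simplified illustration; the function and variable names are assumptions):

```python
import numpy as np

def ipw_value(rewards, logging_probs, target_probs):
    """Estimate the value of a target policy from logged bandit feedback:
    each logged reward is reweighted by the ratio of the target policy's
    probability to the logging policy's probability for the action taken."""
    weights = target_probs / logging_probs  # importance weights
    return np.mean(weights * rewards)
```

In the paper's setting, the logging probabilities themselves are first estimated from the data, and plugging those estimates into the second step is what yields the semiparametric efficiency gain over using the true probabilities directly.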
A slide deck introducing the paper "Validating Causal Inference Models via Influence Functions" by Alaa and van der Schaar, accepted at ICML 2019.
This document summarizes a paper titled "A Bandit Approach to Multiple Testing with False Discovery Control" by Jamieson and Jain from NIPS 2018. It introduces the problem of multiple hypothesis testing with the goals of controlling false discovery rate (FDR) and family-wise error rate (FWER) while maximizing true positive rate (TPR) and family-wise probability of detection (FWPD). It describes using an adaptive sampling strategy based on multi-armed bandits to achieve these goals with near-optimal sample complexity.
1) The paper introduces the influence function for interpreting black-box machine learning models. The influence function traces a model's predictions back to the training data by examining how the model's parameters would change if a particular training point was removed or perturbed.
2) The influence function approximates this change in parameters by assuming a quadratic approximation to the empirical risk function around the learned parameters and taking a single Newton step. It shows the parameter change due to removing a point is approximated by the influence function.
3) The paper demonstrates how the influence function can be used to understand model behavior, find adversarial examples, debug issues, and correct errors, among other applications. It also proposes practical methods to compute the influence function for
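In the notation commonly used for this line of work (Koh and Liang, 2017), the quadratic-approximation argument above can be written as follows; this is a reconstruction from the summary, not copied from the slides:

```latex
% Empirical risk minimizer over n training points z_i
\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} L(z_i, \theta)

% Influence of upweighting a training point z
% (H is the Hessian of the empirical risk at the learned parameters)
\mathcal{I}_{\mathrm{up,params}}(z)
  = \left.\frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\right|_{\epsilon=0}
  = -H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n} \sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})

% Removing z corresponds to setting \epsilon = -1/n, so the parameter change
% from one Newton step is approximately
\hat{\theta}_{-z} - \hat{\theta}
  \approx -\frac{1}{n}\, \mathcal{I}_{\mathrm{up,params}}(z)
  = \frac{1}{n}\, H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta})
```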
This document discusses methods for one-shot learning using siamese neural networks. It provides an overview of several key papers in this area, including using siamese networks for signature verification (1993) and one-shot image recognition (2015), and introducing matching networks for one-shot learning (2016). Matching networks incorporate an attention mechanism into a neural network to rapidly learn from small datasets by matching training and test conditions. The document also reviews experiments demonstrating one-shot and few-shot learning on datasets like Omniglot using these siamese and matching network approaches.
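The attention mechanism in matching networks can be sketched in a simplified form: classify a query by a softmax over its similarities to the support (training) examples, then average their one-hot labels. This omits the paper's learned embedding networks and full-context embeddings; it is only a sketch of the attention step, with assumed names:

```python
import numpy as np

def matching_net_predict(support_x, support_y, query_x, n_classes):
    """Matching-network-style one-shot prediction (simplified):
    softmax attention over cosine similarities to the support set,
    followed by a weighted sum of one-hot support labels."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalize(query_x) @ normalize(support_x).T      # (n_query, n_support)
    att = np.exp(sims) / np.exp(sims).sum(axis=1, keepdims=True)  # softmax attention
    onehot = np.eye(n_classes)[support_y]                   # (n_support, n_classes)
    return att @ onehot                                     # class probabilities per query
```

Because prediction is a weighted vote over the support set itself, the model can classify new classes at test time from a handful of labeled examples without any parameter updates, which is what "matching training and test conditions" refers to.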
18. A policy based on more efficient scoring
The LUCB policy uses scoring to reduce the extra pulls incurred by uniform selection.
However, because each iteration includes the step "pull both arm 𝑖∗ and arm 𝑖∗∗", it suffers from the opposite problem of successive-elimination policies: the current best arm ends up being pulled excessively often.
The UGapE policy instead selects, at each iteration, only whichever of arm 𝑖∗ and arm 𝑖∗∗ has the smaller sample count (i.e. the greater uncertainty in its estimated mean).
V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: a unified approach to fixed budget and fixed confidence. In Advances in Neural Information Processing Systems (NIPS), 2012.
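The selection rule described above can be sketched as a single step in code. This is a simplified LUCB/UGapE-style rule with generic Hoeffding-type confidence radii, not the exact exploration rate from the paper; all names and the choice of radius are assumptions:

```python
import math

def ugape_select(means, counts, t, delta=0.05):
    """One UGapE-style selection step (sketch): identify the empirically
    best arm i* and its strongest challenger i**, then return whichever
    of the two has fewer samples (i.e. the more uncertain mean estimate)."""
    K = len(means)
    # Hoeffding-type confidence radius for each arm (illustrative choice)
    beta = [math.sqrt(math.log(4 * K * t ** 2 / delta) / (2 * counts[i]))
            for i in range(K)]
    ucb = [means[i] + beta[i] for i in range(K)]
    lcb = [means[i] - beta[i] for i in range(K)]
    i_star = max(range(K), key=lambda i: lcb[i])  # best arm by lower bound
    i_sstar = max((i for i in range(K) if i != i_star),
                  key=lambda i: ucb[i])           # strongest challenger by upper bound
    return i_star if counts[i_star] <= counts[i_sstar] else i_sstar
```

Note how, unlike LUCB, only one of the two candidate arms is pulled per iteration, which avoids over-sampling the current best arm.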
47. English-language references
- V. Gabillon, M. Ghavamzadeh, and A. Lazaric. Best arm identification: a unified approach to fixed
budget and fixed confidence. In Advances in Neural Information Processing Systems (NIPS), 2012.
- K. Jamieson, M. Malloy, R. Nowak, and S. Bubeck. lil’UCB: an optimal exploration algorithm for
multi-armed bandits. In Conference on Learning Theory (COLT), 2014.
- K. Jamieson and L. Jain. A Bandit Approach to Sequential Experimental Design with False Discovery Control. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
- J. Hahn, K. Hirano, and D. Karlan. Adaptive Experimental Design Using the Propensity Score. Journal of Business and Economic Statistics, 2009.
- Y. Narita, S. Yasui, and K. Yata. Efficient Counterfactual Learning from Bandit Feedback. In AAAI, 2019.