Paper introduction:
Tang, Liang, et al. "Ensemble contextual bandits for personalized recommendation." Proceedings of the 8th ACM Conference on Recommender Systems. ACM, 2014.
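28. Experimental setup
• Data
- Yahoo! Today News access logs
- Data provided for KDD Cup 2012 Online Advertising
• Evaluation method
- Yahoo! Today News: evaluated with a measure called the Replayer Method
→ Evaluates a policy using logs of past user visits (next page)
- KDD Cup 2012: from the visit logs, the click probability for pulling
each arm in each context is learned with logistic regression, and
the predicted CTR is used to run a simulation
→ The logged recommendations were not made uniformly at random, so the Replayer Method cannot be used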
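The KDD Cup 2012 evaluation above can be illustrated with a minimal sketch. It assumes one logistic model per arm, fitted here by plain SGD, and Bernoulli clicks drawn from the predicted CTR. The function names (`fit_logistic`, `predicted_ctr`, `simulate`), the learning rate, and the feature layout are hypothetical choices for illustration, not the setup used in the paper.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predicted_ctr(w, x):
    # Predicted click probability for context features x under weights w.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(samples, dim, lr=0.1, epochs=200):
    # Fit weights by SGD on the log loss.
    # samples: list of (features, click) pairs logged for ONE arm.
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in samples:
            p = predicted_ctr(w, x)
            for i in range(dim):
                w[i] += lr * (y - p) * x[i]
    return w


def simulate(policy, models, contexts, rng):
    # Run a policy against per-arm CTR models.
    # policy(x) returns an arm index; a click is a Bernoulli draw
    # with the predicted CTR of the chosen arm.
    clicks = 0
    for x in contexts:
        arm = policy(x)
        if rng.random() < predicted_ctr(models[arm], x):
            clicks += 1
    return clicks / len(contexts)
```

A bandit policy would replace the fixed `policy` callable and would also need an update step after each simulated click; that part is omitted here to keep the sketch short.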
O. Chapelle and L. Li. An empirical evaluation of Thompson sampling. In NIPS, 2011.
29. Replayer Method
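A minimal sketch of replay-style offline evaluation, as generally described by Li et al. (WSDM 2011): walk through logged events that were collected under uniformly random recommendation, keep only the events where the evaluated policy picks the same arm as the log, and average the rewards of those events. The names `replay_evaluate`, `policy_choose`, and `policy_update`, and the `(context, shown_arm, reward)` log layout, are assumptions made for illustration.

```python
def replay_evaluate(policy_choose, policy_update, log):
    # log: iterable of (context, shown_arm, reward) tuples, assumed to be
    # collected with arms shown uniformly at random.
    # Returns (estimated CTR, number of matched events).
    total_reward = 0.0
    matched = 0
    for context, shown_arm, reward in log:
        chosen = policy_choose(context)
        if chosen != shown_arm:
            continue  # the log holds no feedback for this choice: discard
        policy_update(context, chosen, reward)  # learn only from matched events
        total_reward += reward
        matched += 1
    ctr = total_reward / matched if matched else 0.0
    return ctr, matched


# Toy usage: a fixed policy that always recommends arm 0.
toy_log = [("u1", 0, 1), ("u1", 1, 0), ("u2", 0, 0), ("u2", 1, 1), ("u3", 0, 1)]
ctr, matched = replay_evaluate(lambda c: 0, lambda c, a, r: None, toy_log)
```

Only the matched events contribute, so with K arms roughly 1/K of the log is used. This is also why the method needs uniformly random logging.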
L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of
contextual-bandit-based news article recommendation algorithms. In WSDM, 2011.