Bandit Algorithm Study Group
2014/05/30, Session 3
@a_macbee
A survey of bandit-related papers
Papers covered today
• Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007, January). Bandits for Taxonomies: A Model-based Approach. In SDM.
• Chakrabarti, D., Kumar, R., Radlinski, F., & Upfal, E. (2008). Mortal Multi-Armed Bandits. In NIPS (pp. 273-280).
Bandits for Taxonomies:
A Model-based Approach
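The Bandit Problem in Advertising
• Rewards in advertising: CTR, conversions, ...
• Because CTR is extremely low, exploration yields almost no information about an arm
• When the arms differ only slightly, wasted exploration increases, and the result can even be worse than epsilon-greedy
• Convergence becomes slower and slower as the number of arms grows
• Can structural information be used to solve this problem?

→ Focus on taxonomies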
arms=[0.1, 0.1, 0.1, 0.1, 0.12]
Number of simulations: 5000
Number of pulls per simulation: 250
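The setting above (five arms with nearly identical payoff probabilities and only 250 pulls) can be tried out with a small simulation. The following is a minimal sketch, not the code used for the slides: the Bernoulli reward model, `epsilon=0.1`, the UCB1-style index, and all function names are assumptions made for illustration.

```python
import math
import random

def pull(arms, arm, rng):
    """Bernoulli reward: 1 with probability arms[arm], otherwise 0."""
    return 1 if rng.random() < arms[arm] else 0

def epsilon_greedy(arms, horizon, rng, epsilon=0.1):
    """One epsilon-greedy run; returns the total reward over `horizon` pulls."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    total = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arms))                       # explore
        else:
            arm = max(range(len(arms)), key=lambda a: means[a])  # exploit
        reward = pull(arms, arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]        # running mean
        total += reward
    return total

def ucb1(arms, horizon, rng):
    """One UCB1-style run; returns the total reward over `horizon` pulls."""
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    total = 0
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = t - 1                                          # pull every arm once first
        else:
            arm = max(range(len(arms)),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arms, arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return total

def average_reward(policy, arms, horizon, n_sims, seed=0):
    """Average total reward of `policy` over `n_sims` independent runs."""
    rng = random.Random(seed)
    return sum(policy(arms, horizon, rng) for _ in range(n_sims)) / n_sims

if __name__ == "__main__":
    arms = [0.1, 0.1, 0.1, 0.1, 0.12]
    print("epsilon-greedy:", average_reward(epsilon_greedy, arms, 250, 5000))
    print("UCB1:          ", average_reward(ucb1, arms, 250, 5000))
```

Because every arm pays between 0.10 and 0.12 per pull, any policy averages between 25 and 30 over 250 pulls, which is why the difference between algorithms is hard to see in this regime.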
1. Identify the parent class of the page (block or row)
2. Identify the best ad parent class for the page's parent class
3. Among the ads belonging to that parent class, identify the best ad (root → leaf)
Fewer arms to explore → the best arm can be found sooner
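Steps 2 and 3 above can be illustrated with a two-level bandit: first choose an ad class, then choose an ad inside that class. This is only a sketch under simplifying assumptions, not the model-based method of the paper: the page's parent class is taken as already fixed (step 1), UCB1 is used at both levels, rewards are Bernoulli clicks, and the CTR values and function names are hypothetical.

```python
import math
import random

def ucb1_pick(counts, sums, total_pulls):
    """Index with the highest UCB1 score; options never tried are picked first."""
    for i, c in enumerate(counts):
        if c == 0:
            return i
    log_t = math.log(max(total_pulls, 1))
    return max(range(len(counts)),
               key=lambda i: sums[i] / counts[i] + math.sqrt(2 * log_t / counts[i]))

def two_level_bandit(ad_classes, horizon, rng):
    """ad_classes[c][a] is the (hypothetical) true CTR of ad a in ad class c.

    Returns (total_reward, pulls_per_class).
    """
    class_counts = [0] * len(ad_classes)
    class_sums = [0.0] * len(ad_classes)
    ad_counts = [[0] * len(ads) for ads in ad_classes]
    ad_sums = [[0.0] * len(ads) for ads in ad_classes]
    total = 0
    for t in range(1, horizon + 1):
        c = ucb1_pick(class_counts, class_sums, t)                    # step 2: ad class
        a = ucb1_pick(ad_counts[c], ad_sums[c], class_counts[c] + 1)  # step 3: ad in class
        reward = 1 if rng.random() < ad_classes[c][a] else 0
        class_counts[c] += 1
        class_sums[c] += reward
        ad_counts[c][a] += 1
        ad_sums[c][a] += reward
        total += reward
    return total, class_counts

if __name__ == "__main__":
    classes = [[0.01, 0.02], [0.10, 0.12, 0.08]]   # hypothetical CTRs
    print(two_level_bandit(classes, 25000, random.Random(0)))
```

The class-level statistics pool the clicks of all ads in a class, so a poor class can be deprioritised without exploring each of its ads individually.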
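Update rule for each block
$$\alpha \cdot \mathrm{Prior}_{\mathrm{block}} + (1-\alpha) \cdot \frac{S_{\mathrm{cell}}}{N_{\mathrm{cell}}}$$
$\mathrm{Prior}_{\mathrm{block}}$: CTR of the block
$S_{\mathrm{cell}}$: number of clicks in the cell
$N_{\mathrm{cell}}$: number of impressions in the cell
→ ($S_{\mathrm{cell}} / N_{\mathrm{cell}}$: observed CTR)
$\alpha$: an arbitrary value ($0.0 \le \alpha \le 1.0$)
Note: a similar formula appears on p. 121 of the bandit book;
see the same page for a discussion of the choice of α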
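The update rule above is a weighted average of the block-level prior and the CTR observed in the cell. A minimal sketch follows; the function name and the fallback to the prior when a cell has no impressions are assumptions, not part of the slides.

```python
def shrunk_ctr(prior_block, clicks_cell, impressions_cell, alpha):
    """alpha * Prior_block + (1 - alpha) * S_cell / N_cell."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0.0, 1.0]")
    if impressions_cell == 0:
        # Assumption: with no observations, fall back to the block prior.
        return prior_block
    observed_ctr = clicks_cell / impressions_cell
    return alpha * prior_block + (1 - alpha) * observed_ctr

if __name__ == "__main__":
    # Block CTR 2%, cell with 1 click in 10 impressions, alpha = 0.8
    print(shrunk_ctr(0.02, 1, 10, 0.8))
```

With α close to 1 the estimate stays near the block CTR, which helps while a cell has few impressions; with α close to 0 it follows the cell's own observed CTR.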
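Experiments
Note: on the number of ad and page classes
Experimental setup
• One day of log data (230 million impressions) is used
• Arms are pulled 25,000 times
• 40 simulation runs
• The following three methods are compared
• Multi-level (proposed method)
• UCB1
• Round-robin
  Note: UCB1 or Round-robin is applied after the page's parent class has been identified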
Comparison of average reward
[Figure (a): Revenue profile. Revenue (0 to 1200) versus number of pulls (0 to 25,000) for Multi-level, UCB1, and Round-robin.]
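Summary
• When the bandit problem is formulated with CTR as the reward, the difficulty is how little information exploration provides
• It is effective to classify both the web pages where ads are served and the ads themselves into classes so that CTRs are correlated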
Mortal Multi-Armed
Bandits
The Bandit Problem in Advertising
• Differences from the standard bandit problem
• The reward paid by an arm changes over time
• Arms eventually disappear
  → Build this into the bandit algorithm

    Mortal Multi-Armed Bandit
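Modeling the "death rate" of arms
• Budget death:
  the arm dies once it has been shown more than its allowed number of times (lifetime: L)
• Timed death:
  the number of times the ad can be shown is determined by the probability p that the ad is stopped (L=1/p)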
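The two death models can be sketched as lifetime generators. This is an illustrative sketch only: it assumes that under timed death the ad is stopped with probability p after each display, so that the number of displays is geometric with mean 1/p; the function names are hypothetical.

```python
import random

def budget_lifetime(L):
    """Budget death: the arm can be shown exactly L times."""
    return L

def timed_lifetime(p, rng):
    """Timed death: after each display the ad is stopped with probability p.

    The returned number of displays is geometric with mean 1/p.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    shows = 1
    while rng.random() >= p:
        shows += 1
    return shows

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [timed_lifetime(0.01, rng) for _ in range(10000)]
    print(sum(samples) / len(samples))  # close to L = 1/p = 100
```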
Key idea
• Examine the cumulative distribution of the rewards paid by the arms
[Figure 2(a): Ad Payoff Distribution, the distribution of real-world ad payoffs (scaled linearly). Axes: payoff probability (scaled) versus fraction of arms.]
Payoff probability and the fraction of arms paying off with that probability
↓
Choose the threshold μ from the viewpoint of the exploration-exploitation trade-off
If the threshold is μ=0.6:
• The arm's payoff probability is 0.6 or higher → exploit that arm
• The arm's payoff probability is below 0.6 → explore a different arm
Preliminary analysis
• Examine the cumulative distribution of the rewards paid by the arms
Choose the threshold μ from the viewpoint of the exploration-exploitation trade-off, using the following function from the paper.
The expected reward X of a new arm is drawn from a cumulative distribution F(µ) with support in [0, 1], and the expected lifetime of an arm is L = 1/p:
$$\Gamma(\mu) = \frac{E[X] + (1 - F(\mu))(L - 1)\,E[X \mid X \ge \mu]}{1 + (1 - F(\mu))(L - 1)} \qquad (1)$$
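The function Γ(µ) above can be evaluated numerically once a sample of arm payoff probabilities is available. The sketch below is an illustration under assumptions: the payoff distribution is represented by a plain list of payoff probabilities (an empirical distribution), the fraction of arms with X ≥ µ stands in for 1 − F(µ), and the function names are hypothetical.

```python
def gamma(mu, payoffs, L):
    """Evaluate the trade-off function Gamma(mu) on an empirical payoff sample."""
    mean_all = sum(payoffs) / len(payoffs)           # E[X]
    good = [x for x in payoffs if x >= mu]
    if not good:
        return mean_all                              # no arm reaches the threshold
    tail = len(good) / len(payoffs)                  # stands in for 1 - F(mu)
    mean_good = sum(good) / len(good)                # E[X | X >= mu]
    return (mean_all + tail * (L - 1) * mean_good) / (1 + tail * (L - 1))

def best_threshold(payoffs, L):
    """Threshold mu* that maximises Gamma over the observed payoff values."""
    return max(sorted(set(payoffs)), key=lambda mu: gamma(mu, payoffs, L))

if __name__ == "__main__":
    sample = [0.1, 0.2, 0.9]                         # hypothetical payoff probabilities
    print(best_threshold(sample, L=100), gamma(0.9, sample, L=100))
```

With L = 1 every threshold gives Γ(µ) = E[X], since an arm dies before it can be exploited; longer lifetimes make it worthwhile to hold out for better arms.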
See the paper for details!
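What is the probability that the arm just pulled pays a reward?
↓
Estimate this as efficiently as possible
Of the algorithms presented in the paper, one that performs well is introduced here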
Stoch. with Early Stopping
Algorithm STOCH. WITH EARLY STOPPING
input: Distribution F(µ), expected lifetime L
µ* ← argmax_µ Γ(µ)   [Γ is defined in (1)]
while we keep playing
    [Play random arm as long as necessary]
    i ← random new arm; r ← 0; d ← 0
    while d < n and n − d ≥ nµ* − r
        Pull arm i; r ← r + R(µ_i); d ← d + 1
    end while
    if r > nµ*
        [If it is good, stay with it forever]
        Pull arm i every turn until it dies
    end if
end while
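The pseudocode above can be turned into a small simulation. The following is a sketch under assumptions rather than the paper's implementation: rewards R(µ_i) are modelled as Bernoulli draws with mean µ_i, `new_arm` is a hypothetical callback returning a fresh arm as `(payoff_probability, lifetime)` with lifetime at least 1, and the run stops after `horizon` pulls.

```python
import random

def stoch_early_stopping(new_arm, mu_star, n, horizon, rng):
    """Simulate STOCH. WITH EARLY STOPPING; returns the total reward collected."""
    if n < 1 or not 0.0 <= mu_star <= 1.0:
        raise ValueError("need n >= 1 and mu_star in [0, 1]")
    total = 0
    t = 0
    while t < horizon:
        mu_i, life = new_arm()          # i <- random new arm
        r = 0                           # reward collected from arm i
        d = 0                           # pulls of arm i so far
        # Trial phase: stop early once even n - d further successes
        # could not lift r up to n * mu_star.
        while d < n and n - d >= n * mu_star - r and life > 0 and t < horizon:
            reward = 1 if rng.random() < mu_i else 0
            r += reward
            d += 1
            life -= 1
            t += 1
            total += reward
        if r > n * mu_star:
            # The arm looks good: pull it every turn until it dies.
            while life > 0 and t < horizon:
                total += 1 if rng.random() < mu_i else 0
                life -= 1
                t += 1
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    arm_source = lambda: (rng.choice([0.05, 0.1, 0.6]), 200)  # hypothetical arms
    print(stoch_early_stopping(arm_source, mu_star=0.5, n=20, horizon=5000, rng=rng))
```

The early-stopping test `n - d >= n * mu_star - r` is what saves pulls on poor arms: an arm that has paid nothing is dropped as soon as the remaining trial pulls cannot reach the threshold.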
Input
Cumulative distribution: F(μ)
Lifetime: L
Threshold implied by the cumulative distribution, used to decide when to exploit: μ*
Pull at random an arm that has never been pulled before: arm i
Keep pulling arm i while
(n − number of pulls of arm i so far) ≥ (n × threshold − cumulative reward obtained from arm i),
where n is the number of trial pulls allotted to a new arm
If the reward obtained by pulling the arm is low, the loop exits early
If the cumulative reward obtained from arm i > n × threshold reward,
keep pulling arm i until its lifetime runs out
Then, once again, pull at random an arm that has never been pulled before: arm i
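Experiments
Experimental setup
• Ads from 300 shopping classes
• Lifetime varied between 100 and 100,000
• Regret per time step is measured
• Regret:
  the value that would have been obtained by choosing the best arm among those currently available, minus the value actually obtained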
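The regret definition above can be written down directly. This is a minimal sketch assuming that the "value" of an arm is its expected payoff and that the set of arms alive at each step is known to the evaluator; the function and argument names are hypothetical.

```python
def regret_per_step(alive_payoffs, chosen_payoffs):
    """Average over steps of (best payoff among alive arms - payoff of the chosen arm).

    alive_payoffs[t]  : expected payoffs of all arms alive at step t
    chosen_payoffs[t] : expected payoff of the arm actually pulled at step t
    """
    if not alive_payoffs or len(alive_payoffs) != len(chosen_payoffs):
        raise ValueError("need one non-empty set of alive arms per step")
    regrets = [max(alive) - chosen
               for alive, chosen in zip(alive_payoffs, chosen_payoffs)]
    return sum(regrets) / len(regrets)

if __name__ == "__main__":
    # Step 1: best alive arm (0.5) was chosen; step 2: a new 0.9 arm existed but 0.1 was pulled.
    print(regret_per_step([[0.1, 0.5], [0.1, 0.5, 0.9]], [0.5, 0.1]))
```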
[Figure (b): Regret per time step (0 to 0.5) versus expected arm lifetime (100 to 100,000) for Stochastic, Stochastic with Early Stopping, AdaptiveGreedy, UCB1, and UCB1-k/c.]
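Summary
• Introduces the concept of arm lifetime into bandit algorithms
• The idea behind "Stoch. with Early Stopping", to use a good arm for its entire lifetime, is more effective than UCB1
  → UCB1 should give good final results only when arms can be pulled indefinitely and the number of tries is unlimited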
References
• Introduction to Computational Advertising - Stanford University
  http://www.stanford.edu/class/msande239/

バンディットアルゴリズム勉強会

  • 3. 本日紹介する論文 • Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007, January). Bandits for Taxonomies: A Model-based Approach. In SDM. • Chakrabarti, D., Kumar, R., Radlinski, F., & Upfal, E. (2008). Mortal Multi-Armed Bandits. In NIPS (pp. 273-280).
  • 4. Bandits for Taxonomies: A Model-based Approach
  • 5. 広告におけるBandit Problem • 広告における報酬:CTR,Conversion… • CTRが非常に低いため,探索によって得られる腕の 情報量がほとんどない • 腕同士の差異が小さい場合は,無駄な探索が増え, 結果としてepsilon-greedyよりも劣る場合もある • 腕が増えるにつれ,収束もどんどん遅くなる • 構造的な情報を利用することで,この問題を解決する ことが出来ないか
 → 分類体系(Taxonomies)に注目
  • 6. 広告におけるBandit Problem • 広告における報酬:CTR,Conversion… • CTRが非常に低いため,探索によって得られる腕の 情報量がほとんどない • 腕同士の差異が小さい場合は,無駄な探索が増え, 結果としてepsilon-greedyよりも劣る場合もある • 腕が増えるにつれ,収束もどんどん遅くなる • 構造的な情報を利用することで,この問題を解決する ことが出来ないか
 → 分類体系(Taxonomies)に注目 arms=[0.1, 0.1, 0.1, 0.1, 0.12] シミュレーション回数:5000 腕がひける回数:250
  • 7. 広告におけるBandit Problem • 広告における報酬:CTR,Conversion… • CTRが非常に低いため,探索によって得られる腕の 情報量がほとんどない • 腕同士の差異が小さい場合は,無駄な探索が増え, 結果としてepsilon-greedyよりも劣る場合もある • 腕が増えるにつれ,収束もどんどん遅くなる • 構造的な情報を利用することで,この問題を解決する ことが出来ないか
 → 分類体系(Taxonomies)に注目
  • 8.
  • 9. 1. pageのparent class を同定する (Block or row) 2. pageのparent classにとって最適な adのparent classを同定する 3. 同じparant classに属するadの中 から,最適なadを同定 (root→leaf) 探索する腕の数が減少 → ベストな腕を早く探せる
  • 10. 各ブロックの更新式 α.Priorblock + (1-α).Scell / Ncell Priorblock: ブロックのCTR Scell: セルのクリック数 Ncell: セルのインプレッション数 → (Scell / Ncell: 観測されたCTR) α: 任意の値(0.0 α 1.0) ※バンディット本 p.121 に類似した式が掲載されている 任意の値αについての議論は同ページ参照
  • 13. 実験内容 • 1日分のログデータ( 2.3億imp)を利用 • 25,000回腕をひく • 40回のシミュレーション • 以下の3つを比較 • Multi-level (提案手法) • UCB1 • Round-robin   ※pageのparent class 同定後にUCB or Round-robin
  • 14. 平均報酬の比較 size for set U. For U = ik, U = B⇡ik and for U = R(i; +), U = v B not fit in B⇡ik , ˜pik = ˆpik and U = 0 for U = ik, R(i; +). Figure 8: UCB1 w/ shrinkage pol 0 200 400 600 800 1000 1200 0 5000 10000 15000 20000 25000 Revenue Number of pulls Multi-level UCB1 Round-robin 2 3 4 5 6 7 8 9 10 11 10000 MSE M Ro (a) Revenue profile
  • 17. 広告におけるBandit Problem • 一般的なBandit Problemとの相違点 • 腕が払う報酬は時間毎に変化 • 腕はいずれなくなる   → Bandit Algorithmの中に組み込む
     Mortal Multi-Armed Bandit
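  • 18. Modeling the death rate of arms • Budget death:
 the arm dies once it has been shown more than its allowed number of displays (lifetime: L) • Timed death:
 the number of displays is determined by the probability p that the ad is stopped (L = 1/p)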
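The two death models can be sketched as follows. For timed death, this sketch assumes the arm is stopped independently with probability p at every step, which gives a geometric lifetime with mean 1/p (a discrete-time stand-in chosen here for simplicity). The names `sample_timed_lifetime` and `BudgetArm` are illustrative.

```python
import random


def sample_timed_lifetime(p, rng):
    """Timed death: the arm is stopped with probability p after each step."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    steps = 1
    while rng.random() >= p:  # survives this step with probability 1 - p
        steps += 1
    return steps


class BudgetArm:
    """Budget death: the arm can be shown at most `lifetime` times."""

    def __init__(self, payoff_prob, lifetime):
        self.payoff_prob = payoff_prob
        self.remaining = lifetime

    @property
    def alive(self):
        return self.remaining > 0

    def pull(self, rng):
        if not self.alive:
            raise RuntimeError("arm has exhausted its budget")
        self.remaining -= 1
        return 1 if rng.random() < self.payoff_prob else 0
```

Averaging many samples of `sample_timed_lifetime(0.1, rng)` gives a value close to 10, matching L = 1/p.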
  • 19. Key idea • Examine the cumulative distribution of the rewards paid by arms • [Figure 2(a): Ad Payoff Distribution; x-axis: payoff probability (scaled), y-axis: fraction of arms. Caption: "Distribution of real world ad payoffs, scaled linearly […]"] • The figure shows the probability that a payoff occurs and the fraction of arms with that probability ↓ The threshold μ is chosen in view of the exploration/exploitation trade-off
  • 20. Key idea • Examine the cumulative distribution of the rewards paid by arms • [Figure 2(a): Ad Payoff Distribution; x-axis: payoff probability (scaled), y-axis: fraction of arms] • If the threshold is μ = 0.6: • the arm pays a reward with probability 0.6 or more → exploit that arm • the arm pays a reward with probability below 0.6 → explore another arm
  • 21. Preliminary study • Examine the cumulative distribution of the rewards paid by arms • [Figure 2(a): Ad Payoff Distribution; x-axis: payoff probability (scaled), y-axis: fraction of arms] • The figure shows the probability that a payoff occurs and the fraction of arms with that probability ↓ The threshold μ is chosen in view of the exploration/exploitation trade-off • Paper excerpt (cropped in the slide): payoffs come from a cumulative distribution F(μ) with support in [0, 1]; the lifetime of an arm has an exponential distribution with parameter p, and its expectation is denoted L = 1/p; the following function […] between exploration and exploitation and plays a major role […]:
 Γ(μ) = (E[X] + (1 − F(μ))(L − 1) E[X | X ≥ μ]) / (1 + (1 − F(μ))(L − 1))   (1)
  • 22. Preliminary study (continued) • See the paper for details!
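To make the role of the threshold concrete, the function from the paper excerpt (written Γ(μ) in the paper) can be evaluated on an empirical list of arm payoff probabilities. This is a sketch under assumptions: 1 − F(μ) is taken to be the fraction of arms with payoff ≥ μ, candidate thresholds are restricted to the observed payoffs, and the names `gamma` and `best_threshold` are illustrative.

```python
def gamma(mu, payoffs, lifetime):
    """Evaluate (E[X] + (1-F(mu))(L-1)E[X|X>=mu]) / (1 + (1-F(mu))(L-1))."""
    mean_all = sum(payoffs) / len(payoffs)
    good = [x for x in payoffs if x >= mu]
    if not good:
        return mean_all  # no arm reaches mu, so the tail term vanishes
    tail_prob = len(good) / len(payoffs)  # empirical 1 - F(mu)
    mean_good = sum(good) / len(good)     # empirical E[X | X >= mu]
    weight = tail_prob * (lifetime - 1)
    return (mean_all + weight * mean_good) / (1.0 + weight)


def best_threshold(payoffs, lifetime):
    """Return the observed payoff value that maximizes gamma."""
    candidates = sorted(set(payoffs))
    return max(candidates, key=lambda mu: gamma(mu, payoffs, lifetime))
```

With payoffs `[0.1, 0.5, 0.9]` and L = 101, the candidates 0.1, 0.5, and 0.9 score roughly 0.50, 0.70, and 0.89, so the sketch selects 0.9: when arms live long, it pays to keep searching for a very good arm. With L = 1 every threshold gives E[X], since no arm can be reused.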
  • 25. Stoch. with Early Stopping • Paper excerpt (cropped in the slide): instead of pulling an arm once to determine its payoff, […] abandons it unless it looks promising. A variant, called […], abandons the arm earlier if its maximum possible future […]. For n = O(log L/ε²), STOCHASTIC gets an expected […]-optimal; the details are omitted due to space constraints.
 Algorithm STOCH. WITH EARLY STOPPING
 input: Distribution F(μ), expected lifetime L
 μ* ← argmax_μ Γ(μ)   [Γ is defined in (1)]
 while we keep playing   [Play random arm as long as necessary]
   i ← random new arm; r ← 0; d ← 0
   while d < n and n − d ≥ nμ* − r
     Pull arm i; r ← r + R(μ_i); d ← d + 1
   end while
   if r > nμ*   [If it is good, stay with it forever]
     Pull arm i every turn until it dies
   end if
 end while
  • 26. Stoch. with Early Stopping • Input: the cumulative distribution F(μ) and the lifetime L • μ*: the threshold derived from the cumulative distribution
  • 27. Stoch. with Early Stopping • Pull, at random, an arm that has never been pulled before: arm i
  • 28. Stoch. with Early Stopping • While n − d (the trial budget n minus the number of pulls of arm i so far) ≥ nμ* − r (n times the threshold minus the cumulative reward obtained from arm i), keep pulling arm i
  • 29. Stoch. with Early Stopping • If the reward obtained from pulling the arm is low, the loop exits early
  • 30. Stoch. with Early Stopping • If the cumulative reward obtained from arm i > nμ* (n times the threshold reward), keep pulling arm i until its lifetime runs out
  • 31. Stoch. with Early Stopping • Again pull, at random, an arm that has never been pulled before: arm i
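The walkthrough on slides 25 to 31 can be sketched in Python. This is an illustrative reading of the pseudocode, not the authors' implementation: rewards are assumed to be Bernoulli, each arm has a fixed budget-style lifetime, play stops after `horizon` steps, and `new_arm` is a hypothetical factory that returns a `(payoff_prob, lifetime)` pair for a fresh arm.

```python
import random


def stochastic_with_early_stopping(new_arm, n, mu_star, horizon, rng):
    """Run the policy for `horizon` steps and return the total reward."""
    if n < 1 or not 0.0 <= mu_star <= 1.0:
        raise ValueError("need n >= 1 and 0 <= mu_star <= 1")
    total = 0
    t = 0
    while t < horizon:
        payoff_prob, remaining = new_arm()  # a random arm not pulled before
        if remaining < 1:
            raise ValueError("a new arm must have lifetime >= 1")
        r = 0  # cumulative reward from this arm
        d = 0  # number of pulls of this arm
        # Trial phase: stop early once r can no longer reach n * mu_star
        # even if every remaining trial pull paid off.
        while (t < horizon and remaining > 0
               and d < n and n - d >= n * mu_star - r):
            reward = 1 if rng.random() < payoff_prob else 0
            r += reward
            total += reward
            d += 1
            remaining -= 1
            t += 1
        # Exploitation phase: a good arm is used until it dies.
        if r > n * mu_star:
            while t < horizon and remaining > 0:
                total += 1 if rng.random() < payoff_prob else 0
                remaining -= 1
                t += 1
    return total
```

The early-stopping condition `n - d >= n * mu_star - r` compares the pulls left in the trial with the reward still needed to pass the threshold, so a clearly bad arm is dropped after only a few pulls instead of the full n.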
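  • 33. Experimental setup • 300 shopping-class ads • Lifetime varied between 100 and 100,000 • Examines the regret per step • Regret:
 the value that would have been obtained by choosing the best of the currently available arms, minus the value actually obtained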
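The regret definition above can be written as a small helper. It assumes two equal-length sequences are available: the expected payoff of the best arm alive at each step, and the reward actually obtained at that step. The name `regret_per_step` is illustrative.

```python
def regret_per_step(best_available_payoffs, obtained_rewards):
    """Average per-step gap between the best available arm and the reward obtained."""
    if len(best_available_payoffs) != len(obtained_rewards):
        raise ValueError("both sequences must cover the same steps")
    if not obtained_rewards:
        raise ValueError("at least one step is required")
    gaps = [best - got
            for best, got in zip(best_available_payoffs, obtained_rewards)]
    return sum(gaps) / len(gaps)
```

For example, best payoffs of 0.9, 0.9, 0.8 against obtained rewards of 1, 0, 1 give gaps of −0.1, 0.9, −0.2 and a regret per step of 0.2.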
  • 34. [Figure 2(b): Regret per time step vs. Expected arm lifetime (100 to 100,000) for Stochastic, Stochastic with Early Stopping, AdaptiveGreedy, UCB1, and UCB1-k/c]
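  • 35. Summary • Introduces the concept of arm lifetime into the Bandit Algorithm • The idea behind "Stoch. with Early Stopping", using up a good arm for its full lifetime, is more effective than UCB1
   → UCB1 should give a good final result when arms can be pulled indefinitely
     and tried an unlimited number of times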
  • 36. References • Introduction to Computational Advertising - Stanford University
 http://www.stanford.edu/class/msande239/