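MDP $\mathcal{M} = \{\mathcal{S}, \mathcal{A}, p_T, p_0, g\}$ with the Markov property:
$$\Pr\{S_{t+1} = s' \mid A_t = a, S_t = s, \dots\} = \Pr\{S_{t+1} = s' \mid A_t = a, S_t = s\} =: p_T(s' \mid s, a), \qquad \Pr(S_0 = s) =: p_0(s)$$
Markov policies $\pi \in \Pi_M$:
$$\Pr(A_t = a \mid S_t = s, \dots) = \Pr(A_t = a \mid S_t = s) =: \pi(a \mid s)$$
Value function $V^\pi$ (its maximum over policies is $V^*$):
$$V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s], \qquad C_t := \sum_{i=0}^{\infty} \gamma^i g(S_{t+i}, A_{t+i}), \qquad \gamma \in [0, 1)$$
Objective: maximize $f(\pi)$ over $\pi \in \Pi_M$ for the MDP $\mathcal{M}$, where
$$f(\pi) := \sum_{s \in \mathcal{S}} p_0(s) V^\pi(s)$$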
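The sketch below is a small numerical illustration of the two definitions above. It computes two things:

- a truncated discounted sum for a finite reward sequence, which is a single-sample stand-in for $C_0$ (the true $C_0$ is an infinite sum);
- the objective $f(\pi)$ from a given table of $V^\pi$ values.

The reward sequence, $p_0$ and $V^\pi$ are made-up inputs, and the helper names are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Truncated C_0 = sum_i gamma**i * r_i for a finite reward sequence r_0, r_1, ..."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Accumulate backwards using C_t = r_t + gamma * C_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def objective(p0: Sequence[float], v_pi: Sequence[float]) -> float:
    """f(pi) = sum_s p0(s) * V^pi(s), with states indexed 0..n-1."""
    return sum(p * v for p, v in zip(p0, v_pi))


print(discounted_return([1.0, 2.0, 1.0, 2.0], gamma=0.5))  # 2.5
print(objective([0.5, 0.5], [1.25, 1.75]))                 # 1.5
```

The backward accumulation avoids computing powers of $\gamma$ explicitly. It is the same recursion $C_t = r_t + \gamma C_{t+1}$ that underlies the Bellman equations later in the deck.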
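$$V^* = \max_{\pi \in \Pi_M} V^\pi = \max_{a \in \mathcal{A}} \Big( g(\cdot, a) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid \cdot, a) V^*(s') \Big) = B^*(V^*)$$
$\Rightarrow$ $V^*$ is a fixed point of the Bellman optimality operator $B^*$, and an optimal policy $\pi^*$ can be taken to be deterministic:
$$\pi^*_d = \arg\max_{a \in \mathcal{A}} \Big( g(\cdot, a) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid \cdot, a) V^*(s') \Big)$$
$B^*$ is a $\gamma$-contraction in the sup norm:
$$\|B^*(v) - B^*(u)\| \le \gamma \|v - u\|$$
Therefore value iteration converges ($n = |\mathcal{S}|$):
$$v_{k+1} = B^*(v_k), \quad v_0 \in \mathbb{R}^n \;\Rightarrow\; v_k \to V^* \quad (k \to \infty)$$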
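Value iteration and the greedy policy $\pi^*_d$ fit in a few lines of plain Python. The MDP is a hypothetical two-state, two-action example:

- action $a$ moves deterministically to state $a$;
- $g(0,1) = 1$, $g(1,0) = 2$, and all other rewards are 0;
- $\gamma = 0.5$.

Solving by hand gives $V^* = (8/3, 10/3)$, and the greedy policy picks action 1 in state 0 and action 0 in state 1.

The stopping rule compares successive iterates. By the contraction property, $\|v_{k+1} - V^*\| \le \frac{\gamma}{1-\gamma}\|v_{k+1} - v_k\|$, so a small step size bounds the distance to $V^*$.

```python
from typing import List

# Hypothetical toy MDP with states {0, 1} and actions {0, 1}.
# P[s][a][s2] = p_T(s2 | s, a); here action a always leads to state a.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
# G[s][a] = g(s, a), the reward to be maximized.
G = [[0.0, 1.0],
     [2.0, 0.0]]
GAMMA = 0.5


def lookahead(v: List[float], s: int, a: int) -> float:
    """g(s, a) + gamma * sum_{s'} p_T(s' | s, a) * v(s')."""
    return G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a]))


def bellman_optimality(v: List[float]) -> List[float]:
    """B*(v)(s) = max_a lookahead(v, s, a)."""
    return [max(lookahead(v, s, a) for a in range(len(G[s]))) for s in range(len(G))]


def sup_norm(u: List[float], v: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(u, v))


def value_iteration(tol: float = 1e-10, max_iter: int = 10_000) -> List[float]:
    v = [0.0] * len(G)  # v_0 = 0
    for _ in range(max_iter):
        v_next = bellman_optimality(v)
        if sup_norm(v_next, v) < tol:
            return v_next
        v = v_next
    return v


def greedy_policy(v: List[float]) -> List[int]:
    """pi*_d(s) = argmax_a lookahead(v, s, a)."""
    return [max(range(len(G[s])), key=lambda a: lookahead(v, s, a)) for s in range(len(G))]


v_star = value_iteration()
print(v_star)                 # approximately [2.6667, 3.3333], i.e. [8/3, 10/3]
print(greedy_policy(v_star))  # [1, 0]
```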
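Bellman expectation operator $B^\pi$ and Bellman optimality operator $B^*$:
$$(B^\pi V^\pi)(s) := \sum_{a \in \mathcal{A}} \pi(a \mid s)\Big[g(s, a) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid s, a) V^\pi(s')\Big] = \mathbb{E}^\pi\big[g(S_t, A_t) + \gamma V^\pi(S_{t+1}) \mid S_t = s\big]$$
$$(B^* V^*)(s) := \max_{a \in \mathcal{A}} \Big(g(s, a) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid s, a) V^*(s')\Big) = \max_{\pi \in \Pi_M} \mathbb{E}^\pi\big[g(S_t, A_t) + \gamma V^*(S_{t+1}) \mid S_t = s\big]$$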
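Iterating $B^\pi$ from any starting vector converges to $V^\pi$, for the same reason value iteration converges: $B^\pi$ is also a $\gamma$-contraction. Below is a minimal sketch on a hypothetical two-state, two-action MDP:

- action $a$ leads deterministically to state $a$;
- $g(0,1) = 1$, $g(1,0) = 2$, and all other rewards are 0;
- $\gamma = 0.5$;
- the policy is uniform random, chosen only for illustration.

Solving the two linear equations $V^\pi = B^\pi V^\pi$ by hand gives $V^\pi = (1.25, 1.75)$.

```python
from typing import List

# Hypothetical toy MDP: states {0, 1}, actions {0, 1}; action a leads to state a.
P = [[[1.0, 0.0], [0.0, 1.0]],   # P[s][a][s2] = p_T(s2 | s, a)
     [[1.0, 0.0], [0.0, 1.0]]]
G = [[0.0, 1.0],                 # G[s][a] = g(s, a)
     [2.0, 0.0]]
PI = [[0.5, 0.5],                # PI[s][a] = pi(a | s), uniform (illustrative choice)
      [0.5, 0.5]]
GAMMA = 0.5


def bellman_policy(v: List[float]) -> List[float]:
    """B^pi(v)(s) = sum_a pi(a|s) [g(s, a) + gamma * sum_{s'} p_T(s'|s, a) v(s')]."""
    return [
        sum(PI[s][a] * (G[s][a] + GAMMA * sum(p * v[s2] for s2, p in enumerate(P[s][a])))
            for a in range(len(G[s])))
        for s in range(len(G))
    ]


def policy_evaluation(tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    v = [0.0] * len(G)
    for _ in range(max_iter):
        v_next = bellman_policy(v)
        # Stop when successive iterates are within tol in the sup norm.
        if max(abs(x - y) for x, y in zip(v_next, v)) < tol:
            return v_next
        v = v_next
    return v


v_pi = policy_evaluation()
print(v_pi)  # approximately [1.25, 1.75]
```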
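Return, value functions and action-value functions:
$$C_t := \sum_{i=0}^{\infty} \gamma^i g(S_{t+i}, A_{t+i}), \qquad \gamma \in [0, 1)$$
$$V^\pi(s) := \mathbb{E}^\pi[C_0 \mid S_0 = s], \qquad V^*(s) := \max_{\pi \in \Pi_M} \mathbb{E}^\pi[C_0 \mid S_0 = s]$$
$$Q^\pi(s, a) := \mathbb{E}^\pi[C_0 \mid S_0 = s, A_0 = a], \qquad Q^*(s, a) := \max_{\pi \in \Pi_M} \mathbb{E}^\pi[C_0 \mid S_0 = s, A_0 = a]$$
$$V^\pi(s) = \sum_{a \in \mathcal{A}} Q^\pi(s, a)\pi(a \mid s), \qquad V^*(s) = \max_{a \in \mathcal{A}} Q^*(s, a), \qquad \pi^*_d = \arg\max_{a \in \mathcal{A}} Q^*(\cdot, a)$$
Bellman operators on Q-functions, $\Upsilon^\pi$ and $\Upsilon^*$:
$$(\Upsilon^\pi Q^\pi)(s, a) := \mathbb{E}^\pi\big[g(S_t, A_t) + \gamma Q^\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\big] = g(s, a) + \gamma \sum_{(s', a') \in \mathcal{S} \times \mathcal{A}} p_T(s' \mid s, a)\,\pi(a' \mid s')\,Q^\pi(s', a')$$
$$(\Upsilon^* Q^*)(s, a) := \mathbb{E}\big[g(S_t, A_t) + \gamma \max_{a' \in \mathcal{A}} Q^*(S_{t+1}, a') \mid S_t = s, A_t = a\big] = g(s, a) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid s, a) \max_{a' \in \mathcal{A}} Q^*(s', a')$$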
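The relation $V^\pi(s) = \sum_a Q^\pi(s, a)\pi(a \mid s)$ can be checked numerically by computing $Q^\pi$ as the fixed point of $\Upsilon^\pi$. The setting is hypothetical:

- a two-state, two-action MDP in which action $a$ leads deterministically to state $a$;
- $g(0,1) = 1$, $g(1,0) = 2$, and all other rewards are 0;
- $\gamma = 0.5$ and a uniform random policy.

By hand, $Q^\pi(0, \cdot) = (0.625, 1.875)$ and $Q^\pi(1, \cdot) = (2.625, 0.875)$. Their policy-weighted averages give $V^\pi = (1.25, 1.75)$.

```python
from typing import List

Q = List[List[float]]

# Hypothetical toy MDP: P[s][a][s2] = p_T(s2 | s, a), G[s][a] = g(s, a), PI[s][a] = pi(a | s).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
G = [[0.0, 1.0], [2.0, 0.0]]
PI = [[0.5, 0.5], [0.5, 0.5]]
GAMMA = 0.5
N_S, N_A = 2, 2


def upsilon_pi(q: Q) -> Q:
    """Upsilon^pi(q)(s, a) = g(s, a) + gamma * sum_{s', a'} p_T(s'|s, a) pi(a'|s') q(s', a')."""
    return [[G[s][a] + GAMMA * sum(P[s][a][s2] * PI[s2][a2] * q[s2][a2]
                                   for s2 in range(N_S) for a2 in range(N_A))
             for a in range(N_A)]
            for s in range(N_S)]


def evaluate_q(num_iter: int = 200) -> Q:
    q = [[0.0] * N_A for _ in range(N_S)]
    for _ in range(num_iter):  # the sup-norm error shrinks by a factor gamma per sweep
        q = upsilon_pi(q)
    return q


q_pi = evaluate_q()
v_pi = [sum(PI[s][a] * q_pi[s][a] for a in range(N_A)) for s in range(N_S)]
print(q_pi)  # approximately [[0.625, 1.875], [2.625, 0.875]]
print(v_pi)  # approximately [1.25, 1.75]
```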
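As operators on $q: \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ (with $\cdot = (s, a)$):
$$\Upsilon^\pi(q) = g(\cdot) + \gamma \sum_{(s', a') \in \mathcal{S} \times \mathcal{A}} p_T(s' \mid \cdot)\,\pi(a' \mid s')\,q(s', a'), \qquad \Upsilon^*(q) = g(\cdot) + \gamma \sum_{s' \in \mathcal{S}} p_T(s' \mid \cdot) \max_{a' \in \mathcal{A}} q(s', a')$$
For $q, q': \mathcal{S} \times \mathcal{A} \to \mathbb{R}$:
$$q \le q' \iff q(s, a) \le q'(s, a)\ \ \forall (s, a) \in \mathcal{S} \times \mathcal{A}, \qquad \|q - q'\| := \max_{(s, a) \in \mathcal{S} \times \mathcal{A}} |q(s, a) - q'(s, a)|$$
Both $\Upsilon = \Upsilon^\pi$ and $\Upsilon = \Upsilon^*$ are monotone and shift constants by $\gamma$, which makes them $\gamma$-contractions:
$$q \le q' \Rightarrow \Upsilon(q) \le \Upsilon(q'), \qquad \Upsilon(q + c) = \Upsilon(q) + \gamma c\ \ \forall c \in \mathbb{R}$$
$$\Rightarrow \|\Upsilon(q) - \Upsilon(q')\| \le \gamma \|q - q'\|$$
(Since $q \le q' + \|q - q'\|$, the two properties give $\Upsilon(q) \le \Upsilon(q') + \gamma\|q - q'\|$; swapping $q$ and $q'$ gives the reverse bound.)
Q-value iteration ($n = |\mathcal{S}|$, $m = |\mathcal{A}|$):
$$q_{k+1} = \Upsilon^*(q_k), \quad q_0 \in \mathbb{R}^{n \times m} \;\Rightarrow\; q_k \to Q^* \quad (k \to \infty), \qquad \pi^*_d = \arg\max_{a \in \mathcal{A}} Q^*(\cdot, a)$$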
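The sketch below runs Q-value iteration with $\Upsilon^*$ and spot-checks the contraction inequality on random pairs of Q-tables. The setting is hypothetical:

- a two-state, two-action MDP in which action $a$ leads deterministically to state $a$;
- $g(0,1) = 1$, $g(1,0) = 2$, and all other rewards are 0;
- $\gamma = 0.5$.

By hand, $Q^*(0, \cdot) = (4/3, 8/3)$ and $Q^*(1, \cdot) = (10/3, 5/3)$, so the greedy policy is action 1 in state 0 and action 0 in state 1.

```python
import random
from typing import List

Q = List[List[float]]

# Hypothetical toy MDP: P[s][a][s2] = p_T(s2 | s, a), G[s][a] = g(s, a).
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
G = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.5
N_S, N_A = 2, 2


def upsilon_star(q: Q) -> Q:
    """Upsilon*(q)(s, a) = g(s, a) + gamma * sum_{s'} p_T(s'|s, a) * max_{a'} q(s', a')."""
    return [[G[s][a] + GAMMA * sum(P[s][a][s2] * max(q[s2]) for s2 in range(N_S))
             for a in range(N_A)]
            for s in range(N_S)]


def sup_dist(q1: Q, q2: Q) -> float:
    """||q1 - q2|| = max_{(s, a)} |q1(s, a) - q2(s, a)|."""
    return max(abs(x - y) for r1, r2 in zip(q1, q2) for x, y in zip(r1, r2))


def q_value_iteration(num_iter: int = 200) -> Q:
    q = [[0.0] * N_A for _ in range(N_S)]  # q_0 = 0
    for _ in range(num_iter):
        q = upsilon_star(q)
    return q


q_star = q_value_iteration()
print(q_star)  # approximately [[4/3, 8/3], [10/3, 5/3]]
print([max(range(N_A), key=lambda a: q_star[s][a]) for s in range(N_S)])  # [1, 0]

# Spot-check ||Upsilon*(q1) - Upsilon*(q2)|| <= gamma * ||q1 - q2|| on random Q-tables.
rng = random.Random(0)
for _ in range(1000):
    q1 = [[rng.uniform(-10, 10) for _ in range(N_A)] for _ in range(N_S)]
    q2 = [[rng.uniform(-10, 10) for _ in range(N_A)] for _ in range(N_S)]
    assert sup_dist(upsilon_star(q1), upsilon_star(q2)) <= GAMMA * sup_dist(q1, q2) + 1e-12
```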
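Trajectories generated by $\mathcal{M}$ under $\pi$ (written $\mathcal{M}(\pi)$), and their realizations:
$$H^\pi_t := \{S_0, A_0, R_0, \dots, S_{t-1}, A_{t-1}, R_{t-1}, S_t\} \sim \mathcal{M}(\pi), \qquad h^\pi_t := \{s_0, a_0, r_0, \dots, s_{t-1}, a_{t-1}, r_{t-1}, s_t\}$$
Empirical Bellman operators built from a trajectory $h^\pi_T$:
$$\hat\Upsilon^\pi(q; h^\pi_T)(s, a) := \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\}\,\big(r_t + \gamma\, q(s_{t+1}, a_{t+1})\big)}{\sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\}}, & \text{if } \sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\} > 0, \\ q(s, a), & \text{otherwise,} \end{cases}$$
$$\hat\Upsilon^*(q; h^\pi_T)(s, a) := \begin{cases} \dfrac{\sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\}\,\big(r_t + \gamma \max_{a' \in \mathcal{A}} q(s_{t+1}, a')\big)}{\sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\}}, & \text{if } \sum_{t=0}^{T-1} \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\} > 0, \\ q(s, a), & \text{otherwise.} \end{cases}$$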
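For comparison, the exact operators:
$$(\Upsilon^\pi Q^\pi)(s, a) := \mathbb{E}^\pi\big[g(S_t, A_t) + \gamma Q^\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\big]$$
$$(\Upsilon^* Q^*)(s, a) := \mathbb{E}\big[g(S_t, A_t) + \gamma \max_{a' \in \mathcal{A}} Q^*(S_{t+1}, a') \mid S_t = s, A_t = a\big]$$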
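The empirical operator $\hat\Upsilon^*(q; h_T)$ is a per-pair average of one-step targets over the visits in the trajectory. Pairs that never occur fall back to $q(s, a)$.

In this minimal sketch the trajectory is stored as three lists: states $s_0, \dots, s_T$, actions $a_0, \dots, a_{T-1}$ and rewards $r_0, \dots, r_{T-1}$. That layout, the function name and the example data are hypothetical. The example trajectory is consistent with a two-state MDP in which action $a$ leads to state $a$, $g(0,1) = 1$, $g(1,0) = 2$, and all other rewards are 0.

```python
from typing import List

Q = List[List[float]]


def empirical_upsilon_star(q: Q, states: List[int], actions: List[int],
                           rewards: List[float], gamma: float) -> Q:
    """hat-Upsilon*(q; h_T)(s, a).

    Averages r_t + gamma * max_{a'} q(s_{t+1}, a') over all t with (s_t, a_t) = (s, a);
    pairs that never occur in the trajectory keep their current value q(s, a).
    states = [s_0, ..., s_T]; actions = [a_0, ..., a_{T-1}]; rewards = [r_0, ..., r_{T-1}].
    """
    n_s, n_a = len(q), len(q[0])
    total = [[0.0] * n_a for _ in range(n_s)]
    count = [[0] * n_a for _ in range(n_s)]
    for t, (a, r) in enumerate(zip(actions, rewards)):
        s = states[t]
        total[s][a] += r + gamma * max(q[states[t + 1]])
        count[s][a] += 1
    return [[total[s][a] / count[s][a] if count[s][a] > 0 else q[s][a]
             for a in range(n_a)]
            for s in range(n_s)]


# Hypothetical trajectory with T = 4 transitions; (s, a) = (1, 1) is never visited.
states = [0, 1, 0, 0, 1]
actions = [1, 0, 0, 1]
rewards = [1.0, 2.0, 0.0, 1.0]
q = [[1.0, 2.0], [3.0, 4.0]]
print(empirical_upsilon_star(q, states, actions, rewards, gamma=0.5))
# [[1.0, 3.0], [3.0, 4.0]]
```

In the printed result, the unvisited entry $(1, 1)$ keeps its input value 4.0. This is the "otherwise" branch of the definition.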
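If every state-action pair is visited with positive long-run frequency,
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \Pr(S_t = s, A_t = a \mid \mathcal{M}(\pi)) > 0 \quad \forall (s, a) \in \mathcal{S} \times \mathcal{A},$$
then the empirical operators converge to the exact ones:
$$\hat\Upsilon^\pi(\cdot\,; h_T) \to \Upsilon^\pi, \qquad \hat\Upsilon^*(\cdot\,; h_T) \to \Upsilon^* \qquad (T \to \infty)$$
Fix a trajectory in which every $(s, a)$ appears at least once. The empirical operators are then also monotone and shift constants by $\gamma$, hence they are $\gamma$-contractions. (For an unvisited pair, $\hat\Upsilon$ returns $q(s, a)$ unchanged, so the shift property fails there.)
$$q \le q' \Rightarrow \hat\Upsilon(q) \le \hat\Upsilon(q'), \qquad \hat\Upsilon(q + c) = \hat\Upsilon(q) + \gamma c\ \ \forall c \in \mathbb{R} \qquad \Rightarrow \qquad \|\hat\Upsilon(q) - \hat\Upsilon(q')\| \le \gamma \|q - q'\|$$
$$q_{k+1} = \hat\Upsilon^*(q_k), \quad q_0 \in \mathbb{R}^{n \times m} \;\Rightarrow\; q_k \to \hat Q^* \quad (k \to \infty), \qquad \hat\pi^*_d = \arg\max_{a \in \mathcal{A}} \hat Q^*(\cdot, a)$$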
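Iterating $\hat\Upsilon^*$ on a fixed trajectory is a tabular form of batch Q-value iteration. The sketch makes these hypothetical assumptions:

- the MDP is deterministic: action $a$ leads to state $a$, $g(0,1) = 1$, $g(1,0) = 2$, all other rewards are 0, and $\gamma = 0.5$;
- the hand-written trajectory visits every $(s, a)$ pair.

Under these assumptions every average has a single distinct target. So $\hat\Upsilon^*$ coincides with $\Upsilon^*$, and $\hat Q^*$ equals $Q^*$, with $Q^*(0, \cdot) = (4/3, 8/3)$ and $Q^*(1, \cdot) = (10/3, 5/3)$. With stochastic transitions or rewards, $\hat Q^*$ would only approximate $Q^*$.

```python
from typing import List

Q = List[List[float]]
GAMMA = 0.5


def empirical_upsilon_star(q: Q, states: List[int], actions: List[int],
                           rewards: List[float], gamma: float) -> Q:
    """Per-pair average of r_t + gamma * max_{a'} q(s_{t+1}, a'); unvisited pairs keep q(s, a)."""
    n_s, n_a = len(q), len(q[0])
    total = [[0.0] * n_a for _ in range(n_s)]
    count = [[0] * n_a for _ in range(n_s)]
    for t, (a, r) in enumerate(zip(actions, rewards)):
        s = states[t]
        total[s][a] += r + gamma * max(q[states[t + 1]])
        count[s][a] += 1
    return [[total[s][a] / count[s][a] if count[s][a] > 0 else q[s][a]
             for a in range(n_a)]
            for s in range(n_s)]


# Hypothetical trajectory h_T (T = 6) that visits all four (s, a) pairs.
states = [0, 0, 1, 1, 0, 1, 0]
actions = [0, 1, 1, 0, 1, 0]
rewards = [0.0, 1.0, 0.0, 2.0, 1.0, 2.0]


def batch_q_iteration(num_iter: int = 200) -> Q:
    q = [[0.0, 0.0], [0.0, 0.0]]  # q_0
    for _ in range(num_iter):      # q_{k+1} = hat-Upsilon*(q_k; h_T)
        q = empirical_upsilon_star(q, states, actions, rewards, GAMMA)
    return q


q_hat = batch_q_iteration()
pi_hat = [max(range(2), key=lambda a: q_hat[s][a]) for s in range(2)]
print(q_hat)   # approximately [[1.3333, 2.6667], [3.3333, 1.6667]]
print(pi_hat)  # [1, 0]
```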
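$$q_{k+1} = \hat\Upsilon^*(q_k; h^\pi_\infty), \quad q_0 \in \mathbb{R}^{n \times m} \;\Rightarrow\; q_k \to Q^* \quad (k \to \infty)$$
Stochastic-approximation form, using one transition per step:
$$q_{t+1} = (1 - \alpha_t)\, q_t + \alpha_t\, \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\}), \qquad \mathbb{E}[\|q_0\|] \le \text{const}$$
The one-sample operator leaves every entry other than $(S_t, A_t)$ equal to $q_t$. Each step therefore changes only $q_t(S_t, A_t)$.
Step-size conditions:
$$\alpha_t \ge 0\ \ \forall t \in \mathbb{Z}_{\ge 0}, \qquad \sum_{t \in \mathbb{Z}_{\ge 0}} \alpha_t\, \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\} = \infty, \qquad \sum_{t \in \mathbb{Z}_{\ge 0}} \alpha_t^2\, \mathbb{I}\{s = s_t\}\,\mathbb{I}\{a = a_t\} < \infty \qquad \forall (s, a) \in \mathcal{S} \times \mathcal{A}$$
$$\Rightarrow \lim_{t \to \infty} \mathbb{E}\big[\|q_t - Q^*\|^2\big] = 0$$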
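The step-size conditions above are the classical Robbins-Monro conditions. One schedule that satisfies them is $\alpha_n = 1/(n+1)$, where $n$ counts previous updates ($\sum 1/n$ diverges and $\sum 1/n^2$ converges).

With this schedule and a fixed, non-bootstrapped target distribution, the update $q \leftarrow (1 - \alpha)q + \alpha y$ reduces to the running sample mean. The scalar sketch below illustrates this. The noisy target with mean 3 is a made-up example, not the Q-learning target itself.

```python
import random
from typing import Iterable


def robbins_monro_mean(samples: Iterable[float], q0: float = 0.0) -> float:
    """Scalar stochastic approximation q_{n+1} = (1 - alpha_n) q_n + alpha_n y_n
    with alpha_n = 1 / (n + 1); the iterate equals the mean of y_0..y_n
    (q0 is overwritten at the first step because alpha_0 = 1)."""
    q = q0
    for n, y in enumerate(samples):
        alpha = 1.0 / (n + 1)
        q = (1.0 - alpha) * q + alpha * y
    return q


print(robbins_monro_mean([1.0, 2.0, 3.0, 4.0]))  # approximately 2.5

# Noisy targets y_n = 3 + N(0, 1) noise: the iterate approaches the mean 3.
rng = random.Random(0)
noisy = [3.0 + rng.gauss(0.0, 1.0) for _ in range(10_000)]
print(robbins_monro_mean(noisy, q0=100.0))  # close to 3.0
```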
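Q-learning (one transition per step):
$$a_t \sim \pi(\cdot \mid s_t), \qquad r_t \sim g(s_t, a_t), \quad s_{t+1} \sim p_T(\cdot \mid s_t, a_t)$$
$$\hat q_{t+1}(s_t, a_t) = \hat q_t(s_t, a_t) + \alpha_t\Big(r_t + \gamma \max_{a' \in \mathcal{A}} \hat q_t(s_{t+1}, a') - \hat q_t(s_t, a_t)\Big)$$
$$\pi^*_d = \arg\max_{a \in \mathcal{A}} \hat q_\infty(\cdot, a)$$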
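Here is a minimal tabular Q-learning sketch of the loop above. The following assumptions are not from the slides:

- a hypothetical deterministic two-state, two-action MDP: action $a$ leads to state $a$, $g(0,1) = 1$, $g(1,0) = 2$, all other rewards are 0, and $\gamma = 0.5$;
- for this MDP, $Q^*(0, \cdot) = (4/3, 8/3)$ and $Q^*(1, \cdot) = (10/3, 5/3)$;
- a uniform random behaviour policy $\pi$;
- a per-pair step size $\alpha = N(s, a)^{-0.6}$, where $N(s, a)$ counts visits to $(s, a)$.

The step size satisfies both step-size conditions, because the exponent $0.6 \le 1$ and $2 \times 0.6 > 1$.

```python
import random
from typing import List

# Hypothetical deterministic toy MDP: NEXT[s][a] = s', G[s][a] = g(s, a).
NEXT = [[0, 1], [0, 1]]
G = [[0.0, 1.0], [2.0, 0.0]]
GAMMA = 0.5


def q_learning(num_steps: int = 20_000, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    n_s, n_a = len(G), len(G[0])
    q = [[0.0] * n_a for _ in range(n_s)]        # hat-q_0
    visits = [[0] * n_a for _ in range(n_s)]
    s = 0
    for _ in range(num_steps):
        a = rng.randrange(n_a)                   # a_t ~ pi(.|s_t), uniform behaviour policy
        r, s_next = G[s][a], NEXT[s][a]          # r_t and s_{t+1} from the environment
        visits[s][a] += 1
        alpha = visits[s][a] ** -0.6             # per-pair step size alpha_t
        td_target = r + GAMMA * max(q[s_next])   # r_t + gamma * max_a' hat-q_t(s_{t+1}, a')
        q[s][a] += alpha * (td_target - q[s][a])
        s = s_next
    return q


q_hat = q_learning()
pi_d = [max(range(len(row)), key=row.__getitem__) for row in q_hat]  # argmax_a hat-q(s, a)
print(q_hat)  # close to [[1.333, 2.667], [3.333, 1.667]]
print(pi_d)   # [1, 0]
```

The behaviour policy is uniform, yet the target uses the max over next actions, so the learned table approximates $Q^*$ rather than $Q^\pi$. This is the off-policy property of Q-learning.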
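$$v_{k+1} = B^*(v_k), \quad v_0 \in \mathbb{R}^n \;\Rightarrow\; v_k \to V^* \quad (k \to \infty)$$
$$q_{t+1} = (1 - \alpha_t)\, q_t + \alpha_t\, \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\})$$
Deterministic iteration:
$$x_{k+1} = f_t(x_k), \qquad x^*:\ f_t(x^*) = 0, \qquad \lim_{t \to \infty} \|x_t - x^*\| = 0$$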
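Stochastic iteration:
$$x_{k+1} = f_t(x_k, \omega), \qquad x^*:\ f_t(x^*, \omega) = 0\ \ \forall \omega \in \Omega, \qquad \lim_{t \to \infty} \mathbb{E}\big[\|x_t - x^*\|^2\big] = 0$$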
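$$q_{t+1} = (1 - \alpha_t)\, q_t + \alpha_t\, \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\}) = (1 - \alpha_t)\, q_t + \alpha_t\big(\Upsilon^*(q_t) + X_t\big)$$
$$X_t := \hat\Upsilon^*(q_t; \{S_t, A_t, R_t, S_{t+1}\}) - \Upsilon^*(q_t), \qquad \mathbb{E}[X_t] = 0, \qquad \mathbb{E}\big[\|X_t\|^2\big] \le \text{const}$$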

Materials for Reinforcement Learning Study Session 6
