# Recent rl

Published in: Technology

### Recent rl

1. $$\begin{aligned}
   Q^{o}(s,a) &= r(s,a) + \gamma \max_{a'} Q^{o}(s',a') \\
   L &= \left( r(s,a) + \gamma \max_{a'} Q^{o}_{\theta}(s',a') - Q^{o}_{\theta}(s,a) \right)^{2}
   \end{aligned}$$
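2. $$\begin{aligned}
   \nabla_{\theta} J &= \nabla_{\theta}\, \mathbb{E}_{\pi_{\theta}}\!\left[ \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \right] \\
   &= \nabla_{\theta} \sum_{s'} \sum_{a} P(s' \mid s_t, a)\, \pi_{\theta}(a \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \\
   &= \sum_{s'} \sum_{a} P(s' \mid s_t, a)\, \nabla_{\theta} \pi_{\theta}(a \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \\
   &= \sum_{s'} \sum_{a} P(s' \mid s_t, a)\, \pi_{\theta}(a \mid s_t)\, \frac{\nabla_{\theta} \pi_{\theta}(a \mid s_t)}{\pi_{\theta}(a \mid s_t)} \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \\
   &= \sum_{s'} \sum_{a} P(s' \mid s_t, a)\, \pi_{\theta}(a \mid s_t)\, \nabla_{\theta} \log \pi_{\theta}(a \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \\
   &= \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \right]
   \end{aligned}$$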
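3. $$\begin{aligned}
   \mathbb{E}_{\pi_{\theta}}\!\left[ \nabla_{\theta} \log \pi_{\theta}(a \mid s_t) \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau} \right] &\approx \frac{1}{M} \sum_{T} \sum_{i} \nabla_{\theta} \log \pi_{\theta}(a_i^{T} \mid s_i^{T}) \left( \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau}^{T} \right) \\
   T &= \left( s_0^{T}, a_0^{T}, r_0^{T}, \ldots, s_n^{T}, a_n^{T}, r_n^{T} \right)
   \end{aligned}$$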
4. $$\frac{1}{M} \sum_{T} \sum_{i} \nabla_{\theta} \log \pi_{\theta}(a_i^{T} \mid s_i^{T}) \left( \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau}^{T} \right)$$
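5. $$\begin{aligned}
   &\frac{1}{M} \sum_{T} \sum_{i} \nabla_{\theta} \log \pi_{\theta}(a_i^{T} \mid s_i^{T}) \left( \sum_{\tau=0}^{\infty} \gamma^{\tau} R_{\tau}^{T} \right) \\
   &\frac{1}{M} \sum_{T} \sum_{i} \nabla_{\theta} \log \pi_{\theta}(a_i^{T} \mid s_i^{T})\, A(s_i^{T}, a_i^{T})
   \end{aligned}$$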
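6. $$\begin{aligned}
   &Q^{\mathrm{aux}}(a, i, j) \\
   &L_{Q} = \mathbb{E}\!\left[ \left( R_{t:t+n} + \gamma^{n} \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
   \end{aligned}$$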
7. $$L_{VR} = \mathbb{E}_{\pi}\!\left[ \left( R_{t:t+n} + \gamma^{n} V(s_{t+n+1}, \theta^{-}) - V(s_t, \theta) \right)^{2} \right]$$
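8. $$\begin{aligned}
   \mathbb{E}_{p}[f(x)] &= \sum_{x} p(x) f(x) \\
   \mathbb{E}_{q}[f(x)] &= \sum_{x} q(x) f(x) \\
   &= \sum_{x} \frac{q(x)}{p(x)}\, p(x) f(x) \\
   &= \sum_{x} p(x)\, \frac{q(x)}{p(x)}\, f(x) \\
   &= \mathbb{E}_{p}\!\left[ \frac{q(x)}{p(x)} f(x) \right]
   \end{aligned}$$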
9. $$L_{A3C} = L_{\pi} + L_{V} - \mathbb{E}_{s \sim \pi}\!\left[ \alpha H(\pi(\cdot \mid s)) \right]$$
10. $$\tilde{Q}^{\pi}(s,a) = \alpha \left( \log \pi(s,a) + H^{\pi}(s) \right) + V^{\pi}(s)$$
11. $$Q^{*}(s,a) = r(s,a) + \gamma \tau \log \sum_{a'} \exp\!\left( Q^{*}(s',a') / \tau \right)$$
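12. $$\begin{aligned}
    V^{*}(s) &= -\tau \log \pi^{*}(a \mid s) + r(s,a) + \gamma V^{*}(s') \\
    0 &= -V^{*}(s_1) + \gamma^{t-1} V^{*}(s_t) + R(s_{1:t}) - \tau G(s_{1:t}, \pi^{*}) \\
    R(s_{m:n}) &= \sum_{i=0}^{n-m-1} \gamma^{i}\, r(s_{m+i}, a_{m+i}) \\
    G(s_{m:n}, \pi) &= \sum_{i=0}^{n-m-1} \gamma^{i} \log \pi(a_{m+i} \mid s_{m+i})
    \end{aligned}$$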
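13. $$\begin{aligned}
    C_{\theta,\phi}(s_{1:t}) &= -V_{\phi}(s_1) + \gamma^{t-1} V_{\phi}(s_t) + R(s_{1:t}) - \tau G(s_{1:t}, \pi_{\theta}) \\
    \Delta\theta &\propto C_{\theta,\phi}(s_{1:t})\, \nabla_{\theta} G(s_{1:t}, \pi_{\theta}) \\
    \Delta\phi &\propto C_{\theta,\phi}(s_{1:t}) \left( \nabla_{\phi} V_{\phi}(s_1) - \gamma^{t-1} \nabla_{\phi} V_{\phi}(s_t) \right)
    \end{aligned}$$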
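
The path consistency quantity on the last two slides can be checked numerically. The following is a minimal sketch, not the slides' implementation: the helper names `discounted_sum` and `path_consistency` are hypothetical, and the toy problem (a single state with two actions, rewards `[1.0, 2.0]`, `gamma = 0.9`, `tau = 0.5`) is an assumption chosen only because its entropy-regularized optimum has a closed form. With the optimal value and policy plugged in, the consistency error should vanish along any action path.

```python
import numpy as np

def discounted_sum(values, gamma):
    """Return sum_i gamma**i * values[i], as in R(s_{m:n}) and G(s_{m:n}, pi)."""
    values = np.asarray(values, dtype=float)
    discounts = gamma ** np.arange(len(values))
    return float(np.dot(discounts, values))

def path_consistency(v_first, v_last, rewards, log_probs, gamma, tau):
    """C = -V(s_1) + gamma^(t-1) V(s_t) + R(s_{1:t}) - tau * G(s_{1:t}, pi)."""
    steps = len(rewards)  # t - 1 transitions between s_1 and s_t
    R = discounted_sum(rewards, gamma)
    G = discounted_sum(log_probs, gamma)
    return -v_first + gamma ** steps * v_last + R - tau * G

# Toy setup (assumed for illustration): one state, two actions.
r = np.array([1.0, 2.0])
gamma, tau = 0.9, 0.5

# With a single state, V* = tau * log sum_a exp((r_a + gamma V*) / tau)
# rearranges to V* = tau * log sum_a exp(r_a / tau) / (1 - gamma).
v_star = tau * np.log(np.sum(np.exp(r / tau))) / (1.0 - gamma)

# Optimal policy in log space: log pi*(a) = (r_a + gamma V* - V*) / tau.
log_pi_star = (r + gamma * v_star - v_star) / tau

# Any action path should give zero error, up to floating-point rounding.
actions = [0, 1, 1]
C = path_consistency(v_star, v_star, r[actions], log_pi_star[actions], gamma, tau)
print(C)
```

In a training loop, `C` would be computed from learned estimates `V_phi` and `pi_theta` instead of the closed-form optimum, and the updates on slide 13 would scale the gradients of `G` and `V_phi` by it.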