
[DL Paper Reading Group] VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning


DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/

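Slide 2. Each task is an MDP $M_i = (\mathcal{S}, \mathcal{A}, R_i, T_i, T_{i,0}, \gamma, H)$ with state space $\mathcal{S}$, action space $\mathcal{A}$, reward function $R$, transition function $T$, initial-state distribution $T_0$, discount factor $\gamma$ and horizon $H$.
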
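Slide 3. Tasks are sampled from a task distribution, $M_i \sim p(M)$, and differ in their reward function $R_i$ and transition function $T_i$. The training tasks $M_i = (\mathcal{S}, \mathcal{A}, R_i, T_i, T_{i,0}, \gamma, H)$, $i = 1, 2, \dots, N_{\text{train}}$, and the test tasks $M_i$, $i = 1, 2, \dots, N_{\text{test}}$, are all drawn $\overset{\text{i.i.d.}}{\sim} p(M)$.
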
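To make the train/test split concrete, here is a minimal Python sketch assuming a toy task family in which all tasks share $\mathcal{S}$, $\mathcal{A}$ and $T$ and differ only in a goal position that defines $R_i$. The `Task` class, `sample_tasks` and the uniform goal distribution standing in for $p(M)$ are hypothetical illustrations, not the paper's benchmarks.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task M_i: S, A and T are shared, only R_i varies via a goal."""

    goal: float  # goal position that parameterises the reward function R_i

    def reward(self, state: float) -> float:
        # R_i(s): negative distance to this task's goal (illustrative choice)
        return -abs(state - self.goal)


def sample_tasks(n: int, rng: random.Random) -> list[Task]:
    """Draw n tasks i.i.d. from a hypothetical p(M): goals ~ Uniform(-1, 1)."""
    return [Task(goal=rng.uniform(-1.0, 1.0)) for _ in range(n)]


rng = random.Random(0)
train_tasks = sample_tasks(100, rng)  # M_i, i = 1, ..., N_train
test_tasks = sample_tasks(20, rng)    # M_i, i = 1, ..., N_test, same p(M)
```

Both lists come from the same sampler, so training and test tasks are i.i.d. draws from one distribution; the test goals are simply ones the agent has not trained on.
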
Slide 4. $T$, $R$

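Slide 5. The belief $b_t = p(R, T \mid \tau_{:t})$, where $\tau_{:t} = \{s_0, a_0, r_1, s_1, a_1, \dots, s_t\}$ is the history up to time $t$, is appended to the state to form the hyper-state $s^+_t = [s_t, b_t]$.
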
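For a finite set of candidate tasks the belief can be computed exactly with Bayes' rule, which shows what $b_t$ and $s^+_t$ actually contain. The sketch below assumes, purely for illustration, four candidate goal cells on a line, a deterministic reward of 1 on the goal cell and 0 elsewhere, and known transitions, so only $R$ is uncertain. `reward_fn`, `update_belief` and `hyper_state` are hypothetical names.

```python
from fractions import Fraction

N_GOALS = 4  # hypothetical number of candidate tasks (goal cells 0..3)


def reward_fn(state: int, goal: int) -> int:
    # Hypothetical R_m: reward 1 only when the agent stands on goal cell m
    return 1 if state == goal else 0


def update_belief(belief: list[Fraction], state: int, reward: int) -> list[Fraction]:
    """One Bayes step: the new belief b(m) is proportional to p(r | s, m) * b(m)."""
    # Deterministic rewards: p(r | s, m) is 1 if R_m(s) == r, otherwise 0
    unnormalised = [b * (1 if reward_fn(state, m) == reward else 0)
                    for m, b in enumerate(belief)]
    z = sum(unnormalised)  # > 0 as long as the observation is possible under some m
    return [u / z for u in unnormalised]


def hyper_state(state: int, belief: list[Fraction]) -> list[Fraction]:
    # s_t^+ = [s_t, b_t]: the observed state concatenated with the belief vector
    return [Fraction(state)] + belief


b0 = [Fraction(1, N_GOALS)] * N_GOALS      # uniform prior over candidate goals
b1 = update_belief(b0, state=0, reward=0)  # no reward at cell 0
b2 = update_belief(b1, state=2, reward=1)  # reward at cell 2
s_plus = hyper_state(2, b2)
```

After seeing no reward at cell 0 the belief spreads evenly over the remaining three goals; a reward at cell 2 collapses it onto that goal. Exact updates like this are only tractable in toy settings, which is why the belief is later represented by a learned posterior over a latent $m$.
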
Slide 6. $b_t = p(R, T \mid \tau_{:t})$

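Slide 7. The belief $b_t = p(R, T \mid \tau_{:t})$ is represented as a posterior over a latent task variable $m$: $b_t = p_\theta(m \mid \tau_{:t})$. The model $p_\theta(m \mid \tau_{:t})$ is learned from tasks drawn from $p(M)$, and the policy $\pi_\psi(a_t \mid s^+_t)$ conditions on the belief $b_t$ rather than on the unknown $R, T$.
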
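Slide 8. The policy $\pi_\psi(a_t \mid s^+_t = \{s_t, b_t\})$ conditions on the belief $b_t = p(R, T \mid \tau_{:t})$. Its parameters $\psi$ are trained on tasks with $R, T \overset{\text{i.i.d.}}{\sim} p(R, T)$, and at each step the policy uses the current posterior $p(R, T \mid \tau_{:t})$.
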
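The only change to the policy is its input: it sees the belief alongside the state. The following is a minimal sketch assuming a hypothetical linear-softmax parameterisation of $\pi_\psi$ over the hyper-state vector. VariBAD itself uses neural networks, and `policy_probs` and the weights below are illustrative only.

```python
import math


def policy_probs(psi: list[list[float]], s_plus: list[float]) -> list[float]:
    """pi_psi(a | s^+) as a softmax over linear scores psi[a] . s^+ (hypothetical)."""
    scores = [sum(w * x for w, x in zip(weights, s_plus)) for weights in psi]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Hyper-state layout (illustrative): [s_t, b_t(goal A), b_t(goal B)]
psi = [[0.0, 2.0, 0.0],   # action 0 ("move towards A") scores belief in A
       [0.0, 0.0, 2.0]]   # action 1 ("move towards B") scores belief in B
believes_a = policy_probs(psi, [0.0, 1.0, 0.0])
believes_b = policy_probs(psi, [0.0, 0.0, 1.0])
```

With the same physical state $s_t = 0$, moving belief mass from one candidate task to the other flips which action is preferred.
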
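Slide 9. Bayes-adaptive MDP (BAMDP): $M^+ = (\mathcal{S}^+, \mathcal{A}, R^+, T^+, T^+_0, \gamma, H)$, with hyper-states $s^+_t \in \mathcal{S}^+$, actions $a_t \in \mathcal{A}$, rewards $r_t$ from $R^+$ and transitions $T^+$. $R^+$ and $T^+$ depend on the prior $p(M)$, and the policy $\pi_\psi(a_t \mid s^+_t)$ is trained to act optimally in $M^+$.
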
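In the BAMDP the reward is no longer tied to a single task. One common construction scores a hyper-state by the expected task reward under the belief. The sketch below assumes a toy set of four candidate goal cells (reward 1 on the goal, 0 elsewhere); `task_reward` and `bamdp_reward` are hypothetical names, not code from the paper.

```python
from fractions import Fraction

GOALS = range(4)  # hypothetical candidate tasks: goal cells 0..3


def task_reward(state: int, goal: int) -> int:
    # Hypothetical R_m(s): 1 on the goal cell, 0 elsewhere
    return 1 if state == goal else 0


def bamdp_reward(belief: list[Fraction], state: int) -> Fraction:
    """R^+ for hyper-state (s, b): expected task reward sum_m b(m) * R_m(s)."""
    return sum((b * task_reward(state, g) for b, g in zip(belief, GOALS)), Fraction(0))


belief = [Fraction(0), Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
r_plus_at_0 = bamdp_reward(belief, state=0)  # goal 0 already ruled out
r_plus_at_2 = bamdp_reward(belief, state=2)
```

Because $R^+$ averages over the belief, a state the agent already believes is not the goal earns no reward, and information-gathering actions become valuable through their effect on future beliefs.
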
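Slide 10. To learn the belief $b_t = p(R, T \mid \tau_{:t})$ through the latent $m$, the model $p_\theta(\tau_{:H^+} \mid a_{:H^+-1})$ is trained to maximise

$$\mathbb{E}_{p(M, \tau_{:H^+})}\left[\log p_\theta(\tau_{:H^+} \mid a_{:H^+-1})\right] = \mathbb{E}_{p(M, \tau_{:H^+})}\left[\sum_t \log \mathbb{E}_{p_\theta(m \mid \tau_{:t})}\left[p_\theta(s_{t+1} \mid s_t, a_t, m)\, p_\theta(r_{t+1} \mid s_t, a_t, m)\right]\right],$$

where $p_\theta(m \mid \tau_{:t}) = p(R, T \mid \tau_{:t}) = b_t$. Given $m$, the model predicts the next state $s_{t+1}$ and reward $r_{t+1}$ from $s_t$ and $a_t$.
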
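The right-hand side is the chain rule applied to the trajectory likelihood: each step contributes the log of the posterior-predictive probability of the next transition. The sketch below checks this numerically for a discrete $m$ with an exact posterior. It assumes a hypothetical noisy goal-cell reward model and transitions that are known and independent of $m$, so the $p_\theta(s_{t+1} \mid s_t, a_t, m)$ factor drops out. All names are illustrative.

```python
import math

N_GOALS = 4  # hypothetical number of candidate latent tasks m


def reward_lik(r: int, s: int, m: int) -> float:
    # Hypothetical p(r_{t+1} | s_t, a_t, m): reward 1 is likely only on goal cell m
    p_one = 0.9 if s == m else 0.1
    return p_one if r == 1 else 1.0 - p_one


def sequential_log_lik(prior: list[float], traj: list[tuple[int, int]]) -> float:
    """sum_t log E_{p(m | tau_:t)}[p(r_{t+1} | s_t, a_t, m)], updating the posterior online."""
    post = list(prior)
    total = 0.0
    for s, r in traj:
        liks = [reward_lik(r, s, m) for m in range(len(post))]
        pred = sum(p * lik for p, lik in zip(post, liks))  # posterior predictive of r_{t+1}
        total += math.log(pred)
        post = [p * lik / pred for p, lik in zip(post, liks)]  # Bayes: p(m | tau_:t+1)
    return total


def joint_log_lik(prior: list[float], traj: list[tuple[int, int]]) -> float:
    """log sum_m p(m) prod_t p(r_{t+1} | s_t, a_t, m), computed in one shot."""
    return math.log(sum(prior[m] * math.prod(reward_lik(r, s, m) for s, r in traj)
                        for m in range(len(prior))))


prior = [1.0 / N_GOALS] * N_GOALS
# (s_t, r_{t+1}) pairs; actions are omitted and transitions are assumed known
# and independent of m, so the p(s_{t+1} | s_t, a_t, m) factor is dropped.
traj = [(0, 0), (1, 0), (2, 1), (2, 1)]
seq = sequential_log_lik(prior, traj)
joint = joint_log_lik(prior, traj)
```

The two functions agree up to floating-point error, which is why maximising the per-step form trains a posterior $p_\theta(m \mid \tau_{:t})$ that plays the role of $b_t$.
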
Slide 11. $m$, $H^+$

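Slide 12. The belief $b_t = p(R, T \mid \tau_{:t})$ is represented as a posterior over a latent task variable $m$: $b_t = p_\theta(m \mid \tau_{:t})$. The model $p_\theta(m \mid \tau_{:t})$ is learned from tasks drawn from $p(M)$, and the policy $\pi_\psi(a_t \mid s^+_t)$ conditions on the belief $b_t$ rather than on the unknown $R, T$.
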
Slide 13. $s_t$

Slide 14. Approximate posterior $q(m \mid \tau_{:t})$.

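In VariBAD the intractable posterior is approximated by an encoder $q(m \mid \tau_{:t})$ trained with a VAE-style objective. The following is a minimal sketch assuming, as is common for VAEs, that $q$ is a diagonal Gaussian. It shows the reparameterised sample of $m$ and the KL term of an ELBO; the standard-normal prior, the encoder outputs and the function names are hypothetical.

```python
import math
import random


def sample_latent(mu: list[float], logvar: list[float], rng: random.Random) -> list[float]:
    """Reparameterised draw m = mu + sigma * eps from a diagonal Gaussian q(m | tau_:t)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_diag_gauss(mu_q, logvar_q, mu_p, logvar_p) -> float:
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    return sum(
        0.5 * (lv_p - lv_q + (math.exp(lv_q) + (m_q - m_p) ** 2) / math.exp(lv_p) - 1.0)
        for m_q, lv_q, m_p, lv_p in zip(mu_q, logvar_q, mu_p, logvar_p)
    )


rng = random.Random(0)
mu, logvar = [0.5, -0.2], [-1.0, 0.0]   # hypothetical encoder outputs for tau_:t
m = sample_latent(mu, logvar, rng)       # latent task sample fed to the decoder
kl = kl_diag_gauss(mu, logvar, [0.0, 0.0], [0.0, 0.0])  # against N(0, I)
```

Writing the sample as `mu + sigma * eps` keeps it differentiable with respect to the encoder outputs, and the KL term keeps $q$ close to the chosen prior while the decoder likelihood pulls it towards the true task.
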
Slide 15. $\hat{b}_t$