Playing Atari with Deep Reinforcement Learning
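$$V_\pi(s) = \mathbb{E}_\pi\left[ R_1 + \gamma R_2 + \cdots \mid s \right] = \mathbb{E}_\pi\left[ \sum_{t=1}^{T} \gamma^{t-1} R_t \,\middle|\, s \right]$$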
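The discounted sum inside the expectation can be computed for a single episode as in the minimal sketch below. The function name `discounted_return` and the example reward list are illustrative assumptions, not part of the slides.

```python
def discounted_return(rewards, gamma):
    """Return sum over t = 1..T of gamma^(t-1) * R_t for one episode.

    `rewards` is the list [R_1, ..., R_T]; `gamma` is the discount factor.
    """
    g = 0.0
    # Walk backwards so each step applies g = R_t + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode with three rewards of 1 and gamma = 0.5:
# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

Averaging this quantity over many episodes that start in the same state gives a sample estimate of the state value.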
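Incremental mean of sampled returns, where $g_i$ denotes the $i$-th sampled return from state $s$:

$$V_\pi^{i+1}(s) = \frac{1}{i+1}\left( g_{i+1} + i\,V_\pi^{i}(s) \right) = V_\pi^{i}(s) + \frac{1}{i+1}\left( g_{i+1} - V_\pi^{i}(s) \right)$$

$$V_\pi^{1}(s) = \frac{1}{1}\left( g_1 + 0 \cdot V_\pi^{0}(s) \right) = g_1$$

$$V_\pi^{2}(s) = \frac{1}{2}\left( g_2 + 1 \cdot V_\pi^{1}(s) \right) = \frac{1}{2}\left( g_1 + g_2 \right)$$

$$V_\pi^{3}(s) = \frac{1}{3}\left( g_3 + 2\,V_\pi^{2}(s) \right) = \frac{1}{3}\left( g_1 + g_2 + g_3 \right)$$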
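A minimal sketch of the running average built up above, assuming the estimate starts at 0 and each new return is folded in with weight 1/(i+1). The name `incremental_mean` and the sample returns are hypothetical.

```python
def incremental_mean(returns):
    """Fold returns g_1, g_2, ... into a running average one at a time.

    Uses V_{i+1} = V_i + (g_{i+1} - V_i) / (i + 1), so no list of past
    returns has to be stored.
    """
    v = 0.0
    estimates = []
    for i, g in enumerate(returns):
        v = v + (g - v) / (i + 1)
        estimates.append(v)
    return estimates


# Three hypothetical returns; each estimate equals the plain average so far.
print(incremental_mean([2.0, 4.0, 6.0]))  # [2.0, 3.0, 4.0]
```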
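$$V_\pi^{i+1}(s) = V_\pi^{i}(s) + \frac{1}{i+1}\left( g_{i+1} - V_\pi^{i}(s) \right)$$

Replacing the factor $\frac{1}{i+1}$ with a constant step size $\alpha$:

$$V_\pi^{i+1}(s) = V_\pi^{i}(s) + \alpha\left( g_{i+1} - V_\pi^{i}(s) \right)$$

$$V_\pi^{i+1}(s) = (1 - \alpha)\,V_\pi^{i}(s) + \alpha\, g_{i+1}$$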
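With a constant step size, the update becomes an exponentially weighted average: recent returns count more than old ones. The sketch below illustrates this under assumed values; `constant_alpha_update` and `weights_on_past_returns` are hypothetical helper names.

```python
def constant_alpha_update(v, g, alpha):
    """One update V <- V + alpha * (g - V), equal to (1 - alpha) * V + alpha * g."""
    return v + alpha * (g - v)


def weights_on_past_returns(n, alpha):
    """Weight each of n returns carries after n updates, oldest first."""
    return [alpha * (1 - alpha) ** (n - 1 - k) for k in range(n)]


v = 0.0
for g in [1.0, 2.0, 3.0]:  # hypothetical sampled returns
    v = constant_alpha_update(v, g, alpha=0.5)

print(v)                                # 2.125
print(weights_on_past_returns(3, 0.5))  # [0.125, 0.25, 0.5]
```

Unlike the 1/(i+1) average, the weight on the initial estimate never reaches exactly zero, but it shrinks geometrically with every update.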
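SARSA:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$$

Q-learning:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$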
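The two update rules differ only in the bootstrap target: one uses the action actually taken next, the other the greedy action. A tabular sketch, assuming a dictionary-backed Q-table and ignoring terminal states for brevity; all names and numbers here are illustrative.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy target: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Off-policy target: bootstrap from the best action value in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


# Hypothetical two-action problem; unseen entries default to 0.0.
Q = defaultdict(float)
Q[(1, 0)] = 1.0
Q[(1, 1)] = 2.0

q_learning_update(Q, s=0, a=0, r=1.0, s_next=1, actions=[0, 1], alpha=0.5, gamma=0.9)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
print(Q[(0, 0)], Q[(0, 1)])
```

In a deep Q-network the table is replaced by a neural network, and the bracketed difference becomes the error that the network is trained to reduce.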
- Mini-batch size: 32
- Replay memory size: 400,000
- ε: decreased from 1 to 0.1 over 100,000 steps
- Discount factor: 0.99
- Learning rate: 0.00025
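These settings could be wired together as in the sketch below. The slides only say that ε decreases from 1 to 0.1, so the linear schedule is an assumption, as are the constant names and the `deque`-based replay memory.

```python
import random
from collections import deque

# Values taken from the list above.
BATCH_SIZE = 32
REPLAY_MEMORY_SIZE = 400_000
EPSILON_START = 1.0
EPSILON_END = 0.1
EPSILON_DECAY_STEPS = 100_000
DISCOUNT_FACTOR = 0.99
LEARNING_RATE = 0.00025


def epsilon_at(step):
    """Exploration rate at a given step (assumed linear decay, then constant)."""
    fraction = min(step / EPSILON_DECAY_STEPS, 1.0)
    return EPSILON_START + fraction * (EPSILON_END - EPSILON_START)


# Oldest transitions are dropped automatically once the memory is full.
memory = deque(maxlen=REPLAY_MEMORY_SIZE)


def remember(state, action, reward, next_state, done):
    """Store one transition in the replay memory."""
    memory.append((state, action, reward, next_state, done))


def sample_batch():
    """Draw a uniform random mini-batch of stored transitions."""
    return random.sample(memory, BATCH_SIZE)
```

A training loop would typically call `remember` after every environment step and start calling `sample_batch` once the memory holds at least `BATCH_SIZE` transitions.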
References
- https://www.youtube.com/watch?v=lvoHnicueoE (Stanford University School of Engineering)
- https://www.youtube.com/watch?v=V7_cNTfm2i8&list=PL0oFI08O71gKjGhaWctTPvvM7_cVzsAtK&index=5 (Sung Kim)
- 파이썬과 케라스로 배우는 강화학습 (Reinforcement Learning with Python and Keras)
- 좌충우돌 강화학습의 이론과 구현 (Theory and Implementation of Reinforcement Learning, manuscript), by 숨은원리 출판사
